Hub linking (federation)

Join several uhub instances so they appear to clients as one logical hub, sharing a single user list across nodes.

Experimental. Linking is available from uhub 0.7.0 and is validated for two nodes (and a star of members linking to one coordinator). Read Status & limitations before deploying, and only link hubs you trust — see Security.

When hubs are linked, users connected to any node see one unified user list and can chat, search, exchange private messages and set up peer-to-peer transfers across node boundaries. There are two ways to use this:

  • Multi-process, one machine — run one hub as several worker processes to use every CPU core. uhub configures the links for you.
  • Federation across machines — link separate hubs over the network into one logical hub.

One logical hub across CPU cores

uhub is single-threaded, so one process uses one core. To use a whole machine, run the hub as several worker processes that link to each other and present a single logical hub — the same federation machinery, but within one host. Set in uhub.conf:

workers = 4      # or 0 for one worker per CPU core

The top-level process becomes a master that forks the workers and supervises them (restarting any that exit); it serves no clients itself. Each worker is configured automatically:

  • all workers bind the same server_port with SO_REUSEPORT, so the kernel load-balances incoming connections across cores;
  • they link to each other over Unix sockets in a private runtime directory (worker_socket_dir, default /tmp), with an ephemeral shared secret generated by the master and never written to disk;
  • the SID space is partitioned per worker automatically, so session IDs stay unique.

A client connecting to the single port lands on some worker and sees the whole cluster's roster; the CPU-bound per-connection work (notably TLS) is spread across cores. The trade-offs are the federation ones below: each worker holds the full roster (memory grows with worker count), and public chat crosses worker boundaries. SO_REUSEPORT is enabled automatically when workers > 1; the standalone server_reuseport option is only for the advanced case of supervising the worker processes yourself (for example, a systemd template).
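The shared-port mechanism can be illustrated outside uhub. The following is a minimal Python sketch, not uhub's C implementation: it opens several listening sockets on one port with SO_REUSEPORT, which is what lets the kernel distribute incoming connections among them. The function name open_reuseport_listeners is hypothetical, and the sockets live in one process here for brevity, whereas uhub's workers are separate processes. SO_REUSEPORT is not available on every platform.

```python
import socket


def open_reuseport_listeners(count, host="127.0.0.1", port=0):
    """Open `count` TCP listeners that all share one port via SO_REUSEPORT.

    Hypothetical helper for illustration only. With port=0 the first socket
    gets an ephemeral port and the remaining sockets bind to that same port.
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    listeners = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Every socket sharing the port must set the option before bind().
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen()
        port = s.getsockname()[1]  # reuse the port chosen by the first bind
        listeners.append(s)
    return listeners
```

Without the option, the second bind() to the same address and port would fail with "address already in use".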

Linking two hubs across machines

A link reuses the hub's normal listening port — there is no separate link port. Both ends authenticate each other with a mutual challenge-response keyed on a shared secret that must match on both hubs; the secret itself is never sent over the wire.
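As a rough illustration of why the secret never needs to cross the wire, here is a minimal Python sketch of a mutual challenge-response. It assumes HMAC-SHA256 over a random challenge; the source does not state which construction uhub actually uses, so the algorithm, the message layout and all function names below are assumptions, not the real wire protocol.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Fresh random bytes that the peer must answer."""
    return secrets.token_bytes(32)


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the secret without revealing it.

    Assumption: HMAC-SHA256 stands in for whatever keyed function uhub uses.
    """
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Recompute the expected answer and compare in constant time."""
    return hmac.compare_digest(respond(secret, challenge), response)


def mutual_handshake(secret_a: bytes, secret_b: bytes) -> bool:
    """Each side challenges the other; the link is accepted only if both pass."""
    challenge_from_a = make_challenge()
    challenge_from_b = make_challenge()
    # Only challenges and responses are exchanged, never the secrets.
    b_proved = verify(secret_a, challenge_from_a, respond(secret_b, challenge_from_a))
    a_proved = verify(secret_b, challenge_from_b, respond(secret_a, challenge_from_b))
    return a_proved and b_proved
```

In this sketch, mutual_handshake succeeds only when both hubs were configured with the same secret, which mirrors the requirement that link_secret match on both ends.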

Because the whole cluster shares one session-ID space, each node must own a distinct slice of it. For two hubs, tell each node the cluster size (node_count) and its own index (node_id).

On hub A (the coordinator):

link_secret = "a-long-shared-secret"
node_count  = 2
node_id     = 0

On hub B, which connects out to hub A:

link_peer   = hubA.example.org:1511
link_secret = "a-long-shared-secret"
node_count  = 2
node_id     = 1

link_peer is a comma-separated list; each entry is either a host:port (TCP) or a path beginning with / (a Unix socket). A hub with no link_peer only accepts inbound links. When both hubs are running and the secret matches, each streams its user list to the other and they stay in sync from then on.
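The two entry forms can be told apart by their first character. The sketch below shows one way to split a link_peer value under the rules just described; it is illustrative Python, not uhub's parser, and the function name, the tuple layout, the example socket path and the handling of malformed entries are all assumptions.

```python
def parse_link_peers(value: str):
    """Split a link_peer value into ("tcp", host, port) / ("unix", path) tuples.

    Hypothetical helper: an empty value yields an empty list, which
    corresponds to a hub that only accepts inbound links.
    """
    peers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("/"):
            # A leading slash marks a Unix socket path.
            peers.append(("unix", entry))
            continue
        # Otherwise expect host:port; split on the last colon.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port or /path, got {entry!r}")
        peers.append(("tcp", host, int(port)))
    return peers
```

For example, parse_link_peers("hubA.example.org:1511, /run/uhub/link.sock") returns one TCP entry followed by one Unix-socket entry (the socket path is made up for the example).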

The shared session-ID space

The hard ceiling is the ADC session ID (SID): it is 20 bits, so the entire logical hub — local users plus every linked node's users — shares one space of 1,048,576 sessions, regardless of node count. That space is divided into node_count equal windows, and each node allocates its local SIDs from its own window so two nodes never hand out the same ID.
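The window arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than uhub's code: it assumes windows are contiguous and of size 2^20 // node_count, that any remainder is left unused, and it ignores any SIDs the hub may reserve for itself. The helper names are hypothetical. ADC writes a SID as four base32 characters, which is where the 20 bits come from.

```python
SID_BITS = 20
SID_SPACE = 1 << SID_BITS  # 1,048,576 possible session IDs
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def sid_window(node_id: int, node_count: int) -> range:
    """Return the SID range owned by `node_id` (assumed contiguous layout)."""
    if node_count < 1 or not 0 <= node_id < node_count:
        raise ValueError("node_id must satisfy 0 <= node_id < node_count")
    size = SID_SPACE // node_count  # assumption: any remainder stays unused
    start = node_id * size
    return range(start, start + size)


def sid_to_string(sid: int) -> str:
    """Render a 20-bit SID as four base32 characters, most significant first."""
    return "".join(BASE32[(sid >> shift) & 0x1F] for shift in (15, 10, 5, 0))
```

With node_count = 2 each node gets 524,288 SIDs; with a ceiling of node_count = 64 each window holds 16,384, which bounds how many local users a single node can serve.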

Each node obtains its window from node_id:

node_id   Behaviour
>= 0      Static. The node owns window node_id and starts serving immediately. Every node must agree on node_count and use a unique node_id; node_id = 0 is the coordinator.
-1        Dynamic. The node starts pending, accepts no local logins, and leases a free window from the coordinator over the link. node_count still sets the window size.

Tip: pick a generous fixed node_count ceiling (say 64) on every node so windows never need re-slicing as the cluster grows. Then either assign unique static node_ids, or set node_id = -1 everywhere and let the nodes elect a coordinator and lease windows among themselves. If linking is configured with node_count = 1, the hub warns at start-up that SIDs will collide.

Coordinator election & dynamic leasing

The coordinator is not statically pinned — nodes elect one per link. Each node has an election id; the lower id wins and acts as coordinator, leasing SID windows to the others. node_id = 0 always wins (a fixed leader); node_id = -1 takes a random id and elects dynamically. This lets an entire cluster come up with no node designated as coordinator: give every node node_id = -1, the same node_count and a shared link_secret. When a link drops, any windows leased over it return to the free pool.
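The election rule and the lease bookkeeping can be modelled roughly as follows. This Python sketch is an assumption-laden illustration, not uhub's implementation: the range of random election ids, the tie handling, the order in which free windows are handed out, and which window the coordinator keeps for itself are not specified in the text above, and every name here is hypothetical.

```python
import random


def election_id(node_id: int, rng=random) -> int:
    """Map node_id to an election id: 0 is the fixed leader, -1 draws randomly."""
    if node_id == 0:
        return 0
    if node_id == -1:
        return rng.randrange(1, 2**32)  # assumption: 32-bit ids, never 0
    # The text does not say which id a static node_id > 0 uses.
    raise ValueError("election id for static node_id > 0 is not modelled here")


def coordinator(local_id: int, peer_id: int) -> str:
    """Per-link election: the lower election id acts as coordinator."""
    if local_id == peer_id:
        raise ValueError("tie; assumed to require a fresh draw")
    return "local" if local_id < peer_id else "peer"


class WindowPool:
    """Coordinator-side record of which SID windows are leased over which link."""

    def __init__(self, node_count: int, own_window: int = 0):
        # Assumption: the coordinator keeps `own_window` for its own users.
        self.free = [w for w in range(node_count) if w != own_window]
        self.leased = {}  # link name -> list of window indices

    def lease(self, link: str):
        """Hand the lowest free window to the node behind `link`, or None."""
        if not self.free:
            return None
        window = self.free.pop(0)
        self.leased.setdefault(link, []).append(window)
        return window

    def link_dropped(self, link: str):
        """Return every window leased over `link` to the free pool."""
        returned = self.leased.pop(link, [])
        self.free = sorted(self.free + returned)
        return returned
```

Because the pool in this sketch lives only in memory, restarting the coordinator would lose the lease table.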

What stays in sync

  • Roster: on linking, each node streams its local users to the peer as remote users; live joins, INF updates and quits then propagate as deltas, so every client sees one unified user list.
  • Messages: private messages, search results and connect requests to a remote user are forwarded over the link that owns the target. Public chat and searches are relayed once over each link.
  • Topic: setting the hub description (!topic) propagates to every node.
  • Bans: !ban propagates cluster-wide. The banned user is disconnected from whichever node they are connected to and cannot reconnect on any node, so a ban also acts as a cluster-wide kick.
  • Netsplit: when a link drops, each node removes the remote users it learned over that link and tells its local clients they quit, exactly like an IRC netsplit. Rosters re-synchronise when the link comes back.

Only live changes propagate today: a node that links in later does not yet receive the existing bans or the current topic. Seed each node's hub_description consistently for a stable initial topic.
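The roster, routing and netsplit behaviour described above boils down to remembering which link each remote user was learned over. A minimal Python model of that bookkeeping follows; it is a sketch for intuition only, with hypothetical class and method names, and it does not reflect uhub's internal data structures.

```python
class LinkedRoster:
    """Tracks local users and, for each remote user, the link it arrived on."""

    def __init__(self):
        self.local = set()   # SIDs of users connected directly to this node
        self.remote = {}     # SID -> name of the link the user was learned over

    def add_local(self, sid: str) -> None:
        self.local.add(sid)

    def add_remote(self, sid: str, link: str) -> None:
        """Record a user announced by a peer (snapshot or live join)."""
        self.remote[sid] = link

    def route(self, target_sid: str):
        """Where a directed message goes: "local", a link name, or None."""
        if target_sid in self.local:
            return "local"
        return self.remote.get(target_sid)

    def drop_link(self, link: str):
        """Netsplit: forget users learned over `link`; return their SIDs.

        The caller would announce a quit to local clients for each SID.
        """
        gone = sorted(sid for sid, via in self.remote.items() if via == link)
        for sid in gone:
            del self.remote[sid]
        return gone
```

When the link comes back, the peer's roster snapshot would repopulate the remote map through add_remote.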

Master-slave authentication

Hubs on different machines each have their own config and cannot share an account database by sharing a file. Rather than copy accounts around (uhub stores passwords in plaintext, which should not be replicated), make one node the auth master — a normal hub that loads the account database (mod_auth_simple / mod_auth_sqlite) — and the others slaves that defer registered-user login to it. On each slave:

auth_proxy = 1

A slave holds no accounts of its own; its upstream link_peer is the master. When a registered user logs in on the slave, the slave asks the master whether the nick is registered and forwards the password challenge-response to the master to verify — so passwords never leave the master. Guests are unaffected and log in locally. The master needs no special flag.
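The division of labour between slave and master can be sketched as below. This is illustrative Python under several assumptions: the slave issues the challenge, SHA-256 stands in for the hash the real login exchange uses, and the class and method names are hypothetical. The point it demonstrates is that the slave only ever handles the challenge and the client's response, never the stored password.

```python
import hashlib
import hmac
import os


def client_response(password: str, challenge: bytes) -> bytes:
    """What a client sends back. SHA-256 is a stand-in hash (assumption)."""
    return hashlib.sha256(password.encode() + challenge).digest()


class AuthMaster:
    """The only node that holds the account database."""

    def __init__(self, accounts):
        self._accounts = dict(accounts)  # nick -> password

    def is_registered(self, nick: str) -> bool:
        return nick in self._accounts

    def verify(self, nick: str, challenge: bytes, response: bytes) -> bool:
        password = self._accounts.get(nick)
        if password is None:
            return False
        expected = client_response(password, challenge)
        return hmac.compare_digest(expected, response)


class AuthSlave:
    """Holds no accounts; defers registered-user login to the master."""

    def __init__(self, master: AuthMaster):
        self.master = master

    def login(self, nick: str, answer_challenge) -> str:
        if not self.master.is_registered(nick):
            return "guest"  # guests log in locally
        challenge = os.urandom(24)
        response = answer_challenge(challenge)  # computed by the client
        ok = self.master.verify(nick, challenge, response)
        return "registered" if ok else "denied"
```

In a real deployment the two calls on the master would travel over the link to the slave's upstream link_peer rather than being direct method calls.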

Security

A link is highly privileged: a linked peer can inject arbitrary users into the local roster. Treat every link as a trust boundary and only link hubs you control.

The shared-secret handshake is a v1 baseline with the weaknesses of any shared password — one secret for the whole cluster, no per-hub identity, no rotation or revocation, and it lives in the config file. Run links only over a trusted or private network. The intended long-term posture is mutual TLS with per-peer certificate pinning; do not treat the shared secret as the final mechanism.

Configuration reference

Option             Meaning
link_peer          Peer(s) to connect to, comma-separated. Each is a host:port (TCP) or a path beginning with / (Unix socket). Empty = accept inbound links only.
link_secret        Shared secret for the mutual challenge-response. Must match on both ends; empty disables linking.
link_socket        Unix socket path to also accept links on (for same-host worker links). Must be short.
node_count         Number of windows the SID space is divided into (cluster size or a fixed ceiling). Must agree across nodes.
node_id            This node's window index: >= 0 static (0 = coordinator), -1 = lease dynamically and participate in election.
auth_proxy         Slave auth: defer registered-user login to the upstream master.
workers            Run one logical hub as N worker processes (0 = one per CPU core).
worker_socket_dir  Directory for the inter-worker Unix sockets (default /tmp).
server_reuseport   Set SO_REUSEPORT on the listen socket (auto-on when workers > 1).

See the configuration reference for the full description of each option.

Status & limitations

Implemented and tested (two nodes / star-to-a-coordinator, and multi-process on one host):

  • link detection and the mutual authentication handshake, over TCP or a Unix socket;
  • static and dynamic SID partitioning with window leasing, and coordinator election;
  • roster snapshot plus live join/update/quit deltas;
  • directed and broadcast message routing across one hop;
  • topic propagation and cluster-wide bans;
  • master-slave authentication;
  • multi-process "one logical hub" via workers;
  • netsplit cleanup on link drop.

Not yet implemented:

  • multi-hop relay for a 3+ node mesh (only a single hop is loop-safe today);
  • globally consistent election across a mesh (election is per-link);
  • coordinator failover — window leases are held in memory, so a coordinator restart forgets them;
  • an initial-state snapshot on link establish (only live topic/ban changes propagate);
  • auth-master election or failover (a slave proxies to a single master);
  • TLS-wrapped links (detection is plaintext-only today);
  • reconnect/backoff — a failed outbound link is attempted once;
  • mutual-TLS / certificate-pinned link authentication.