Multi-Instance Aggregation

This tutorial walks through setting up multiple ros2_medkit_gateway instances and aggregating their entity trees into a single unified API.

Prerequisites

  • ros2_medkit built and installed (colcon build && source install/setup.bash)

  • Two or more terminals available

  • Familiarity with the Getting Started guide

What You Will Learn

  • How to run two gateways on different ports

  • How to configure static peer connections

  • How to enable mDNS auto-discovery

  • What the merged API looks like

  • How to handle peer failures

  • How to set up chain topologies

Step 1: Start Two Gateways

Open two terminals. In each, source the ROS 2 workspace:

source /opt/ros/jazzy/setup.bash
source install/setup.bash

Terminal 1 - Gateway A (port 8080):

ros2 run ros2_medkit_gateway gateway_node --ros-args \
    -p server.port:=8080 \
    -r __node:=gateway_a \
    -r __ns:=/subsystem_a

Terminal 2 - Gateway B (port 8081):

ros2 run ros2_medkit_gateway gateway_node --ros-args \
    -p server.port:=8081 \
    -r __node:=gateway_b \
    -r __ns:=/subsystem_b

Verify both gateways are healthy:

curl http://localhost:8080/api/v1/health
curl http://localhost:8081/api/v1/health

Each gateway should report {"status": "healthy"}.
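The same check can be scripted when more gateways are involved. The following is a minimal sketch, assuming only the /api/v1/health endpoint and the "status" field shown above; the helper names fetch_json and unhealthy_gateways are hypothetical and not part of ros2_medkit.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 2.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_gateways(
    base_urls: list[str],
    fetch: Callable[[str], dict] = fetch_json,
) -> list[str]:
    """Return the base URLs whose health endpoint does not report "healthy"."""
    failed = []
    for base in base_urls:
        try:
            status = fetch(f"{base}/api/v1/health").get("status")
        except (OSError, ValueError):
            # OSError covers connection failures, ValueError covers invalid JSON.
            status = None
        if status != "healthy":
            failed.append(base)
    return failed
```

Calling unhealthy_gateways(["http://localhost:8080", "http://localhost:8081"]) returns an empty list when both gateways from this step are up. The fetch parameter exists only so the logic can be exercised without a running gateway.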

Step 2: Configure Static Peers

To make Gateway A aggregate entities from Gateway B, create a config file:

gateway_a_params.yaml
ros2_medkit_gateway:
  ros__parameters:
    server:
      port: 8080
    aggregation:
      enabled: true
      peer_urls: ["http://localhost:8081"]
      peer_names: ["subsystem_b"]

Restart Gateway A with the config:

ros2 run ros2_medkit_gateway gateway_node --ros-args \
    --params-file gateway_a_params.yaml \
    -r __node:=gateway_a

Gateway A now merges its own entities with those reported by Gateway B.
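The config above supplies peers as two parallel lists. A pre-flight check along the following lines can catch mismatched entries before the gateway starts. This is a sketch that assumes peer_urls and peer_names are paired by position, which the example suggests but the tutorial does not state; pair_peers is a hypothetical helper, not a gateway API.

```python
from urllib.parse import urlparse


def pair_peers(peer_urls, peer_names):
    """Pair names with URLs by position and reject obviously bad input."""
    if len(peer_urls) != len(peer_names):
        raise ValueError(
            f"{len(peer_urls)} peer_urls but {len(peer_names)} peer_names"
        )
    peers = {}
    for name, url in zip(peer_names, peer_urls):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an HTTP(S) URL: {url!r}")
        if name in peers:
            raise ValueError(f"duplicate peer name: {name!r}")
        peers[name] = url
    return peers
```

For the values in gateway_a_params.yaml, pair_peers(["http://localhost:8081"], ["subsystem_b"]) yields a one-entry mapping from subsystem_b to its URL.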

Step 3: Explore the Merged API

List all components (merged from both gateways):

curl -s http://localhost:8080/api/v1/components | jq

The response includes components from both gateways. Components that exist only on the peer carry "source": "peer:subsystem_b" in their metadata.

If both gateways have a component with the same ID (e.g., both hosts are named robot), the two are merged into a single entity whose tags and metadata are combined. Remote-only components appear as separate entries:

{
  "items": [
    {"id": "robot", "source": "runtime"},
    {"id": "arm_controller", "source": "peer:subsystem_b"}
  ]
}
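The merge rule behind this response can be sketched as follows. This is an illustration of the described behavior, not the gateway's implementation: it keeps only the id and source fields and omits the combining of tags and metadata. The function name merge_components is hypothetical.

```python
def merge_components(local, remote_by_peer):
    """Merge component lists by ID.

    local: list of {"id": ...} dicts discovered by this gateway.
    remote_by_peer: mapping of peer name -> list of {"id": ...} dicts.
    """
    # Local components are tagged "runtime", as in the example response.
    merged = {c["id"]: {"id": c["id"], "source": "runtime"} for c in local}
    for peer_name, components in remote_by_peer.items():
        for comp in components:
            # An ID that already exists is kept as one entity;
            # a remote-only ID is added with a peer source tag.
            merged.setdefault(
                comp["id"], {"id": comp["id"], "source": f"peer:{peer_name}"}
            )
    return {"items": list(merged.values())}
```

With a local robot component and a peer subsystem_b that reports robot and arm_controller, the result matches the JSON shown above.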

List all areas (merged by ID):

curl -s http://localhost:8080/api/v1/areas | jq

Areas are merged by ID: if both gateways discover a root area, only one root appears in the response.

Access data from a remote entity:

# If subsystem_b has a component "arm_controller"
curl -s http://localhost:8080/api/v1/components/arm_controller/data | jq

The request is transparently forwarded to Gateway B. The client does not need to know which gateway owns the entity.
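Conceptually, forwarding amounts to a lookup from entity ID to owning peer. The sketch below assumes a hypothetical ownership table that the aggregating gateway builds while merging; the names resolve_target, owners, and peer_urls are illustrative only.

```python
def resolve_target(entity_id, owners, peer_urls, resource="data"):
    """Return (base_url, path) for a component resource request.

    base_url is None when the entity is local and no forwarding is needed.
    owners maps remote entity IDs to peer names; peer_urls maps peer names
    to base URLs.
    """
    path = f"/api/v1/components/{entity_id}/{resource}"
    peer = owners.get(entity_id)
    if peer is None:
        return None, path
    return peer_urls[peer], path
```

For the arm_controller request above, the lookup resolves to Gateway B's base URL with the path unchanged, which is why the client never needs to address Gateway B directly.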

List all functions:

curl -s http://localhost:8080/api/v1/functions | jq

Functions are merged by ID with combined hosts lists. A navigation function that exists on both gateways appears once, listing hosts from both.
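A minimal sketch of this merge, assuming each function entry carries an id and a hosts list (the host values in the usage note are made up for illustration, and merge_functions is a hypothetical name):

```python
def merge_functions(*function_lists):
    """Merge function entries by ID, combining hosts without duplicates."""
    merged = {}
    for functions in function_lists:
        for fn in functions:
            hosts = merged.setdefault(fn["id"], [])
            for host in fn.get("hosts", []):
                if host not in hosts:
                    hosts.append(host)
    return [{"id": fn_id, "hosts": hosts} for fn_id, hosts in merged.items()]
```

If Gateway A reports navigation hosted by planner and Gateway B reports navigation hosted by localizer, the merged list contains a single navigation entry whose hosts are planner and localizer.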

Step 4: Enable mDNS Auto-Discovery

Instead of listing peers statically, use mDNS to discover gateways automatically on the local network.

Gateway A config with mDNS:

gateway_a_mdns.yaml
ros2_medkit_gateway:
  ros__parameters:
    server:
      port: 8080
    aggregation:
      enabled: true
      announce: true
      discover: true
      mdns_name: "gateway_a"

Gateway B config with mDNS:

gateway_b_mdns.yaml
ros2_medkit_gateway:
  ros__parameters:
    server:
      port: 8081
    aggregation:
      enabled: true
      announce: true
      discover: true
      mdns_name: "gateway_b"

Note

mdns_name must be set explicitly when running multiple gateways on the same host. Without it, all instances share the same hostname and filter each other out as “self”. On separate hosts or containers with distinct hostnames, this parameter can be omitted.
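The effect described in the note can be illustrated with a small sketch. It assumes that the announced name falls back to the system hostname when mdns_name is unset and that a gateway ignores announcements carrying its own name; both helper functions are hypothetical.

```python
import socket


def effective_mdns_name(mdns_name=None):
    """Use the configured name, else fall back to the hostname (assumed)."""
    return mdns_name or socket.gethostname()


def filter_self(own_name, announced_names):
    """Drop announcements that carry this gateway's own name."""
    return [name for name in announced_names if name != own_name]
```

With explicit names, filter_self("gateway_a", ["gateway_a", "gateway_b"]) keeps gateway_b. Without them, two gateways on one host both resolve to the same hostname, so each sees the other's announcement as its own and discovers nothing.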

Start both gateways with their configs:

# Terminal 1
ros2 run ros2_medkit_gateway gateway_node --ros-args \
    --params-file gateway_a_mdns.yaml -r __node:=gateway_a

# Terminal 2
ros2 run ros2_medkit_gateway gateway_node --ros-args \
    --params-file gateway_b_mdns.yaml -r __node:=gateway_b

After a few seconds, each gateway discovers the other via mDNS. Check the health endpoint to see discovered peers:

curl -s http://localhost:8080/api/v1/health | jq '.peers'

Note

mDNS requires multicast network support. If using Docker, ensure containers share a network that supports multicast, or use host networking. For bridge-networked containers, use static peers instead.

Step 5: Handle Peer Failures

When a peer goes down, the gateway handles it gracefully:

  1. Health checks detect failure: The cache refresh cycle (default: 10 seconds) runs check_all_health() and detects the peer is unreachable, marking it unhealthy.

  2. Entity collection endpoints serve cached data: Endpoints like GET /api/v1/components continue to return the last cached entity set. Remote entities from the unhealthy peer are dropped from the cache on the next refresh cycle.

  3. Resource fan-out returns partial results: Per-entity resource collection endpoints (data, operations, faults, configurations, logs) perform real-time fan-out. When some peer requests fail (peer unreachable or non-2xx response), the response includes x-medkit.partial: true and x-medkit.failed_peers.

  4. Entity-specific requests return 502: If a request targets a remote entity whose peer is down, the gateway returns 502 Bad Gateway.
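The partial-result behavior in item 3 can be sketched as follows. The x-medkit.partial and x-medkit.failed_peers keys come from the description above; the items key, the function name fan_out, and the per-peer callables are assumptions made for illustration.

```python
def fan_out(local_items, peer_fetchers):
    """Collect items locally and from each peer, marking partial results.

    peer_fetchers maps a peer name to a zero-argument callable that returns
    a list of items or raises when the peer is unreachable or returns a
    non-2xx response.
    """
    items = list(local_items)
    failed_peers = []
    for name, fetch in peer_fetchers.items():
        try:
            items.extend(fetch())
        except Exception:
            # Any failure for this peer makes the overall result partial.
            failed_peers.append(name)
    response = {"items": items}
    if failed_peers:
        response["x-medkit"] = {"partial": True, "failed_peers": failed_peers}
    return response
```

A client can treat the absence of the x-medkit block as a complete result and inspect failed_peers otherwise.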

Test this by stopping Gateway B and querying Gateway A:

# Stop Gateway B (Ctrl+C in Terminal 2)

# Wait for cache refresh interval, then:
curl -s http://localhost:8080/api/v1/components | jq
# Returns only local components (remote entities dropped from cache)

# Faults show partial results:
curl -s http://localhost:8080/api/v1/faults | jq '.["x-medkit"].partial'
# Returns true

# Try accessing a remote entity:
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/api/v1/apps/subsystem_b__some_node/data
# Prints 502 (Bad Gateway)

When Gateway B comes back online, it is automatically re-included after the next successful health check.
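The drop-and-re-include cycle can be pictured as a refresh function that rebuilds the cache on every pass. This is a simplified sketch, not the gateway's code; refresh_cache and its callable parameters are hypothetical stand-ins for the health check and the entity fetch.

```python
def refresh_cache(local_entities, peer_names, is_healthy, fetch_entities):
    """Rebuild the entity cache for one refresh cycle.

    Entities of unhealthy peers are left out; they return automatically on
    the first cycle in which the peer passes its health check again.
    """
    cache = list(local_entities)
    unhealthy = []
    for name in peer_names:
        if is_healthy(name):
            cache.extend(fetch_entities(name))
        else:
            unhealthy.append(name)
    return cache, unhealthy
```

Because the cache is rebuilt from scratch each cycle, no explicit recovery step is needed when a peer returns.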

Step 6: Chain Topology

For hierarchical systems, gateways can be chained. Gateway A aggregates from B, which aggregates from C:

gateway_a_chain.yaml
ros2_medkit_gateway:
  ros__parameters:
    server:
      port: 8080
    aggregation:
      enabled: true
      peer_urls: ["http://localhost:8081"]
      peer_names: ["mid_level"]

gateway_b_chain.yaml
ros2_medkit_gateway:
  ros__parameters:
    server:
      port: 8081
    aggregation:
      enabled: true
      peer_urls: ["http://localhost:8082"]
      peer_names: ["leaf_system"]

gateway_c_chain.yaml
ros2_medkit_gateway:
  ros__parameters:
    server:
      port: 8082
    aggregation:
      enabled: false

Start all three gateways, then query the top-level gateway (Gateway A):

curl -s http://localhost:8080/api/v1/components | jq

Gateway A returns components from all three levels. Requests for entities on Gateway C are forwarded through B to C.
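The chain behaves like a recursive collection: each gateway returns its own components plus whatever its peers return. The sketch below models that with plain dictionaries and assumes the topology has no cycles; the component names and the function collect_components are made up for illustration.

```python
def collect_components(gateway, local, peers):
    """Return the components visible at a gateway in an acyclic chain.

    local maps a gateway to its own components; peers maps a gateway to the
    gateways it aggregates from.
    """
    items = list(local.get(gateway, []))
    for peer in peers.get(gateway, []):
        # The peer answers with its own merged view, which already
        # includes everything further down the chain.
        for component in collect_components(peer, local, peers):
            if component not in items:
                items.append(component)
    return items
```

With A aggregating from B and B from C, the view at A contains all three levels, the view at B contains B and C, and the view at C contains only its own components.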

Summary

  • Static peers: List known gateways in aggregation.peer_urls and aggregation.peer_names for deterministic connections.

  • mDNS discovery: Set aggregation.announce and aggregation.discover to true for zero-configuration peer discovery.

  • Entity merging: Areas, Functions, and Components merge by ID. Apps get peer-name prefixes on collision.

  • Transparent forwarding: Requests for remote entities are forwarded to the owning peer. Clients interact with a single API endpoint.

  • Graceful degradation: Unhealthy peers are excluded from fan-out. Partial results are marked with x-medkit.partial: true and list the affected peers in x-medkit.failed_peers.
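The App rule in the entity-merging bullet differs from the other entity types, so a short sketch may help. It assumes the prefix is joined with a double underscore, inferred from the subsystem_b__some_node ID used in Step 5, and that a prefix is applied only when the ID collides; merge_apps is a hypothetical name.

```python
def merge_apps(local_ids, peer_name, remote_ids):
    """Combine app IDs, prefixing remote IDs that collide with existing ones."""
    merged = list(local_ids)
    taken = set(local_ids)
    for app_id in remote_ids:
        # Assumed format: <peer_name>__<app_id>, applied only on collision.
        final_id = f"{peer_name}__{app_id}" if app_id in taken else app_id
        merged.append(final_id)
        taken.add(final_id)
    return merged
```

Unlike Areas, Functions, and Components, colliding Apps stay distinct entities, so a client can still address the remote one by its prefixed ID.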

Troubleshooting

mDNS socket bind failure (port 5353)

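mDNS announcement requires binding to UDP port 5353. If the gateway logs an error like:

mDNS: Failed to open mDNS announce socket on port 5353.

the process could not bind to port 5353. Note that 5353 lies above the default privileged range on Linux (ports below 1024), so the usual cause is either another process already holding the port or an environment that restricts the bind (for example a sandbox, or a system where net.ipv4.ip_unprivileged_port_start has been raised above 5353). Solutions (choose the one that matches the cause):

  1. Grant the capability to the binary (only helps when your system treats port 5353 as privileged):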

    sudo setcap cap_net_bind_service=+ep "$(ros2 pkg prefix ros2_medkit_gateway)/lib/ros2_medkit_gateway/gateway_node"
    
  2. Run as root (not recommended for production):

    sudo ros2 run ros2_medkit_gateway gateway_node --ros-args ...
    
  3. Use Docker with host networking:

    docker run --net=host ...
    
  4. Fall back to static peers: If mDNS is not viable in your environment, disable aggregation.announce and aggregation.discover, and configure peers explicitly with aggregation.peer_urls and aggregation.peer_names.

Note

On systems where another mDNS responder is already running (e.g., Avahi, systemd-resolved), port 5353 may already be in use. Either stop the existing responder or use static peers.
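To tell a port conflict from a permission problem, a quick probe with a plain UDP socket can help. This is a diagnostic sketch using only the Python standard library; try_bind_udp is a hypothetical helper. It does not set SO_REUSEADDR, so a failure here does not prove that the gateway's own bind would fail.

```python
import errno
import socket


def try_bind_udp(port, host="0.0.0.0"):
    """Try to bind a UDP socket and report the outcome as (ok, reason)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # Typically EADDRINUSE (port taken) or EACCES (not permitted).
            return False, errno.errorcode.get(exc.errno, "unknown")
    return True, "ok"
```

Running try_bind_udp(5353) as the same user that starts the gateway gives a first hint: EADDRINUSE points to an existing responder such as Avahi, while EACCES points to a permission restriction.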

Next Steps