Multi-Instance Aggregation ========================== This tutorial walks through setting up multiple ros2_medkit_gateway instances and aggregating their entity trees into a single unified API. .. contents:: Table of Contents :local: :depth: 2 Prerequisites ------------- - ros2_medkit built and installed (``colcon build && source install/setup.bash``) - Two or more terminals available - Familiarity with the :doc:`../getting_started` guide What You Will Learn ------------------- - How to run two gateways on different ports - How to configure static peer connections - How to enable mDNS auto-discovery - What the merged API looks like - How to handle peer failures - How to set up chain topologies Step 1: Start Two Gateways -------------------------- Open two terminals. In each, source the ROS 2 workspace: .. code-block:: bash source /opt/ros/jazzy/setup.bash source install/setup.bash **Terminal 1 - Gateway A (port 8080):** .. code-block:: bash ros2 run ros2_medkit_gateway gateway_node --ros-args \ -p server.port:=8080 \ -r __node:=gateway_a \ -r __ns:=/subsystem_a **Terminal 2 - Gateway B (port 8081):** .. code-block:: bash ros2 run ros2_medkit_gateway gateway_node --ros-args \ -p server.port:=8081 \ -r __node:=gateway_b \ -r __ns:=/subsystem_b Verify both gateways are healthy: .. code-block:: bash curl http://localhost:8080/api/v1/health curl http://localhost:8081/api/v1/health Each gateway should report ``{"status": "healthy"}``. Step 2: Configure Static Peers ------------------------------ To make Gateway A aggregate entities from Gateway B, create a config file: .. code-block:: yaml :caption: gateway_a_params.yaml ros2_medkit_gateway: ros__parameters: server: port: 8080 aggregation: enabled: true peer_urls: ["http://localhost:8081"] peer_names: ["subsystem_b"] Restart Gateway A with the config: .. code-block:: bash ros2 run ros2_medkit_gateway gateway_node --ros-args \ --params-file gateway_a_params.yaml \ -r __node:=gateway_a Now Gateway A merges entities from both itself and Gateway B. Step 3: Explore the Merged API ------------------------------ **List all components (merged from both gateways):** .. code-block:: bash curl -s http://localhost:8080/api/v1/components | jq The response includes components from both gateways. Remote-only Components have ``"source": "peer:subsystem_b"`` in their metadata. If both gateways have a component with the same ID (e.g., both hosts are named ``robot``), they are **merged by ID** into a single entity (tags and metadata are combined). Remote-only Components appear as separate entries: .. code-block:: json { "items": [ {"id": "robot", "source": "runtime"}, {"id": "arm_controller", "source": "peer:subsystem_b"} ] } **List all areas (merged by ID):** .. code-block:: bash curl -s http://localhost:8080/api/v1/areas | jq Areas are merged by ID - if both gateways discover a ``root`` area, only one ``root`` appears in the response. **Access data from a remote entity:** .. code-block:: bash # If subsystem_b has a component "arm_controller" curl -s http://localhost:8080/api/v1/components/arm_controller/data | jq The request is transparently forwarded to Gateway B. The client does not need to know which gateway owns the entity. **List all functions:** .. code-block:: bash curl -s http://localhost:8080/api/v1/functions | jq Functions are merged by ID with combined ``hosts`` lists. A ``navigation`` function that exists on both gateways appears once, listing hosts from both. Step 4: Enable mDNS Auto-Discovery ----------------------------------- Instead of listing peers statically, use mDNS to discover gateways automatically on the local network. **Gateway A config with mDNS:** .. code-block:: yaml :caption: gateway_a_mdns.yaml ros2_medkit_gateway: ros__parameters: server: port: 8080 aggregation: enabled: true announce: true discover: true mdns_name: "gateway_a" **Gateway B config with mDNS:** .. code-block:: yaml :caption: gateway_b_mdns.yaml ros2_medkit_gateway: ros__parameters: server: port: 8081 aggregation: enabled: true announce: true discover: true mdns_name: "gateway_b" .. note:: ``mdns_name`` must be set explicitly when running multiple gateways on the same host. Without it, all instances share the same hostname and filter each other out as "self". On separate hosts or containers with distinct hostnames, this parameter can be omitted. Start both gateways with their configs: .. code-block:: bash # Terminal 1 ros2 run ros2_medkit_gateway gateway_node --ros-args \ --params-file gateway_a_mdns.yaml -r __node:=gateway_a # Terminal 2 ros2 run ros2_medkit_gateway gateway_node --ros-args \ --params-file gateway_b_mdns.yaml -r __node:=gateway_b After a few seconds, each gateway discovers the other via mDNS. Check the health endpoint to see discovered peers: .. code-block:: bash curl -s http://localhost:8080/api/v1/health | jq '.peers' .. note:: mDNS requires multicast network support. If using Docker, ensure containers share a network that supports multicast, or use host networking. For bridge-networked containers, use static peers instead. Step 5: Handle Peer Failures ----------------------------- When a peer goes down, the gateway handles it gracefully: 1. **Health checks detect failure**: The cache refresh cycle (default: 10 seconds) runs ``check_all_health()`` and detects the peer is unreachable, marking it unhealthy. 2. **Entity collection endpoints serve cached data**: Endpoints like ``GET /api/v1/components`` continue to return the last cached entity set. Remote entities from the unhealthy peer are dropped from the cache on the next refresh cycle. 3. **Resource fan-out returns partial results**: Per-entity resource collection endpoints (data, operations, faults, configurations, logs) perform real-time fan-out. When some peer requests fail (peer unreachable or non-2xx response), the response includes ``x-medkit.partial: true`` and ``x-medkit.failed_peers``. 4. **Entity-specific requests return 502**: If a request targets a remote entity whose peer is down, the gateway returns ``502 Bad Gateway``. Test this by stopping Gateway B and querying Gateway A: .. code-block:: bash # Stop Gateway B (Ctrl+C in Terminal 2) # Wait for cache refresh interval, then: curl -s http://localhost:8080/api/v1/components | jq # Returns only local components (remote entities dropped from cache) # Faults show partial results: curl -s http://localhost:8080/api/v1/faults | jq '.["x-medkit"].partial' # Returns true # Try accessing a remote entity: curl -s http://localhost:8080/api/v1/apps/subsystem_b__some_node/data # Returns 502 Bad Gateway When Gateway B comes back online, it is automatically re-included after the next successful health check. Step 6: Chain Topology ---------------------- For hierarchical systems, gateways can be chained. Gateway A aggregates from B, which aggregates from C: .. code-block:: yaml :caption: gateway_a_chain.yaml ros2_medkit_gateway: ros__parameters: server: port: 8080 aggregation: enabled: true peer_urls: ["http://localhost:8081"] peer_names: ["mid_level"] .. code-block:: yaml :caption: gateway_b_chain.yaml ros2_medkit_gateway: ros__parameters: server: port: 8081 aggregation: enabled: true peer_urls: ["http://localhost:8082"] peer_names: ["leaf_system"] .. code-block:: yaml :caption: gateway_c_chain.yaml ros2_medkit_gateway: ros__parameters: server: port: 8082 aggregation: enabled: false Start all three gateways, then query the top-level: .. code-block:: bash curl -s http://localhost:8080/api/v1/components | jq Gateway A returns components from all three levels. Requests for entities on Gateway C are forwarded through B to C. Summary ------- - **Static peers**: List known gateways in ``aggregation.peer_urls`` and ``aggregation.peer_names`` for deterministic connections. - **mDNS discovery**: Set ``aggregation.announce`` and ``aggregation.discover`` to ``true`` for zero-configuration peer discovery. - **Entity merging**: Areas, Functions, and Components merge by ID. Apps get peer-name prefixes on collision. - **Transparent forwarding**: Requests for remote entities are forwarded to the owning peer. Clients interact with a single API endpoint. - **Graceful degradation**: Unhealthy peers are excluded from fan-out. Partial results are clearly marked. Troubleshooting --------------- **mDNS socket bind failure (port 5353)** mDNS announcement requires binding to UDP port 5353, which is a privileged port (below 1024). If the gateway logs an error like:: mDNS: Failed to open mDNS announce socket on port 5353. This means the process does not have permission to bind to port 5353. Solutions (choose one): 1. **Grant the capability to the binary** (recommended for production): .. code-block:: bash sudo setcap cap_net_bind_service=+ep $(which gateway_node) 2. **Run as root** (not recommended for production): .. code-block:: bash sudo ros2 run ros2_medkit_gateway gateway_node --ros-args ... 3. **Use Docker with host networking**: .. code-block:: bash docker run --net=host ... 4. **Fall back to static peers**: If mDNS is not viable in your environment, disable ``aggregation.announce`` and ``aggregation.discover``, and configure peers explicitly with ``aggregation.peer_urls`` and ``aggregation.peer_names``. .. note:: On systems where another mDNS responder is already running (e.g., Avahi, systemd-resolved), port 5353 may already be in use. Either stop the existing responder or use static peers. Next Steps ---------- - :doc:`../config/aggregation` - Full configuration reference - :doc:`docker` - Deploy aggregated gateways in Docker containers - :doc:`authentication` - Secure peer-to-peer communication with JWT