ros2_medkit_graph_provider
This section contains design documentation for the ros2_medkit_graph_provider plugin.
Overview
The ros2_medkit_graph_provider package implements a gateway plugin that generates
live dataflow graph documents for SOVD Function entities. It models the ROS 2 topic
graph as a directed graph of nodes (apps) and edges (topic connections), enriched
with frequency, latency, and drop rate metrics from /diagnostics. The graph
document powers the x-medkit-graph vendor extension endpoint and the cyclic
subscription sampler.
Architecture
The following diagram shows the plugin’s main components and data flow.
[Figure: Graph Provider Plugin Architecture]
Graph Document Schema
The plugin generates a JSON document per Function entity with the following structure:
- schema_version: "1.0.0"
- graph_id: "{function_id}-graph"
- timestamp: ISO 8601 timestamp with nanosecond precision
- scope: Function entity that owns the graph
- pipeline_status: "healthy", "degraded", or "broken"
- bottleneck_edge: Edge ID with the lowest frequency ratio (present only when degraded)
- topics: List of topics with stable IDs
- nodes: List of app entities with reachability status
- edges: Publisher-subscriber connections with per-edge metrics
Edge metrics include frequency, latency, drop rate, and a metrics_status field with one of these values:
- "active": diagnostics data available; normal operation
- "pending": no diagnostics data received yet
- "error": node offline, topic stale, or no data source after diagnostics started
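To make the schema concrete, the following C++ sketch mirrors the document as plain structs. The JSON keys and status strings come from the schema above. Everything else is hypothetical and is not taken from the plugin's actual headers: the C++ types, the simplified topics/nodes lists, the edge endpoint fields (source_node, target_node, topic_id), and the helpers make_graph_id and to_string.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical C++ mirror of the graph document; field names follow the JSON keys.
enum class MetricsStatus { Active, Pending, Error };

inline const char* to_string(MetricsStatus s) {
  switch (s) {
    case MetricsStatus::Active:  return "active";
    case MetricsStatus::Pending: return "pending";
    case MetricsStatus::Error:   return "error";
  }
  return "error";  // unreachable for valid enum values
}

struct EdgeMetrics {
  std::optional<double> frequency_hz;        // from /diagnostics, if reported
  std::optional<double> latency_ms;
  std::optional<double> drop_rate_percent;
  MetricsStatus metrics_status = MetricsStatus::Pending;  // no data yet
  std::optional<std::string> error_reason;   // e.g. "topic_stale"
};

struct GraphEdge {
  std::string id;
  std::string source_node;  // assumed: publishing app ID
  std::string target_node;  // assumed: subscribing app ID
  std::string topic_id;
  EdgeMetrics metrics;
};

struct GraphDocument {
  std::string schema_version = "1.0.0";
  std::string graph_id;                        // "{function_id}-graph"
  std::string timestamp;                       // ISO 8601, nanosecond precision
  std::string scope;                           // owning Function entity
  std::string pipeline_status;                 // "healthy" | "degraded" | "broken"
  std::optional<std::string> bottleneck_edge;  // set only when degraded
  std::vector<std::string> topics;             // simplified: topic IDs only
  std::vector<std::string> nodes;              // simplified: app IDs only
  std::vector<GraphEdge> edges;
};

// Builds the graph_id exactly as the schema describes it.
inline std::string make_graph_id(const std::string& function_id) {
  return function_id + "-graph";
}
```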
Pipeline Status Logic
The overall pipeline status is determined by aggregating edge states:
- broken: at least one edge has metrics_status "error" (node offline or topic stale)
- degraded: at least one edge has a frequency-to-expected ratio below the degraded ratio threshold, or a drop rate above the drop-rate threshold
- healthy: all edges are active with acceptable metrics
When the status is "degraded", the bottleneck_edge field identifies the edge
with the lowest frequency-to-expected ratio, helping operators pinpoint the
constraint in the dataflow pipeline.
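The aggregation rules and the bottleneck selection can be sketched as a pass over the edges. This is a minimal illustration, not the plugin's code. EdgeSample, Thresholds, PipelineResult, aggregate_status, and the default thresholds (0.5 ratio, 5% drop rate) are assumptions. As a simplification, the sketch lets "pending" edges (no data) through to "healthy". Edges without both a measured and an expected frequency are never chosen as the bottleneck.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical minimal edge view used only for status aggregation.
struct EdgeSample {
  std::string id;
  bool error = false;  // metrics_status == "error"
  std::optional<double> frequency_hz;
  std::optional<double> expected_frequency_hz;
  std::optional<double> drop_rate_percent;
};

struct Thresholds {
  double degraded_ratio = 0.5;         // assumed default
  double max_drop_rate_percent = 5.0;  // assumed default
};

struct PipelineResult {
  std::string status;  // "healthy" | "degraded" | "broken"
  std::optional<std::string> bottleneck_edge;
};

// frequency / expected, or nullopt when either value is missing or expected <= 0.
inline std::optional<double> frequency_ratio(const EdgeSample& e) {
  if (!e.frequency_hz || !e.expected_frequency_hz || *e.expected_frequency_hz <= 0.0) {
    return std::nullopt;
  }
  return *e.frequency_hz / *e.expected_frequency_hz;
}

inline PipelineResult aggregate_status(const std::vector<EdgeSample>& edges,
                                       const Thresholds& t) {
  // Any errored edge breaks the pipeline; no bottleneck is reported in that case.
  for (const auto& e : edges) {
    if (e.error) return {"broken", std::nullopt};
  }
  bool degraded = false;
  std::optional<std::string> bottleneck;
  double lowest_ratio = 0.0;
  for (const auto& e : edges) {
    const auto ratio = frequency_ratio(e);
    if (ratio && *ratio < t.degraded_ratio) degraded = true;
    if (e.drop_rate_percent && *e.drop_rate_percent > t.max_drop_rate_percent) degraded = true;
    // Track the edge with the lowest frequency-to-expected ratio.
    if (ratio && (!bottleneck || *ratio < lowest_ratio)) {
      bottleneck = e.id;
      lowest_ratio = *ratio;
    }
  }
  if (degraded) return {"degraded", bottleneck};
  return {"healthy", std::nullopt};
}
```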
Data Sources
Diagnostics Subscription
The plugin subscribes to /diagnostics and parses DiagnosticStatus messages
for topic-level metrics. Recognized keys:
- frame_rate_msg: mapped to frequency_hz
- current_delay_from_realtime_ms: mapped to latency_ms
- drop_rate_percent / drop_rate: mapped to drop_rate_percent
- expected_frequency: mapped to expected_frequency_hz
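A sketch of the key mapping follows. It assumes each DiagnosticStatus key/value pair arrives as two strings, as in diagnostic_msgs/KeyValue. TopicMetrics and apply_diagnostic_kv are hypothetical names. The keys and target fields are the ones listed above. Values that do not parse completely as numbers are ignored.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical per-topic metrics record filled from DiagnosticStatus key/value pairs.
struct TopicMetrics {
  std::optional<double> frequency_hz;
  std::optional<double> latency_ms;
  std::optional<double> drop_rate_percent;
  std::optional<double> expected_frequency_hz;
};

// Applies one key/value pair. Returns true only if the key is recognized
// and the value parses completely as a number.
inline bool apply_diagnostic_kv(TopicMetrics& m, const std::string& key,
                                const std::string& value) {
  double parsed = 0.0;
  try {
    std::size_t pos = 0;
    parsed = std::stod(value, &pos);
    if (pos != value.size()) return false;  // reject trailing garbage such as "30Hz"
  } catch (const std::exception&) {         // invalid_argument / out_of_range
    return false;
  }
  if (key == "frame_rate_msg") m.frequency_hz = parsed;
  else if (key == "current_delay_from_realtime_ms") m.latency_ms = parsed;
  else if (key == "drop_rate_percent" || key == "drop_rate") m.drop_rate_percent = parsed;
  else if (key == "expected_frequency") m.expected_frequency_hz = parsed;
  else return false;  // unknown key
  return true;
}
```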
A bounded cache (max 512 topics) with LRU eviction prevents unbounded memory growth.
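A bounded cache of this kind is commonly built as a classic LRU. A doubly linked list holds entries ordered by recency, and a hash map points from topic name to list node, giving O(1) lookup, update, and eviction. This is a generic sketch. Only the 512-entry bound comes from the text, and LruCache is a hypothetical name.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Generic LRU cache keyed by topic name; the layout is an assumption, not the plugin's code.
template <typename V>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity = 512) : capacity_(capacity) {}

  void put(const std::string& key, V value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      order_.splice(order_.begin(), order_, it->second);  // mark most recently used
      return;
    }
    if (capacity_ == 0) return;
    if (order_.size() == capacity_) {
      index_.erase(order_.back().first);  // evict least recently used
      order_.pop_back();
    }
    order_.emplace_front(key, std::move(value));
    index_[key] = order_.begin();
  }

  std::optional<V> get(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    order_.splice(order_.begin(), order_, it->second);  // splice keeps iterators valid
    return it->second->second;
  }

  std::size_t size() const { return order_.size(); }

 private:
  using Entry = std::pair<std::string, V>;
  std::size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, typename std::list<Entry>::iterator> index_;
};
```

In this sketch get() refreshes an entry's recency, so frequently read topics survive eviction. A cache that refreshes only on put() would be an equally valid design choice.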
Fault Manager Integration
The plugin queries the fault manager via PluginContext::list_all_faults() to
detect stale topics. A topic is considered stale when there is a confirmed critical
fault whose fault code matches the topic name (after normalization). Stale topics
cause their edges to be marked as "error" with reason "topic_stale".
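The stale-topic check reduces to a set lookup. The text does not specify the normalization. The rule below is purely an assumption: drop leading slashes, map the remaining '/' to '_', and uppercase. The FaultInfo record is also hypothetical. In the plugin, the faults come from PluginContext::list_all_faults().

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical fault record; the real fields come from PluginContext::list_all_faults().
struct FaultInfo {
  std::string code;
  bool confirmed = false;
  bool critical = false;
};

// Assumed normalization: drop leading '/', map other '/' to '_', uppercase.
inline std::string normalize_topic(const std::string& name) {
  std::string out;
  for (char c : name) {
    if (c == '/') {
      if (!out.empty()) out.push_back('_');
    } else {
      out.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }
  }
  return out;
}

// Returns the topics whose normalized name matches a confirmed critical fault code.
inline std::unordered_set<std::string> stale_topics(const std::vector<FaultInfo>& faults,
                                                    const std::vector<std::string>& topics) {
  std::unordered_set<std::string> critical_codes;
  for (const auto& f : faults) {
    if (f.confirmed && f.critical) critical_codes.insert(normalize_topic(f.code));
  }
  std::unordered_set<std::string> stale;
  for (const auto& t : topics) {
    if (critical_codes.count(normalize_topic(t)) > 0) stale.insert(t);
  }
  return stale;
}
```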
Entity Cache
On HTTP requests, the plugin rebuilds the graph from the current entity cache
(PluginContext::get_entity_snapshot()) rather than serving the potentially stale
introspection-pipeline cache. This ensures the HTTP endpoint always reflects the
latest node and topic state.
Function Scoping
Graph documents are scoped to individual Function entities. The plugin resolves
which apps belong to a function by checking the function’s hosts list against
app IDs and component IDs. Only topics that connect scoped apps appear as edges.
System topics (/parameter_events, /rosout, /diagnostics, NITROS topics)
are filtered out.
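Function scoping and system-topic filtering can be sketched as follows. AppInfo, Edge, is_system_topic, and scoped_edges are hypothetical. The sketch assumes an edge is emitted only when both the publisher and the subscriber are in scope. It also detects NITROS topics by a "nitros" substring, which is a guess rather than the plugin's rule.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical app view: the app's ID, its component, and its topic lists.
struct AppInfo {
  std::string id;
  std::string component_id;
  std::vector<std::string> publishes;
  std::vector<std::string> subscribes;
};

struct Edge {
  std::string publisher, subscriber, topic;
};

inline bool is_system_topic(const std::string& topic) {
  static const std::unordered_set<std::string> kSystem = {
      "/parameter_events", "/rosout", "/diagnostics"};
  // Assumption: NITROS topics are recognized by a "nitros" substring.
  return kSystem.count(topic) > 0 || topic.find("nitros") != std::string::npos;
}

// An app is in scope if its own ID or its component ID appears in the function's hosts.
inline std::vector<Edge> scoped_edges(const std::vector<AppInfo>& apps,
                                      const std::vector<std::string>& hosts) {
  const std::unordered_set<std::string> host_set(hosts.begin(), hosts.end());
  std::vector<const AppInfo*> scoped;
  for (const auto& a : apps) {
    if (host_set.count(a.id) > 0 || host_set.count(a.component_id) > 0) scoped.push_back(&a);
  }
  std::vector<Edge> edges;
  for (const AppInfo* pub : scoped) {
    for (const auto& topic : pub->publishes) {
      if (is_system_topic(topic)) continue;
      for (const AppInfo* sub : scoped) {
        for (const auto& t : sub->subscribes) {
          if (t == topic) edges.push_back({pub->id, sub->id, topic});
        }
      }
    }
  }
  return edges;
}
```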
Design Decisions
Extracted Plugin
The graph provider was extracted from the gateway core into a standalone plugin
package. This follows the same pattern as the Linux introspection plugins: the
gateway loads graph_provider as a .so at runtime, keeping the core gateway
free of diagnostics-specific logic. The plugin can be omitted from deployments
that do not need dataflow visualization.
Static build_graph_document
The build_graph_document() method is static and takes all inputs explicitly
(function_id, IntrospectionInput, GraphBuildState, GraphBuildConfig, timestamp).
This makes the graph generation logic fully testable without instantiating the
plugin or its ROS 2 dependencies.
Per-Function Config Overrides
The plugin supports per-function configuration overrides for thresholds (expected frequency, degraded ratio, drop rate). This allows operators to set different health baselines for different subsystems - for example, a camera pipeline at 30 Hz vs. a LiDAR pipeline at 10 Hz.
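One way to model per-function overrides is as optional fields layered over global defaults, so an override only needs to name the thresholds it changes. The struct names, the map keyed by function ID, and the default values are assumptions for illustration.

```cpp
#include <map>
#include <optional>
#include <string>

struct GraphThresholds {
  double expected_frequency_hz = 0.0;  // assumed default; 0 means "unknown"
  double degraded_ratio = 0.5;         // assumed default
  double drop_rate_percent = 5.0;      // assumed default
};

// Each field is optional: an unset field falls back to the global default.
struct ThresholdOverride {
  std::optional<double> expected_frequency_hz;
  std::optional<double> degraded_ratio;
  std::optional<double> drop_rate_percent;
};

inline GraphThresholds resolve_thresholds(
    const GraphThresholds& defaults,
    const std::map<std::string, ThresholdOverride>& overrides,
    const std::string& function_id) {
  GraphThresholds out = defaults;
  const auto it = overrides.find(function_id);
  if (it == overrides.end()) return out;  // no override for this function
  const ThresholdOverride& o = it->second;
  if (o.expected_frequency_hz) out.expected_frequency_hz = *o.expected_frequency_hz;
  if (o.degraded_ratio) out.degraded_ratio = *o.degraded_ratio;
  if (o.drop_rate_percent) out.drop_rate_percent = *o.drop_rate_percent;
  return out;
}
```

For the camera and LiDAR example, one override would set expected_frequency_hz to 30.0 and the other to 10.0. All other thresholds would stay at their defaults.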
Cyclic Subscription Sampler
The plugin registers a sampler via PluginContext::register_sampler() for the
x-medkit-graph resource. This allows clients to create cyclic subscriptions
that receive periodic graph snapshots over SSE, enabling live dashboard updates
without polling the HTTP endpoint.