ros2_medkit_fault_reporter
This section contains design documentation for the ros2_medkit_fault_reporter library.
Overview
The FaultReporter library provides a simple, reusable API for ROS 2 nodes to report
faults to the central ros2_medkit_fault_manager. It includes optional local
filtering to reduce noise from repeated fault occurrences.
Architecture
The following diagram shows the relationships between the main components.
[Diagram: ROS 2 Medkit Fault Reporter Class Architecture]
Main Components
FaultReporter - The main public API for fault reporting
Takes a rclcpp::Node::SharedPtr and source_id in the constructor
Creates a service client to /fault_manager/report_fault
Loads filter configuration from ROS parameters
Provides a simple report(fault_code, severity, description) method
Integrates LocalFilter to reduce noise from repeated faults
Uses fire-and-forget service calls (non-blocking)
LocalFilter - Per-fault-code filtering with threshold and time window
Tracks fault occurrences per fault_code
Only forwards to FaultManager when the threshold is met within the time window
High-severity faults (ERROR, CRITICAL) bypass filtering by default
Thread-safe: protected by mutex for concurrent access
Configurable via FilterConfig
FilterConfig - Configuration for local filtering behavior
enabled: Enable/disable filtering (default: true)
default_threshold: Number of reports before forwarding (default: 3)
default_window_sec: Time window in seconds (default: 10.0)
bypass_severity: Severity level that bypasses filtering (default: ERROR=2)
FaultTracker - Internal state for per-fault-code tracking
Maintains vector of timestamps for recent reports
Expired timestamps are cleaned up on each should_forward() call
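The filtering components described above can be sketched with the standard library alone. This is a minimal illustration, not the library's actual implementation: the field names and defaults of FilterConfig come from the list above, but the severity constants other than ERROR=2, the explicit now_sec argument (used here to keep the example deterministic), and the choice to keep forwarding every report once the threshold has been reached are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Only ERROR=2 is stated in the text; the other values are assumed.
constexpr std::uint8_t SEVERITY_WARN = 1;
constexpr std::uint8_t SEVERITY_ERROR = 2;
constexpr std::uint8_t SEVERITY_CRITICAL = 3;

// Mirrors the documented FilterConfig fields and defaults.
struct FilterConfig {
  bool enabled = true;
  int default_threshold = 3;
  double default_window_sec = 10.0;
  std::uint8_t bypass_severity = SEVERITY_ERROR;
};

// Per-fault-code state: timestamps (in seconds) of recent reports.
struct FaultTracker {
  std::vector<double> timestamps;
};

class LocalFilter {
public:
  explicit LocalFilter(FilterConfig config = {}) : config_(config) {}

  // Returns true when a report should be forwarded to the FaultManager.
  // now_sec is a hypothetical parameter; a real filter would read a clock.
  bool should_forward(const std::string & fault_code, std::uint8_t severity,
                      double now_sec) {
    // Disabled filtering and high-severity faults bypass the threshold.
    if (!config_.enabled || severity >= config_.bypass_severity) {
      return true;
    }
    std::lock_guard<std::mutex> lock(mutex_);
    auto & stamps = trackers_[fault_code].timestamps;
    // Drop timestamps that fell out of the time window.
    const double cutoff = now_sec - config_.default_window_sec;
    stamps.erase(std::remove_if(stamps.begin(), stamps.end(),
                                [cutoff](double t) { return t < cutoff; }),
                 stamps.end());
    stamps.push_back(now_sec);
    return stamps.size() >= static_cast<std::size_t>(config_.default_threshold);
  }

private:
  FilterConfig config_;
  std::mutex mutex_;
  std::unordered_map<std::string, FaultTracker> trackers_;
};
```

With the default configuration, the first two low-severity reports of a given fault code are held back and the third one inside the window is forwarded, while a report at ERROR level or above is forwarded immediately.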
Usage
Basic Example
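#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "ros2_medkit_fault_reporter/fault_reporter.hpp"

class MyNode : public rclcpp::Node {
public:
  MyNode() : Node("my_node") {}

  // Create the reporter after the node is fully constructed, e.g. right
  // after std::make_shared<MyNode>(). shared_from_this() must not be called
  // from the constructor: no shared_ptr owns the node yet, so it would throw
  // std::bad_weak_ptr.
  void init() {
    reporter_ = std::make_unique<ros2_medkit_fault_reporter::FaultReporter>(
        shared_from_this(), get_fully_qualified_name());
  }

  void check_sensor() {
    if (sensor_error_detected()) {
      reporter_->report("SENSOR_FAILURE",
                        ros2_medkit_msgs::msg::Fault::SEVERITY_ERROR,
                        "Sensor communication timeout");
    }
  }

private:
  // Placeholder for the application's own sensor check.
  bool sensor_error_detected() { return false; }

  std::unique_ptr<ros2_medkit_fault_reporter::FaultReporter> reporter_;
};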
Configuration
Parameters are loaded from the node’s parameter server:
my_node:
  ros__parameters:
    fault_reporter:
      local_filtering:
        enabled: true
        default_threshold: 3
        default_window_sec: 10.0
        bypass_severity: 2
Design Decisions
Local Filtering Rationale
Local filtering is enabled by default to address common scenarios where sensors or subsystems may produce repeated fault reports in quick succession. Without filtering, this could flood the central FaultManager and obscure other issues.
The default threshold of 3 reports within 10 seconds balances:
Noise reduction: Transient single occurrences are filtered out
Responsiveness: Persistent issues are reported within seconds
Safety: High-severity faults (ERROR, CRITICAL) bypass filtering entirely
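To make the trade-off concrete, the sketch below replays a sequence of low-severity report times against a sliding window and returns the index of the first report that would be forwarded. The function name first_forwarded_index is hypothetical, and the sliding-window bookkeeping is an assumed reading of "3 reports within 10 seconds", not code taken from the library.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Returns the index of the first report that reaches the threshold within
// the time window, or -1 if no report in the sequence would be forwarded.
// report_times_sec must be sorted in ascending order.
int first_forwarded_index(const std::vector<double> & report_times_sec,
                          std::size_t threshold = 3,
                          double window_sec = 10.0) {
  std::deque<double> recent;  // reports still inside the window
  for (std::size_t i = 0; i < report_times_sec.size(); ++i) {
    const double now = report_times_sec[i];
    // Forget reports older than the window.
    while (!recent.empty() && recent.front() < now - window_sec) {
      recent.pop_front();
    }
    recent.push_back(now);
    if (recent.size() >= threshold) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

Under these assumptions a single transient report ({0.0}) is never forwarded, a persistent fault reported at 0.0, 0.5 and 1.0 seconds is forwarded on its third report, and reports spaced 6 seconds apart never accumulate three entries in one window.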
Fire-and-Forget Service Calls
The report() method uses asynchronous service calls without waiting for
responses. This ensures fault reporting never blocks the calling node, which
is important for real-time systems. If the FaultManager service is unavailable,
reports are dropped quietly; the drop is logged only at DEBUG level.
Thread Safety
LocalFilter is protected by a mutex to support nodes that may call
report() from multiple threads (e.g., different callback groups or timers).
The service client from rclcpp is also thread-safe.
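The effect of the mutex can be illustrated with a small standard-library model of several callback threads reporting the same fault at once. OccurrenceCounter and report_concurrently are hypothetical stand-ins for the filter's shared per-fault-code state, not part of the library's API.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for shared filter state: per-fault-code occurrence
// counts guarded by a mutex.
class OccurrenceCounter {
public:
  void record(const std::string & fault_code) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++counts_[fault_code];
  }

  std::size_t count(const std::string & fault_code) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = counts_.find(fault_code);
    return it == counts_.end() ? 0 : it->second;
  }

private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::size_t> counts_;
};

// Simulates num_threads callbacks that each report the same fault
// reports_per_thread times, then returns the recorded total.
std::size_t report_concurrently(std::size_t num_threads,
                                std::size_t reports_per_thread) {
  OccurrenceCounter counter;
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_threads; ++i) {
    workers.emplace_back([&counter, reports_per_thread]() {
      for (std::size_t j = 0; j < reports_per_thread; ++j) {
        counter.record("SENSOR_FAILURE");
      }
    });
  }
  for (auto & worker : workers) {
    worker.join();
  }
  return counter.count("SENSOR_FAILURE");
}
```

Because every access to the map takes the lock, no increments are lost: four threads recording 1000 reports each yield a total of 4000. Without the lock, concurrent modification of the map would be a data race.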
Integration with FaultManager
The FaultReporter is designed as a thin client to the FaultManager node:
Decoupled: FaultReporter only depends on ros2_medkit_msgs, not the manager
Lightweight: Nodes get a simple API without pulling in server-side code
Flexible: Service name is configurable for custom deployments