Fault Manager Configuration
The ros2_medkit_fault_manager node aggregates and manages faults from multiple sources.
This page documents all configuration parameters.
Basic Configuration
Storage
fault_manager:
ros__parameters:
storage_type: "sqlite" # Storage backend: "sqlite" or "memory"
database_path: "/var/lib/ros2_medkit/faults.db" # Path for sqlite storage
Parameter |
Default |
Description |
|---|---|---|
|
|
Storage backend. |
|
|
File path for SQLite database. Directory must exist and be writable. |
Debounce Settings
The fault manager uses AUTOSAR DEM-style debounce filtering to prevent fault flapping.
fault_manager:
ros__parameters:
confirmation_threshold: -1 # Counter threshold to confirm fault
healing_enabled: false # Enable auto-healing via PASSED events
healing_threshold: 3 # Counter threshold to heal fault
auto_confirm_after_sec: 0.0 # Auto-confirm timeout (0 = disabled)
Parameter |
Default |
Description |
|---|---|---|
|
|
Number of FAILED events to confirm fault. Negative values mean more events needed.
Use |
|
|
When true, PASSED events can heal confirmed faults. |
|
|
Number of PASSED events to transition from CONFIRMED to HEALED. |
|
|
Auto-confirm prefailed faults after this duration. Set to 0 to disable. |
Tip
For immediate fault confirmation (no debounce), set confirmation_threshold: 0.
Faults with SEVERITY_CRITICAL always bypass debounce regardless of this setting.
Snapshot Configuration
Snapshots capture diagnostic data when faults occur.
Basic Snapshot Settings
fault_manager:
ros__parameters:
snapshots:
enabled: true # Enable snapshot capture
background_capture: false # Capture in background thread
timeout_sec: 1.0 # Timeout for topic sampling
max_message_size: 65536 # Max message size in bytes (64KB)
default_topics: [] # Topics to capture for all faults
config_file: "" # Path to YAML config file
Parameter |
Default |
Description |
|---|---|---|
|
|
Master switch to enable/disable snapshot capture. |
|
|
Capture snapshots in background thread (non-blocking). |
|
|
Timeout for sampling each topic. |
|
|
Maximum message size to capture (bytes). Larger messages are truncated. |
|
|
List of topics to capture for all faults. |
|
|
Path to YAML file with fault-specific snapshot configurations. |
Rosbag Recording
Capture continuous rosbag recordings around fault events.
fault_manager:
ros__parameters:
snapshots:
rosbag:
enabled: false # Enable rosbag recording
duration_sec: 5.0 # Pre-fault buffer duration
duration_after_sec: 1.0 # Post-fault recording duration
topics: "config" # Topic selection: "config", "all", or "none"
include_topics: [] # Additional topics to include
exclude_topics: [] # Topics to exclude
lazy_start: false # Start recording on first fault
format: "sqlite3" # Storage format
storage_path: "" # Custom storage path
max_bag_size_mb: 50 # Max size per bag file
max_total_storage_mb: 500 # Max total storage
auto_cleanup: true # Auto-delete old bags
Parameter |
Default |
Description |
|---|---|---|
|
|
Enable rosbag recording for snapshots. |
|
|
Duration of pre-fault circular buffer. |
|
|
How long to record after fault. |
|
|
Topic selection mode: |
|
|
Start recording only when first fault occurs. |
|
|
Maximum size per rosbag file (MB). |
|
|
Maximum total storage for all rosbags (MB). |
|
|
Automatically delete oldest rosbags when storage limit reached. |
See also
Configuring Snapshot Capture for detailed snapshot configuration examples.
Correlation Configuration
Fault correlation identifies root causes and filters symptom faults.
fault_manager:
ros__parameters:
correlation:
config_file: "/path/to/correlation_rules.yaml"
cleanup_interval_sec: 5.0 # Interval for cleanup tasks
Parameter |
Default |
Description |
|---|---|---|
|
|
Path to YAML file defining correlation rules. |
|
|
Interval for running correlation cleanup tasks. |
See also
Configuring Fault Correlation for correlation rule syntax and examples.
Complete Example
fault_manager:
ros__parameters:
# Storage
storage_type: "sqlite"
database_path: "/var/lib/ros2_medkit/faults.db"
# Debounce (require 3 FAILED events to confirm)
confirmation_threshold: -3
healing_enabled: true
healing_threshold: 3
auto_confirm_after_sec: 30.0
# Snapshots
snapshots:
enabled: true
background_capture: true
timeout_sec: 2.0
max_message_size: 131072
default_topics:
- /diagnostics
- /rosout
config_file: "/etc/ros2_medkit/snapshot_config.yaml"
rosbag:
enabled: true
duration_sec: 10.0
duration_after_sec: 2.0
topics: "config"
max_bag_size_mb: 100
max_total_storage_mb: 1000
auto_cleanup: true
# Correlation
correlation:
config_file: "/etc/ros2_medkit/correlation_rules.yaml"
cleanup_interval_sec: 10.0
See Also
Configuring Snapshot Capture - Diagnostic snapshot configuration
Configuring Fault Correlation - Fault correlation rules
Message Definitions - Message definitions (Fault.msg, FaultEvent.msg)
ros2_medkit_fault_manager - FaultManager architecture