Fault Manager Configuration

The ros2_medkit_fault_manager node aggregates and manages faults from multiple sources. This page documents all configuration parameters.

Basic Configuration 

Storage 

fault_manager:
  ros__parameters:
    storage_type: "sqlite"              # Storage backend: "sqlite" or "memory"
    database_path: "/var/lib/ros2_medkit/faults.db"  # Path for sqlite storage

Parameter	Default	Description
`storage_type`	`sqlite`	Storage backend. `sqlite` persists faults to disk; `memory` keeps them in RAM only, so they are lost when the node exits.
`database_path`	`/var/lib/ros2_medkit/faults.db`	Path to the SQLite database file. The parent directory must exist and be writable.
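
For test runs or CI, where faults do not need to survive a restart, the in-memory backend may be enough. A minimal sketch, assuming `database_path` can be omitted when `storage_type` is `memory`:

```yaml
fault_manager:
  ros__parameters:
    # Keep faults in RAM only; nothing is written to disk
    storage_type: "memory"
    # database_path is left out on the assumption that it only applies to sqlite
```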

Debounce Settings 

The fault manager uses AUTOSAR DEM-style debounce filtering to prevent fault flapping.

fault_manager:
  ros__parameters:
    confirmation_threshold: -1          # Counter threshold to confirm fault
    healing_enabled: false              # Enable auto-healing via PASSED events
    healing_threshold: 3                # Counter threshold to heal fault
    auto_confirm_after_sec: 0.0         # Auto-confirm timeout (0 = disabled)

Parameter	Default	Description
`confirmation_threshold`	`-1`	Counter threshold at which a fault is confirmed. The value is negative, and a more negative value requires more FAILED events: `-3` requires 3 FAILED events before confirmation.
`healing_enabled`	`false`	When true, PASSED events can heal confirmed faults.
`healing_threshold`	`3`	Number of PASSED events required to move a fault from CONFIRMED to HEALED.
`auto_confirm_after_sec`	`0.0`	Automatically confirm a prefailed fault after this many seconds. Set to `0.0` to disable.

Tip

For immediate fault confirmation (no debounce), set `confirmation_threshold: 0`. Faults with `SEVERITY_CRITICAL` always bypass debounce regardless of this setting.
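
As a sketch of how the debounce parameters combine, the fragment below requires three FAILED events to confirm a fault and three PASSED events to heal it. The counter trace in the comments is an assumption based on the DEM-style description above (each FAILED event lowers the counter by one), not a statement of the node's internals.

```yaml
fault_manager:
  ros__parameters:
    # Assumed trace: FAILED, FAILED, FAILED -> counter reaches -3 -> CONFIRMED
    confirmation_threshold: -3
    # Allow PASSED events to heal a confirmed fault
    healing_enabled: true
    # Three PASSED events move the fault from CONFIRMED to HEALED
    healing_threshold: 3
    # Confirm anyway if the fault stays prefailed for 10 seconds
    auto_confirm_after_sec: 10.0
```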

Snapshot Configuration 

Snapshots capture diagnostic data when faults occur.

Basic Snapshot Settings 

fault_manager:
  ros__parameters:
    snapshots:
      enabled: true                     # Enable snapshot capture
      background_capture: false         # Capture in background thread
      timeout_sec: 1.0                  # Timeout for topic sampling
      max_message_size: 65536           # Max message size in bytes (64 KiB)
      default_topics: []                # Topics to capture for all faults
      config_file: ""                   # Path to YAML config file

Parameter	Default	Description
`snapshots.enabled`	`true`	Master switch to enable/disable snapshot capture.
`snapshots.background_capture`	`false`	Capture snapshots in a background thread (non-blocking).
`snapshots.timeout_sec`	`1.0`	Timeout for sampling each topic (seconds).
`snapshots.max_message_size`	`65536`	Maximum message size to capture (bytes). Larger messages are truncated.
`snapshots.default_topics`	`[]`	List of topics to capture for all faults.
`snapshots.config_file`	`""`	Path to YAML file with fault-specific snapshot configurations.

Rosbag Recording 

Record a rosbag around each fault event. A circular buffer holds the most recent data before the fault, and recording continues for a short time afterwards.

fault_manager:
  ros__parameters:
    snapshots:
      rosbag:
        enabled: false                  # Enable rosbag recording
        duration_sec: 5.0               # Pre-fault buffer duration
        duration_after_sec: 1.0         # Post-fault recording duration
        topics: "config"                # Topic selection: "config", "all", or "none"
        include_topics: []              # Additional topics to include
        exclude_topics: []              # Topics to exclude
        lazy_start: false               # Start recording on first fault
        format: "sqlite3"               # Storage format
        storage_path: ""                # Custom storage path
        max_bag_size_mb: 50             # Max size per bag file
        max_total_storage_mb: 500       # Max total storage
        auto_cleanup: true              # Auto-delete old bags

Parameter	Default	Description
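`snapshots.rosbag.enabled`	`false`	Enable rosbag recording for snapshots.
`snapshots.rosbag.duration_sec`	`5.0`	Duration of the pre-fault circular buffer (seconds).
`snapshots.rosbag.duration_after_sec`	`1.0`	How long to keep recording after the fault (seconds).
`snapshots.rosbag.topics`	`config`	Topic selection mode: `config` (per-fault), `all`, or `none`.
`snapshots.rosbag.include_topics`	`[]`	Additional topics to include.
`snapshots.rosbag.exclude_topics`	`[]`	Topics to exclude.
`snapshots.rosbag.lazy_start`	`false`	Start recording only when the first fault occurs.
`snapshots.rosbag.format`	`sqlite3`	Storage format.
`snapshots.rosbag.storage_path`	`""`	Custom storage path.
`snapshots.rosbag.max_bag_size_mb`	`50`	Maximum size per rosbag file (MB).
`snapshots.rosbag.max_total_storage_mb`	`500`	Maximum total storage for all rosbags (MB).
`snapshots.rosbag.auto_cleanup`	`true`	Automatically delete the oldest rosbags when the storage limit is reached.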
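
The sketch below records every topic except two high-bandwidth ones. It assumes that `exclude_topics` filters the set selected by `topics: "all"`, and the topic names are hypothetical. With the default limits, roughly 500 / 50 = 10 full-size bags fit before `auto_cleanup` starts deleting the oldest.

```yaml
fault_manager:
  ros__parameters:
    snapshots:
      rosbag:
        enabled: true
        duration_sec: 5.0             # keep the last 5 s before the fault
        duration_after_sec: 1.0       # keep recording for 1 s after the fault
        topics: "all"                 # start from every available topic
        exclude_topics:               # hypothetical high-bandwidth topics
          - /camera/image_raw
          - /points
        max_bag_size_mb: 50
        max_total_storage_mb: 500     # about 500 / 50 = 10 full-size bags
        auto_cleanup: true            # delete the oldest bags at the limit
```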

Correlation Configuration 

Fault correlation identifies root causes and filters symptom faults.

fault_manager:
  ros__parameters:
    correlation:
      config_file: "/path/to/correlation_rules.yaml"
      cleanup_interval_sec: 5.0         # Interval for cleanup tasks

Parameter	Default	Description
`correlation.config_file`	`""`	Path to YAML file defining correlation rules.
`correlation.cleanup_interval_sec`	`5.0`	Interval between runs of the correlation cleanup tasks (seconds).

Complete Example 

fault_manager:
  ros__parameters:
    # Storage
    storage_type: "sqlite"
    database_path: "/var/lib/ros2_medkit/faults.db"

    # Debounce (require 3 FAILED events to confirm)
    confirmation_threshold: -3
    healing_enabled: true
    healing_threshold: 3
    auto_confirm_after_sec: 30.0

    # Snapshots
    snapshots:
      enabled: true
      background_capture: true
      timeout_sec: 2.0
      max_message_size: 131072
      default_topics:
        - /diagnostics
        - /rosout
      config_file: "/etc/ros2_medkit/snapshot_config.yaml"
      rosbag:
        enabled: true
        duration_sec: 10.0
        duration_after_sec: 2.0
        topics: "config"
        max_bag_size_mb: 100
        max_total_storage_mb: 1000
        auto_cleanup: true

    # Correlation
    correlation:
      config_file: "/etc/ros2_medkit/correlation_rules.yaml"
      cleanup_interval_sec: 10.0