ros2_medkit_opcua

Design notes for the OPC-UA gateway plugin.

Motivation

Mixed ROS 2 + PLC diagnostics

Industrial robot deployments are rarely pure ROS 2. A typical cell combines ROS 2 manipulators with PLC-controlled fixtures, conveyors, safety relays, and process equipment. Operators want a single diagnostic surface that covers both sides.

The OPC-UA plugin lets ros2_medkit_gateway expose OPC-UA PLC tags as first-class SOVD entities, giving diagnostic tools and MCP agents one consistent API across the whole cell.

Architecture

The plugin is loaded by ros2_medkit_gateway at runtime via dlopen and implements two interfaces:

  • GatewayPlugin - lifecycle (configure, set_context, get_routes, shutdown) and vendor REST endpoints.

  • IntrospectionProvider - emits an IntrospectionResult describing the SOVD entities generated from the PLC node map, so the gateway’s merge pipeline can place them in the entity tree.

OPC-UA server (PLC, OpenPLC, Siemens S7, Beckhoff, ...)
         |  opc.tcp, port 4840
         v
OpcuaClient (open62541pp wrapper)
         |
         v
OpcuaPoller  --- threshold alarms ---> OpcuaPlugin::on_alarm_change
         |                                     |
         |  PollSnapshot                       | ros2_medkit_msgs/ReportFault
         v                                     v  (or ClearFault when alarm clears)
OpcuaPlugin::publish_values          ros2_medkit_fault_manager
         |                                     |
         v                                     v
std_msgs/Float32 topics            SOVD faults on PLC entities
(one per numeric node)
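
In code, the dual-interface shape looks roughly like the sketch below. The method names come from the list above; parameter and return types are assumptions, since the authoritative signatures live in the ros2_medkit_gateway headers.

// Sketch only: interface and method names as listed above; the exact
// signatures in ros2_medkit_gateway may differ.
class OpcuaPlugin : public GatewayPlugin, public IntrospectionProvider {
public:
  // GatewayPlugin lifecycle + vendor REST endpoints
  void configure(const PluginConfig& cfg) override;  // load node map YAML, connect
  void set_context(GatewayContext& ctx) override;    // ROS node handle, publishers
  std::vector<Route> get_routes() override;          // x-plc-operations handlers
  void shutdown() override;                          // stop poller, disconnect

  // IntrospectionProvider: SOVD entities derived from the node map
  IntrospectionResult introspect() override;

private:
  OpcuaClient client_;  // open62541pp wrapper
  OpcuaPoller poller_;  // polling / subscription data path
};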

Node map schema

All mapping lives in YAML (config/tank_demo_nodes.yaml is the reference example):

  • area_id / component_id - SOVD tree placement for the PLC runtime

  • nodes - list of entries, each with:

    • node_id - OPC-UA node identifier (e.g. "ns=2;i=1")

    • entity_id - SOVD app the value belongs to

    • data_name - short name used in REST URLs

    • display_name, unit, data_type, writable - metadata

    • ros2_topic - ROS 2 topic name for the per-node Float32 bridge (see the ROS 2 topic bridge section below)

    • min_value / max_value - optional write range check

    • alarm - optional fault definition (fault_code, severity, threshold, above_threshold direction)

The plugin binary is completely PLC-agnostic; pointing it at a different node map file reuses the same build for a different PLC.
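
For orientation, a minimal node map in this shape; the IDs, names, and topics below are illustrative rather than copied from config/tank_demo_nodes.yaml (alarm blocks are shown in the Configuration section further down):

area_id: production_cell        # illustrative placement values
component_id: plc
nodes:
  - node_id: 'ns=2;i=1'
    entity_id: tank_process
    data_name: fill_level
    display_name: 'Tank fill level'
    unit: '%'
    data_type: float
    writable: false
    ros2_topic: /plc/fill_level
  - node_id: 'ns=2;i=5'
    entity_id: tank_process
    data_name: pump_setpoint
    data_type: float
    writable: true
    min_value: 0.0
    max_value: 100.0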

Poll vs subscription mode

The plugin supports two data paths, selected by the prefer_subscriptions config flag:

Polling (default) - a background thread reads all mapped nodes at a fixed interval (poll_interval_ms, default 1000 ms). Simple, predictable, no event loop needed, sufficient for diagnostic use cases where latency below one second does not add value.

Subscription - registers OPC-UA monitored items and receives change notifications at subscription_interval_ms. Lower CPU cost on large node sets, but requires a running OPC-UA client event loop. Kept as an opt-in path because many PLC servers have limits on the number of active subscriptions.
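
A plausible plugin config block for these knobs; the key names come from the text above, while the nesting and the subscription interval value are assumptions:

prefer_subscriptions: false     # polling is the default path
poll_interval_ms: 1000          # polling cadence (default 1000 ms)
subscription_interval_ms: 250   # monitored-item publishing interval (illustrative)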

Both modes share the same PollSnapshot output structure, so downstream consumers (OpcuaPlugin::publish_values, REST handlers, fault mapping) are agnostic to the source.
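
A sketch of what that shared structure could look like; the field names are assumptions, not the actual PollSnapshot definition:

#include <chrono>
#include <map>
#include <string>
#include <variant>

// Assumed shape: one value per mapped node, keyed by node_id, plus the
// capture time so consumers can detect stale data.
using NodeValue = std::variant<bool, int64_t, double, std::string>;

struct PollSnapshot {
  std::chrono::steady_clock::time_point captured_at;
  std::map<std::string, NodeValue> values;   // node_id -> latest value
};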

Alarm -> fault mapping

Each node in the map may declare an alarm block. The poller compares the current value against the configured threshold on every poll; transitions across the threshold produce edge-triggered callbacks:

  • Active -> fault reported via /fault_manager/report_fault with the configured severity and message

  • Cleared -> fault cleared via /fault_manager/clear_fault by fault code

The plugin keeps per-fault state only long enough to detect edges; the fault manager owns persistence and fault lifecycle.
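
The edge detection boils down to one remembered boolean per alarm. A self-contained sketch, with names assumed:

#include <map>
#include <string>

struct AlarmConfig {           // mirrors the YAML alarm block
  std::string fault_code;
  std::string severity;
  double threshold;
  bool above_threshold;
};

class AlarmEdgeDetector {
public:
  // Returns +1 on a rising edge (report fault), -1 on a falling edge
  // (clear fault), 0 when nothing changed. Called once per poll.
  int update(const AlarmConfig& alarm, double value) {
    const bool active = alarm.above_threshold ? value > alarm.threshold
                                              : value < alarm.threshold;
    bool& was = was_active_[alarm.fault_code];
    const int edge = (active == was) ? 0 : (active ? +1 : -1);
    was = active;
    return edge;
  }

private:
  std::map<std::string, bool> was_active_;  // only state kept in the plugin
};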

Type-aware writes

POST /apps/{id}/x-plc-operations/{op} accepts a JSON body {"value": ...}. The handler:

  1. Looks up the node by data_name (with optional set_ prefix in the op name).

  2. Rejects read-only nodes with a 400 error.

  3. Coerces the JSON value to the node’s declared data_type (bool, int, string, otherwise double) and returns a typed 400 on mismatch.

  4. Applies the range check (min_value / max_value) if configured.

  5. Calls OpcuaClient::write_value which uses the node’s OPC-UA data type to build the correct binary payload.

This avoids the common failure mode of writing a float64 to an OPC-UA REAL (float32) node and getting silent truncation or a server-side rejection.
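
A sketch of the coercion step (step 3), assuming nlohmann::json for the request body:

#include <nlohmann/json.hpp>
#include <optional>
#include <string>
#include <variant>

using WriteValue = std::variant<bool, int64_t, double, std::string>;

// Step 3 as a function: coerce the JSON "value" to the node's declared
// data_type; std::nullopt maps to the typed HTTP 400 response.
std::optional<WriteValue> coerce(const nlohmann::json& v,
                                 const std::string& data_type) {
  if (data_type == "bool") {
    if (v.is_boolean()) return WriteValue{v.get<bool>()};
  } else if (data_type == "int") {
    if (v.is_number_integer()) return WriteValue{v.get<int64_t>()};
  } else if (data_type == "string") {
    if (v.is_string()) return WriteValue{v.get<std::string>()};
  } else if (v.is_number()) {       // everything else is treated as double
    return WriteValue{v.get<double>()};
  }
  return std::nullopt;              // type mismatch -> 400
}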

ROS 2 topic bridge

set_context creates one std_msgs/Float32 publisher per numeric node, named from the ros2_topic field of the node map. After every poll, publish_values pushes the current values onto these topics. Non-numeric nodes (strings) are skipped with a one-time warning per node.

This lets ROS 2 consumers (rqt plots, nav2 behavior trees, diagnostic aggregators) consume PLC state without having to speak OPC-UA directly.
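
Setup is plain rclcpp; a sketch with an assumed NodeEntry subset of the node map:

#include <map>
#include <string>
#include <vector>
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/float32.hpp>

struct NodeEntry { std::string node_id, data_type, ros2_topic; };  // subset, assumed

// One Float32 publisher per numeric node, keyed by node_id.
std::map<std::string, rclcpp::Publisher<std_msgs::msg::Float32>::SharedPtr> pubs;

void create_publishers(rclcpp::Node& node, const std::vector<NodeEntry>& entries) {
  for (const auto& e : entries) {
    if (e.data_type == "string") continue;   // non-numeric nodes are skipped
    pubs[e.node_id] = node.create_publisher<std_msgs::msg::Float32>(e.ros2_topic, 10);
  }
}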

Security

The current implementation uses anonymous OPC-UA authentication with SecurityPolicy=None. This is explicitly not suitable for untrusted networks - the plugin logs a warning on startup. Proper username/password and certificate-based authentication are planned.

Supported PLC servers

The plugin speaks plain OPC-UA and supports all four node identifier types defined by OPC 10000-6 section 5.3.1.10 (i= numeric, s= string, g= GUID, b= opaque ByteString). This means the same binary works against any compliant OPC-UA server without a code change - only the node_map.yaml changes per vendor. config/tank_demo_nodes.yaml carries ready-to-paste examples for OpenPLC (this demo), Siemens S7-1500 TIA Portal, Beckhoff TwinCAT 3, Allen-Bradley via Kepware, and KUKA KR C5.
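
All four identifier forms use the standard NodeId string notation in the node_id field; the values below are illustrative:

node_id: 'ns=2;i=1'                                     # numeric
node_id: 'ns=3;s=Motor.Speed'                           # string
node_id: 'ns=2;g=12345678-90ab-cdef-1234-567890abcdef'  # GUID
node_id: 'ns=1;b=b3BhcXVlSWQ='                          # opaque ByteString (base64)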

What the plugin can NOT do today (regardless of vendor):

  • Complex OPC-UA types: only scalar Variables (int, float, bool, string) are mapped. Structures, arrays, enums, unions and ExtensionObjects are ignored.

  • Vendor information models: profiles like Euromap 77 (injection molding), Siemens DI (device integration), and PA-DIM (process automation) are not interpreted natively. Tags exposed through those models can still be mapped manually via their node IDs, but the plugin does not understand the model semantics.

  • Native OPC-UA Alarms & Conditions: the threshold path detects alarms only by applying thresholds to polled values. Servers that publish native AlarmCondition events (e.g. Siemens AlarmConditionType) are handled by the separate event subscription path described below (issue #386).

Future work

Tracked issues in the ros2_medkit issue tracker:

  • #367 Certificate-based OPC-UA authentication (Basic256Sha256)

  • #368 Optional auto-browse of the OPC-UA address space to seed the node map (deferred pending validated user demand; UaExpert and python -m asyncua.tools.uals are recommended for now)

  • #366 Vendor open62541pp inline so the package can be added to the rosdistro release list

Untracked (open the issue if you hit the pain):

  • Hot-reload of the node map without restarting the plugin

  • Complex OPC-UA type support (structures, arrays, enums)

  • Vendor information model bindings (Euromap 77, Siemens DI, PA-DIM)

Native AlarmConditionType event subscription (issue #386)

The plugin subscribes to native OPC-UA Part 9 AlarmConditionType events emitted by vendor PLCs, in addition to the threshold-based polling path. Both modes coexist in a single node_map.yaml (different YAML keys) and feed the same fault_manager service.

Configuration

Two YAML forms describe alarms; each alarm is declared in exactly one of them, never both:

nodes:
  # Threshold-based (existing): polls the scalar value and raises a fault
  # when it crosses the configured threshold.
  - node_id: 'ns=2;i=2'
    entity_id: tank_process
    data_name: tank_temperature
    data_type: float
    alarm:
      fault_code: TANK_OVERHEAT
      severity: ERROR
      threshold: 80.0
      above_threshold: true

event_alarms:
  # Native AlarmConditionType (new): subscribes to events emitted from
  # the source NodeId and bridges them through the state machine below.
  - alarm_source: 'ns=4;s=Alarms.Overpressure'
    entity_id: tank_process
    fault_code: PLC_OVERPRESSURE
    severity_override: ERROR        # optional - else derived from event Severity
    message: 'Tank overpressure'    # optional - else event Message field

State machine

Inputs from each event payload (positional EventFilter select clauses):

  • EnabledState.Id - bool

  • ShelvingState.CurrentState.Id - NodeId; non-Unshelved => suppressed

  • ActiveState.Id - bool

  • AckedState.Id - bool

  • ConfirmedState.Id - bool

  • BranchId - NodeId; non-null means historical branch (Part 9 §5.5.2.12)

Decision order, first match wins:

  1. BranchId != null -> history-only (no SOVD update).

  2. EnabledState == false -> clear if was active, else no-op.

  3. ShelvingState/CurrentState/Id in {TimedShelved (i=2930), OneShotShelved (i=2932)} -> clear if was active, else no-op. A null/unset/unknown Id is treated as Unshelved (some servers leave the optional field uninitialized).

  4. ActiveState == true -> CONFIRMED (idempotent).

  5a. ActiveState == false and not (Acked and Confirmed) -> internal HEALED state; /faults still shows CONFIRMED (see note below).

  5b. ActiveState == false, Acked == true, Confirmed == true -> CLEARED.

Retain is intentionally NOT used for state determination. Per Part 9 §5.5.2.10 it controls visibility during ConditionRefresh bursts only; lifecycle is driven entirely by Active / Acked / Confirmed. The SOVD PREFAILED state has no native equivalent and is reserved for the threshold-polling pre-trigger path.
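
The whole decision order compresses into a few guarded returns. A sketch with assumed field and enum names:

enum class Outcome { HistoryOnly, ClearIfActive, Confirmed, Healed, Cleared };

// Fields extracted from the positional EventFilter select clauses.
struct AlarmEvent {
  bool has_branch_id;   // BranchId != null
  bool enabled;         // EnabledState.Id
  bool shelved;         // CurrentState.Id is TimedShelved or OneShotShelved
  bool active;          // ActiveState.Id
  bool acked;           // AckedState.Id
  bool confirmed;       // ConfirmedState.Id
};

// First match wins, mirroring rules 1..5b above.
Outcome decide(const AlarmEvent& e) {
  if (e.has_branch_id) return Outcome::HistoryOnly;      // 1: historical branch
  if (!e.enabled)      return Outcome::ClearIfActive;    // 2: disabled
  if (e.shelved)       return Outcome::ClearIfActive;    // 3: shelved => suppressed
  if (e.active)        return Outcome::Confirmed;        // 4: idempotent CONFIRMED
  if (!(e.acked && e.confirmed)) return Outcome::Healed; // 5a: latched, awaiting ack/confirm
  return Outcome::Cleared;                               // 5b
}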

Note

HEALED is internal-only. Rule 5a transitions the bridge’s internal SovdAlarmStatus to Healed and emits the ReportHealed action, but OpcuaPlugin::on_event_alarm deliberately treats that action as a no-op. The reason: ros2_medkit_msgs/srv/ReportFault has only EVENT_FAILED / EVENT_PASSED verbs, and routing EVENT_PASSED on every latch would feed the fault_manager debounce engine - which has its own statistical HEALED semantics (healing_threshold cycles of PASSED events) and may auto-clear the fault, defeating Part 9’s mandatory ack/confirm contract. Until ros2_medkit_msgs/msg/Fault gains a STATUS_LATCHED (or equivalent) value, /faults keeps status=CONFIRMED for a latched alarm; the next Cleared (rule 5b) removes the entry.

This is a known UX gap: an operator looking at /faults cannot distinguish “alarm physically active” from “alarm physically cleared, awaiting confirm”. The ack/confirm SOVD operations work end-to-end; the visibility limitation is tracked as a follow-up.

Severity mapping

OPC-UA severity is a 1-1000 scalar. The plugin maps it to selfpatch’s SOVD severity buckets:

  • 1-200 -> INFO

  • 201-500 -> WARNING

  • 501-800 -> ERROR

  • 801-1000 -> CRITICAL

This is the selfpatch convention, not IEC 62682 - that spec defines a 1-1000 priority scale but no normative band names. severity_override on an event_alarms entry takes precedence when set.
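
The bucket mapping as a sketch:

#include <cstdint>
#include <string_view>

// OPC-UA event Severity (1-1000) -> SOVD severity bucket.
std::string_view map_severity(uint16_t ua_severity) {
  if (ua_severity <= 200) return "INFO";
  if (ua_severity <= 500) return "WARNING";
  if (ua_severity <= 800) return "ERROR";
  return "CRITICAL";
}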

ConditionRefresh

After creating event monitored items the plugin invokes ConditionRefresh (Server object i=2253, method i=3875) so the server pushes any condition that fired before the subscription started. The same call fires on every successful reconnect.

The bracketing RefreshStartEventType (i=2787) and RefreshEndEventType (i=2788) are recognized and used to set a diagnostic flag; live notifications arriving during the burst are applied normally because the state machine is driven by per-condition ConditionId and runs idempotently.
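
A sketch of the refresh call, assuming a call_method wrapper on OpcuaClient (the real client API may differ); opcua::NodeId and opcua::Variant are open62541pp value types, umbrella include omitted:

// call_method is an assumed name for the plugin's wrapper around the
// OPC-UA Call service.
void refresh_conditions(OpcuaClient& client, uint32_t subscription_id) {
  const opcua::NodeId server_object{0, 2253};       // Server object
  const opcua::NodeId condition_refresh{0, 3875};   // ConditionRefresh method
  // Single input argument: the SubscriptionId whose monitored items
  // should receive the retrospective condition burst.
  client.call_method(server_object, condition_refresh,
                     {opcua::Variant{subscription_id}});
}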

Acknowledge / Confirm round-trip

Two SOVD operations appear on every entity that has at least one event-mode alarm declared:

  • POST /apps/{entity}/operations/acknowledge_fault/executions

  • POST /apps/{entity}/operations/confirm_fault/executions

Body:

{ "fault_code": "PLC_OVERPRESSURE", "comment": "operator on radio" }

The plugin resolves (entity_id, fault_code) to the live ConditionId maintained by the poller, then calls the inherited AcknowledgeableConditionType method (Acknowledge i=9111 or Confirm i=9113) on that NodeId. The latest EventId ByteString captured from the most recent notification is passed as the first argument; without it servers return BadEventIdUnknown (Part 9 §5.7.3).
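
A sketch of the round-trip, with assumed container and wrapper names; opcua::* are open62541pp value types, umbrella include omitted:

#include <map>
#include <string>
#include <utility>

// Live state the poller keeps per (entity_id, fault_code).
struct LiveCondition {
  opcua::NodeId condition_id;        // resolved ConditionId
  opcua::ByteString last_event_id;   // EventId from the latest notification
};

std::map<std::pair<std::string, std::string>, LiveCondition> live_conditions;

void acknowledge(OpcuaClient& client, const std::string& entity,
                 const std::string& fault_code, const std::string& comment) {
  const auto& cond = live_conditions.at({entity, fault_code});
  const opcua::NodeId acknowledge_method{0, 9111};  // Acknowledge (i=9113 for Confirm)
  // First argument must be the latest EventId, otherwise the server
  // returns BadEventIdUnknown (Part 9 §5.7.3).
  client.call_method(cond.condition_id, acknowledge_method,
                     {opcua::Variant{cond.last_event_id},
                      opcua::Variant{opcua::LocalizedText{"", comment}}});
}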

Vendor matrix

Vendor / runtime        AlarmConditionType                     Notes
Siemens S7-1500         yes (FW V2.9+)                         ProDiag, Program_Alarm, system diagnostics
Beckhoff TwinCAT 3      yes (TF6100)                           Confirm propagates to PLC code; Ack does not
Rockwell ControlLogix   yes (via FactoryTalk Linx FW 16.20+)   Tag-based alarms bridged by the gateway
CODESYS 3.5+            yes (alarm manager provider library)   Custom severity mapping
OpenPLC v3              no                                     Scalar variables only; use threshold mode

Out of scope

  • ShelvingState write operations (TimedShelve / OneShotShelve / Unshelve). The plugin reads the state to suppress active alarms but does not yet expose operator UI to set it.

  • OPC-UA branch reasoning beyond BranchId-based suppression. Re-fires are tracked via fault_manager occurrence_count plus the /faults/stream SSE history.

  • Auto-discovery of alarm sources via Server.GeneratedEvents browse (tracked in #368 alongside scalar auto-discovery).

  • Quality (StatusCode) propagation to a SOVD status_quality field. Requires an additive field on the ReportFault.srv schema; tracked separately.