Plugin-Driven Entity Surface Notifications (Plugin API v7)

Motivation

A plugin that implements UpdateProvider (or any other plugin that mutates the entity surface at runtime) has two mechanisms available in v6:

  • declare the entity in the base manifest file (static, known at gateway startup).

  • rely on runtime discovery to notice a new ROS 2 node appearing on the graph.

Neither covers the OTA install case. The installer drops a new app on disk, starts it, and wants the operator’s client (Web UI, MCP tool, Foxglove panel) to see the change immediately. Runtime discovery catches the new node within a few seconds but shows it as source: orphan because it is not attached to any component, so the entity tree looks inconsistent. On plugin rollback the opposite problem appears: the manifest entry (if it existed) lingers even though the app is gone.

v7 adds two pieces that together close the gap:

  1. PluginContext::notify_entities_changed(EntityChangeScope) - a plugin-driven refresh trigger.

  2. discovery.manifest.fragments_dir - a directory of drop-in manifest YAML fragments that the gateway scans on every manifest load / reload.

Neither is OTA-specific. The same pair is usable by rosbag injectors, dynamic config deployers, hot-reloadable adapters - any plugin whose correct behaviour depends on adding or removing entities at runtime.

Lifecycle

(1) Plugin installs new app on disk
(2) Plugin writes /path/to/fragments/<deploy-id>.yaml declaring
    the app (is_located_on, ros_binding, ...)
(3) Plugin starts the app process
(4) Plugin calls
      ctx.notify_entities_changed(
          EntityChangeScope::for_component("my-ecu"));
(5) Gateway re-parses base manifest and re-scans fragments_dir,
    rebuilds the entity cache, runs a full discovery cycle
(6) Client's next GET /apps / GET /components/{id}/apps reflects
    the new app

Rollback reverses the sequence:

(7)  Plugin stops the app process
(8)  Plugin deletes /path/to/fragments/<deploy-id>.yaml
(9)  Plugin calls
       ctx.notify_entities_changed(
           EntityChangeScope::for_component("my-ecu"));
(10) Gateway re-loads, app is no longer in the merged manifest,
     cache drops it, clients refresh to find it gone

The scope hint in steps (4) / (9) is informational in v7: the gateway always performs a full refresh_cache(). A future optimisation may restrict the pass to the named area or component subtree; the plugin API does not change when that lands.
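The ordering in both sequences (fragment on disk first, notification last) can be sketched as below. This is a minimal illustration, not the gateway's code: EntityChangeScope and FakePluginContext are stand-ins defined here so the example compiles on its own, install_app and rollback_app are hypothetical helper names, and starting or stopping the app process (steps 1, 3 and 7) is left out. The fragment is published with a temporary file plus rename, without the fsync that the write contract further down requires.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Stand-in for the gateway's scope type; only for_component() comes from the text.
struct EntityChangeScope {
  std::string component_id;
  static EntityChangeScope for_component(std::string id) {
    return EntityChangeScope{std::move(id)};
  }
};

// Stand-in for PluginContext: it only records which component was notified.
struct FakePluginContext {
  std::vector<std::string> notified;
  void notify_entities_changed(const EntityChangeScope& scope) {
    notified.push_back(scope.component_id);
  }
};

// Steps (2) and (4): the fragment must be fully on disk before the gateway reloads.
inline void install_app(FakePluginContext& ctx, const fs::path& fragments_dir,
                        const std::string& deploy_id, const std::string& fragment_yaml,
                        const std::string& component_id) {
  fs::create_directories(fragments_dir);  // directory may be created lazily
  const fs::path tmp = fragments_dir / (".tmp-" + deploy_id + ".yaml");
  {
    std::ofstream out(tmp, std::ios::trunc);
    out << fragment_yaml;
  }  // stream closed here, before the rename
  fs::rename(tmp, fragments_dir / (deploy_id + ".yaml"));
  ctx.notify_entities_changed(EntityChangeScope::for_component(component_id));
}

// Steps (8) and (9): remove the fragment, then ask for the refresh.
inline void rollback_app(FakePluginContext& ctx, const fs::path& fragments_dir,
                         const std::string& deploy_id, const std::string& component_id) {
  std::error_code ec;
  fs::remove(fragments_dir / (deploy_id + ".yaml"), ec);  // already gone is fine
  ctx.notify_entities_changed(EntityChangeScope::for_component(component_id));
}
```

Calling the notification last matters because the reload reads whatever is in the directory at that moment; notifying before the fragment is written or removed would refresh against the old state.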

Fragment merge rules

Files in discovery.manifest.fragments_dir are parsed with ManifestParser::parse_fragment_file and merged on top of the loaded base manifest before validation runs.

Allowed in a fragment:

  • apps - appended to Manifest::apps

  • components - appended to Manifest::components

  • functions - appended to Manifest::functions

Forbidden in a fragment (owned by the base manifest):

  • areas

  • metadata (any field - name, description, version, created_at)

  • discovery

  • scripts

  • capabilities (vendor extensions)

  • lock_overrides

A fragment that declares any forbidden top-level field fails the load with a FRAGMENT_FORBIDDEN_FIELD validation error. Each forbidden field in a fragment is reported separately, so a single load reports every violation, not just the first. manifest_version is optional in fragments - when omitted, a synthetic "1.0" is injected before parsing.
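The "report every violation" behaviour can be illustrated with a small check over a fragment's top-level keys. This is a sketch under assumptions: ValidationError and check_fragment_keys are hypothetical names, and the keys are passed in as plain strings rather than taken from a YAML parser. Only the field list and the FRAGMENT_FORBIDDEN_FIELD code come from the text above.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>
#include <vector>

// Top-level fields owned by the base manifest (list taken from the text above).
inline constexpr std::array<std::string_view, 6> kForbiddenFragmentFields = {
    "areas", "metadata", "discovery", "scripts", "capabilities", "lock_overrides"};

// Hypothetical error record; the real validator's type is not shown in this document.
struct ValidationError {
  std::string code;
  std::string field;
};

// Returns one error per forbidden key, in input order, instead of stopping at the first.
inline std::vector<ValidationError> check_fragment_keys(
    const std::vector<std::string>& top_level_keys) {
  std::vector<ValidationError> errors;
  for (const auto& key : top_level_keys) {
    const bool forbidden =
        std::find(kForbiddenFragmentFields.begin(), kForbiddenFragmentFields.end(),
                  std::string_view(key)) != kForbiddenFragmentFields.end();
    if (forbidden) {
      errors.push_back({"FRAGMENT_FORBIDDEN_FIELD", key});
    }
  }
  return errors;
}
```

A fragment with the keys apps, areas and scripts would yield two errors here, one for areas and one for scripts, and the load would fail.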

File ordering is deterministic: fragment files in the directory are sorted by full path before being merged. Duplicate IDs across the combined manifest (base + every fragment) are caught by the normal validator run and cause the load to fail with the same error the user would see from a single file, plus the offending fragment path in the error message.

A missing fragments directory is not an error - the plugin can create the directory lazily on first install.

All-or-nothing fragment contract

Fragment loading is intentionally atomic: if ANY fragment in fragments_dir fails to parse, exceeds the size limit, declares a forbidden top-level field, or causes the merged manifest to fail validation, the entire load_manifest / reload_manifest call returns failure. The previously-loaded manifest stays active until the next successful reload. One broken fragment therefore blocks every fragment in that directory (including valid ones) from taking effect.

The contract is deliberate - a partially-applied fragment set is worse than the pre-existing manifest, because it produces a client-visible entity tree that matches neither the old deployment nor the intended new one. Plugins that want independent failure domains should use separate fragment directories per domain.
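One way to picture the contract is a reload that builds a candidate manifest on the side and swaps it in only when every step succeeds. The sketch below is illustrative only: Manifest is reduced to a list of app IDs, ManifestStore is a hypothetical class, a fragment that failed to parse is modelled as an empty std::optional, and duplicate-ID detection stands in for the full validator.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Reduced stand-in for the real manifest: just app IDs.
struct Manifest {
  std::vector<std::string> app_ids;
};

class ManifestStore {
 public:
  // Returns false and leaves the active manifest untouched if any fragment is
  // broken (std::nullopt) or the merged result contains a duplicate ID.
  bool reload(const Manifest& base,
              const std::vector<std::optional<Manifest>>& fragments) {
    Manifest candidate = base;  // work on a copy, never on active_
    for (const auto& fragment : fragments) {
      if (!fragment) {
        return false;  // one broken fragment blocks the whole set
      }
      candidate.app_ids.insert(candidate.app_ids.end(),
                               fragment->app_ids.begin(), fragment->app_ids.end());
    }
    std::set<std::string> seen;
    for (const auto& id : candidate.app_ids) {
      if (!seen.insert(id).second) {
        return false;  // duplicate ID across base + fragments
      }
    }
    active_ = std::move(candidate);  // commit only after everything passed
    return true;
  }

  const Manifest& active() const { return active_; }

 private:
  Manifest active_;
};
```

Because the swap is the last statement, a failed reload is invisible to clients: they keep seeing the manifest from the last successful load.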

Plugin write contract (TOCTOU and size safety)

To keep the all-or-nothing semantics from being tripped by a racing reader, plugins that write into fragments_dir MUST publish each fragment atomically:

  1. write the final content to fragments_dir/.tmp-<id>.yaml and fsync it;

  2. rename() to fragments_dir/<id>.yaml (POSIX rename is atomic within one filesystem).

The gateway scans the directory on every reload, and notify_entities_changed runs that reload synchronously on the caller’s thread, so a partial write observed between open() and close() can fail the entire merge if the plugin writes the fragment in place.

Two other reader-side safeguards protect the gateway from malformed inputs:

  • Size cap. Each fragment file is rejected before it is read into memory if it exceeds ManifestParser::kMaxFragmentBytes (1 MiB). This protects against a misconfigured fragments_dir pointing at a log or data file.

  • Symlink scoping. Symlinks ARE followed (k8s ConfigMap mounts rely on that), but any directory entry whose resolved real path is not a descendant of the canonical fragments_dir is skipped with a warning. This stops an evil.yaml -> /etc/shadow escape while keeping ConfigMap use cases working.

Compatibility

  • PLUGIN_API_VERSION bumped from v6 to v7.

  • PluginContext::notify_entities_changed has a default no-op implementation, so plugin source code written against v6 compiles unchanged against v7 headers. No code changes are needed.

  • Binary compatibility is NOT provided: the plugin loader compares the exported plugin_api_version() against the gateway’s PLUGIN_API_VERSION with strict equality, so a .so pre-compiled against v6 IS rejected. In-tree plugins pick up the bump automatically because they return PLUGIN_API_VERSION from the shared header; out-of-tree plugins must be recompiled.

  • A v7 gateway loading a recompiled v6 plugin is fully functional; the plugin-driven refresh stays inactive until the plugin opts in by calling notify_entities_changed.
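The strict-equality gate described in the binary compatibility bullet amounts to a single comparison. In this sketch kGatewayPluginApiVersion stands in for the gateway's PLUGIN_API_VERSION, the return type of the exported function is assumed to be int, and is_loadable is a hypothetical name for the loader's check.

```cpp
// Stand-in for the gateway's PLUGIN_API_VERSION (v7 per the text above).
inline constexpr int kGatewayPluginApiVersion = 7;

// Assumed shape of the symbol a plugin exports as plugin_api_version().
using PluginApiVersionFn = int (*)();

// Strict equality: an older or newer plugin binary is rejected alike.
inline bool is_loadable(PluginApiVersionFn plugin_api_version) {
  return plugin_api_version != nullptr &&
         plugin_api_version() == kGatewayPluginApiVersion;
}
```

A .so built against v6 returns 6 from its exported function and is refused; the same source rebuilt against the v7 header returns 7 and loads.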