Entity Cache Architecture
Background
The gateway’s ThreadSafeEntityCache holds the live SOVD entity tree
(Areas, Components, Apps, Functions) and is read by every HTTP handler
thread. Prior to issue #442 the cache stored entities in
std::unordered_map containers that reallocated on insert and shifted
slot addresses on erase. Discovery refreshes - which run every 100 ms on
graph-event detection - caused structural allocations even when nothing had
changed, adding heap churn and making the cache unsuitable for
memory-constrained or embedded deployments.
This document describes the replacement architecture introduced in #442.
Goals
Eliminate per-refresh structural allocations in the cache layer for a stable entity graph.
Make heap use predictable: capacity is fixed at startup, not open-ended.
Gate the generation counter on actual changes so polling it on a quiet graph is cheap.
Preserve the existing public API and SOVD REST contract - the changes are entirely below the handler layer.
Object Pool and Flat Index Architecture
The cache uses two building blocks.
SlotStore<T> - a stable-slot object pool. Each entity type (Area,
Component, App, Function) gets its own SlotStore<T>. The store holds
entities in a flat std::vector that is pre-reserved to capacity
slots at construction. Slots are never moved or shifted after allocation;
a freed slot is marked available and reused for the next insert. Slot IDs
are stable for the lifetime of an entity. The store exposes typed
references so callers do not hold raw pointers across mutations.
FlatHashMap<K, V> - a linear-probe open-address hash map pre-reserved to the same capacity at construction. Used for all index maps:
area_index_, component_index_, app_index_, function_index_ - map entity ID to slot ID (O(1) lookup)
component_to_apps_, area_to_components_, etc. - relationship indexes (parent ID to child slot ID list)
operation_index_ - operation full_path to owning entity reference
topic_type_cache_ - topic name to message type
All these maps are reserved once at ThreadSafeEntityCache construction
and do not rehash as long as the live entity count stays within
capacity. The capacity value comes from the
entity_cache.capacity ROS parameter (default: 256, valid range:
16-1000000; values outside this range are clamped with a warning at
startup).
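The stable-slot idea can be illustrated with a minimal sketch. It is not the gateway's actual SlotStore<T>: the class name SlotStoreSketch, the free-list strategy, and the use of std::optional to mark empty slots are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Illustrative stable-slot pool (hypothetical; not the real SlotStore<T>).
template <typename T>
class SlotStoreSketch {
public:
  using SlotId = std::size_t;

  explicit SlotStoreSketch(std::size_t capacity) {
    slots_.reserve(capacity);  // single up-front allocation for the slots
    free_.reserve(capacity);   // and for the free list
  }

  // Reuse a freed slot if one exists, otherwise append a new slot.
  SlotId insert(T value) {
    if (!free_.empty()) {
      const SlotId id = free_.back();
      free_.pop_back();
      slots_[id] = std::move(value);
      return id;
    }
    slots_.emplace_back(std::move(value));
    return slots_.size() - 1;
  }

  // Mark the slot available; no other slot is moved or renumbered.
  void erase(SlotId id) {
    assert(id < slots_.size() && slots_[id].has_value());
    slots_[id].reset();
    free_.push_back(id);
  }

  T& get(SlotId id) {
    assert(id < slots_.size() && slots_[id].has_value());
    return *slots_[id];
  }

  std::size_t capacity() const { return slots_.capacity(); }

private:
  std::vector<std::optional<T>> slots_;
  std::vector<SlotId> free_;
};
```

In this sketch, erasing one entity and inserting another reuses the freed slot ID, and the vector's capacity stays unchanged as long as the live count remains within the reserved size.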
Incremental Reconcile
update_all (and the per-type variants update_areas,
update_components, update_apps, update_functions) implement an
in-place diff instead of a clear-and-rebuild:
1. Remove slots whose IDs are absent from the incoming list.
2. Add incoming IDs that are not yet in the index (allocate a new slot from the pool).
3. Change slots where an existing entity’s payload differs from the incoming value (in-place overwrite, no slot movement).
Only mutations that fall into categories 1, 2, or 3 advance the generation counter. A no-op call (incoming set is identical to the current cache) leaves the generation unchanged. This allows consumers such as the OpenAPI capability generator to skip expensive regeneration when the graph has not changed.
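The remove/add/change diff and the gated generation counter can be sketched as follows. This is a simplified illustration, not the gateway's code: EntitySketch and CacheSketch are hypothetical names, a std::unordered_map stands in for the slot pool and flat index, and bumping the generation once per changing call (rather than once per individual mutation) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical entity: an ID plus an opaque payload.
struct EntitySketch {
  std::string id;
  std::string payload;
};

class CacheSketch {
public:
  // Reconcile the cache against `incoming`; returns true if anything changed.
  bool update_all(const std::vector<EntitySketch>& incoming) {
    bool changed = false;
    std::unordered_set<std::string> incoming_ids;
    for (const auto& e : incoming) incoming_ids.insert(e.id);

    // Remove entries whose IDs are absent from the incoming list.
    for (auto it = entities_.begin(); it != entities_.end();) {
      if (incoming_ids.count(it->first) == 0) {
        it = entities_.erase(it);
        changed = true;
      } else {
        ++it;
      }
    }

    for (const auto& e : incoming) {
      auto it = entities_.find(e.id);
      if (it == entities_.end()) {
        entities_.emplace(e.id, e.payload);  // add a new entry
        changed = true;
      } else if (it->second != e.payload) {
        it->second = e.payload;  // change: in-place overwrite
        changed = true;
      }
    }

    // Postcondition: the cache holds exactly the incoming ID set.
    assert(entities_.size() == incoming_ids.size());

    if (changed) ++generation_;  // a no-op call leaves the generation alone
    return changed;
  }

  std::uint64_t generation() const { return generation_; }
  std::size_t size() const { return entities_.size(); }

private:
  std::unordered_map<std::string, std::string> entities_;
  std::uint64_t generation_ = 0;
};
```

Calling update_all twice with the same list leaves generation() unchanged after the second call, which is the property a consumer polling the counter relies on.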
Zero-Structural-Allocation Property
Once capacity slots are reserved and the entity count stabilises
below that threshold, steady-state update_all calls perform zero
structural allocations in the cache layer: no std::vector resize, no
hash map rehash, no new/delete for index entries. In-place slot
overwrites update entity payloads without touching the pool structure.
Note
Honest boundary. Entity payloads themselves - nlohmann::json
objects (e.g., host_metadata, type_info) and
std::vector<std::string> fields (topic lists, service lists) -
still allocate on the heap inside the discovery layer before the cache
sees them. Making those payloads fixed-capacity (e.g., custom
allocators or pre-sized arenas) is a separate future effort and is
explicitly out of scope for this change. The zero-structural-allocation
guarantee applies to the cache data structures, not to the payloads
stored in them.
Discovery-Side Allocation Under Graph Churn
The cache is only the last step of refresh_cache(). Per-graph-event
allocation profiling showed the dominant churn under a churning graph was in
the discovery pipeline, not the cache rebuild, driven by two things:
Eager schema building. discover_apps rebuilt the JSON request/response (and goal/result/feedback) schemas for every service and action in the graph on every refresh, then deep-copied them through several intermediate containers. These schemas are immutable per type and are not read from the discovered entities at all - the /operations handler already resolves them on demand. Discovery now leaves type_info empty; the handler resolves it lazily and TypeIntrospection caches the assembled per-type type_info as a shared, immutable object (get_service_type_info / get_action_type_info), so repeated requests reuse it.
Refresh frequency. The rclcpp graph event fires many times per second under churn, running the full discovery pipeline each time. Graph events are now debounced (discovery.refresh_debounce_ms, default 1 s) so a burst of graph changes coalesces into a single refresh.
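The coalescing effect of the debounce can be sketched with a small state machine. This is an illustration under assumptions, not the gateway's implementation: RefreshDebouncerSketch is a hypothetical name, the window starts at the first event of a burst (the source does not say whether the real debounce is leading- or trailing-edge), and time is passed in explicitly so the logic can be exercised without a real clock.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical debouncer: many graph events inside one window yield one refresh.
class RefreshDebouncerSketch {
public:
  using Clock = std::chrono::steady_clock;

  explicit RefreshDebouncerSketch(std::chrono::milliseconds window)
      : window_(window) {
    assert(window.count() >= 0);
  }

  // Record a graph event. Only the first event of a burst arms the deadline;
  // later events inside the window are absorbed.
  void on_graph_event(Clock::time_point now) {
    if (!deadline_) deadline_ = now + window_;
  }

  // Poll from the refresh loop. Returns true once per burst, after the
  // window has elapsed, and then disarms until the next event.
  bool should_refresh(Clock::time_point now) {
    if (deadline_ && now >= *deadline_) {
      deadline_.reset();
      return true;
    }
    return false;
  }

private:
  std::chrono::milliseconds window_;
  std::optional<Clock::time_point> deadline_;
};
```

With a 1 s window, three events arriving within the first 500 ms produce a single refresh at the 1 s mark instead of three full discovery runs.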
Overflow Behaviour
If update_all receives more entities than the reserved capacity,
the pool and maps grow dynamically (the standard resize path). A one-shot
WARN log fires when this happens:
[WARN] entity_cache capacity 256 exceeded (grew); raise entity_cache.capacity for embedded determinism
The gateway continues operating normally. The WARN fires once per process lifetime to avoid log spam.
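A once-per-process warning of this kind is commonly implemented with an atomic flag. The sketch below assumes that approach; OverflowWarnSketch and note_size are hypothetical names, and tying the flag to the grew field reported by the health endpoint is likewise an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical one-shot overflow warning guarded by an atomic flag.
class OverflowWarnSketch {
public:
  explicit OverflowWarnSketch(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns true only for the call that actually emitted the warning.
  bool note_size(std::size_t live_count) {
    if (live_count <= capacity_) return false;
    // exchange() returns the previous value, so only the first overflow
    // observes `false` and logs, even with concurrent callers.
    if (grew_.exchange(true)) return false;
    std::fprintf(stderr,
                 "[WARN] entity_cache capacity %zu exceeded (grew); raise "
                 "entity_cache.capacity for embedded determinism\n",
                 capacity_);
    return true;
  }

  bool grew() const { return grew_.load(); }

private:
  std::size_t capacity_;
  std::atomic<bool> grew_{false};
};
```

Subsequent overflows still leave grew() reporting true, but print nothing further.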
Generation Counter
ThreadSafeEntityCache::generation() returns a uint64_t that
increments on each mutation that actually changes the cache contents.
Callers use it as a change token:
auto gen = cache.generation();
// ... later ...
if (cache.generation() != gen) {
    rebuild_openapi_spec();
}
Because the counter is gated on real changes, polling it on a stable graph costs nothing beyond an atomic load.
Health Endpoint Stats
GET /api/v1/health exposes entity cache statistics under the
x-medkit-entity-cache vendor-extension key:
{
  "status": "healthy",
  "x-medkit-entity-cache": {
    "capacity": 256,
    "areas": 2,
    "components": 1,
    "apps": 14,
    "functions": 3,
    "generation": 7,
    "grew": false
  }
}
Configuration
The cache capacity is set via entity_cache.capacity in
gateway_params.yaml (or any ROS parameter source). See
Server Configuration (Performance Tuning section) for the full parameter
reference.
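The range handling described for entity_cache.capacity (default 256, valid range 16-1000000, out-of-range values clamped with a warning) might look like the sketch below. The function name, the constants, and the wording of the warning are hypothetical; only the numeric bounds and the default come from this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Bounds and default as stated in this document.
constexpr std::int64_t kMinCapacity = 16;
constexpr std::int64_t kMaxCapacity = 1000000;
constexpr std::int64_t kDefaultCapacity = 256;

// Hypothetical helper: clamp the requested capacity into the valid range
// and warn when the value had to be adjusted.
inline std::int64_t clamp_capacity_sketch(std::int64_t requested) {
  const std::int64_t clamped =
      std::clamp(requested, kMinCapacity, kMaxCapacity);
  if (clamped != requested) {
    // Illustrative message only; the gateway's real log text may differ.
    std::fprintf(stderr,
                 "[WARN] entity_cache.capacity %lld out of range, using %lld\n",
                 static_cast<long long>(requested),
                 static_cast<long long>(clamped));
  }
  assert(clamped >= kMinCapacity && clamped <= kMaxCapacity);
  return clamped;
}
```

A request of 4 would yield 16, a request of 5000000 would yield 1000000, and the default of 256 passes through unchanged.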
See Also
ROS 2 Subscription Architecture - Analogous embedded-hardened design for the topic subscription pool
Server Configuration - entity_cache.capacity parameter reference