FlexGalaxy.AI Platform Architecture
Overview
FlexGalaxy.AI is an Application Platform as a Service (APaaS) for building robotic and agentic applications. Developers build applications on top of the platform’s horizontal services. Syrius Robotics is the first developer, dogfooding the platform by building its own products on the same APIs and interfaces available to third-party developers.
Core Principle
No privileged access. First-party applications (WES, ClearJanitor, SiteView, OpsFlow) use the same public APIs, the same authentication (DotID/OIDC), and the same constraints that any third-party developer would encounter.
Platform Service Map
Data Plane: The Physical World Interface
| Service | Responsibility |
|---|---|
| DeviceAdmin | Device enrollment, alert monitoring, and OTA status visibility. The unified device management interface: wraps ThingsBoard for connectivity and alerts, and reads hawkBit for update status. Developers never interact with ThingsBoard directly. |
| ThingIO | IoT data processing, analytics, and dashboard visualization. Built on Apache StreamPipes. Applications register custom dashboards, widgets, and data pipelines scoped to their enrolled devices. |
| OTAForge | OTA artifact management and policy-driven rollout automation. Built on Eclipse hawkBit. Provides CI/CD access keys for artifact publishing, policy-driven rollouts (polls DeviceAdmin, evaluates policies, auto-creates rollouts), a proxied DDI endpoint, and a distributor portal. |
Identity Plane: Who and What Has Access
| Service | Responsibility |
|---|---|
| DotID | Identity and access management, modeled after AWS IAM and Organizations. Built on Keycloak. The umbrella for all identity concerns. |
| Org Service | Account hierarchy and organization structure. Manages multi-tenant relationships (organizations, member accounts, sites, and resources) across a three-tier model. |
| IAM Identity Center | Cross-account access and SSO. Enables visibility and permission sharing across organizational boundaries (e.g., a leasing company sees assets across multiple contractors). |
| StarGate | Internal web application for administering the Identity Plane. Named in the spirit of AWS Gandalf, the gatekeeper. |
DotID Subsystem Relationship:
DotID (umbrella)
├── Org Service (who belongs where)
├── IAM Identity Center (who can access what, across accounts)
└── StarGate (admin UI for managing it all)
Spatial Plane: Where Things Are
| Service | Responsibility |
|---|---|
| Equator | Spatial data management. Source of truth for maps, floor plans, zones, coordinates, racks, charging stations, landmarks, geofencing boundaries. |
| Marie | Spatial query engine. Named after Marie Tharp, the oceanographic cartographer. Uses Equator’s map data to fulfill queries from applications: proximity searches, coverage analysis, path clearance, zone-based lookups. |
Relationship: Equator is the data, Marie is the intelligence on top of it. Developers interact with Marie; Equator manages the underlying spatial model.
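To make the split between data and query concrete, here is a minimal sketch of a proximity search of the kind Marie is described as answering. The source does not document Marie's API, so the `Landmark` shape, the `nearest_within` function, the flat metre-based coordinate frame, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Landmark:
    """A point feature from the spatial model (hypothetical shape)."""
    name: str
    kind: str   # e.g. "charging_station", "rack"
    x: float    # metres in a shared floor-plan frame (assumed)
    y: float


def nearest_within(landmarks, x, y, kind, radius):
    """Return landmarks of `kind` within `radius` of (x, y), closest first."""
    hits = [
        (hypot(lm.x - x, lm.y - y), lm)
        for lm in landmarks
        if lm.kind == kind
    ]
    # Sort on distance only, then drop anything outside the search radius.
    return [lm for dist, lm in sorted(hits, key=lambda h: h[0]) if dist <= radius]


# Toy data standing in for what Equator would hold.
SAMPLE_MAP = [
    Landmark("CS-1", "charging_station", 0.0, 0.0),
    Landmark("CS-2", "charging_station", 30.0, 40.0),
    Landmark("R-17", "rack", 3.0, 4.0),
]
```

A caller standing at (3, 4) with a 10 m radius would get back only `CS-1` from this sample data; a production engine would presumably use a spatial index rather than a linear scan.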
Intelligence Plane: What To Do, When, and How
| Service | Responsibility |
|---|---|
| Planner | Strategic, multi-resource, order-level task decomposition and goal reasoning. Operates at the platform/cloud level with a global view of all resources. |
| Scheduler | Resource allocation and time optimization. Supports one-time scheduling (order-driven, e.g., WES), recurring/cron-style scheduling (schedule-driven, e.g., ClearJanitor), and conditional scheduling (event-driven, future). |
| Execution Manager | Orchestrates and monitors task execution. Manages the execution contract lifecycle: pushing contracts to edge devices, maintaining projected state during offline periods, reconciling actual state upon reconnection. |
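The difference between the Scheduler's one-time and recurring modes can be sketched as two schedule types that answer the same question: when should this job run next? The class names and the `next_run` method are hypothetical, a fixed daily time stands in for full cron syntax, and conditional (event-driven) scheduling is left out because the source marks it as future work.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass(frozen=True)
class OneTimeSchedule:
    """Order-driven: run once at a fixed instant."""
    run_at: datetime

    def next_run(self, now: datetime) -> Optional[datetime]:
        # Nothing is left to schedule once the instant has passed.
        return self.run_at if self.run_at >= now else None


@dataclass(frozen=True)
class DailySchedule:
    """Schedule-driven: run every day at a fixed wall-clock time."""
    at: time

    def next_run(self, now: datetime) -> Optional[datetime]:
        candidate = datetime.combine(now.date(), self.at)
        # If today's slot has already passed, roll over to tomorrow.
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate
```

The sketch uses naive datetimes for brevity; a real scheduler would need to settle on time zones per site.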
Governance Plane: Rules That Guide All Decisions
| Service | Responsibility |
|---|---|
| Policy Service | Stores, serves, and enforces policies. Every decision-making service in the Intelligence Plane consults the Policy Service before acting. |
| Policy Validator | Conflict detection, safety checks, deadlock prevention, coverage gap analysis. Mandatory validation before any policy goes live. |
| AI Policy Agent | Conversational policy creation. A platform service that uses app-provided Domain Policy Schemas to understand domain context, enabling it to translate user intent into formal policies through natural dialogue. |
Policy Hierarchy (enforcement order):
Platform defaults (safety, physics, hard limits) — cannot be overridden
↓
App-defined policies (developer presets) — can be customized within bounds
↓
User-defined policies (via AI Policy Agent) — validated before activation
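One way to read the hierarchy above is as a layered merge in which lower layers can never change a platform default. The sketch below is illustrative only: the function name, the flat key/value policy shape, and the numeric `(low, high)` bounds are assumptions, and the simple bounds check stands in for the much richer validation the Policy Validator is described as performing.

```python
def resolve_policies(platform_defaults, app_presets, app_bounds, user_policies):
    """Merge the three policy layers and return (effective, rejected).

    app_bounds maps a customizable key to its allowed (low, high) range.
    """
    effective = dict(app_presets)
    rejected = {}
    for key, value in user_policies.items():
        if key in platform_defaults:
            rejected[key] = "platform default cannot be overridden"
        elif key not in app_bounds:
            rejected[key] = "not customizable by users"
        else:
            low, high = app_bounds[key]
            if low <= value <= high:
                effective[key] = value
            else:
                rejected[key] = f"outside app bounds [{low}, {high}]"
    # Platform defaults are applied last so that no lower layer can change them.
    effective.update(platform_defaults)
    return effective, rejected
```

For example, with a platform limit `max_speed_mps = 1.5`, an app preset `min_battery_pct = 20` bounded to `(10, 50)`, and a user request for `min_battery_pct = 30` and `max_speed_mps = 5.0`, the battery change is accepted and the speed change is rejected.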
AI Policy Agent Architecture:
The Agent is domain-agnostic at its core. Each application registers a Domain Policy Schema that provides the domain knowledge the Agent needs:
Platform provides:
- Conversational engine
- Policy syntax understanding
- Validation logic
- Common concepts (devices, zones, time)

App injects:
- Domain ontology (entities, actions)
- Domain constraints
- Available strategies
- Example policies / best practices
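A Domain Policy Schema carrying the four app-injected parts listed above might be modelled roughly as follows. The source does not specify the schema format, so the class name, field names, and the `unknown_terms` helper are hypothetical; the helper only shows how the Agent could use the ontology to notice terms it has no definition for.

```python
from dataclasses import dataclass, field


@dataclass
class DomainPolicySchema:
    """What an application registers with the AI Policy Agent (assumed shape)."""
    app: str
    entities: set                                      # domain ontology: things
    actions: set                                       # domain ontology: verbs
    constraints: list = field(default_factory=list)    # domain constraints
    strategies: list = field(default_factory=list)     # available strategies
    examples: list = field(default_factory=list)       # example policies

    def unknown_terms(self, entity, action):
        """Return the terms of a drafted policy that the ontology does not define."""
        unknown = []
        if entity not in self.entities:
            unknown.append(entity)
        if action not in self.actions:
            unknown.append(action)
        return unknown


# Illustrative registration for a cleaning domain; all values are made up.
cleaning_schema = DomainPolicySchema(
    app="ClearJanitor",
    entities={"robot", "floor", "zone"},
    actions={"clean", "dock"},
    constraints=["no cleaning while battery is below threshold"],
    strategies=["retry", "hold"],
)
```

With this schema, a drafted rule about a robot performing a "pick" would surface "pick" as an unknown term, which the Agent could then clarify with the user instead of guessing.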
The Agent operates in two modes:
Creation mode — help users set up new policies through conversation
Audit mode — explain current policies, simulate “what would happen if…” scenarios
Ecosystem Plane: Distribution
| Service | Responsibility |
|---|---|
| Marketplace | Application and component distribution. Where developers publish apps and reusable components. |
IoT Layer: Technology Stack
The IoT Layer defines the open-source platforms underneath the Data Plane services. Each Data Plane service wraps an IoT platform, providing FlexGalaxy API conventions, account scoping, and access control.
| Data Plane Service | Wraps | License |
|---|---|---|
| DeviceAdmin | ThingsBoard CE: device registry, multi-protocol connectivity (MQTT, CoAP, LwM2M, HTTP), telemetry ingestion, rule engine alerts | Apache 2.0 |
| ThingIO | Apache StreamPipes: stream analytics, data pipelines, ML inference, visual pipeline builder | Apache 2.0 |
| OTAForge | Eclipse hawkBit + Hara: OTA updates, artifact delivery, policy-driven rollouts, proxied DDI | EPL 2.0 |
Integration pattern:
- DeviceAdmin (enrollment, alerts) ──delegates to──► Provisioning Service
- Devices ──MQTT──► ThingsBoard ──Rule Engine──► Kafka Cluster A ──replicate──► Kafka Cluster B ──► StreamPipes
- ThingsBoard ──► Telemetry DB
- ThingIO (APIs + React SDK) ──► StreamPipes
- Devices ──DDI──► OTAForge (artifacts, policies, proxied DDI) ──DDI proxy──► hawkBit
- OTAForge polls DeviceAdmin to drive policy-driven rollouts
DeviceAdmin delegates device provisioning to the Provisioning Service, which orchestrates ThingsBoard, hawkBit, and DotID. CI/CD pipelines publish artifacts to OTAForge via access keys (AK/SK); rollouts are created automatically by policy evaluation. Devices poll OTAForge’s proxied DDI endpoint, not hawkBit directly. For how these services support multi-party supply chains with cross-account OTA and ownership transfer, see the Supply Chain OTA and Ownership pattern.
Execution Contract Model
When the platform assigns a task to an edge device (robot, PDA), it pushes an execution contract: the task definition, scoped actions, and required data (e.g., a map from Equator). Connectivity is confirmed at the moment the contract is pushed.
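As a rough illustration, an execution contract could be represented as an immutable record that the edge device can check its actions against. The field names, the `permits` method, and the `push_contract` function with its `is_connected` callback are assumptions; the source names only the three parts of a contract and the connectivity check at push time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContract:
    """Task definition, scoped actions, and required data (assumed field names)."""
    contract_id: str
    device_id: str
    task: str
    scoped_actions: frozenset
    required_data: dict   # e.g. {"map": "<reference to an Equator map>"}

    def permits(self, action):
        """True if the device may perform `action` under this contract."""
        return action in self.scoped_actions


def push_contract(contract, is_connected):
    """Hand a contract to a device, refusing if connectivity is not confirmed.

    `is_connected` is a hypothetical callable: device_id -> bool.
    """
    if not is_connected(contract.device_id):
        raise ConnectionError(f"device {contract.device_id} is not reachable")
    return {"contract_id": contract.contract_id, "status": "pushed"}
```

Keeping the contract immutable means the platform's projection and the device's copy cannot silently drift apart while the device is offline.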
Three Device States
| State | Meaning | Platform Behavior |
|---|---|---|
| Live | Connected, real-time telemetry | Platform sees truth directly |
| Projected | Offline, no telemetry | Platform estimates based on the contract scope |
| Reconciled | Device reconnects, reports actuals | Actual state corrects the projection, triggers replanning if needed |
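The three states can be read as a small state machine plus a comparison step at reconciliation. The sketch below assumes event names (`disconnect`, `reconnect`, `synced`) that do not appear in the source, and it assumes a device returns to Live once its actuals have been applied, which the table implies but does not state.

```python
from enum import Enum


class DeviceState(Enum):
    LIVE = "live"
    PROJECTED = "projected"
    RECONCILED = "reconciled"


# (current state, event) -> next state. Event names are illustrative.
_TRANSITIONS = {
    (DeviceState.LIVE, "disconnect"): DeviceState.PROJECTED,
    (DeviceState.PROJECTED, "reconnect"): DeviceState.RECONCILED,
    (DeviceState.RECONCILED, "synced"): DeviceState.LIVE,  # assumed transition
}


def next_state(state, event):
    """Apply `event` to `state`, rejecting transitions the table does not allow."""
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


def divergence(projected, actual):
    """Return {field: (projected, actual)} for every reported field that differs."""
    return {
        key: (projected.get(key), value)
        for key, value in actual.items()
        if projected.get(key) != value
    }
```

A non-empty result from `divergence` is the kind of signal that would prompt replanning.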
Two-Tier Planning
| Tier | Location | Scope | Nature |
|---|---|---|---|
| Platform Planner | Cloud (FlexGalaxy.AI) | Global, multi-resource | Strategic: “which resources handle which tasks” |
| Edge Planner | Local (robot/PDA) | Single-agent, local | Tactical, reactive: “obstacle ahead, reroute” |
These are fundamentally different planners solving different problems. They do not share logic.
Partial Connectivity Replanning
When some devices go offline:
Connected devices → replan freely, reassign tasks as needed
Offline devices → hold their contracted tasks as projected state, do not reassign
On reconnection → reconcile actual state, trigger global replanning if divergence detected
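The first two rules above can be sketched as a partition of current assignments: tasks on offline devices are held untouched, and only tasks on connected devices are eligible for reassignment. The function name and the task-to-device mapping are assumptions, and the round-robin distribution is a deliberately naive placeholder for whatever the Planner and Scheduler would actually decide.

```python
def replan(assignments, connected):
    """Split assignments into held and freely replanned tasks.

    assignments: {task_id: device_id}
    connected:   set of device_ids currently online
    Returns (held, reassigned), both {task_id: device_id}.
    """
    # Offline devices keep their contracted tasks as projected state.
    held = {t: d for t, d in assignments.items() if d not in connected}

    # Tasks on connected devices may move to any connected device.
    free_tasks = sorted(t for t, d in assignments.items() if d in connected)
    workers = sorted(connected)
    reassigned = {}
    for i, task in enumerate(free_tasks):
        reassigned[task] = workers[i % len(workers)]  # placeholder strategy
    return held, reassigned
```

Note that a held task is never handed to another device here, even if spare capacity exists, which avoids two devices executing the same contract when the offline one is still working.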
Contract Failure Recovery
Failure recovery is handled by the Policy Service, not hardcoded:
1. The application defines failure recovery policies (via developer presets or user-defined policies).
2. The Execution Manager detects a contract failure.
3. It consults the Policy Service for the applicable strategy.
4. It executes the recovery: retry, reassign, escalate, hold, or abort.
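The lookup-then-dispatch flow described above might look roughly like this. The five strategies come from the text; everything else is assumed for illustration, including the in-memory `PolicyService` class, its `recovery_for` method, the `(app, failure_kind)` rule key, and the choice of `ESCALATE` as the fallback when no rule matches.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


class PolicyService:
    """In-memory stand-in for the Policy Service (hypothetical interface)."""

    def __init__(self, rules, default=Recovery.ESCALATE):
        self._rules = dict(rules)   # {(app, failure_kind): Recovery}
        self._default = default     # assumed fallback, not specified by the source

    def recovery_for(self, app, failure_kind):
        return self._rules.get((app, failure_kind), self._default)


def handle_contract_failure(policy_service, app, failure_kind, handlers):
    """Look up the strategy for this failure and run the matching handler.

    handlers: {Recovery: zero-argument callable}
    """
    strategy = policy_service.recovery_for(app, failure_kind)
    return handlers[strategy]()
```

Because the strategy comes from data rather than from branches in the Execution Manager, an application can change its recovery behaviour by publishing a new policy.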
Application Validation
WES: Warehouse Execution Management System
Order-driven, high-complexity planning, complex edge contracts with multi-step decision points.
Platform services consumed:
| Service | Usage |
|---|---|
| DeviceAdmin | Enroll AMRs, MHEs, PDAs; monitor device alerts and OTA status |
| ThingIO | Fleet dashboards, order throughput analytics, zone utilization widgets |
| OTAForge | Policy-driven firmware and map distribution to warehouse robots |
| DotID/StarGate | Authenticate robots, operators, upstream systems (OMS/WMS) |
| Equator | Warehouse maps, zones, racks, charging stations |
| Marie | Proximity queries, path clearance, zone-based resource lookup |
| Planner | Decompose orders into pick/move/place task sequences |
| Scheduler | One-time task assignment based on resource availability |
| Execution Manager | Orchestrate and monitor multi-step task execution |
| Policy Service | Failure recovery, planning preferences, scheduling priorities |
| AI Policy Agent | Operators define rules conversationally using warehouse domain schema |
Architecture insights surfaced:
Execution contract model
Edge/cloud reconciliation pattern
Policy-driven failure recovery
Application-defined policy with platform execution
ClearJanitor: Commercial Cleaning Robot Management
Schedule-driven, simpler edge contracts, multi-tenant ownership model.
Ownership vs Operation:
Model A: Leasing Company → Contractor → End User (ClearJanitor user = Contractor)
Model B: End User buys/rents directly (ClearJanitor user = End User)
ClearJanitor serves whoever operates the robots, regardless of ownership.
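The leasing chain in Model A can be pictured as a tree of accounts in which visibility flows downward: an account sees its own robots plus those of every account beneath it. This is only a sketch of the idea. The `Account` class and `visible_robots` function are hypothetical, robots are attached to the operating account for simplicity, and the real Org Service and IAM Identity Center model is not specified in this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """One node in the organization hierarchy (assumed shape)."""
    name: str
    robots: list = field(default_factory=list)
    children: list = field(default_factory=list)


def visible_robots(account):
    """Robots in this account and in every account beneath it (read-only view)."""
    found = list(account.robots)
    for child in account.children:
        found.extend(visible_robots(child))
    return found


# Model A: Leasing Company -> Contractor -> End User, with made-up robot IDs.
end_user = Account("EndUser", robots=["bot-3"])
contractor_a = Account("ContractorA", robots=["bot-1"], children=[end_user])
contractor_b = Account("ContractorB", robots=["bot-2"])
leasing_co = Account("LeasingCo", children=[contractor_a, contractor_b])
```

Here the leasing company sees all three robots, while each contractor sees only its own branch.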
Platform services consumed:
| Service | Usage |
|---|---|
| DeviceAdmin | Enroll cleaning robots, track lease status, monitor alerts and OTA status |
| ThingIO | Cleaning coverage heatmaps, battery analytics, robot status dashboards |
| OTAForge | Policy-driven map and firmware distribution to cleaning fleet |
| DotID/StarGate | Authenticate operators, distinguish contractor vs end-user roles |
| Org Service | Model leasing company → contractor → end-user hierarchy |
| IAM Identity Center | Leasing company gets read-only visibility across contractors |
| Equator | Building floor plans |
| Marie | Coverage queries, “which floors haven’t been cleaned today?” |
| Planner | Generate cleaning routes/sequences |
| Scheduler | Recurring cleaning jobs (nightly, weekly, conditional) |
| Execution Manager | Monitor robots during cleaning, handle stuck/error states |
| Policy Service | Battery thresholds, time-of-day restrictions, zone rules |
| AI Policy Agent | Operators define cleaning policies conversationally |
Architecture insights surfaced:
Multi-tenant org hierarchy (asset owner ≠ operator)
Recurring/cron-style scheduling need
Simpler edge contracts (single-path cleaning vs multi-step warehouse picks)
Same execution contract model, different payload complexity
Cross-App Architecture Validation Summary
| Architecture Element | Validated By |
|---|---|
| Execution contract model | WES (complex), ClearJanitor (simple) |
| Policy Service with domain schemas | Both: different domains, same mechanism |
| Edge/cloud reconciliation | Both: different complexity levels |
| Org Service multi-tenancy | ClearJanitor (leasing chain) |
| One-time scheduling | WES (order-driven) |
| Recurring scheduling | ClearJanitor (schedule-driven) |
| AI Policy Agent reusability | Both: domain schema makes it domain-agnostic |
| No privileged access principle | Both: same APIs as third-party developers |
| Supply chain OTA + ownership transfer | Supply Chain OTA and Ownership (cross-account rollout, multi-tier scoping) |