FlexGalaxy.AI Platform Architecture

Overview

FlexGalaxy.AI is an Application Platform as a Service (APaaS) for building robotic and agentic applications. Developers build applications on top of the platform’s horizontal services. Syrius Robotics is the first developer, dogfooding the platform by building its own products on the same APIs and interfaces available to third-party developers.

Core Principle

No privileged access. First-party applications (WES, ClearJanitor, SiteView, OpsFlow) use the same public APIs, the same authentication (DotID/OIDC), and the same constraints that any third-party developer would encounter.

Platform Service Map

Data Plane: The Physical World Interface

Service	Responsibility
DeviceAdmin	Device enrollment, alert monitoring, and OTA status visibility. As the unified device management interface, it wraps ThingsBoard for connectivity and alerts and reads hawkBit for update status. Developers never interact with ThingsBoard directly.
ThingIO	IoT data processing, analytics, and dashboard visualization. Built on Apache StreamPipes. Applications register custom dashboards, widgets, and data pipelines scoped to their enrolled devices.
OTAForge	OTA artifact management and policy-driven rollout automation. Built on Eclipse hawkBit. Provides CI/CD access keys for artifact publishing, policy-driven rollouts (it polls DeviceAdmin, evaluates policies, and creates rollouts automatically), a proxied DDI endpoint, and a distributor portal.

Identity Plane: Who and What Has Access

Service	Responsibility
DotID	Identity and access management, modeled after AWS IAM and Organizations. Built on Keycloak. The umbrella for all identity concerns.
Org Service	Account hierarchy and organization structure. Manages multi-tenant relationships — organizations, member accounts, sites, and resources across a three-tier model.
IAM Identity Center	Cross-account access and SSO. Enables visibility and permission sharing across organizational boundaries (e.g., leasing company sees assets across multiple contractors).
StarGate	Internal web application for administering the Identity Plane. Named in the spirit of AWS Gandalf — the gatekeeper.

DotID Subsystem Relationship:

DotID (umbrella)
├── Org Service (who belongs where)
├── IAM Identity Center (who can access what, across accounts)
└── StarGate (admin UI for managing it all)

Spatial Plane: Where Things Are

Service	Responsibility
Equator	Spatial data management. Source of truth for maps, floor plans, zones, coordinates, racks, charging stations, landmarks, geofencing boundaries.
Marie	Spatial query engine. Named after Marie Tharp, the oceanographic cartographer. Uses Equator’s map data to fulfill queries from applications — proximity searches, coverage analysis, path clearance, zone-based lookups.

Relationship: Equator is the data; Marie is the intelligence on top of it. Developers interact with Marie, while Equator manages the underlying spatial model.
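
The division of labor between Equator and Marie can be sketched as follows. This is a minimal illustration, not the platform's API: `EquatorStore`, `Marie.nearest`, and the `Landmark` fields are hypothetical names, and the spatial data is held in memory instead of behind a service.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    """A spatial entity held by the (hypothetical) Equator store."""
    name: str
    kind: str   # e.g. "charging_station" or "rack"
    x: float
    y: float


class EquatorStore:
    """Stand-in for Equator: owns the spatial data and nothing else."""

    def __init__(self, landmarks):
        self._landmarks = list(landmarks)

    def landmarks(self, kind):
        return [lm for lm in self._landmarks if lm.kind == kind]


class Marie:
    """Stand-in for Marie: answers queries using Equator's data."""

    def __init__(self, store):
        self._store = store

    def nearest(self, kind, x, y):
        """Return the closest landmark of the given kind, or None."""
        candidates = self._store.landmarks(kind)
        if not candidates:
            return None
        return min(candidates, key=lambda lm: math.hypot(lm.x - x, lm.y - y))


store = EquatorStore([
    Landmark("charger-A", "charging_station", 0.0, 0.0),
    Landmark("charger-B", "charging_station", 10.0, 0.0),
    Landmark("rack-1", "rack", 1.0, 1.0),
])
marie = Marie(store)
print(marie.nearest("charging_station", 8.0, 1.0).name)  # charger-B
```

The application only ever calls the query object; the store could change its internal representation without affecting callers.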

Intelligence Plane: What To Do, When, and How

Service	Responsibility
Planner	Strategic, multi-resource, order-level task decomposition and goal reasoning. Operates at the platform/cloud level with a global view of all resources.
Scheduler	Resource allocation and time optimization. Supports one-time scheduling (order-driven, e.g., WES), recurring/cron-style scheduling (schedule-driven, e.g., ClearJanitor), and conditional scheduling (event-driven, future).
Execution Manager	Orchestrates and monitors task execution. Manages the execution contract lifecycle — pushing contracts to edge devices, maintaining projected state during offline periods, reconciling actual state upon reconnection.

Governance Plane: Rules That Guide All Decisions

Service	Responsibility
Policy Service	Stores, serves, and enforces policies. Every decision-making service in the Intelligence Plane consults the Policy Service before acting.
Policy Validator	Conflict detection, safety checks, deadlock prevention, coverage gap analysis. Mandatory validation before any policy goes live.
AI Policy Agent	Conversational policy creation. A platform service that uses app-provided Domain Policy Schemas to understand domain context, enabling it to translate user intent into formal policies through natural dialogue.

Policy Hierarchy (enforcement order):

Platform defaults (safety, physics, hard limits) — cannot be overridden
    ↓
App-defined policies (developer presets) — can be customized within bounds
    ↓
User-defined policies (via AI Policy Agent) — validated before activation
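
One way to read the hierarchy above is as a resolution function for a single numeric setting, such as a maximum robot speed. The sketch below is an assumption about how the tiers could compose, not the Policy Service's actual logic: `Bound`, `resolve`, and the example values are all hypothetical, and the user value is assumed to have passed validation already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Inclusive range within which an app preset may be customized."""
    low: float
    high: float

    def clamp(self, value):
        return max(self.low, min(self.high, value))


def resolve(platform_limit, app_preset, app_bound, user_value=None):
    """Resolve one setting through the three tiers.

    platform_limit: hard ceiling from platform defaults, never exceeded
    app_preset:     developer default, used when the user sets nothing
    app_bound:      range the developer allows for customization
    user_value:     optional user-defined value (assumed already validated)
    """
    chosen = app_preset if user_value is None else app_bound.clamp(user_value)
    return min(chosen, platform_limit)


speed_bound = Bound(low=0.5, high=1.8)
print(resolve(2.0, 1.2, speed_bound))       # 1.2 (app preset)
print(resolve(2.0, 1.2, speed_bound, 2.5))  # 1.8 (user value clamped to app bound)
print(resolve(1.5, 1.2, speed_bound, 2.5))  # 1.5 (platform limit wins)
```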

AI Policy Agent Architecture:

The Agent is domain-agnostic at its core. Each application registers a Domain Policy Schema that provides the domain knowledge the Agent needs:

Platform provides:              App injects:
├─ Conversational engine        ├─ Domain ontology (entities, actions)
├─ Policy syntax understanding  ├─ Domain constraints
├─ Validation logic             ├─ Available strategies
└─ Common concepts              └─ Example policies / best practices
    (devices, zones, time)
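
The split between what the platform provides and what an app injects could be represented with a small data structure. The following is a sketch only: the document does not define the schema format, so `DomainPolicySchema`, its field names, `vocabulary`, and the ClearJanitor values are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class DomainPolicySchema:
    """Hypothetical shape of what an app registers with the AI Policy Agent."""
    app: str
    entities: list                                   # domain ontology: entities
    actions: list                                    # domain ontology: actions
    constraints: list = field(default_factory=list)  # domain constraints
    strategies: list = field(default_factory=list)   # available strategies
    examples: list = field(default_factory=list)     # example policies


# Common concepts the platform supplies to every app.
PLATFORM_CONCEPTS = {"device", "zone", "time"}


def vocabulary(schema):
    """Terms the Agent could recognize: platform concepts plus domain entities."""
    return PLATFORM_CONCEPTS | set(schema.entities)


cleaning = DomainPolicySchema(
    app="ClearJanitor",
    entities=["robot", "floor"],
    actions=["clean", "return_to_dock"],
    constraints=["battery must stay above a threshold"],
    strategies=["nightly"],
    examples=["Clean the lobby every night"],
)
print(sorted(vocabulary(cleaning)))
# ['device', 'floor', 'robot', 'time', 'zone']
```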

The Agent operates in two modes:

Creation mode — help users set up new policies through conversation
Audit mode — explain current policies, simulate “what would happen if…” scenarios

Ecosystem Plane: Distribution

Service	Responsibility
Marketplace	Application and component distribution. Where developers publish apps and reusable components.

IoT Layer: Technology Stack

The IoT Layer defines the open-source platforms underneath the Data Plane services. Each Data Plane service wraps an IoT platform, providing FlexGalaxy API conventions, account scoping, and access control.

Data Plane Service	Wraps	License
DeviceAdmin	ThingsBoard CE — device registry, multi-protocol connectivity (MQTT, CoAP, LwM2M, HTTP), telemetry ingestion, rule engine alerts	Apache 2.0
ThingIO	Apache StreamPipes — stream analytics, data pipelines, ML inference, visual pipeline builder	Apache 2.0
OTAForge	Eclipse hawkBit + Hara — OTA updates, artifact delivery, policy-driven rollouts, proxied DDI	EPL 2.0

Integration pattern:

                DeviceAdmin                    ThingIO
                (enrollment,                   (APIs + React SDK)
                 alerts)
                    │                              │
                    │ delegates to                  ▼
                    │ Provisioning          Kafka         Kafka
                    │ Service              Cluster A     Cluster B
                    ▼                         │             │
Devices ──MQTT──► ThingsBoard ──Rule Engine──►│  replicate  │──► StreamPipes
                       │                      └──────►──────┘
                  Telemetry DB

                 OTAForge
                 (artifacts, policies,
                  proxied DDI)
                    │
                    │ DDI proxy
                    ▼
Devices ──DDI──► OTAForge ──► hawkBit
                 (policy-driven rollouts,
                  polls DeviceAdmin)

DeviceAdmin delegates device provisioning to the Provisioning Service, which orchestrates ThingsBoard, hawkBit, and DotID. CI/CD pipelines publish artifacts to OTAForge via access keys (AK/SK); rollouts are created automatically by policy evaluation. Devices poll OTAForge’s proxied DDI endpoint, not hawkBit directly. For how these services support multi-party supply chains with cross-account OTA and ownership transfer, see the Supply Chain OTA and Ownership pattern.
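
The "poll, evaluate, create rollouts" step can be sketched as a pure function over device status and policies. This is an illustrative assumption, not OTAForge's implementation: `DeviceStatus`, `RolloutPolicy`, and `plan_rollouts` are hypothetical names, and a real policy would likely carry more conditions than a target version per device model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceStatus:
    """Hypothetical status record as it might be read from DeviceAdmin."""
    device_id: str
    model: str
    installed_version: str


@dataclass(frozen=True)
class RolloutPolicy:
    """Hypothetical policy: devices of `model` should run `target_version`."""
    model: str
    target_version: str


def plan_rollouts(devices, policies):
    """Group devices that need an update by (model, target_version)."""
    targets = {p.model: p.target_version for p in policies}
    plan = {}
    for device in devices:
        target = targets.get(device.model)
        # Skip devices with no matching policy or already on the target.
        if target is not None and device.installed_version != target:
            plan.setdefault((device.model, target), []).append(device.device_id)
    return plan


devices = [
    DeviceStatus("amr-1", "amr", "1.0.0"),
    DeviceStatus("amr-2", "amr", "1.1.0"),
    DeviceStatus("pda-1", "pda", "3.2.0"),
]
policies = [RolloutPolicy(model="amr", target_version="1.1.0")]
print(plan_rollouts(devices, policies))  # {('amr', '1.1.0'): ['amr-1']}
```

Each key in the returned plan would correspond to one rollout to create; the actual creation call is left out because it depends on the OTA backend.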

Execution Contract Model

When the platform assigns a task to an edge device (robot, PDA), it pushes an execution contract: the task definition, scoped actions, and required data (e.g., a map from Equator). Connectivity is confirmed at the moment the contract is pushed.
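
A contract and the connectivity check at push time might look like the sketch below. The field names, `push_contract`, `DeviceOfflineError`, and the injected `is_connected` and `send` callables are hypothetical; the document does not specify the contract format or transport.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExecutionContract:
    """Hypothetical contract payload: task, allowed actions, required data."""
    task_id: str
    task_definition: str
    scoped_actions: tuple
    required_data: dict = field(default_factory=dict)


class DeviceOfflineError(Exception):
    """Raised when a push is attempted without confirmed connectivity."""


def push_contract(contract, device_id, is_connected, send):
    """Push a contract only after connectivity to the device is confirmed."""
    if not is_connected(device_id):
        raise DeviceOfflineError(device_id)
    send(device_id, contract)
    return contract.task_id


sent = []
contract = ExecutionContract(
    task_id="task-42",
    task_definition="clean floor 3",
    scoped_actions=("navigate", "clean", "return_to_dock"),
    required_data={"map": "floor-3-plan"},
)
push_contract(
    contract,
    "robot-7",
    is_connected=lambda device_id: True,
    send=lambda device_id, c: sent.append((device_id, c.task_id)),
)
print(sent)  # [('robot-7', 'task-42')]
```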

Three Device States

State	Meaning	Platform Behavior
Live	Connected, real-time telemetry	Platform sees truth directly
Projected	Offline, no telemetry	Platform estimates based on the contract scope
Reconciled	Device reconnects, reports actuals	Actual state corrects the projection, triggers replanning if needed
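
The three states can be read as a small state machine. The sketch below is one possible interpretation: the event names are hypothetical, and the transitions out of Reconciled (back to Live once actuals are applied, or to Projected if the device drops again) are assumptions the table does not state.

```python
from enum import Enum


class DeviceState(Enum):
    LIVE = "live"
    PROJECTED = "projected"
    RECONCILED = "reconciled"


# (current state, event) -> next state
TRANSITIONS = {
    (DeviceState.LIVE, "disconnect"): DeviceState.PROJECTED,
    (DeviceState.PROJECTED, "reconnect"): DeviceState.RECONCILED,
    # Assumption: once actuals are applied, the device is Live again.
    (DeviceState.RECONCILED, "synced"): DeviceState.LIVE,
    # Assumption: a device can drop offline again mid-reconciliation.
    (DeviceState.RECONCILED, "disconnect"): DeviceState.PROJECTED,
}


def next_state(state, event):
    """Return the next state, or raise ValueError for an invalid event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


state = DeviceState.LIVE
for event in ("disconnect", "reconnect", "synced"):
    state = next_state(state, event)
    print(event, "->", state.name)
# disconnect -> PROJECTED
# reconnect -> RECONCILED
# synced -> LIVE
```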

Two-Tier Planning

Tier	Location	Scope	Nature
Platform Planner	Cloud (FlexGalaxy.AI)	Global, multi-resource	Strategic — “which resources handle which tasks”
Edge Planner	Local (robot/PDA)	Single-agent, local	Tactical, reactive — “obstacle ahead, reroute”

These are fundamentally different planners solving different problems. They do not share logic.

Partial Connectivity Replanning

When some devices go offline:

Connected devices → replan freely, reassign tasks as needed
Offline devices → hold their contracted tasks as projected state, do not reassign
On reconnection → reconcile actual state, trigger global replanning if divergence detected
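
The three rules above amount to partitioning current assignments by connectivity and comparing projected with actual state on reconnection. The sketch below assumes assignments are a simple task-to-device mapping and that divergence is plain inequality; `partition_assignments` and `diverged` are hypothetical helpers, not platform functions.

```python
def partition_assignments(assignments, online_devices):
    """Split task -> device assignments into reassignable and held.

    Tasks on connected devices may be replanned freely; tasks contracted to
    offline devices are held as projected state and are not reassigned.
    """
    reassignable, held = {}, {}
    for task_id, device_id in assignments.items():
        bucket = reassignable if device_id in online_devices else held
        bucket[task_id] = device_id
    return reassignable, held


def diverged(projected, actual):
    """On reconnection, report whether actual state differs from projection."""
    return projected != actual


assignments = {"t1": "robot-1", "t2": "robot-2", "t3": "robot-1"}
reassignable, held = partition_assignments(assignments, online_devices={"robot-1"})
print(reassignable)  # {'t1': 'robot-1', 't3': 'robot-1'}
print(held)          # {'t2': 'robot-2'}

# robot-2 reconnects and reports that t2 failed, contrary to the projection.
print(diverged({"t2": "done"}, {"t2": "failed"}))  # True, so replan globally
```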

Contract Failure Recovery

Failure recovery is handled by the Policy Service, not hardcoded:

The application defines failure recovery policies (via developer presets or user-defined policies).
The Execution Manager detects a contract failure.
The Execution Manager consults the Policy Service for the applicable strategy.
The Execution Manager executes the recovery: retry, reassign, escalate, hold, or abort.
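
The lookup-then-dispatch flow can be sketched as below. The five strategies come from the text; everything else is an assumption for illustration, including the `PolicyService` class shape, keying rules by `(app, failure_kind)`, the failure kinds, and falling back to escalation when no policy matches.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


class PolicyService:
    """Hypothetical in-memory stand-in for the Policy Service."""

    def __init__(self, rules, default=Recovery.ESCALATE):
        self._rules = dict(rules)
        # Assumption: escalate to a human when no policy applies.
        self._default = default

    def recovery_for(self, app, failure_kind):
        return self._rules.get((app, failure_kind), self._default)


def handle_contract_failure(policy_service, app, failure_kind, handlers):
    """Consult the policy for the strategy, then run its handler."""
    strategy = policy_service.recovery_for(app, failure_kind)
    return handlers[strategy]()


policy_service = PolicyService({
    ("WES", "pick_failed"): Recovery.REASSIGN,
    ("ClearJanitor", "robot_stuck"): Recovery.RETRY,
})
# One trivial handler per strategy; real handlers would act on the fleet.
handlers = {s: (lambda s=s: f"executed {s.value}") for s in Recovery}

print(handle_contract_failure(policy_service, "WES", "pick_failed", handlers))
# executed reassign
print(handle_contract_failure(policy_service, "WES", "unknown", handlers))
# executed escalate
```

Because the strategy is data, changing recovery behavior for an application means changing a policy, not redeploying the Execution Manager.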

Application Validation

WES: Warehouse Execution Management System

Order-driven, high-complexity planning, complex edge contracts with multi-step decision points.

Platform services consumed:

Service	Usage
DeviceAdmin	Enroll AMRs, MHEs, PDAs; monitor device alerts and OTA status
ThingIO	Fleet dashboards, order throughput analytics, zone utilization widgets
OTAForge	Policy-driven firmware and map distribution to warehouse robots
DotID/StarGate	Authenticate robots, operators, upstream systems (OMS/WMS)
Equator	Warehouse maps, zones, racks, charging stations
Marie	Proximity queries, path clearance, zone-based resource lookup
Planner	Decompose orders into pick/move/place task sequences
Scheduler	One-time task assignment based on resource availability
Execution Manager	Orchestrate and monitor multi-step task execution
Policy Service	Failure recovery, planning preferences, scheduling priorities
AI Policy Agent	Operators define rules conversationally using warehouse domain schema

Architecture insights surfaced:

Execution contract model
Edge/cloud reconciliation pattern
Policy-driven failure recovery
Application-defined policy with platform execution

ClearJanitor: Commercial Cleaning Robot Management

Schedule-driven, simpler edge contracts, multi-tenant ownership model.

Ownership vs Operation:

Model A: Leasing Company → Contractor → End User (ClearJanitor user = Contractor)
Model B: End User buys/rents directly (ClearJanitor user = End User)

ClearJanitor serves whoever operates the robots, regardless of ownership.

Platform services consumed:

Service	Usage
DeviceAdmin	Enroll cleaning robots, track lease status, monitor alerts and OTA status
ThingIO	Cleaning coverage heatmaps, battery analytics, robot status dashboards
OTAForge	Policy-driven map and firmware distribution to cleaning fleet
DotID/StarGate	Authenticate operators, distinguish contractor vs end-user roles
Org Service	Model leasing company → contractor → end-user hierarchy
IAM Identity Center	Leasing company gets read-only visibility across contractors
Equator	Building floor plans
Marie	Coverage queries, “which floors haven’t been cleaned today?”
Planner	Generate cleaning routes/sequences
Scheduler	Recurring cleaning jobs (nightly, weekly, conditional)
Execution Manager	Monitor robots during cleaning, handle stuck/error states
Policy Service	Battery thresholds, time-of-day restrictions, zone rules
AI Policy Agent	Operators define cleaning policies conversationally

Architecture insights surfaced:

Multi-tenant org hierarchy (asset owner ≠ operator)
Recurring/cron-style scheduling need
Simpler edge contracts (single-path cleaning vs multi-step warehouse picks)
Same execution contract model, different payload complexity

Cross-App Architecture Validation Summary

Architecture Element	Validated By
Execution contract model	WES (complex), ClearJanitor (simple)
Policy Service with domain schemas	Both — different domains, same mechanism
Edge/cloud reconciliation	Both — different complexity levels
Org Service multi-tenancy	ClearJanitor (leasing chain)
One-time scheduling	WES (order-driven)
Recurring scheduling	ClearJanitor (schedule-driven)
AI Policy Agent reusability	Both — domain schema makes it domain-agnostic
No privileged access principle	Both — same APIs as third-party developers
Supply chain OTA + ownership transfer	Supply Chain OTA and Ownership (cross-account rollout, multi-tier scoping)