FlexGalaxy.AI Platform Architecture

Overview

FlexGalaxy.AI is an Application Platform as a Service (APaaS) for building robotic and agentic applications. Developers build applications on top of the platform’s horizontal services. Syrius Robotics is the first developer, dogfooding the platform by building its own products on the same APIs and interfaces available to third-party developers.

Core Principle

No privileged access. First-party applications (WES, ClearJanitor, SiteView, OpsFlow) use the same public APIs, the same authentication (DotID/OIDC), and the same constraints that any third-party developer would encounter.


Platform Service Map

Data Plane — The Physical World Interface

| Service | Responsibility |
|---|---|
| DeviceAdmin | Device enrollment, alert monitoring, and OTA status visibility. The unified device management interface: it wraps ThingsBoard for connectivity and alerts and reads hawkBit for update status. Developers never interact with ThingsBoard directly. |
| ThingIO | IoT data processing, analytics, and dashboard visualization. Built on Apache StreamPipes. Applications register custom dashboards, widgets, and data pipelines scoped to their enrolled devices. |
| OTAForge | OTA artifact management and policy-driven rollout automation. Built on Eclipse hawkBit. Provides CI/CD access keys for artifact publishing, policy-driven rollouts (polls DeviceAdmin, evaluates policies, auto-creates rollouts), a proxied DDI endpoint, and a distributor portal. |

Identity Plane — Who and What Has Access

| Service | Responsibility |
|---|---|
| DotID | Identity and access management, modeled after AWS IAM and Organizations. Built on Keycloak. The umbrella for all identity concerns. |
| Org Service | Account hierarchy and organization structure. Manages multi-tenant relationships (organizations, member accounts, sites, and resources) across a three-tier model. |
| IAM Identity Center | Cross-account access and SSO. Enables visibility and permission sharing across organizational boundaries (e.g., leasing company sees assets across multiple contractors). |
| StarGate | Internal web application for administering the Identity Plane. Named in the spirit of AWS Gandalf, the gatekeeper. |

DotID Subsystem Relationship:

DotID (umbrella)
├── Org Service (who belongs where)
├── IAM Identity Center (who can access what, across accounts)
└── StarGate (admin UI for managing it all)
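
The split between the Org Service (who belongs where) and IAM Identity Center (who can see what across accounts) can be illustrated with a minimal sketch. Everything here is hypothetical: `Account`, `IdentityCenter`, `grant_read`, and `visible_devices` are illustrative names, not part of the DotID API, and the flat account list stands in for the real three-tier hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A member account that owns devices (hypothetical model)."""
    name: str
    devices: set = field(default_factory=set)


@dataclass
class IdentityCenter:
    """Holds read-only grants as (viewer, target) account-name pairs (assumed shape)."""
    grants: set = field(default_factory=set)

    def grant_read(self, viewer: Account, target: Account) -> None:
        self.grants.add((viewer.name, target.name))

    def visible_devices(self, viewer: Account, accounts: list) -> set:
        # A viewer always sees its own devices, plus the devices of any
        # account that has been explicitly shared with it.
        visible = set(viewer.devices)
        for account in accounts:
            if (viewer.name, account.name) in self.grants:
                visible |= account.devices
        return visible


leasing = Account("leasing-co")
contractor_a = Account("contractor-a", {"robot-1", "robot-2"})
contractor_b = Account("contractor-b", {"robot-3"})
all_accounts = [leasing, contractor_a, contractor_b]

center = IdentityCenter()
center.grant_read(leasing, contractor_a)
center.grant_read(leasing, contractor_b)

print(sorted(center.visible_devices(leasing, all_accounts)))
print(sorted(center.visible_devices(contractor_a, all_accounts)))
```

In this sketch the leasing company sees the robots of both contractors, while each contractor sees only its own, which mirrors the leasing example given for IAM Identity Center.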

Spatial Plane — Where Things Are

| Service | Responsibility |
|---|---|
| Equator | Spatial data management. Source of truth for maps, floor plans, zones, coordinates, racks, charging stations, landmarks, geofencing boundaries. |
| Marie | Spatial query engine. Named after Marie Tharp, the oceanographic cartographer. Uses Equator’s map data to fulfill queries from applications: proximity searches, coverage analysis, path clearance, zone-based lookups. |

Relationship: Equator is the data, Marie is the intelligence on top of it. Developers interact with Marie; Equator manages the underlying spatial model.
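
To make the data/query split concrete, here is a minimal sketch of a proximity search over map data. It assumes landmarks are stored as points in a flat local coordinate frame; the `Landmark` type, the `nearest` function, and the sample data are hypothetical and do not reflect Marie's actual API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    """A point feature as Equator might store it (assumed shape)."""
    name: str
    kind: str   # e.g. "charging_station", "rack"
    x: float    # metres in the map's local frame (assumption)
    y: float


def nearest(landmarks, kind, x, y):
    """Return the closest landmark of the given kind, or None if none exists."""
    candidates = [lm for lm in landmarks if lm.kind == kind]
    if not candidates:
        return None
    return min(candidates, key=lambda lm: math.hypot(lm.x - x, lm.y - y))


# Stand-in for map data owned by Equator.
equator_data = [
    Landmark("charger-A", "charging_station", 0.0, 0.0),
    Landmark("charger-B", "charging_station", 10.0, 0.0),
    Landmark("rack-7", "rack", 2.0, 1.0),
]

# "Which charging station is closest to a robot at (8, 1)?"
print(nearest(equator_data, "charging_station", 8.0, 1.0).name)  # charger-B
```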

Intelligence Plane — What To Do, When, and How

| Service | Responsibility |
|---|---|
| Planner | Strategic, multi-resource, order-level task decomposition and goal reasoning. Operates at the platform/cloud level with a global view of all resources. |
| Scheduler | Resource allocation and time optimization. Supports one-time scheduling (order-driven, e.g., WES), recurring/cron-style scheduling (schedule-driven, e.g., ClearJanitor), and conditional scheduling (event-driven, future). |
| Execution Manager | Orchestrates and monitors task execution. Manages the execution contract lifecycle: pushing contracts to edge devices, maintaining projected state during offline periods, reconciling actual state upon reconnection. |
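
The difference between one-time and recurring scheduling can be sketched as two trigger types that answer the same question: when does this job run next? The class names and the fixed-interval model are assumptions for illustration; the source does not specify how the Scheduler represents schedules, and a real cron-style schedule would be richer than a fixed interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class OneTime:
    """Order-driven: fires once at a fixed time."""
    run_at: datetime

    def next_run(self, now: datetime) -> Optional[datetime]:
        return self.run_at if self.run_at > now else None


@dataclass(frozen=True)
class Recurring:
    """Schedule-driven: fires at first_run and then every interval."""
    first_run: datetime
    interval: timedelta

    def next_run(self, now: datetime) -> Optional[datetime]:
        if now < self.first_run:
            return self.first_run
        # Whole intervals already elapsed, plus one to land strictly after now.
        periods = (now - self.first_run) // self.interval + 1
        return self.first_run + periods * self.interval


now = datetime(2024, 1, 3, 9, 0)
nightly = Recurring(first_run=datetime(2024, 1, 1, 22, 0), interval=timedelta(days=1))
pick_order = OneTime(run_at=datetime(2024, 1, 2, 12, 0))

print(nightly.next_run(now))     # 2024-01-03 22:00:00
print(pick_order.next_run(now))  # None, the one-time job is already in the past
```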

Governance Plane — Rules That Guide All Decisions

| Service | Responsibility |
|---|---|
| Policy Service | Stores, serves, and enforces policies. Every decision-making service in the Intelligence Plane consults the Policy Service before acting. |
| Policy Validator | Conflict detection, safety checks, deadlock prevention, coverage gap analysis. Mandatory validation before any policy goes live. |
| AI Policy Agent | Conversational policy creation. A platform service that uses app-provided Domain Policy Schemas to understand domain context, enabling it to translate user intent into formal policies through natural dialogue. |

Policy Hierarchy (enforcement order):

Platform defaults (safety, physics, hard limits) — cannot be overridden
    ↓
App-defined policies (developer presets) — can be customized within bounds
    ↓
User-defined policies (via AI Policy Agent) — validated before activation
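
One way to read this hierarchy is as a resolution rule for a single numeric setting, such as a maximum robot speed. The sketch below is an assumption about how the three layers could combine, not the Policy Service's actual algorithm: `AppPolicy`, `effective_value`, and the choice to reject out-of-bounds user values (rather than clamp them) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppPolicy:
    """Developer preset with the bounds users may customize within."""
    default: float
    lower: float
    upper: float


def effective_value(platform_limit: float, app: AppPolicy,
                    user_value: Optional[float]) -> float:
    """Resolve one setting across the three policy layers."""
    # User-defined value, if any, otherwise the app's preset.
    value = app.default if user_value is None else user_value
    # Validation step: user customization must stay within the app's bounds.
    if not (app.lower <= value <= app.upper):
        raise ValueError(f"{value} is outside app bounds [{app.lower}, {app.upper}]")
    # Platform default is a hard limit and always wins.
    return min(value, platform_limit)


preset = AppPolicy(default=1.2, lower=0.5, upper=2.5)

print(effective_value(2.0, preset, None))  # 1.2, app preset applies
print(effective_value(2.0, preset, 1.5))   # 1.5, user value within bounds
print(effective_value(2.0, preset, 2.4))   # 2.0, capped by the platform limit
```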

AI Policy Agent Architecture:

The Agent is domain-agnostic at its core. Each application registers a Domain Policy Schema that provides the domain knowledge the Agent needs:

Platform provides:              App injects:
├─ Conversational engine        ├─ Domain ontology (entities, actions)
├─ Policy syntax understanding  ├─ Domain constraints
├─ Validation logic             ├─ Available strategies
└─ Common concepts              └─ Example policies / best practices
    (devices, zones, time)
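
As an illustration of what an application might inject, the following sketch models a Domain Policy Schema as a plain data structure and uses it to check that a draft policy only refers to terms the domain defines. The field names and the `unknown_terms` helper are hypothetical; the source does not define the schema format.

```python
from dataclasses import dataclass, field


@dataclass
class DomainPolicySchema:
    """Domain knowledge an app registers with the Agent (assumed shape)."""
    app: str
    entities: set                 # domain ontology: things policies talk about
    actions: set                  # domain ontology: what can be done to them
    strategies: set = field(default_factory=set)
    examples: list = field(default_factory=list)

    def unknown_terms(self, entity: str, action: str) -> list:
        """Return problems found in a draft policy's vocabulary."""
        problems = []
        if entity not in self.entities:
            problems.append(f"unknown entity: {entity}")
        if action not in self.actions:
            problems.append(f"unknown action: {action}")
        return problems


cleaning = DomainPolicySchema(
    app="ClearJanitor",
    entities={"robot", "floor", "zone"},
    actions={"clean", "dock", "pause"},
    strategies={"nightly", "on_demand"},
    examples=["Pause cleaning in the lobby zone between 08:00 and 18:00"],
)

print(cleaning.unknown_terms("zone", "pause"))  # []
print(cleaning.unknown_terms("rack", "pick"))   # warehouse terms are not in this domain
```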

The Agent operates in two modes:

  • Creation mode — help users set up new policies through conversation

  • Audit mode — explain current policies, simulate “what would happen if…” scenarios

Ecosystem Plane — Distribution

| Service | Responsibility |
|---|---|
| Marketplace | Application and component distribution. Where developers publish apps and reusable components. |


IoT Layer — Technology Stack

The IoT Layer defines the open-source platforms underneath the Data Plane services. Each Data Plane service wraps an IoT platform, providing FlexGalaxy API conventions, account scoping, and access control.

| Data Plane Service | Wraps | License |
|---|---|---|
| DeviceAdmin | ThingsBoard CE: device registry, multi-protocol connectivity (MQTT, CoAP, LwM2M, HTTP), telemetry ingestion, rule engine alerts | Apache 2.0 |
| ThingIO | Apache StreamPipes: stream analytics, data pipelines, ML inference, visual pipeline builder | Apache 2.0 |
| OTAForge | Eclipse hawkBit + Hara: OTA updates, artifact delivery, policy-driven rollouts, proxied DDI | EPL 2.0 |

Integration pattern:

                DeviceAdmin                    ThingIO
                (enrollment,                   (APIs + React SDK)
                 alerts)
                    │                              │
                    │ delegates to                  ▼
                    │ Provisioning          Kafka         Kafka
                    │ Service              Cluster A     Cluster B
                    ▼                         │             │
Devices ──MQTT──► ThingsBoard ──Rule Engine──►│  replicate  │──► StreamPipes
                       │                      └──────►──────┘
                  Telemetry DB

                 OTAForge
                 (artifacts, policies,
                  proxied DDI)
                    │
                    │ DDI proxy
                    ▼
Devices ──DDI──► OTAForge ──► hawkBit
                 (policy-driven rollouts,
                  polls DeviceAdmin)

DeviceAdmin delegates device provisioning to the Provisioning Service, which orchestrates ThingsBoard, hawkBit, and DotID. CI/CD pipelines publish artifacts to OTAForge via access keys (AK/SK); rollouts are created automatically by policy evaluation. Devices poll OTAForge’s proxied DDI endpoint, not hawkBit directly. For how these services support multi-party supply chains with cross-account OTA and ownership transfer, see the Supply Chain OTA and Ownership pattern.
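
The policy-driven rollout step (poll DeviceAdmin, evaluate policies, create rollouts) can be sketched as a pure selection function. This is a simplified assumption about the evaluation logic: `Device`, `RolloutPolicy`, and `plan_rollout` are hypothetical names, the device list stands in for a DeviceAdmin poll result, and no hawkBit or OTAForge API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Device record as a DeviceAdmin poll might return it (assumed shape)."""
    device_id: str
    model: str
    installed_version: str


@dataclass(frozen=True)
class RolloutPolicy:
    """Hypothetical policy: bring every device of a model to a target version."""
    model: str
    target_version: str
    max_batch: int  # cap on devices per rollout


def plan_rollout(devices, policy):
    """Select the device IDs the next rollout should target under the policy."""
    outdated = [
        d.device_id
        for d in devices
        if d.model == policy.model and d.installed_version != policy.target_version
    ]
    # Sort for a deterministic batch, then apply the batch cap.
    return sorted(outdated)[: policy.max_batch]


fleet = [
    Device("amr-3", "amr", "1.0.0"),
    Device("amr-1", "amr", "1.0.0"),
    Device("amr-2", "amr", "1.1.0"),
    Device("pda-1", "pda", "1.0.0"),
]
policy = RolloutPolicy(model="amr", target_version="1.1.0", max_batch=10)

print(plan_rollout(fleet, policy))  # ['amr-1', 'amr-3']
```

A scheduler loop would call such a function after each poll and create a rollout only when the returned list is non-empty.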


Execution Contract Model

When the platform assigns a task to an edge device (robot, PDA), it pushes an execution contract: the task definition, scoped actions, and required data (e.g., a map from Equator). The platform confirms connectivity at the moment it pushes the contract.
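
A minimal sketch of what such a contract might carry, and of the connectivity check at push time, is shown here. The field names, `permits`, and `push_contract` are assumptions for illustration; the source describes the contract's contents but not its format or transport.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExecutionContract:
    """Task definition, scoped actions, and required data (assumed shape)."""
    task_id: str
    device_id: str
    allowed_actions: frozenset   # actions the device may take under this contract
    required_data: tuple         # e.g. identifiers of Equator maps shipped along

    def permits(self, action: str) -> bool:
        return action in self.allowed_actions


def push_contract(contract: ExecutionContract,
                  is_connected: Callable[[str], bool]) -> bool:
    """Push only when the target device is confirmed reachable."""
    if not is_connected(contract.device_id):
        return False
    # A real implementation would transmit the contract to the device here.
    return True


contract = ExecutionContract(
    task_id="task-42",
    device_id="amr-7",
    allowed_actions=frozenset({"move", "pick", "place"}),
    required_data=("map:warehouse-1",),
)
online = {"amr-7"}

print(push_contract(contract, lambda device_id: device_id in online))  # True
print(contract.permits("charge"))  # False, outside the contract's scope
```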

Three Device States

| State | Meaning | Platform Behavior |
|---|---|---|
| Live | Connected, real-time telemetry | Platform sees truth directly |
| Projected | Offline, no telemetry | Platform estimates based on the contract scope |
| Reconciled | Device reconnects, reports actuals | Actual state corrects the projection, triggers replanning if needed |
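
The three states can be read as a small state machine. The sketch below is one possible interpretation: the event names are hypothetical, and the transition from Reconciled back to Live once reconciliation completes is an assumption, since the source does not say how a device leaves the Reconciled state.

```python
from enum import Enum


class DeviceState(Enum):
    LIVE = "live"
    PROJECTED = "projected"
    RECONCILED = "reconciled"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (DeviceState.LIVE, "disconnect"): DeviceState.PROJECTED,
    (DeviceState.PROJECTED, "reconnect"): DeviceState.RECONCILED,
    # Assumption: after actuals are applied the device is treated as live again.
    (DeviceState.RECONCILED, "reconciliation_done"): DeviceState.LIVE,
}


def next_state(state: DeviceState, event: str) -> DeviceState:
    """Apply an event, rejecting transitions the model does not define."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


state = DeviceState.LIVE
for event in ("disconnect", "reconnect", "reconciliation_done"):
    state = next_state(state, event)
    print(event, "->", state.name)
```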

Two-Tier Planning

| Tier | Location | Scope | Nature |
|---|---|---|---|
| Platform Planner | Cloud (FlexGalaxy.AI) | Global, multi-resource | Strategic: “which resources handle which tasks” |
| Edge Planner | Local (robot/PDA) | Single-agent, local | Tactical, reactive: “obstacle ahead, reroute” |

These are fundamentally different planners solving different problems. They do not share logic.

Partial Connectivity Replanning

When some devices go offline:

  • Connected devices → replan freely, reassign tasks as needed

  • Offline devices → hold their contracted tasks as projected state, do not reassign

  • On reconnection → reconcile actual state, trigger global replanning if divergence detected
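
The rules above can be sketched as a partition of current assignments by connectivity, plus a divergence check on reconnection. The function names and the use of simple equality for divergence are assumptions for illustration; a real comparison would likely tolerate small differences.

```python
def split_for_replanning(assignments, online):
    """Split task assignments by device connectivity.

    Returns (reassignable, held): tasks on connected devices may be replanned,
    tasks on offline devices stay with their device as projected state.
    """
    reassignable = []
    held = {}
    for device_id, tasks in assignments.items():
        if device_id in online:
            reassignable.extend(tasks)
        else:
            held[device_id] = list(tasks)
    return reassignable, held


def diverged(projected, actual):
    """On reconnection, a mismatch between projection and report calls for replanning."""
    return projected != actual


assignments = {"amr-1": ["t1", "t2"], "amr-2": ["t3"], "amr-3": ["t4"]}
reassignable, held = split_for_replanning(assignments, online={"amr-1", "amr-3"})

print(reassignable)  # ['t1', 't2', 't4']
print(held)          # {'amr-2': ['t3']}
print(diverged({"t3": "done"}, {"t3": "failed"}))  # True, so replan globally
```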

Contract Failure Recovery

Handled by the Policy Service, not hardcoded:

  • Application defines failure recovery policies (via developer presets or user-defined policies)

  • Execution Manager detects contract failure

  • Consults Policy Service for the applicable strategy

  • Executes recovery: retry, reassign, escalate, hold, or abort
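
The flow above amounts to a lookup rather than a hardcoded branch, which the following sketch illustrates. The `PolicyService` class here is a stand-in with hypothetical methods, the failure kinds are invented examples, and falling back to escalation when no policy matches is an assumption not stated in the source.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


class PolicyService:
    """Stand-in for the Policy Service: maps (app, failure kind) to a strategy."""

    def __init__(self, default=Recovery.ESCALATE):
        self._strategies = {}
        self._default = default  # assumption: escalate when no policy matches

    def register(self, app, failure_kind, strategy):
        self._strategies[(app, failure_kind)] = strategy

    def recovery_for(self, app, failure_kind):
        return self._strategies.get((app, failure_kind), self._default)


def on_contract_failure(policies, app, failure_kind):
    """Execution Manager side: ask the Policy Service instead of hardcoding."""
    return policies.recovery_for(app, failure_kind)


policies = PolicyService()
policies.register("WES", "pick_failed", Recovery.REASSIGN)
policies.register("ClearJanitor", "robot_stuck", Recovery.RETRY)

print(on_contract_failure(policies, "WES", "pick_failed").value)          # reassign
print(on_contract_failure(policies, "ClearJanitor", "battery_low").value)  # escalate
```

Because the strategy comes from registered policies, two applications can respond differently to similar failures without any change to the Execution Manager.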


Application Validation

WES — Warehouse Execution Management System

Order-driven, high-complexity planning, complex edge contracts with multi-step decision points.

Platform services consumed:

| Service | Usage |
|---|---|
| DeviceAdmin | Enroll AMRs, MHEs, PDAs; monitor device alerts and OTA status |
| ThingIO | Fleet dashboards, order throughput analytics, zone utilization widgets |
| OTAForge | Policy-driven firmware and map distribution to warehouse robots |
| DotID/StarGate | Authenticate robots, operators, upstream systems (OMS/WMS) |
| Equator | Warehouse maps, zones, racks, charging stations |
| Marie | Proximity queries, path clearance, zone-based resource lookup |
| Planner | Decompose orders into pick/move/place task sequences |
| Scheduler | One-time task assignment based on resource availability |
| Execution Manager | Orchestrate and monitor multi-step task execution |
| Policy Service | Failure recovery, planning preferences, scheduling priorities |
| AI Policy Agent | Operators define rules conversationally using warehouse domain schema |

Architecture insights surfaced:

  • Execution contract model

  • Edge/cloud reconciliation pattern

  • Policy-driven failure recovery

  • Application-defined policy with platform execution

ClearJanitor — Commercial Cleaning Robot Management

Schedule-driven, simpler edge contracts, multi-tenant ownership model.

Ownership vs Operation:

Model A: Leasing Company → Contractor → End User (ClearJanitor user = Contractor)
Model B: End User buys/rents directly (ClearJanitor user = End User)

ClearJanitor serves whoever operates the robots, regardless of ownership.

Platform services consumed:

| Service | Usage |
|---|---|
| DeviceAdmin | Enroll cleaning robots, track lease status, monitor alerts and OTA status |
| ThingIO | Cleaning coverage heatmaps, battery analytics, robot status dashboards |
| OTAForge | Policy-driven map and firmware distribution to cleaning fleet |
| DotID/StarGate | Authenticate operators, distinguish contractor vs end-user roles |
| Org Service | Model leasing company → contractor → end-user hierarchy |
| IAM Identity Center | Leasing company gets read-only visibility across contractors |
| Equator | Building floor plans |
| Marie | Coverage queries, “which floors haven’t been cleaned today?” |
| Planner | Generate cleaning routes/sequences |
| Scheduler | Recurring cleaning jobs (nightly, weekly, conditional) |
| Execution Manager | Monitor robots during cleaning, handle stuck/error states |
| Policy Service | Battery thresholds, time-of-day restrictions, zone rules |
| AI Policy Agent | Operators define cleaning policies conversationally |

Architecture insights surfaced:

  • Multi-tenant org hierarchy (asset owner ≠ operator)

  • Recurring/cron-style scheduling need

  • Simpler edge contracts (single-path cleaning vs multi-step warehouse picks)

  • Same execution contract model, different payload complexity


Cross-App Architecture Validation Summary

| Architecture Element | Validated By |
|---|---|
| Execution contract model | WES (complex), ClearJanitor (simple) |
| Policy Service with domain schemas | Both: different domains, same mechanism |
| Edge/cloud reconciliation | Both: different complexity levels |
| Org Service multi-tenancy | ClearJanitor (leasing chain) |
| One-time scheduling | WES (order-driven) |
| Recurring scheduling | ClearJanitor (schedule-driven) |
| AI Policy Agent reusability | Both: the domain schema carries the domain knowledge, so the Agent core stays domain-agnostic |
| No privileged access principle | Both: same APIs as third-party developers |
| Supply chain OTA + ownership transfer | Supply Chain OTA and Ownership (cross-account rollout, multi-tier scoping) |