Execution Contract Pattern¶

Context¶

Edge devices (AMRs, cleaning robots, PDAs) operate in environments where network connectivity is intermittent. The platform cannot assume a persistent connection to devices once they begin executing a task. This creates a fundamental challenge: how does the platform assign work, maintain awareness of progress, and recover from failures when the link between cloud and edge may drop at any time?

Hardcoding task instructions directly into device firmware is too rigid. Sending bare commands over a live connection is too fragile. The platform needs a model that is self-contained enough for offline execution yet structured enough for cloud-side reasoning when the device is unreachable.

Pattern¶

The Contract¶

When the platform assigns a task to a device, it pushes an execution contract. The contract is a self-contained package containing:

Component	Description
Task definition	What the device must accomplish (pick, move, clean, etc.)
Scoped actions	The set of actions the device is permitted to take
Required data	Spatial data (map excerpt from Equator), zone boundaries, landmarks
Constraints	Battery thresholds, time windows, zone restrictions
Failure policy	What to do if the contract cannot be completed

Connectivity is confirmed at the moment the contract is pushed, so the platform knows the device received it.
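The five components in the table above could be modelled as one immutable record. The following is a minimal sketch, not the platform's actual schema: every class, field, and value name here (`ExecutionContract`, `Constraints`, `FailureStrategy`, `permits`, the `map_excerpt` key) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureStrategy(Enum):
    # Strategy names mirror the recovery strategies listed later in this page.
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


@dataclass(frozen=True)
class Constraints:
    min_battery_pct: float                  # 20.0 encodes "Battery > 20%"
    max_duration_s: Optional[float] = None  # time window, if any
    allowed_zones: tuple = ()               # zone restrictions, if any


@dataclass(frozen=True)
class ExecutionContract:
    contract_id: str
    task: str                  # task definition
    scoped_actions: frozenset  # actions the device is permitted to take
    required_data: dict        # map excerpt, zone boundaries, landmarks
    constraints: Constraints
    failure_policy: tuple      # strategies, tried in order (assumed shape)

    def permits(self, action: str) -> bool:
        """True if the action falls inside the contract's scope."""
        return action in self.scoped_actions


# Hypothetical contract loosely based on the WES example in this document.
contract = ExecutionContract(
    contract_id="pick-4521",
    task="Pick Order #4521",
    scoped_actions=frozenset({"navigate", "pick", "place"}),
    required_data={"map_excerpt": "placeholder"},
    constraints=Constraints(min_battery_pct=20.0, max_duration_s=15 * 60),
    failure_policy=(FailureStrategy.RETRY, FailureStrategy.REASSIGN),
)
```

Freezing the record reflects the idea that everything the device needs is fixed at push time, so nothing in the contract has to be fetched later.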

Three Device States¶

Once a contract is pushed, the device enters a state machine managed by the Execution Manager:

    ┌────────────┐
    │    Live    │ ◄── Connected, real-time telemetry
    └─────┬──────┘
          │ connection lost
          ▼
    ┌────────────┐
    │ Projected  │ ◄── Offline, platform estimates based on contract
    └─────┬──────┘
          │ device reconnects
          ▼
    ┌────────────┐
    │ Reconciled │ ◄── Actual state corrects projection
    └─────┬──────┘
          │ divergence? → replan
          ▼
    ┌────────────┐
    │    Live    │
    └────────────┘

State	Meaning	Platform Behavior
Live	Connected, real-time telemetry	Platform sees truth directly
Projected	Offline, no telemetry	Platform estimates based on the contract scope
Reconciled	Device reconnects, reports actuals	Actual corrects projection, triggers replanning if needed
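The three states and their transitions can be sketched as a small lookup table. This is an illustrative sketch only: the event names (`CONNECTION_LOST`, `DEVICE_RECONNECTED`, `RECONCILIATION_DONE`) and the `next_state` helper are assumptions, and the sketch encodes only the transitions drawn in the diagram. Anything else, such as losing the connection again while reconciling, is rejected here because the document does not specify it.

```python
from enum import Enum


class DeviceState(Enum):
    LIVE = "live"
    PROJECTED = "projected"
    RECONCILED = "reconciled"


class Event(Enum):
    # Hypothetical event names; the document only describes them in prose.
    CONNECTION_LOST = "connection_lost"
    DEVICE_RECONNECTED = "device_reconnected"
    RECONCILIATION_DONE = "reconciliation_done"


# Only the transitions shown in the diagram are allowed.
TRANSITIONS = {
    (DeviceState.LIVE, Event.CONNECTION_LOST): DeviceState.PROJECTED,
    (DeviceState.PROJECTED, Event.DEVICE_RECONNECTED): DeviceState.RECONCILED,
    (DeviceState.RECONCILED, Event.RECONCILIATION_DONE): DeviceState.LIVE,
}


def next_state(state: DeviceState, event: Event) -> DeviceState:
    """Return the state that follows `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {event.name} while {state.name}"
        ) from None


# Walk one full cycle: Live -> Projected -> Reconciled -> Live.
state = DeviceState.LIVE
for event in (Event.CONNECTION_LOST,
              Event.DEVICE_RECONNECTED,
              Event.RECONCILIATION_DONE):
    state = next_state(state, event)
```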

Two-Tier Planning¶

The execution contract separates planning into two fundamentally different tiers:

Tier	Location	Scope	Nature
Platform Planner	Cloud	Global, multi-resource	Strategic — “which resources handle which tasks”
Edge Planner	Device	Single-agent, local	Tactical, reactive — “obstacle ahead, reroute”

These planners solve different problems and do not share logic. The Platform Planner reasons about resource allocation across the fleet. The Edge Planner handles real-time obstacle avoidance, local path adjustment, and immediate sensor responses within the bounds of the contract.

Partial Connectivity Replanning¶

When some devices lose connectivity while others remain connected:

Connected devices — replan freely, reassign tasks as needed
Offline devices — hold their contracted tasks as projected state, do not reassign
On reconnection — reconcile actual state vs projection, trigger global replanning if divergence detected

This prevents the platform from double-assigning work (giving an offline device’s task to another device, only to have both attempt it when the original reconnects).
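The hold rule can be sketched as a filter applied before any replanning pass. This is a sketch under assumptions: the `Assignment` record, its field names, and the string state labels are hypothetical, and treating every state other than "projected" as replannable is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    task_id: str
    device_id: str
    device_state: str  # "live", "projected", or "reconciled" (assumed labels)


def split_for_replanning(assignments):
    """Partition assignments into (replannable, held).

    Tasks on projected (offline) devices are held so that no second
    device is given the same work while the first may still be doing it.
    """
    replannable = [a for a in assignments if a.device_state != "projected"]
    held = [a for a in assignments if a.device_state == "projected"]
    return replannable, held


assignments = [
    Assignment("task-1", "amr-1", "live"),
    Assignment("task-2", "amr-2", "projected"),
    Assignment("task-3", "amr-3", "live"),
]
replannable, held = split_for_replanning(assignments)
```

Only the `replannable` list would be handed to the planner; the `held` list stays attached to its offline devices until they reconnect and reconcile.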

Contract Failure Recovery¶

Failure recovery is policy-driven, not hardcoded:

1. The Execution Manager detects a contract failure.
2. It consults the Policy Service for the applicable recovery strategy.
3. It executes recovery based on the policy response:

Strategy	Behavior
Retry	Re-attempt the failed action
Reassign	Assign the task to the nearest available device
Escalate	Notify an operator for manual intervention
Hold	Pause execution and wait for conditions to change
Abort	Cancel the task entirely

Applications define failure recovery policies through developer presets or operator-created policies (via the AI Policy Agent).
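One way to picture policy-driven recovery is as an ordered list of strategies that is consulted on each failure. This is a minimal sketch, not the Policy Service's actual interface: `choose_strategy`, `nearest_available`, and the shape of the `devices` mapping are all hypothetical. It assumes a policy such as "retry once, then reassign" is expressed as the ordered list `[RETRY, REASSIGN]`, and that "nearest" means straight-line distance.

```python
import math
from enum import Enum


class Strategy(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


def choose_strategy(policy, failed_attempts):
    """Pick the strategy for the next recovery attempt.

    `policy` is an ordered list of strategies; once the list is
    exhausted, the last entry keeps being returned.
    """
    if not policy:
        raise ValueError("policy must list at least one strategy")
    if failed_attempts < 0:
        raise ValueError("failed_attempts must be non-negative")
    return policy[min(failed_attempts, len(policy) - 1)]


def nearest_available(devices, target_xy):
    """Return the id of the closest available device, or None.

    `devices` maps device id -> ((x, y), available).
    """
    candidates = [
        (math.dist(xy, target_xy), device_id)
        for device_id, (xy, available) in devices.items()
        if available
    ]
    return min(candidates)[1] if candidates else None


policy = [Strategy.RETRY, Strategy.REASSIGN]  # "retry once, then reassign"
devices = {
    "amr-1": ((0.0, 0.0), True),
    "amr-2": ((1.0, 1.0), False),  # closest, but not available
    "amr-3": ((5.0, 5.0), True),
}
first = choose_strategy(policy, failed_attempts=0)
second = choose_strategy(policy, failed_attempts=1)
target = nearest_available(devices, (1.0, 1.0))
```

Because the strategy list is data rather than code, swapping `[RETRY, REASSIGN]` for `[HOLD]` changes recovery behavior without touching the dispatch logic.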

Consequences¶

Benefits¶

Offline resilience — devices can complete tasks without a persistent connection
No double-assignment — projected state prevents reassigning work that an offline device is still executing
Composable failure handling — recovery strategies are defined by policy, not code, and can be customized per application, per account, per device type
Clean separation — cloud-side strategic planning and edge-side tactical execution operate independently within the contract boundary

Trade-offs¶

Stale projections — the longer a device is offline, the less accurate the projected state becomes; the platform must accept uncertainty
Contract granularity — contracts must be self-contained enough for offline execution, which means including spatial data, constraints, and policies at push time rather than querying them dynamically
Reconciliation cost — when many devices reconnect simultaneously, the reconciliation and replanning burst can be computationally expensive

Examples¶

WES — Complex Multi-Step Contract¶

A warehouse order triggers a contract with multiple decision points:

Contract: Pick Order #4521
├── Step 1: Navigate to Rack B-12 (map excerpt included)
├── Step 2: Pick item SKU-7890 from shelf 3
├── Step 3: Navigate to Packing Station 2
├── Step 4: Place item on conveyor
├── Failure policy: Retry once, then reassign to nearest AMR
└── Constraints: Battery > 20%, complete within 15 min

If the AMR loses connectivity after Step 2, the platform projects that it will arrive at Packing Station 2 based on historical travel times. When the AMR reconnects, its actual position is reconciled against the projection.
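The projection and reconciliation in this example can be sketched with simple arithmetic. This is an illustrative sketch under strong assumptions: it interpolates along a straight line between two points instead of following the map route, and the function names, coordinates, and the 2 m tolerance are all hypothetical.

```python
import math


def project_position(start, goal, elapsed_s, expected_travel_s):
    """Estimate position by linear interpolation, clamped to [start, goal].

    `expected_travel_s` stands in for the historical travel time.
    """
    if expected_travel_s <= 0:
        raise ValueError("expected_travel_s must be positive")
    fraction = min(max(elapsed_s / expected_travel_s, 0.0), 1.0)
    return (
        start[0] + fraction * (goal[0] - start[0]),
        start[1] + fraction * (goal[1] - start[1]),
    )


def diverged(projected, actual, tolerance_m):
    """True if the reported position is further than the tolerance
    from the projected one, which would trigger replanning."""
    return math.dist(projected, actual) > tolerance_m


rack = (0.0, 0.0)      # hypothetical coordinates for Rack B-12
station = (10.0, 0.0)  # hypothetical coordinates for Packing Station 2

# 30 s into a leg that historically takes 60 s.
projected = project_position(rack, station, elapsed_s=30, expected_travel_s=60)

on_track = diverged(projected, actual=(5.0, 1.0), tolerance_m=2.0)
off_track = diverged(projected, actual=(9.0, 3.0), tolerance_m=2.0)
```

Here `projected` is the halfway point `(5.0, 0.0)`. The first reported position is 1 m away and stays within tolerance, while the second is 5 m away and would count as divergence.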

ClearJanitor — Simple Single-Path Contract¶

A scheduled cleaning job produces a simpler contract:

Contract: Clean Floor 3, Zone A
├── Route: [waypoint sequence from Marie]
├── Coverage target: 95%
├── Failure policy: Hold and notify operator
└── Constraints: Battery > 15%, complete before 6 AM

Same contract model, different payload complexity. The cleaning robot executes the route autonomously. If it gets stuck, the failure policy triggers operator notification rather than automatic reassignment.
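Because the constraints travel with the contract, the device can check them locally without contacting the platform. This is a minimal sketch of such a check for the two constraints in the contract above. The function name, parameters, and the dates are hypothetical, and it assumes "Battery > 15%" means the level must stay strictly above the threshold.

```python
from datetime import datetime


def violated_constraints(battery_pct, now, min_battery_pct, deadline):
    """Return the names of the constraints that are currently violated."""
    violations = []
    if battery_pct <= min_battery_pct:  # contract requires battery > threshold
        violations.append("battery")
    if now >= deadline:                 # contract requires completion before deadline
        violations.append("deadline")
    return violations


deadline = datetime(2024, 1, 1, 6, 0)  # "complete before 6 AM" (date is arbitrary)

ok = violated_constraints(
    battery_pct=40.0, now=datetime(2024, 1, 1, 5, 30),
    min_battery_pct=15.0, deadline=deadline,
)
low_and_late = violated_constraints(
    battery_pct=15.0, now=datetime(2024, 1, 1, 6, 5),
    min_battery_pct=15.0, deadline=deadline,
)
```

A non-empty result would be the point at which the contract's failure policy applies, which for this contract means holding and notifying an operator.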