Execution Contract Pattern
Context
Edge devices (AMRs, cleaning robots, PDAs) operate in environments where network connectivity is intermittent. The platform cannot assume a persistent connection to devices once they begin executing a task. This creates a fundamental challenge: how does the platform assign work, maintain awareness of progress, and recover from failures when the link between cloud and edge may drop at any time?
Hardcoding task instructions directly into device firmware is too rigid. Sending bare commands over a live connection is too fragile. The platform needs a model that is self-contained enough for offline execution yet structured enough for cloud-side reasoning when the device is unreachable.
Pattern
The Contract
When the platform assigns a task to a device, it pushes an execution contract. The contract is a self-contained package containing:
| Component | Description |
|---|---|
| Task definition | What the device must accomplish (pick, move, clean, etc.) |
| Scoped actions | The set of actions the device is permitted to take |
| Required data | Spatial data (map excerpt from Equator), zone boundaries, landmarks |
| Constraints | Battery thresholds, time windows, zone restrictions |
| Failure policy | What to do if the contract cannot be completed |
At the moment the contract is pushed, connectivity is confirmed — the platform knows the device received it.
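As a rough illustration, the five components in the table could be modelled as an immutable record. This is a minimal sketch, not the platform's actual schema: the class names, field names, zone names, and the `permits` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class FailureStrategy(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


@dataclass(frozen=True)
class Constraints:
    min_battery_pct: float      # battery threshold
    max_duration_s: int         # time window for the whole contract
    allowed_zones: frozenset    # zone restrictions (hypothetical zone names)


@dataclass(frozen=True)
class ExecutionContract:
    contract_id: str
    task: str                   # task definition
    scoped_actions: frozenset   # actions the device is permitted to take
    required_data: dict         # map excerpt, zone boundaries, landmarks
    constraints: Constraints
    failure_policy: tuple       # ordered FailureStrategy values

    def permits(self, action: str) -> bool:
        """Return True if the action is inside the contract's scope."""
        return action in self.scoped_actions


# Hypothetical payload, loosely based on the WES example in this document.
contract = ExecutionContract(
    contract_id="pick-order-4521",
    task="Pick item SKU-7890 and deliver it to Packing Station 2",
    scoped_actions=frozenset({"navigate", "pick", "place"}),
    required_data={"map_excerpt": "<map tile>", "landmarks": ["Rack B-12"]},
    constraints=Constraints(20.0, 15 * 60, frozenset({"aisle-b", "packing"})),
    failure_policy=(FailureStrategy.RETRY, FailureStrategy.REASSIGN),
)
```

Freezing the record reflects the idea that a contract is fixed at push time: the device works from the copy it received, and the platform reasons from the same copy while the device is offline.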
Three Device States
Once a contract is pushed, the device enters a state machine managed by the Execution Manager:
┌────────────┐
│    Live    │ ◄── Connected, real-time telemetry
└─────┬──────┘
      │ connection lost
      ▼
┌────────────┐
│ Projected  │ ◄── Offline, platform estimates based on contract
└─────┬──────┘
      │ device reconnects
      ▼
┌────────────┐
│ Reconciled │ ◄── Actual state corrects projection
└─────┬──────┘
      │ divergence? → replan
      ▼
┌────────────┐
│    Live    │
└────────────┘
| State | Meaning | Platform Behavior |
|---|---|---|
| Live | Connected, real-time telemetry | Platform sees truth directly |
| Projected | Offline, no telemetry | Platform estimates based on the contract scope |
| Reconciled | Device reconnects, reports actuals | Actual corrects projection, triggers replanning if needed |
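The three states and their transitions can be sketched as a small lookup table. This is an illustrative sketch only: the event names (`connection_lost`, `device_reconnected`, `reconciliation_done`) are assumptions, and the real Execution Manager may model the cycle differently.

```python
from enum import Enum


class DeviceState(Enum):
    LIVE = "live"
    PROJECTED = "projected"
    RECONCILED = "reconciled"


# Allowed transitions, keyed by (current state, event). Event names are hypothetical.
TRANSITIONS = {
    (DeviceState.LIVE, "connection_lost"): DeviceState.PROJECTED,
    (DeviceState.PROJECTED, "device_reconnected"): DeviceState.RECONCILED,
    (DeviceState.RECONCILED, "reconciliation_done"): DeviceState.LIVE,
}


def next_state(state: DeviceState, event: str) -> DeviceState:
    """Return the state that follows `state` on `event`, or raise if not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None
```

Keeping the transitions in a table makes the cycle explicit: a device cannot jump from Projected straight back to Live without passing through Reconciled, which is where any divergence would be detected.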
Two-Tier Planning
The execution contract separates planning into two fundamentally different tiers:
| Tier | Location | Scope | Nature |
|---|---|---|---|
| Platform Planner | Cloud | Global, multi-resource | Strategic: “which resources handle which tasks” |
| Edge Planner | Device | Single-agent, local | Tactical, reactive: “obstacle ahead, reroute” |
These planners solve different problems and do not share logic. The Platform Planner reasons about resource allocation across the fleet. The Edge Planner handles real-time obstacle avoidance, local path adjustment, and immediate sensor responses, always within the bounds of the contract.
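To make "within the bounds of the contract" concrete, here is a minimal sketch of an edge-side detour choice that only considers paths staying inside the contracted zones. The function name, the candidate-path shape (`zones`, `length_m`), and the zone names are assumptions for illustration; treating "no in-scope path" as a contract failure is likewise an assumption.

```python
def choose_detour(candidates, allowed_zones):
    """Return the shortest candidate path that stays inside the contracted zones.

    candidates: list of dicts with hypothetical keys "zones" (zones the path
    crosses) and "length_m" (path length in metres).
    Returns None when no candidate stays in scope.
    """
    in_scope = [
        path for path in candidates
        if all(zone in allowed_zones for zone in path["zones"])
    ]
    if not in_scope:
        # Nothing the device may do locally; assumed to surface as a contract failure.
        return None
    return min(in_scope, key=lambda path: path["length_m"])


# Hypothetical reroute options after an obstacle is detected.
options = [
    {"zones": ["aisle-b", "loading-dock"], "length_m": 12.0},  # leaves contracted zones
    {"zones": ["aisle-b", "aisle-c"], "length_m": 18.5},
    {"zones": ["aisle-b"], "length_m": 25.0},
]
best = choose_detour(options, allowed_zones={"aisle-b", "aisle-c"})
```

The shortest option overall is rejected because it crosses a zone outside the contract, so the device takes the 18.5 m path without needing to consult the cloud.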
Partial Connectivity Replanning
When some devices lose connectivity while others remain connected:
- Connected devices: replan freely and reassign tasks as needed.
- Offline devices: hold their contracted tasks as projected state; do not reassign them.
- On reconnection: reconcile actual state against the projection, and trigger global replanning if divergence is detected.
This prevents the platform from double-assigning work (giving an offline device’s task to another device, only to have both attempt it when the original reconnects).
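The rule can be sketched as a partition step that runs before any replanning. This is a simplified sketch under stated assumptions: assignments are a plain task-to-device mapping, connectivity is a set of device IDs, and the function name is hypothetical.

```python
def split_for_replanning(assignments, connected_devices):
    """Partition assignments into those the planner may touch and those it must hold.

    assignments: {task_id: device_id}
    connected_devices: set of device IDs currently in the Live state
    """
    replannable = {}
    held = {}
    for task_id, device_id in assignments.items():
        if device_id in connected_devices:
            replannable[task_id] = device_id
        else:
            # Offline device: keep the task as projected state, never reassign it.
            held[task_id] = device_id
    return replannable, held


replannable, held = split_for_replanning(
    {"task-1": "amr-01", "task-2": "amr-02", "task-3": "amr-03"},
    connected_devices={"amr-01", "amr-03"},
)
```

Only the `replannable` mapping would be handed to the planner; `held` stays untouched until the owning device reconnects and is reconciled.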
Contract Failure Recovery
Failure recovery is policy-driven, not hardcoded:
1. The Execution Manager detects a contract failure.
2. It consults the Policy Service for the applicable recovery strategy.
3. It executes recovery based on the policy response:
| Strategy | Behavior |
|---|---|
| Retry | Re-attempt the failed action |
| Reassign | Assign the task to the nearest available device |
| Escalate | Notify an operator for manual intervention |
| Hold | Pause execution and wait for conditions to change |
| Abort | Cancel the task entirely |
Applications define failure recovery policies through developer presets or operator-created policies (via the AI Policy Agent).
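One way to picture policy-driven recovery is a policy expressed as an ordered list of strategies, with the next strategy selected by how many failures have already occurred. The sketch below is an assumption about how such a lookup might work, not the Policy Service's actual interface; `handle_failure` and the handler functions are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    RETRY = "retry"
    REASSIGN = "reassign"
    ESCALATE = "escalate"
    HOLD = "hold"
    ABORT = "abort"


def handle_failure(task_id, prior_failures, policy, handlers):
    """Pick the strategy for this failure and run its handler.

    prior_failures: number of failures seen before this one (0 on the first).
    policy: ordered, non-empty sequence of Strategy values; the last one repeats.
    handlers: {Strategy: callable(task_id)} supplied by the caller.
    """
    if not policy:
        raise ValueError("policy must contain at least one strategy")
    strategy = policy[min(prior_failures, len(policy) - 1)]
    return handlers[strategy](task_id)


# "Retry once, then reassign", with stand-in handlers that just describe the action.
policy = [Strategy.RETRY, Strategy.REASSIGN]
handlers = {
    Strategy.RETRY: lambda task_id: f"retry {task_id}",
    Strategy.REASSIGN: lambda task_id: f"reassign {task_id}",
}
first = handle_failure("pick-4521", 0, policy, handlers)
second = handle_failure("pick-4521", 1, policy, handlers)
```

Because the strategy sequence is data rather than code, swapping "retry once, then reassign" for "hold and notify operator" would change only the `policy` value, which matches the intent of defining recovery per application or per device type.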
Consequences
Benefits
- Offline resilience: devices can complete tasks without a persistent connection.
- No double-assignment: projected state prevents reassigning work that an offline device is still executing.
- Composable failure handling: recovery strategies are defined by policy, not code, and can be customized per application, per account, or per device type.
- Clean separation: cloud-side strategic planning and edge-side tactical execution operate independently within the contract boundary.
Trade-offs
- Stale projections: the longer a device is offline, the less accurate the projected state becomes; the platform must accept uncertainty.
- Contract granularity: contracts must be self-contained enough for offline execution, which means including spatial data, constraints, and policies at push time rather than querying them dynamically.
- Reconciliation cost: when many devices reconnect simultaneously, the reconciliation and replanning burst can be computationally expensive.
Examples
WES: Complex Multi-Step Contract
A warehouse order triggers a contract with multiple decision points:
Contract: Pick Order #4521
├── Step 1: Navigate to Rack B-12 (map excerpt included)
├── Step 2: Pick item SKU-7890 from shelf 3
├── Step 3: Navigate to Packing Station 2
├── Step 4: Place item on conveyor
├── Failure policy: Retry once, then reassign to nearest AMR
└── Constraints: Battery > 20%, complete within 15 min
If the AMR loses connectivity after Step 2, the platform projects it will arrive at Packing Station 2 based on historical travel times. When it reconnects, actual position is reconciled against projection.
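The projection in this scenario can be sketched as walking forward through the remaining steps using historical durations, then comparing the result with what the device reports on reconnection. The durations, function names, and step-count tolerance below are illustrative assumptions, not measured values or the platform's actual algorithm.

```python
def projected_completed_steps(step_durations_s, confirmed_steps, elapsed_s):
    """Estimate how many steps are done, given time elapsed since the last telemetry.

    step_durations_s: historical duration of each contract step, in seconds
    confirmed_steps: steps known to be complete when the link dropped
    elapsed_s: seconds since the link dropped
    """
    completed = confirmed_steps
    remaining = elapsed_s
    for duration in step_durations_s[confirmed_steps:]:
        if remaining < duration:
            break
        remaining -= duration
        completed += 1
    return completed


def has_diverged(projected_steps, actual_steps, tolerance_steps=0):
    """True when the reported progress differs from the projection beyond tolerance."""
    return abs(projected_steps - actual_steps) > tolerance_steps


# Hypothetical historical durations for the four steps of Pick Order #4521.
durations = [120, 30, 90, 20]
projected = projected_completed_steps(durations, confirmed_steps=2, elapsed_s=100)
```

With these numbers, 100 seconds after losing the link the platform would project that Step 3 (travel to Packing Station 2) is complete but Step 4 is not. If the AMR reconnects and reports that it is still on Step 3, `has_diverged(projected, 2)` returns `True` and replanning would be triggered.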
ClearJanitor: Simple Single-Path Contract
A scheduled cleaning job produces a simpler contract:
Contract: Clean Floor 3, Zone A
├── Route: [waypoint sequence from Marie]
├── Coverage target: 95%
├── Failure policy: Hold and notify operator
└── Constraints: Battery > 15%, complete before 6 AM
Same contract model, different payload complexity. The cleaning robot executes the route autonomously. If it gets stuck, the failure policy triggers operator notification rather than automatic reassignment.