# Run module (stable)
## Purpose & Scope
The Run module records the execution layer of CORA's recipe ladder. Where Method, Practice, and Plan describe what should be done and how, a Run records what actually happened: one Run is one execution instance, with batch identity, a finite lifecycle, an immutable audit trail, and references to the bound Plan and (optionally) Subject. Every sensor reading, every parameter adjustment, every operator hold or termination during the execution lands on the Run's event stream.
A Run carries five roles:
- Identity for one execution. The Run id is the stable handle that all downstream artifacts (datasets, reports, decisions, calibration citations) reference.
- A finite lifecycle with a closed state machine: a Run runs, may be held and resumed, and ends in exactly one of four terminal states (Completed / Aborted / Stopped / Truncated).
- Parameter resolution. A Run starts with parameters resolved from the Plan's defaults plus operator-supplied overrides, validated against the Method's parameter schema. The resolved snapshot is recorded on `RunStarted` and remains queryable for the life of the Run.
- A reading logbook. Sensor and motor readings during the Run land on a polymorphic per-Run logbook (`entries_run_readings`) keyed by a SOSA-aligned `sampling_procedure` discriminator. The logbook opens lazily on the first reading and closes implicitly when the Run reaches a terminal state.
- Cross-module anchors. A Run pins the Calibration revisions that were active at start time (AsShot semantics, immutable for the life of the Run); references the Safety clearances that authorize it; can join a Campaign for coordinated multi-Run studies; and may cite the Decision that justified a mid-flight parameter adjustment.
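The parameter-resolution role can be sketched as a shallow key-by-key merge (overrides win) followed by strict validation that rejects undeclared keys. The `{name: type}` schema, `resolve_parameters`, and the local `InvalidRunParameters` class are hypothetical stand-ins. CORA's actual `parameters_schema` format and validator are not described on this page.

```python
from types import MappingProxyType
from typing import Any, Mapping


class InvalidRunParameters(ValueError):
    """Local stand-in for the StartRun error of the same name."""


def resolve_parameters(
    plan_defaults: Mapping[str, Any],
    overrides: Mapping[str, Any],
    schema: Mapping[str, type],
) -> Mapping[str, Any]:
    """Merge Plan defaults with operator overrides, then validate strictly.

    `schema` is a hypothetical {parameter_name: expected_type} mapping that
    stands in for the Method's parameters_schema.
    """
    # Shallow merge: an override replaces the default of the same name.
    effective = {**plan_defaults, **overrides}

    # Strict: reject any parameter the schema does not declare.
    unknown = sorted(set(effective) - set(schema))
    if unknown:
        raise InvalidRunParameters(f"undeclared parameters: {unknown}")

    for name, value in effective.items():
        expected = schema[name]
        # bool is a subclass of int in Python, so rule it out explicitly.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            raise InvalidRunParameters(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Read-only view of the snapshot that would be recorded on RunStarted.
    return MappingProxyType(effective)
```

With the overrides from the start example, Plan defaults of `{"rotation_speed_deg_per_s": 1.0, "exposure_time_ms": 20}` would resolve to `{"rotation_speed_deg_per_s": 0.5, "exposure_time_ms": 50}`.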
**Out of scope**
- High-frequency telemetry. Per-frame triggers and sub-millisecond timing edges live on observation channels, not on the Run's main event stream.
- Bulk data. Frame bytes and reconstructed volumes live in the Data module's Datasets, referenced from the Run by URI plus checksum.
## Aggregates
| Name | Identity | State summary | FSM |
|---|---|---|---|
| `Run` | `id: UUID` | `name`, `plan_id`, `subject_id?`, `raid?`, `status`, `override_parameters`, `effective_parameters`, `triggered_by?`, `reading_logbook_id?`, `external_refs`, `campaign_id?`, `last_adjusted_at?`, `adjustment_count`, `pinned_calibrations` | yes |
| `RunReading` (sub-aggregate VO on Run) | `event_id: UUID` (per row) | `channel_name`, `value`, `units?`, `sampling_procedure`, `sampled_at`, `occurred_at`, `recorded_at` | no |
Run.subject_id is optional because some execution shapes have no Subject: dark-field acquisition, flat-field acquisition, energy calibration with a standard reference. These share the full Run lifecycle with sample Runs; only the Subject binding differs.
Run.raid carries an optional Research Activity Identifier (ISO 23527), enabling cross-facility project attribution.
## Value Objects
| Name | Shape | Where used |
|---|---|---|
| `RunName` | trimmed string, 1–200 chars | `Run.name` |
| `RunAbortReason` | trimmed string, 1–500 chars | `RunAborted.reason` (decider-input VO) |
| `RunStopReason` | trimmed string, 1–500 chars | `RunStopped.reason` (decider-input VO) |
| `RunTruncateReason` | trimmed string, 1–500 chars | `RunTruncated.reason` (decider-input VO) |
| `ChannelName` | trimmed string, 1–255 chars | `RunReading.channel_name` |
| `ExternalRef` | `(scheme: str, id: str)`, shared kernel | `Run.external_refs` (anti-corruption refs to upstream concepts such as proposal / btr / lab_visit / session) |
The wire representation of each reason is a plain `str` (post-trim); the VO exists at decider-input time to centralize validation. Reason fields are free-form today; a structured taxonomy would be an additive future change, introduced behind the same commands across all four reason fields (abort, stop, truncate, and adjust).
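All of the trimmed-string VOs follow one pattern: strip the input, check inclusive length bounds, and keep the trimmed form. Below is a minimal sketch using a frozen dataclass. The `TrimmedText` base class, the `InvalidTextValue` error, and the use of `str.strip()` as the trimming rule are assumptions, not CORA's actual classes.

```python
from dataclasses import dataclass
from typing import ClassVar


class InvalidTextValue(ValueError):
    """Hypothetical error; CORA raises per-VO errors such as InvalidRunName."""


@dataclass(frozen=True)
class TrimmedText:
    """Base for trimmed-string VOs; subclasses set the inclusive length bounds."""

    value: str
    MIN_LEN: ClassVar[int] = 1
    MAX_LEN: ClassVar[int] = 500

    def __post_init__(self) -> None:
        trimmed = self.value.strip()
        if not self.MIN_LEN <= len(trimmed) <= self.MAX_LEN:
            raise InvalidTextValue(
                f"{type(self).__name__} must be {self.MIN_LEN}-{self.MAX_LEN} "
                "chars after trimming"
            )
        # The dataclass is frozen, so store the trimmed form via object.__setattr__.
        object.__setattr__(self, "value", trimmed)

    def __str__(self) -> str:
        # The wire representation is the plain post-trim string.
        return self.value


class RunName(TrimmedText):
    MAX_LEN = 200


class RunAbortReason(TrimmedText):
    MAX_LEN = 500


class RunStopReason(TrimmedText):
    MAX_LEN = 500


class RunTruncateReason(TrimmedText):
    MAX_LEN = 500


class ChannelName(TrimmedText):
    MAX_LEN = 255
```

Because the bounds are class attributes, each VO reduces to a short subclass, and `str(...)` yields exactly what goes on the wire.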
## FSM
```mermaid
stateDiagram-v2
    [*] --> Running: start_run
    Running --> Held: hold_run
    Held --> Running: resume_run
    Running --> Completed: complete_run
    Running --> Aborted: abort_run
    Held --> Aborted: abort_run
    Running --> Stopped: stop_run
    Held --> Stopped: stop_run
    Running --> Truncated: truncate_run
    Held --> Truncated: truncate_run
    Completed --> [*]
    Aborted --> [*]
    Stopped --> [*]
    Truncated --> [*]
```
| From | To | Command | Event |
|---|---|---|---|
| (none) | `Running` | `start_run` | `RunStarted` |
| `Running` | `Held` | `hold_run` | `RunHeld` |
| `Held` | `Running` | `resume_run` | `RunResumed` |
| `Running` | `Completed` | `complete_run` | `RunCompleted` |
| `Running`, `Held` | `Aborted` | `abort_run` | `RunAborted` |
| `Running`, `Held` | `Stopped` | `stop_run` | `RunStopped` |
| `Running`, `Held` | `Truncated` | `truncate_run` | `RunTruncated` |
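The transition table maps directly onto a lookup from each command to its allowed source states and its target. Below is a minimal sketch. `start_run` is omitted because it creates the Run rather than moving an existing one, and `RunCannotTransition` is a hypothetical stand-in for the per-command `RunCannot*` errors.

```python
from enum import Enum


class RunStatus(str, Enum):
    RUNNING = "Running"
    HELD = "Held"
    COMPLETED = "Completed"
    ABORTED = "Aborted"
    STOPPED = "Stopped"
    TRUNCATED = "Truncated"


class RunCannotTransition(Exception):
    """Hypothetical stand-in for RunCannot{Hold,Resume,Complete,Abort,Stop,Truncate}."""


_ACTIVE = frozenset({RunStatus.RUNNING, RunStatus.HELD})

# command -> (allowed source states, target state), transcribed from the table.
TRANSITIONS: dict[str, tuple[frozenset[RunStatus], RunStatus]] = {
    "hold_run": (frozenset({RunStatus.RUNNING}), RunStatus.HELD),
    "resume_run": (frozenset({RunStatus.HELD}), RunStatus.RUNNING),
    "complete_run": (frozenset({RunStatus.RUNNING}), RunStatus.COMPLETED),
    "abort_run": (_ACTIVE, RunStatus.ABORTED),
    "stop_run": (_ACTIVE, RunStatus.STOPPED),
    "truncate_run": (_ACTIVE, RunStatus.TRUNCATED),
}


def next_status(current: RunStatus, command: str) -> RunStatus:
    """Return the target status, raising on any transition the table does not allow."""
    sources, target = TRANSITIONS[command]
    if current not in sources:
        # Strict, not idempotent: repeating a transition raises instead of no-oping.
        raise RunCannotTransition(f"{command} not allowed from {current.value}")
    return target
```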
**Guards.** Beyond the source-state check shown in the From column, each transition enforces:

- `start_run`: Plan not `Deprecated`; Subject in `{Mounted, Measured}` when present; no bound Asset is `Decommissioned`; the family superset is re-validated from current Asset state (the Plan-bind snapshot is not trusted); at least one `Active` Safety Clearance covers the Run scope `(run_id, subject_id, asset_ids)`; the Campaign (if cited) is in `{Planned, Active, Held}`.
- `hold_run` / `resume_run`: strict source-state. `hold` requires `Running`; `resume` requires `Held`. Re-holding a `Held` Run or re-resuming a `Running` Run raises rather than no-oping.
- `complete_run`: single-source from `Running` only. Completion claims active achievement, so it cannot fire from `Held`; an operator wanting to complete a held Run must `resume` first.
- `abort_run` / `stop_run` / `truncate_run`: multi-source from `{Running, Held}`. Exits don't require active work, only a non-terminal state. Each requires a free-form `reason` (1–500 chars). `truncate_run` additionally takes an optional `interrupted_at` timestamp that must not be in the future.
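The decider works on pre-loaded context rather than fetching it, so the `start_run` guards reduce to checks over a plain data bundle. The sketch below is illustrative only. `StartContext` and its field names are hypothetical, and the mapping from each guard to an error name is inferred from the StartRun error list. The sketch also collects every failure, whereas the real decider may stop at the first one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartContext:
    """Hypothetical pre-loaded context bundle handed to the start_run decider."""

    plan_status: str
    subject_status: Optional[str]      # None when the Run has no Subject
    asset_statuses: tuple[str, ...]    # current status of each bound Asset
    asset_families: frozenset[str]     # families offered by current Asset state
    required_families: frozenset[str]  # families the Method needs
    active_clearance_count: int        # Active clearances covering the Run scope
    campaign_status: Optional[str]     # None when no Campaign is cited


def start_run_violations(ctx: StartContext) -> list[str]:
    """Return the names of failed guards; an empty list means start_run may proceed."""
    failures = []
    if ctx.plan_status == "Deprecated":
        failures.append("PlanDeprecated")
    if ctx.subject_status is not None and ctx.subject_status not in {"Mounted", "Measured"}:
        failures.append("SubjectNotMountable")
    if "Decommissioned" in ctx.asset_statuses:
        failures.append("RunAssetDecommissioned")
    # Re-validated from current Asset state: offered families must be a superset.
    if not ctx.required_families <= ctx.asset_families:
        failures.append("RunCapabilitiesNotSatisfied")
    if ctx.active_clearance_count < 1:
        failures.append("RunRequiresActiveClearance")
    if ctx.campaign_status is not None and ctx.campaign_status not in {"Planned", "Active", "Held"}:
        failures.append("RunCannotJoinCampaign")
    return failures
```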
hold ⇄ resume is bidirectional and unlimited-cycle; the event stream may interleave any number of hold/resume pairs between RunStarted and the terminal event. The aggregate state preserves only the latest status; per-cycle audit lives in the event stream itself.
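Because the aggregate keeps only the latest status, folding the stream reduces to a last-write-wins lookup over status-changing events. Per-cycle history stays answerable from the stream itself. Below is a minimal sketch over event-type strings, which simplify real event objects; the helper names are hypothetical.

```python
from typing import Iterable, Optional

# Status each FSM event leaves the Run in.
STATUS_AFTER = {
    "RunStarted": "Running",
    "RunHeld": "Held",
    "RunResumed": "Running",
    "RunCompleted": "Completed",
    "RunAborted": "Aborted",
    "RunStopped": "Stopped",
    "RunTruncated": "Truncated",
}


def fold_status(event_types: Iterable[str]) -> Optional[str]:
    """Latest status after replaying the stream; None if the stream is empty."""
    status: Optional[str] = None
    for event_type in event_types:
        # Non-FSM events (RunAdjusted, RunReadingLogbookOpened, RunCampaignAssigned, ...)
        # are not in the map, so they leave the status unchanged.
        status = STATUS_AFTER.get(event_type, status)
    return status


def hold_cycles(event_types: Iterable[str]) -> int:
    """Number of hold/resume cycles begun; only the stream can answer this."""
    return sum(1 for event_type in event_types if event_type == "RunHeld")
```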
Stopped vs Truncated: both are operator-initiated terminals reachable from Running or Held. Stopped is a controlled exit while the system is responsive and the operator decides to end early; data up to the stop point is valid. Truncated is a cleanup terminal for a Run that became de-facto dead through interruption (power loss, process crash, hardware fault) and is being closed retroactively; the optional interrupted_at captures the operator's best guess at when the actual interruption happened, separate from occurred_at (when the truncate command was processed).
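Below is a minimal sketch of the `truncate_run` decision. It assumes timezone-aware datetimes and an injectable clock, and it omits the source-state check. `RunTruncated` here is a plain dataclass that mirrors the event payload. `decide_truncate` and the local error classes are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


class InvalidRunTruncateReason(ValueError):
    """Local stand-in for the TruncateRun error of the same name."""


class InvalidRunInterruptedAt(ValueError):
    """Local stand-in for the TruncateRun error of the same name."""


@dataclass(frozen=True)
class RunTruncated:
    run_id: str
    reason: str
    interrupted_at: Optional[datetime]  # operator's best guess at the real interruption
    occurred_at: datetime               # when the truncate command was processed


def decide_truncate(
    run_id: str,
    reason: str,
    interrupted_at: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> RunTruncated:
    now = now or datetime.now(timezone.utc)
    trimmed = reason.strip()
    if not 1 <= len(trimmed) <= 500:
        raise InvalidRunTruncateReason("reason must be 1-500 chars after trimming")
    # interrupted_at may lie anywhere in the past, but never after "now".
    if interrupted_at is not None and interrupted_at > now:
        raise InvalidRunInterruptedAt("interrupted_at must not be in the future")
    return RunTruncated(run_id, trimmed, interrupted_at, occurred_at=now)
```

Keeping `interrupted_at` separate from `occurred_at` means a Run closed the next morning still records roughly when the power actually failed.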
## Events
| Event | Payload sketch | When emitted |
|---|---|---|
| `RunStarted` | `run_id`, `name`, `plan_id`, `subject_id?`, `raid?`, `override_parameters`, `effective_parameters`, `triggered_by?`, `external_refs`, `campaign_id?`, `pinned_calibrations`, `occurred_at` | `start_run` succeeds |
| `RunHeld` | `run_id`, `occurred_at` | `hold_run` succeeds |
| `RunResumed` | `run_id`, `occurred_at` | `resume_run` succeeds |
| `RunCompleted` | `run_id`, `occurred_at` | `complete_run` succeeds |
| `RunAborted` | `run_id`, `reason`, `occurred_at` | `abort_run` succeeds |
| `RunStopped` | `run_id`, `reason`, `occurred_at` | `stop_run` succeeds |
| `RunTruncated` | `run_id`, `reason`, `interrupted_at?`, `occurred_at` | `truncate_run` succeeds |
| `RunAdjusted` | `run_id`, `parameter_patch`, `effective_parameters`, `reason`, `decided_by_decision_id?`, `occurred_at` | `adjust_run` succeeds; carries both the RFC 7396 patch and the post-merge snapshot |
| `RunReadingLogbookOpened` | `run_id`, `logbook_id`, `schema`, `occurred_at` | first `append_run_reading` write per Run (lazy open) |
| `RunCampaignAssigned` | `run_id`, `campaign_id`, `occurred_at` | post-hoc Campaign membership write (see Campaign module) |
| `RunCampaignUnassigned` | `run_id`, `campaign_id`, `occurred_at` | post-hoc Campaign membership removal |
Individual reading rows do not emit per-row events on the Run stream; they are written directly to entries_run_readings via the ReadingStore port. The row's event_id, correlation_id, and causation_id constitute the audit trail without bloating the main event log.
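The split between the Run stream and the logbook table can be sketched with a port and an in-memory adapter. Everything here is hypothetical: the `ReadingStore` method name, the `ReadingWriter` class, and the modelling of the Run stream as a list of `(event_type, run_id)` tuples. Closing the logbook when the Run becomes terminal (`RunReadingLogbookClosed`) is not modelled.

```python
import uuid
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class ReadingRow:
    """Subset of a reading row, plus the audit ids named above."""

    event_id: uuid.UUID
    correlation_id: uuid.UUID
    causation_id: uuid.UUID
    run_id: uuid.UUID
    logbook_id: uuid.UUID
    channel_name: str
    value: float


class ReadingStore(Protocol):
    """Hypothetical shape of the ReadingStore port."""

    def insert(self, row: ReadingRow) -> None: ...


@dataclass
class InMemoryReadingStore:
    rows: list[ReadingRow] = field(default_factory=list)

    def insert(self, row: ReadingRow) -> None:
        self.rows.append(row)


@dataclass
class ReadingWriter:
    store: ReadingStore
    run_stream: list[tuple[str, uuid.UUID]] = field(default_factory=list)
    logbooks: dict[uuid.UUID, uuid.UUID] = field(default_factory=dict)  # run_id -> logbook_id

    def append(self, run_id: uuid.UUID, channel_name: str, value: float,
               correlation_id: uuid.UUID, causation_id: uuid.UUID) -> ReadingRow:
        logbook_id = self.logbooks.get(run_id)
        if logbook_id is None:
            # First reading for this Run: open the logbook lazily with one Run-stream event.
            logbook_id = self.logbooks[run_id] = uuid.uuid4()
            self.run_stream.append(("RunReadingLogbookOpened", run_id))
        row = ReadingRow(
            event_id=uuid.uuid4(),
            correlation_id=correlation_id,
            causation_id=causation_id,
            run_id=run_id,
            logbook_id=logbook_id,
            channel_name=channel_name,
            value=value,
        )
        # Direct write to the logbook table: no per-row event on the Run stream.
        self.store.insert(row)
        return row
```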
## Slices
| Command | Category | REST | MCP tool | Idempotency |
|---|---|---|---|---|
| `StartRun` | NEW | `POST /runs` | `start_run` | required |
| `HoldRun` | MODIFIED | `POST /runs/{run_id}/hold` | `hold_run` | none |
| `ResumeRun` | MODIFIED | `POST /runs/{run_id}/resume` | `resume_run` | none |
| `CompleteRun` | MODIFIED | `POST /runs/{run_id}/complete` | `complete_run` | none |
| `AbortRun` | MODIFIED | `POST /runs/{run_id}/abort` | `abort_run` | none |
| `StopRun` | MODIFIED | `POST /runs/{run_id}/stop` | `stop_run` | none |
| `TruncateRun` | MODIFIED | `POST /runs/{run_id}/truncate` | `truncate_run` | none |
| `AdjustRun` | MODIFIED | `POST /runs/{run_id}/adjust` | `adjust_run` | required |
| `AppendRunReading` | MODIFIED | `POST /runs/{run_id}/readings` | `append_run_reading` | none |
| `GetRun` | QUERY | `GET /runs/{run_id}` | `get_run` | none |
| `ListRuns` | QUERY | `GET /runs` | `list_runs` | none |
**Errors per slice.** Beyond Pydantic boundary 422s, each slice raises:

- `StartRun`: `RunAlreadyExists`, `InvalidRunName`, `PlanNotFound`, `PlanDeprecated`, `SubjectNotMountable`, `RunAssetDecommissioned`, `RunCapabilitiesNotSatisfied`, `RunRequiresActiveClearance`, `RunClearanceCoverageMismatch`, `RunCannotJoinCampaign`, `InvalidRunExternalRef`, `InvalidRunParameters`, `InvalidPinnedCalibrations`, `Unauthorized`
- `HoldRun` / `ResumeRun` / `CompleteRun`: `RunNotFound`, `RunCannot{Hold,Resume,Complete}`, `Unauthorized`
- `AbortRun` / `StopRun`: `RunNotFound`, `RunCannot{Abort,Stop}`, `InvalidRun{Abort,Stop}Reason`, `Unauthorized`
- `TruncateRun`: `RunNotFound`, `RunCannotTruncate`, `InvalidRunTruncateReason`, `InvalidRunInterruptedAt`, `Unauthorized`
- `AdjustRun`: `RunNotFound`, `RunCannotAdjust`, `InvalidRunAdjustPatch`, `InvalidRunAdjustSchema`, `InvalidRunAdjustReason`, `Unauthorized`
- `AppendRunReading`: `RunNotFound`, `RunReadingLogbookClosed`, `InvalidChannelName`, `InvalidReadingValue`, `InvalidSamplingProcedure`, `Unauthorized`
- `GetRun`: `RunNotFound`
- `ListRuns`: (boundary 422 only)
StartRun and AdjustRun are wrapped by the Idempotency-Key header pattern for safe operator retry. The terminal and pause transitions are strict-not-idempotent: a second complete_run against an already-Completed Run raises RunCannotComplete, not a silent no-op, so the audit log never carries a "did nothing" entry.
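Below is a minimal in-memory sketch of the Idempotency-Key pattern as applied to `StartRun` and `AdjustRun`. The first request under a key runs the handler and stores its response, and a retry with the same key and body replays that response. Rejecting a reused key that arrives with a different body is an assumption (a common convention), as are the class and error names. Concurrent duplicates are not handled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class IdempotencyKeyReused(Exception):
    """Hypothetical: the same key was replayed with a different request body."""


@dataclass
class IdempotencyCache:
    """In-memory sketch; a real store would be durable and expire old keys."""

    entries: dict[str, tuple[Any, Any]] = field(default_factory=dict)

    def run_once(self, key: str, request: Any, handler: Callable[[Any], Any]) -> Any:
        if key in self.entries:
            stored_request, stored_response = self.entries[key]
            if stored_request != request:
                raise IdempotencyKeyReused(key)
            # Safe retry: replay the first response instead of re-running the handler.
            return stored_response
        response = handler(request)
        # Only successful results are cached; a handler exception propagates uncached.
        self.entries[key] = (request, response)
        return response
```

Terminal and pause transitions deliberately bypass a wrapper like this. A repeated `complete_run` therefore reaches the FSM and raises, rather than replaying a stored success.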
AdjustRun is the mid-flight parameter steering slice. Its source-state guard is {Running, Held} and its scope is strictly the parameter merge: Subject, Plan, Asset, and Method changes still require abort-and-restart by design. The optional decided_by_decision_id field links a steering action to the Decision that justified it; this maps to the PROV-O wasInformedBy relationship at the export adapter.
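`RunAdjusted` carries an RFC 7396 JSON Merge Patch, and the RFC's merge algorithm is short enough to sketch directly. The `adjust_parameters` wrapper, its state guard, and `RunCannotAdjust` as a local class are illustrative. Re-validating the merged result against the Method schema (`InvalidRunAdjustSchema`) is left out.

```python
import copy
from typing import Any


class RunCannotAdjust(Exception):
    """Local stand-in for the AdjustRun error of the same name."""


def merge_patch(target: Any, patch: Any) -> Any:
    """JSON Merge Patch (RFC 7396), returning a new value instead of mutating target."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            # JSON null deletes the member (a no-op if it is absent).
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


def adjust_parameters(status: str, effective: dict, patch: dict) -> tuple[dict, dict]:
    """Return (parameter_patch, new effective_parameters), as RunAdjusted carries both."""
    if status not in {"Running", "Held"}:
        raise RunCannotAdjust(status)
    return patch, merge_patch(effective, patch)
```

Returning a new dict leaves the previous `effective_parameters` snapshot untouched. That matters when the patch and the post-merge snapshot are both recorded.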
## Storage & Projections
Two read-side tables back the Run module.
proj_run_summary is the per-Run summary projection. One row per Run, updated as the FSM advances:
```sql
CREATE TABLE proj_run_summary (
    run_id      UUID PRIMARY KEY,
    name        TEXT NOT NULL,
    plan_id     UUID NOT NULL,
    subject_id  UUID,
    raid        TEXT,
    status      TEXT NOT NULL CHECK (
        status IN ('Running', 'Held', 'Completed',
                   'Aborted', 'Stopped', 'Truncated')
    ),
    created_at  TIMESTAMPTZ NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
The CHECK constraint encodes the closed RunStatus enum at the row level. GET /runs/{id} reads from this projection (with fold-on-read fallback for fields not yet projected); GET /runs reads exclusively from this projection with keyset pagination over (created_at, run_id) and additive filters.
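Keyset pagination over `(created_at, run_id)` can be sketched with SQLite standing in for Postgres. Both support row-value comparisons of the form `(a, b) > (x, y)`; SQLite has done so since version 3.15. The function name, the cursor shape, the ascending sort order, and the reduced three-column table are assumptions.

```python
import sqlite3
from typing import Optional


def list_runs_page(
    conn: sqlite3.Connection,
    limit: int,
    after: Optional[tuple[str, str]] = None,
    status: Optional[str] = None,
) -> tuple[list[tuple], Optional[tuple[str, str]]]:
    """Fetch one page ordered by (created_at, run_id).

    `after` is the (created_at, run_id) of the last row on the previous page.
    """
    clauses, params = [], []
    if status is not None:
        # Additive filter: narrows the result without changing the sort key.
        clauses.append("status = ?")
        params.append(status)
    if after is not None:
        # Strictly after the cursor; ties on created_at are broken by run_id.
        clauses.append("(created_at, run_id) > (?, ?)")
        params.extend(after)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        f"SELECT run_id, created_at, status FROM proj_run_summary {where} "
        "ORDER BY created_at, run_id LIMIT ?"
    )
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    # The next cursor is the sort key of the last row, if the page came back full.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

The cursor is simply the sort key of the last row returned. Given an index on `(created_at, run_id)`, each page is a bounded range scan no matter how deep the client paginates. A full final page yields a cursor whose next page is empty.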
entries_run_readings is the polymorphic per-Run reading logbook. One row per reading; the sampling_procedure column carries the SOSA-aligned discriminator (baseline for snapshots at Run boundaries; monitor for sub-Hz time-series during a Run). Defense-in-depth: NaN and Infinity are rejected at three layers (Pydantic at the API boundary, the in-decider InvalidReadingValueError, and a Postgres CHECK constraint on value).
```sql
CREATE TABLE entries_run_readings (
    event_id            UUID PRIMARY KEY,
    run_id              UUID NOT NULL,
    logbook_id          UUID NOT NULL,
    actor_id            UUID NOT NULL,
    command_name        TEXT NOT NULL,
    channel_name        TEXT NOT NULL CHECK (length(channel_name) BETWEEN 1 AND 255),
    value               DOUBLE PRECISION NOT NULL CHECK (
        -- Postgres treats NaN as equal to NaN, so `value = value` would not
        -- reject it; compare against each special value explicitly instead.
        value <> 'NaN'::DOUBLE PRECISION
        AND value <> 'Infinity'::DOUBLE PRECISION
        AND value <> '-Infinity'::DOUBLE PRECISION
    ),
    units               TEXT CHECK (units IS NULL OR length(units) <= 64),
    sampling_procedure  TEXT NOT NULL,
    sampled_at          TIMESTAMPTZ NOT NULL,
    occurred_at         TIMESTAMPTZ NOT NULL,
    recorded_at         TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
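The decider layer, the second of the three defenses, can be sketched with `math.isfinite`, which returns `False` for NaN and for both infinities. `check_reading_value` and the local `InvalidReadingValue` class are hypothetical stand-ins for the decider's `InvalidReadingValueError`.

```python
import math


class InvalidReadingValue(ValueError):
    """Local stand-in for the decider's InvalidReadingValueError."""


def check_reading_value(value: float) -> float:
    """Reject NaN and +/-Infinity; return the value as a float otherwise."""
    # math.isfinite(x) is False exactly for nan, inf and -inf.
    if not math.isfinite(value):
        raise InvalidReadingValue(f"reading value must be finite, got {value!r}")
    return float(value)
```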
The three timestamps each carry distinct meaning:

- `sampled_at` is the SOSA `phenomenonTime`: when the sensor captured the value.
- `occurred_at` is when the handler appended the row.
- `recorded_at` is when Postgres wrote the row.
Clock skew between the sensor (sampled_at) and the handler (occurred_at) is real and expected; the three timestamps preserve all three observations rather than collapsing them.
## Cross-Module Boundaries
| Module | Relationship | What's exchanged |
|---|---|---|
| Trust | gated-by | Every write-side Run slice is gated by the Authorize port resolving a Policy for the (principal, command, conduit, surface) tuple; deny outcomes refuse before the decider runs |
| Recipe | reads-from | Loads Plan for status and asset_ids; walks Plan → Practice → Method for parameters_schema to validate effective_parameters (strict by default) |
| Subject | reads-from | Loads Subject (when subject_id set) to enforce the Mounted-or-Measured guard at start |
| Equipment | reads-from | Loads each bound Asset to re-validate the family superset against the Method's needed families (drift is real; the Plan-bind snapshot is not trusted at start) |
| Safety | reads-from | ClearanceLookup.find_referencing_run(run_id, subject_id, asset_ids) returns clearances whose bindings cover the Run scope; ≥1 must be Active |
| Caution | reads-from | CautionLookup returns Active Cautions for the Run scope; non-blocking, surfaced as a banner on the response, never refuses start |
| Campaign | shared-id-with | Run.campaign_id (single-Campaign-per-Run invariant); the post-hoc add_run_to_campaign / remove_run_from_campaign slices are owned by the Campaign module and atomically write RunCampaignAssigned / RunCampaignUnassigned plus the Campaign-side membership event via EventStore.append_streams |
| Decision | shared-id-with | RunAdjusted.decided_by_decision_id cites the Decision that justified a mid-flight adjustment; no existence check at write time (eventual-consistency stance) |
| Calibration | reads-from | Run.pinned_calibrations is a frozen set of CalibrationRevision.ids captured at start_run and immutable for the life of the Run; every FSM transition preserves the set verbatim, and downstream consumers cite this set to answer "what calibration was this scan acquired against?" deterministically |
| Agent | writes-to | Terminal Run events (RunCompleted, RunAborted, RunStopped, RunTruncated) are subscribed by the RunDebrief agent, which emits an advisory Decision per terminal Run |
Plan, Subject, Asset, Campaign, Clearance, and Calibration references are validated at handler load-time but treated as opaque by the decider; the decider operates on pre-loaded context bundles rather than re-fetching, which keeps the pure-decider boundary clean.
## Examples
The four examples below follow the happy path for one Run: start it, steer it mid-flight, append a sensor reading, end it. For the REST/MCP equivalence, auth, and idempotency conventions these examples share, see Reading the examples on the Modules landing page.
### Start a Run with operator parameter overrides
```http
POST /runs
Content-Type: application/json
Idempotency-Key: 9f6a3b1c-8e2d-4f5a-9b8c-1d2e3f4a5b6c
X-Principal-Id: 11111111-2222-3333-4444-555555555555

{
  "name": "2-BM continuous-rotation acquisition (Subject sn-2024-038)",
  "plan_id": "12345678-1234-1234-1234-123456789abc",
  "subject_id": "abcdef12-3456-7890-abcd-ef1234567890",
  "override_parameters": {
    "rotation_speed_deg_per_s": 0.5,
    "exposure_time_ms": 50
  },
  "triggered_by": "operator:opid:42",
  "pinned_calibrations": [
    "cal-rev-aaaa1111-2222-3333-4444-555555555555",
    "cal-rev-bbbb2222-3333-4444-5555-666666666666"
  ]
}
```
A successful call returns 201 Created with the newly-assigned run_id and the resolved effective_parameters (Plan defaults merged with the overrides above).
```python
mcp.call_tool(
    "start_run",
    {
        "name": "2-BM continuous-rotation acquisition (Subject sn-2024-038)",
        "plan_id": "12345678-1234-1234-1234-123456789abc",
        "subject_id": "abcdef12-3456-7890-abcd-ef1234567890",
        "override_parameters": {
            "rotation_speed_deg_per_s": 0.5,
            "exposure_time_ms": 50,
        },
        "triggered_by": "operator:opid:42",
        "pinned_calibrations": [
            "cal-rev-aaaa1111-2222-3333-4444-555555555555",
            "cal-rev-bbbb2222-3333-4444-5555-666666666666",
        ],
    },
)
```
Returns the same run_id + effective_parameters shape as the REST call.
### Mid-flight parameter steering
```http
POST /runs/9f6a3b1c-8e2d-4f5a-9b8c-1d2e3f4a5b6c/adjust
Content-Type: application/json
Idempotency-Key: 7c8d9e0f-1a2b-3c4d-5e6f-7a8b9c0d1e2f
X-Principal-Id: 11111111-2222-3333-4444-555555555555

{
  "parameter_patch": {
    "exposure_time_ms": 75
  },
  "reason": "increased exposure to recover signal after detector temperature drift",
  "decided_by_decision_id": "decision-aaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
}
```
The response carries the post-merge effective_parameters so the caller can confirm what the Run is now executing against.
### Append a baseline reading
The first reading per Run lazily opens the reading logbook (one RunReadingLogbookOpened event on the Run stream); subsequent readings write directly to entries_run_readings with no per-row event.
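This page does not show the request body for this slice. The sketch below builds a request, without sending it, on the assumption that the body mirrors the `RunReading` fields. The base URL, the helper name, the exact JSON field names, and the restriction to the two `sampling_procedure` values named on this page are all assumptions. No `Idempotency-Key` is set, because `AppendRunReading` does not require one.

```python
import json
import math
import urllib.request
from datetime import datetime, timezone
from typing import Optional

BASE_URL = "https://cora.example.org"  # hypothetical deployment URL
PRINCIPAL_ID = "11111111-2222-3333-4444-555555555555"


def build_reading_request(
    run_id: str,
    channel_name: str,
    value: float,
    units: str,
    sampling_procedure: str = "baseline",
    sampled_at: Optional[datetime] = None,
) -> urllib.request.Request:
    if not math.isfinite(value):
        # NaN/Infinity are rejected server-side anyway; fail fast on the client.
        raise ValueError("reading value must be finite")
    if sampling_procedure not in {"baseline", "monitor"}:
        raise ValueError(f"unknown sampling_procedure: {sampling_procedure}")
    body = {
        "channel_name": channel_name,
        "value": value,
        "units": units,
        "sampling_procedure": sampling_procedure,
        # Sensor-side time (SOSA phenomenonTime), not the time of sending.
        "sampled_at": (sampled_at or datetime.now(timezone.utc)).isoformat(),
    }
    return urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}/readings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Principal-Id": PRINCIPAL_ID},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it.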
### Terminate the Run
A second complete against the same Run returns 409 RunCannotComplete rather than a silent no-op. Terminal transitions are strict-not-idempotent, so the audit log never carries a "did nothing" entry.
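Below is a sketch of the terminate step using only the standard library. It treats 409 as an expected outcome rather than a transport failure. The base URL and helper names are hypothetical. Only the endpoint path, the `X-Principal-Id` header, and the 409 `RunCannotComplete` response come from this page.

```python
import urllib.error
import urllib.request

BASE_URL = "https://cora.example.org"  # hypothetical deployment URL


def build_complete_request(run_id: str, principal_id: str) -> urllib.request.Request:
    # No Idempotency-Key: terminal transitions are strict-not-idempotent.
    return urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}/complete",
        headers={"X-Principal-Id": principal_id},
        method="POST",
    )


def complete_run(run_id: str, principal_id: str) -> str:
    """Send complete; report a 409 as a rejection instead of treating it as success."""
    try:
        with urllib.request.urlopen(build_complete_request(run_id, principal_id)) as resp:
            return f"completed ({resp.status})"
    except urllib.error.HTTPError as err:
        if err.code == 409:
            # RunCannotComplete: the Run is not Running (for example, already completed).
            return "rejected: RunCannotComplete"
        raise
```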