Where CORA Fits: a system of record above the floor it stands on

Stewart Brand, describing how complex systems survive, compressed the whole idea into four words: "Fast learns, slow remembers." A healthy system is layered by speed. Its fast layers move, propose, and innovate in the moment; its slow layers hold steady underneath and keep the memory. A beamline is almost all fast layer. The motors move, the detector reads out, the control system acts, and the data lands, all at once. What it has rarely had is a slow layer, the one whose only job is to remember, faithfully and for a long time, what the fast layers did and why.

Ask a beamline, six months after a measurement, why a particular run used that center of rotation, who approved the energy it ran at, under which recipe, and whether it can be replayed exactly. The answer is usually true but scattered: partly in an EPICS archive, partly in an HDF5 file, partly in a logbook entry, partly in the memory of whoever was on shift. Each piece is real. None of them is the record. CORA exists to be the slow layer that turns those scattered pieces into one.

This post is about where that slow layer sits relative to the software a beamline already runs, because the honest answer is that most of that software is not a competitor at all. The useful map here is not a ranking but an altitude, a stratification by speed rather than by worth. Some tools live below CORA, on the fast control and acquisition floor, and CORA stands on them rather than against them. Others have solved a version of the same remembering problem in a different domain, and CORA borrows from them while aiming at a place none of them targets: one-off, exploratory beamline science. CORA's own positioning draws the line first, so we will start there: the fast floor is never CORA's to replace.

The floor CORA stands on, and never replaces

The deterministic real-time layer is deliberately out of scope. EPICS and Tango, the two control-system middlewares most synchrotrons are built on, together with the motion controllers beneath them, run the servo loops, the position-synchronized triggers, and the sub-millisecond timing a measurement depends on. These are the layers that propose and act. CORA runs at decision-grade latency, the pace of a choice rather than of a servo cycle, and never sits inside that loop. Where a facility chooses, CORA can drive operations over EPICS through optional adapters, but at the 2-BM pilot that execution edge is exploratory, and the default posture is that the facility's own fast tools stay exactly where they are.

The same holds going up from the metal. Acquisition tools such as TomoScan orchestrate the scan and produce the frames; reconstruction and analysis tools such as TomoPy, the ASTRA Toolbox, and tomocupy turn those frames into volumes; the bytes themselves live in HDF5, usually under a community layout such as NeXus with its per-technique application definitions or the synchrotron Data Exchange schema, read and written by libraries like DXchange, and move between facilities through services such as Globus. CORA stores none of that. It does not own a dataset, a group inside an HDF5 file, a PV snapshot, or the e-logbook. What it records is the work that produced them: the run that an acquisition tool executed, the recipe it ran under, the decisions that shaped it, and a reference to the dataset that came out, with its checksum and where it went.
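To make the split concrete, here is a minimal sketch of a record that points at a dataset without holding it. The class names, field names, identifiers, and the example path are all hypothetical, chosen for illustration; they are not CORA's actual schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """A pointer to bytes stored elsewhere (hypothetical shape)."""
    uri: str          # where the file went
    sha256: str       # checksum of the bytes at recording time
    size_bytes: int


@dataclass(frozen=True)
class RunRecord:
    """The work that produced the data, not the data (hypothetical shape)."""
    run_id: str
    recipe_id: str
    executed_by: str  # the acquisition tool that ran the scan
    decisions: tuple  # simplified here to plain reason strings
    dataset: DatasetRef


def reference_for(payload: bytes, uri: str) -> DatasetRef:
    """Build a reference from the bytes without keeping the bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    return DatasetRef(uri=uri, sha256=digest, size_bytes=len(payload))


frames = b"\x00\x01\x02\x03"  # stand-in for an HDF5 file's contents
record = RunRecord(
    run_id="run-0001",
    recipe_id="recipe-tomo-example",
    executed_by="TomoScan",
    decisions=("illustrative reason for a chosen setting",),
    dataset=reference_for(frames, "file:///data/example/run-0001.h5"),
)
print(record.dataset.uri, record.dataset.size_bytes)
```

The checksum is what lets the record stay honest about a file it never owned: if the bytes at that location later change, the reference no longer matches.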

Broader frameworks such as Bluesky, Sardana (built on Tango), and NICOS sit in this tier too, and Bluesky deserves a careful word, because it reaches the highest. Bluesky is a set of Python libraries that keep scientific logic separate from hardware: Ophyd abstracts the devices, the RunEngine orchestrates the scan, and Tiled serves the data, with rich metadata capture built in. That is real overlap with the upper edge of what CORA can do. The distinction is one of intent rather than quality: Bluesky presents itself as an acquisition and orchestration framework, a fast layer that proposes and runs, not as a durable, governed, cross-facility record with an immutable audit trail. CORA integrates with that layer rather than competing to drive scans; the part it claims is the slow layer above it.

What industry already solved, in another domain

The idea of a layered recipe is not new, and CORA does not pretend to have invented it. Industrial batch manufacturing has run on it for decades. ISA-88, adopted internationally as IEC 61512, is a design philosophy for describing equipment and procedures, and it defines a ladder of recipes that runs from a general, equipment-independent description down to the control recipe governing one specific batch. Manufacturing Execution Systems, the layer between enterprise planning and process control, execute that ladder in production: they manage master recipes, issue electronic work instructions, trace genealogy, and create the as-built record that regulated industries such as pharmaceuticals depend on. Industry, in fact, solved the record as much as the recipe: under data-integrity principles such as ALCOA+ and regulations such as the FDA's 21 CFR Part 11, changes to regulated records are time-stamped, attributed, and preserved, because an inspector may ask years later what happened and who signed for it. An append-only, auditable account is not exotic there; it is mandatory.

This is genuine prior art, and CORA borrows from it openly. CORA's recipe ladder takes its shape, and even its equipment-tier vocabulary, from ISA-88, so a process engineer recognizes the model on sight. What differs is the domain, and the difference is exactly what the slow layer is asked to remember. A manufacturing line optimizes for conformance: the master recipe is assumed correct, and the job is to reproduce it with as little deviation as possible, where a deviation is an exception to be closed. A record built only to confirm that reality matched the plan is, in the end, a record you do not need. A beamline experiment is the opposite case. The plan is a hypothesis, the deviation is often the finding, and the tacit knowledge an operator brings is the point rather than noise. CORA keeps the ladder and reframes the bottom of it: a Run is not a conformance certificate but a record of what actually happened, including the distance between plan and reality.
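The reframing can be sketched in a few lines. In this sketch a run keeps the plan, what actually ran, and the reason for each difference, instead of collapsing them into a pass or fail. The parameter names, values, and the `Deviation` type are illustrative assumptions, not CORA's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    parameter: str
    planned: object
    actual: object
    reason: str  # why the operator or agent departed from the plan


def deviations(planned: dict, actual: dict, reasons: dict) -> list:
    """List every parameter whose actual value differs from the plan."""
    found = []
    for name in sorted(planned.keys() | actual.keys()):
        p, a = planned.get(name), actual.get(name)
        if p != a:
            found.append(Deviation(name, p, a, reasons.get(name, "unrecorded")))
    return found


# Hypothetical values for illustration only.
planned = {"energy_keV": 25.0, "exposure_s": 0.10, "n_projections": 1500}
actual = {"energy_keV": 25.0, "exposure_s": 0.15, "n_projections": 1500}
reasons = {"exposure_s": "sample absorbed more than expected"}

for d in deviations(planned, actual, reasons):
    print(f"{d.parameter}: planned {d.planned}, ran {d.actual} ({d.reason})")
```

A conformance system would treat the non-empty list as an exception to close. Here it is the content of the record, and a deviation with no stated reason is surfaced as "unrecorded" rather than silently dropped.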

ISA-88 is only the most visible borrowing. CORA's habit is to reach for an established lens wherever one already fits, rather than invent vocabulary of its own: ISA-95 for the way a facility's scope nests, ISA-106 for the operational procedures that surround a measurement, and ISA-99 (also published as IEC 62443) for the zones-and-conduits shape of its trust model. The aim is not compliance for its own sake but recognizability. An engineer who knows the manufacturing world should find CORA's bones familiar, even though the body is built for one-off science rather than repeatable production.

The scientific record surfaces

Closer to home, several kinds of tool already hold parts of the scientific record, and CORA is complementary to all of them. An electronic lab notebook captures research and procedures as a flexible, searchable document, but it is a documentation surface: what is written down, and when, is at the author's discretion, and there is no formal binding from a technique to the run that executed it. A data catalog such as SciCat indexes datasets and their metadata so the community can find them, but it catalogs the artifacts rather than modeling the reasoned process that produced them. It answers where the data is, not why this run used that setting and who approved it.

Provenance standards sit one level further out, and CORA treats them as something to emit, not to compete with. PROV-O, a W3C Recommendation frozen in 2013, gives a vocabulary of Entities, Activities, and Agents for describing how something came to be; RO-Crate packages data together with its metadata; DataCite registers persistent identifiers and metadata for research outputs. These are vocabularies, packages, and registries, not running systems: PROV-O can describe provenance but cannot capture or enforce it. A vocabulary is a way to write the memory down; it is not the layer that does the remembering. CORA is the runtime that produces a PROV-O-aligned account in the first place, and a catalog or registry is a natural downstream consumer of what it records.
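As a rough illustration of what emitting a PROV-O-aligned account could look like, the sketch below maps one run onto the Entity, Activity, and Agent triad as a JSON-LD-style dictionary. The `prov:` terms are from the W3C vocabulary; the function name, the `urn:example:` identifiers, and the mapping itself are assumptions made for this example, and timestamps are left as plain strings for brevity.

```python
import json

PROV = "http://www.w3.org/ns/prov#"


def run_to_prov(run_id, recipe_id, operator_id, dataset_uri, started, ended):
    """Describe one run with PROV-O terms (hypothetical mapping)."""
    return {
        "@context": {"prov": PROV},
        "@graph": [
            {"@id": operator_id, "@type": "prov:Agent"},
            {"@id": recipe_id, "@type": "prov:Entity"},
            {
                "@id": run_id,
                "@type": "prov:Activity",
                "prov:startedAtTime": started,
                "prov:endedAtTime": ended,
                "prov:used": {"@id": recipe_id},
                "prov:wasAssociatedWith": {"@id": operator_id},
            },
            {
                "@id": dataset_uri,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": run_id},
                "prov:wasAttributedTo": {"@id": operator_id},
            },
        ],
    }


doc = run_to_prov(
    run_id="urn:example:run/0001",
    recipe_id="urn:example:recipe/tomo-example",
    operator_id="urn:example:operator/alice",
    dataset_uri="urn:example:dataset/run-0001.h5",
    started="2024-01-01T10:00:00Z",
    ended="2024-01-01T10:20:00Z",
)
print(json.dumps(doc, indent=2))
```

The vocabulary says how to write the relationships down; something still has to observe the run and fill them in, which is the runtime's job.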

The autonomous turn: who answers for the agent

A newer axis is rising quickly, and it is the one CORA's design is most pointed at. Beamlines are increasingly steered by software that decides what to measure next. bluesky-adaptive offers a bring-your-own-agent framework for putting an algorithm in the loop, from a simple rule to a machine-learning model, and arbitrating its suggestions before they reach the queue; engines such as gpCAM supply the optimization itself, using Gaussian-process models to choose the next point under uncertainty. This work is real and valuable, and CORA does none of it. It builds no optimizer and proposes no measurement.

What CORA provides is the part these tools leave open: it makes the agent accountable. An adaptive framework arbitrates suggestions at runtime, but it does not carry a durable identity for each agent, an authorization model for what that agent may do, a budget it must stay within, or an immutable record of every decision it made and why. CORA treats an agent as a principal, under the same identity and authorization model as a human operator, and writes each decision the agent takes as an event with its reason and evidence attached. That instinct, giving an automated actor an identity, a bounded authority, and an accountable record, is the same one behind emerging AI-governance frameworks such as ISO/IEC 42001 and the NIST AI Risk Management Framework; CORA applies it at the bench. The division is clean: the adaptive layer decides what to measure next; CORA is where that choice becomes accountable, replayable, and auditable long after the beam is off. As autonomous experiments become ordinary, the pressing question stops being whether a machine can choose and becomes who answers for what it chose, which is precisely the question CORA is built to settle.
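A minimal sketch of that idea, under stated assumptions: the `Principal`, `DecisionEvent`, and `DecisionLog` names, the action strings, and the integer budget are all hypothetical, and a real system would persist events rather than hold them in a list. The point is only that humans and agents pass through the same check, and that refusals are recorded as carefully as acceptances.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Principal:
    principal_id: str
    kind: str                   # "human" or "agent"; both use the same checks
    allowed_actions: frozenset
    budget_remaining: int       # e.g. measurements it may still request


@dataclass(frozen=True)
class DecisionEvent:
    principal_id: str
    action: str
    outcome: str
    reason: str
    evidence: tuple
    at: str


@dataclass
class DecisionLog:
    events: list = field(default_factory=list)

    def decide(self, who, action, cost, reason, evidence=()):
        """Authorize, charge the budget, and record the decision either way."""
        if action not in who.allowed_actions:
            outcome = "refused: not authorized"
        elif cost > who.budget_remaining:
            outcome = "refused: over budget"
        else:
            who.budget_remaining -= cost
            outcome = "accepted"
        stamp = datetime.now(timezone.utc).isoformat()
        event = DecisionEvent(who.principal_id, action, outcome, reason, tuple(evidence), stamp)
        self.events.append(event)  # refusals are part of the record too
        return event


log = DecisionLog()
agent = Principal("agent:optimizer-1", "agent", frozenset({"propose_measurement"}), 2)

first = log.decide(agent, "propose_measurement", 1, "highest predicted uncertainty", ("variance map",))
second = log.decide(agent, "move_motor", 1, "recenter sample")
third = log.decide(agent, "propose_measurement", 5, "dense follow-up grid")

for e in log.events:
    print(e.principal_id, e.action, e.outcome)
```

Nothing here chooses what to measure. The optimizer would live elsewhere and call `decide` with its suggestion, its reason, and whatever evidence it can attach.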

What CORA adds at this altitude

Put the map together and a gap appears that none of these tools is trying to fill. The control floor produces data but keeps no reasoned record. The manufacturing world keeps an excellent record, but for repeatable production rather than exploratory science. The notebooks and catalogs document and index, but do not model the path from technique to execution. The autonomous layer decides what to measure next, but does not answer for the decision. The fast layers, in Brand's phrase, get all the attention; the slow layer is the one that quietly holds the power to say, long afterward, what was done and why, and on a beamline that layer was simply missing.

CORA's contribution is to bring a durable, governed, replayable record to one-off beamline science, through a small number of deliberate choices: a recipe ladder that keeps a technique portable across facilities while capturing local know-how; decisions, by people and by agents alike, recorded with their reason and evidence at the moment they are made rather than narrated afterward; agents treated as principals, under the same identity and authorization model as people, whether they act through the web API or the agent tool protocol; and a history replayable from a Postgres event log alone, because immutability is enforced at the storage layer rather than asked for in application code. None of the individual ingredients is unprecedented. The synthesis, aimed at this particular altitude, is what is unoccupied.
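The last of those choices, storage-level immutability plus replay, can be illustrated with a small self-contained sketch. It uses SQLite from the Python standard library as a stand-in so the example runs anywhere; the post describes a Postgres event log, and the table layout, trigger names, and event kinds below are assumptions for illustration, not CORA's schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    stream  TEXT NOT NULL,
    kind    TEXT NOT NULL,
    payload TEXT NOT NULL
);
-- The database itself refuses edits, whatever the application code does.
CREATE TRIGGER events_no_update BEFORE UPDATE ON events
BEGIN SELECT RAISE(ABORT, 'events are append-only'); END;
CREATE TRIGGER events_no_delete BEFORE DELETE ON events
BEGIN SELECT RAISE(ABORT, 'events are append-only'); END;
""")


def append(stream, kind, payload):
    conn.execute(
        "INSERT INTO events (stream, kind, payload) VALUES (?, ?, ?)",
        (stream, kind, json.dumps(payload)),
    )


def replay(stream):
    """Rebuild current state by folding the stream's events in order."""
    state = {}
    rows = conn.execute(
        "SELECT kind, payload FROM events WHERE stream = ? ORDER BY seq", (stream,)
    )
    for kind, payload in rows:
        data = json.loads(payload)
        if kind == "RunStarted":
            state = {"status": "running", **data}
        elif kind == "ParameterChanged":
            state[data["name"]] = data["value"]
        elif kind == "RunCompleted":
            state["status"] = "completed"
    return state


append("run-0001", "RunStarted", {"energy_keV": 25.0})
append("run-0001", "ParameterChanged", {"name": "exposure_s", "value": 0.15})
append("run-0001", "RunCompleted", {})

try:
    conn.execute("UPDATE events SET payload = '{}' WHERE seq = 1")
    tampered = True
except sqlite3.DatabaseError:
    tampered = False  # the trigger rejected the rewrite

print(replay("run-0001"), "tampered:", tampered)
```

Because state is derived by folding the log, there is no second copy of the truth to drift out of sync, and because the rejection lives in the database, a bug or a shortcut in application code cannot quietly rewrite history.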

Honest edges

It would be dishonest to stop there. CORA is a pre-1.0 system and, at present, a single-developer research bet; the claims above are design claims grounded in a working codebase, not the report of a fleet in production. The cross-facility portability of the recipe ladder, the part that would let a Method published at one facility run unchanged at another, is a forward design supported by deterministic identifiers, not something proven across real facilities yet. The execution edge that drives EPICS is exploratory at 2-BM. And several of the standards CORA expects to align with, from instrument identifiers to a signed-event format, are on the horizon rather than landed. The non-competition stance is partly a statement of values and partly a statement of scope: there is a great deal CORA does not do, on purpose.

Where this goes

The 2-BM beamline is the grounding corpus for all of this: a real instrument, with real runs, that keeps the design honest. Fast learns, slow remembers, and on a beamline the slow layer is the one that has been missing. The invitation, to facilities that might host a future pilot and to researchers building autonomous agents that need a trustworthy place to act, is to treat CORA as the layer it is trying to be. Not the thing that drives the motors or stores the bytes, but the system that remembers, faithfully and for a long time, what was done and why.