A scientific operating substrate for turning activity into state, state into tasks, and tasks into action.
Science already has most of the parts. It has instruments, papers, datasets, code, peer review, clinical records, AI agents, cloud labs, funders, and expert communities. What it lacks is the drivetrain that lets one piece change the next.
Papers live in journals. Datasets live in repositories. Code lives in GitHub. Lab results live in notebooks, LIMS, cloud-lab dashboards, clinical records, and instrument traces. Reviews live beside papers, in private emails, in grant panels, in Slack threads, and in the memories of people who move institutions. AI outputs live in chats. Hypotheses live in feeds. Experiments live in cloud labs. Funding decisions live in grant systems. Clinical reality lives somewhere else. The diagnosis sits beside older infrastructure efforts: FAIR data principles (Wilkinson et al., 2016), nanopublications (Groth et al., 2010), scientific workflow provenance (W3C PROV), and knowledge systems such as the Open Research Knowledge Graph. The point here is narrower: not better metadata alone, but a governed state transition.
The fragmentation is not only a tooling problem. It is a fragmentation of epistemic state. A paper can tell you what an author claimed. It does not, by itself, tell the next system what changed, where the change applies, what confidence moved, what contradiction appeared, what depends on the claim, who attested it, and what should be tested next.
This is why a scientist still assembles the car by hand. She searches the literature in one place, checks data in another, reads code in a third, reconstructs methods from a supplement, asks a colleague whether a failure was real, opens a model chat that will not remember the correction tomorrow, and then writes a narrative artifact that another person has to reverse-engineer later. Wrong trial assumptions continue. Failed experiments repeat. Funders buy isolated reports. Patients wait while updates remain trapped in local memory. The system contains intelligence and labor. It does not contain a shared transition object.
The future engine for science is not one app. It is a unified, open, state-centered operating substrate with modular products on top.
The closest business analogy is Rippling, but only as a structural analogy. Rippling works because the employee becomes a shared object across HR, IT, finance, payroll, identity, devices, permissions, workflows, and reporting. The power is not that one vendor made many apps. The power is that many operational surfaces read and write one underlying state. A permission change can move from HR to identity to devices to finance because the object underneath them is shared.
Science needs an equivalent shared object, but the object is not a person, a paper, a dataset, a project, a lab, an agent, or a grant. The shared object is the scientific state transition.
Fig. 01. The trilogy. The public frame is Record, Engine, Body. In the essay-world, the same movement is Sky, Engine, Body: the shared record, the transition loop, and the physical infrastructure that acts on it.
This page is the middle artifact. Constellations of Borrowed Light argues that science needs a shared record. The Terafactory Age asks whether the engine reaches the physical world as an open public body or as closed private bodies first. This page names the architecture between them.
The public structure is Record, Engine, Body. Inside the essay-world, it is Sky, Engine, Body.
The claim here is narrower and more mechanical than the vision around it: if scientific work is going to compound across humans, agents, world models, and labs, the basic operating unit has to change from an artifact to a governed state transition.
The engine is an operating loop, not a database.
The loop starts with a goal: cure a disease, prove a theorem, build a better material, explain a climate signal, identify a safety risk, or decide which experiment deserves scarce lab time. The goal is not a slogan; it is a pressure on the frontier. It determines which uncertainty matters enough to become work.
The goal becomes a frontier: what is known, what is unknown, what is contested, what depends on what, and which uncertainties are worth spending effort to reduce. A frontier becomes tasks. Tasks are assigned to humans, agents, models, reviewers, labs, funders, or institutions. Activity produces artifacts: papers, extractions, simulations, protocols, robot runs, code, clinical observations, field measurements, and reviews.
Activity is still not state. It passes through the drivetrain.
The reference implementation can have many named surfaces, but the loop itself is simple. Capture preserves the artifact and provenance. Compilation turns it into typed scientific objects. Diff exposes the proposed change. Attestation decides whether it merges. The event log records the accepted transition. The Atlas renders the new frontier. Registry and telemetry exist so the loop can be governed rather than merely executed.
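As a sketch only, the loop compresses to a few functions. Every name here is hypothetical, not a real Vela or Carina API, and the two-signature threshold is an arbitrary stand-in for real attestation rules:

```python
import hashlib
import json

EVENT_LOG = []   # append-only record of accepted transitions
ATLAS = {}       # materialized frontier: claim -> current state

def capture(artifact, provenance):
    """Capture: preserve the raw artifact together with where it came from."""
    return {"artifact": artifact, "provenance": provenance}

def compile_artifact(captured):
    """Compilation: turn a captured artifact into a typed proposal (a diff)."""
    return {
        "claim": captured["artifact"]["claim"],
        "proposed_state": captured["artifact"]["state"],
        "provenance": captured["provenance"],
    }

def attest(diff, signers):
    """Attestation: a diff merges only when enough recognized signers approve."""
    return len(signers) >= 2  # threshold is illustrative

def commit(diff, signers):
    """Event log + Atlas: record the accepted transition, render the new frontier."""
    event = {"diff": diff, "signers": signers}
    event["id"] = hashlib.sha256(
        json.dumps(event, sort_keys=True).encode()
    ).hexdigest()[:12]
    EVENT_LOG.append(event)
    ATLAS[diff["claim"]] = diff["proposed_state"]
    return event

captured = capture(
    {"claim": "bbb-transport", "state": "scoped"},
    provenance="illustrative-lab-notebook-trace",
)
diff = compile_artifact(captured)
if attest(diff, signers=["domain-reviewer", "clinical-statistician"]):
    commit(diff, signers=["domain-reviewer", "clinical-statistician"])
```

The point of the sketch is the ordering, not the code: nothing reaches the Atlas except through an attested commit.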
Fig. 05. The engine loop. The operating unit is not the paper. It is a governed transition: work proposes a change, review decides whether it should merge, the event updates the Atlas, and the next task is selected from the new state. (Illustrative trace: frontier.rank — one BBB finding spans incompatible cohorts, discord rises; diff.ready — APOE4+ subgroup split proposed, with a replication request and model recalibration; event.commit — Atlas update and task rerank.)
This is why the architecture is not simply a knowledge graph. A graph can store relationships. It does not, by itself, decide what should change next, who can propose it, who can merge it, what evidence is sufficient, or what physical action follows. A graph is a map. An engine needs a clutch.
It is also why the architecture is not simply an agent runtime. Agents can produce activity. They can extract, summarize, design, critique, and execute. Without a governed transition layer, their work becomes more output for another agent to summarize later.
The test is practical. A better database helps a lab find what happened. A better notebook helps a lab preserve what happened. A better agent helps a lab produce more things that happen. The engine is different only if it changes the next decision: a trial pauses, a replication is routed, a model recalibrates, a funder stops buying a fragile assumption, or a lab avoids repeating a failure another lab already paid for.
The engine closes the loop: goal to frontier, frontier to tasks, tasks to activity, and activity back to state.
The loop is the drivetrain. Every product in the ecosystem matters only insofar as it helps the loop run with more fidelity, more throughput, more legitimacy, or less waste.
The design pressure is that the loop must be mundane enough for ordinary work. A graduate student should be able to extract a method, an agent should be able to open a provenance audit, a reviewer should be able to sign a narrow correction, and a lab should be able to write back a failed run without turning the act into ceremonial publication. The engine is only real when the smallest transition can travel.
A discovery engine has four planes. The separation keeps the system from collapsing into metaphor.
Fig. 06. Four planes. State (Finding / Evidence / Frontier): what is known, contested, scoped, actionable. Control (ResearchTask / Queue / SafetyGate): what should happen next. Model (Simulation / Prediction / Calibration): what might happen if we act. Action (Protocol / RobotRun / LabResult): what touched reality. The architecture separates what is known, how work is coordinated, how outcomes are forecast, and what physically changes. The planes share one event spine, but each one has a different shape of work.
The state plane is what is currently known, unknown, contested, scoped, weakened, deprecated, and actionable. It contains Atlases, findings, evidence, contexts, confidence histories, provenance, discord, frontiers, and scientific commits. Its job is to make scientific state typed, replayable, attestable, and federated. If the state plane fails, the engine becomes another pile of activity.
The control plane coordinates work. It contains ResearchTasks, queues, assignments, agent workspaces, review tasks, lab tasks, replication requests, funding tasks, and safety gates. It decides what should happen next and who is allowed to do it. This is the lesson from software agent orchestration: serious work needs tasks, isolated workspaces, review output, permissions, and boundaries, not one enormous conversation. Codex-style software orchestration is a useful adjacent pattern because it treats agent work as bounded tasks with reviewable outputs. The science version needs the same discipline, but with scientific state as the merge target.
The model plane predicts what might happen. World models, digital twins, simulations, counterfactuals, prediction records, model-state coupling, calibration records, and sim-to-real gaps all live here. Model outputs are powerful, but they do not become state by being generated. A prediction has to record which state it trained on, where it was validated, where it failed, and what evidence later contradicted it.
Fig. 07. Model calibration. The model plane needs its own record. Predictions become useful when they are coupled to the state they read from and recalibrated against the evidence that returns.
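A minimal sketch of what such a record could hold, assuming a simple Brier score as the calibration measure; the field names are illustrative, not part of any defined schema:

```python
from dataclasses import dataclass, field

@dataclass
class PredictionRecord:
    """Couples a prediction to the state it read and the evidence that returns."""
    claim: str
    state_version: str   # which version of the frontier the model read from
    p: float             # predicted probability that the claim holds
    outcomes: list = field(default_factory=list)

    def write_back(self, outcome: bool):
        """Evidence returning from the action plane updates the record."""
        self.outcomes.append(outcome)

    def brier_score(self) -> float:
        """Calibration error against returned evidence (lower is better)."""
        if not self.outcomes:
            raise ValueError("no evidence has returned yet")
        return sum((self.p - float(o)) ** 2 for o in self.outcomes) / len(self.outcomes)

rec = PredictionRecord(claim="intervention-x", state_version="vela:7f3c2a", p=0.8)
rec.write_back(True)
rec.write_back(False)
# Brier score over the two outcomes: ((0.8 - 1)^2 + (0.8 - 0)^2) / 2 = 0.34
```

The design point is the coupling: a prediction that does not name its input state and accept write-backs cannot be recalibrated, only admired.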
The action plane touches the world. Experiments, protocols, robot runs, lab results, clinical observations, simulation jobs, field measurements, instrument traces, and executed funding decisions all live here. Review belongs to governance and attestation. Funding usually belongs to control and allocation, unless it triggers an external execution event. The action plane is where the engine meets bodies, instruments, patients, organisms, materials, weather, and capital.
The complete engine is the relationship among the four planes.
No plane can replace the others. A state plane without control is a museum. A control plane without state is a task manager. A model plane without state is a prediction market floating above reality. An action plane without writeback is the current lab system with better robots.
The product form is a discovery operating system. The protocol form is Vela. The theory form is cumulative state theory.
That does not mean one closed monolith. It means a shared substrate that lets specialized products exist without fragmenting the frontier. The operating system is not one app. It is the set of objects and permissions that let many apps act as one system when state changes.
The substrate contains the open objects and rules: event log, schemas, signatures, replay semantics, federation, identity, signer recognition, provenance, capability declarations, safety classes, and governance hooks. Above it sit modular products, but the first fundable unit is not the full suite. It is one frontier where signed state transitions, a registry, and a review queue work together.
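One way to make "replay semantics" concrete is a hash-chained log: any consumer can rebuild the frontier from the record alone and detect a broken chain. A minimal sketch with invented helper names, ignoring signatures and federation:

```python
import hashlib
import json

def event_id(payload, prev_id):
    """Events chain by hash, so replay can detect tampering or gaps."""
    return hashlib.sha256(
        (prev_id + json.dumps(payload, sort_keys=True)).encode()
    ).hexdigest()

def append(log, payload):
    """Append a transition, linking it to the previous event."""
    prev = log[-1]["id"] if log else "genesis"
    log.append({"payload": payload, "prev": prev, "id": event_id(payload, prev)})

def replay(log):
    """Rebuild the materialized frontier from the record alone."""
    atlas, prev = {}, "genesis"
    for event in log:
        assert event["prev"] == prev, "chain broken"
        assert event["id"] == event_id(event["payload"], prev), "event altered"
        atlas[event["payload"]["claim"]] = event["payload"]["state"]
        prev = event["id"]
    return atlas

log = []
append(log, {"claim": "bbb-transport", "state": "scoped"})
append(log, {"claim": "bbb-transport", "state": "contested"})
assert replay(log) == {"bbb-transport": "contested"}
```

Replayability is what lets the Atlas be a derived view rather than a second source of truth.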
The names divide into a small role contract. Vela records signed transitions. Carina compiles captured artifacts into typed objects and proposals. Atlas renders the materialized frontier. Registry decides which schemas, signers, safety classes, and governance rules other systems recognize. The control plane routes tasks, agents, reviewers, labs, and funders against that state. The action plane executes experiments and writes evidence back.
Everything else is a surface. Search, review, lab orchestration, model calibration, education, funding, and safety products can compete as long as they read and write the same transition substrate.
Fig. 08. Modular products on one substrate. Surfaces shown: state, cockpit, control, review, forecast, execution. The discovery engine is not a single app. Products can specialize at the surface because they share lower-level state, schemas, identity, replay, and attestation.
The build order should be narrower than the eventual ecosystem. First make state transitions real in one serious frontier. Then make review, frontier state, task routing, lab write-back, and model write-back read and write the same transition substrate.
What gets built first. The product sequence is not build every science app. It is make state transitions real, then build the first surfaces that read and write them.

0. Protocol object (Vela Event): record a signed transition; an accepted change becomes replayable.
1. Typed kernel (Carina object): compile artifact to evidence; activity becomes proposal-ready.
2. Review surface (ScientificDiff): scope, approve, contest, reject; a proposal becomes a commit or a hold.
3. Frontier surface (Atlas): inspect current state; events become navigable state.
4. Control plane (ResearchTask): assign the next work; state creates action.
5. Write-back (lab and model connectors): submit trace or prediction; world contact becomes evidence.
The Rippling analogy becomes concrete here. Its public product language presents HR, IT, finance, payroll, identity, devices, workflow, and reporting as surfaces on one workforce platform. The analogy is structural, not evidence that science should copy the company. A Discovery Engine is strong because a scientific state transition can move across literature, evidence, code, lab execution, model prediction, funding, review, and governance.
The shared object lets the ecosystem remain plural. Different teams can build better search, better review surfaces, better lab orchestration, better education products, better safety gates, better model registries, and better funding markets. The products can compete while the state remains inheritable. That is the difference between an ecosystem and a suite.
The engine should be concrete enough that a user can ask it for work, not only knowledge.
A disease foundation says: “We want to make progress on pediatric high-grade glioma.”
The engine responds with the current Atlas, the top unresolved frontiers, the highest-value discriminating experiments, the claims most fragile to provenance gaps, the agents already assigned, the expert reviews needed, the labs capable of execution, the safety class, the proposed ScientificDiffs, and what changed this week. The foundation is no longer buying isolated papers or grant reports. It is buying movement in the frontier.
For a foundation, this changes the week. A $5 million program does not begin with a blank call for proposals. It begins with a portfolio frontier: three candidate experiments, two fragile assumptions, one lab-capacity constraint, one safety class, and a reviewer queue. If a failed replication weakens the broad claim on Wednesday, Friday’s funding decision should know.
A student says: “I want to help cure brain tumors.”
The engine routes her to a frontier, then to a task appropriate to her current competence: link evidence for this claim, compare these two papers, check this method, extract the protocol conditions, inspect why a replication failed, draft a proposal, learn from the review. The student does not begin with a textbook-and-test pathway detached from real work. She begins with a small edge of the frontier that has somewhere for her contribution to land.
A robotic lab says: “Experiment E completed.”
The engine captures the protocol trace, instrument calibration, raw data, measured outcome, uncertainty, and execution context. It creates an evidence object, computes the ScientificDiff, routes for attestation, updates the Atlas if accepted, and updates the relevant model calibration records. The lab does not merely finish a run. It writes back.
Fig. 09. Instrument writeback. The lab surface should look like instrumentation, not project management. The useful object is the connection between plate position, signal trace, calibration, raw data, and the state transition it proposes.
A model says: “I predict intervention X will work.”
The engine converts the prediction into a counterfactual proposal, checks the context and assumptions, ranks expected information gain, applies the safety gate, routes to lab or simulation, and writes back the outcome. The prediction is not trusted because it is fluent or because the model is impressive. It is trusted, weakened, or rejected through its relationship to state, action, evidence, and calibration.
These scenarios are intentionally ordinary. The engine is not interesting because it creates theatrical AI science demos. It is interesting because the mundane work of science stops being lost between systems.
A week in the engine. Monday: frontier gaps ranked. Tuesday: an evidence object improves. Wednesday: a claim is scoped and a model weakens. Thursday: Atlas state updates. Friday: a funding decision changes. The operating substrate matters because ordinary actions land in the same frontier. A grant question, student contribution, lab failure, review decision, and scheduler update become one governed week of state movement.
Fig. 10. A frontier work docket. A schematic control-plane docket, not a live Alzheimer claim. The point is the shape: bounded tasks, explicit authority, proposed state changes, and a visible path from work to attested event. Illustrative contents:

state@vela:7f3c2a — priority queue RT-1842, RT-1843, RT-1844, RT-1845, awaiting attestation
proposed diff — transport mechanism weakens outside a narrow cohort:
- broad disease population
+ scoped subgroup with qualifying evidence
+ replication request opened against discordant animal model
signatures 2/3 — domain reviewer signed; clinical statistician signed; trial liaison pending
frontier: toy.bbb.transport
capability: clinical-stat-review
workspace: scoped evidence and logs
safety class: 2 / no public raw data
output: diff or archived hold
expiry: 14 days
You do not scale to millions or billions of scientific agents by letting every agent talk to every other agent. The literal number is a stress test, not the premise. The failure begins as soon as generation outruns review.
You scale with structure.
Frontiers become shards: pediatric high-grade glioma, blood-brain-barrier delivery, protein binder design, a disputed theorem, climate attribution, drought-tolerant cultivars, direct-air-capture sorbents. Agents operate inside frontiers, not in one global chat.
Tasks become the coordination primitive. Check an evidence span. Extract a method. Propose a discriminating experiment. Review a contradiction. Verify a proof. Audit a provenance chain. Draft a lab protocol. Inspect a model’s sim-to-real gap.
A ResearchTask is not a prompt. It is a work contract: objective, frontier, input state, required capability, workspace boundary, allowed tools, safety class, evidence standard, expected output, reviewer requirement, expiry, budget, and downstream dependencies. The scheduler can deduplicate near-identical tasks, lease work to qualified actors, expire stale tasks, retry failed work, archive dead ends, and route high-risk outputs to heavier review.
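The contract-versus-prompt distinction can be made concrete with a toy scheduler. The field set is abbreviated from the full contract and the dedupe and lease logic is deliberately naive; nothing here is a real API:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResearchTask:
    """A work contract, not a prompt (fields abbreviated for illustration)."""
    objective: str
    frontier: str
    required_capability: str
    safety_class: int
    expiry: float                     # absolute deadline, epoch seconds
    leased_to: Optional[str] = None   # which actor currently holds the lease

    def key(self):
        """Near-identical tasks collapse onto one queue entry."""
        return (self.objective, self.frontier)

class Scheduler:
    def __init__(self):
        self.queue = {}

    def submit(self, task):
        self.queue.setdefault(task.key(), task)   # deduplicate near-identical work

    def lease(self, actor, capabilities, now=None):
        """Lease the next unexpired task the actor is qualified to perform."""
        now = time.time() if now is None else now
        for task in self.queue.values():
            if (task.leased_to is None
                    and task.expiry > now
                    and task.required_capability in capabilities):
                task.leased_to = actor
                return task
        return None

sched = Scheduler()
task = ResearchTask("check evidence span", "toy.bbb.transport",
                    "clinical-stat-review", safety_class=2, expiry=14 * 86400)
sched.submit(task)
# A near-identical submission collapses onto the existing queue entry.
sched.submit(ResearchTask("check evidence span", "toy.bbb.transport",
                          "clinical-stat-review", safety_class=2, expiry=86400))
leased = sched.lease("reviewer-a", {"clinical-stat-review"}, now=0.0)
```

Expiry, capability matching, and deduplication are the load-bearing parts: they are what stop abundant generation from flooding the queue with duplicates nobody is qualified to finish.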
Fig. 11. Frontier sharding. At scale, agents do not enter one global room. They attach to frontiers, pick up bounded tasks, and route proposed changes through scarce merge authority.
Fig. 12. Navigator workspace. Navigator is the cockpit: a working surface where a reviewer sees the frontier, task queue, diff preview, evidence, and merge constraints together. Illustrative diff in review mode:

- broad neurovascular impairment
+ APOE4+, age > 65, vascular-inflammatory cohort
+ queue trial assumption review
Agents specialize. One is excellent at PubMed extraction. Another at Lean proofs. Another at protein design papers. Another at trial protocol critique. Another at statistical power checks. Another at lab safety. Another at reconciling contradictory clinical cohorts. Specialization matters because scientific work is not one job. It is a chain of jobs with different failure modes.
Every agent has a reliability record: accepted proposal rate, rejection reasons, later contradiction rate, citation hallucination rate, calibration, domain competence, review usefulness, and safety violations. The record does not have to be punitive to be essential. Reliability affects routing; routing produces work; review and later contradiction update reliability; reliability decays by domain and time. In a world of abundant generation, reliability is infrastructure.
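A toy version of such a record, assuming an exponential moving average toward review outcomes and a half-life decay toward a neutral prior; both constants are invented for illustration:

```python
class ReliabilityRecord:
    """Per-agent, per-domain reliability that routing can read and time decays."""
    HALF_LIFE = 180 * 86400  # score halves toward the prior over ~6 months

    def __init__(self, prior=0.5):
        self.prior = prior
        self.by_domain = {}   # domain -> (score, last_update_time)

    def update(self, domain, accepted: bool, now: float):
        """Review acceptance (or later contradiction) moves the score."""
        score = self.score(domain, now)
        # simple exponential moving average toward the review outcome
        self.by_domain[domain] = (0.8 * score + 0.2 * float(accepted), now)

    def score(self, domain, now: float):
        """Decay toward the prior, so stale competence stops routing work."""
        if domain not in self.by_domain:
            return self.prior
        score, last = self.by_domain[domain]
        w = 0.5 ** ((now - last) / self.HALF_LIFE)
        return w * score + (1 - w) * self.prior

rec = ReliabilityRecord()
rec.update("pubmed-extraction", accepted=True, now=0.0)
```

The shape matters more than the constants: reliability is domain-scoped, updated by review outcomes, and perishable, so routing never treats old competence as current.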
Merge authority remains scarce. Millions of agents can propose. Few can merge.
The open-source law carries over:
Proposal access is broad.
Merge authority is governed.
At large scale, planning becomes hierarchical: goals decompose into frontiers, frontiers into tasks, and tasks into bounded agent work.
Markets and schedulers appear because scarce resources remain scarce: human review, lab time, funding, model recalibration, safety approval, clinical access, material transfer, regulatory attention. The engine does not abolish politics or allocation. It makes the allocation surface visible enough to govern.
Review capacity has to be engineered as deliberately as agent capacity. Low-risk transitions can be deduplicated, sampled, auto-rejected, or routed to narrow credential pools. Safety-relevant, clinical, animal, manufacturing, or high-dependency transitions require named human signers, conflict checks, escalation paths, and service-level expectations. A queue that receives ten thousand proposals a week but can merge two hundred is not failing if it discards nine thousand with auditable reasons and reserves scarce review for the transitions that move decisions.
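The triage policy can be sketched as a function that always returns both a tier and an auditable reason; the tiers and thresholds are illustrative, not a proposed standard:

```python
def triage(proposal):
    """Route a proposed transition to a review tier with an auditable reason.

    Safety-relevant and clinical work goes to named human signers; duplicates
    are auto-rejected with a traceable reason; high-dependency changes get
    heavier review; everything else is sampled.
    """
    if proposal.get("safety_class", 0) >= 2 or proposal.get("clinical", False):
        return "named-human-signers", "safety-relevant or clinical transition"
    if proposal.get("duplicate_of"):
        return "auto-reject", "near-duplicate of " + proposal["duplicate_of"]
    if proposal.get("dependents", 0) > 10:
        return "heavy-review", "high-dependency transition"
    return "sampled-review", "low-risk transition"

decisions = [triage(p) for p in (
    {"safety_class": 2},
    {"duplicate_of": "diff-0412"},
    {"dependents": 31},
    {},
)]
```

Returning the reason alongside the tier is the point: a queue that discards nine thousand proposals is governable only if every discard carries an inspectable why.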
The worst version of abundant agents is not that they produce nothing. It is that they produce too much low-legibility work for any institution to sort. The engine’s job is to make generation submit to structure: task boundaries, isolated workspaces, replayable evidence, calibration, review queues, signer reputation, safety classes, and merge authority. Agent spam, stale shards, poisoned evidence, model overconfidence, queue saturation, and reviewer capture are not edge cases. They are exactly what the control plane exists to absorb.
The bottleneck is not only intelligence. It is trustworthy integration.
Votes, comments, stars, citation counts, social attention, and agent output are not trust. They are signals. Trust enters when someone recognized for a domain, by a registry, under a revocable credential, signs a transition under rules that other institutions can inspect, contest, and inherit. The signature is not magic. It has scope, conflict metadata, expiration, appeal paths, and a registry that can itself be audited.
This makes governance a product requirement. Proposal access can be broad. Merge authority has to be governed. Identity, signer recognition, schema evolution, dispute handling, safety classification, maintainer turnover, and registry federation cannot be afterthoughts.
The first operator should not be a company pretending to be a commons. It should be a chartered nonprofit registry or consortium for a bounded frontier. The minimum viable institution is concrete: disease foundations and participating labs pay dues; public agencies or philanthropies seed the first two years; member institutions elect a board; a technical steering group controls schema conformance; conflicts are logged; audits run on a fixed cadence; maintainers have fiduciary duties to the public-interest charter; registry snapshots are portable; and a fork path triggers if the registrar fails audit, loses quorum, or refuses a conformant independent signer.
The first pilot is fundable because it is narrow: one disease frontier, three to five labs or funders, a registry, a reviewer queue, negative-result writeback, one regulator-readable export, and a twelve-to-twenty-four-month success metric: did accepted transitions change downstream work that would otherwise have repeated a fragile assumption?
Regulatory coupling should be explicit without pretending the engine approves anything. Some transitions are non-regulatory scientific state. Some are regulator-readable support for an IND, CMC package, real-world-evidence submission, DSMB review, IRB packet, or IACUC protocol. The registry makes provenance and dependency movement inspectable; the agency still decides under its own authority.
The capture point is often above the nominally open layer. Git itself remained open, but much of software collaboration moved into platform-owned issues, pull requests, Actions, review history, social graphs, and contribution reputation. The scientific analogue is sharper. A protocol can be open while the canonical registry of signers, reviewer reputation, lab capabilities, safety gates, and regulatory recognition is closed. Open code with a closed registry is captured infrastructure with a permissive license file.
Fig. 13. Registry as governance. The registry is a social object as much as a technical one. It decides which signatures, safety gates, credentials, and disputes other institutions can trust. Panel labels: rotated by foundation charter; forkable under audit failure; domain consortia issue credentials; regulators query registry state; public, delayed, permissioned, excluded; classification is itself reviewable; plural canonical views allowed; discord is represented as state.
State visibility. Open state does not mean every byte is public. The protocol has to distinguish public record, delayed disclosure, permissioned evidence, and excluded detail.
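The four classes suggest a simple gate. This sketch invents the record fields (`release_at`, `permitted`) purely for illustration:

```python
def visible(record, viewer, now):
    """Apply the four visibility classes; names mirror the text, rules are a sketch."""
    tier = record["visibility"]
    if tier == "public":
        return True                                    # part of the open record
    if tier == "delayed":                              # public after an embargo
        return now >= record["release_at"] or viewer in record.get("permitted", set())
    if tier == "permissioned":                         # e.g. clinical evidence
        return viewer in record.get("permitted", set())
    return False                                       # excluded detail never travels

records = [
    {"visibility": "public"},
    {"visibility": "delayed", "release_at": 100, "permitted": {"dsmb"}},
    {"visibility": "permissioned", "permitted": {"trial-liaison"}},
    {"visibility": "excluded"},
]
```

Because the classification is itself state, a transition that changes a record from permissioned to public should travel through the same diff-and-attestation path as any other change.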
The Discovery Engine has to be open protocol first, reference implementation second, ecosystem third, SaaS fourth.
Commons governance needs several enforceable properties: audits on a fixed cadence, portable registry snapshots, a credible fork path, and fiduciary duties to a public-interest charter.
This is where the middle page connects back to both other essays. Constellations argues for maintainers, plural canonical views, and structural separation. Terafactory argues that the orchestration layer and identity registry above physical facilities are the capture points that decide whether the body stays open.
The Discovery Engine is the architecture that makes those claims operational. Without governance, the engine becomes a faster way to manufacture apparent consensus. With governance, disagreement itself can become state: not noise around the record, but part of the record.
The states have to be explicit: contested, minority view, scope split, replication failed, under appeal, safety held, and retracted. Each has a different effect on what can be shown, routed, funded, repeated, or merged.
The three pages should not repeat the same idea in three tones. Each has a specific job.
Constellations of Borrowed Light is the thesis. It begins with the human reason this matters and ends with the need for a shared frontier where corrections, failures, dependencies, and evidence can compound. Its purpose is moral and conceptual: why science needs a sky.
The Discovery Engine is the architecture. It names the shared object, the loop, the planes, the products, the users, the scaling problem, and the governance requirements. Its purpose is operational: what system must exist for the thesis to become infrastructure.
The Terafactory Age is the future scenario. It asks what happens when the architecture reaches labs, factories, clinical systems, materials, agriculture, funders, regulators, and geopolitical capital. Its purpose is civilizational: what world appears if the engine gets a body.
The public trilogy is simpler than the internal taxonomy: why shared state matters, what engine makes it work, and what world follows if it scales.
The future of science is not more activity. It is activity that can become state, state that can guide action, and action that can return to the record.
That sentence is the hinge between the three essays. The first essay argues for the record. This one specifies the engine. The third asks what happens when the engine reaches enough physical capacity that scientific state can act in the world.
At small scale, that makes one researcher faster.
At large scale, it lets abundant agents and thousands of labs work without collapsing into noise.
Across the next thousand years, the details change. The invariant remains:
Intelligence and experimentation can become abundant. Trustworthy state integration remains the bottleneck.
The engine exists for that bottleneck. It is the machinery that lets a correction travel, lets a failure become useful, lets disagreement stay visible, and lets the next action inherit more than the last paper said.