Constellations of Borrowed Light

Why science needs a shared, living record of its frontier.

A quiet watercolor horizon over still water, with a small sailboat beneath delicate constellation arcs.

The Inheritance

The pattern that names a brain tumor in a six-year-old takes about thirty seconds to recognize. The papers describing it existed for decades.

When I was that six-year-old, my mother took me to twelve doctors in four months. The symptoms were textbook: headaches, vomiting, loss of balance. The knowledge to connect them was already there, built over decades from the children who came before me. The system that should have carried it to me did not.

A quiet watercolor hospital bedside with anonymous medical records and wristbands on a table. A warm lamp dissolves into navy and gold stars, which become faint constellation lines and navigational arcs.

The inheritance is real, but it only reaches a patient if the pattern can travel.

A field sees by light thrown from the past, the way a navigator steers by stars that may already be dark. Science runs on borrowed light, and it compounds only when that light reaches the next person in time to steer by.

Every field has a version of those four months. A failed trial whose lesson does not reach the next protocol. A materials lab that resynthesizes a compound another group already proved would not work. A correction that travels by rumor to one lab and never reaches the work that depended on it. A graduate student who joins a project and spends half a year rebuilding her advisor’s institutional memory because none of it was written down. None of it is exotic; it is what happens when a field produces more work than it can keep track of.

AI is making hypotheses, code, protocols, reviews, and experiment plans cheap to produce, faster than science’s current institutions can absorb them. Models can draft candidate experiments in minutes; the wet lab still needs weeks.

Production was never the whole of science. A field advances by generating variation (many hypotheses, methods, and competing readings of the same data) and then doing the slower work that follows: selecting what survives scrutiny, retaining what failed so the next group does not repeat it, and carrying corrections back to the claims they overturn. AI has made the first half cheap and left the second half where it was. Cheap variation only compounds if selection and memory keep pace, and they have not.

OpenAI reported a collaboration with Ginkgo Bioworks that ran a closed-loop autonomous lab on cell-free protein synthesis: model proposes, lab executes, six rounds across more than 36,000 reactions, cutting protein production cost by roughly 40% (OpenAI, Feb 2026). The reported loop shows the shape of the new bottleneck: generation can outrun the bench process that tests it. Treat this as an organization-reported frontier example, not settled evidence that all wet-lab bottlenecks have the same shape.

Cheap generation pushes the bottleneck downstream, into judgment about what to test. A frontier is the current state of a scientific question: what is known, what failed, what changed, what depends on what, and what would move the field next.

That state is not a smooth surface of settled knowledge. It is jagged, with gaps inside it and unknowns past its edge, and the hardest work lives at the seams, where two frontiers meet and someone has to rebuild the missing map by hand every time the field needs to cross. A finding can be load-bearing in one frontier and invisible in the one beside it: APOE sits near the top of the Alzheimer’s evidence and is absent from the glioblastoma associations next door. Nothing today makes that absence addressable from either side.

A literature search returns papers. A frontier says: this claim is supported in this population, weakened in that model, contradicted by this trial, dependent on this assay, and worth testing next by this experiment.

A warm paper atlas plate showing a jagged scientific frontier. Connected navy and gold nodes fill part of the known region, darker unconnected gaps remain inside it, and faint survey paths reach into the unknown beyond the boundary.

A frontier is the current map of what can be acted on, with gaps inside it and unknowns beyond it.

The question stops being how to produce more candidates and becomes the harder one: which candidates deserve scarce experiment time and human trust. That question can only compound if it is written down somewhere durable. Otherwise every lab re-adjudicates the same candidates in private, and every AI agent recompiles the same scattered prose into a context window that closes when the conversation ends.

Science already has the raw material. Findings, failures, methods, corrections, replications, and expert judgment are scattered across registries, repositories, supplementary materials, dashboards, Slack threads, and private memory. What it lacks is the first layer beneath all of those: a writable substrate where a frontier can be updated, inherited, and used by the next researcher, clinician, or agent.

The old failure was that the pattern did not reach the doctor in time. The new failure is that the correction does not reach the agent, the funder, the protocol, or the trial in time.

Science does forget, but the sharper problem is inheritance. The next generation of scientific agents will inherit whatever memory exists. If it is private, incomplete, or controlled by the wrong incentives, AI will not repair what was already broken. It will make the distortions easier to reproduce: the same models retrieving fluently from the same dominant prose, narrowing the field’s effective hypothesis space.

The Pattern

A quiet research desk at night with scattered scientific papers, margin notes, and faint gold lines that fail to fully connect across the pages.

The literature can contain the answer while the field fails to carry the correction.

Take amyloid.

For more than twenty years, amyloid concentrated an outsized share of Alzheimer’s funding, attention, and late-stage trial investment. One dominant version of the program treated plaque clearance as central: plaques drive the disease, so clearing the plaques should stop the decline. Tau, inflammation, vascular, and lipid hypotheses never disappeared. They competed inside a field whose money, trial capacity, and senior attention had leaned toward amyloid year after year.

Trial after trial failed. Each Phase III failure meant years of enrollment unwound and patients and families who had organized their lives around a hypothesis that ended at interim analysis. Wrong belief was only part of the cost. Most of it was patient time, trial capacity, funder attention, and expert labor organized around an outdated picture of the field.

Lecanemab and donanemab eventually arrived and showed modest benefit in early symptomatic disease. The field had not been built to carry that nuance cleanly. The public argument often polarized into belief or rejection, with too little room for “this works at the margin in a narrow population.” Amyloid-hypothesis papers and review narratives continued to circulate while failed trials lived in separate registries and therapeutic trackers. No canonical dependency layer forced the original claim and the contradictory trial record to travel together so a downstream reader saw both at once. Many anti-amyloid and amyloid-pathway programs failed across the 2003-2023 period, while the antibodies that eventually showed modest clinical benefit (lecanemab, donanemab) arrived decades into the program and remain debated against cost, safety, and translation constraints. See Cummings et al., Alzheimer’s & Dementia: TRCI 2023, for the 2023 pipeline; van Dyck et al., NEJM 2023, for lecanemab; and Sims et al., JAMA 2023, for donanemab.

Horizontal timeline from 2002 to 2024 with fourteen dim dots for failed or halted Phase III amyloid-pathway programs and two gold dots at the right for lecanemab (2023) and donanemab (2024).

Fig. 01. Twenty years, two antibodies. Each dim dot is a Phase III amyloid-pathway program that failed or was halted between 2007 and 2022. The two gold dots are the antibodies that eventually showed modest benefit in early symptomatic disease, still debated against cost, safety, and the populations they apply to.

The evidence and the corrections existed in separate places, and nothing forced them to move together.

Confidence traveled farther than correction.

A star can go dark and keep shining for years. Amyloid worked the same way: the simplest version had failed in trial after trial while the field went on steering by the light it threw before it changed.

The error-correcting tradition only works when a correction can find the claim it corrects; a field producing more work than the paper system can carry loses that property by default. The same pattern recurs wherever knowledge has to change action: retractions reach one lab but not the work that depends on them, null results stay unpublished, and agents read a field without writing anything back.

Without a shared write-back layer, AI can make this worse. The incentive is a trap: each researcher is locally rewarded for publishing more, and cited more for it, while the field is globally worse off as its collective attention narrows onto the same dominant prose. The portfolio of bets contracts toward whatever the models surface most fluently. In an analysis of 41.3 million natural-sciences papers (Nature 2026), AI-augmented researchers published roughly 3× more papers and received nearly 5× more citations, while the collective volume of topics studied contracted by 4.6% and scientist-to-scientist engagement fell by 22%. The study is evidence of an association, not proof that AI alone causes every narrowing mechanism described here.

The amyloid concentration took twenty years to produce. The next version can form in a fraction of that time, because generation and retrieval now move faster than correction.

Experiments begin in large numbers, then narrow through positive written results, publication, citation, and memory while nulls, failed replications, abandoned protocols, and instrument quirks leak out of the record. Stages: experiments begun, written up, published, cited, acted on.

Fig. 02. The negative-results funnel. A schematic of the publication filter: what the next decision acts on is not the same as what the field learned.

The same failure shows up everywhere: translation delay, unpublished nulls, unreported trials, retractions that don’t propagate, duplicated experiments that no one had a way to inherit. Balas & Boren, Yearbook of Medical Informatics 2000, on the often-cited 17-year translation gap; Morris et al., J. R. Soc. Med. 2011, note the estimate hides significant variation by field. Franco, Malhotra & Simonovits, Science 2014: in a TESS social-science sample, roughly two-thirds of null-result studies were never written up. Goldacre et al., BMJ 2018: about half of due EUCTR trials had reported results.

Underneath all of those is the same missing thing: structured dependencies between scientific objects (claim to evidence, claim to contradicting trial, claim to retraction, claim to dependent claim). Whatever isn’t recorded as a dependency does not propagate when something changes. Anyone who has joined a mature project knows the feeling: before contributing, you first have to rebuild the missing map.

The Substrate

A watercolor cross-section of scientific papers and structured records connected by navy and gold lines beneath the surface, suggesting a shared substrate for scientific state.

The missing layer is beneath the artifacts: the changing state of the claims they contain.

Software offers the nearest contrast. Git gave code a memory; GitHub and package ecosystems made that memory networked through commits, issues, reviews, checks, and releases. AI can write code at scale because the work already lives in objects agents can inherit, test, merge, and distribute. Google and Microsoft executives have both described AI-generated code becoming ordinary inside their engineering workflows; see Ars Technica on Google’s 2024 disclosure and TechCrunch on Microsoft’s 2025 disclosure. The exact percentages are volatile and definition-dependent; the durable point is that code already has reviewable objects an agent can write into.

Code compounds because every change inherits the one before it. Science diverges because nothing carries the change forward: findings, failures, and corrections scatter into systems that never reconcile. Pull requests, issues, and CI checks are objects an agent can read and write; science has no general equivalent for changes to claim state. An agent that synthesizes the literature today produces a fluent summary that no other agent inherits, no record carries forward, and no reviewer can read as a change. The next agent starts the same job from scratch.

Scientific work is moving into the same regime. A model can read more literature than any human and reason across chemistry, biology, and physics in the same conversation. It cannot write into a record that compounds. The missing layer sits beneath the archive, where what an archive holds can change. A writable substrate gives AI a place to deposit what it produces; the frontier is the object a scientist reads and writes.

The obvious objection is that the tools already exist: version control, data repositories, preprint servers. Trevor Bedford’s “Some thoughts on a GitHub of Science” is the sharpest version of the case that the existing open-source stack can already carry much of scientific work. But those tools move artifacts; they do not maintain the live state of the claims those artifacts contain.

Human experts gain time in the same loop. A reviewer who once spent weeks reconstructing a contested cluster should spend most of that time judging the update itself. The substrate amplifies expertise without replacing the judgment about what to trust, who to trust, and what to test next.

AlphaFold was trained in part on experimentally determined structures from the same shared archive structural biologists had been depositing into for decades: the Protein Data Bank, established in 1971, which holds over 250,000 experimentally determined structures. Deposition became a condition of publication in major journals by the 1990s. By making those structures shared, curated, and machine-readable, the PDB gave the model structured state to learn from. The AlphaFold Protein Structure Database now provides predicted structures for over 200 million proteins from public sequence resources. See Jumper et al., Nature 2021. Hassabis and Jumper shared the 2024 Nobel Prize in Chemistry with David Baker.

AlphaFold is what a field gets when it already has a PDB. There are not a hundred more like it because there are not a hundred more PDBs.

A model is bounded by what it can learn from, and most fields keep no machine-readable record of their own current state. The substrate is that missing input as much as a defense against forgetting. A model needs two things a field rarely keeps in one place: a corpus to learn from and a standard to test against, and the substrate is where a field would have both. The corpus is the record itself, the findings and failures and corrections as they deposit. The standard is whatever lets the next reader re-derive a claim instead of trusting it. Where those collapse into one object, the loop closes at the speed of compute. Where the standard is a wet lab and a year, the record still fills, but reality sets the pace. That split decides which questions AI can close now; a field’s size does not.

Without that record, the breakthrough never arrives: the material goes undiscovered, the theorem stays unproven, the patient waits out the years it would have saved. The intelligence is already here; the record it would learn from is not.

Most fields do not have protein structure’s clean experimental object. Their version of the PDB holds changing state rather than settled facts: claims, evidence, contradictions, confidence, and correction. It does not need to be perfect, only canonical enough that downstream decisions reference it by default.

And whoever owns that record shapes what later science treats as true. A privately held substrate with an AI wrapper on top turns inheritance into something the next researcher has to rent. The PDB worked because it stayed open and deposition became a condition of publication; a closed record makes the field pay, again and again, for what it should have inherited for free.

Measured protein structures become deposited, curated records with identifiers and provenance, then become training signal that supports model predictions at field scale. Labels: 200k+ structures (measured); shared archive (the PDB); 200M+ model predictions (from one curated record).

Fig. 03. Structured state before intelligence. AlphaFold did not emerge from papers alone; it learned from a shared, curated scientific record.

To a model reading prose alone, a retracted finding and a Nobel-winning one can arrive as similarly fluent text unless the record carries status, provenance, and correction state. A tentative claim travels through citations without carrying the uncertainty forward until it looks established. AI makes this faster because the model has no shared place to record the difference.

The record carries the part that can travel: scope, evidence, dependency, confidence. It does not carry social trust, and it cannot know every load-bearing variable before something downstream breaks; it can only give the later failure somewhere to attach. Begley and Ellis’s audit of fifty-three landmark preclinical cancer studies found that independent replication failed in 47 cases (Begley & Ellis, Nature 2012).

The record gives a field one place to read from and write back to.

Take a claim like “this material superconducts at room temperature.” In the substrate it stays a scoped object: at which pressure, in which sample, by which measurement, and replicated by whom, with the failed replications attached to the claim instead of circulating beside it. Each finding becomes scoped evidence bearing on a related set of claims.

When a replication fails, it attaches as a proposed scope correction: the broad claim narrows to the conditions that held, confidence on the unscoped version falls, and the work downstream that assumed it is flagged for review. The substrate accelerates the evidence handoff, not the institutional decision.
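A minimal sketch of that handoff, in Python, under loud assumptions: the `Claim` fields, the `apply_scope_correction` function, the single confidence number, and the example claims are all hypothetical, invented for illustration and not drawn from any existing system. It shows only the mechanics described above: the failed replication attaches to the claim, the scope narrows, confidence falls, and the work downstream is flagged for review without being decided.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A scoped claim in a hypothetical shared record (field names are illustrative)."""
    claim_id: str
    statement: str
    scope: dict          # conditions under which the claim is asserted
    confidence: float    # illustrative 0..1 score, not a calibrated probability
    depends_on: list = field(default_factory=list)   # ids of claims this one assumes
    corrections: list = field(default_factory=list)  # failed replications attach here
    needs_review: bool = False


def apply_scope_correction(record, claim_id, narrowed_scope, new_confidence, note):
    """Attach a failed replication to a claim and flag every dependent claim."""
    claim = record[claim_id]
    claim.corrections.append(note)   # the failure travels with the claim
    claim.scope = narrowed_scope     # narrow to the conditions that held
    claim.confidence = min(claim.confidence, new_confidence)

    # Breadth-first walk over reverse dependencies. Dependents are only
    # flagged; their confidence is left for a human reviewer to decide.
    flagged = []
    seen = {claim_id}
    queue = deque([claim_id])
    while queue:
        current = queue.popleft()
        for other in record.values():
            if current in other.depends_on and other.claim_id not in seen:
                seen.add(other.claim_id)
                other.needs_review = True
                flagged.append(other.claim_id)
                queue.append(other.claim_id)
    return flagged


# Invented example data: "material X" and every claim below are placeholders.
record = {
    c.claim_id: c
    for c in [
        Claim("rt-superconductor", "Material X superconducts at room temperature",
              {"pressure": "any", "sample": "any"}, 0.6),
        Claim("cable-design", "Lossless cable built from material X",
              {}, 0.5, depends_on=["rt-superconductor"]),
        Claim("grid-model", "Grid cost model assuming the lossless cable",
              {}, 0.4, depends_on=["cable-design"]),
        Claim("unrelated", "An unrelated finding", {}, 0.9),
    ]
}

flagged = apply_scope_correction(
    record,
    "rt-superconductor",
    narrowed_scope={"pressure": "high pressure only", "sample": "original sample"},
    new_confidence=0.2,
    note="Independent replication at ambient pressure found no transition",
)
print(flagged)  # ['cable-design', 'grid-model']
```

The design choice worth noticing is what the function refuses to do: it lowers confidence only on the claim the replication bears on, and leaves every dependent claim's confidence untouched, because that judgment belongs to a reviewer.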

A machine can check some claims outright. A proof, a bound, a synthesis that either runs or fails: the record ships a frozen verifier alongside the claim, and the next reader re-runs it in seconds rather than trusting whoever ran it first. A superconductor or a drug has no such verifier. For those the record holds scoped evidence and the corrections attached to it, and a reviewer supplies the judgment a machine cannot.
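One way to picture a frozen verifier, as a sketch and not a specification: the record stores the verifier's source text next to a digest of that text, and a reader re-runs it only if the digest still matches. Everything named here (`VERIFIER_SOURCE`, `freeze`, `rerun`, the entry layout) is hypothetical, and the factorization claim is a toy stand-in for a proof or a bound.

```python
import hashlib

# Hypothetical verifier shipped with the claim, stored as source text so that
# the exact bytes can be hashed and re-run later.
VERIFIER_SOURCE = '''
def verify(claim):
    """Return True if the listed factors are non-trivial and multiply to n."""
    product = 1
    for factor in claim["factors"]:
        if factor <= 1:
            return False
        product *= factor
    return product == claim["n"]
'''


def freeze(source):
    """Digest of the verifier source; storing it is what makes the verifier frozen."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def rerun(entry):
    """Re-run a frozen verifier, refusing if its source no longer matches the digest."""
    if freeze(entry["verifier_source"]) != entry["verifier_sha256"]:
        raise ValueError("verifier changed since it was frozen")
    namespace = {}
    # Assumption for the sketch: the source is trusted. A real system would
    # need to sandbox this step instead of calling exec directly.
    exec(entry["verifier_source"], namespace)
    return namespace["verify"](entry["claim"])


# Toy record entry: the claim is that 8051 factors as 83 * 97.
entry = {
    "claim": {"n": 8051, "factors": [83, 97]},
    "verifier_source": VERIFIER_SOURCE,
    "verifier_sha256": freeze(VERIFIER_SOURCE),
}

print(rerun(entry))  # True
```

The next reader does not have to trust whoever ran the check first; they call `rerun(entry)` and get the same answer from the same frozen code. A superconductor or a drug has no such function to call, which is why those entries carry scoped evidence and a reviewer instead.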

A correction can only travel if the finding it corrects has an address.

Whether a correction can travel. An overturned claim and the findings that depend on it. With a shared record, the correction propagates along every dependency, mending each link in gold and flagging the dependent for review. In the paper world, with no address to follow, it stops at the claim while the dependents keep building on it.

The Constellation

A watercolor field of navy and gold constellation nodes connected by dotted arcs across pale paper, with dense clusters and open regions suggesting a navigable scientific map.

A frontier becomes navigable when corrections, failures, and dependencies stay connected.

The first AI-science layers to appear are social and operational, not state: agents can post hypotheses, debate papers, chain tools, and preserve artifact lineage. Science Beach, Agent4Science, and ScienceClaw are early examples of social and runtime layers arriving before a shared state layer.

Those systems matter because they show that AI scientists will generate public activity. But none of it is state. A post records activity, a vote records attention, an artifact DAG records lineage; the change in what the field believes goes unrecorded.

The record exists to make public activity inheritable: a hypothesis becomes a finding or question, an artifact becomes evidence, and a reviewer can see what would change if the update were accepted. The object that matters is the change to the frontier.

A single finding sits alone until something connects it to others, the way the lines between stars make a constellation. A working scientist navigates the pattern: which findings depend on which evidence, which corrections have moved through which downstream claims, which questions still have no answer anyone trusts. Today that pattern is recovered by reconstruction. A clinician, researcher, or agent reads, follows citations, chases retractions, emails authors, guesses which caveats still matter. The reconstructed map lives in her head and disappears when she leaves the project. In the substrate, that map is an artifact: one researcher’s reconstruction becomes the next researcher’s starting point, signed and inspectable.

When a result fails to replicate or a paper is retracted today, the update moves by rumor, by review article, by the luck of who happened to notice. In the substrate, a correction is an event inside the record, and every claim that depends on the original is flagged for review.

Open-by-default fits most findings but not all. The governance that handles the exceptions (who can sign, how maintainers are held, how disagreement and dual-use work are gated) is the subject of the engine, not this essay. The claim here is narrower: the pattern only stays navigable if corrections remain events that travel.

Contact With Reality

A watercolor scientific corridor where instruments, protocols, measurements, and faint constellation lines converge into a lived encounter with reality.

The record matters only if results from the world can change what the field believes next.

The record has to touch reality. If a lab result cannot change the finding it bears on, the substrate becomes another library of artifacts.

RECOVERY is the practical proof. When COVID hit, many trials were fragmented hospital by hospital, protocol by protocol. The UK did something simpler and more durable: a shared protocol, lightweight enrollment, integration with existing records, and a system many hospitals could join.

The first patient was enrolled within days. Within 100 days, RECOVERY produced a result that changed care worldwide. Dexamethasone, a cheap generic steroid, reduced 28-day mortality in ventilated patients by roughly one-third across 176 UK hospitals (RECOVERY Collaborative Group, NEJM 2021). UKRI describes RECOVERY as the world’s largest clinical trial into COVID-19 treatments, with more than 40,000 participants across 185 UK sites, and estimates that dexamethasone had saved around 22,000 UK lives and one million lives globally by March 2021 (UKRI, updated 2024).

Execution structure mattered as much as intelligence. RECOVERY proved that consolidation can move knowledge into care at speed. The record needs an equivalent that does not require a full clinical-trial machine: a way for findings, failures, corrections, and dependencies to travel across institutions before each team repeats the same reconstruction.

Many hospitals feed into one shared protocol, producing a dexamethasone result and a change in global care, with an inset showing lower 28-day mortality for dexamethasone than control. Labels: ~180 hospitals; shared protocol (one trial state); result (dexamethasone); care changes (global practice). Inset, 28-day mortality: control 41.4, dexamethasone 29.3, 12.1 lower.

Fig. 04. Execution as infrastructure. RECOVERY worked because participation, measurement, and learning converged into one shared protocol.

The loop matters: models propose against the current record, labs test what would reduce uncertainty, failures return to the record, and human attention moves to the decisions only humans can make. Amodei’s “Machines of Loving Grace” argues that AI could compress decades of biological progress into years. McCarty’s “Levers for Biological Progress” is the useful constraint: experiment speed, cost, measurement, regulation, protocols, and human collaboration remain bottlenecks even if intelligence becomes abundant.

The Haverford lab showed this at small scale: a model trained on years of failed syntheses out-predicted experienced chemists because heterogeneous notebook failures were structured enough to learn from. Raccuglia et al., Nature 2016. A machine-learning model trained on years of failed vanadium selenite syntheses predicted reaction outcomes with 89% accuracy, vs. 78% for experienced chemists. When a synthesis fails today, the failure stays local. In the substrate, it enters the record directly: this compound, these conditions, this measured outcome, this uncertainty. The next chemist designing a similar synthesis sees the dead end before she runs the experiment.

In the blood-brain-barrier corridor, a researcher opens the question rather than a paper. She sees human evidence, animal-model claims that failed to translate, interventions that moved biomarkers without changing cognition, and the experiment most likely to separate vascular damage from downstream inflammation. She clicks a failed intervention and sees the protocol, model, dose, endpoint, measurement window, and reason confidence fell. She begins from the current state of the question.

Major Alzheimer’s initiatives are already moving at this scale, with OpenAI Foundation and Arc each treating the disease as a blueprint for complex translation. OpenAI Foundation, “AI for Alzheimer’s”, April 8, 2026: more than $100 million in planned grants across six institutions, spanning a five-layer stack of causal mapping, AI-assisted drug design, open datasets, biomarkers, and off-patent intervention testing. See also Arc’s Alzheimer’s Disease Initiative. Multiple teams can produce discoveries in parallel and still leave no shared map unless failed attempts, partial replications, biomarkers, target hypotheses, and changing confidence enter one record. Without that record, the next nine-figure initiative risks repeating the old coordination failure, only faster and at larger scale.

The Crossing

If every field had a current frontier, a question would have a state instead of an accumulated bibliography. A failed trial could surface wherever the target hypothesis was being reused, with the affected scope marked explicitly. A failed experiment would enter the shared record before the result is written up, so the next team starts from the current field instead of rebuilding a private map.

At that scale, funders can see neglected bottlenecks across many fields at once, regulators can trace a claim through evidence, corrections, and dependencies, and models can propose discriminating tests against the current field instead of a private scrape of papers.

The first enforceable unit should be small enough to fund: one disease-frontier pilot where signed findings, failed experiments, confidence updates, and review decisions travel between institutions. The test is record-level, not a demo: can a downstream funder, reviewer, clinician, or regulator-facing team trace why a claim changed, what evidence moved it, and which next action should stop or proceed? The detailed machinery belongs to the engine. This essay’s claim is simpler: the record has to become common enough that later institutions have something current to inspect.

Open infrastructure widens who can contribute. arXiv let mathematicians outside elite departments compete on the work; GitHub let outsiders ship code to projects used by Fortune 500 companies; Hugging Face let independent ML researchers ship models that production systems depend on. A scientific substrate would do the same for clinicians at non-research hospitals, researchers in fields without good shared archives, and students. Science is one of the few cumulative human activities still routed through gatekeepers chosen a century ago.

The pattern that doesn’t reach the doctor in time is no longer the hardest case. Retrieval can surface what a textbook would have connected. The harder failures are the ones retrieval cannot solve: the local correction, the failed trial that never reaches the next protocol, the agent whose synthesis disappears at the end of the session.

Human knowledge is never contained in one person. It grows from the relationships we create between each other and the world, and still it is never complete.

Paul Kalanithi, When Breath Becomes Air

The literature records what science has said. The frontier carries what it can act on now.

If you do science, what you find becomes borrowed light for the next person. They steer by it, or fail to, depending on whether it reached them. The failed run in your drawer, the correction you meant to circulate, the caveat that lives only in your head: none of it travels unless you record it. What you keep to yourself goes dark for someone you will never meet.

A six-year-old comes in with headaches, vomiting, and balance trouble. The frontier has changed since the last textbook was written, and the pattern reaches her in thirty seconds.

A quiet closing watercolor plate echoing the opening horizon, with a line of light continuing forward.