Constellations of Borrowed Light

Borrowed Light Project
v1.0 · 2026-01-08

"The inheritance from the master becomes not only his additions to the world's record but for his disciples the entire scaffolding by which they were erected."

—Vannevar Bush, As We May Think, 1945

The Inheritance

Human knowledge is never contained in one person. It grows from the relationships we create between each other and the world, and still it is never complete.

—Paul Kalanithi, When Breath Becomes Air

The knowledge to save your life exists. It cannot reach you.

When I was six years old, I saw the same three doctors twelve times over four months. Each visit, my chart grew thicker—headaches, fevers, nausea, all documented. Each visit, the same routine: temperature, abdomen, ears. “It’s viral,” they said. “Give it another week.”

My mother had noticed I was unsteady on my feet. She mentioned it. The doctor nodded and wrote something down. She’d noticed my vision seemed off, too. He didn’t ask about that. No one had.

The knowledge to diagnose me existed. What I had was one of the most common pediatric brain tumors, well-documented and highly treatable if caught early. The diagnostic triad (headaches, vomiting, and ataxia) has been documented for decades. I presented with all three. Textbook in retrospect, invisible in the moment.

But there was no scaffold. No checklist prompted any of those doctors to ask about balance, nausea, and headaches together. No system flagged that the same child had returned twelve times with unresolved symptoms. The pattern was building in the chart. Nothing prompted anyone to see it.

One night, my fever hit 105. My mother drove me to the children’s hospital.

In the waiting room, a teenage volunteer passed me snacks and candy. I asked if I could have more. She brought me another handful. I remember the fluorescent lights, the plastic chairs, the sound of a television playing somewhere I couldn’t see. I remember thinking this was the kind of day where people were nice to you.

Hours passed. They took me for a CT scan. Not because someone recognized the pattern, but because my symptoms had become impossible to ignore.

I remember the machine. The hum. The coldness of the table. I remember looking up through the glass at the booth above, watching the technicians stare at the screen. Then their faces changed. One of them left. He came back with another doctor. Then another. They were pointing at something. They weren’t looking at me anymore.

The headaches. The vision. The swelling. All of it, now undeniable.

They wheeled me back to my mother. No one said anything yet. But I had seen their faces. I was six years old, and I understood that something was very wrong.

The surgeon appeared at my bedside. I survived.

My doctors were not incompetent. They were doing what the system allowed.

Each specialist sees the part they were trained for. A pediatrician sees a fever and thinks infection. An ophthalmologist sees vision problems and thinks refractive error. A neurologist might see the pattern, but the child never reaches the neurologist, because no one thinks to refer. The whole picture exists. No one is positioned to see it.

Only 38% of pediatric brain tumors are diagnosed within the first month of symptoms. One in three children is misdiagnosed on their first hospital visit. In one study, only 57% of children presenting to emergency rooms with brain tumor symptoms were even examined for papilledema (swelling of the optic disc, one of the most reliable signs of increased intracranial pressure). The examination takes thirty seconds. It wasn't done.

The failure is transmission: how knowledge moves between the people who generate it and the people who need it. Transmission does not solve training, staffing, or regulation. But it is the scaffold those solutions require.

My case was typical, and that is the harder truth: the system was working exactly as designed, without the scaffold that would make knowledge arrive.

That scaffold still doesn’t exist. Medicine is just where the stakes are easiest to see.

Most scientific knowledge never becomes a paper. The failed experiment that would have saved someone else six months. The tacit pattern a senior researcher recognizes but has never articulated. None of it structured, none of it connected.

In materials science, a research group spends two years optimizing a catalyst, publishes the formulation that worked, and never reports the hundreds of conditions that did not. Across the world, another group begins the same search from scratch. The pattern repeats across every field: redundant work, irreversible loss, stranded knowledge.

Three shapes of the same failure: repeated (a loop where effort goes out and comes back), lost (a broken line where knowledge dissolves into nothing), stranded (two disconnected clusters with a gap between them)

In medicine, by one widely cited estimate, it takes an average of seventeen years for a scientific discovery to reach routine clinical practice. The iPhone went from concept to a billion users in less time. SpaceX went from founding to landing rockets on drone ships. We treat the speed of knowledge transfer as a law of nature rather than a design failure.

Science is the starting point, because its findings are verifiable and its stakes are measured in lives.

Look up at the night sky and you are seeing ghosts. Some of those stars collapsed millions of years ago. Their light is still traveling. Sailors crossed oceans by it. Entire civilizations oriented themselves by stars already dead.

Carl Sagan saw the same pattern in knowledge: “Books permit us to interrogate the past with high accuracy; to tap the wisdom of our species... They allow people long dead to talk inside our heads.” Dead stars and dead scientists: both transmitting across time. Both lending us light we didn’t generate.

We navigate by borrowed light: knowledge we inherit rather than discover. The physician inherits clinical wisdom from cases she never saw. The researcher inherits methods from experiments she never ran. Every generation inherits from the one before.

We have no choice in this. The question is whether the light is structured: navigable, correctable, traceable. Whether it arrives, or is lost. And what we can see (what has been published, connected, made findable) is only a fraction of what exists.

Drugs are worse. By one estimate, eighty percent of the world’s most important drugs originated from publicly funded science, yet an average of thirty-two years separates a basic science discovery from an approved drug. Development and testing account for part of that timeline. Fragmentation and failed knowledge transfer account for a significant share of the rest.

An estimated twenty-eight billion dollars per year—more than NASA’s entire annual budget—lost to preclinical research that cannot be reproduced. That is one field.

Thirty-two years: long enough for the researcher who made the discovery to retire, or die, never knowing whether it helped anyone.

Bar chart showing gaps: 2 months from symptom to diagnosis, 17 years from discovery to practice, 32 years from discovery to approved drug

That inheritance carries obligation. The best transmission carries warmth: we’re going to figure this out. This isn’t the end. I learned this from someone who didn’t get to finish teaching me.

Discoveries are not scarce. They exist in journals, databases, lab notebooks, clinical records, the minds of specialists. But they are locked inside containers. A paper holds dozens of findings; the findings themselves have no way to connect across documents, fields, or time. What is missing is infrastructure at the level of each point of knowledge, not the document that contains it.

The barriers are not only technical. They are institutional: incentives that reward citation over impact, formats that fragment rather than connect. But infrastructure can change the incentives. There are eleven million physicians and eight million researchers worldwide. Each week, millions of hours spent searching for what already exists: the same query, entered by a thousand doctors in a thousand cities this month, each unaware the others are looking.

Clinicians who know they cannot stay current with the evidence that governs their patients’ lives. Researchers who spend more hours navigating the literature than doing the work that drew them to science. The system does not just lose knowledge. It wears down the people who carry it.

The stars are there, scattered across the archive, luminous but unconnected. What remains is to draw the constellations that make them navigable.

The Pattern

Every age builds its library. Every age loses it.

Callimachus catalogued the Library of Alexandria but could not propagate corrections. Eunice Newton Foote demonstrated the greenhouse effect in 1856; her paper was read by a man at the AAAS, omitted from the proceedings, and forgotten for a hundred and fifty-five years until a retired petroleum geologist found it by accident. Paul Otlet cross-referenced twelve million index cards; the Nazis destroyed them. Vannevar Bush imagined associative trails through the literature; he could not build them. The web and Wikipedia proved that scale comes from participation, but both organize documents, not the findings underneath. The Semantic Web demanded ontology agreement before use; the agreement never came. FAIR principles addressed artifacts—papers, datasets—but not findings.

Each failure clarified a constraint. Catalogues cannot propagate corrections. Centralized systems can be destroyed. Participation does not guarantee depth. Architecture must emerge from use, not precede it.

What changed is the technology: extraction without human reading, replication without central servers, verification without manual checking. The bottlenecks that stopped every previous generation no longer hold. They were not wrong. They were early.

Each generation of civilization removed a constraint the previous one could not: writing removed the limit of memory, printing the limit of distribution, the scientific method the limit of reliability, the internet the limit of access. AI is removing the scarcity of intelligence itself: reasoning, generation, and discovery are becoming abundant and cheap. It is a sequence, each layer built on the last. It is not finished. There is a next layer, and it has a name.

But each layer also created the problem the next one solved. Writing enabled memory but created fragility—one copy, one location, one fire. Printing solved fragility but created noise—anyone could publish anything. The scientific method solved noise but was slow. The internet solved speed but re-created noise at global scale. And AI solves intelligence but creates the deepest problem of all: a hundred times the collective reasoning power of the human race, operating on a substrate that reflects our publishing incentives more than it reflects reality.

The next layer is the constellation: the infrastructure that connects intelligence to reality. Not faster information. Not more intelligence. The substrate that determines whether what intelligence believes is actually true: whether findings trace to evidence, whether corrections propagate, whether anyone can verify. Writing, printing, method, network, intelligence, constellation. Six layers. Each made something abundant that was previously scarce. The sixth makes structured, correctable knowledge abundant. For the first time, tracing a claim to its evidence is infrastructure, not individual work.

Six civilizational layers ascending: Writing, Printing, Method, Internet, Intelligence, and Constellation — each solving the problem the previous one created

Six layers. Each solved what the previous one broke.

The Foundation

Software rediscovered what Callimachus was trying to build: a system where nothing is lost, where every contribution is attributed, where the whole can be reconstructed from any fragment.

Before Git, software collaboration was fragile. Teams used CVS, Subversion, proprietary version-control systems. Each worked inside one organization. None composed across them. The Linux kernel—the most ambitious collaborative software project in history—was coordinated through emailed patches and tarballs sent to a mailing list. It worked, barely, for one project with one dictator. It could never have produced an ecosystem. GitHub could not have been built on Subversion. Copilot could not have been built on GitHub-on-Subversion. Each layer’s architecture constrains what can exist above it.

In 2005, Linus Torvalds released Git: every change preserved, every decision recorded, every history queryable. After Git, code could compound. The Pinakes for code.

In 2008, GitHub launched: not just storage but discovery, reputation, collaboration. Over 150 million developers now push nearly a billion commits a year.

Hugging Face repeated the pattern for machine learning: its model hub, launched in 2020, now hosts over two million models. Then came agents—first suggesting code alongside developers, now autonomous, navigating codebases for hours, creating pull requests without supervision.

Timeline of software infrastructure: Git (2005), GitHub (2008), Hugging Face (2021), and AI coding agents (2025)

Foundation → Platform → Agents. The stack built in order.

AI writes code because this infrastructure exists. It reads documents for everything else.

The lesson is precise. Code did not become abundant because compute got cheaper. Code became abundant because Git solved the transmission problem—every commit reachable, every dependency traceable, every conflict surfaced. Without the substrate, the intelligence has nothing structured to operate on.

But Git solved only half the problem. Git versions code. A compiler transforms it: human-readable text becomes machine-executable instructions. The transformation is what makes code actionable—not merely stored but runnable, testable, composable with other compiled artifacts.

Science has never had a compiler. A paper is human-readable prose. The findings inside it, the actual claims about reality with their evidence and limitations, are locked in natural language. Every reader performs the compilation step in their head: extracting the finding, assessing the evidence, connecting it to what they already know. Every reader does this independently, every time, and none of these compilations persist. The next reader starts from scratch. The next agent re-extracts the same structure from the same prose. The compilation is performed millions of times and preserved zero times.

The constellation is the compiler for scientific knowledge. It takes findings—prose, papers, practitioner experience, failed experiments—and compiles them into points: discrete claims with evidence, confidence, conditions, and lineage attached. Compile once, query forever. Every subsequent reader, every agent, every system inherits the compiled form rather than performing the compilation again.

Git versions lines. The constellation compiles findings. Both compound for the same reason: every unit is addressable, every relationship is traversable, and every update propagates.

Every other form of transmission has its infrastructure: roads for goods, wires for messages, protocols for data, Git for code. Nothing makes knowledge flow. Not findings, not corrections, not negative results, not the reasoning that connects them.

Git was useful to Linus on day one, for one project. It did not require the world's permission to start. The constellation would work the same way: useful to one researcher, in one field, before it is useful to science. Science has no Git. It has no GitHub. It jumped straight to agents, and the agents are building on sand.

An honest objection: give every scientist a good enough AI assistant and the transmission problem solves itself. But a hundred labs with brilliant AI assistants, each producing findings trapped in isolated sessions, just flood the same broken channel faster.

Git tracks lines of text; it cannot tell you how the evidence state of a finding changed, propagate a retraction to everything that cited it, or answer: which findings about this mutation have replicated? Search retrieves documents. What science needs is to navigate findings: to trace a finding from observation through replication, challenge, and revision.

AI science agents are arriving: FutureHouse for literature review, autonomous laboratories at Argonne for physical experiments, the AI Scientist for end-to-end paper writing. Across these frameworks, the highest failure rate belongs not to experimentation or writing but to literature review—the step that requires navigating what is known. But these agents operate on documents. They read science; they do not navigate it.

The primitive for version control is the line of text. The primitive for knowledge must be something different: a discrete finding with evidence, confidence, and lineage attached. When new evidence arrives, the finding updates and everything connected to it knows.

The primitive is different but the architecture is the same: version, compile, propagate.

Diagram comparing software built bottom-up with science built out of order

Software built its stack in order. Science jumped to agents with nothing underneath.

Call this stack inversion: building the top of a stack before the bottom exists. The result is structurally different. Every vision for what comes after AI assumes a structured knowledge layer that does not yet exist: autonomous labs, AI scientists, drug-repurposing agents. So every FutureHouse, every Sakana, every autonomous laboratory builds its own private, non-interoperable representation of knowledge just to make its agent work.

These are not research prototypes. FutureHouse's commercial spinout raised seventy million dollars at a two hundred fifty million dollar valuation; its Robin agent identified a novel therapeutic for dry age-related macular degeneration—the first genuine AI-driven drug repurposing discovery. Quarter-billion-dollar companies, building on unstructured prose. Each invents ad hoc entity resolution, ad hoc confidence tracking, ad hoc provenance—or, more often, skips them entirely. A hundred teams each pouring their own basement, none reusable.

The pattern is not new. The United States spent thirty billion dollars digitizing health records without requiring interoperability; sixteen years later, the systems still cannot exchange patient data. Digital silos replaced paper silos, at public expense. A bad foundation does not slow the ecosystem down. It caps it.

The pattern is always the same: capability, then application, then chaos, then infrastructure. The telegraph preceded wire services; the wire services preceded editorial standards. The automobile preceded highways; the highways preceded traffic law. Infrastructure arrives last because it requires coordination among parties who do not yet know they need each other. By the time they do, the chaos is already shaping what gets built. We are in the application phase of AI for science. The chaos phase—contradictory claims at machine speed, proprietary lock-in, a hundred private schemas hardening into switching costs—is not a distant risk. It is what the previous paragraph describes. The window in which shared infrastructure can precede entrenchment narrows with every funding round.

The world is racing to build artificial intelligence. No one is racing to build the substrate that would make it compound. When the machine arrives, the only thing that matters is what it has to think with.

Make knowledge compound. That is the work.

The Substrate

For most of history, the transmission problem was terrible but stable. Knowledge moved slowly; people adjusted. A doctor in 1950 could reasonably expect that what she learned in medical school would remain current for decades. The gap between discovery and practice was measured in years, but so was the pace of discovery itself. The system was broken, but it was broken at human speed. Human speed is ending.

We are entering the era of AI-driven scientific production: science performed by AI, not assisted by it. AlphaFold’s creators won the 2024 Nobel Prize in Chemistry. Early demonstrations show models that formulate hypotheses, run experiments, and write papers end-to-end. Within years, AI systems will generate the majority of new scientific claims.

A 2026 study in Nature, analyzing 41.3 million papers across the natural sciences, found that AI tools consistently expand individual impact while contracting collective reach: researchers using AI produce more but explore less, and the fields they work in narrow. Intelligence without transmission does not merely fail to compound; it actively concentrates, pulling the frontier inward toward what is already known.

And the substrate it operates on is thinner than it appears. Published science is a fraction of the science that was conducted. The majority of experiments produce negative results—hypotheses that failed, conditions that had no effect. In the current system, they vanish: into lab notebooks overwritten when storage fills, into databases abandoned when researchers change institutions, into the memories of people who move on. In catalysis research, thousands of groups test palladium compounds under varying conditions of temperature, pressure, and solvent. Each publishes the reaction that worked. The hundreds of conditions that produced nothing vanish. Each new PhD student begins by repeating failures that a shared record would have eliminated in an afternoon. By one estimate, roughly half of all clinical trials go unreported.

The published literature is what survived selection bias, positive-result preference, and format constraints. AI trained on this literature learns to navigate what is visible. It gets better at moving inside well-lit fields and worse at noticing what is missing. The dark matter of science—everything that was tried but never recorded—stays dark.

Aviation made incident reporting mandatory decades ago. The result is the safest mode of transportation ever built. Medicine lets half its evidence vanish, and every doctor navigates by the half that survived.

When intelligence scales—when AI systems generate thousands of hypotheses per day—every experiment that was never recorded becomes a blind spot that scales with them. An AI a hundred times more capable, trained on a substrate missing half the evidence, is not a hundred times smarter. It is a hundred times more confidently wrong about the things no one recorded. At human scale, a reviewer reads a paper; a clinician checks a source. Slow and incomplete, but possible. At AI scale, it is impossible. Claims arriving faster than anyone can verify.

And the problems will not wait. Antibiotic resistance, climate thresholds, the next pandemic—none will pause for seventeen years while knowledge makes the journey from discovery to practice. The solutions may already exist, scattered across institutions and formats and half-finished work. Whether they arrive in time depends on whether the answer can travel.

We are at a fork. The substrate we build is the substrate intelligence inherits. Get it wrong and the most powerful systems ever built navigate reality through a lens that is incomplete, enclosed, and shaped by whoever captured it. Get it right and those systems see clearly—because the full record is there, corrections propagate, dissent is visible, and no one controls the map.

The fork: two diverging paths from now—documents leading to scattered noise, structure leading to connected AI science. There is no third path.

That is the five-year argument. The longer one: intelligence will become abundant, hypotheses cheap, any question answerable. When that happens, the only thing that matters is what that intelligence thinks with. Linus Torvalds made design decisions in 2005 that constrain what AI coding agents can and cannot do in 2026. The constellation’s design choices would do the same for knowledge. The stakes are higher than code. These decisions shape what intelligence believes is true about reality.

To a language model, a retracted paper and a Nobel Prize paper look nearly identical. The mechanism that makes this dangerous is lossy compression at every stage of transmission. A researcher runs forty-eight synthesis attempts, varying temperature, pressure, and precursor ratios. Forty-seven fail. The paper reports the one that worked, noting conditions but not the search that produced them. A review article cites the paper, mentioning the result but not the conditions. A second paper cites the review. By the third generation, the claim reads as established fact: “material X was synthesized at conditions Y.” The forty-seven failures, the wide confidence intervals, the caveats in the original methods section—all stripped away. Each layer of transmission loses context and hardens uncertainty into apparent fact.

Call this confidence drift. AI accelerates what already happens in human science.

Confidence drift: a tentative finding with wide uncertainty gets cited through several generations. At each step the uncertainty halo shrinks and the star brightens, until a tentative result looks like established fact. The uncertainty doesn't shrink—it vanishes.

The pattern is not hypothetical. In 2023, Google DeepMind’s GNoME model predicted 2.2 million computationally stable crystal structures, framed as an order-of-magnitude expansion of known stable materials. Subsequent independent analysis raised concerns about experimental relevance: the gap between a structure computationally stable at zero Kelvin and a material that can be synthesized under real-world conditions is vast. A computational prediction is not a material. But downstream, the prediction enters the literature as a “discovery,” and the next model trains on the result. Confidence drift at the scale of a frontier laboratory.

AlphaFold is held up as the defining example of what happens when you throw compute at a scientific problem. It is actually an example of what happens when the transmission infrastructure already exists. AlphaFold predicted 200 million protein structures in months—more than the entire history of experimental biology combined. But structural biology had already solved its transmission problem. The Protein Data Bank provided decades of structured, machine-readable ground truth. CASP provided an adversarial evaluation framework that prevented the field from grading its own homework. The compute was necessary. The infrastructure was the precondition. Without the PDB, there is no AlphaFold—only models trained on unstructured papers, fluent but ungrounded. Most of science has no PDB. The constellation is the infrastructure that lets every field build one.

Every domain that has been industrialized required first building the layer that makes it cheap to check what is true. The scientific method was that layer for empirical claims. The Protein Data Bank was that layer for molecular structure. Every AI agent acting on science today—reading papers, proposing hypotheses, designing experiments—is automating the work without having first compiled the evaluation. The evaluation must come first.

The Convergence

For the first time, everything converges. Large language models can extract findings from documents at scale; distributed protocols can replicate data without central points of failure; content addressing can make knowledge self-verifying.
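Content addressing is the simplest of the three mechanisms to show: a point's address is the hash of its content, so any alteration changes the address and is immediately detectable by anyone, with no central authority. A minimal sketch in Python; the point structure itself is hypothetical.

```python
import hashlib
import json

def address(point: dict) -> str:
    """Content address: the SHA-256 hash of the point's canonical form.
    Sorting keys makes the serialization deterministic."""
    canonical = json.dumps(point, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify(point: dict, claimed_address: str) -> bool:
    """Anyone holding the point can check it against its address."""
    return address(point) == claimed_address

# A hypothetical knowledge point: claim plus evidence plus lineage.
point = {
    "claim": "dexamethasone reduces 28-day mortality in ventilated patients",
    "evidence": ["RECOVERY trial"],
    "supersedes": None,
}

addr = address(point)
assert verify(point, addr)             # the point is self-verifying

tampered = dict(point, claim="dexamethasone has no effect")
assert not verify(tampered, addr)      # any alteration changes the address
```

This is why capture is hard under content addressing: whoever replicates a point can prove it is unaltered, and whoever alters one cannot keep its address.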

It has happened once.

When COVID hit, every country ran trials. Most fragmented, each hospital with its own protocol, incompatible with the others. The UK did something different.

In late February 2020, Martin Landray, an Oxford epidemiologist, emailed Jeremy Farrar, then director of the Wellcome Trust. A few days later, they discussed it on a No. 18 bus to Marylebone. Farrar suggested Landray join forces with Peter Horby, an infectious disease specialist. Within nine days of writing the protocol, the first patient was enrolled.

The RECOVERY trial launched with shared infrastructure: web-based randomization that any hospital could use, a single ethics committee approval instead of 180 separate applications, minimal data collection integrated with NHS electronic records. The structure made participation effortless. One in six COVID patients admitted to UK hospitals entered the trial.

Within 100 days, RECOVERY had enrolled 11,000 patients and delivered its first result: dexamethasone reduced deaths by one-third in ventilated patients. The drug costs £5. It has since saved an estimated one million lives worldwide.

“It’s very, very rare,” Landray reflected, “that you announce results at lunchtime, and it becomes policy and practice by tea time, and probably starts to save lives by the weekend.”

Lunchtime to saving lives by the weekend.

Same pandemic. Same virus. Same doctors. Same patients. The difference was infrastructure. For one trial, in one country, for one disease, the pattern completed: discovery became arrival in days instead of decades.

RECOVERY eliminated administrative friction: enrollment, randomization, data collection. The constellation would go deeper, giving the findings themselves provenance and lineage so that the next trial inherits what this one found, what it contradicted, and what remains uncertain. Speed is possible when the scaffold exists. The constellation would make it permanent.

The gap between the two is real and I do not minimize it. RECOVERY had advantages the constellation cannot assume: a pandemic that aligned incentives, a single national health system, and political will that made cooperation easier than resistance. The constellation must work when there is no crisis, when participants are competitors, when no government mandates participation. That is a harder problem. RECOVERY showed the answer: design that makes participation effortless and locally useful. The constellation must do the same without a pandemic to align incentives.

The tool layer is arriving regardless: research agents, autonomous laboratories, each producing findings without structured output. Whether the exception becomes the default depends on what gets built next.

Open or captured

The infrastructure will be built. AI requires it. The only question is whether it will be open or captured.

Open does not mean all knowledge is public. The constellation protocol would be open: anyone can implement a client, anyone can verify provenance. Organizations can maintain private constellations. What must be open is the substrate: how points connect, the format of trails, the logic of verification. Without that, the risk is fragmentation: competing private knowledge graphs, each incomplete, none interoperable.

A doctor in rural Tamil Nadu does not have access to UpToDate. She has a phone. If the constellation is open, she navigates the same frontier as a physician at Massachusetts General—for the first time in the history of medicine. If it is captured, she pays for fragments of what the wealthy world takes for granted, and the gap that was once geographic becomes infrastructural: harder to see, harder to close.

Infrastructure, once set, calcifies. The journal article took its modern form in the 1800s and still dominates. When intelligence scales beyond any human’s ability to follow, the substrate it runs on determines which diseases get studied, which materials get synthesized, which climate interventions get tested. Not by mandate. By architecture. Open or closed. That choice outlasts us.

The Constellation

What I describe is recognized, not invented.

Knowledge has architecture that containers obscure. A finding implies its evidence. Evidence implies what challenges it. Findings relate to what they supersede and what supersedes them. The whole fabric versions over time, accumulating confidence or doubt as new information arrives. This architecture already exists. What I describe is the infrastructure that makes it explicit.

This sounds like another knowledge graph project. The graveyards are real. The pattern of failure is specific: build the perfect schema, then wait for the world to use it. The world never does. The Semantic Web asked scientists to annotate their work in formats they had no reason to adopt. The constellation does not ask. LLMs extract findings from papers that already exist; trails record decisions clinicians are already making. The difference between “agree on a schema” and “extract what’s already there” is the difference between the Semantic Web and Google.

And where previous systems demanded global coordination before delivering local value, the constellation inverts this: useful to one clinician today, before it is useful to science tomorrow. RECOVERY worked because the trial framework was useful to each hospital independently, not only when every hospital joined. Vertical before horizontal—one field, one problem, one community where the density of points justifies the infrastructure.

The primitives

A paper is an artifact: “Chapman et al. 2011, New England Journal of Medicine.” You can cite artifacts, but you cannot query them. You cannot ask: has this been replicated? Under what conditions does it hold? What depends on it?

A point is a unit of knowledge with its evidence and lineage attached. Points come from published studies, but also from the dark matter of science—failed experiments, clinical decisions made thousands of times daily but never recorded, lab protocols (the real ones that work, not the sanitized version in Methods), and AI outputs (requiring verification before they become knowledge). “This approach was tried under these conditions and did not work” is a point. The source is tracked: a point from a replicated RCT carries different weight than a point from a single lab notebook. But the primitive is the point, not the container it came from.

Points need stable referents—a gene, a drug, a mutation—to link across contexts; the constellation builds on existing biomedical ontologies, accepting incompleteness as the price of starting. LLMs extract candidate points from papers at scale; domain experts confirm or correct them, and the validated point enters with its history attached. Disagreement is preserved, not resolved—the protocol ensures disputes cannot be hidden.

Take a point: “BRAF V600E mutation predicts response to vemurafenib in metastatic melanoma.” In 2010, a clinician considering this drug for a dying patient would have seen uncertainty—early signals, small studies, no definitive trial. The point’s chronicle shows what happened next: Phase III results in 2011, 675 patients, clear effect. Then a complication in 2012—resistance develops within six months. Then combination therapies shifting the calculus again. The clinician does not need to reconstruct this history from scattered papers. She sees the trajectory—whether the finding is consolidating or fragmenting—and the trajectory changes what she does next. That is what a chronicle is: a living record of how confidence evolved, visible to anyone who needs it.

The primitive is domain-agnostic. A drug, a material, a climate threshold—the same structure applies.

Anatomy of a point: central assertion with entities, evidence, confidence, dissent, lineage, and chronicle showing how confidence evolved from uncertain (2010) to strengthened (2011) to complicated (2012) to refined (now).
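To make the anatomy concrete, here is a minimal sketch of a point as a data structure. The field names (`assertion`, `evidence`, `dissent`, `chronicle`, and so on) are illustrative only, drawn from the description above, not from any published protocol or schema.

```python
from dataclasses import dataclass, field

@dataclass
class ChronicleEntry:
    """One step in how confidence in a point evolved."""
    year: int
    status: str   # e.g. "uncertain", "strengthened", "complicated"
    note: str

@dataclass
class Point:
    assertion: str        # the claim itself
    entities: list[str]   # stable referents: gene, drug, disease
    evidence: list[str]   # identifiers of supporting sources
    confidence: float     # current weight, 0.0 to 1.0
    dissent: list[str] = field(default_factory=list)    # challenges, preserved not erased
    lineage: list[str] = field(default_factory=list)    # what this supersedes or refines
    chronicle: list[ChronicleEntry] = field(default_factory=list)

# The BRAF example from the text, as one such point.
braf = Point(
    assertion="BRAF V600E predicts response to vemurafenib in metastatic melanoma",
    entities=["BRAF:V600E", "vemurafenib", "melanoma"],
    evidence=["Chapman et al. 2011, NEJM"],
    confidence=0.9,
    chronicle=[
        ChronicleEntry(2010, "uncertain", "early signals, small studies"),
        ChronicleEntry(2011, "strengthened", "Phase III, 675 patients, clear effect"),
        ChronicleEntry(2012, "complicated", "resistance develops within six months"),
    ],
)
```

The chronicle is an ordered history rather than a single score: a reader can see not just the current confidence but the trajectory that produced it.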

Points alone are not a constellation. The BRAF point links to the Phase III trial that established it, the resistance data that complicated it, and the combination therapies that changed the calculus. Follow the links and you can trace from what you know to everything that supports, challenges, or extends it. Without links, findings are scattered stars. With them, constellations.

The third primitive is the trail.

A trail is a recorded path through knowledge: what you searched, what you found, what you decided, and why. A trail might record: assembled solid-state cell with Li6PS5Cl electrolyte, NMC cathode—capacity faded by cycle 200. Found Puls et al. 2024: twenty-one labs given identical materials, results varied by a factor of three. Assembly pressure was the uncontrolled variable. Adjusted pressing protocol based on the top-performing group’s published parameters. Stable through 500 cycles.
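The solid-state cell example above can be written down as an ordered, attributable record. This is a sketch under assumed names (`TrailStep`, `action`, `detail`), not a defined format; the essential property is only that each step of search, finding, and decision is captured in sequence.

```python
from dataclasses import dataclass

@dataclass
class TrailStep:
    action: str   # "observed", "searched", "found", or "decided"
    detail: str

# The battery example from the text, as an ordered record.
trail = [
    TrailStep("observed", "Li6PS5Cl cell with NMC cathode: capacity faded by cycle 200"),
    TrailStep("found", "Puls et al. 2024: 21 labs, identical materials, results varied 3x"),
    TrailStep("decided", "assembly pressure was uncontrolled; adopted top group's pressing protocol"),
    TrailStep("observed", "stable through 500 cycles"),
]
```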

But a trail is more than a log. It is a compressed story: a character who searched, a conflict that demanded judgment, a resolution that others can learn from. My doctors had the knowledge. They knew the diagnostic triad. What they lacked was the story from another doctor who had missed the same pattern and wanted to make sure no one else did. A point can change what someone knows. A story can change what someone sees. They are not illustrations of knowledge. They are a form of knowledge that structured data cannot replace.

Henrietta Lacks was thirty-one when she died of cervical cancer in a segregated ward at Johns Hopkins Hospital. Her doctors took a sample without her knowledge or consent. Those cells, HeLa cells, became immortal: polio vaccine, cancer treatments, gene mapping, COVID vaccines. Over 110,000 publications and billions in research.

Her family didn’t know. For decades, they couldn’t afford health insurance while researchers around the world built careers on her contribution. The knowledge traveled everywhere. The trail back to its source was never built.

Trails are for more than navigation. They are for attribution, for the visibility that makes informed consent enforceable, for the debt owed to those who contributed. In the constellation, Henrietta Lacks’s contribution would have traveled with her name attached, every downstream use traceable, every derivative linked back to its source. The gap the Lacks family spent decades confronting—who benefits from what was taken, and who knows?—is a gap of provenance. Trails close it.

But trails do more than attribution. They route knowledge to people who need it but would never think to search for it.

In the Netherlands, a baby girl was born with shortened bones and failing kidneys. She died within seven weeks. Her clinicians had no diagnosis. They sequenced her genome, found two candidate genes, and entered her case into Matchmaker Exchange, a federated network for rare disease research. Within days, they connected with researchers in Germany who had seen the same mutation. Then Canada. Then Portugal. Then the UK. Nine patients across five families on four continents, linked through a single API. They named the disorder. The research led to a potential treatment.

The baby in the Netherlands didn’t survive. Her data did. It traveled the network her clinicians built, reaching families who had searched for years.

Two constellations side by side: left shows the connected pattern that existed in medical knowledge, right shows the same stars fragmented and isolated—what the doctors saw through siloed lenses.

The map of what is known is simultaneously a map of what is absent: gaps made legible and queryable, the whole fabric versioned over time.

The constellation does not arrive fully formed. Consider the lithium dendrite problem—the tendency of lithium metal to form needle-like structures that short-circuit batteries, unsolved since Whittingham’s cells caught fire at Exxon in the 1970s. Today the fifty-year history is scattered across thousands of papers in electrochemistry, materials science, and solid-state physics. A grad student starting today spends her first year reading, building a mental map that exists nowhere else.

Now watch the field enter the constellation. An LLM reads a decade of literature and surfaces structured points—every claim about dendrite formation and suppression, with evidence and confidence attached. Relationships that existed only in the minds of a dozen senior researchers become explicit and traversable. A grad student opens her browser and sees the frontier as a map. A trail left by a researcher in Munich: sulfide electrolytes degrade above 4V with this cathode chemistry. Wasted three months. Use the oxide instead. Retractions propagate. Contradictions stay visible. Each point added makes the structure denser: more to connect to, more to correct against, more to build on.

Agents and the frontier

The agents are already here. What the constellation gives them is what they currently lack: a queryable frontier. AI scientists can substantiate findings with tool outputs—a binding affinity measured, a synthesis completed, a simulation converged. The constellation would store what was done, not what was said. As it grows, the frontier becomes visible: dense where evidence is strong, jagged where it is not, entire continents unexplored because no one thought to look.

Two-panel diagram: a zoomed-out sky map showing multiple knowledge constellations, and a zoomed-in view of one constellation where gaps and trails are visible.

We are early. A species 300,000 years old, with writing for 5,000, with modern science for 400. The observable universe of knowledge is a sliver of a sliver. With each point added the horizon moves outward. The map is never complete. But for the first time, we can see where we are.

Corrections

Knowledge must be correctable: errors detected, corrections propagated, the record updated.

In January 2014, Haruko Obokata published two papers in Nature claiming a revolutionary method for creating stem cells: just dip cells in weak acid. Labs worldwide scrambled to replicate.

Ken Lee, a stem cell researcher at the Chinese University of Hong Kong, decided to document his attempt publicly. He live-blogged his replication on ResearchGate, the first live-blogged scientific experiment. Within weeks, he identified the problem: what Obokata had seen was likely autofluorescence, cells glowing from stress rather than transformation. Her revolutionary discovery was an artifact.

Lee submitted his findings to Nature’s “Brief Communications Arising,” the journal’s mechanism for publishing corrections. Nature rejected it. No clear explanation. The journal that published the flawed papers refused to publish the correction.

For six more months, labs worldwide continued wasting resources. Rudolf Jaenisch at MIT: “Many people wasted their money and their time and their resources on repeating this.”

The correction existed, but the system could not propagate it. STAP was not an exception. An estimated four hundred thousand fraudulent articles have entered the literature in the past twenty years. The corruption enters. It stays. Everything downstream inherits it.

The constellation would make corrections durable and discoverable: a retraction linked to what it corrected, a failed replication updating the points that depended on it, dissent attached to its evidence until the weight shifts. When a point is disproved, every point that depends on it knows—not eventually, but structurally. The system that carries findings also carries their failures.
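Structural propagation is an ordinary graph problem. A minimal sketch, assuming each point records which points it rests on: invert those edges, then walk outward from the retracted point so that every downstream dependent is flagged. The names here (`depends_on`, `downstream_of`) are illustrative, not part of any specified protocol.

```python
from collections import deque

# depends_on[x] lists the points that x rests on.
depends_on = {
    "B": ["A"],        # B was built on A
    "C": ["B"],        # C on B
    "D": ["A", "C"],   # D on both A and C
}

def downstream_of(retracted, depends_on):
    """Return every point that inherits a failure from the retracted point."""
    # Invert the dependency edges, then breadth-first walk from the retraction.
    dependents = {}
    for node, deps in depends_on.items():
        for d in deps:
            dependents.setdefault(d, []).append(node)
    flagged, queue = set(), deque([retracted])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in flagged:
                flagged.add(child)
                queue.append(child)
    return flagged

# Retract A: everything built on it is flagged, not eventually but structurally.
flagged = downstream_of("A", depends_on)   # {"B", "C", "D"}
```

The point is not the algorithm, which is elementary, but that the record makes it runnable at all: the literature today has no machine-readable dependency edges to walk.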

The Gigafactory

The world is pouring hundreds of billions into AI compute. But compute was never the missing piece for making knowledge arrive. Engines need roads.

A researcher in Nairobi has the same right to navigate the frontier as one in Boston. Without shared infrastructure, there is no exploration at scale—just isolated parties starting from scratch.

The constellation maps what is known. The gigafactory is the machine that makes the map grow.

The frontier map: dense constellation of known knowledge on the left, sparse frontier on the right, with agents converging toward the gaps.

When the foundation exists, science runs differently. This is the bottleneck inversion. Today intelligence is scarce; thinking is the rate-limiting step. Soon intelligence will be abundant and hypotheses cheap. The limiting factor becomes physical verification: lab time, patients, instruments, the speed of atoms and organisms. The constellation shifts from a tool for finding what is known to a tool for deciding what to test next.

A self-driving lab in Toronto synthesizes a candidate solid-state electrolyte. The result—ionic conductivity, stability window, failure mode—enters the constellation as a structured point. A lab in Shenzhen queries the frontier that evening: which sulfide compositions have been tested above 4V? The Toronto result appears. The Shenzhen lab skips the composition that failed and tests the next one. The cycle that once took months of redundant work takes hours of coordinated exploration. Not by central planning, but by making the frontier legible enough that coordination emerges.
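The Shenzhen lab’s question is, at bottom, a filter over structured points. A toy sketch, with invented field names and made-up data standing in for real measurements:

```python
# Hypothetical structured points from automated labs; fields are illustrative.
points = [
    {"entity": "Li6PS5Cl", "family": "sulfide", "voltage_v": 4.2, "outcome": "degraded"},
    {"entity": "LLZO",     "family": "oxide",   "voltage_v": 4.5, "outcome": "stable"},
    {"entity": "Li3PS4",   "family": "sulfide", "voltage_v": 3.8, "outcome": "stable"},
]

def frontier_query(points, family, min_voltage):
    """Which compositions of this family have been tested above the given voltage?"""
    return [p for p in points
            if p["family"] == family and p["voltage_v"] > min_voltage]

tested = frontier_query(points, "sulfide", 4.0)
# One match: Li6PS5Cl, which degraded -- so the lab skips it and tests the next one.
```

The same question asked against the literature today is a week of reading; asked against structured points, it is a query.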

The observable universe doesn’t expand through mechanism alone. It expands through participation. A grad student opens her browser. She sees the frontier of solid-state electrolyte research as a map, not a literature review. She clicks on the lithium dendrite problem—the same one whose fifty-year history the constellation structured—and sees why it’s still open: three approaches tried at high current density, two failed at pressures no one reported until 2024, one that might work with a sulfide electrolyte at a scale no lab has tested. She picks the sulfide approach. She can see where she’s going, and why no one has gone there yet.

The constellation does not replace her judgment. It replaces the years she would have spent acquiring the context to exercise it.

That decision requires something intelligence cannot supply. When knowledge is abundant, the scarce thing is conviction: the belief that a problem is worth pursuing, that its answer matters to someone. The map can show the gap. It cannot create the stubbornness to spend five years filling it. The constellation makes the frontier visible. It cannot choose which frontier matters. That remains human, and it is the most important thing that does.

Imagine instead: 2032. Intelligence is cheap. An AI system identifies a therapeutic target for a rare pediatric condition in three hours. The molecular mechanism is characterized, the binding affinity validated, the safety profile modeled. The discovery exists. It sits inside a proprietary knowledge system that cannot interoperate with the hospital network in Chennai where a child is presenting with that condition. The physician searches the literature. Nothing—the finding hasn’t been published. It exists as structured data inside one company’s platform. The child’s family will learn of the therapy in fourteen months, when it surfaces in a review article. By then the disease has progressed past the treatment window.

Imagine: 2035. A different child, the same symptoms.

She is six years old. Headaches that won’t go away. Unsteady on her feet. Her mother mentions it to the pediatrician.

The pediatrician enters the symptoms. The system surfaces a pattern from 847 confirmed diagnoses across twelve countries. A trail left by a neurologist in Melbourne: “When balance problems present with persistent headache in this age range, order imaging early. I missed one. Don’t repeat my mistake.”

The doctor reads the trail. She orders imaging. On the first visit, not the twelfth.

“We found something,” the doctor says. “But we caught it early. The prognosis is good.”

What Callimachus began in Alexandria, what every generation has carried forward, we can finally make durable. The stars have always been there. The constellation is ours to draw.

The light is borrowed. Pass it on.

the light arrives