Technical Specification

The Lux Protocol Kernel

A content-addressed protocol for structured scientific knowledge

Version 1.0 · February 2026

Abstract

We specify a protocol kernel for structured scientific knowledge. Three content-addressed primitives—points (findings), links (typed relationships), and trails (ordered reasoning paths)—are governed by an append-only signed event history. A deterministic replay algorithm ensures that any two replicas holding the same events produce identical materialized views without coordination. Cryptographic checkpoints computed via the RFC 9162 Merkle Tree Hash provide consistency verification and append-only auditing. All interpretation—trust, scoring, frontier detection—is delegated to replaceable, policy-scoped observers. The result is a compilation layer for knowledge: content-addressed like Git, append-only like Certificate Transparency, and interpretively open.


Introduction

Scientific knowledge suffers a transmission failure: findings exist but cannot reliably reach the people who need them, corrections do not propagate to the claims that depend on them, and the majority of experimental work is never recorded. In Constellations of Borrowed Light [constellations], we made the full case for why this failure demands a new kind of substrate—one structured at the level of individual findings, where corrections propagate, provenance is preserved, and no single entity controls the map. We named the structure a constellation and argued that it would do for knowledge what Git [git] did for code.

The companion essay observes that science has never had a compiler: a mechanism that takes scattered findings and produces a deterministic, queryable view. This paper specifies that compiler—the protocol kernel, the minimal deterministic layer that makes constellations possible. The kernel is responsible for content addressing, event ordering, view materialization, and cryptographic checkpointing. It is deliberately policy-free: it does not score, rank, trust, or resolve. Everything interpretive is delegated to observers that operate on the kernel’s materialized view under declared policies. This separation—deterministic kernel, replaceable observers—is the central architectural decision (Section 3).

Section 2 states the design requirements. Section 3 defines the kernel/observer boundary. Sections 4–5 specify the data model. Section 6 defines content addressing and deterministic replay. Section 7 specifies cryptographic checkpoints. Section 8 describes the replication protocol. Section 9 analyzes security properties. Section 10 defines the conformance contract. Section 11 discusses related work.

Design Requirements

The manifesto [constellations] identifies specific failures in how scientific knowledge is transmitted. Each failure implies a protocol requirement:

  1. Content-addressed immutable objects. Findings must be individually addressable—not locked inside documents—so that corrections, challenges, and extensions can target specific claims. Immutability ensures that an ID always refers to the same content.

  2. Append-only signed events. History must be auditable and corrections must be durable. When a finding is retracted, the retraction must propagate to everything that depends on it. Deletion destroys auditability; append-only history preserves it.

  3. Deterministic replay. Independent replicas must converge to identical views without coordination. This is the property that makes the protocol open: anyone can verify the state by replaying the same events. No trusted third party is required.

  4. Policy-free kernel. The protocol must not encode opinions about trust, quality, or truth. Different communities weight evidence differently. A protocol that encodes one community’s definition of trust cannot serve all communities. The kernel ensures that disputes cannot be hidden; it does not resolve them.

  5. Cryptographic checkpoints. It must be impossible to silently rewrite history. A sequence of checkpoint roots provides a verifiable append-only log. Confidence laundering—certainty hardening at each transmission hop—requires an auditable record of what was actually claimed and when.

  6. Open conformance. Anyone must be able to implement a conforming client. Byte-level test vectors, not prose descriptions, are the interoperability gate.

The Kernel / Observer Boundary

The central architectural decision: the kernel is deterministic, minimal, and policy-free. Observers are interpretive, replaceable, and policy-scoped. Observers never modify kernel state. This separation is what makes the protocol trustworthy without requiring agreement on what trust means.

The kernel owns: content addressing, event ingestion and signature verification, deterministic replay, view materialization, and checkpoint computation. Given the same event set, every conforming kernel produces the same view. The kernel does not know whether a finding is important, whether an experiment was well-designed, or whether a researcher is credible. It knows only whether events are structurally valid, whether signatures verify, and what the deterministic replay produces.

Observers consume the materialized view and produce certificates. Given the same view and policy, a conforming observer produces the same certificate. A certificate uses the standard envelope:

{
  "kind": "lux.certificate",
  "v": 1,
  "schema": "urn:lux:schema:v1:certificate",
  "body": {
    "policy_id": "urn:lux:policy:replication-weight:v1",
    "input_root": "sha256:9f86d0...",
    "inputs_sufficient": true,
    "result":  "score": 0.87, "label": "well-replicated" ,
    "explanation": "3 of 4 independent replications confirm effect."
  }
}

policy_id identifies which policy governed the computation. input_root is the checkpoint root of the view consumed, binding the certificate to a specific kernel state. inputs_sufficient declares whether the observer had enough data to produce a meaningful result. result is a structured output whose schema is policy-defined. explanation is a human-readable justification. Certificates are content-addressed like all other objects: the ID is the SHA-256 hash of the canonical envelope bytes.

A research institution may run an observer weighting replication. A pharmaceutical company may run one prioritizing clinical relevance. A frontier lab may surface contradictions. All operate on the same kernel view. The substrate is shared; the interpretation is not.
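A certificate is only meaningful against the kernel state it consumed, so a consumer's first check is that input_root matches the checkpoint it holds. The sketch below illustrates that check, plus content addressing of the certificate envelope; the helper names are illustrative, and the compact sorted-key serialization stands in for RFC 8785 only for the integer/ASCII subset used here.

```python
import hashlib
import json

def canonical_bytes(obj: dict) -> bytes:
    # Approximates RFC 8785 (JCS) for envelopes containing only
    # integers and ASCII strings; a full implementation needs a JCS library.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def certificate_id(cert: dict) -> str:
    # Certificates are content-addressed like all other objects.
    return "sha256:" + hashlib.sha256(canonical_bytes(cert)).hexdigest()

def binds_to_view(cert: dict, checkpoint_root: str) -> bool:
    # The certificate is bound to the kernel state it consumed via input_root.
    return cert["body"]["input_root"] == checkpoint_root
```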

The kernel / observer boundary: kernel owns content addressing, events, replay, checkpoints; observers produce certificates from the materialized view

Objects

A constellation is the collection of content-addressed objects, signed events, and the materialized view they produce under a named checkpoint. The protocol defines three primitive object types. All objects are immutable and content-addressed: the ID is the SHA-256 hash of the object’s canonical bytes (Section 6). Objects use a uniform envelope:

{
  "kind": "lux.point",
  "v": 1,
  "schema": "urn:lux:schema:v1:point",
  "body":  ... 
}

The kind field identifies the object type. The v field is the envelope version. The schema field references the JSON Schema governing the body. The hash is computed over the RFC 8785 [jcs] canonical bytes of the full envelope. The envelope is extensible: additional object kinds may be defined provided they use this structure.

Versioning. The current envelope version is 1. When a future version changes the envelope structure (e.g. new required fields, changed canonicalization rules), the v field increments. A conforming v1 implementation must reject envelopes with v > 1 rather than silently misinterpret them: because the hash is computed over the full envelope, a v1 implementation cannot verify the content address of a v2 object without understanding v2 canonicalization. Implementations should hold unrecognized-version objects in a pending set (analogous to events with missing dependencies) so that a software upgrade can process them without re-fetching. This ensures that version transitions do not require flag days: v2 objects propagate through the replication layer and become processable when implementations upgrade.
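The version gate can be sketched in a few lines. This is an illustration, not the specified interface: the `view` and `pending` stores stand in for whatever an implementation actually uses.

```python
def ingest(envelope: dict, view: set, pending: list) -> None:
    """Route an envelope: process v == 1, hold unrecognized versions.

    Held envelopes can be reprocessed after a software upgrade, so
    version transitions need no flag day and no re-fetch.
    """
    if envelope.get("v") == 1:
        view.add(envelope["kind"])   # placeholder for real v1 processing
    else:
        pending.append(envelope)     # unknown version: hold, do not reject bytes
```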

The three primitives serve distinct and necessary roles. A finding without relationships is a fact in isolation; a relationship without a reasoning path is a connection without context; a reasoning path without discrete findings has nothing to connect. Together they form a navigable graph.

A note on entities. The manifesto [constellations] identifies entity resolution—determining that three different strings all refer to the same gene, the same drug, or the same patient population—as a problem where previous systems died. The kernel deliberately excludes entity resolution from its scope. The kernel stores points; it does not know whether two points refer to the same real-world referent. Entity resolution is an observer concern or a higher-layer concern: an observer may declare a policy that clusters points by entity, and a link of type lux.same_as may assert co-reference between two points, but the kernel itself maintains no entity index. This is a scope decision, not an oversight. A kernel that resolves entities is a kernel that encodes an ontology, and ontology agreement is precisely the failure mode the design avoids.

Point

A point is the atomic unit of knowledge: a single finding, individually addressable. A point is not a paper. One paper may yield many points. A point with no paper behind it—an observation, a negative result, a protocol note—is equally valid.

The body requires: statement (natural language) and language (ISO 639 code, e.g. "en"). Optional fields include qualifiers (structured metadata object) and titles (array of display titles). The schema permits additional properties, so domain-specific fields may be added without breaking conformance.

Points exist because the document is the wrong unit. A paper contains dozens of claims at varying confidence levels; a citation to the paper cannot distinguish between them. Content addressing at the finding level means corrections, challenges, and extensions can target exactly what they refer to.

Link

A link is a directed, typed relationship between two points. The body requires: src and dst (point IDs) and rel (a namespaced relationship type, e.g. lux.supports, lux.contradicts, lux.depends_on, lux.refines, lux.replicates). An optional attrs object carries relationship-specific metadata.

Links exist because findings in isolation are not knowledge. The graph of support, contradiction, and dependency between findings—the structure that specialists hold in their heads—must be explicit and machine-traversable. A point without links is a fact in isolation. A point with links is a fact in context.

Trail

A trail is an ordered sequence of points and links recording a path through the constellation. The body requires: trail_type (e.g. "navigation", "methodology"). Optional fields include steps (ordered array of step entries) and attachments (array of referenced artifacts).

Each step entry is an object with the following fields:

{
  "ref": "sha256:3a7f...",
  "ref_kind": "point",
  "annotation": "Starting observation: IC50 below threshold",
  "ordinal": 0
}
ref (required) is the content-addressed ID of a point or link. ref_kind (required) is "point" or "link", indicating which object type is referenced. ordinal (required) is a non-negative integer establishing the step’s position in the trail; steps are replayed in ordinal order, with EID tie-breaking for equal ordinals. annotation (optional) is free-text commentary on this step’s role in the trail.

Trails exist because knowledge is not only what is true but how it was found. Trails record methodology (making experiments replayable) and carry attribution (making provenance enforceable). The manifesto [constellations] argues that a trail left by a clinician who missed a diagnosis and wants to prevent others from repeating the mistake is a form of knowledge that structured data alone cannot replace.

Primitive objects and their relationships: Point A linked to Point B via a typed relationship, with a trail connecting them

Events

Objects are immutable. State changes are expressed through events: signed, append-only records. An event is a separate structure from the object it acts on:

{
  "kind": "lux.event",
  "v": 1,
  "schema": "urn:lux:schema:v1:event",
  "body": {
    "type": "lux.object.add",
    "refs":  "object_id": "sha256:3a7f..." ,
    "deps": [],
    "meta":  "ts": 1708905600 
  },
  "sig": 
    "alg": "ed25519",
    "public_key_b64u": "...",
    "sig_b64u": "..."
  
}

The event ID (EID) is computed over the unsigned event body: canonicalize the envelope with sig excluded, hash with SHA-256. The signature is then computed over the same canonical bytes and attached. This means the EID is stable regardless of which key signs the event.
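The EID computation can be sketched as follows. This is a minimal illustration: the compact sorted-key serialization approximates RFC 8785 only for the integer/ASCII subset the kernel allows, and signing itself (Ed25519 over the same canonical bytes) is omitted.

```python
import hashlib
import json

def eid(event: dict) -> str:
    """Event ID: SHA-256 over the canonical envelope with 'sig' excluded.

    Because the signature is excluded from the hashed bytes, the EID is
    stable regardless of which key signs the event.
    """
    unsigned = {k: v for k, v in event.items() if k != "sig"}
    blob = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```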

Events declare dependencies in the deps array. An event that retracts a link depends on the event that introduced it. Dependencies create the partial order that the replay algorithm resolves (Section 6).

Event types

The minimal set for a working protocol:

object.add — Adds an object ID to replayable history. Objects may exist in storage before introduction; this event declares membership in the shared record. Duplicate additions of the same object ID are benign.

point.retract — Marks a point inactive in the kernel view. The object remains in history for auditability. Retraction is an event, not a deletion.

link.retract — Marks a link inactive in adjacency indexes. The link object remains in history. Both retraction types preserve the original for auditability.

point.supersede — Establishes a replacement relation: old → new. The new point must already be introduced. The old point’s inbound links are preserved. The chain of supersessions is the version history of a finding.

checkpoint.publish — Publishes a Merkle root committing to the accepted event set (Section 7).

Append-only event chain: E1 (object.add) → E2 (object.add) → E3 (point.retract) with dependency edges

Content Addressing and Deterministic Replay

Content addressing

Every object and event is identified by the hash of its canonical representation. The rules are non-negotiable—they are the interop-hard decisions that make independent implementations converge:

Canonicalization. All JSON is canonicalized with RFC 8785 (JCS) [jcs] before hashing or signing. This ensures independent implementations produce identical bytes.

Hashing. SHA-256 over canonical bytes. 32 raw bytes.

ID format. sha256: followed by the lowercase hex encoding of the 32-byte digest. Ordering comparisons are on raw digest bytes, not string representation.

Numeric safety. Kernel numbers are integers only, within IEEE-754 safe range (±2⁵³−1). No floating point in the kernel.

Worked example

Consider the following point object:

{
  "kind": "lux.point",
  "v": 1,
  "schema": "urn:lux:schema:v1:point",
  "body": 
    "statement": "Compound A inhibits enzyme B with IC50 = 340 nM",
    "language": "en"
  
}

RFC 8785 canonicalization produces (keys sorted, no whitespace):

{"body":"language":"en","statement":"Compound A inhibits enzyme B with IC50 = 340 nM","kind":"lux.point","schema":"urn:lux:schema:v1:point","v":1}

SHA-256 digest (hex): b3d228c052b3ce34cb5d12e6b92efb7633c1837729cee65763a9722172adb2de

ID string: sha256:b3d228c052b3ce34cb5d12e6b92efb7633c1837729cee65763a9722172adb2de

Any conforming implementation performing the same canonicalization and hashing on this object must produce this exact ID. The conformance suite (Section 10) provides test vectors for verification.
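The worked example can be reproduced with a short script. One caveat: Python's compact sorted-key serialization matches JCS output only for objects like this one (integers and ASCII strings, no floats or Unicode edge cases); a real implementation should use a proper RFC 8785 library.

```python
import hashlib
import json

point = {
    "kind": "lux.point",
    "v": 1,
    "schema": "urn:lux:schema:v1:point",
    "body": {
        "statement": "Compound A inhibits enzyme B with IC50 = 340 nM",
        "language": "en",
    },
}

# Keys sorted, no whitespace: for this subset of JSON, equivalent to JCS.
canonical = json.dumps(point, sort_keys=True, separators=(",", ":"))
object_id = "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()
```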

Deterministic replay

A constellation’s state is a pure function of its accepted event set. Two replicas holding the same events produce the same materialized view.

Define $\text{Accepted}(E, O)$ as a function of both the event set $E$ and the locally available object set $O$. An event is accepted when: (a) it is structurally valid (envelope, schema, types), (b) its signature verifies, (c) all objects it references are in $O$, and (d) all events in its deps array are in $E$.

Replay algorithm:

function replay(E) -> View:
  order = topo_sort(E)        // Kahn's algorithm
  V = empty_view()
  for e in order:
    match e.type:
      object.add       -> V.add(e.refs.object_id)
      point.retract    -> V.deactivate_point(e.refs.point_id)
      link.retract     -> V.deactivate_link(e.refs.link_id)
      point.supersede  -> V.supersede(e.refs.old_id, e.refs.new_id)
  return V

function topo_sort(E) -> [Event]:
  // Kahn's algorithm with deterministic tie-breaking
  in_degree = { e: |e.deps intersect E| for e in E }
  queue = min_heap(key = raw_digest_bytes(EID))
  for e in E where in_degree[e] == 0:
    queue.push(e)
  result = []
  while queue not empty:
    e = queue.pop_min()        // smallest EID bytes
    result.append(e)
    for e' in E where e in e'.deps:
      in_degree[e'] -= 1
      if in_degree[e'] == 0:
        queue.push(e')
  if |result| < |E|:
    reject cycle(E - result)   // SCC detection
  return result

The tie-breaking rule—lexicographic comparison of raw EID digest bytes—ensures that the ordering is fully deterministic. Two implementations processing the same events in any reception order produce the same replay sequence.
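The pseudocode above translates directly into a small implementation. This is a sketch under simplifying assumptions: events are represented as a mapping from EID string to its list of dependency EIDs, and the tie-break compares the decoded 32-byte digests, as the specification requires.

```python
import heapq

def topo_sort(events: dict) -> list:
    """Kahn's algorithm over {eid: [dep_eids]}, ties broken on raw digest bytes."""
    def key(e: str) -> bytes:
        # Compare raw digest bytes, not the "sha256:<hex>" string.
        return bytes.fromhex(e.split(":", 1)[1])

    in_deg = {e: sum(1 for d in deps if d in events) for e, deps in events.items()}
    dependents = {e: [] for e in events}
    for e, deps in events.items():
        for d in deps:
            if d in events:
                dependents[d].append(e)

    heap = [(key(e), e) for e, deg in in_deg.items() if deg == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, e = heapq.heappop(heap)     # smallest EID bytes first
        order.append(e)
        for f in dependents[e]:
            in_deg[f] -= 1
            if in_deg[f] == 0:
                heapq.heappush(heap, (key(f), f))
    if len(order) < len(events):
        raise ValueError("dependency cycle")  # reject the cyclic remainder
    return order
```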

Materialized view

The kernel view $V$ after replay contains: (a) the set of introduced object IDs, partitioned by kind; (b) the active/inactive status of each point and link; (c) supersession chains (old → new); (d) adjacency indexes (point → outbound active links, point → inbound active links); (e) the accepted event set.

When an event’s dependencies are missing (the replica has not received them), the event is held pending until dependencies arrive. Convergence requires only eventual delivery of the same events, regardless of order.

Scale considerations

The replay algorithm’s topological sort is $O(E \log E)$ where $E$ is the number of accepted events. Full replay from an empty view remains practical for small to moderate constellations (tens of thousands of events). At larger scales—millions of events—full replay on every query becomes impractical. Implementations should support incremental replay: given an existing materialized view $V$ at checkpoint $C$ and a set of new events $\Delta E$, an implementation may compute the updated view by replaying only $\Delta E$ against $V$, provided the new events’ dependencies are all satisfied in the existing view. Incremental replay is an implementation optimization, not a protocol-level mechanism: the correctness criterion remains that the materialized view is identical to what full replay would produce. A conforming implementation may always fall back to full replay. The conformance suite (Section 10) tests full-replay correctness; incremental replay correctness is validated by comparing its output to full replay over the same event set.

Deterministic replay pipeline: Accepted Events → Topo Sort (Kahn) → δ Transition Function → Kernel View → Checkpoint Root

Checkpoints

A checkpoint is a cryptographic commitment to the accepted event set at a point in time. Checkpoint roots are computed using the Merkle Tree Hash (MTH) from RFC 9162 (Certificate Transparency v2) [ct], which defines leaf/node domain separation and proof algorithms.

Computation:

function checkpoint(E) -> IDStr:
  eids = [raw_digest_bytes(eid) for eid in accepted(E)]
  sort(eids, lexicographic)    // sort by raw bytes
  root = MTH(eids)             // RFC 9162 section 2.1
  return "sha256:" + hex_lower(root)

// RFC 9162 MTH construction:
MTH({})     = SHA-256("")
MTH({d})    = SHA-256(0x00 || d)
MTH(D)      = SHA-256(0x01 || MTH(D[:k]) || MTH(D[k:]))
              where k = largest power of 2 < |D|

Checkpoints enable: consistency verification (two replicas compare roots to confirm identical accepted sets), inclusion proofs (Merkle path proves a specific event is in the accepted set without revealing the full set), and append-only auditing (sequence of checkpoints provides verifiable log; removing an old event changes the root).

Checkpoint Merkle tree following RFC 9162: leaf hashing with 0x00 prefix, node hashing with 0x01 prefix

Replication

Replicas exchange objects and events by content address. There is no central server and no consensus mechanism. Convergence follows from deterministic replay: same events → same view.

The transfer protocol consists of three operations, modeled on Git protocol v2 [gitproto]:

ls-refs — A replica advertises its named references (checkpoint roots). The response is a list of (name, checkpoint_root) pairs.

fetch — The client sends the set of event IDs and object IDs it already holds. The server responds with the objects and events the client is missing. Because everything is content-addressed, “missing” is unambiguous: either you have the bytes that hash to that ID or you do not.

push — A replica sends new objects and events to another replica. The recipient verifies signatures and replays into its own kernel view.

Replication does not require trust between replicas. A replica accepts events from any source, verifies signatures, and replays deterministically. The kernel guarantees identical results regardless of reception order or originating replica. This property follows directly from deterministic replay (Section 6): the view is a function of the event set, not the path by which the events arrived.

Partial replication

A replica is not required to hold the complete event set. A replica may hold a subset of events—for example, all events referencing points in a specific domain, or all events signed by a specific set of keys. The kernel’s deterministic replay still applies: the materialized view is a function of whichever events the replica holds. Two partial replicas with different subsets will produce different views; two partial replicas with the same subset will produce identical views. Partial replication is an implementation decision, not a protocol-level feature: the kernel does not distinguish between a replica that has not yet received an event and one that has chosen not to request it.

Bandwidth optimization

The three-operation protocol described above is minimal. Implementations may optimize bandwidth through standard techniques: bloom filters or similar probabilistic structures in the fetch handshake to summarize held event IDs, delta compression of object payloads when the transport supports it, and batched responses that pack multiple objects into a single transfer. These optimizations are transport-level concerns and do not affect the kernel’s deterministic replay or checkpoint computation. The protocol requires only that the result of replication is the correct set of objects and events; it does not prescribe how they are transferred.

Divergent accepted sets

Two replicas may hold overlapping but non-identical accepted sets without conflict. Each replica’s view is deterministic with respect to its own accepted set. When replicas exchange events via fetch, their accepted sets converge. Full convergence requires only eventual delivery of the same events. There is no inconsistency to resolve: a replica that holds events A, B and a replica that holds events B, C are both correct—they simply have different views. After exchanging missing events, both hold A, B, C and produce identical views. This is weaker than consensus and stronger than eventual consistency: no coordination is needed, and convergence is guaranteed once the event sets match.
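The convergence argument can be stated as a toy demonstration. The event names are placeholders; the point is only that a symmetric exchange of missing events yields identical sets, and identical sets yield identical views under deterministic replay.

```python
# Two replicas holding overlapping but non-identical accepted sets.
replica_a = {"E_A", "E_B"}
replica_b = {"E_B", "E_C"}

# fetch: each side requests exactly what it is missing from the other.
replica_a |= replica_b - replica_a
replica_b |= replica_a - replica_b

# With identical event sets, deterministic replay yields identical views.
assert replica_a == replica_b == {"E_A", "E_B", "E_C"}
```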

Security Analysis

The protocol provides specific guarantees and explicitly declines others.

Guarantees

Integrity. Every object and event is identified by the SHA-256 hash of its canonical bytes. Any modification—to content, ordering of fields, or encoding—changes the hash and is immediately detectable.

Authentication. Events are signed with Ed25519. Forging an event requires the signer’s private key. The EID is computed over the unsigned event body, so the identity of an event is stable, but its authenticity is bound to a specific key.

Append-only auditing. Checkpoint roots are computed via the RFC 9162 Merkle Tree Hash. Removing an event from the accepted set changes the root. Any replica holding a previous checkpoint root can detect the removal. A sequence of checkpoint roots provides a verifiable append-only log.

Convergence. The replay algorithm is a deterministic function of the accepted event set. Two replicas holding identical event sets produce identical materialized views, regardless of the order in which events were received. This follows from the deterministic topological sort with tie-breaking on raw EID digest bytes (Section 6). Unlike consensus-based systems such as Bitcoin [bitcoin], which achieve convergence through computational proof, Lux achieves convergence through deterministic replay: the same events always produce the same view, with no probabilistic finality.

Equivocation detection. If a signer publishes two different checkpoints for the same ref, any replica holding both can detect the conflict. The protocol does not prevent equivocation, but it makes equivocation visible and auditable.
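An auditor's equivocation check reduces to spotting two different roots for the same (signer, ref) pair. A minimal sketch, with illustrative names, over observations gathered from multiple replicas:

```python
def detect_equivocation(observed: list) -> list:
    """Return (signer, ref) pairs that published conflicting checkpoint roots.

    'observed' is a list of (signer_key, ref_name, checkpoint_root) tuples.
    Detection, not prevention: conflicts become visible and auditable.
    """
    seen = {}
    conflicts = []
    for key, ref, root in observed:
        prev = seen.setdefault((key, ref), root)
        if prev != root:
            conflicts.append((key, ref))
    return conflicts
```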

Non-guarantees

The kernel does not guarantee truth, quality, or trustworthiness. It does not prevent a signer from publishing false claims. It does not assess whether evidence supports a finding. It does not resolve disagreements. These are observer concerns, by design (Section 3).

This is deliberate. A protocol that encodes trust is a protocol that encodes one community’s definition of trust. The kernel serves all communities by serving none. It guarantees that the record is complete, ordered, and auditable. It does not guarantee that the record is wise.

Conformance

A conforming implementation must reproduce identical outputs for all published test vectors. There is no partial conformance. The vectors cover:

  1. Canonical bytes: RFC 8785 canonicalization of test JSON inputs.

  2. Object and event IDs: SHA-256 digest and ID string from canonical bytes.

  3. Signature verification: Ed25519 validation over canonical unsigned event bytes.

  4. Topological ordering: Kahn’s algorithm with deterministic tie-breaking, including missing-dependency hold and cycle-SCC rejection.

  5. Checkpoint roots: RFC 9162 MTH over test EID sets.

  6. Materialized view: active points, active links, supersession chains, and adjacency indexes after replay of test event sequences.

A conforming implementation may use any language, any storage backend, and any set of observers (or none). It may extend the object schema with additional kinds provided they use the standard envelope. It must not use floating-point numbers in kernel computations, produce results different from reference vectors, or allow observer outputs to modify kernel state.

Related Work

The idea of structuring scientific knowledge as a graph is not new. What distinguishes this protocol is the combination of content addressing, append-only history, deterministic replay, and the separation of kernel from policy.

Nanopublications [nanopub] decompose scientific assertions into structured claims with provenance and share the goal of operating at the finding level rather than the document level. However, nanopublications do not provide deterministic replay, append-only event history, or cryptographic checkpoints. There is no mechanism for convergence between independent replicas, and correction propagation is not a protocol-level guarantee.

The Semantic Web [semweb] proposed structuring all web content as machine-readable linked data. The vision was correct; the adoption model was not. The Semantic Web required ontology agreement before use—and the agreement never came. Lux requires only that objects use a uniform envelope; body schemas are extensible and community-defined. The manifesto [constellations] observes: the difference between “agree on a schema” and “extract what’s already there” is the difference between the Semantic Web and Google.

FAIR principles [fair] address the findability, accessibility, interoperability, and reusability of scientific artifacts—papers, datasets, code. FAIR does not address individual findings, does not provide event-driven lifecycle management, and does not specify a convergence mechanism.

Document-level systems such as Semantic Scholar, OpenAlex, and Wikidata index papers, authors, institutions, and concepts. Each is valuable. None operates at the level of individual findings with evidence, lineage, and correction propagation. They are platforms built on the document as the unit; this protocol operates on the finding.

Git [git] provides the closest architectural precedent: content-addressed objects, append-only history, deterministic reconstruction, and refs pointing to named states. Lux follows the same architecture for a different domain. The key differences are the object model (points, links, and trails rather than blobs, trees, and commits) and the policy-free observer layer, which has no equivalent in Git.

Conclusion

We have specified a protocol kernel for structuring knowledge as content-addressed objects governed by an append-only event history with deterministic replay. The protocol separates a minimal, deterministic kernel from replaceable observers, ensuring that the substrate is shared while interpretation remains community-specific. Convergence is achieved without global consensus: any two replicas holding the same events produce the same view. Conformance is enforced by byte-level test vectors.

The protocol is deliberately small. Git’s core is a content-addressable filesystem with four object types. Everything built on it—GitHub, CI/CD, code review, Copilot—depended on that foundation being exactly right. This protocol follows the same architecture. The kernel must be boring and correct; nothing interesting should happen here.

The kernel is designed to support layers above it—attestation [dsse,slsa], distribution, agent integration—without modification. The manifesto [constellations] described what must be built and why. This paper describes the foundation: the smallest set of rules that makes independent implementations converge, so that everything above it can be built by different people, for different purposes, on shared ground.

References

[constellations] Borrowed Light Collective, “Constellations of Borrowed Light,” borrowedlight.org, January 2026.

[git] L. Torvalds, “Git: Fast Version Control System,” git-scm.com, 2005.

[jcs] A. Rundgren, B. Jordan, S. Erdtman, “JSON Canonicalization Scheme (JCS),” RFC 8785, IETF, June 2020.

[ct] B. Laurie, E. Messeri, R. Stradling, “Certificate Transparency Version 2.0,” RFC 9162, IETF, December 2021.

[gitproto] J. Hamano, “Git Wire Protocol, Version 2,” git-protocol-v2(5), 2018.

[dsse] Secure Systems Lab, “Dead Simple Signing Envelope (DSSE),” github.com/secure-systems-lab/dsse, 2021.

[slsa] Supply-chain Levels for Software Artifacts, “SLSA Provenance v1.0,” slsa.dev, 2023.

[bitcoin] S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System,” bitcoin.org, 2008.

[nanopub] P. Groth, A. Gibson, J. Velterop, “The anatomy of a nanopublication,” Information Services & Use 30, 51–56, 2010.

[fair] M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Scientific Data 3, 160018, 2016.

[semweb] T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web,” Scientific American 284(5), 34–43, 2001.