Knowledge Graphs for Cultural Infrastructure

Problem

Cultural knowledge is rarely simple. A craft technique can belong to a person, a place, a school, a material tradition, a set of tools, a historical period and a living practice. A musical work can be a release, a performance, a file, a memory object, a set of production decisions and evidence of a wider research question. A design archive can contain final images, discarded sketches, rights information, software versions, references and oral context.

Conventional websites flatten this complexity. They publish pages, categories and tags, but the relations between things remain vague. A tag can say that two records both concern “audio” or “archives”, but it cannot explain that one document cites another, that one project applies a concept, that one program powers a platform, or that one object is evidence for a claim.

The result is familiar: cultural platforms become attractive but shallow catalogues. Search engines can index individual pages without seeing how those pages relate to one another. Human readers can browse, but they cannot easily follow lineage, provenance, method or influence. AI retrieval systems may extract isolated paragraphs without knowing where a claim comes from or which internal records should be read next.

A knowledge graph addresses this problem by making entities and relationships first-class editorial objects.

Introduction

A knowledge graph is a structured network of entities and typed relationships. In practice, that means a person, project, publication, concept, tool, artwork, archive record or organization receives a stable identity, then the site records meaningful relations between those identities.

The important word is “meaningful”. A knowledge graph is not valuable because it draws many lines. It is valuable when those lines have predicates that can be interpreted: documents, appliesConcept, usesTechnology, poweredBy, derivedFrom, publishedBy, evidencedBy. Those predicates let readers and machines distinguish a citation from a dependency, a case study from an influence, and a component from a source.
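
As a minimal sketch in Python, assuming a closed vocabulary built from the predicates named in this article, a typed statement can refuse a vague predicate when it is created. The identifiers such as publication:example-essay are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical closed vocabulary, taken from the predicates named in this article.
# A real project would keep this list in its own configuration.
ALLOWED_PREDICATES = frozenset({
    "documents", "defines", "appliesConcept", "usesTechnology",
    "poweredBy", "derivedFrom", "publishedBy", "evidencedBy", "influencedBy",
})


@dataclass(frozen=True)
class Statement:
    """One typed relation: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str

    def __post_init__(self) -> None:
        # Reject untyped or unknown relations such as a bare "related".
        if self.predicate not in ALLOWED_PREDICATES:
            raise ValueError(f"unknown predicate: {self.predicate!r}")


# A citation and a dependency are now distinct, queryable facts.
citation = Statement("publication:example-essay", "documents", "concept:runtime-theory")
dependency = Statement("project:example-site", "poweredBy", "software:example-runtime")
```

Making the vocabulary closed is a design choice: adding a predicate becomes an editorial decision rather than something a single record can introduce on its own.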

For Electronic Artefacts, this matters because the site is not only a portfolio. It contains software programs, artistic projects, research fields, publications, concepts, archives and commercial work. A linear blog would separate those objects into posts. A knowledge graph lets each new publication add relations and context to entities that already exist.

Context

The Web has always been a linking system, but not every link is a semantic relation. A link inside an essay may be navigational, rhetorical, evidential or decorative. A typed relation is different: it states what connects two records.

RDF formalizes graph statements as subject-predicate-object triples. That model is useful because it separates the thing being described, the relationship and the target. CIDOC CRM is useful for cultural heritage because it offers a shared conceptual structure for people, events, objects, places and documents, which cultural data needs if it is to be integrated across institutions. W3C PROV is useful because it treats provenance as information about the entities, activities and people involved in producing a thing.
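
The subject-predicate-object model can be illustrated with plain tuples. The sketch below is not an RDF library; it only shows how keeping the three parts separate makes pattern queries possible. The triples and identifiers are hypothetical examples, not actual Electronic Artefacts records.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples for illustration only.
TRIPLES: List[Triple] = [
    ("project:example-archive", "usesTechnology", "software:example-runtime"),
    ("publication:example-essay", "documents", "concept:knowledge-graph"),
    ("publication:example-essay", "documents", "concept:provenance"),
]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit a pattern; None acts as a wildcard."""
    for triple in triples:
        ts, tp, to = triple
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
            yield triple


# "What does this essay document?"
documented = [o for _, _, o in match(TRIPLES, s="publication:example-essay", p="documents")]
```

A production system would more likely use an RDF toolkit or a triple store; the wildcard pattern here only mirrors the basic idea of matching triples.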

Electronic Artefacts does not need to reproduce every museum or semantic-web standard internally. The strategic lesson is simpler: cultural infrastructure needs stable identity, typed relation, provenance and review.

History

The idea of connected knowledge has a long pre-web history. Libraries, catalogues, citation indexes, museums and archives have always tried to connect works to authors, subjects, dates, places and sources. The Web made linking public and cheap, but early web pages usually linked documents rather than modeling the things inside them.

Semantic Web research made a stronger claim: the Web could describe resources in machine-readable ways. Linked Data practices then focused on stable identifiers and connections between datasets. Cultural heritage communities developed models such as CIDOC CRM because museum and archive records could not be integrated reliably through shallow metadata alone.

Knowledge graphs entered product and search vocabulary later, but the underlying issue is older: how can a system preserve meaning across changing records, institutions, technologies and readers?

Core concepts

The first concept is entity identity. Each record must declare whether it describes a concept, a project, a person, a publication or an artefact. It also needs a stable identifier that does not change when the title, page design or summary changes.
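
One way to keep identity stable is to mint the identifier once, when the record is created, and to derive routes from the identifier rather than from the current title. The type:slug scheme and the /id/... route shape below are assumptions for illustration, not the site's actual format.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    entity_id: str    # stable; minted once and never recomputed from the title
    entity_type: str  # e.g. "concept", "project", "person", "publication", "artefact"
    title: str        # editorial; free to change


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mint_id(entity_type: str, title: str) -> str:
    """Create an ID at creation time using a hypothetical 'type:slug' scheme."""
    return f"{entity_type}:{slugify(title)}"


def identifier_route(record: Record) -> str:
    """Build the machine-facing route from the ID, never from the current title."""
    entity_type, slug = record.entity_id.split(":", 1)
    return f"/id/{entity_type}/{slug}"


rec = Record(mint_id("concept", "Runtime Theory"), "concept", "Runtime Theory")
renamed = replace(rec, title="A Theory of Runtimes")  # title changes, ID does not
```

After the rename, identifier_route(renamed) still returns /id/concept/runtime-theory, so links and exports made before the edit keep working.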

The second concept is relationship semantics. “Related” is rarely enough. If a publication explains a concept, the relation should say defines or documents. If a project uses a runtime, it should say poweredBy. If a framework was shaped by an earlier concept, the framework's relation to that concept should say influencedBy.

The third concept is provenance. Cultural knowledge depends on context. A graph should record sources, authors, publishers, dates, confidence and evidence, because those elements help readers evaluate trust.
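
Provenance can travel with each statement rather than only with the page. In the sketch below, the field names and the high / medium / low / unknown confidence scale are hypothetical; the point is that gaps are surfaced to readers instead of silently omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    source: Optional[str] = None     # citation or archive reference
    author: Optional[str] = None
    published: Optional[str] = None  # ISO 8601 date string
    confidence: str = "unknown"      # hypothetical scale: high | medium | low | unknown


@dataclass
class Claim:
    subject: str
    predicate: str
    obj: str
    provenance: Provenance = field(default_factory=Provenance)


def review_notes(claim: Claim) -> List[str]:
    """List the gaps a reader would need to know about to judge trust."""
    notes: List[str] = []
    if not claim.provenance.source:
        notes.append("no source recorded")
    if not claim.provenance.author:
        notes.append("no author recorded")
    if claim.provenance.confidence in ("low", "unknown"):
        notes.append(f"confidence: {claim.provenance.confidence}")
    return notes


# Hypothetical claim: documented by a known author, but with no cited source yet.
claim = Claim(
    "technique:example-weave", "evidencedBy", "document:example-interview",
    Provenance(author="example field researcher", published="2021-05-01", confidence="medium"),
)
```

Here review_notes(claim) flags only the missing source, which a page could display next to the relation.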

The fourth concept is public projection. A private database does not become a public knowledge graph until it produces readable pages, search documents, JSON-LD, sitemaps, canonical URLs and exports that other systems can consume.
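
A public projection can be as simple as serializing each record into Schema.org JSON-LD. The type mapping, the #entity fragment used for @id and the URL layout are assumptions; Schema.org offers many other types that might fit a given record better.

```python
import json
from typing import Dict

# Hypothetical mapping from internal entity types to Schema.org types.
SCHEMA_TYPES: Dict[str, str] = {
    "person": "Person",
    "organization": "Organization",
    "publication": "Article",
    "concept": "DefinedTerm",
    "software": "SoftwareApplication",
    "project": "CreativeWork",
}


def to_jsonld(record: Dict, base_url: str) -> str:
    """Project one internal record into a JSON-LD document string."""
    url = f"{base_url}/{record['type']}/{record['slug']}/"
    doc = {
        "@context": "https://schema.org",
        "@type": SCHEMA_TYPES.get(record["type"], "Thing"),  # fall back to the root type
        "@id": url + "#entity",  # identifies the thing, distinct from the page URL
        "url": url,
        "name": record["title"],
    }
    if record.get("summary"):
        doc["description"] = record["summary"]
    return json.dumps(doc, indent=2, ensure_ascii=False)
```

Because the same record feeds both the HTML page and this output, the two cannot drift apart.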

Architecture

A practical cultural knowledge graph needs a few layers.

The source layer stores canonical records. In this repository, those records live in Markdown frontmatter and body content under content/. The relation layer stores typed statements in YAML. The validation layer checks IDs, allowed predicates, public visibility and required fields. The rendering layer generates public HTML, identifier routes, JSON-LD, graph neighborhoods and search documents.
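
The validation layer can be sketched as a function that returns every problem it finds rather than stopping at the first. Records and relations are shown as already-parsed dictionaries (parsing Markdown frontmatter and YAML is out of scope here), and the required fields and visibility values are hypothetical.

```python
from typing import Dict, List, Set

REQUIRED_FIELDS = ("id", "type", "title", "visibility")  # hypothetical schema
ALLOWED_PREDICATES = {
    "documents", "defines", "appliesConcept", "usesTechnology",
    "poweredBy", "derivedFrom", "publishedBy", "evidencedBy", "influencedBy",
}


def validate(records: List[Dict], relations: List[Dict]) -> List[str]:
    """Check IDs, required fields, visibility and predicates; return all errors."""
    errors: List[str] = []
    seen: Set[str] = set()
    public_ids: Set[str] = set()
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            errors.append(f"{rec.get('id', '?')}: missing {', '.join(missing)}")
            continue
        if rec["id"] in seen:
            errors.append(f"{rec['id']}: duplicate id")
        seen.add(rec["id"])
        if rec["visibility"] == "public":
            public_ids.add(rec["id"])
    for rel in relations:
        if rel["predicate"] not in ALLOWED_PREDICATES:
            errors.append(f"unknown predicate {rel['predicate']!r}")
        for end in ("subject", "object"):
            target = rel[end]
            if target not in seen:
                errors.append(f"broken reference: {target}")
            elif target not in public_ids:
                # A public relation must not leak a draft or private record.
                errors.append(f"relation exposes non-public record: {target}")
    return errors
```

A build would refuse to publish while this list is non-empty, which is what makes the build step the authority.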

This architecture is deliberately static. It does not require a server-side database to publish useful graph knowledge. The build step becomes the authority: it validates the corpus, renders complete pages and writes machine-readable outputs.
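
A build step that acts as the authority can then write complete static files. This sketch assumes records have already passed validation; the directory layout and the minimal HTML template are illustrative only. It writes one page per public record and a sitemap in the sitemaps.org format.

```python
import html
from pathlib import Path
from typing import Dict, List
from xml.sax.saxutils import escape as xml_escape


def build(records: List[Dict], out_dir: Path, base_url: str) -> List[Path]:
    """Write one static HTML page per public record, then a sitemap.xml."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    urls: List[str] = []
    for rec in records:
        if rec.get("visibility") != "public":
            continue  # drafts and private records never reach the output
        url = f"{base_url}/{rec['type']}/{rec['slug']}/"
        page = out_dir / rec["type"] / rec["slug"] / "index.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        title = html.escape(rec["title"])
        summary = html.escape(rec.get("summary", ""))
        page.write_text(
            "<!doctype html>\n"
            f'<html><head><meta charset="utf-8"><title>{title}</title>'
            f'<link rel="canonical" href="{html.escape(url)}"></head>'
            f"<body><h1>{title}</h1><p>{summary}</p></body></html>\n",
            encoding="utf-8",
        )
        written.append(page)
        urls.append(url)
    entries = "".join(f"  <url><loc>{xml_escape(u)}</loc></url>\n" for u in urls)
    sitemap = out_dir / "sitemap.xml"
    sitemap.write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n",
        encoding="utf-8",
    )
    written.append(sitemap)
    return written
```

Every output is a plain file, so it can be inspected or archived without a running server.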

That choice matters for cultural publishing because static output is durable. A future reader, crawler or archive can inspect the generated page without needing a running application server.

Implementation

The implementation pattern is straightforward:

Define entity types.
Give each record a canonical ID and route.
Require publication metadata.
Keep body content substantive enough to stand alone.
Store relations as typed statements.
Generate pages and JSON-LD from the same source.
Build search documents from titles, abstracts, bodies, tags and relations.
Reject broken references during validation.
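
The search-document step in the list above might look like the following sketch. The field names (abstract, body, tags) and the rendering of each relation as its predicate plus the target's title are assumptions. Including relation text lets a query for a connected entity's name surface this record too.

```python
from typing import Dict, List


def search_document(record: Dict, relations: List[Dict], titles: Dict[str, str]) -> Dict:
    """Flatten one record and its outgoing typed relations into a search document."""
    rel_lines = [
        # Render targets by title when known, falling back to the raw ID.
        f"{rel['predicate']} {titles.get(rel['object'], rel['object'])}"
        for rel in relations
        if rel["subject"] == record["id"]
    ]
    parts = [
        record["title"],
        record.get("abstract", ""),
        record.get("body", ""),
        " ".join(record.get("tags", [])),
        *rel_lines,
    ]
    return {
        "id": record["id"],
        "title": record["title"],
        "text": "\n".join(part for part in parts if part),  # skip empty fields
        "relations": rel_lines,
    }
```

The resulting dictionaries can be handed to whatever search index the site uses; the sketch makes no assumption about that index.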

The Electronic Artefacts graph already applies this pattern to VASTE, Runtime Theory, Graph Runtime and Vestiges. This article expands the same model into a broader knowledge hub.

Practical applications

For Vestiges, a knowledge graph can connect people, techniques, materials, works, documents and institutions. A reader could start from a craft material, follow it to a technique, then to a practitioner, then to a publication, then to a provenance record.
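
The reading path described for Vestiges is a shortest-path question over typed relations. A minimal breadth-first search that follows statements in either direction could look like this; the predicates usesMaterial and practises and all identifiers are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def find_path(edges: List[Edge], start: str, goal: str) -> Optional[List[Edge]]:
    """Breadth-first search for the shortest chain of typed relations.

    Edges are followed in both directions, so a reader can move from a
    material to a technique even if the statement is stored the other way.
    """
    adjacency: Dict[str, List[Edge]] = {}
    for s, p, o in edges:
        adjacency.setdefault(s, []).append((s, p, o))
        adjacency.setdefault(o, []).append((s, p, o))
    previous: Dict[str, Optional[Edge]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[Edge] = []
            while previous[node] is not None:
                edge = previous[node]
                path.append(edge)
                # Step back to whichever end of the edge we came from.
                node = edge[0] if edge[2] == node else edge[2]
            return list(reversed(path))
        for edge in adjacency.get(node, []):
            neighbor = edge[2] if edge[0] == node else edge[0]
            if neighbor not in previous:
                previous[neighbor] = edge
                queue.append(neighbor)
    return None


# Hypothetical Vestiges-style statements.
EDGES: List[Edge] = [
    ("technique:example-dyeing", "usesMaterial", "material:example-indigo"),
    ("person:example-dyer", "practises", "technique:example-dyeing"),
    ("publication:example-monograph", "documents", "person:example-dyer"),
    ("publication:example-monograph", "evidencedBy", "provenance:example-field-notes"),
]
```

Because each step returns the full statement, a page can render the path with its predicates instead of as an unlabeled chain of links.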

For Palimpsests, a graph can connect tracks, visual references, audio analysis, memory concepts, signal archaeology and publication context.

For VASTE, a graph can document runtime concepts and show where they are applied in projects.

For the public site, a graph can support search visibility because every article becomes a route into related canonical records. It can also support AI discoverability because sources, identifiers and relations are stated explicitly rather than left to be inferred.

Tools

Useful tools and standards include RDF for graph thinking, JSON-LD for structured data, Schema.org for web metadata, PROV for provenance, CIDOC CRM for cultural heritage modeling, static site generation for durable output, and search indexes built from structured records.

Evidence

The current Electronic Artefacts build already generates canonical entity pages, identifier routes, JSON-LD files, search documents, a sitemap and local graph neighborhoods. That means the knowledge graph is not theoretical. It is the publishing substrate of the site.
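
A local graph neighborhood is essentially the set of one-hop statements around a record. The sketch below groups them by predicate and keeps the direction, so a page can tell "this record documents X" apart from "X documents this record". It is an illustration with hypothetical identifiers, not the site's actual renderer.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def neighborhood(edges: List[Edge], center: str) -> Dict[str, List[Tuple[str, str]]]:
    """Collect one-hop connections of `center`, grouped by predicate.

    Each entry is (direction, other_id): "out" when `center` is the subject,
    "in" when it is the object.
    """
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for subject, predicate, obj in edges:
        if subject == center:
            groups.setdefault(predicate, []).append(("out", obj))
        elif obj == center:
            groups.setdefault(predicate, []).append(("in", subject))
    return groups


def connection_count(groups: Dict[str, List[Tuple[str, str]]]) -> int:
    """Total typed connections, e.g. for a summary count on the page."""
    return sum(len(items) for items in groups.values())
```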

Vestiges is the strongest applied case because it needs to model living cultural knowledge without flattening people, materials and techniques into isolated cards.

Related concepts

Read the records for Knowledge Graph, Entity Identity, Provenance, Linked Data and Graph Runtime.

Glossary

Entity: a thing with identity in the graph.

Predicate: the named relationship between two entities.

Provenance: information about origin, authorship, activity and transformation.

Canonical URL: the preferred public URL for a record.

Identifier route: a stable machine-facing route for an entity.

Limitations

Knowledge graphs can become over-modeled. Not every noun deserves an entity. Not every association deserves a relation. Editorial discipline is required to prevent duplication, weak predicates and noise.

The graph should also avoid false certainty. Cultural knowledge often contains uncertainty, contested interpretation and incomplete provenance. A good graph records confidence rather than hiding it.

References

W3C. RDF 1.1 Concepts and Abstract Syntax. 2014.
CIDOC CRM Special Interest Group. The CIDOC Conceptual Reference Model.
W3C. PROV-Overview. 2013.
Electronic Artefacts. VASTE, Runtime Theory, Graph Runtime and Vestiges records.
