Open Knowledge Format (OKF v0.1)

If you ask an AI agent a question, it needs to retrieve the information and its context from many locations to give a useful response: third-party wikis, source code comments, notes scattered across different machines, and so on.

LLMs are expensive and energy-hungry, so it makes sense to limit the number of locations the agent needs to search. The idea is that agents build a centralized wiki themselves over time by collecting and updating its content. While we humans would be responsible for curating the data, the LLM would be responsible for keeping it up to date, especially by checking cross-references and updating information across multiple locations, a task that humans usually find difficult. Consequently, the collected data should be easy to work with not only for agents but for humans as well.

LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass

— Andrej Karpathy
LLM Wiki

The Open Knowledge Format (OKF) attempts to standardize the form of the data, so that different AI agents and institutions can work with it.

The main design ideas are:

Everyone can produce and consume the data.
It is portable across systems.
It can be managed using source control such as git.
It is readable by humans and agents alike.

The Details

Example of an OKF Bundle

sales/
├── index.md
├── datasets/
│   ├── index.md
│   └── orders_db.md
├── tables/
│   ├── index.md
│   ├── orders.md
│   └── customers.md
└── metrics/
    ├── index.md
    └── weekly_active_users.md

Such a centralized location is called an OKF bundle. It consists of directories of Markdown files with YAML frontmatter, plus a set of agreed-upon conventions. Each Markdown file represents a concept, which can be an idea, a table, or anything else. The path to the file (without the .md suffix) is the identity of the concept (e.g. datasets/orders_db).
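The path-to-identity rule can be sketched in a few lines of Python. The function name `concept_id` is hypothetical and not part of OKF; the sketch assumes paths use forward slashes and tolerates a leading slash.

```python
from pathlib import PurePosixPath


def concept_id(bundle_relative_path: str) -> str:
    """Return the concept identity: the bundle-relative path without '.md'."""
    # A leading slash (as used in absolute, bundle-relative links) is dropped.
    path = PurePosixPath(bundle_relative_path.lstrip("/"))
    if path.suffix != ".md":
        raise ValueError(f"not a Markdown concept file: {bundle_relative_path}")
    return str(path.with_suffix(""))
```

For example, `concept_id("datasets/orders_db.md")` yields `datasets/orders_db`, matching the identity given above.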

Each concept document has YAML frontmatter (delimited by --- lines) at the top of the file for structured fields, and a Markdown body for everything else.
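As a rough illustration, a consumer might separate the two parts like this. This is a minimal sketch, not a YAML parser: it keeps every value as a raw string and only understands flat `key: value` lines, so a real consumer would presumably hand the frontmatter block to a YAML library instead. The name `split_frontmatter` is hypothetical.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept document into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    # No opening delimiter: treat the whole document as body.
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter; without one, treat everything as body.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, text
    fields = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body
```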

Concepts can be linked to each other using the Markdown link syntax (e.g. [<name>](<path>)). It is recommended to use absolute (bundle-relative) paths, so that a concept's outgoing links remain valid when it is moved to a different subdirectory. Markdown offers no syntax for typing a link (parent/child, depends-on, etc.), so the relationship must be conveyed by the surrounding prose.

A link that points to a non-existent concept should not be considered broken. Rather, it means that the target concept still needs to be written.
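One way a tool could act on this rule is to collect link targets and report those that have no document yet as a to-do list rather than as errors. The sketch below is an assumption about how that might look: the regular expression covers only simple inline links, only absolute (bundle-relative) `.md` targets are considered, and the names `concept_links` and `unwritten_concepts` are hypothetical.

```python
import re

# Matches simple inline Markdown links: [name](target)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def concept_links(markdown: str) -> list[str]:
    """Return concept ids for bundle-relative links, skipping external URLs."""
    ids = []
    for _name, target in LINK_RE.findall(markdown):
        if "://" in target:
            continue  # external URL, for example a citation
        # Drop any #fragment and the leading slash of an absolute path.
        target = target.split("#", 1)[0].lstrip("/")
        if target.endswith(".md"):
            ids.append(target[: -len(".md")])
    return ids


def unwritten_concepts(markdown: str, existing: set[str]) -> list[str]:
    """Return linked concept ids that do not have a document yet."""
    return [cid for cid in concept_links(markdown) if cid not in existing]
```

Here `existing` would be the set of concept ids found in the bundle; anything returned by `unwritten_concepts` is a candidate for a new concept document.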

The index.md files are optional and support progressive disclosure: they give the reader or the agent an overview of the available content before individual files are opened. index.md files do not contain any frontmatter. Sections are used to group concepts, and each entry should include the description value from its concept's frontmatter.

Example of an index.md file

# Section / Group Heading

* [Title 1](relative-url-1) - short description of item 1
* [Title 2](relative-url-2) - short description of item 2

# Another Section

* [Subdirectory](subdir/) - short description of the subdirectory
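Because each entry reuses the concept's title and description, an index like the one above could be generated rather than written by hand. The following is a minimal sketch under that assumption; `render_index` and its input shape (section heading mapped to a list of title, URL, and description tuples) are hypothetical.

```python
def render_index(sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render index.md text from {heading: [(title, url, description), ...]}."""
    blocks = []
    for heading, entries in sections.items():
        lines = [f"# {heading}", ""]
        for title, url, description in entries:
            lines.append(f"* [{title}]({url}) - {description}")
        blocks.append("\n".join(lines))
    # Sections are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"
```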

The log.md file records changes in chronological order. It is a flat list of dated sections with the newest one first. Headings must be dates in ISO-8601 format (YYYY-MM-DD), while entries can be prose.

Example of a log.md file

# Directory Update Log

## 2026-05-22
* **Update**: Added new BigQuery table reference for [Customer Metrics](/tables/customer-metrics.md).
* **Creation**: Established the [Dataplex Playbook](/playbooks/dataplex.md).

## 2026-05-15
* **Initialization**: Created foundational directory structure.
* **Update**: Added progressive-disclosure guidelines to the root [index](/index.md).
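The two log rules (ISO-8601 date headings, newest first) lend themselves to a simple check. This sketch assumes that date headings are second-level (`## `) headings, as in the example above; the function names are hypothetical.

```python
from datetime import date


def log_dates(log_text: str) -> list[date]:
    """Parse the date headings of a log.md file, in document order.

    Raises ValueError if a '## ' heading is not a valid ISO-8601 date.
    """
    dates = []
    for line in log_text.splitlines():
        if line.startswith("## "):
            dates.append(date.fromisoformat(line[3:].strip()))
    return dates


def is_newest_first(log_text: str) -> bool:
    """Check that date headings appear in descending order."""
    dates = log_dates(log_text)
    return all(a >= b for a, b in zip(dates, dates[1:]))
```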

Example of a Concept Document

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column        | Type      | Description                              |
|---------------|-----------|------------------------------------------|
| `order_id`    | STRING    | Globally unique order identifier.        |
| `customer_id` | STRING    | FK to [customers](/tables/customers.md). |

# Joins

Joined with [customers](/tables/customers.md) on `customer_id`.

Frontmatter

type (mandatory): The type of concept used for routing, filtering, and presentation. Producers can choose any type they want, while consumers should be able to handle unknown types.
title: The display name. If it is omitted, consumers need to derive it from either the file name or the Markdown # heading.
description: A one-sentence summary, which can be used by index.md generators.
resource: URI for the underlying asset. This can be omitted, if an abstract idea is described.
tags: A list of strings, written as a comma-separated YAML sequence, e.g. [projectX, note]
timestamp: The last modified time in ISO-8601 format, e.g. 2026-05-28T00:00:00Z
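A consumer might check these fields along the following lines. This is a sketch under stated assumptions: the frontmatter has already been parsed into a dictionary, `type` is the only mandatory field, and any `type` value is accepted because consumers should handle unknown types. The name `frontmatter_problems` and the wording of the messages are hypothetical.

```python
from datetime import datetime


def frontmatter_problems(fields: dict[str, object]) -> list[str]:
    """Return a list of human-readable problems; empty means acceptable."""
    problems = []
    # 'type' is the only mandatory field; its value is not restricted.
    if not fields.get("type"):
        problems.append("missing mandatory field: type")
    # Only string timestamps are checked here; a YAML parser may already
    # have converted the value to a datetime object.
    ts = fields.get("timestamp")
    if isinstance(ts, str):
        try:
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO-8601: {ts}")
    tags = fields.get("tags")
    if tags is not None:
        if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
            problems.append("tags must be a list of strings")
    return problems
```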

Body

The content is written in Markdown.

The following headings should be used where appropriate:

# Schema: Structured description of an asset’s columns/fields.
# Examples: Concrete usage examples, often as fenced code blocks.
# Citations: External sources backing claims in the body.

Links are connections between concepts, while citations are links from concepts to an external source.

Example of citations

# Citations

[1] [BigQuery public dataset announcement](https://cloud.google.com/blog/products/data-analytics/...)
[2] [Internal data quality runbook](https://wiki.acme.internal/data/quality)
