Knowledge System

NODYN’s knowledge system persists information across sessions using local file storage. Knowledge is automatically populated by analyzing agent responses and can be manually managed via tools and CLI commands.

Knowledge is stored as plain text files, scoped by context:

~/.nodyn/memory/
  <contextId>/           # SHA-256 hash of project root (CLI) or explicit ID
    knowledge.txt        # knowledge namespace
    methods.txt          # methods namespace
    project-state.txt    # project-state namespace
    learnings.txt        # learnings namespace
  global/                # Fallback when no contextId
    knowledge.txt
    methods.txt
    project-state.txt
    learnings.txt
  user-<userId>/         # User-specific preferences (when NODYN_USER is set)
    knowledge.txt
    ...

The base directory is ~/.nodyn/memory/. Context ID is generated by resolveContext() — for CLI it uses sha256Short(projectRoot), for other sources (Telegram, Slack, PWA) it uses the explicit context ID.

Load order: context-scoped knowledge is loaded first; if no context is detected, NODYN falls back to global/.
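A minimal sketch of this resolution logic, assuming a 12-character hash truncation (the actual length used by resolveContext() is internal to NODYN):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of resolveContext(): CLI sessions derive a stable
// context ID from the project root; other sources pass an explicit ID.
// The 12-character truncation is an assumption, not NODYN's actual value.
function sha256Short(input: string, length = 12): string {
  return createHash("sha256").update(input).digest("hex").slice(0, length);
}

function resolveContext(
  source: "cli" | "telegram" | "slack" | "pwa",
  opts: { projectRoot?: string; explicitId?: string },
): string | undefined {
  if (source === "cli" && opts.projectRoot) return sha256Short(opts.projectRoot);
  return opts.explicitId; // undefined → fall back to global/
}
```

Because the hash is derived from the project root path, the same project always maps to the same memory directory across sessions.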

Knowledge lives at /home/nodyn/.nodyn/memory/ inside the container, covered by the ~/.nodyn volume mount:

docker run -it --rm \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v ~/.nodyn:/home/nodyn/.nodyn \
  nodyn

Knowledge is organized into 4 namespaces:

Namespace      Purpose                      Examples
knowledge      Key facts, user preferences  "User prefers TypeScript", "Project uses ESM"
methods        Patterns and techniques      "Use Promise.allSettled for parallel ops"
project-state  Ongoing project state        "Currently refactoring auth module"
learnings      Mistakes and lessons         "Avoid using any — use unknown instead"

After every completed agent turn, NODYN automatically extracts relevant information using the fast model:

Agent response → maybeUpdate() → extraction → append to namespaces

This is fire-and-forget: it runs asynchronously and never blocks the response. Extraction failures are silently ignored.

The extraction prompt categorizes information into the 4 namespaces. Only namespaces with relevant content are updated. Responses shorter than 50 characters are skipped.
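The fire-and-forget pattern can be sketched as follows; extractToNamespaces is a hypothetical stand-in for the fast-model call, and only maybeUpdate() and the 50-character threshold come from the section above:

```typescript
// Placeholder for the real fast-model extraction call.
async function extractToNamespaces(response: string): Promise<Record<string, string[]>> {
  return { knowledge: [response.slice(0, 80)] };
}

function maybeUpdate(
  response: string,
  append: (namespace: string, text: string) => void,
): void {
  if (response.length < 50) return; // responses shorter than 50 chars are skipped
  // Fire-and-forget: no await, so the user-facing response is never blocked.
  void extractToNamespaces(response)
    .then((byNamespace) => {
      for (const [ns, entries] of Object.entries(byNamespace)) {
        for (const entry of entries) append(ns, entry);
      }
    })
    .catch(() => {
      /* extraction failures are silently ignored */
    });
}
```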

Knowledge injection uses the Knowledge Graph:

KnowledgeLayer.retrieve() → HyDE + vector + graph expansion + MMR → <relevant_context> block with entity subgraph

<relevant_context>
  <scope type="user">
    [knowledge] (92%)
    User prefers TypeScript over JavaScript.
  </scope>
  <scope type="context">
    [knowledge] (85%)
    Project uses PostgreSQL 16 for JSONB queries.
    [methods] (78%)
    Use Promise.allSettled for parallel operations.
  </scope>
  <knowledge_graph>
    Entities: Thomas Weber (person, 5 mentions), acme-shop.ch (organization, 3 mentions)
  </knowledge_graph>
</relevant_context>

The <relevant_context> block has cache_control: { type: 'ephemeral' } for efficient prompt caching.
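Assuming the block is sent as a system content block via the Anthropic Messages API (the request plumbing around it is omitted), the shape would look roughly like this:

```typescript
// Illustrative shape (an assumption about NODYN's internals): the retrieved
// context is a system text block carrying cache_control, matching the
// Anthropic Messages API prompt-caching format.
const relevantContext = [
  "<relevant_context>",
  '<scope type="context">',
  "[knowledge] (85%)",
  "Project uses PostgreSQL 16 for JSONB queries.",
  "</scope>",
  "</relevant_context>",
].join("\n");

const systemBlock = {
  type: "text" as const,
  text: relevantContext,
  cache_control: { type: "ephemeral" as const }, // cached across turns
};
```

Only the cache_control field is documented above; the block text is reproduced from the earlier example.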

The Knowledge Graph is an embedded property graph (LadybugDB, Kuzu fork) that stores memories, entities, and their relationships. It provides entity-aware, graph-augmented retrieval.

~/.nodyn/knowledge-graph/ # LadybugDB embedded database

Graph Schema:

  • Entity nodes: persons, organizations, projects, products, concepts, locations — with canonical_name, aliases[], entity_type, embedding, mention_count
  • Memory nodes: knowledge entries with text, namespace, scope, embedding, is_active, superseded_by
  • Community nodes: clusters of related entities (future use)
  • MENTIONS edges: Memory → Entity (which entities a memory references)
  • RELATES_TO edges: Entity → Entity (typed: works_for, owns, uses, etc.)
  • SUPERSEDES edges: Memory → Memory (contradiction resolution)
  • COOCCURS edges: Entity → Entity (co-occurrence frequency)

Retrieval pipeline:

User query
├─ 1. HyDE (optional, Haiku ~$0.001)
│ Generate hypothetical answer → embed for better semantic match
├─ 2. Multi-signal search (parallel)
│ ├─ Vector search (ANN, top-50) ─── 55% weight
│ ├─ Full-text search (keywords) ─── 30% weight
│ └─ Graph expansion ─────────────── 15% boost
│ Query entities → resolve → 1-2 hop → connected memories
├─ 3. Scoring: similarity × scope_weight × namespace_decay
│ knowledge: 365d half-life, project-state: 21d half-life
├─ 4. MMR re-ranking (λ=0.7 relevance, 0.3 diversity)
└─ 5. Context formatting with entity subgraph
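The scoring and decay of step 3 can be sketched as below; the half-lives are taken from the pipeline above, while the scope weight is left as a caller-supplied parameter and the 365-day default for other namespaces is an assumption:

```typescript
// Sketch of the scoring step: similarity × scope_weight × namespace_decay.
const HALF_LIFE_DAYS: Record<string, number> = {
  knowledge: 365,
  "project-state": 21,
};

function namespaceDecay(namespace: string, ageDays: number): number {
  const halfLife = HALF_LIFE_DAYS[namespace] ?? 365; // assumed default
  return Math.pow(0.5, ageDays / halfLife); // 1.0 fresh, 0.5 after one half-life
}

function score(
  similarity: number,
  scopeWeight: number,
  namespace: string,
  ageDays: number,
): number {
  return similarity * scopeWeight * namespaceDecay(namespace, ageDays);
}
```

With these half-lives, a year-old project-state entry is effectively forgotten while a year-old knowledge entry still retains half its weight.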

Write pipeline:

memory_store / maybeUpdate()
├─ 1. Embed text (multilingual-e5-small, 384d)
├─ 2. Dedup check (cosine > 0.90 → skip)
├─ 3. Contradiction detection (knowledge/learnings only)
│ Vector search > 0.80 → heuristic: negation, number, state change
│ Contradicted memory → is_active=false, SUPERSEDES edge
├─ 4. Create Memory node in graph
├─ 5. Entity extraction (regex DE/EN, optional Haiku)
├─ 6. Entity resolution (canonical name → alias → create)
├─ 7. Create MENTIONS + RELATES_TO + COOCCURS edges
└─ 8. Parallel: append to flat-file (dual-write for debugging)
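The dedup check in step 2 reduces to a cosine-similarity threshold; a minimal sketch:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A new memory is skipped when it is a near-duplicate (cosine > 0.90)
// of any existing embedding.
function isDuplicate(candidate: number[], existing: number[][], threshold = 0.9): boolean {
  return existing.some((e) => cosine(candidate, e) > threshold);
}
```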

Entity extraction (step 5) uses a two-tier approach:

  • Tier 1 — Regex (always, zero cost): Persons (Herr/Frau/Mr. + Name, client/Kunde + Name), Organizations (domain names, Firma/company + Name), Technology (uses/nutzt + Term), Projects (project "Name", org/repo), Locations (in/aus + Place)
  • Tier 2 — Haiku (~$0.001, optional): Only for knowledge/methods namespace, text > 200 chars, 0 regex entities found. Also extracts relations between entities
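Tier 1 might look roughly like the following; these two patterns are illustrative simplifications, not NODYN's actual regexes:

```typescript
// Illustrative tier-1 patterns: honorific + capitalized name for persons,
// bare domain names for organizations. The real DE/EN pattern set lives in
// src/core/entity-extractor.ts and covers more cases.
const PERSON_RE = /\b(?:Herr|Frau|Mr\.|Mrs\.)\s+([A-ZÄÖÜ][\wäöüß-]+)/g;
const DOMAIN_RE = /\b([a-z0-9-]+\.(?:com|ch|de|org|io))\b/g;

function extractEntities(text: string): { persons: string[]; orgs: string[] } {
  return {
    persons: [...text.matchAll(PERSON_RE)].map((m) => m[1]),
    orgs: [...text.matchAll(DOMAIN_RE)].map((m) => m[1]),
  };
}
```

Because these are plain regexes, tier 1 costs nothing per call, which is why it always runs before the optional Haiku tier.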

Entity resolution priority: exact canonical match (case-insensitive) → alias match → normalized match → create new entity. Aliases accumulate: “Thomas”, “Herr Weber”, “the client from Bern” all resolve to the same entity.
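A sketch of that resolution chain, with a hypothetical Entity shape and normalization rule (the real logic is in src/core/entity-resolver.ts):

```typescript
interface Entity {
  canonicalName: string;
  aliases: string[];
}

// Hypothetical normalization: lowercase, strip everything non-alphanumeric.
const normalize = (s: string) => s.toLowerCase().replace(/[^a-z0-9äöüß]/g, "");

function resolveEntity(name: string, known: Entity[]): Entity {
  const lower = name.toLowerCase();
  // 1. Exact canonical match (case-insensitive)
  let hit = known.find((e) => e.canonicalName.toLowerCase() === lower);
  // 2. Alias match
  if (!hit) hit = known.find((e) => e.aliases.some((a) => a.toLowerCase() === lower));
  // 3. Normalized match
  if (!hit) hit = known.find((e) => normalize(e.canonicalName) === normalize(name));
  if (hit) {
    // Aliases accumulate on every new surface form.
    if (name !== hit.canonicalName && !hit.aliases.includes(name)) hit.aliases.push(name);
    return hit;
  }
  // 4. Create a new entity
  const created: Entity = { canonicalName: name, aliases: [] };
  known.push(created);
  return created;
}
```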

Contradiction detection runs only for the knowledge and learnings namespaces. It finds memories with >0.80 cosine similarity, then applies heuristic checks:

  • Negation: “uses X” vs “doesn’t use X” / “nicht mehr”
  • Number change: “budget is 5000” vs “budget is 8000”
  • State change: “project is active” vs “project is completed”

Contradicted memories: is_active=false, SUPERSEDES edge created. Old memory stays in graph as audit trail but is excluded from retrieval.
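The three heuristics can be sketched as a single predicate; the word lists and regexes here are illustrative, not the detector's actual rules (src/core/contradiction-detector.ts):

```typescript
// Runs on pairs that already passed the >0.80 similarity gate.
function looksContradictory(oldText: string, newText: string): boolean {
  // Negation flip: "uses X" vs "doesn't use X" / "nicht mehr"
  const negation = /\b(?:doesn't|does not|no longer|nicht mehr)\b/i;
  if (negation.test(newText) !== negation.test(oldText)) return true;
  // Number change: "budget is 5000" vs "budget is 8000"
  const nums = (s: string) => s.match(/\d+(?:\.\d+)?/g) ?? [];
  const [a, b] = [nums(oldText), nums(newText)];
  if (a.length && b.length && a.join() !== b.join()) return true;
  // State change: "active" vs "completed" (illustrative state vocabulary)
  const states = /\b(active|completed|paused|cancelled)\b/i;
  const [sa, sb] = [oldText.match(states)?.[1], newText.match(states)?.[1]];
  if (sa && sb && sa.toLowerCase() !== sb.toLowerCase()) return true;
  return false;
}
```

When the predicate fires, the older memory is deactivated and a SUPERSEDES edge records which entry replaced it.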

Embedding providers:

  • OnnxProvider (default) — @huggingface/transformers WASM runtime. Default model: multilingual-e5-small (384d, 100 languages, ~118MB). Configurable via embedding_model config. Lazy-loads pipeline on first call (~800ms cold start). Auto-downloads model to ~/.cache/huggingface/.
  • VoyageProvider — HTTP via Voyage AI (1024 dims). Requires voyage_api_key.
  • LocalProvider — Hash-based deterministic (384 dims). Test-only.
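The shared provider contract can be sketched via a deterministic, hash-based LocalProvider, which needs no model download; the interface shape and hashing scheme here are assumptions for illustration, not NODYN's actual src/core/embedding.ts code:

```typescript
import { createHash } from "node:crypto";

interface EmbeddingProvider {
  dims: number;
  embed(text: string): Promise<number[]>;
}

// Test-only provider: deterministic pseudo-embeddings, no network or model.
class LocalProvider implements EmbeddingProvider {
  readonly dims = 384;
  async embed(text: string): Promise<number[]> {
    const vec: number[] = [];
    let seed = text;
    while (vec.length < this.dims) {
      const digest = createHash("sha256").update(seed).digest();
      for (const byte of digest) vec.push(byte / 127.5 - 1); // map 0..255 → -1..1
      seed += "."; // vary the input for the next 32 components
    }
    return vec.slice(0, this.dims);
  }
}
```

The ONNX and Voyage providers would implement the same interface, differing only in where the vectors come from and in their dimensionality.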

Available ONNX models (embedding_model config):

Model ID                         Dimensions  Size    Languages  Use Case
multilingual-e5-small (default)  384         ~118MB  100        Best balance: multilingual + fast
all-minilm-l6-v2                 384         ~23MB   English    Fastest cold start, English-only
bge-m3                           1024        ~570MB  100+       Highest quality, slowest start
/knowledge list # List stored embeddings
/knowledge prune # Remove stale or duplicate entries

Garbage collection (GC) runs automatically every 50 runs:

  • Graph GC (runGraphGc()): Deletes superseded memories, removes orphan entities (not referenced by any active memory)
  • CLI: /memory gc [dry] — dry run previews changes without applying

Structured data in DataStore tables is automatically linked to the Knowledge Graph:

  • On collection create: Table registered as Entity (type: collection) in graph
  • On record insert: String fields scanned for entities via regex → has_data_in relationships created
  • On retrieval: When an entity is found in the graph, related DataStore collections are included as hints in the context (e.g., “Thomas has data in: customers (revenue: 5000)”)
  • Proactive discovery: Agent suggests creating tables when it notices recurring structured data during collaboration
data_store_insert("customers", [{name: "Thomas", company: "acme-shop.ch"}])
→ Entity "Thomas" (person) → has_data_in → "customers" (collection)
→ Entity "acme-shop.ch" (org) → has_data_in → "customers" (collection)

All memory tools sync with the Knowledge Graph when enabled:

  • memory_store: Stores content → entities extracted → graph write → flat-file dual-write
  • memory_recall: Reads from flat-file (graph retrieval happens per-turn via system prompt)
  • memory_update: Updates flat-file text → updates graph Memory node text → re-extracts entities
  • memory_delete: Removes from flat-file → deactivates matching Memory nodes in graph
  • memory_list: Lists flat-file entries by scope/namespace
  • memory_promote: Copies to broader scope (publishes for graph store) → deactivates source in graph
  • data_store_create: Set up a table with typed columns. Registers collection in graph
  • data_store_insert: Insert/upsert records. Entities from string fields indexed in graph
  • data_store_query: Filter, sort, aggregate (sum/avg/count/min/max)
  • data_store_delete: Remove records matching a filter. Requires filter — no bulk delete
  • data_store_list: Browse tables and schemas

All tools are available to the agent and sub-agents.

/memory # Show all knowledge
/memory knowledge # Show only the knowledge namespace
/memory methods # Show only the methods namespace

Pass memory: false in EngineConfig:

const engine = new Engine({ memory: false });
Implementation notes:

  • KnowledgeLayer (src/core/knowledge-layer.ts) — implements IKnowledgeLayer from src/types/index.ts. Composes KuzuGraph + EntityExtractor + EntityResolver + ContradictionDetector + RetrievalEngine
  • KuzuGraph (src/core/knowledge-graph.ts) — LadybugDB wrapper. DB at ~/.nodyn/knowledge-graph/. Schema: Entity/Memory/Community nodes, MENTIONS/RELATES_TO/SUPERSEDES/COOCCURS edges
  • RetrievalEngine (src/core/retrieval-engine.ts) — HyDE + vector + graph expansion + MMR. formatContext() produces XML output
  • EntityExtractor (src/core/entity-extractor.ts) — Tier 1 regex (DE/EN), Tier 2 optional Haiku
  • EntityResolver (src/core/entity-resolver.ts) — Canonical name resolution, alias merge
  • ContradictionDetector (src/core/contradiction-detector.ts) — Heuristic contradiction checks for knowledge/learnings
  • Wiring: Engine initializes KnowledgeLayer at startup, routes knowledge retrieval and memoryStore channel through it
  • Class: Memory (src/core/memory.ts)
  • Interface: IMemory (src/types/index.ts) — includes hasContent() check
  • Extraction model: Fast tier via beta messages API
  • Cache: Unified Map<string, string> keyed by ${scopeType}:${scopeId}:${namespace}
  • Scope delegation: Base CRUD methods delegate to scoped variants via _defaultScope()
  • Namespaces: ALL_NAMESPACES constant from src/types/index.ts
  • Embedding providers: src/core/embedding.ts — OnnxProvider (model registry: multilingual-e5-small default), VoyageProvider, LocalProvider
  • Embedding queue: Engine uses bounded concurrency (max 3 parallel) for stores, with failures logged
  • Observability: nodyn:memory:store (every write), nodyn:knowledge:graph (graph events), nodyn:knowledge:entity (entity events)
  • Docker: HF model cache persisted via nodyn-hf-cache volume at /home/nodyn/.cache/huggingface
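If the nodyn:* names are Node diagnostics_channel channels (an assumption; the mechanism isn't specified here), an embedder could tap them like this:

```typescript
import { channel, subscribe } from "node:diagnostics_channel";

const events: Array<Record<string, unknown>> = [];
// Subscriber side: observe every memory write, e.g. to forward to metrics.
subscribe("nodyn:memory:store", (message) => {
  events.push(message as Record<string, unknown>);
});

// Publishing side (normally inside NODYN; shown only to make the sketch runnable).
// The payload fields here are illustrative.
channel("nodyn:memory:store").publish({ namespace: "knowledge", scope: "context" });
```

diagnostics_channel delivers synchronously and costs nearly nothing when no subscriber is attached, which makes it a common choice for this kind of opt-in observability.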