Knowledge System

NODYN’s knowledge system persists information across sessions using local file storage. Knowledge is automatically populated by analyzing agent responses and can be manually managed via tools and CLI commands.

Knowledge is stored as plain text files, scoped by context:

~/.nodyn/memory/
  <contextId>/           # SHA-256 hash of project root (CLI) or explicit ID
    knowledge.txt        # knowledge namespace
    methods.txt          # methods namespace
    project-state.txt    # project-state namespace
    learnings.txt        # learnings namespace
  global/                # Fallback when no contextId
    knowledge.txt
    methods.txt
    project-state.txt
    learnings.txt
  user-<userId>/         # User-specific preferences (when NODYN_USER is set)
    knowledge.txt
    ...

The base directory is ~/.nodyn/memory/. Context ID is generated by resolveContext() — for CLI it uses sha256Short(projectRoot), for other sources (Telegram, Slack, PWA) it uses the explicit context ID.

Load order: context-scoped knowledge is loaded first; if no context is detected, NODYN falls back to global/.
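A minimal sketch of this resolution logic, assuming a 12-character hash truncation (the actual length used by resolveContext() is internal to NODYN):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of resolveContext(): CLI sessions derive a stable
// context ID from the project root; other sources pass an explicit ID.
// The 12-character truncation is an assumption, not NODYN's actual value.
function sha256Short(input: string, length = 12): string {
  return createHash("sha256").update(input).digest("hex").slice(0, length);
}

function resolveContext(
  source: "cli" | "telegram" | "slack" | "pwa",
  opts: { projectRoot?: string; explicitId?: string },
): string | undefined {
  if (source === "cli" && opts.projectRoot) return sha256Short(opts.projectRoot);
  return opts.explicitId; // undefined → fall back to global/
}
```

Because the hash is derived from the project root path, the same project always maps to the same memory directory across sessions.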

Knowledge lives at /home/nodyn/.nodyn/memory/ inside the container, covered by the ~/.nodyn volume mount:

docker run -it --rm \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v ~/.nodyn:/home/nodyn/.nodyn \
  nodyn

Knowledge is organized into 4 namespaces:

Namespace      Purpose                      Examples
knowledge      Key facts, user preferences  "User prefers TypeScript", "Project uses ESM"
methods        Patterns and techniques      "Use Promise.allSettled for parallel ops"
project-state  Ongoing project state        "Currently refactoring auth module"
learnings      Mistakes and lessons         "Avoid using any — use unknown instead"

After every completed agent turn, NODYN automatically extracts relevant information using the fast model:

Agent response → maybeUpdate() → extraction → append to namespaces

This is fire-and-forget: it runs asynchronously and never blocks the response. Extraction failures are silently ignored.

The extraction prompt categorizes information into the 4 namespaces. Only namespaces with relevant content are updated. Responses shorter than 50 characters are skipped.
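The fire-and-forget pattern can be sketched as follows; extractToNamespaces is a hypothetical stand-in for the fast-model call, and only maybeUpdate() and the 50-character threshold come from the section above:

```typescript
// Placeholder for the real fast-model extraction call.
async function extractToNamespaces(response: string): Promise<Record<string, string[]>> {
  return { knowledge: [response.slice(0, 80)] };
}

function maybeUpdate(
  response: string,
  append: (namespace: string, text: string) => void,
): void {
  if (response.length < 50) return; // responses shorter than 50 chars are skipped
  // Fire-and-forget: no await, so the user-facing response is never blocked.
  void extractToNamespaces(response)
    .then((byNamespace) => {
      for (const [ns, entries] of Object.entries(byNamespace)) {
        for (const entry of entries) append(ns, entry);
      }
    })
    .catch(() => {
      /* extraction failures are silently ignored */
    });
}
```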

Knowledge injection uses the Knowledge Graph:

KnowledgeLayer.retrieve() → HyDE + vector + graph expansion + MMR → <relevant_context> block with entity subgraph

<relevant_context>
  <scope type="user">
    [knowledge] (92%)
    User prefers TypeScript over JavaScript.
  </scope>
  <scope type="context">
    [knowledge] (85%)
    Project uses PostgreSQL 16 for JSONB queries.
    [methods] (78%)
    Use Promise.allSettled for parallel operations.
  </scope>
  <knowledge_graph>
    Entities: Thomas Weber (person, 5 mentions), acme-shop.ch (organization, 3 mentions)
  </knowledge_graph>
</relevant_context>

The <relevant_context> block has cache_control: { type: 'ephemeral' } for efficient prompt caching.
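Assuming the block is sent as a system content block via the Anthropic Messages API (the request plumbing around it is omitted), the shape would look roughly like this:

```typescript
// Illustrative shape (an assumption about NODYN's internals): the retrieved
// context is a system text block carrying cache_control, matching the
// Anthropic Messages API prompt-caching format.
const relevantContext = [
  "<relevant_context>",
  '<scope type="context">',
  "[knowledge] (85%)",
  "Project uses PostgreSQL 16 for JSONB queries.",
  "</scope>",
  "</relevant_context>",
].join("\n");

const systemBlock = {
  type: "text" as const,
  text: relevantContext,
  cache_control: { type: "ephemeral" as const }, // cached across turns
};
```

Only the cache_control field is documented above; the block text is reproduced from the earlier example.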

The Knowledge Graph is an embedded property graph (LadybugDB, Kuzu fork) that stores memories, entities, and their relationships. It provides entity-aware, graph-augmented retrieval.

~/.nodyn/knowledge-graph/ # LadybugDB embedded database

Graph Schema:

  • Entity nodes: persons, organizations, projects, products, concepts, locations — with canonical_name, aliases[], entity_type, embedding, mention_count
  • Memory nodes: knowledge entries with text, namespace, scope, embedding, is_active, superseded_by
  • Community nodes: clusters of related entities (future use)
  • MENTIONS edges: Memory → Entity (which entities a memory references)
  • RELATES_TO edges: Entity → Entity (typed: works_for, owns, uses, etc.)
  • SUPERSEDES edges: Memory → Memory (contradiction resolution)
  • COOCCURS edges: Entity → Entity (co-occurrence frequency)

Retrieval pipeline:

User query
├─ 1. HyDE (optional, Haiku ~$0.001)
│ Generate hypothetical answer → embed for better semantic match
├─ 2. Multi-signal search (parallel)
│ ├─ Vector search (ANN, top-50) ─── 55% weight
│ ├─ Full-text search (keywords) ─── 30% weight
│ └─ Graph expansion ─────────────── 15% boost
│ Query entities → resolve → 1-2 hop → connected memories
├─ 3. Scoring: similarity × scope_weight × namespace_decay
│ knowledge: 365d half-life, project-state: 21d half-life
├─ 4. MMR re-ranking (λ=0.7 relevance, 0.3 diversity)
└─ 5. Context formatting with entity subgraph
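The scoring and decay of step 3 can be sketched as below; the half-lives are taken from the pipeline above, while the scope weight is left as a caller-supplied parameter and the 365-day default for other namespaces is an assumption:

```typescript
// Sketch of the scoring step: similarity × scope_weight × namespace_decay.
const HALF_LIFE_DAYS: Record<string, number> = {
  knowledge: 365,
  "project-state": 21,
};

function namespaceDecay(namespace: string, ageDays: number): number {
  const halfLife = HALF_LIFE_DAYS[namespace] ?? 365; // assumed default
  return Math.pow(0.5, ageDays / halfLife); // 1.0 fresh, 0.5 after one half-life
}

function score(
  similarity: number,
  scopeWeight: number,
  namespace: string,
  ageDays: number,
): number {
  return similarity * scopeWeight * namespaceDecay(namespace, ageDays);
}
```

With these half-lives, a year-old project-state entry is effectively forgotten while a year-old knowledge entry still retains half its weight.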

Write pipeline:

memory_store / maybeUpdate()
├─ 1. Embed text (multilingual-e5-small, 384d)
├─ 2. Dedup check (cosine > 0.90 → skip)
├─ 3. Contradiction detection (knowledge/learnings only)
│ Vector search > 0.80 → heuristic: negation, number, state change
│ Contradicted memory → is_active=false, SUPERSEDES edge
├─ 4. Create Memory node in graph
├─ 5. Entity extraction (regex DE/EN, optional Haiku)
├─ 6. Entity resolution (canonical name → alias → create)
├─ 7. Create MENTIONS + RELATES_TO + COOCCURS edges
└─ 8. Parallel: append to flat-file (dual-write for debugging)
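The dedup check in step 2 reduces to a cosine-similarity threshold; a minimal sketch:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A new memory is skipped when it is a near-duplicate (cosine > 0.90)
// of any existing embedding.
function isDuplicate(candidate: number[], existing: number[][], threshold = 0.9): boolean {
  return existing.some((e) => cosine(candidate, e) > threshold);
}
```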

Entity extraction (step 5) uses a two-tier approach:

  • Tier 1 — Regex (always, zero cost): Persons (Herr/Frau/Mr. + Name, client/Kunde + Name), Organizations (domain names, Firma/company + Name), Technology (uses/nutzt + Term), Projects (project "Name", org/repo), Locations (in/aus + Place)
  • Tier 2 — Haiku (~$0.001, optional): Only for knowledge/methods namespace, text > 200 chars, 0 regex entities found. Also extracts relations between entities
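Tier 1 might look roughly like the following; these two patterns are illustrative simplifications, not NODYN's actual regexes:

```typescript
// Illustrative tier-1 patterns: honorific + capitalized name for persons,
// bare domain names for organizations. The real DE/EN pattern set lives in
// src/core/entity-extractor.ts and covers more cases.
const PERSON_RE = /\b(?:Herr|Frau|Mr\.|Mrs\.)\s+([A-ZÄÖÜ][\wäöüß-]+)/g;
const DOMAIN_RE = /\b([a-z0-9-]+\.(?:com|ch|de|org|io))\b/g;

function extractEntities(text: string): { persons: string[]; orgs: string[] } {
  return {
    persons: [...text.matchAll(PERSON_RE)].map((m) => m[1]),
    orgs: [...text.matchAll(DOMAIN_RE)].map((m) => m[1]),
  };
}
```

Because these are plain regexes, tier 1 costs nothing per call, which is why it always runs before the optional Haiku tier.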

Entity resolution priority: exact canonical match (case-insensitive) → alias match → normalized match → create new entity. Aliases accumulate: “Thomas”, “Herr Weber”, “the client from Bern” all resolve to the same entity.
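A sketch of that resolution chain, with a hypothetical Entity shape and normalization rule (the real logic is in src/core/entity-resolver.ts):

```typescript
interface Entity {
  canonicalName: string;
  aliases: string[];
}

// Hypothetical normalization: lowercase, strip everything non-alphanumeric.
const normalize = (s: string) => s.toLowerCase().replace(/[^a-z0-9äöüß]/g, "");

function resolveEntity(name: string, known: Entity[]): Entity {
  const lower = name.toLowerCase();
  // 1. Exact canonical match (case-insensitive)
  let hit = known.find((e) => e.canonicalName.toLowerCase() === lower);
  // 2. Alias match
  if (!hit) hit = known.find((e) => e.aliases.some((a) => a.toLowerCase() === lower));
  // 3. Normalized match
  if (!hit) hit = known.find((e) => normalize(e.canonicalName) === normalize(name));
  if (hit) {
    // Aliases accumulate on every new surface form.
    if (name !== hit.canonicalName && !hit.aliases.includes(name)) hit.aliases.push(name);
    return hit;
  }
  // 4. Create a new entity
  const created: Entity = { canonicalName: name, aliases: [] };
  known.push(created);
  return created;
}
```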

Contradiction detection runs only for the knowledge and learnings namespaces. It finds memories with >0.80 cosine similarity, then applies heuristic checks:

  • Negation: “uses X” vs “doesn’t use X” / “nicht mehr”
  • Number change: “budget is 5000” vs “budget is 8000”
  • State change: “project is active” vs “project is completed”

Contradicted memories: is_active=false, SUPERSEDES edge created. Old memory stays in graph as audit trail but is excluded from retrieval.
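The three heuristics can be sketched as a single predicate; the word lists and regexes here are illustrative, not the detector's actual rules (src/core/contradiction-detector.ts):

```typescript
// Runs on pairs that already passed the >0.80 similarity gate.
function looksContradictory(oldText: string, newText: string): boolean {
  // Negation flip: "uses X" vs "doesn't use X" / "nicht mehr"
  const negation = /\b(?:doesn't|does not|no longer|nicht mehr)\b/i;
  if (negation.test(newText) !== negation.test(oldText)) return true;
  // Number change: "budget is 5000" vs "budget is 8000"
  const nums = (s: string) => s.match(/\d+(?:\.\d+)?/g) ?? [];
  const [a, b] = [nums(oldText), nums(newText)];
  if (a.length && b.length && a.join() !== b.join()) return true;
  // State change: "active" vs "completed" (illustrative state vocabulary)
  const states = /\b(active|completed|paused|cancelled)\b/i;
  const [sa, sb] = [oldText.match(states)?.[1], newText.match(states)?.[1]];
  if (sa && sb && sa.toLowerCase() !== sb.toLowerCase()) return true;
  return false;
}
```

When the predicate fires, the older memory is deactivated and a SUPERSEDES edge records which entry replaced it.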

Embedding providers:

  • OnnxProvider (default) — @huggingface/transformers WASM runtime. Default model: multilingual-e5-small (384d, 100 languages, ~118MB). Configurable via embedding_model config. Lazy-loads pipeline on first call (~800ms cold start). Auto-downloads model to ~/.cache/huggingface/.
  • VoyageProvider — HTTP via Voyage AI (1024 dims). Requires voyage_api_key.
  • LocalProvider — Hash-based deterministic (384 dims). Test-only.
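The shared provider contract can be sketched via a deterministic, hash-based LocalProvider, which needs no model download; the interface shape and hashing scheme here are assumptions for illustration, not NODYN's actual src/core/embedding.ts code:

```typescript
import { createHash } from "node:crypto";

interface EmbeddingProvider {
  dims: number;
  embed(text: string): Promise<number[]>;
}

// Test-only provider: deterministic pseudo-embeddings, no network or model.
class LocalProvider implements EmbeddingProvider {
  readonly dims = 384;
  async embed(text: string): Promise<number[]> {
    const vec: number[] = [];
    let seed = text;
    while (vec.length < this.dims) {
      const digest = createHash("sha256").update(seed).digest();
      for (const byte of digest) vec.push(byte / 127.5 - 1); // map 0..255 → -1..1
      seed += "."; // vary the input for the next 32 components
    }
    return vec.slice(0, this.dims);
  }
}
```

The ONNX and Voyage providers would implement the same interface, differing only in where the vectors come from and in their dimensionality.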

Available ONNX models (embedding_model config):

Model ID                         Dimensions  Size    Languages  Use Case
multilingual-e5-small (default)  384         ~118MB  100        Best balance: multilingual + fast
all-minilm-l6-v2                 384         ~23MB   English    Fastest cold start, English-only
bge-m3                           1024        ~570MB  100+       Highest quality, slowest start
/knowledge list # List stored embeddings
/knowledge prune # Remove stale or duplicate entries

Garbage collection (GC) runs automatically every 50 runs:

  • Graph GC (runGraphGc()): Deletes superseded memories, removes orphan entities (not referenced by any active memory)
  • CLI: /memory gc [dry] — dry run previews changes without applying

Structured data in DataStore tables is automatically linked to the Knowledge Graph:

  • On collection create: Table registered as Entity (type: collection) in graph
  • On record insert: String fields scanned for entities via regex → has_data_in relationships created
  • On retrieval: When an entity is found in the graph, related DataStore collections are included as hints in the context (e.g., “Thomas has data in: customers (revenue: 5000)”)
  • Proactive discovery: Agent suggests creating tables when it notices recurring structured data during collaboration
data_store_insert("customers", [{name: "Thomas", company: "acme-shop.ch"}])
→ Entity "Thomas" (person) → has_data_in → "customers" (collection)
→ Entity "acme-shop.ch" (org) → has_data_in → "customers" (collection)

All memory tools sync with the Knowledge Graph when enabled:

  • memory_store: Stores content → entities extracted → graph write → flat-file dual-write
  • memory_recall: Reads from flat-file (graph retrieval happens per-turn via system prompt)
  • memory_update: Updates flat-file text → updates graph Memory node text → re-extracts entities
  • memory_delete: Removes from flat-file → deactivates matching Memory nodes in graph
  • memory_list: Lists flat-file entries by scope/namespace
  • memory_promote: Copies to broader scope (publishes for graph store) → deactivates source in graph
  • data_store_create: Set up a table with typed columns. Registers collection in graph
  • data_store_insert: Insert/upsert records. Entities from string fields indexed in graph
  • data_store_query: Filter, sort, aggregate (sum/avg/count/min/max)
  • data_store_delete: Remove records matching a filter. Requires filter — no bulk delete
  • data_store_list: Browse tables and schemas

All tools are available to the agent and sub-agents.

/memory # Show all knowledge
/memory knowledge # Show only the knowledge namespace
/memory methods # Show only the methods namespace

Pass memory: false in EngineConfig:

const engine = new Engine({ memory: false });
Implementation notes:

  • KnowledgeLayer (src/core/knowledge-layer.ts) — implements IKnowledgeLayer from src/types/index.ts. Composes KuzuGraph + EntityExtractor + EntityResolver + ContradictionDetector + RetrievalEngine
  • KuzuGraph (src/core/knowledge-graph.ts) — LadybugDB wrapper. DB at ~/.nodyn/knowledge-graph/. Schema: Entity/Memory/Community nodes, MENTIONS/RELATES_TO/SUPERSEDES/COOCCURS edges
  • RetrievalEngine (src/core/retrieval-engine.ts) — HyDE + vector + graph expansion + MMR. formatContext() produces XML output
  • EntityExtractor (src/core/entity-extractor.ts) — Tier 1 regex (DE/EN), Tier 2 optional Haiku
  • EntityResolver (src/core/entity-resolver.ts) — Canonical name resolution, alias merge
  • ContradictionDetector (src/core/contradiction-detector.ts) — Heuristic contradiction checks for knowledge/learnings
  • Wiring: Engine initializes KnowledgeLayer at startup, routes knowledge retrieval and memoryStore channel through it
  • Class: Memory (src/core/memory.ts)
  • Interface: IMemory (src/types/index.ts) — includes hasContent() check
  • Extraction model: Fast tier via beta messages API
  • Cache: Unified Map<string, string> keyed by ${scopeType}:${scopeId}:${namespace}
  • Scope delegation: Base CRUD methods delegate to scoped variants via _defaultScope()
  • Namespaces: ALL_NAMESPACES constant from src/types/index.ts
  • Embedding providers: src/core/embedding.ts — OnnxProvider (model registry: multilingual-e5-small default), VoyageProvider, LocalProvider
  • Embedding queue: Engine uses bounded concurrency (max 3 parallel) for stores, with failures logged
  • Observability: nodyn:memory:store (every write), nodyn:knowledge:graph (graph events), nodyn:knowledge:entity (entity events)
  • Docker: HF model cache persisted via nodyn-hf-cache volume at /home/nodyn/.cache/huggingface
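If the nodyn:* names are Node diagnostics_channel channels (an assumption; the mechanism isn't specified here), an embedder could tap them like this:

```typescript
import { channel, subscribe } from "node:diagnostics_channel";

const events: Array<Record<string, unknown>> = [];
// Subscriber side: observe every memory write, e.g. to forward to metrics.
subscribe("nodyn:memory:store", (message) => {
  events.push(message as Record<string, unknown>);
});

// Publishing side (normally inside NODYN; shown only to make the sketch runnable).
// The payload fields here are illustrative.
channel("nodyn:memory:store").publish({ namespace: "knowledge", scope: "context" });
```

diagnostics_channel delivers synchronously and costs nearly nothing when no subscriber is attached, which makes it a common choice for this kind of opt-in observability.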