Performance Benchmarks

Performance benchmarks built on Vitest bench, used to track regressions and establish baselines across releases. There are two tiers: offline (no API key, CI-safe) and online (real API calls).

Terminal window
pnpm bench         # all offline benchmarks (~30s)
pnpm bench:online  # online benchmarks (requires API key, ~$0.02)

  • Config file: vitest.bench.config.ts
  • Output: tests/performance/results.json (gitignored, regenerated on each run)
  • Baselines: tests/performance/baselines/ (committed, versioned)

Offline benchmarks need no API key and are safe for CI. They are located in tests/performance/.

| File | Module | What it measures |
| --- | --- | --- |
| embedding.bench.ts | core/embedding.ts | ONNX cold/warm start, LocalProvider throughput, cosine similarity, blob serialization |
| data-store.bench.ts | core/data-store.ts | SQLite collection creation, single/batch insert, query with filters/sort/aggregation |
| entity-extractor.bench.ts | core/entity-extractor.ts | Regex Tier 1 extraction on short/medium/large/plain text |
| security.bench.ts | core/data-boundary.ts, core/output-guard.ts | Injection detection, write content scanning, tool result scanning, data wrapping |
| memory.bench.ts | core/memory.ts | Flat-file save/load/append/delete/render, loadAll |
| knowledge-graph.bench.ts | core/knowledge-graph.ts | LadybugDB init, entity/memory/mention creation, Cypher queries (parameterized, 1-hop, scalar) |
| history-truncation.bench.ts | core/agent.ts | Message count gate, token budget truncation, content block truncation |

Online benchmarks require an API key, supplied via ~/.nodyn/config.json or ANTHROPIC_API_KEY, and are skipped automatically when no key is found. They are located in tests/performance/online/.

| File | Module | What it measures | Cost |
| --- | --- | --- | --- |
| agent-loop.bench.ts | core/agent.ts | send() round-trip, streaming, multi-turn, tool dispatch | ~$0.005 |
| retrieval-pipeline.bench.ts | core/retrieval-engine.ts | Full pipeline: embed → vector → graph → MMR, with/without HyDE | ~$0.01 |
| dag-planner.bench.ts | core/dag-planner.ts | Haiku DAG decomposition (simple/medium/complex goals) | ~$0.005 |
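The auto-skip behavior depends on detecting whether a key is available. The sketch below shows one way such a check could work; it is not the actual helper from setup.ts. The environment variable and config path come from this page, but the `api_key` field name inside the config file is an assumption made for illustration.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Default config location named in this document.
const DEFAULT_CONFIG = join(homedir(), ".nodyn", "config.json");

/**
 * Returns true when an API key appears to be available.
 * ASSUMPTION: the config file stores the key in a top-level "api_key" field;
 * the real field name is not stated in this document.
 */
export function hasApiKey(
  env: Record<string, string | undefined> = process.env,
  configPath: string = DEFAULT_CONFIG,
): boolean {
  // 1. A non-empty environment variable is enough.
  if ((env.ANTHROPIC_API_KEY ?? "").trim() !== "") return true;

  // 2. Otherwise look at the config file, tolerating a missing or malformed file.
  if (!existsSync(configPath)) return false;
  try {
    const parsed: unknown = JSON.parse(readFileSync(configPath, "utf8"));
    if (typeof parsed !== "object" || parsed === null) return false;
    const key = (parsed as Record<string, unknown>)["api_key"];
    return typeof key === "string" && key.trim() !== "";
  } catch {
    // Unreadable or invalid JSON is treated as "no key".
    return false;
  }
}
```

A suite would then be wrapped as `describe.skipIf(!hasApiKey())(...)`, so a machine without a key reports the online benchmarks as skipped rather than failed.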

Key metrics per benchmark:

  • hz: Operations per second (higher = better)
  • mean: Average time per operation in milliseconds (lower = better)
  • p99: 99th percentile latency — the “worst realistic case”
  • rme: Relative margin of error — below ±5% is stable
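To make the four metrics concrete, the sketch below derives them from a list of per-iteration timings in milliseconds. It is an approximation for illustration only: it uses a fixed 1.96 critical value and a nearest-rank percentile, whereas the benchmark runner may use a Student's t value and a different quantile method. The `summarize` name and `Summary` type are hypothetical.

```typescript
interface Summary {
  hz: number;   // operations per second
  mean: number; // average ms per operation
  p99: number;  // 99th percentile latency in ms
  rme: number;  // relative margin of error, in percent
}

export function summarize(samplesMs: number[]): Summary {
  const n = samplesMs.length;
  if (n < 2) throw new Error("need at least two samples");

  const mean = samplesMs.reduce((a, b) => a + b, 0) / n;

  // Sample variance (n - 1 denominator), then standard error of the mean.
  const variance =
    samplesMs.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const sem = Math.sqrt(variance / n);

  // ASSUMPTION: normal approximation (z = 1.96) for a 95% margin of error.
  const moe = 1.96 * sem;

  // Nearest-rank percentile on the sorted samples.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const p99 = sorted[Math.min(n - 1, Math.ceil(0.99 * n) - 1)];

  return { hz: 1000 / mean, mean, p99, rme: (moe / mean) * 100 };
}
```

Because hz is just 1000 / mean, the two always move together; p99 and rme are the metrics that expose jitter that the mean hides.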

| Signal | Meaning |
| --- | --- |
| hz drops >20% vs baseline | Performance regression |
| p99 spikes >3x mean | Jitter/GC pressure |
| rme >10% | Unstable benchmark, results unreliable |
| ONNX cold start >2s | Model cache issue or download |
| KG init >200ms | Database migration or disk I/O issue |
| Security scan <1K ops/s on 50KB | Regex backtracking |
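The three generic signals (hz drop, p99 spike, high rme) can be checked mechanically. The following is a minimal sketch that applies those thresholds to one benchmark's metrics; the `BenchMetrics` shape and `checkSignals` function are hypothetical and do not mirror the layout of results.json, which would need to be mapped into this shape first.

```typescript
interface BenchMetrics {
  hz: number;   // operations per second
  mean: number; // ms per operation
  p99: number;  // ms
  rme: number;  // percent
}

/** Returns a human-readable list of triggered signals (empty when healthy). */
export function checkSignals(
  current: BenchMetrics,
  baseline?: BenchMetrics,
): string[] {
  const signals: string[] = [];

  // hz drops more than 20% relative to the baseline: performance regression.
  if (baseline && current.hz < baseline.hz * 0.8) {
    signals.push("regression: hz dropped more than 20% vs baseline");
  }
  // p99 more than 3x the mean: jitter or GC pressure.
  if (current.p99 > 3 * current.mean) {
    signals.push("jitter: p99 exceeds 3x mean");
  }
  // rme above 10%: the measurement itself is unreliable.
  if (current.rme > 10) {
    signals.push("unstable: rme above 10%");
  }
  return signals;
}
```

When the rme signal fires, the other two are best treated as inconclusive and the benchmark rerun before drawing conclusions.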

After a release or significant change, rerun the offline benchmarks and save the results as a new versioned baseline:

Terminal window
pnpm bench
cp tests/performance/results.json tests/performance/baselines/v1.0.0.json

To add a new benchmark:

  1. Create tests/performance/<module>.bench.ts
  2. Import helpers from ./setup.ts
  3. Use describe + bench from vitest
  4. For online benchmarks: place the file in tests/performance/online/ and use describe.skipIf(!hasApiKey())

Run with debug output to correlate benchmark timing with internal events:

Terminal window
NODYN_DEBUG=1 pnpm bench 2>bench-debug.log

Debug channels observed during benchmarks:

  • nodyn:tool:start/end — tool timing
  • nodyn:knowledge:graph — KG operations
  • nodyn:datastore:insert — DataStore writes
  • nodyn:memory:store — memory operations
  • nodyn:security:* — security scan events
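One way to get an overview of a captured debug log is to tally how many lines each channel produced. The sketch below assumes that every debug line contains its channel name verbatim (for example `nodyn:tool:start`); the actual line format of the log is not described on this page, so treat the pattern as a starting point. The `tallyChannels` name is hypothetical.

```typescript
/**
 * Counts log lines per debug channel.
 * ASSUMPTION: each relevant line contains a channel token of the form
 * "nodyn:<segment>[:<segment>...]"; lines without one are ignored.
 */
export function tallyChannels(logText: string): Map<string, number> {
  const counts = new Map<string, number>();
  const channel = /\bnodyn(?::[A-Za-z0-9_-]+)+/;

  for (const line of logText.split(/\r?\n/)) {
    const match = channel.exec(line);
    if (!match) continue;
    // Only the first channel token on a line is counted.
    counts.set(match[0], (counts.get(match[0]) ?? 0) + 1);
  }
  return counts;
}
```

Passing the contents of bench-debug.log (read with `readFileSync(path, "utf8")`) gives a quick picture of which subsystem dominated a slow run, for instance an unexpectedly high count of nodyn:datastore:insert lines.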