
# Brain Model Test Results

**Date:** April 17, 2026 (auto-generated)
**Model:** brain_model
**Training:** curriculum → preschool → grade1 → bAbI → FineWeb-Edu


## Model Statistics

| Metric | Value |
|---|---|
| Neurons | 48,312 |
| Connections | 1,476,117 |
| — MYELINATED | 23,784 (1.6%) |
| — USED | 76,342 (5.2%) |
| — NEW | 1,375,991 |
| Episodes | 76,679 |
| — NEW | 35,065 |
| — REPLAYED | 2,189 |
| — CONSOLIDATED | 38,074 |
| — DECAYING | 1,351 |
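The connection-state counts sum exactly to the Connections total, and the bracketed percentages match each state's share of all connections. A minimal sketch that reproduces those figures (the dictionary and formatting below are illustrative, not part of `test_brain.py`):

```python
# Connection-state counts from the table above. Assumption: the bracketed
# percentages are each state's share of total connections, rounded to
# one decimal place.
connection_states = {"MYELINATED": 23_784, "USED": 76_342, "NEW": 1_375_991}

total = sum(connection_states.values())
assert total == 1_476_117  # matches the Connections row

for state, count in connection_states.items():
    print(f"{state}: {count:,} ({count / total:.1%})")
```

Running this prints 1.6% for MYELINATED and 5.2% for USED, consistent with the table; NEW works out to roughly 93% of connections.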

## Test Results Summary

| Test Suite | Passed | Total | Accuracy | Time | Description |
|---|---|---|---|---|---|
| CURRICULUM | 49 | 50 | 98.0% | 35.4s | Core knowledge tests |
| STRICT | 3 | 3 | 100.0% | 2.0s | "I do not know" tests |
| **TOTAL** | 52 | 53 | 98.1% | | All tests combined |

## Baseline Comparison

QA baselines (TF-IDF, BM25) were trained on identical data. Working-memory baselines (MemNet, NTM) were tested on all bAbI tasks. QA SUITE AVG is a macro-average across QA suites, not weighted by question count.

| Test | Brain | TF-IDF | BM25 | MemNet | NTM |
|---|---|---|---|---|---|
| CURRICULUM | 98.0% | 64.0% | 70.0% | N/A | N/A |
| STRICT | 100.0% | 33.3% | 33.3% | N/A | N/A |
| QA SUITE AVG | 99.0% | 48.7% | 51.7% | N/A | N/A |

bAbI requires working memory that TF-IDF/BM25 lack: they cannot track entity states across a story. MemNet/NTM were tested on all 20 bAbI tasks.
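Because QA SUITE AVG is a macro-average, each suite counts equally regardless of how many questions it contains. A minimal sketch of that computation under the stated definition (variable names are illustrative, not from `test_brain.py`):

```python
# Per-suite accuracies (percent) copied from the tables above.
suite_accuracy = {
    "CURRICULUM": {"Brain": 98.0, "TF-IDF": 64.0, "BM25": 70.0},
    "STRICT":     {"Brain": 100.0, "TF-IDF": 33.3, "BM25": 33.3},
}

def macro_avg(system: str) -> float:
    """Unweighted mean across suites: each suite counts equally."""
    scores = [row[system] for row in suite_accuracy.values()]
    return sum(scores) / len(scores)

for system in ("Brain", "TF-IDF", "BM25"):
    print(f"{system}: {macro_avg(system):.2f}")
```

This yields 99.00 / 48.65 / 51.65, matching the QA SUITE AVG row up to rounding. A question-count-weighted average would differ: TF-IDF over all 53 questions would come to roughly 62%, because the large CURRICULUM suite dominates.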

## Key Findings

1. Brain significantly outperforms simple IR methods (+50-66%)
2. "I don't know" capability: Brain correctly abstains on unknown queries

## Failed Tests Analysis

### CURRICULUM (1 failure)

| Question | Brain Answer | Expected |
|---|---|---|
| What is the moon? | and stars appear in the sky at night | ['satellite', 'round', 'night'] |

## How to Reproduce

```shell
# Train model
python train.py

# Run all tests with baseline comparison
python test_brain.py --no-gpt --no-llm --babi-limit 5

# Run specific test suite
python test_brain.py --curriculum --no-gpt --no-llm
```

*This file is auto-generated by test_brain.py. Do not edit manually.*