
# Brain Model Test Results

**Date:** April 17, 2026 (auto-generated)
**Model:** brain_model
**Training:** curriculum → preschool → grade1 → bAbI → FineWeb-Edu


## Model Statistics

| Metric | Value |
|---|---|
| Neurons | 48,312 |
| Connections | 1,476,117 |
| — MYELINATED | 23,784 (1.6%) |
| — USED | 76,342 (5.2%) |
| — NEW | 1,375,991 |
| Episodes | 76,679 |
| — NEW | 35,065 |
| — REPLAYED | 2,189 |
| — CONSOLIDATED | 38,074 |
| — DECAYING | 1,351 |
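The connection-state counts sum exactly to the Connections total, and the bracketed percentages match each state's share of all connections. A minimal sketch that reproduces those figures (the dictionary and formatting below are illustrative, not part of `test_brain.py`):

```python
# Connection-state counts from the table above. Assumption: the bracketed
# percentages are each state's share of total connections, rounded to
# one decimal place.
connection_states = {"MYELINATED": 23_784, "USED": 76_342, "NEW": 1_375_991}

total = sum(connection_states.values())
assert total == 1_476_117  # matches the Connections row

for state, count in connection_states.items():
    print(f"{state}: {count:,} ({count / total:.1%})")
```

Running this prints 1.6% for MYELINATED and 5.2% for USED, consistent with the table; NEW works out to roughly 93% of connections.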

## Test Results Summary

| Test Suite | Passed | Total | Accuracy | Time | Description |
|---|---|---|---|---|---|
| CURRICULUM | 49 | 50 | 98.0% | 35.4s | Core knowledge tests |
| STRICT | 3 | 3 | 100.0% | 2.0s | "I do not know" tests |
| **TOTAL** | 52 | 53 | 98.1% | | All tests combined |

## Baseline Comparison

QA baselines (TF-IDF, BM25) were trained on identical data. Working-memory baselines (MemNet, NTM) were tested on all bAbI tasks. QA SUITE AVG is a macro-average across QA suites, not weighted by question count.

| Test | Brain | TF-IDF | BM25 | MemNet | NTM |
|---|---|---|---|---|---|
| CURRICULUM | 98.0% | 64.0% | 70.0% | N/A | N/A |
| STRICT | 100.0% | 33.3% | 33.3% | N/A | N/A |
| QA SUITE AVG | 99.0% | 48.7% | 51.7% | N/A | N/A |

bAbI requires working memory that TF-IDF/BM25 lack: they cannot track entity states across a story. MemNet/NTM were tested on all 20 bAbI tasks.
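Because QA SUITE AVG is a macro-average, each suite counts equally regardless of how many questions it contains. A minimal sketch of that computation under the stated definition (variable names are illustrative, not from `test_brain.py`):

```python
# Per-suite accuracies (percent) copied from the tables above.
suite_accuracy = {
    "CURRICULUM": {"Brain": 98.0, "TF-IDF": 64.0, "BM25": 70.0},
    "STRICT":     {"Brain": 100.0, "TF-IDF": 33.3, "BM25": 33.3},
}

def macro_avg(system: str) -> float:
    """Unweighted mean across suites: each suite counts equally."""
    scores = [row[system] for row in suite_accuracy.values()]
    return sum(scores) / len(scores)

for system in ("Brain", "TF-IDF", "BM25"):
    print(f"{system}: {macro_avg(system):.2f}")
```

This yields 99.00 / 48.65 / 51.65, matching the QA SUITE AVG row up to rounding. A question-count-weighted average would differ: TF-IDF over all 53 questions would come to roughly 62%, because the large CURRICULUM suite dominates.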

## Key Findings

1. Brain significantly outperforms simple IR methods (+50-66%)
2. "I don't know" capability: Brain correctly abstains on unknown queries

## Failed Tests Analysis

### CURRICULUM (1 failure)

| Question | Brain Answer | Expected |
|---|---|---|
| What is the moon? | and stars appear in the sky at night | ['satellite', 'round', 'night'] |

## How to Reproduce

```shell
# Train model
python train.py

# Run all tests with baseline comparison
python test_brain.py --no-gpt --no-llm --babi-limit 5

# Run specific test suite
python test_brain.py --curriculum --no-gpt --no-llm
```

*This file is auto-generated by test_brain.py. Do not edit manually.*