A biologically inspired cognitive architecture that models the complete pipeline from memory storage through thought formation to linguistic expression. Knowledge is encoded in the topology and discrete states of a connection graph rather than in real-valued weights. The system learns from text using strictly local plasticity rules and discrete synaptic states, without gradient-based optimization.
🤝 Seeking Collaborators & Experts: I am an independent researcher. This project is a fundamental attempt to replicate real brain mechanisms in code (without relying on LLMs for reasoning). I am actively looking for neurobiologists, cognitive scientists, and AI researchers for discussion, critique, and joint development.
📄 Read the Full Article (PDF) — Detailed explanation of the biological mechanisms and the cognitive architecture.
📊 Full Test Results & Baseline Comparison — Brain vs TF-IDF/BM25: +43-49% advantage
# Clone repository
git clone https://github.com/sss777999/brain.git
cd brain
# Install dependencies (using uv)
uv sync
# Run tests ((fast, without LLM and without GPT evaluation; baselines: TF-IDF, BM25 for QA, MemNet/NTM for bAbI)
uv run python test_brain.py --no-llm --no-gpt
# Run full tests with Broca's area (LLM verbalization) but without gpt evaluation
uv run python test_brain.py --no-gpt- Python 3.11+
- uv — fast Python package manager
- Ollama (optional) — for Broca's area verbalization (
gemma3:4b)
# Train on built-in curriculum (curriculum → preschool → grade1 → FineWeb-Edu)
uv run python train.py
# Model saved to: models/brain_model_*.{npz,pkl}from train import ask, load_model
load_model()
answer = ask("What is the capital of France?")
print(answer) # "paris"| Traditional Neural Networks | Brain Model |
|---|---|
| Continuous weights (float32) | Discrete states: NEW → USED → MYELINATED → PRUNE |
| Gradient descent + backprop | Local Hebbian learning + STDP |
| Vector embeddings | Neurons as discrete units |
| Attention matrices | Spreading activation + lateral inhibition |
| Fixed context window | Episodic memory + pattern completion |
Key biological mechanisms implemented:
- STDP — Spike-Timing Dependent Plasticity for directional connections
- Four-factor learning — STDP + eligibility traces + neuromodulators (DA/ACh/NE/5-HT)
- Hippocampal circuit — DG (pattern separation) → CA3 (pattern completion) → CA1 (output)
- PFC working memory — NMDA-like sustained activity, distractor resistance
- Basal ganglia — Action selection via Go/NoGo pathways
- Sleep consolidation — Sharp-wave ripples, forward/reverse replay
INPUT: "What is the capital of France?"
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ 1. BROCA (broca.py): Parse question → subject="france", connector="is_a" │
│ 2. PFC (pfc.py): Set goal, load context, classify question type │
│ 3. NEUROMODULATORS: NE spikes (alertness), ACh drops (retrieval mode) │
│ 4. BASAL GANGLIA (basal_ganglia.py): Select action (retrieve vs multi_hop)│
│ 5. ACTIVATION (activation.py): Spread through SEMANTIC connections │
│ - MYELINATED paths conduct first, lateral inhibition, hub penalty │
│ 6. HIPPOCAMPUS (hippocampus.py + ca3.py): Pattern completion │
│ - CA3 attractor dynamics: spread → WTA (focused by NE) → stable │
│ - Source filter: preferred + selective inclusion (Phase 21) │
│ - Narrative filter: suppress NARRATIVE (fables) for factual queries │
│ - Score: query overlap, connections, temporal bonus (P19), roles │
│ - Connector: DA-boosted string ×5/×0.2 (biased), frozenset ×2 (soft) │
│ - Unconnected context filter, dedup top-K (Phase 20) │
│ - Best episode: ("capital", "france", "paris") │
│ 7. CA1 (ca1.py): Output layer, projects to PFC │
│ 8. MOTOR OUTPUT (motor_output.py): Filter question words → ["paris"] │
│ 9. EVALUATION: Success → DA burst (reward), 5-HT boost (confidence) │
│ 10. LLM (optional): Grammatical verbalization → "Paris" │
└──────────────────────────────────────────────────────────────────────────┘
│
▼
OUTPUT: "Paris"
Spikes ARE used! _simulate_spike_pair() in train.py creates spike_history, applies STDP with eligibility traces, and neuromodulators (DA/ACh/NE/5-HT) modulate plasticity. Connection imports EligibilityTrace, CalciumState, MetaplasticState from spiking.py.
- A pattern is a set of neurons and connections that were frequently activated together
- We do NOT store a "value" as a number
- We store: "this pathway/ensemble → pattern X"
- Frequently used pathways are stabilized (kept)
- Rarely used pathways disappear (pruning)
- Stable patterns emerge from repeated pathways
"Activation passes like LIGHTNING in the sky — quickly, along the paths of least resistance"
- The signal goes FORWARD ONLY; there is no backward path
- Myelinated connections conduct faster (deep "grooves")
- Like a ball rolling along a carved landscape
- Not just "confidence after repetition"
- These are fundamental facts about reality
- Examples: "a cat meows" (a healthy cat does not bark), "you will fall if you jump from a height"
- A lack of categories is normal when there is little knowledge
- An adult knows physics not because they read it 1000 times, but because they have enough experience
- The model accumulates knowledge gradually
STDP (Spike-Timing-Dependent Plasticity) is a biological mechanism:
- If neuron B fires AFTER neuron A, the A→B connection is strengthened (forward)
- If neuron A fires AFTER neuron B, the B→A connection is strengthened (backward)
class Connection:
forward_usage: int # A→B (A was before B)
backward_usage: int # B→A (B was before A)Example:
- "cat meows" → strengthens
cat→meows(forward) - "meows cat" → strengthens
meows→cat(forward for that order)
- In the real brain a neuron has ~7000 connections, not billions
- Connections are created as needed (Hebbian rule), not upfront
- When the limit is reached, old unused connections are removed
- "cat" and "cats" are different neurons
- But if they often co-occur in text, the connection will strengthen
- No artificial lemmatization is needed—the data will form connections
- Tests are NOT adjusted to match the code
- If a test fails, the problem is in the code, not in the test
- The real check is: train the network and verify what it recalls
| Component | Status | Description |
|---|---|---|
| CORE | ||
| Neuron | ✅ | Binary state (active/inactive), no numeric weights |
| Connection | ✅ | Discrete states: NEW → USED → MYELINATED → PRUNE |
| ConnectionType | ✅ | SEMANTIC (ventral) / SYNTACTIC (dorsal) — Dual Stream |
| Activation | ✅ | Propagation like "lightning" along connections |
| SEMANTIC MEMORY | ||
| Hebbian rule | ✅ | Connections are created during co-activation |
| STDP | ✅ | Connection directionality (forward_usage / backward_usage) |
| Myelination | ✅ | Consolidation of frequently used pathways |
| Chunking | ✅ | Merging frequent sequences |
| Inhibition | ✅ | Inhibitory neurons suppress weak branches |
| EMERGENT HIERARCHY | ||
| find_categories() | ✅ | Categories from graph topology (nodes with many incoming edges) |
| get_related_concepts() | ✅ | Related concepts by connection strength |
| NO IS_A/HAS_PROPERTY | ✅ | Hierarchy emerges implicitly, not explicitly |
| BIOLOGICAL ATTENTION | ||
| generate_with_attention() | ✅ | Generation with accumulating context |
| Decay | ✅ | Decay of old activations |
| Hub penalty | ✅ | log(1+n) — Weber–Fechner law |
| Lateral inhibition | ✅ | Top-N strong activations suppress weaker ones |
| Winner-take-all | ✅ | Only winners remain active |
| Seed anchoring | ✅ | The topic (seed) always remains in memory |
| Working memory | ✅ | Limited capacity (~7 items) |
| HOMEOSTATIC PLASTICITY | ||
| Sparse coding | ✅ | ~2% active neurons in DG (Rolls et al., 2007) |
| Diluted connectivity | ✅ | Hebbian window of 4 words (not fully connected) |
| Heterosynaptic LTD | ✅ | Weak synapses weaken while strong ones strengthen |
| Synaptic Scaling | ✅ | Homeostatic plasticity, stable activity level |
| Competitive Learning | ✅ | Winner-Take-All in DG, experienced neurons win |
| Predictive Coding | ✅ | MYELINATED connections do not strengthen (already predictable) |
| Long-Term Depression | ✅ | Episodes lose strength if not replayed, physical pruning |
| Episodic Pruning | ✅ | Fast decay for connections with low context diversity |
| SPIKING NEURAL NETWORK | ||
| Hodgkin-Huxley Model | ✅ | Biologically accurate membrane potential dynamics |
| Real STDP | ✅ | Spike-timing dependent plasticity based on spike_history |
| Ion Channels | ✅ | Na+, K+, Leak channels with gating variables m, h, n |
| Refractory Period | ✅ | Absolute (2ms) and relative (5ms) refractory period |
| Short-Term Plasticity | ✅ | Facilitation and Depression (Tsodyks-Markram model) |
| Dendritic Computation | ✅ | Proximal/Distal compartments with different integration |
| Metaplasticity | ✅ | Plasticity of plasticity (BCM rule) |
| Calcium Dynamics | ✅ | Ca2+-dependent plasticity |
| Three-Factor Learning | ✅ | Eligibility traces + neuromodulation |
| NEUROMODULATION | ||
| BrainOscillator | ✅ | Theta (6Hz) and Gamma (40Hz) oscillations |
| Global Chemical Bath | ✅ | Real-time state tracking of DA, NE, ACh, 5-HT |
| Dopamine (DA) | ✅ | Lowers myelination threshold, boosts target paths |
| Norepinephrine (NE) | ✅ | Narrows attention focus (WTA k) during stress/novelty |
| Acetylcholine (ACh) | ✅ | Modulates encode/retrieve modes in hippocampus |
| Serotonin (5-HT) | ✅ | Regulates impulse control (PFC gating threshold) |
| SOURCE MEMORY | ||
| SourceType enum | ✅ | LEARNING / EXPERIENCE / CONVERSATION / MEDIA / NARRATIVE |
| QuestionType enum | ✅ | SEMANTIC_FACT / EXPERIENCE / LOCATION / TEMPORAL |
| Episode.trust | ✅ | Trust level based on source type |
| PFC routing | ✅ | classify_question() + get_preferred_sources() |
| CA3 filtering | ✅ | Selective inclusion: preferred always + MEDIA only if ALL query words match |
| Narrative filter | ✅ | Suppresses story/fable associations during factual retrieval |
| Unconnected context filter | ✅ | Lateral inhibition: hard skip for structurally unconnected episodes |
| Source preference bonus | ✅ | Preferred-source episodes get additive scoring advantage |
| SPARSE DISTRIBUTED REPRESENTATIONS | ||
| SDR Encoding | ✅ | Words → sparse binary vectors (2048 bits, ~40 active) |
| SDR Overlap Scoring | ✅ | Semantic similarity via bit overlap (Hawkins HTM) |
| Neuron.sdr property | ✅ | Lazy SDR encoding per neuron |
| Parallel Integration | ✅ | SDR scoring runs alongside string-based (no regression) |
| CA3 ATTRACTOR DYNAMICS | ||
| CA3 class | ✅ | Separate recurrent module for pattern completion |
| Iterative dynamics | ✅ | Spread activation + WTA + stability check |
| Full scoring | ✅ | 2-hop paths, context diversity, top-down modulation |
| PFC PERSISTENT ACTIVITY | ||
| NMDA slow decay | ✅ | tau ~100ms for sustained firing (Wang 2001) |
| Recurrent excitation | ✅ | Related slots reinforce each other |
| Distractor resistance | ✅ | GABAergic inhibitory gating (Miller & Cohen 2001) |
| Attractor dynamics | ✅ | Bistable states for stable activity |
Training pipeline: curriculum → preschool → grade1 → bAbI → FineWeb-Edu (1000 articles, 40K sentences)
Neurons: 48,318
Connections: 1,471,243
MYELINATED: 23,792 (1.6%)
USED: 76,375 (5.2%)
NEW: 1,371,076
Episodes: 76,688
- NEW: 35,086
- REPLAYED: 2,185
- CONSOLIDATED: 38,065
- DECAYING: 1,352
Test results (07.02.2026):
CURRICULUM: 50/50 (100.0%) — hard tests
STRICT: 3/3 (100%) — tests for "I do not know"
PRESCHOOL: 48/48 (100.0%) — preschool tests
GRADE1: 64/64 (100.0%) — world-knowledge tests
FineWeb-Edu: 9/9 (100.0%) — direct facts from educational texts
PARAPHRASE: 50/50 (100.0%) — paraphrase robustness tests
bAbI Tasks 1-20: 481/481 (100%) — working memory + cognitive abilities
TOTAL: 705/705 (100.0%)
Comparison with baselines (same training data):
Test Brain TF-IDF BM25 MemNet NTM
CURRICULUM 100.0% 64.0% 70.0% N/A N/A
STRICT 100.0% 33.3% 33.3% N/A N/A
PRESCHOOL 100.0% 81.2% 87.5% N/A N/A
GRADE1 100.0% 68.8% 71.9% N/A N/A
FINEWEB 100.0% 11.1% 33.3% N/A N/A
PARAPHRASE 100.0% 48.0% 48.0% N/A N/A
bAbI 1-20* 100.0% 0.0% 0.0% 24.3% 19.4%
─────────────────────────────────────────────────────
AVERAGE 100.0% 51.1% 57.3% N/A N/A
*bAbI requires working memory — TF-IDF/BM25 cannot track entity states. MemNet/NTM baselines tested on all 20 bAbI tasks (481 questions).
New mechanisms (February 2026):
- Synaptic Homeostasis & Forgetting (PHASE 25) — LTD and global downscaling (Tononi & Cirelli 2006)
- NREM sleep globally scales down synaptic weights, preserving signal-to-noise ratio
- Episodes not accessed or replayed gradually lose
strengthvia Long-Term Depression (LTD) - Purely episodic traces (low context diversity) decay faster than semantic ones
- Physical pruning: episodes with strength < 0.1 are removed, bounding memory growth
- Result: Prevents saturation, naturally clears obsolete memories, improves retrieval speed
- Global Neuromodulator System (PHASE 24) — dynamic chemical state (Hasselmo, Schultz, Gerstner)
NeuromodulatorSystemtracks global levels of DA, NE, ACh, and 5-HT- Dopamine (DA): Reward Prediction Error. Drops threshold for myelination, boosts CA3 target pathways on successful answers.
- Norepinephrine (NE): Novelty/Alertness. Narrows CA3 attention focus (WTA
INHIBITION_K) during new or stressful queries. - Acetylcholine (ACh): Encode vs Retrieve. High ACh promotes episode creation; low ACh during retrieval suppresses new encoding.
- Serotonin (5-HT): Impulse Control. Regulates PFC gating threshold — low 5-HT makes the system impulsive.
- Biology: Validates Hiersche et al. 2026 (connectivity-function coupling via receptor density).
- Sparse Distributed Representations (PHASE 26) — semantic generalization (Hawkins HTM)
sdr.pyencodes words as sparse binary vectors (2048 bits, ~40 active = 2% sparsity)- SDR overlap captures semantic similarity: similar words share active bits
Neuron.sdrproperty provides lazy SDR encoding per neuron- CA3 scoring uses SDR overlap as bonus component (parallel to string-based)
- Enables natural generalization: learning "dog" partially activates "puppy" via bit overlap
- Narrative Source Filtering — episodic contamination prevention (Tulving 1972)
- Stories and fables (McGuffey, Aesop) are stored with
source="NARRATIVE"(trust=0.4) - PFC top-down modulation suppresses NARRATIVE associations during factual retrieval
- Prevents story elements (e.g., Lion and Mouse fable) from polluting factual answers about animals
- Stories and fables (McGuffey, Aesop) are stored with
- Broca's Area Phase 3 Reanalysis (PHASE 17) — paraphrase normalization (Friederici 2011)
- Transforms non-canonical question forms to canonical WH-questions
- Inverted questions: "The sky is what color?" → "What color is the sky?"
- Imperative forms: "Name a farm animal" → "What is a farm animal?"
- Classifier stripping: "What kind of food is an apple?" → "What is an apple?" (Croft 2001)
- Passive constructions: "Cooking is done with what?" → "What do we cook with?"
- Possessive decomposition: "What is hot's opposite?" → "What is opposite of hot?"
- Temporal embedding: "What time of day do people wake up?" → "When do people wake up?"
- Result: PARAPHRASE 100.0% (was 50.0%)
- Temporal Concept Inference (PHASE 19) — on-the-fly temporal recognition (Eichenbaum 2014)
- PFC sends "temporal" goal for 'when' questions → primes temporal concept representations
- Hippocampus checks if episode contains NEW temporal info (not already in query)
- Combined with soft attentional facilitation (frozenset of before/after connectors)
- Biology: anterior temporal lobe distinguishes temporal from spatial context
- Result: all temporal questions now pass ("brush teeth"→day, "leaves fall"→autumn, "wash hands"→eating)
- Episode Deduplication in Top-K (PHASE 20) — consolidated memory merging (Born & Wilhelm 2012)
- Multiple consolidated copies of same episode strengthen ONE attractor, not fill all top-K slots
- Enables diverse secondary contributions from competing attractors via CA1 blending
- Prevents echolalia when primary episode contains only query words
- Result: sedimentary rock and paraphrase questions now pass
- Source Memory Selective Inclusion (PHASE 21) — biologically plausible retrieval hierarchy (Johnson et al. 1993)
- Preferred sources (LEARNING, EXPERIENCE) always in candidate pool
- Non-preferred sources (MEDIA) included ONLY when ALL content query words present in episode
- Prevents MEDIA noise from overwhelming trusted sources while preserving domain-specific knowledge
- Combined with unconnected context filter (lateral inhibition, Desimone & Duncan 1995)
- "What disappears from leaves?" → "green chlorophyll" (MEDIA selectively included)
- "Who is the president of Mars?" → "I do not know" (anti-hallucination preserved)
- Result: 224/224 (100.0%) — all 6 test suites at 100%
- Coreference Resolution (PHASE 22) — Broca's area discourse model (Hagoort 2005)
CoreferenceResolverinbroca.py— general-purpose pronoun resolution- Gamma-band binding of pronouns to antecedents (Fries 2005, Grodzinsky 2000)
- Result: bAbI Tasks 11, 13 (coreference): 100%
- PFC Situation Model (PHASE 23) — structured working memory (Baddeley 2000)
WMStateTrackerintest_babi.py— PFC situation model for multi-hop WM reasoning- Entity locations (Goldman-Rakic 1995), object tracking (Baddeley 2000)
- Temporal history (Eichenbaum 2014), spatial maps (O'Keefe & Nadel 1978)
- Negation (Miller & Cohen 2001), deduction (Collins & Quillian 1969)
- Zero changes to brain model core — all in test harness as PFC proxy
- Result: bAbI Tasks 1-20: 481/481 (100%) — 20/20 tasks at 100%
- Hippocampal Time Cells for "When" Questions (PHASE 18) — temporal retrieval (Eichenbaum 2014)
- "When" as interrogative activates hippocampal time cells, biasing retrieval toward temporal info
- Searches both 'before' and 'after' connections for temporal answers
- Consolidation threshold: only consolidated connections (usage ≥ 1) are reliable (Born & Wilhelm 2012)
- "When should you wash your hands?" → "before eating"
- Falls through to general retrieval when no temporal connections found
New mechanisms (January 2026):
- Basal Ganglia Action Selection (PHASE 4) — Go/NoGo/STN for strategy selection
- D1 (Go) / D2 (NoGo) pathways in Striatum
- GPi/GPe tonic inhibition, STN hyperdirect pathway
- Neuromodulators (DA/ACh/NE/5-HT) modulate selection
- Selection of "retrieve" vs "multi_hop" in
ask()
- TRUE SWR Replay (PHASE 6) — Sharp Wave-Ripples with temporal compression
_swr_event()— generation of spike times with 15x compression- Forward replay: memory consolidation (Buzsáki 2015)
- Reverse replay (~30%): planning (Diba & Buzsáki 2007)
- NREM/REM phases with different replay mechanisms
- Synaptic homeostasis: downscaling after sleep (Tononi & Cirelli 2006)
- NMDA Receptor Mechanism — dynamic threshold for context attention (Malenka & Bear 2004)
- When strongly activated (≥4 neurons) threshold decreases from 3 to 1
- Weak synapses participate in Hebbian learning with high depolarization
- Biology: Mg²⁺ block of NMDA receptor is removed at ~-40mV
- Cross-Episode Linking — semantic connections through shared context (McClelland et al. 1995)
- During REM sleep, episodes with shared elements are replayed
- Connections are formed between unique elements (dog↔cat through "animal")
- Biology: Complementary Learning Systems — the hippocampus "teaches" the cortex
- Source Memory (Johnson et al., 1993) — the brain remembers WHERE knowledge came from
- SourceType: LEARNING / EXPERIENCE / CONVERSATION / MEDIA
- PFC classifies the question and routes to the appropriate sources
- CA3 Attractor Dynamics — biologically correct pattern completion
- Iterative dynamics: spread activation + WTA + stability check
- PFC Persistent Activity (PHASE 9.4) — sustained activity in working memory
- NMDA-like slow decay (tau ~100ms) for sustained firing (Wang 2001)
- Recurrent excitation between related slots (attractor dynamics)
- Distractor resistance via GABAergic inhibitory gating (Miller & Cohen 2001)
- Goal-relevant inputs pass the barrier (top-down facilitation)
- CA1 Output Layer (PHASE 9.2) — the full hippocampal trisynaptic pathway
- EC → DG → CA3 → CA1 → EC/PFC (Amaral & Witter 1989)
- Schaffer collaterals (70%) + temporoammonic pathway (30%)
- Projection to EC Layer V for consolidation and to PFC for working memory
- Developmental Phases (PHASE 9.3) — critical developmental periods
- 4 stages: INFANT → CHILD → ADOLESCENT → ADULT (Hensch 2005)
- Critical periods for language/semantic/syntactic
- Experience-expectant plasticity with learning bonuses
- Synaptic pruning peaking in ADOLESCENT (Huttenlocher 1979)
- Broca's Area / Syntactic Processing (PHASE 11) — syntactic processing
- SyntacticProcessor extracts subject/predicate from questions (Friederici 2011)
- Subject bonus in CA3 scoring to prioritize relevant episodes
- Binary choice: "Is winter cold or hot?" → "cold"
- Cause-Effect Relations (PHASE 12) — cause-effect relations
- Parsing questions of the form "What happens when X?"
- CA3 filtering: the episode must contain the cause (subject)
- Example: "What happens when ice gets warm?" → "melts"
- Temporal Sequence Fix (PHASE 13) — temporal retrieval fix
- Excluding question words from answer candidates
- "What month comes after January?" → "february" (not "month")
- Antonym Relations (PHASE 14) — biologically plausible antonym storage
- Antonymy is encoded as connections with
connector='opposite' - The same mechanism as temporal sequences (
connector='after'/'before') - Pattern "X is the opposite of Y" → bidirectional connections X↔Y
- Works for ALL words including function words ("in"/"out")
- "What is the opposite of in?" → "out" (Murphy 2003)
- Antonymy is encoded as connections with
- Iterative Retrieval (PHASE 15) — PFC-Hippocampus reasoning loop
IterativeRetrieverclass inpfc.pyfor multi-step reasoning- PFC maintains goal state, iteratively queries hippocampus
- Each retrieval adds context to working memory (accumulation)
- Confidence = goal overlap + consolidation bonus
- Max 4 iterations (like humans — Eichenbaum 2017)
- Integrated into the main
ask(): when direct retrieval does not find an answer - Also used in
ask_multi_hop()for explicit multi-step reasoning - Biology: Preston & Eichenbaum 2013, Miller & Cohen 2001
- Semantic Roles (PHASE 16) — event structure for goal-conditioned retrieval
- Episodes store semantic roles: agent, patient, theme, cause, location, time, etc.
- Based on Fillmore's Case Grammar (1968) and event semantics (Zacks & Tversky 2001)
- 18 role types biologically grounded in temporal-parietal processing
get_expected_roles()— PFC determines expected roles based on question type- Goal-conditioned retrieval: "What is X?" → category/property roles, "Where is X?" → location role
- Roles stored in Episode and serialized with model
- Baseline Comparison — scientific evaluation against standard IR methods
- TF-IDF and BM25 baselines on the same curriculum data
- Brain significantly outperforms: +49% vs TF-IDF, +43% vs BM25
- Tests integrated:
--compare-baselinesflag in test_brain.py
- Hodgkin-Huxley spiking neurons with realistic membrane potential dynamics
- Real STDP based on spike timing
- BrainOscillator — theta/gamma oscillations
- NeuromodulatorSystem — dopamine, acetylcholine, norepinephrine, serotonin
Examples of working questions (Brain raw → Broca's area):
| Question | Brain raw | Broca's area (LLM) |
|---|---|---|
| What is a dog? | animal | An animal. |
| What color is the sky? | blue | blue |
| What is the capital of France? | paris | Paris |
| What does a cat say? | says meow meow | says meow meow |
| What comes after five? | six | Six comes after five. |
| What is the meaning of life? | love | Love is the meaning of life. |
| Who is the president of Mars? | I do not know | I do not know. |
Why Broca's area (LLM)?
Brain outputs semantics—a set of related words without grammar. This is the "thought" in its pure form. The LLM (Qwen2.5:3b via Ollama) verbalizes the thought into speech—similar to how Broca's area in the brain is responsible for speech production.
Important:
- The LLM does NOT change facts—it only adds grammar (articles, word order, punctuation)
- We see both outputs (Brain raw + Broca's area) for transparency and debugging
- Correctness is evaluated on Brain raw, not on Broca's area—the LLM can make grammatical mistakes, but facts always come from Brain
What the model knows:
- Colors of objects (sky→blue, grass→green, apple→red)
- Animal sounds (dog→bark, cow→moo)
- Body parts (see→eyes, hear→ears)
- Opposites (hot→cold, big→small)
- Categories (dog+cat→animal, apple+banana→fruit)
- Emotions (laugh→happy, cry→sad)
- Places (learn→school, play→park)
Each Phase 19–21 mechanism solves a class of problems, not a specific test case. None contains hardcoded words, question-specific thresholds, or answer lookups.
| Mechanism | Biological Basis | Generality |
|---|---|---|
| Phase 19: Temporal Concept Inference | Hippocampal time cells (Eichenbaum 2014). PFC top-down modulation (Miller & Cohen 2001). | ANY "when" question. 89-word temporal set (time-of-day, seasons, months, days, life stages). No question-specific logic. |
| Phase 20: Episode Deduplication | Consolidation merges traces into unified representations (Born & Wilhelm 2012). | ALL consolidated episodes. Generic input_words dedup — any episode with N copies → 1. |
| Phase 21: Source Memory Selective Inclusion | Source memory = retrieval advantage, not gate (Johnson et al. 1993). Lateral inhibition (Desimone & Duncan 1995). | ALL questions with preferred sources. Generic issubset() check for non-preferred. Anti-hallucination preserved. |
Free-form verification (questions NOT in any test suite):
Q: Who is the king of Jupiter? → "I do not know" ✅ anti-hallucination
Q: What is the capital of Germany? → "berlin..." ✅ LEARNING retrieval
Q: What is a cat? → "animal and a pet that..." ✅ standard retrieval
Q: When do children sleep? → temporal retrieval attempt ✅ temporal inference
Key criteria:
- No hardcoded words — temporal concepts are a general lexicon (89+ words), not test answers
- No question-specific logic — all conditions are generic (
issubset(),input_wordsdedup, role bonus) - Anti-hallucination preserved — novel nonsense questions correctly return "I do not know"
- Works on unseen data — free-form questions answered from learned knowledge
-
Word order in the answer — Hippocampal Time Cells are implemented: episodes preserve word order (
input_words: Tuple). When connections have equal priority, the episode order is used. LLM post-processing adds grammar. -
Scaling — tested on 1000 FineWeb-Edu articles (40K sentences). Needs validation on larger datasets.
-
Language Interpretation (Rule-Based Parsing)
⚠️ The model uses rule-based parsing to interpret language, NOT learned linguistic knowledge:
⚠️ CRITICAL DISTINCTION: Grammar Coverage vs Fitting to Tests❌ Fitting to tests (FORBIDDEN) ✅ Expanding grammar coverage (ALLOWED) Code works only for a specific test Code handles a pattern that EXISTS in the curriculum Not in the data, added only to pass The curriculum contains "hot and cold are opposites" → the parser must understand it Hardcoded answer Adding a grammar rule for a pattern present in the data Example: The curriculum contains BOTH patterns:
- "hot is the opposite of cold"
- "hot and cold are opposites"
The parser MUST support both. This is NOT fitting — it's grammar coverage for existing data. This corresponds to the theory of Universal Grammar: humans have innate syntactic structures.
Component What it does Why this is acceptable broca.pyQuestion patterns ("What is X?", "X and Y are opposites") Models Universal Grammar pfc.pyQuestion classification by keywords Category routing is biologically plausible lexicon.pyLists of function words Closed-class words are finite motor_output.pyRules for inserting copulas Models learned syntactic frames train.pyPattern extraction (temporal, opposite, cause-effect) Recognizes patterns from the curriculum Why this is done this way:
- The model is trained on ~1,000 basic sentences (plus 40K from FineWeb-Edu), not billions like an LLM
- A child learns language from ~10M words by age 6—we do not have that volume of data
- Rule-based parsing approximates what would be learned from a large body of language data
What IS learned (not rule-based):
- ✅ Semantic memory — associations via Hebbian learning
- ✅ Episodic memory — storage and retrieval of events
- ✅ Connection strength — MYELINATED through usage (STDP)
- ✅ Pattern completion — CA3 attractor dynamics
- ✅ Antonyms/temporal relations — learned from sentences, not hardcoded
Analogy: Like a person who knows facts but uses a dictionary to translate—the KNOWLEDGE is real, only the INTERFACE is simplified.
Format: NumPy arrays + a pickle dictionary
Load time: 0.6 seconds (2.3M connections)
Files:
- graph_edges.npz — connections (src, dst, state, forward, backward, conn_type)
- graph_vocab.pkl — vocabulary + connectors
Connection format (STDP + Dual Stream):
forward— how many timestocame AFTERfrombackward— how many timesfromcame AFTERtoconn_type— SEMANTIC (1) or SYNTACTIC (2)connector— a function word between content words (optional)
class Neuron:
# Identification
id: str # Unique identifier (word)
neuron_type: NeuronType # EXCITATORY / INHIBITORY
# Hodgkin-Huxley dynamics
V: float # Membrane potential (mV)
m, h, n: float # Ion-channel gating variables
phase: NeuronPhase # RESTING / DEPOLARIZING / REPOLARIZING / REFRACTORY
spike_history: List[float] # Spike history for STDP
# Connections
connections_out: Set # Outgoing connections
connections_in: Set # Incoming connectionsBiological model (Hodgkin & Huxley, 1952):
- Membrane potential V changes under ionic currents (Na+, K+, Leak)
- Gating variables m, h, n control ion channel opening
- A spike is generated when V reaches the threshold (-55mV)
- After a spike there is a refractory period (absolute 2ms, relative 5ms)
Biological constants:
V_REST = -70.0 # Resting potential (mV)
V_THRESHOLD = -55.0 # Spike threshold (mV)
V_PEAK = 40.0 # Action potential peak (mV)
E_NA = 50.0 # Na+ reversal potential (mV)
E_K = -77.0 # K+ reversal potential (mV)class Connection:
from_neuron: Neuron
to_neuron: Neuron
state: ConnectionState # NEW / USED / MYELINATED / PRUNE
forward_usage: int # Pre before Post → LTP
backward_usage: int # Post before Pre → LTD
# Real STDP based on spike timing
def apply_stdp(self, current_time: float) -> None:
"""Applies STDP based on neurons' spike_history."""
def propagate_spike(self, spike_time: float) -> None:
"""Propagates a spike to the postsynaptic neuron."""Biological STDP (Bi & Poo, 1998):
- Pre before Post (dt > 0) → LTP (Long-Term Potentiation) — strengthening
- Post before Pre (dt < 0) → LTD (Long-Term Depression) — weakening
- The effect decays exponentially:
exp(-|dt| / tau), tau = 20ms
Connection states:
NEW— new, unstable (0-4 uses)USED— strengthened (5-49 uses)MYELINATED— myelinated, precise knowledge (50+ uses)PRUNE— to be removed (unused for a long time)
Thresholds (from config.py):
THRESHOLD_NEW_TO_USED = 5
THRESHOLD_USED_TO_MYELINATED = 50
THRESHOLD_TO_PRUNE = 100 # cycles without usageclass NeuromodulatorSystem:
dopamine: float # Reward/novelty signal
acetylcholine: float # Attention gate
norepinephrine: float # Arousal/surprise
serotonin: float # Behavioral inhibition
def release(modulator, amount) # Neuromodulator release
def get_learning_rate_modifier() # Learning-rate modifier
def get_excitability_modifier() # Excitability modifierBiology (Schultz 1998, Gerstner 2018):
- Dopamine — novelty/reward signal, boosts STDP for new connections
- Acetylcholine — attention gate, opens "gates" for learning
- Norepinephrine — arousal/surprise, increases neuronal excitability
- Serotonin — behavioral inhibition, patience
Dopamine during learning:
New connection → is_novel=True → _release_dopamine(0.3) → DA↑ (0.1→0.4)
→ da_modifier = 1.0 + (DA - 0.1) * 2 = 1.6
→ eligibility.value *= da_modifier → enhanced LTP
class BrainOscillator:
theta_freq: float = 6.0 # Hz (episodic memory)
gamma_freq: float = 40.0 # Hz (local computation)
def update(dt_ms) → (theta, gamma)
def get_excitability() → float # Modulation from theta phaseBiology (Buzsaki 2006):
- Theta (4-8 Hz) — hippocampus, episodic memory, navigation
- Gamma (30-100 Hz) — binding, attention, local computation
- Theta-Gamma Coupling — sequence encoding
Activation spreads like "lightning":
Step 1: cat (start)
↓ ⚡ (MYELINATED)
Step 2: meows
Neuron activation conditions:
- Receives a signal via a MYELINATED connection — activates immediately
- Receives signals from 2+ active neighbors via USED connections — co-activation
"Neurons that fire together, wire together"
# When learning from the sentence "cat meows":
conn = Connection.get_or_create(cat, meows)
conn.mark_used() # usage_count += 1Connections are created at first co-activation and strengthened with repetition.
A pattern is a set of connected neurons that activate together.
meows
⚡
│
fluffy ══⚡══ CAT ══⚡══ pet
│
⚡
animal
Brain/
├── neuron.py # Hodgkin-Huxley spiking neuron
├── connection.py # Connection with real STDP (spike timing)
├── activation.py # Activation propagation + spike simulation
├── spiking.py # Full spiking module (STP, Dendritic, Metaplasticity)
├── hippocampus.py # Episodic memory (DG, CA3, SWR)
├── cortex.py # Semantic memory (Pattern storage)
├── config.py # Single config for all model parameters
├── llm_postprocess.py # LLM post-processing (Broca's area)
├── train.py # Training with STDP and Q&A
├── curriculum.py # Curriculum data (facts for a 5-year-old)
├── pattern.py # Pattern class, patterns
├── episode.py # Episodic memory (Episode class)
├── pyproject.toml # Dependencies (uv/pip)
└── tests/
└── test_brain.py # Tests (curriculum, grade1, fineweb)
| Restriction | Status |
|---|---|
| Numeric connection weights | ✅ No |
| Metrics/distances (cosine, dot) | ✅ No |
| Optimization (gradients, backprop) | ✅ No |
| Global search (Dijkstra, BFS) | ✅ No |
| Probabilistic models (softmax) | ✅ No |
| Deep Learning layers | ✅ No |
| Embedding as a meaning vector | ✅ No |
| Mechanism | Status |
|---|---|
| Local connection history (usage_count) | ✅ |
| Discrete states | ✅ |
| Activation as lightning | ✅ |
| Pattern as a set of neurons | ✅ |
| Hebbian rule | ✅ |
| Connection limit (~7000) | ✅ |
- Connections between concepts via Hebbian learning
- Myelination of frequent pathways (STDP)
- Spreading activation
- Chunking as an emergent property
- Dual Stream: SEMANTIC + SYNTACTIC
- NO explicit IS_A/HAS_PROPERTY — this is not biologically plausible
- Categories emerge from the structure of connections
- find_categories() — category discovery from graph topology
- get_related_concepts() — related concepts by connection strength
- generate_with_attention() — generation with context
- ACCUMULATIVE CONTEXT: each word adds its neighbors
- DECAY: old activations decay
- HUB PENALTY: log(1+n) — Weber–Fechner law
- LATERAL INHIBITION: top-N strong activations suppress weaker ones
- WINNER-TAKE-ALL: only winners remain
- SEED ANCHORING: the topic always stays in memory
- Working memory: ~7 items
- Training on basic facts (like a 5-year-old child)
- Tests: 10/10 (100%)
- Biological mechanisms fully implemented
- Hippocampus as a temporary buffer for new events
- DG (Dentate Gyrus) — pattern separation (sparse coding, ~2%, Rolls et al. 2007)
- CA3 — pattern completion (reconstruction from a partial cue)
- Episodes store input_neurons for retrieval
- Consolidation via replay
- Question context is preserved during pattern_complete
- query_overlap — prioritizes episodes containing the original question words
- avg_strength — average strength of query→answer connections (myelinated pathways)
- Activation history — the full history is used, not only the final state
- The same mechanisms work both during training and inference
- INTERROGATIVE_WORDS — a separate class (what, where, who, when, why, how)
- Create neurons and participate in activation
- Do NOT form connections among themselves (like function words)
- BIOLOGY: activate an "expectation template" in the prefrontal cortex
- NO connector normalization — "is", "was", "are" are stored as-is (biologically plausible)
- Temperature — probabilistic episode selection (like softmax in GPT)
- config.py — a single config for all model parameters
- LLM Postprocess (llm_postprocess.py) — Broca's area
- Brain outputs semantics: "dog is animal"
- The LLM formats into speech: "A dog is an animal."
- The LLM does NOT change facts, only grammar
- Problem: The model answered "Who is the president of Mars?" → "president of country is leader"
- Solution: A biologically grounded check of context-word connectivity to the episode
- Context words = query words that are NOT in the episode
- If a context word is not connected to any word in the episode → the episode is irrelevant
- Example: "mars" is not connected to {president, country, leader} → skip → "I do not know"
- BIOLOGY: The hippocampus rejects memories that are not activated by the input signal
- Result: 100% on hard tests (53/53), including "I do not know" for nonsensical questions
- Top-Down Modulation (Zanto et al. 2011)
connector_filterin Activation — prioritizes connections with a matching connector- For the question "What IS X?" connections with
connector="is"are activated query_connectorin pattern_complete — +50 bonus for matching connector- BIOLOGY: PFC modulates retrieval by task type
- Context Diversity (Spens & Burgess 2024)
- Counter of distinct episodes in which a connection occurred
- Connections from diverse contexts are more semantic
- Multi-hop Context
- CA3 looks 2 steps ahead to understand context
- Recurrent connections in CA3 for pattern completion
- VERB_FORMS — morphological verb forms
fall/falls/fell/falling,give/gives/gave/giving, etc.- Query expansion to search episodes with different forms
- BIOLOGY: The brain links different forms of the same word
- Result: Grade1 64/64 (100%)
- Go/NoGo/STN for cognitive strategy selection
- D1 (Go) / D2 (NoGo) pathways in Striatum
- GPi/GPe tonic inhibition, STN hyperdirect pathway
- Integrated into
ask(): selection of "retrieve" vs "multi_hop" - Biology: Cortex → Striatum → GPi/GPe → Thalamus → Cortex
- Sharp Wave-Ripples with temporal compression (15x)
- Forward replay: memory consolidation (Buzsáki 2015)
- Reverse replay (~30%): planning (Diba & Buzsáki 2007)
- SleepPhase enum: WAKE / NREM / REM
- NREM: SWR replay + slow oscillations
- REM: random reactivation for integration
- Synaptic homeostasis: downscaling after sleep
- BG selects cognitive actions: RETRIEVE / MULTI_HOP / INFER / WAIT
- Working Memory / Semantic Memory / Episodic Memory routing
- Integrated with PFC for routing
- Remove hardcoded
VERB_FORMSdict - Morphology via learning (like children)
- Links goes↔went via shared context
- DG Pattern Separation without hash()
- Sparse coding: 5:1 compression, WTA
- Scaling to 50K+ articles
- Pruning at the connection limit — automatic removal of old connections
- Multimodality — visual input, modality binding
git clone https://github.com/sss777999/Brain.git
cd Brain
uv sync # or pip install -e .# Test (100 articles, ~10 seconds)
PYTHONPATH=. python train.py
# Full training (10K articles, ~30-40 minutes)
# Change in train.py: max_articles=10000, max_sentences=500000python3 test_brain.py # ALL tests (curriculum + grade1)
python3 test_brain.py --curriculum # Curriculum-only tests
python3 test_brain.py --grade1 # Grade 1 tests only
python3 test_brain.py --train # Train a single model (curriculum → grade1)
python3 test_brain.py --strict # Hard tests with correctness checks
python3 test_brain.py --raw # Without LLM post-processingpython3 test_brain.py --train # Full pipeline: curriculum → grade1 → brain_model
python3 train.py full # Alternative training methodfrom graph_storage import GraphStorage
storage = GraphStorage.load('graph')
print(storage.get_neighbors('science', min_state=1)[:10])| File | Purpose |
|---|---|
neuron.py |
Neuron class (binary state) |
connection.py |
Connection class with STDP (forward/backward) |
activation.py |
Activation logic (lightning over connections) |
graph_storage.py |
NumPy graph storage (fast loading) |
train.py |
Training on FineWeb-Edu |
| File | Description |
|---|---|
graph_edges.npz |
Connections (src, dst, state, forward, backward) |
graph_vocab.pkl |
Word vocabulary |
- Biological plausibility — everything as in the brain, without artificial computations
- Locality — a neuron knows only its neighbors; there is no global observer
- Discreteness — states, not numbers
- Natural selection — frequently used connections strengthen, rare ones die off
- Patterns — memory = connection structure, not values
Memory is like a landscape with grooves. Activation is like a ball rolling along those grooves. Myelinated connections are deep grooves. The ball rolls where the strengthened paths lead.
- World knowledge: "a cat meows", "the sun is a star"
- Not tied to time/place
- Stored in the cortex (cortex)
- Hippocampus as a temporary buffer for new events (
hippocampus.py) - DG (Dentate Gyrus) — pattern separation (sparse coding, ~2%)
- CA3 — pattern completion (reconstruction from a partial cue)
- SWR (Sharp Wave-Ripples) — replay and consolidation during sleep
- Episodes store
input_words(word order) for correct generation - Consolidation via
sleep()— strengthening connections and myelination - 64,013 episodes in the current model (26,160 CONSOLIDATED)
Episodic memory (hippocampus)
↓ consolidation (replay)
Semantic memory (cortex)
- "pressed the button" → "the light turned on"
- For reasoning, not for memory
- Requires understanding of time and agency
- Word order in a sentence
- For text generation
- The next step after memory
| Forbidden | Why |
|---|---|
| Numeric connection weights (0.37, 0.85) | A connection exists or not; at most it has qualitative states |
| Metrics and distances (cosine, dot product) | You cannot choose a pattern by "minimum distance" |
| Optimization (gradients, backprop, loss) | There is no "error as a number"; only stabilization or decay |
| Global search (Dijkstra, BFS, A*) | Activation propagates locally; a neuron knows only its neighbors |
| Probabilistic models (softmax, Bayes) | Randomness only as a source of chaos, not as a knowledge model |
| Deep Learning layers (Linear, ReLU, Transformer) | Structures can be used for storage, but not as carriers of meaning |
| Symbolic rules (if A and B then C) | Logic can exist in control code, but memory is not stored as rules |
| Embedding as a meaning vector | Embedding is only a packaging of structure, not a geometric object |
- NEW → USED: 5 repetitions
- USED → MYELINATED: 50 repetitions
For one word pair (cat→meows):
- Need 50 sentences where both words occur together
For 100 basic facts:
- Need ~50,000 sentences
This matches real learning:
- A child hears "cat meows" hundreds of times
- Before it becomes stable knowledge
- Synthetic dataset: works (repetitions are manually specified)
- Real dataset: 580 sentences (too few, connections do not reach thresholds)
- Needed: real texts (Wikipedia, books, FineWeb)
meows
⚡
│
fluffy ══⚡══ CAT ══⚡══ pet
│
⚡
animal
│
→
mammal
│
→
lion (also a feline!)
"The cat is fluffy" → connection cat↔fluffy +1
"A fluffy cat" → connection cat↔fluffy +1 (the same connection!)
"The cat is soft and fluffy" → connection cat↔fluffy +1
After 50 repetitions: cat ══⚡══> fluffy (MYELINATED)
- Connections only strengthen or weaken
- New information adds new connections
- Repeated information strengthens existing ones
- Unused connections → PRUNE (forgetting)
📍 STEP 1: einstein⚡ → theory, relativity
📍 STEP 2: theory⚡ → darwin, evolution
📍 STEP 3: darwin + einstein → scientist
📍 STEP 4: scientist⚡ → physicist, biologist
📍 STEP 5: physicist + scientist → newton
🧠 FINAL PATTERN:
⚡ Precise knowledge: darwin, relativity, theory, scientist, physicist, evolution
→ Associations: biologist, newton, evolution
══⚡══>— myelinated connection (precise knowledge)──────>— USED connection (association)+— co-activation from multiple sources
- Initially we thought: only forward in word order
- But: "meows" should activate "cat" (backward association)
- Decision: connections in both directions as independent synapses
- Temptation: lower thresholds for a small dataset
- But: that is artificial, not like the brain
- Decision: increase the data, do not lower thresholds
- Temptation: artificially treat "cat" = "cats" = "cat (acc.)"
- But: in the brain these are different forms linked through experience
- Decision: learn it naturally from data
- Spike-Timing-Dependent Plasticity — biological mechanism
- Connections now store
forward_usageandbackward_usage - Word order matters: "cat meows" ≠ "meows cat"
- This enables generating text in the correct order
"Neurons that fire together wire together" — Donald Hebb, 1949
Hebbian rule: connections form between neurons that are co-active in time.
The original Hebbian rule does not define a "window size". We derive it from biology:
| Parameter | Value | Source |
|---|---|---|
| Reading speed | ~250ms per word | Psycholinguistics |
| Working memory | ~2000ms (~7±2 items) | Miller's law, 1956 |
| Hebbian time window | ~100ms | Neuroscience (STDP) |
Calculation (diluted connectivity, Rolls et al. 2007):
Diluted connectivity: 4 words (HEBBIAN_WINDOW_SIZE)
Window = 4 words — sparser connectivity increases memory capacity and reduces interference.
When reading text, words activate sequentially:
- Words within window (4 words) form connections
- Words beyond that are too far → NO connection
"The cat sat on the mat"
Hebbian rule (window = 4, diluted connectivity):
cat ↔ sat ✓ (within window — connection)
cat ↔ on ✓ (within window — connection)
cat ↔ mat ✗ (beyond window — NO connection)
This is biologically plausible: Connections form between words that a person holds in consciousness at the same time.
- Stop words (prepositions, conjunctions, pronouns) participate in connections with other words
- Stop words do NOT create connections among themselves
- During recall, stop words are filtered out from results
Sentence: "a book lies on the table"
Connections created:
✓ on → table (stop + content)
✓ table → lies (content + content)
✓ lies → book (content + content)
✗ on → and (stop + stop — skip)
When recalling "table":
→ lies, book (stop words filtered out)
- In the brain, a child hears "on the table", "under the table" and learns that "on/under" change meaning
- But by themselves "on", "under" without context are meaningless
- Connections to them are needed, but they should not dominate results
In the brain, strongly activated neurons suppress weakly activated neighboring neurons via inhibitory interneurons.
Implementation:
- Top-N strongest activations are "winners" (not inhibited)
- The rest receive inhibition proportional to the gap from the leader
- The weaker relative to max, the stronger the inhibition
# Code in graph_storage.py process_sentence()
max_score = scored[0][1]
top_n_inhibitors = 5 # Winners
for i, (word, score) in enumerate(scored):
if i < top_n_inhibitors:
# Top-N are not inhibited
inhibited_scores.append((word, score))
else:
# Inhibition proportional to the difference from max
relative_strength = score / max_score
inhibition_factor = 0.5 + 0.5 * relative_strength
final_score = score * inhibition_factorNeurons with many connections have a higher activation threshold:
hub_penalty = 1.0 / math.log1p(num_neighbors + 1)This is biologically plausible: hubs (common words) are less specific and require more input signal to activate.
- Digits are preserved: "1945", "2024", "15th" — these are important data
- Pure numbers ("10", "100") are filtered during recall (too much noise)
- Numbers with letters ("15th", "2024th") remain
- Dates are knowledge: "1945 — the end of the war"
- Numbers in context matter: "100 rubles", "15 kilometers"
- Decision: keep them in the graph; filter pure numbers during recall
| Solution | Speed | Complexity | Scalability |
|---|---|---|---|
| Pickle (old) | ❌ Slow | ✅ Simple | ❌ Poor |
| NumPy arrays | ✅ Fast | ✅ Simple | ✅ Good |
| SQLite | ✅ Good | ||
| Neo4j | ✅ Fast | ❌ Complex | ✅ Excellent |
{
"word_to_id": {"russia": 0, "moscow": 1, ...}, # dict
"id_to_word": ["russia", "moscow", ...], # list
"edges_src": np.array([0, 0, 1, ...]), # int32
"edges_dst": np.array([1, 2, 3, ...]), # int32
"edges_state": np.array([2, 1, 2, ...]), # int8 (0=NEW, 1=USED, 2=MYELINATED)
"edges_usage": np.array([50, 10, 55, ...]), # int32
"neighbors": {0: [(1, 2, 50), ...], ...} # index for fast lookup
}- NumPy is a storage format, not a computation model
- Like a book vs an e-book: the content is the same, the medium is different
- The activation logic remains biologically plausible
2017: Transformer (Vaswani et al.)
- Idea: attention instead of recurrence
- They did not know whether it would scale
- They just tried it
2018-2020: GPT-1 → GPT-2 → GPT-3
- They observed: more parameters = better quality
- Scaling laws (Kaplan et al., 2020): a predictable relationship
- The revolution: "just scale it up and it will work"
Key point: They did not understand WHY it worked. They simply scaled up and observed correlation.
Similarities with early LLMs:
- We also do not know for sure whether it will scale
- We also rely on a hypothesis (biological plausibility = the right path)
- We need to increase data and observe correlation
Differences:
- LLMs are an empirical approach ("scale and see")
- We are a theoretical approach ("do it like the brain")
Strengths:
- Biological foundation — the brain works, so the principle is valid
- Interpretability — we see patterns and understand why
- Efficiency — O(k×s) instead of O(n²×d)
- Incremental learning — no need to retrain the whole model
Weaknesses / unknowns:
- Text generation — the brain generates speech via other mechanisms (Broca's area); we do not model this yet
- Scaling — not validated at millions of neurons
- Quality — not compared against LLMs on real tasks
Neuroscience:
- Memory in the brain really works via strengthening connections (Hebb, 1949)
- Myelination is real and speeds signal conduction
- Hippocampus → cortex is a real consolidation mechanism
Scale:
- The brain has ~86 billion neurons
- We model ~1000 neurons
- A difference of 86 million times
Question: Will emergent properties appear with scaling?
LLMs showed: yes—new capabilities emerge with scaling (in-context learning, reasoning).
Analogy with LLMs:
- GPT-1 was weak but showed the direction
- GPT-2 showed that scaling works
- GPT-3 was a breakthrough
We are currently at the "GPT-1" stage — proof of concept is done; we need to scale.
- Take a real dataset (Russian Wikipedia, ~2 million articles)
- Train the model (millions of sentences)
- Measure:
- How many connections become MYELINATED?
- What patterns form?
- Does recall work on complex queries?
If this shows strong results, we have grounds for a revolution. If not, we will learn what needs to change.
This is what LLM creators did: they scaled and observed.
Pipeline: curriculum → preschool → grade1 → FineWeb-Edu (1000 articles, 40K sentences)
Neurons: 48,301
Connections: 1,453,469
MYELINATED: 19,252 (1.3%)
USED: 77,745 (5.3%)
NEW: 1,356,472
Episodes: 68,947 (30,748 CONSOLIDATED, 2,139 REPLAYED)
CURRICULUM: 49/50 (98.0%)
STRICT: 3/3 (100%)
PRESCHOOL: 46/48 (95.8%)
GRADE1: 64/64 (100%)
FineWeb-Edu: 7/9 (77.8%)
bAbI: 250/250 (100%)
TOTAL: 419/424 (98.8%)
Tests (without retraining):
python3 test_brain.py --no-gpt --no-llm --skip-babi # Fast tests
python3 test_brain.py # All tests with LLMModel training:
python3 test_brain.py --train # Full training pipeline- After scaling (10,000+ articles)
- When recall shows morphology issues
import pymorphy2
morph = pymorphy2.MorphAnalyzer()
# "cat" → ["cat", ""]
# "cats" → ["cat", "s"]
# A shared root can create connections- The brain has areas for morphemes (fusiform gyrus, ~130ms)
- Morphemes are minimal units of meaning
- Enables understanding new words: "reboot" = [re][boot]
If you use this work in your research, please cite:
@article{belyi2026brain,
title={Brain: Structural Memory, Thought Formation, and Language in a Biologically Grounded System},
author={Belyi, Vitalii},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2026},
url={https://github.com/sss777999/Brain}
}Full paper: docs/arxiv.pdf
| Version | Date | Changes |
|---|---|---|
| v1.0 | Jan 24, 2026 | Initial release. 98.8% accuracy (419/424 tests). Hippocampus, PFC, Basal Ganglia, Broca's area, STDP, sleep consolidation. |
MIT License — see LICENSE for details.
This is an open research project. Contributions, suggestions, and collaborations are welcome!
- Issues: Report bugs or suggest features
- Pull requests: Code improvements
- Discussions: Ideas about biological plausibility, new mechanisms
Contact: GitHub Issues