polyjuicelab/psycial

MBTI Personality Classifier


Production-grade MBTI personality classification system with neural network implementation in pure Rust.


Performance Summary

| Model | Method | Accuracy | vs Random | Training Time |
|-------|--------|----------|-----------|---------------|
| V7 🏆 | TF-IDF + BERT + Multi-Task GPU | 52.05% | 8.3x | ~50s |
| V6 | BERT + MLP (single-task) | 31.99% | 5.1x | 322s |
| V1 | TF-IDF + Naive Bayes | 21.73% | 3.5x | 2s |
| V2 | 9 Psychological Features | 21.21% | 3.4x | 3s |
| V3 | 930 Psychological Features | 20.12% | 3.2x | 30s |
| V5 | BERT Only + Cosine | 18.39% | 2.9x | 583s |

Random Baseline: 6.25% (16 classes)

V7 Multi-Task Model Breakdown

| Dimension | Accuracy | Notes |
|-----------|----------|-------|
| E/I | 82.77% | Extraversion vs Introversion |
| S/N | 88.18% | Sensing vs Intuition (best) |
| T/F | 81.67% | Thinking vs Feeling |
| J/P | 77.12% | Judging vs Perceiving |
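If the four binary classifiers erred independently, the expected 16-type accuracy would be the product of the per-dimension accuracies above. A quick sanity check (values taken from the table; that the measured 52.05% beats the independence estimate suggests the per-dimension errors are correlated):

```rust
fn main() {
    // Per-dimension accuracies from the V7 breakdown table
    let dims = [0.8277_f64, 0.8818, 0.8167, 0.7712];
    // Under independence, full-type accuracy would be the product
    let independent: f64 = dims.iter().product();
    println!("independence estimate: {:.2}%", independent * 100.0); // ~45.97%
    // Measured V7 accuracy (52.05%) is higher, so dimension errors
    // must correlate (samples easy on one axis tend to be easy on all).
    assert!(independent < 0.5205);
}
```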

🤗 Pre-trained Model: available on Hugging Face (ElderRyan/psycial)


Quick Start

As a Library

Add to your Cargo.toml:

```toml
[dependencies]
# With auto-download from Hugging Face
psycial = { version = "0.1", features = ["auto-download"] }
```

Use in your code:

```rust
use psycial::api::Predictor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let predictor = Predictor::new()?;
    let result = predictor.predict("I love solving complex problems")?;
    println!("Type: {} (confidence: {:.1}%)",
             result.mbti_type,
             result.confidence * 100.0);
    Ok(())
}
```

See full API documentation: cargo doc --open --features auto-download

Option 1: Use Pre-trained Model from Hugging Face (CLI)

Download the trained model and start predicting immediately:

```python
from huggingface_hub import hf_hub_download
import shutil

# Download model files
mlp_weights = hf_hub_download(repo_id="ElderRyan/psycial", filename="mlp_weights_multitask.pt")
vectorizer = hf_hub_download(repo_id="ElderRyan/psycial", filename="tfidf_vectorizer_multitask.json")

# Copy to models directory
shutil.copy(mlp_weights, "models/mlp_weights_multitask.pt")
shutil.copy(vectorizer, "models/tfidf_vectorizer_multitask.json")
```

Then use the Rust binary for prediction:

```bash
./target/release/psycial hybrid predict "Your text here"
```

Option 2: Train from Scratch

```bash
# Show all available models
cargo run --release

# Run baseline model
cargo run --release -- baseline

# Run best model (multi-task hybrid)
cargo run --release -- hybrid train --multi-task

# Run BERT model
cargo run --release -- bert-mlp
```

See CLI_GUIDE.md for detailed CLI usage.

As a Library (Training from Scratch)

Add to your Cargo.toml:

```toml
[dependencies]
psycial = { git = "https://github.com/polyjuicelab/psycial", features = ["bert"] }
```

Use in your code:

```rust
use psycial::{load_data, MultiTaskGpuMLP, RustBertEncoder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load data and initialize BERT
    let records = load_data("data/mbti_1.csv")?;
    let bert = RustBertEncoder::new()?;

    // Extract features
    let texts: Vec<String> = records.iter().map(|r| r.posts.clone()).collect();
    let labels: Vec<String> = records.iter().map(|r| r.mbti_type.clone()).collect();
    let features = bert.extract_features_batch(&texts)?;

    // Train multi-task model (4 binary classifiers: E/I, S/N, T/F, J/P)
    let mut model = MultiTaskGpuMLP::new(384, vec![256, 128], 0.001, 0.5);
    model.train(&features, &labels, 20, 32);

    // Predict
    let test_features = bert.extract_features("I love planning everything.")?;
    let mbti_type = model.predict(&test_features);
    println!("Predicted: {}", mbti_type);

    Ok(())
}
```

See LIBRARY_USAGE.md for detailed library integration examples.


Key Technical Achievements

1. Neural Network Beats Statistics

```text
Statistical (TF-IDF + NB):    21.73%
Deep Learning (BERT + MLP):   31.99%  (+47% improvement)
```

Takeaway: a neural network can leverage BERT features that simpler statistical classifiers cannot.

2. Pure Rust Implementation

  • ✅ MLP with backpropagation
  • ✅ BERT integration (rust-bert)
  • ✅ 930 psychological features
  • ✅ Automatic feature engineering
  • ✅ No Python dependencies for inference

3. Production Ready

  • Clean Rust API
  • Comprehensive error handling
  • Modular architecture
  • Full test coverage
  • Metal GPU support (M1/M2/M3)

Architecture

BERT + MLP (V6, Best Single-Task)

```text
Text Input
    ↓
BERT Encoder (384-dim embeddings)
    ↓
MLP Neural Network (384 -> 256 -> 128 -> 16)
    ├─ Xavier initialization
    ├─ ReLU activation
    ├─ Softmax output
    └─ SGD optimizer
    ↓
MBTI Type (16 classes)
```

Features

  • BERT Model: all-MiniLM-L12-v2 (sentence-transformers)
  • Embedding Dim: 384
  • MLP Architecture: 3 layers (256, 128 hidden units)
  • Activation: ReLU
  • Optimizer: SGD with learning rate 0.001
  • Training: 25 epochs, batch size 32

Installation

Requirements

  • Rust 1.70+
  • 8GB RAM minimum
  • 2GB disk space (for models)

Setup

```bash
# Clone
git clone <repo>
cd snapMBTI

# Build (downloads libtorch automatically on first build)
cargo build --release --features bert --bin bert-mlp

# This will take ~60 minutes first time (downloading & compiling libtorch)
# Subsequent builds are fast
```

Usage

Run Best Model

```bash
cargo run --release --features bert --bin bert-mlp
```

Run All Models

```bash
# V1: TF-IDF baseline
cargo run --release --bin baseline

# V2-V3: Psychological features
cargo run --release --bin psyattention
cargo run --release --bin psyattention-full

# V5: BERT experiments
cargo run --release --features bert --bin bert-only

# V6: BERT + MLP (best single-task)
cargo run --release --features bert --bin bert-mlp
```

Technical Details

Dataset

  • Source: MBTI Kaggle Dataset
  • Size: 8,675 samples
  • Split: 80% train (6,940) / 20% test (1,735)
  • Classes: 16 MBTI types
  • Imbalance: INFP (21%), ENTP (~1%)
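The split sizes above follow directly from the sample count (8,675 × 0.8 = 6,940). A minimal shuffle-and-split sketch; a tiny xorshift generator stands in for the `rand` crate the project actually depends on:

```rust
// Minimal 80/20 shuffle-and-split. Illustrative only; the repository
// uses the `rand` crate, a dependency-free xorshift RNG is used here.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn train_test_split<T>(mut data: Vec<T>, train_frac: f64, seed: u64) -> (Vec<T>, Vec<T>) {
    let mut s = seed;
    // Fisher-Yates shuffle driven by the xorshift stream
    for i in (1..data.len()).rev() {
        let j = (xorshift(&mut s) as usize) % (i + 1);
        data.swap(i, j);
    }
    let n_train = (data.len() as f64 * train_frac) as usize;
    let test = data.split_off(n_train);
    (data, test)
}

fn main() {
    // 8,675 samples -> 6,940 train / 1,735 test, matching the README
    let samples: Vec<u32> = (0..8675).collect();
    let (train, test) = train_test_split(samples, 0.8, 42);
    assert_eq!(train.len(), 6940);
    assert_eq!(test.len(), 1735);
}
```

With INFP at 21% of the data, a stratified split would keep class proportions stable across train and test; a plain shuffle approximates this well at these sample sizes.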

MLP Implementation

Pure Rust neural network with:

  • Xavier weight initialization
  • Backpropagation
  • Mini-batch SGD
  • ReLU activations
  • Softmax output
  • Cross-entropy loss
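The output-layer pieces in the list above fit in a few lines. A dependency-free sketch of the standard formulas (not the project's actual `src/neural_net.rs`):

```rust
// Standard ReLU, softmax, and cross-entropy as listed above.
// Illustrative only; the project's implementation lives in src/neural_net.rs.
fn relu(x: f64) -> f64 {
    x.max(0.0)
}

fn softmax(logits: &[f64]) -> Vec<f64> {
    // Subtract the max logit for numerical stability
    let m = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&x| (x - m).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn cross_entropy(probs: &[f64], target: usize) -> f64 {
    // Negative log-likelihood of the true class, clamped to avoid ln(0)
    -probs[target].max(1e-12).ln()
}

fn main() {
    let probs = softmax(&[2.0, 1.0, 0.1]);
    // Probabilities sum to 1 and the largest logit wins
    assert!((probs.iter().sum::<f64>() - 1.0).abs() < 1e-9);
    assert!(probs[0] > probs[1] && probs[1] > probs[2]);
    assert_eq!(relu(-3.0), 0.0);
    // Loss shrinks as the true-class probability grows
    assert!(cross_entropy(&probs, 0) < cross_entropy(&probs, 2));
}
```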

Code: src/neural_net.rs (259 lines)

BERT Integration

Library: rust-bert v0.22

  • Automatic model downloads
  • Sentence embeddings
  • Metal GPU support
  • Pure Rust API (libtorch backend)

Code: src/psyattention/bert_rustbert.rs


Project Structure

```text
snapMBTI/
├── src/
│   ├── main.rs                  # V1: Baseline
│   ├── bert_mlp.rs              # V6: BERT + MLP (BEST)
│   ├── bert_only.rs             # V5: BERT experiments
│   ├── neural_net.rs            # MLP implementation
│   └── psyattention/            # Psychological features
│       ├── bert_rustbert.rs     # BERT encoder
│       ├── seance.rs            # 271 emotion features
│       ├── taaco.rs             # 168 coherence features
│       ├── taales.rs            # 491 complexity features
│       └── ...
├── data/mbti_1.csv              # Dataset
└── docs/                        # Documentation
```

Performance Analysis

Why Neural Network Works

BERT + k-NN: 18.39%

  • High-dimensional features
  • Simple nearest-neighbor
  • Cannot learn complex patterns
  • Result: Underutilizes BERT

BERT + MLP: 31.99%

  • High-dimensional features
  • Neural network classifier
  • Learns non-linear decision boundaries
  • Result: Properly utilizes BERT

Benchmarks (M1 Max)

| Model | Compilation | Training | Inference | Accuracy |
|-------|-------------|----------|-----------|----------|
| Baseline | 30s | 2s | <1s | 21.73% |
| BERT + MLP | 60min (first)* | 322s | 61s | 31.99% |

*First compilation downloads libtorch (~500MB)


Dependencies

```toml
[dependencies]
csv = "1.3"
serde = "1.0"
rand = "0.8"
ndarray = "0.16"

# Optional: BERT support
rust-bert = { version = "0.22", optional = true, features = ["download-libtorch"] }
```

Citation

```bibtex
@software{snapMBTI2025,
  title={MBTI Personality Classifier with Neural Networks},
  author={Ryan Kung},
  year={2025},
  note={Pure Rust, 31.99\% accuracy, BERT + MLP implementation}
}
```

License

AGPL-3.0

See LICENSE file for details.


Acknowledgments

  • rust-bert: Guillaume BE
  • Dataset: MBTI Kaggle Dataset

Status: Production Ready
Best Model: TF-IDF + BERT + Multi-Task GPU, V7 (52.05%)
Platform: macOS (Metal GPU), Linux, Windows
Language: 100% Rust
