polyjuicelab/psycial

MBTI Personality Classifier


Production-grade MBTI personality classification system with neural network implementation in pure Rust.


Performance Summary

| Model | Method | Accuracy | vs Random | Training Time |
|-------|--------|----------|-----------|---------------|
| V7 🏆 | TF-IDF + BERT + Multi-Task GPU | 52.05% | 8.3x | ~50s |
| V6 | BERT + MLP (single-task) | 31.99% | 5.1x | 322s |
| V1 | TF-IDF + Naive Bayes | 21.73% | 3.5x | 2s |
| V2 | 9 Psychological Features | 21.21% | 3.4x | 3s |
| V3 | 930 Psychological Features | 20.12% | 3.2x | 30s |
| V5 | BERT Only + Cosine | 18.39% | 2.9x | 583s |

Random Baseline: 6.25% (16 classes)

V7 Multi-Task Model Breakdown

| Dimension | Accuracy | Notes |
|-----------|----------|-------|
| E/I | 82.77% | Extraversion vs Introversion |
| S/N | 88.18% | Sensing vs Intuition (best) |
| T/F | 81.67% | Thinking vs Feeling |
| J/P | 77.12% | Judging vs Perceiving |
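If the four binary classifiers erred independently, the expected 16-type accuracy would be the product of the per-dimension accuracies above. A quick sanity check (values taken from the table; that the measured 52.05% beats the independence estimate suggests the per-dimension errors are correlated):

```rust
fn main() {
    // Per-dimension accuracies from the V7 breakdown table
    let dims = [0.8277_f64, 0.8818, 0.8167, 0.7712];
    // Under independence, full-type accuracy would be the product
    let independent: f64 = dims.iter().product();
    println!("independence estimate: {:.2}%", independent * 100.0); // ~45.97%
    // Measured V7 accuracy (52.05%) is higher, so dimension errors
    // must correlate (samples easy on one axis tend to be easy on all).
    assert!(independent < 0.5205);
}
```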

🤗 Pre-trained Model: available on Hugging Face (ElderRyan/psycial)


Quick Start

As a Library

Add to your Cargo.toml:

```toml
[dependencies]
# With auto-download from Hugging Face
psycial = { version = "0.1", features = ["auto-download"] }
```

Use in your code:

```rust
use psycial::api::Predictor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let predictor = Predictor::new()?;
    let result = predictor.predict("I love solving complex problems")?;
    println!("Type: {} (confidence: {:.1}%)",
             result.mbti_type,
             result.confidence * 100.0);
    Ok(())
}
```

See full API documentation: cargo doc --open --features auto-download

Option 1: Use Pre-trained Model from Hugging Face (CLI)

Download the trained model and start predicting immediately:

```python
from huggingface_hub import hf_hub_download
import shutil

# Download model files
mlp_weights = hf_hub_download(repo_id="ElderRyan/psycial", filename="mlp_weights_multitask.pt")
vectorizer = hf_hub_download(repo_id="ElderRyan/psycial", filename="tfidf_vectorizer_multitask.json")

# Copy to models directory
shutil.copy(mlp_weights, "models/mlp_weights_multitask.pt")
shutil.copy(vectorizer, "models/tfidf_vectorizer_multitask.json")
```

Then use the Rust binary for prediction:

```bash
./target/release/psycial hybrid predict "Your text here"
```

Option 2: Train from Scratch

```bash
# Show all available models
cargo run --release

# Run baseline model
cargo run --release -- baseline

# Run best model (multi-task hybrid)
cargo run --release -- hybrid train --multi-task

# Run BERT model
cargo run --release -- bert-mlp
```

See CLI_GUIDE.md for detailed CLI usage.

As a Library (Training from Scratch)

Add to your Cargo.toml:

```toml
[dependencies]
psycial = { git = "https://github.com/polyjuicelab/psycial", features = ["bert"] }
```

Use in your code:

```rust
use psycial::{load_data, MultiTaskGpuMLP, RustBertEncoder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load data and initialize BERT
    let records = load_data("data/mbti_1.csv")?;
    let bert = RustBertEncoder::new()?;

    // Extract features
    let texts: Vec<String> = records.iter().map(|r| r.posts.clone()).collect();
    let labels: Vec<String> = records.iter().map(|r| r.mbti_type.clone()).collect();
    let features = bert.extract_features_batch(&texts)?;

    // Train multi-task model (4 binary classifiers: E/I, S/N, T/F, J/P)
    let mut model = MultiTaskGpuMLP::new(384, vec![256, 128], 0.001, 0.5);
    model.train(&features, &labels, 20, 32);

    // Predict
    let test_features = bert.extract_features("I love planning everything.")?;
    let mbti_type = model.predict(&test_features);
    println!("Predicted: {}", mbti_type);

    Ok(())
}
```

See LIBRARY_USAGE.md for detailed library integration examples.


Key Technical Achievements

1. Neural Network Beats Statistics

```text
Statistical (TF-IDF + NB):    21.73%
Deep Learning (BERT + MLP):   31.99%  (+47% improvement)
```

Takeaway: a neural network can leverage BERT features that simpler statistical classifiers cannot.

2. Pure Rust Implementation

  • ✅ MLP with backpropagation
  • ✅ BERT integration (rust-bert)
  • ✅ 930 psychological features
  • ✅ Automatic feature engineering
  • ✅ No Python dependencies for inference

3. Production Ready

  • Clean Rust API
  • Comprehensive error handling
  • Modular architecture
  • Full test coverage
  • Metal GPU support (M1/M2/M3)

Architecture

BERT + MLP (V6, Best Single-Task)

```text
Text Input
    ↓
BERT Encoder (384-dim embeddings)
    ↓
MLP Neural Network (384 -> 256 -> 128 -> 16)
    ├─ Xavier initialization
    ├─ ReLU activation
    ├─ Softmax output
    └─ SGD optimizer
    ↓
MBTI Type (16 classes)
```

Features

  • BERT Model: all-MiniLM-L12-v2 (sentence-transformers)
  • Embedding Dim: 384
  • MLP Architecture: 3 layers (256, 128 hidden units)
  • Activation: ReLU
  • Optimizer: SGD with learning rate 0.001
  • Training: 25 epochs, batch size 32

Installation

Requirements

  • Rust 1.70+
  • 8GB RAM minimum
  • 2GB disk space (for models)

Setup

```bash
# Clone
git clone <repo>
cd snapMBTI

# Build (downloads libtorch automatically on first build)
cargo build --release --features bert --bin bert-mlp

# This will take ~60 minutes first time (downloading & compiling libtorch)
# Subsequent builds are fast
```

Usage

Run Best Model

```bash
cargo run --release --features bert --bin bert-mlp
```

Run All Models

```bash
# V1: TF-IDF baseline
cargo run --release --bin baseline

# V2-V3: Psychological features
cargo run --release --bin psyattention
cargo run --release --bin psyattention-full

# V5: BERT experiments
cargo run --release --features bert --bin bert-only

# V6: BERT + MLP (best single-task)
cargo run --release --features bert --bin bert-mlp
```

Technical Details

Dataset

  • Source: MBTI Kaggle Dataset
  • Size: 8,675 samples
  • Split: 80% train (6,940) / 20% test (1,735)
  • Classes: 16 MBTI types
  • Imbalance: INFP (21%), ENTP (~1%)
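The split sizes above follow directly from the sample count (8,675 × 0.8 = 6,940). A minimal shuffle-and-split sketch; a tiny xorshift generator stands in for the `rand` crate the project actually depends on:

```rust
// Minimal 80/20 shuffle-and-split. Illustrative only; the repository
// uses the `rand` crate, a dependency-free xorshift RNG is used here.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn train_test_split<T>(mut data: Vec<T>, train_frac: f64, seed: u64) -> (Vec<T>, Vec<T>) {
    let mut s = seed;
    // Fisher-Yates shuffle driven by the xorshift stream
    for i in (1..data.len()).rev() {
        let j = (xorshift(&mut s) as usize) % (i + 1);
        data.swap(i, j);
    }
    let n_train = (data.len() as f64 * train_frac) as usize;
    let test = data.split_off(n_train);
    (data, test)
}

fn main() {
    // 8,675 samples -> 6,940 train / 1,735 test, matching the README
    let samples: Vec<u32> = (0..8675).collect();
    let (train, test) = train_test_split(samples, 0.8, 42);
    assert_eq!(train.len(), 6940);
    assert_eq!(test.len(), 1735);
}
```

With INFP at 21% of the data, a stratified split would keep class proportions stable across train and test; a plain shuffle approximates this well at these sample sizes.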

MLP Implementation

Pure Rust neural network with:

  • Xavier weight initialization
  • Backpropagation
  • Mini-batch SGD
  • ReLU activations
  • Softmax output
  • Cross-entropy loss
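The output-layer pieces in the list above fit in a few lines. A dependency-free sketch of the standard formulas (not the project's actual `src/neural_net.rs`):

```rust
// Standard ReLU, softmax, and cross-entropy as listed above.
// Illustrative only; the project's implementation lives in src/neural_net.rs.
fn relu(x: f64) -> f64 {
    x.max(0.0)
}

fn softmax(logits: &[f64]) -> Vec<f64> {
    // Subtract the max logit for numerical stability
    let m = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&x| (x - m).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn cross_entropy(probs: &[f64], target: usize) -> f64 {
    // Negative log-likelihood of the true class, clamped to avoid ln(0)
    -probs[target].max(1e-12).ln()
}

fn main() {
    let probs = softmax(&[2.0, 1.0, 0.1]);
    // Probabilities sum to 1 and the largest logit wins
    assert!((probs.iter().sum::<f64>() - 1.0).abs() < 1e-9);
    assert!(probs[0] > probs[1] && probs[1] > probs[2]);
    assert_eq!(relu(-3.0), 0.0);
    // Loss shrinks as the true-class probability grows
    assert!(cross_entropy(&probs, 0) < cross_entropy(&probs, 2));
}
```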

Code: src/neural_net.rs (259 lines)

BERT Integration

Library: rust-bert v0.22

  • Automatic model downloads
  • Sentence embeddings
  • Metal GPU support
  • Pure Rust API (libtorch backend)

Code: src/psyattention/bert_rustbert.rs


Project Structure

```text
snapMBTI/
├── src/
│   ├── main.rs                  # V1: Baseline
│   ├── bert_mlp.rs              # V6: BERT + MLP (BEST)
│   ├── bert_only.rs             # V5: BERT experiments
│   ├── neural_net.rs            # MLP implementation
│   └── psyattention/            # Psychological features
│       ├── bert_rustbert.rs     # BERT encoder
│       ├── seance.rs            # 271 emotion features
│       ├── taaco.rs             # 168 coherence features
│       ├── taales.rs            # 491 complexity features
│       └── ...
├── data/mbti_1.csv              # Dataset
└── docs/                        # Documentation
```

Performance Analysis

Why Neural Network Works

BERT + k-NN: 18.39%

  • High-dimensional features
  • Simple nearest-neighbor
  • Cannot learn complex patterns
  • Result: Underutilizes BERT

BERT + MLP: 31.99%

  • High-dimensional features
  • Neural network classifier
  • Learns non-linear decision boundaries
  • Result: Properly utilizes BERT

Benchmarks (M1 Max)

| Model | Compilation | Training | Inference | Accuracy |
|-------|-------------|----------|-----------|----------|
| Baseline | 30s | 2s | <1s | 21.73% |
| BERT + MLP | 60min (first)* | 322s | 61s | 31.99% |

*First compilation downloads libtorch (~500MB)


Dependencies

```toml
[dependencies]
csv = "1.3"
serde = "1.0"
rand = "0.8"
ndarray = "0.16"

# Optional: BERT support
rust-bert = { version = "0.22", optional = true, features = ["download-libtorch"] }
```

Citation

```bibtex
@software{snapMBTI2025,
  title={MBTI Personality Classifier with Neural Networks},
  author={Ryan Kung},
  year={2025},
  note={Pure Rust, 31.99\% accuracy, BERT + MLP implementation}
}
```

License

AGPL-3.0

See LICENSE file for details.


Acknowledgments

  • rust-bert: Guillaume BE
  • Dataset: MBTI Kaggle Dataset

Status: Production Ready
Best Model: TF-IDF + BERT + Multi-Task GPU, V7 (52.05%)
Platform: macOS (Metal GPU), Linux, Windows
Language: 100% Rust
