Microsoft's BitNet b1.58, a 1.58-bit large language model (LLM) capable of running on commodity CPUs, reignited interest in ultra-low-precision inference.
Inspired by this work, I explored single-bit (and ternary) quantization on the SST-2 sentiment-analysis task.
This repo walks through eight progressively refined approaches, starting from scratch-built transformers and culminating in a quantized, fine-tuned BERT.
- Baseline: scratch-built transformer + 1-bit weights.
- Incremental tricks: add positional encoding, dropout, mixed precision, and QAT.
- Advanced tricks: median scaling, Straight-Through Estimator (STE), progressive mixed-precision quantization (MoQ).
- Pre-trained models: swap in BERT, then apply STE / ternary + activation quantization.
At each stage I addressed shortcomings of the previous approach while monitoring accuracy/F1, model size, and training stability.
Approach 1: Simple Quantized Transformer (Classifier)
- Goal: prove 1-bit feasibility.
- Key steps:
  - Scratch implementation of a miniature Transformer encoder.
  - Replaced all linear layers with a custom `BitLinear` (sign-only weights); a minimal sketch follows this list.
  - Adam + cross-entropy loss; no fancy schedulers.
- Results: Accuracy 76.38 % | F1 76.38 %.
- Takeaway: it works, but capacity is tiny and there are no positional cues, so the ceiling is limited.
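For reference, here is a minimal sketch of what a sign-only `BitLinear` layer could look like. The class name matches the repo, but the body is illustrative; in particular, the detach-based pass-through that keeps the latent full-precision weights trainable is my assumption, not necessarily what Approach 1 does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Linear layer whose weights are binarized to {-1, +1} in the forward pass."""
    def forward(self, x):
        w = self.weight
        # Sign-only quantization; the detach trick keeps a gradient path to the
        # full-precision latent weights (sign() alone has zero gradient a.e.).
        w_q = w + (torch.sign(w) - w).detach()
        return F.linear(x, w_q, self.bias)
```

Dropping this in wherever the scratch encoder uses `nn.Linear` is all the baseline needs: full-precision weights are stored and updated, while the forward pass only ever sees ±1 values.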
Approach 2: + Positional Encoding & Mixed Precision
- Added sinusoidal positional encoding, automatic mixed precision (AMP), a learning-rate scheduler, and gradient clipping (a sketch of the positional encoding follows this list).
- Results: 62.27 % / 61.26 %.
- Why worse? AMP introduced instability with the sign-only weights, and model capacity was still low.
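The sinusoidal positional encoding added here is the standard textbook formulation; the module below is a sketch of that formulation rather than the repo's exact code.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))    # shape: (1, max_len, d_model)

    def forward(self, x):                              # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]
```

AMP itself is the usual `torch.cuda.amp.autocast` + `GradScaler` pairing around the training step; the drop noted above suggests fp16 loss scaling and sign-only weights interact badly.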
Approach 3: + Dropout & Quantization-Aware Training (QAT)
- Injected dropout; trained with fake-quant ops (PyTorch QAT), as sketched after this list.
- Results: 63.88 % / 63.65 %.
- Takeaway: a small bump, but the model still under-fits.
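For intuition, here is a hedged sketch of the fake-quant idea, using `torch.fake_quantize_per_tensor_affine` directly for clarity; the repo may instead rely on PyTorch's higher-level QAT flow (observers + `prepare_qat`), so treat the helper below as illustrative.

```python
import torch

def fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate integer rounding in the forward pass while keeping float tensors."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = int((-x.min() / scale).round().clamp(qmin, qmax))
    # PyTorch's fake-quant op already has a straight-through backward built in.
    return torch.fake_quantize_per_tensor_affine(x, scale.item(), zero_point, qmin, qmax)

# Passing activations (or weights) through fake_quant during training lets the
# network learn to tolerate the rounding error it will see at inference time.
x_q = fake_quant(torch.randn(4, 16), num_bits=8)
```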
Approach 4: Median Scaling + Straight-Through Estimator (STE)
- Normalised activations via median scaling; back-propagated through the 1-bit weights with the STE (sketch after this list).
- Results: 69.84 % / 69.65 %.
- Takeaway: a big jump; scaling + STE help gradients flow in 1-bit nets.
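A sketch of how these two pieces could fit together in one layer, under the assumption that "median scaling" means dividing activations by their median absolute value; the repo's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignSTE(torch.autograd.Function):
    """Binarize in the forward pass; pass gradients straight through in the backward pass."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Clip the straight-through gradient to |w| <= 1 to keep training stable.
        return grad_output * (w.abs() <= 1).to(grad_output.dtype)

class MedianScaledBitLinear(nn.Linear):
    def forward(self, x):
        # Normalise activations by their median magnitude before the 1-bit matmul.
        x = x / x.abs().median().clamp(min=1e-8)
        return F.linear(x, SignSTE.apply(self.weight), self.bias)
```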
Approach 5: Variant of (4)
- Tweaked the scaling factor & clipping range.
- Results: 70.76 % / 70.66 %.
- Takeaway: careful hyper-parameter tuning matters even in low-bit land.
Approach 6: Multi-Head Attention & Progressive MoQ
- Upgraded to a full multi-head-attention encoder; progressively lowered precision (8 → 4 → 1 bit) during fine-tuning (see the schedule sketch after this list).
- Results: 70.18 % / 70.15 %.
- Takeaway: capacity went up, but the gains from the extra heads were partly cancelled by quantization loss.
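A sketch of a progressive precision schedule, assuming "progressive MoQ" means lowering the weight bit-width over fine-tuning epochs; the uniform quantizer and the epoch-to-bits mapping below are illustrative, not the repo's exact values.

```python
import torch

def quantize_weights(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake-quantization with a detach-based straight-through backward."""
    if bits == 1:
        w_q = torch.sign(w) * w.abs().mean()          # 1-bit case falls back to sign * scale
    else:
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        w_q = (w / scale).round().clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()                     # forward: quantized, backward: identity

# Hypothetical schedule: 8-bit for the first epoch, 4-bit for the second, 1-bit afterwards.
BIT_SCHEDULE = {0: 8, 1: 4}

def bits_for_epoch(epoch: int) -> int:
    return BIT_SCHEDULE.get(epoch, 1)
```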
Approach 7: Pre-trained BERT (+ STE)
- Started from `bert-base-uncased`; swapped every dense / attention projection for `BitLinear`; used the STE for back-prop (a sketch of the module swap follows this list).
- Results: 85.67 % / 85.65 % (best).
- Takeaway: pre-training supplies strong linguistic priors, and the 1-bit layers fine-tune well with the STE.
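A sketch of the module swap, assuming Hugging Face `transformers` and the `BitLinear` layer from the Approach 1 sketch; which projections to replace (and whether the embeddings and classifier head stay full precision, as assumed here) is a judgment call, so treat this as illustrative.

```python
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

def replace_linears_with_bitlinear(module: nn.Module) -> None:
    """Recursively swap nn.Linear submodules for BitLinear, copying their weights."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # BitLinear: the sign-only nn.Linear subclass from the Approach 1 sketch.
            bit = BitLinear(child.in_features, child.out_features, bias=child.bias is not None)
            bit.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                bit.bias.data.copy_(child.bias.data)
            setattr(module, name, bit)
        else:
            replace_linears_with_bitlinear(child)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
replace_linears_with_bitlinear(model.bert.encoder)  # embeddings and classifier stay full precision
```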
Approach 8: Ternary BERT (+ Activation Quant)
- Pushed further: ternary weights {-1, 0, +1} + per-layer activation quantization + sub-layer norm (sketch after this list).
- Results: 50.92 % / 34.36 %.
- Takeaway: too aggressive; the activation quantization hurt expressive power.
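A sketch of the weight/activation quantizers, under the assumption that the ternary threshold is a fraction of the mean |W| (as in ternary weight networks) and that activations use symmetric absmax 8-bit scaling; the repo's exact choices, including the sub-layer norm placement, may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ternarize(w: torch.Tensor, delta_ratio: float = 0.7) -> torch.Tensor:
    """Map weights to {-1, 0, +1} with a threshold proportional to mean |w| (STE backward)."""
    delta = delta_ratio * w.abs().mean()
    w_t = torch.zeros_like(w)
    w_t[w > delta] = 1.0
    w_t[w < -delta] = -1.0
    # Scale surviving weights so the ternary tensor keeps a similar magnitude.
    scale = w.abs()[w_t != 0].mean() if bool((w_t != 0).any()) else w.new_tensor(1.0)
    return w + (w_t * scale - w).detach()

def quantize_activations(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric absmax fake-quantization of activations (STE backward)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = (x / scale).round().clamp(-qmax, qmax) * scale
    return x + (x_q - x).detach()

class TernaryLinear(nn.Linear):
    def forward(self, x):
        return F.linear(quantize_activations(x), ternarize(self.weight), self.bias)
```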
Note: Each approach in this repository was trained for only 3 epochs due to time and resource constraints. Despite this, the results already reveal promising trends in low-bit training. I believe the community can build on these implementations by running longer training schedules, tuning hyperparameters, and applying these ideas to larger tasks, unlocking even better performance and deeper insights into ultra-low-precision NLP.
| # | Model / Technique | Acc. (%) | F1 (%) |
|---|---|---|---|
| 1 | Scratch Transformer + 1-bit weights | 76.38 | 76.38 |
| 2 | + PosEnc & AMP | 62.27 | 61.26 |
| 3 | + Dropout & QAT | 63.88 | 63.65 |
| 4 | + Median Scaling & STE | 69.84 | 69.65 |
| 5 | Variant of 4 | 70.76 | 70.66 |
| 6 | + MH-Attention & Progressive MoQ | 70.18 | 70.15 |
| 7 | BERT-base + STE-quantized | 85.67 | 85.65 |
| 8 | Ternary BERT (+ Activation Quant) | 50.92 | 34.36 |
Due to a known compatibility issue with Jupyter widgets metadata (`metadata.widgets.state` missing), GitHub is currently unable to render the notebook properly on the web interface.
Workaround:
To view and run the notebook without errors, please clone the repository locally and open the notebook in VS Code, JupyterLab, or another local IDE.