ContentOS Benchmark

A reproducible bilingual (English + Russian) AI-text-detection ensemble with adversarial robustness evaluation.

Headline numbers (unchanged in v1.1.0, baseline from v1.0.2 2026-04-29):

EN OOD AUROC: 0.864 (176-sample expanded smoke battery)

RU OOD AUROC: 0.846

EN Adversarial AUROC: 0.998 on 300-sample paraphrase-paired set

p50 latency: 1.2 s on 8-core CPU, no GPU
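
For readers less familiar with the metric behind the AUROC figures above, here is a minimal sketch of a rank-based AUROC computation. It is illustrative only: the function name `auroc` and the toy labels and scores are assumptions made for this example, and the repo's own evaluation scripts may compute the metric differently (for instance through a library routine).

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: probability that a random AI-written sample
    scores higher than a random human-written sample.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sample of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data invented for illustration (1 = AI-written, 0 = human-written).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

The pairwise loop is quadratic in the number of samples, which is acceptable for corpora of a few hundred samples but would need a sort-based variant at larger scale.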

v1.1 (2026-05-01) adds: 17-fixture regression corpus (was 6), Yandex Дзен engine profile (6 engines now), per-engine A/B discrimination harness, KG-lite L0.5 factcheck shortcut, 50-claim factcheck golden set, calibration v2.0 corpus harvest scripts, brief-quality drift detector. Full notes in CHANGELOG.md.

What this is

Open-source benchmark + reproduction code for the ContentOS preprint:

Preprint: benchmark/paper.pdf (full methodology, 9 sections + 5 appendices, ~6,000 words)
Calibration corpus + JSON + figures: HF Dataset
This repo: evaluation scripts, regression test suite, atomic-swap deploy templates, methodology notes
Production API: the ensemble runs inside ContentOS. The full HTTP API surface (70 endpoints, including /analyze/ai-detect-ensemble, /factcheck, and /aeo-score) is documented in humanswith-ai/contentos-api-docs (private, HWAI team).

Why this matters

Commercial AI-text detectors (Originality, GPTZero, Winston) publish "99% accuracy" claims on closed corpora that nobody can verify. Independent peer-reviewed evaluations report that their performance falls to 0.70–0.88 AUROC on out-of-distribution text and below 0.65 AUROC under paraphrase attack.

Our claim is different: clone this repo, run the regression suite (about 0.05 seconds), and get numbers bit-identical to those reported in the paper. The defensible moat in 2026 AI-text detection is reproducibility, not vendor accuracy claims on proprietary data.

Authors and affiliation

Gregory Shevchenko — author, founder of Humanswith.ai
Humanswith.ai team — methodology, calibration, evaluation infrastructure

Reproducibility

See REPRODUCIBILITY.md for the full method.

Quick start:

# 1. Clone + install (Python 3.10+)
git clone https://github.com/humanswith-ai/contentos-benchmark
cd contentos-benchmark
pip install -r requirements.txt

# 2. Pull corpus from Hugging Face
python -c "from huggingface_hub import snapshot_download; \
  snapshot_download(repo_id='Humanswith-ai/contentos-preprint', \
  repo_type='dataset', local_dir='./hf_corpus')"

# 3. Run regression suite (0.05 seconds, validates 8 pinned baselines)
pytest tests/test_calibration_regression.py -v

# 4. (Optional) Run live evaluation against your own ml-services-hwai
export ML_SERVICES_URL=http://your-ml-host:3300
export ML_SERVICES_API_KEY=cqa_yourkey
python scripts/eval_ensemble_corpus.py
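
As a rough illustration of what a pinned-baseline check in step 3 could look like, the sketch below compares recomputed metrics against stored values and reports any drift. It is not the content of tests/test_calibration_regression.py: the helper `check_against_baseline`, the JSON layout, and the key names are hypothetical. The values reuse the rounded headline numbers above, whereas a real baseline file would likely pin more digits.

```python
import json

# Hypothetical baseline layout; key names are assumptions for this sketch.
# Values are the rounded headline numbers quoted in this README.
PINNED_JSON = (
    '{"en_ood_auroc": 0.864, "ru_ood_auroc": 0.846, '
    '"en_adversarial_auroc": 0.998}'
)


def check_against_baseline(recomputed: dict, pinned: dict, tol: float = 0.0) -> list:
    """Return a list of mismatch messages.

    An empty list means every pinned metric was reproduced within `tol`.
    The default tol=0.0 demands exact equality, mirroring a bit-identical claim.
    """
    problems = []
    for name, expected in pinned.items():
        if name not in recomputed:
            problems.append(f"{name}: missing from recomputed metrics")
            continue
        actual = recomputed[name]
        if abs(actual - expected) > tol:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


pinned = json.loads(PINNED_JSON)

# A run that reproduces the baseline exactly.
ok_report = check_against_baseline(dict(pinned), pinned)

# A run where one metric drifted (value invented for illustration).
drifted_run = {**pinned, "ru_ood_auroc": 0.85}
drift_report = check_against_baseline(drifted_run, pinned)

print(ok_report)
print(drift_report)
```

Inside a pytest test, the same helper could be wrapped in a single `assert not check_against_baseline(...)`, so that any drift fails the suite and the failure output names the metric that moved.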

Citation

@misc{contentos2026,
  title={ContentOS: A Reproducible Bilingual AI-Text-Detection Ensemble with Adversarial Robustness Evaluation},
  author={Shevchenko, Gregory and {Humanswith.ai team}},
  year={2026},
  url={https://huggingface.co/datasets/Humanswith-ai/contentos-preprint},
}

License

MIT. See LICENSE. Underlying calibration data sources retain their original licenses (HC3, AINL-Eval-2025, ai-text-detection-pile).

Contact

Issues + PRs: this repo
Discussions: HF Dataset
Email: open an issue with [contact] tag
