Bot Detection API

A BERT-based bot detection service built with FastAPI. The API analyzes a list of texts and predicts whether they were written by an AI bot or a human user.

Features

BERT-based inference: Uses a fine-tuned mBERT model from Hugging Face for classification
Mean probability prediction: Aggregates individual text scores and applies a configurable threshold
Resource management: Limits CPU core usage for consistent performance
Input validation: Configurable minimum text count for predictions
Interactive API docs: Automatically generated Swagger UI for easy testing

Installation

Clone the repository:

git clone <repo-url>
cd AI-powered-bot-detection-API

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Configuration

All configuration parameters are defined in config/constants.py:

MIN_TEXT_COUNT (default: 20): Minimum number of texts required for prediction
BOT_PROBABILITY_THRESHOLD (default: 0.4): Threshold for binary classification (0-1)
MODEL_NAME (default: trokhymovych/mbert-ai-bot-detector): Hugging Face repo id or local model path
MAX_CPU_CORES (default: 4): Maximum CPU cores torch will use
BATCH_SIZE (default: 32): Batch size for model processing

Edit these values to customize the API behavior.

The model comes from Hugging Face:

trokhymovych/mbert-ai-bot-detector

For development, you can reference the Hugging Face repo id directly. Transformers will download/cache it automatically on first startup:

MODEL_NAME=trokhymovych/mbert-ai-bot-detector python main.py

For the server, download the model once and run from the local copy so restarts do not depend on Hugging Face availability:

huggingface-cli download trokhymovych/mbert-ai-bot-detector --local-dir models/mbert-ai-bot-detector
MODEL_NAME=models/mbert-ai-bot-detector python main.py

The deployed server .env should use:

MODEL_NAME=models/mbert-ai-bot-detector
BOT_PROBABILITY_THRESHOLD=0.4

If MODEL_NAME points to a local path (for example data/mbert_trained), that directory must exist.
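The relationship between the documented defaults and the environment overrides shown above could look roughly like the sketch below. This is an illustration only, not the contents of config/constants.py: the `Settings` class and `load_settings` function are hypothetical names, and the assumption that every parameter (not just MODEL_NAME and BOT_PROBABILITY_THRESHOLD) can be overridden through the environment is not confirmed by this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container holding the defaults listed in this README."""
    min_text_count: int = 20
    bot_probability_threshold: float = 0.4
    model_name: str = "trokhymovych/mbert-ai-bot-detector"
    max_cpu_cores: int = 4
    batch_size: int = 32


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Start from the documented defaults and let environment variables override them.

    Assumption: each variable name matches the constant name.
    """
    defaults = Settings()
    threshold = float(env.get("BOT_PROBABILITY_THRESHOLD", defaults.bot_probability_threshold))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("BOT_PROBABILITY_THRESHOLD must be between 0 and 1")
    return Settings(
        min_text_count=int(env.get("MIN_TEXT_COUNT", defaults.min_text_count)),
        bot_probability_threshold=threshold,
        model_name=env.get("MODEL_NAME", defaults.model_name),
        max_cpu_cores=int(env.get("MAX_CPU_CORES", defaults.max_cpu_cores)),
        batch_size=int(env.get("BATCH_SIZE", defaults.batch_size)),
    )
```

Calling `load_settings({})` yields the defaults, while passing a mapping with MODEL_NAME set to models/mbert-ai-bot-detector mirrors the deployed .env above.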

Running the API

Start the FastAPI server:

python main.py

The API will be available at http://localhost:8000

Interactive Documentation

Swagger UI: http://localhost:8000/docs

API Endpoints

POST `/predict`

Predict whether a list of texts was written by an AI bot.

Request:

{
  "texts": [
    "This is the first text.",
    "This is the second text.",
    "..."
  ]
}

Response:

{
  "is_bot": false,
  "confidence": 0.2345,
  "text_scores": [0.1234, 0.2456, ...],
  "num_texts": 3
}

Request fields:

texts (required): List of texts to analyze (at least MIN_TEXT_COUNT texts, 20 by default)

Response fields:

is_bot (boolean): Binary prediction (true = bot, false = human)
confidence (float): Mean bot probability across all texts (0-1)
text_scores (array): Individual score for each text provided
num_texts (integer): Number of texts processed

Error Responses:

400 Bad Request: Too few texts (fewer than MIN_TEXT_COUNT, 20 by default)
422 Unprocessable Entity: Invalid input format
500 Internal Server Error: Model inference failed
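A client call to this endpoint might be sketched as follows, using only the Python standard library. The helper names `build_predict_request` and `predict` are hypothetical, and the base URL assumes the default local address given in this README.

```python
import json
import urllib.request


def build_predict_request(texts, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, base_url="http://localhost:8000", timeout=30.0):
    """Send the request and return the decoded JSON response as a dict.

    urllib raises urllib.error.HTTPError for non-2xx responses, so the
    400, 422, and 500 cases listed above surface as exceptions here.
    """
    request = build_predict_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict(my_texts)["is_bot"]` would give the binary prediction, provided `my_texts` holds at least the minimum number of texts.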

GET `/health`

Health check endpoint.

Response:

{
  "status": "healthy",
  "model_loaded": true
}

GET `/model-info`

Get current model information.

Response:

{
  "model_name": "<>",
  "threshold": 0.4
}

Project Structure

/
├── config/
│   ├── __init__.py
│   └── constants.py              # Configuration parameters
├── models/
│   ├── __init__.py
│   └── bot_detector.py           # Core BERT inference logic
├── api/
│   ├── __init__.py
│   └── data_models.py            # Pydantic models for request/response validation
├── utils/
│   ├── __init__.py
│   └── exceptions.py             # Custom exceptions
├── main.py                       # FastAPI app and endpoint definitions
├── requirements.txt              # Python dependencies
├── README.md                     # This file
└── .gitignore                    # Git ignore rules

How It Works

Input Validation: Texts are validated (minimum 20 required)
Inference: Model runs inference on batches of texts
Probability Extraction: Positive class probabilities are extracted
Aggregation: Mean probability is calculated across all texts
Classification: Binary label is determined using configurable threshold
Response: Results are returned with individual scores and aggregated prediction
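The validation, aggregation, and classification steps above can be sketched in plain Python, leaving out the model itself. This is a minimal illustration, not the code in models/bot_detector.py: `TooFewTextsError` and `aggregate_scores` are hypothetical names, and treating a mean equal to the threshold as a bot (`>=`) is an assumption, since the README does not say which comparison is used.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Union

MIN_TEXT_COUNT = 20              # documented default
BOT_PROBABILITY_THRESHOLD = 0.4  # documented default


class TooFewTextsError(ValueError):
    """Hypothetical exception for inputs below the minimum text count."""


def aggregate_scores(
    text_scores: Sequence[float],
    min_text_count: int = MIN_TEXT_COUNT,
    threshold: float = BOT_PROBABILITY_THRESHOLD,
) -> Dict[str, Union[bool, float, int, List[float]]]:
    """Turn per-text bot probabilities into the response shape of POST /predict."""
    # Input validation: reject requests with too few texts.
    if len(text_scores) < min_text_count:
        raise TooFewTextsError(
            f"need at least {min_text_count} texts, got {len(text_scores)}"
        )
    # Aggregation: mean of the positive-class probabilities.
    confidence = fmean(text_scores)
    # Classification: compare the mean against the threshold (assumed >=).
    return {
        "is_bot": confidence >= threshold,
        "confidence": confidence,
        "text_scores": list(text_scores),
        "num_texts": len(text_scores),
    }
```

Because the decision is made on the mean, a few high-scoring texts do not flag an account on their own; the average across all submitted texts has to reach the threshold.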

Resource Management

The API caps the number of CPU threads PyTorch uses by calling torch.set_num_threads() with the value of MAX_CPU_CORES. This keeps inference from saturating every core on the host when several requests are processed at once.

Performance Considerations

Batch Processing: Texts are processed in batches (default: 32) for efficiency
CPU Limiting: Torch is restricted to use only MAX_CPU_CORES cores
Startup Loading: The model is loaded once at server startup rather than lazily on the first request, so no request pays the model-loading cost
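Batch processing, as described above, amounts to slicing the input list into chunks of at most BATCH_SIZE texts and running the model on each chunk. A minimal sketch of the slicing step follows; `iter_batches` is a hypothetical helper name, and the actual implementation may batch differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        # The final batch may be shorter than batch_size.
        yield list(texts[start:start + batch_size])
```

A smaller batch size lowers peak memory per forward pass, while a larger one reduces the number of passes, which is the trade-off the troubleshooting advice relies on.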

Troubleshooting

Out of Memory

If you encounter memory issues:

Reduce BATCH_SIZE in config/constants.py
Reduce MAX_CPU_CORES if running multiple instances

Slow Inference

Increase BATCH_SIZE for better throughput
Ensure MAX_CPU_CORES is set appropriately for your hardware
