Das-Detects is a real-time desktop application designed to monitor VoIP calls (like WhatsApp) and detect whether the caller is using an AI-generated voice or a real human voice. It protects users from AI voice cloning scams by actively listening to the call, running inferences using local machine learning models (CNN and GMM), and checking the caller's ID against a community-flagged blocklist stored in Supabase.
The system operates through a set of coordinated, multi-threaded modules to achieve real-time monitoring without degrading the user's system performance.
A background thread continuously polls the Windows Audio Session API (WASAPI) using pycaw and comtypes to detect active VoIP applications (e.g., WhatsApp).
- It uses a heuristic hysteresis pattern: a call is only marked "Active" when both VoIP audio playback AND microphone activity are detected. This prevents false positives when a user is merely listening to a voice note.
- It fires asynchronous callbacks (`on_call_start`, `on_call_end`) to wake up the audio recording and UI subsystems only when needed.
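A minimal sketch of this polling idea, assuming the pycaw session API (the watch list, poll interval, and callback wiring are illustrative; the real `voip_monitor.py` also requires microphone activity before flipping state):

```python
import time
from pycaw.pycaw import AudioUtilities

VOIP_PROCESSES = {"WhatsApp.exe", "Telegram.exe"}   # assumed watch list

def find_voip_sessions():
    """Return audio sessions belonging to known VoIP executables."""
    active = []
    for session in AudioUtilities.GetAllSessions():
        proc = session.Process                       # psutil.Process or None
        if proc is not None and proc.name() in VOIP_PROCESSES:
            active.append(session)
    return active

def poll(on_call_start, on_call_end, interval=1.0):
    """Fire the callbacks on state transitions only (simplified hysteresis)."""
    in_call = False
    while True:
        voip_audio = bool(find_voip_sessions())
        # The real monitor also requires microphone activity before marking "Active".
        if voip_audio and not in_call:
            in_call = True
            on_call_start()
        elif not voip_audio and in_call:
            in_call = False
            on_call_end()
        time.sleep(interval)
```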
Once triggered by the VoIP Monitor, the audio recorder captures audio from the system's loopback interface using the `soundcard` library (see the capture sketch below).
- Audio is captured in fixed temporal frames (default 2.5 to 3.0 seconds) at a 48 kHz sample rate.
- Frames are placed into a non-blocking queue for immediate processing, minimizing memory footprint and latency.
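A sketch of that capture loop under the defaults above, assuming the `soundcard` package; the queue size and function names are illustrative:

```python
import queue
import soundcard as sc

FRAME_SECONDS = 2.5      # default frame length described above
SAMPLE_RATE = 48000

frames = queue.Queue(maxsize=8)   # non-blocking hand-off to the inference workers

def record_call(stop_event):
    """Capture what the caller says via the default speaker's loopback device."""
    speaker = sc.default_speaker()
    loopback = sc.get_microphone(id=str(speaker.name), include_loopback=True)
    with loopback.recorder(samplerate=SAMPLE_RATE) as rec:
        while not stop_event.is_set():
            chunk = rec.record(numframes=int(FRAME_SECONDS * SAMPLE_RATE))
            mono = chunk.mean(axis=1)        # soundcard yields (frames, channels)
            try:
                frames.put_nowait(mono)      # drop the frame if the queue is full
            except queue.Full:
                pass
```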
The system processes audio frames using two parallel machine learning models to maximize accuracy and robustness against noise.
- Silence Pruning: Audio chunks exhibiting more than 40% silence (based on RMS thresholds) are discarded early to save compute cycles (see the sketch after this list).
- CNN Classifier (`tflite_inferencer.py`):
  - Extracts fixed-length Mel-spectrogram features (60 mel bands, 512-point FFT) representing 2.5 s of audio.
  - Normalizes the input to -20 dB RMS.
  - Runs a quantized TensorFlow Lite (`.tflite`) model in a dedicated background worker thread to yield an AI-likelihood probability.
- GMM Classifier (`gmm_inferencer.py`):
  - Extracts Linear Frequency Cepstral Coefficients (LFCC) from the audio frames.
  - Computes log-likelihood scores against two pre-trained Gaussian Mixture Models (`.pkl`): one modeling human speech and one modeling AI speech.
  - Converts the likelihood differential into a bounded confidence score via a sigmoid mapping.
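A condensed sketch of both inference paths, assuming 48 kHz mono float32 frames; the silence floor, the input layout of the quantized model, and the GMM hand-off are assumptions rather than the project's exact values:

```python
import numpy as np
import librosa
from tflite_runtime.interpreter import Interpreter   # or tf.lite.Interpreter

SILENCE_RMS = 1e-3        # assumed per-window RMS floor
MAX_SILENCE_RATIO = 0.4   # frames with >40% silent windows are discarded

def is_mostly_silent(frame: np.ndarray, win: int = 2048) -> bool:
    """Silence pruning: fraction of short windows whose RMS falls below the floor."""
    windows = [frame[i:i + win] for i in range(0, len(frame) - win, win)]
    silent = sum(np.sqrt(np.mean(w ** 2)) < SILENCE_RMS for w in windows)
    return silent / max(len(windows), 1) > MAX_SILENCE_RATIO

def normalize_rms(frame: np.ndarray, target_db: float = -20.0) -> np.ndarray:
    """Scale the frame so its overall RMS sits at the target level."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
    return frame * (10 ** (target_db / 20.0) / rms)

def cnn_score(frame: np.ndarray, interpreter: Interpreter, sr: int = 48000) -> float:
    """Mel-spectrogram (60 bands, 512-point FFT) -> TFLite CNN -> AI probability."""
    mel = librosa.feature.melspectrogram(y=normalize_rms(frame), sr=sr,
                                         n_fft=512, n_mels=60)
    mel_db = librosa.power_to_db(mel).astype(np.float32)
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Input layout (1, mels, time, 1) and a float32 signature are assumed here.
    interpreter.set_tensor(inp["index"], mel_db[np.newaxis, ..., np.newaxis])
    interpreter.invoke()
    return float(interpreter.get_tensor(out["index"]).squeeze())

def gmm_score(lfcc: np.ndarray, gmm_ai, gmm_human, scale: float = 1.0) -> float:
    """Log-likelihood differential between the two GMMs, squashed into [0, 1]."""
    diff = gmm_ai.score(lfcc) - gmm_human.score(lfcc)   # mean log-likelihood each
    return float(1.0 / (1.0 + np.exp(-scale * diff)))   # sigmoid mapping
```

Here the interpreter is assumed to be created from one of the root `.tflite` files (with `allocate_tensors()` already called), and the two GMMs loaded from `models/gmm/` via pickle or joblib; the LFCC extraction itself lives in `gmm_inferencer.py`.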
To prevent the UI from flickering wildly due to instantaneous false classifications, the DecisionEngine acts as the smoothing layer.
- Uses a `deque`-based rolling buffer (default capacity of 5 frames).
- Computes a weighted ensemble average of the CNN score (default weight ~60-70%) and the GMM score (default weight ~30-40%).
- Emits three possible classifications: `Human` (< 40% AI probability), `Suspicious` (40%–70%), and `AI Generated` (> 70%).
- Raises notification flags only on confirmed state transitions to avoid notification spam.
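A rough sketch of that smoothing and ensembling logic, using the weights and thresholds quoted above (the actual class layout in `decision_engine.py` may differ):

```python
from collections import deque

class DecisionEngine:
    def __init__(self, window=5, cnn_weight=0.65, thresholds=(0.40, 0.70)):
        self.scores = deque(maxlen=window)   # rolling buffer of ensemble scores
        self.cnn_weight = cnn_weight
        self.low, self.high = thresholds
        self.state = None

    def update(self, cnn_score: float, gmm_score: float):
        ensemble = self.cnn_weight * cnn_score + (1 - self.cnn_weight) * gmm_score
        self.scores.append(ensemble)
        avg = sum(self.scores) / len(self.scores)

        if avg < self.low:
            new_state = "Human"
        elif avg <= self.high:
            new_state = "Suspicious"
        else:
            new_state = "AI Generated"

        changed = new_state != self.state    # notify only on confirmed transitions
        self.state = new_state
        return new_state, avg, changed
```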
Provides crowd-sourced scammer protection.
- Extraction: Uses the `uiautomation` library to scan the Windows UI tree for active WhatsApp call windows, traversing up to 15 layers deep to extract the raw caller text.
- Validation & Privacy: Cleans the text and verifies that it is a pure E.164 phone number, which is then hashed (SHA-256) locally. Contact names (saved numbers) are ignored.
- Supabase Cloud Sync: Checks the hash against a remote Supabase Postgres instance via REST. If a match is found (`strike_count > 0`), the UI raises an immediate red warning. If the local Decision Engine deems the current call AI-generated, the system automatically upserts the hash to the cloud server at call termination, incrementing its `strike_count`.
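A sketch of the hash-and-lookup flow, assuming the supabase-py client and a hypothetical `blocklist` table with `number_hash` and `strike_count` columns; the real table and column names are not documented here:

```python
import hashlib
import os
import re
from typing import Optional

from supabase import create_client

E164 = re.compile(r"^\+[1-9]\d{1,14}$")   # approximate E.164 shape

def hash_caller(raw_text: str) -> Optional[str]:
    """Hash only well-formed phone numbers; saved contact names return None."""
    number = re.sub(r"[\s\-()]", "", raw_text)
    if not E164.match(number):
        return None
    return hashlib.sha256(number.encode("utf-8")).hexdigest()

client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def is_flagged(number_hash: str) -> bool:
    """True if the community blocklist already holds strikes for this hash."""
    rows = (client.table("blocklist")
                  .select("strike_count")
                  .eq("number_hash", number_hash)
                  .execute()
                  .data)
    return bool(rows) and rows[0]["strike_count"] > 0
```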
Tying the backend together is a premium, dark-mode customtkinter interface.
- Uses smooth, frame-rate-regulated `after()` events to animate a faux-waveform visualization driven by a sine-phase generator.
- Animates color transitions using linear interpolation (`lerp`) when buttons are clicked or when the confidence state crosses the safe (green), suspicious (yellow), or danger (red) zones.
- Includes OS-level push notifications (`plyer`) summarizing the call once it disconnects.
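An illustrative helper for the `lerp` color transitions; the hex colors and the way it is wired into the widgets are assumptions:

```python
def lerp_color(start_hex: str, end_hex: str, t: float) -> str:
    """Linearly interpolate between two '#RRGGBB' colors, t in [0, 1]."""
    s = [int(start_hex[i:i + 2], 16) for i in (1, 3, 5)]
    e = [int(end_hex[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(a + (b - a) * t) for a, b in zip(s, e)]
    return "#{:02x}{:02x}{:02x}".format(*mixed)

# e.g. lerp_color("#2ecc71", "#e74c3c", 0.5) blends halfway between the
# "safe" green and "danger" red as the confidence score rises.
```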
- `voice_call_detection_ui.py`: Main app entry point.
- `requirements.txt`: Python package dependencies.
- `project.md`: Project summary and documentation.
- Detailed sub-modules handling discrete business logic as described above (`audio_recorder.py`, `voip_monitor.py`, `decision_engine.py`, etc.).
- Root directory contains the latest `.tflite` CNN models.
- `models/gmm/`: Contains `gmm_ai.pkl` and `gmm_human.pkl`.
- `train_pipeline.py`, `augment.py`, `diagnose_dataset.py`, `cnn_classifier.py`: Standalone scripts to preprocess audio data, apply synthetic augmentations, and train the master `.h5` Keras models.
- `convert_tflite.py`: Compresses and quantizes the trained models into the production `.tflite` format.
- `batch_predict.py`: CLI testing script simulating bulk evaluations.
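A sketch of the `.h5` to quantized `.tflite` conversion performed by `convert_tflite.py`, assuming a standard Keras model and default dynamic-range quantization (the actual script may use different options, e.g. full-integer quantization with a representative dataset):

```python
import tensorflow as tf

# Load the trained Keras model (file name assumed for illustration).
model = tf.keras.models.load_model("cnn_classifier.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable quantization
tflite_bytes = converter.convert()

with open("cnn_classifier.tflite", "wb") as f:
    f.write(tflite_bytes)
```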
- Ensure Python 3.9+ is installed (Windows recommended).
- Clone the repository.
- Install dependencies via `pip install -r requirements.txt`.
- Ensure the `models/` directory contains at least one `.tflite` model and the required `.pkl` files in `models/gmm/`.
- Set `SUPABASE_URL` and `SUPABASE_KEY` in your `.env` (optional, for blocklist integration).
- Run `python voice_call_detection_ui.py`.