An Autonomous Multimodal AI Learning Companion featuring distraction-free video processing, Indic language dubbing, and mathematically rigorous exam proctoring.
In a world of short attention spans and passive consumption, we built a platform that forces active learning. This platform takes standard educational content (like YouTube videos) and transforms it into a highly structured, distraction-free, and accessible learning environment. Furthermore, it tests that knowledge using an uncompromising, AI-powered proctoring engine.
- Distraction-Free UX: Strips YouTube videos of comments, recommendations, and algorithmic traps using custom iframe parameters.
- Instant Knowledge Extraction: Bypasses manual note-taking by utilizing `youtube-transcript-api` to instantly extract video text.
- Groq Llama-3 AI Engine: Generates highly structured Markdown notes and interactive `Mermaid.js` mindmaps from the transcript at 800+ tokens per second.
- On-the-Fly Indic Dubbing: Translates the entire lecture to Hindi using Groq, synthesizes it into audio using `gTTS`, and perfectly synchronizes playback with the user's video controls.
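As a rough illustration of the step between transcript extraction and note generation, the sketch below groups transcript segments into timestamped, prompt-sized chunks. It assumes each segment is a dict with `text` and `start` keys; the exact return shape depends on the installed `youtube-transcript-api` version, so treat that as an assumption. The names `format_timestamp` and `chunk_transcript` and the `max_chars` limit are hypothetical, not taken from this repository.

```python
from typing import Dict, List, Union

# Assumed segment shape: {"text": "...", "start": 12.3}; verify against your library version.
Segment = Dict[str, Union[str, float]]


def format_timestamp(seconds: float) -> str:
    """Render a second offset as MM:SS, or H:MM:SS past one hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def chunk_transcript(segments: List[Segment], max_chars: int = 4000) -> List[str]:
    """Group timestamped lines into chunks of at most max_chars characters.

    A single line longer than max_chars still becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for seg in segments:
        text = str(seg["text"]).strip()
        if not text:
            continue  # skip empty captions
        line = f"[{format_timestamp(float(seg['start']))}] {text}"
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk can then be sent to the LLM as one prompt, and the timestamps let generated notes link back to a position in the video.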
- Head-Pose Tracking: Utilizes `@vladmandic/face-api` (TinyFaceDetector) running entirely client-side to detect if a student drops their gaze to look at a phone.
- Environment Locking: Native Browser APIs strictly enforce Fullscreen mode and track Page Visibility to instantly catch tab-switching or Googling attempts.
- Merciless Flagging: A built-in tolerance window filters out micro-flickers (momentary detection glitches), while sustained violations trigger UI warnings and repeated violations automatically submit the exam.
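The flagging behaviour can be sketched as a small state machine. The real proctoring logic runs client-side in the browser; the Python below only illustrates the decision rule, under the assumption that the tolerance window ignores violations shorter than a threshold. `ViolationTracker`, `tolerance_s`, and `max_strikes` are hypothetical names and values, not the project's actual settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViolationTracker:
    tolerance_s: float = 1.5         # hypothetical: shorter violations are ignored
    max_strikes: int = 3             # hypothetical: strikes before auto-submit
    strikes: int = 0
    _since: Optional[float] = None   # time the current violation started
    _counted: bool = False           # whether it already cost a strike

    def update(self, violating: bool, now: float) -> str:
        """Feed one detector sample; returns "ok", "warn", or "submit".

        "warn" is returned only at the moment a strike is recorded.
        """
        if self.strikes >= self.max_strikes:
            return "submit"  # exam is already locked
        if not violating:
            self._since, self._counted = None, False
            return "ok"
        if self._since is None:
            self._since = now
        if not self._counted and now - self._since >= self.tolerance_s:
            self._counted = True
            self.strikes += 1
            return "submit" if self.strikes >= self.max_strikes else "warn"
        return "ok"
```

Counting a strike once per sustained violation, rather than once per sample, keeps a single long glance away from exhausting all strikes at the detector's frame rate.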
- Mistakes made during proctored mock tests are not forgotten. They are logged into a MongoDB Vault and rescheduled based on the Ebbinghaus Forgetting Curve to optimize daily revisions.
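One common way to turn the Ebbinghaus Forgetting Curve into a schedule is to model retention as R(t) = exp(-t / S), where S is a per-item stability in days, and to review an item when predicted retention falls to a target level. The sketch below assumes that model; the function names, the 0.8 target, and the doubling rule for stability are illustrative assumptions and may differ from how the Vault actually schedules revisions.

```python
import math
from datetime import datetime, timedelta


def retention(days_elapsed: float, stability_days: float) -> float:
    """Predicted recall probability under R(t) = exp(-t / S)."""
    return math.exp(-days_elapsed / stability_days)


def next_review(
    last_review: datetime,
    stability_days: float,
    target_retention: float = 0.8,  # hypothetical threshold
) -> datetime:
    """Schedule the review for when retention decays to the target.

    Solving exp(-t / S) = target gives t = -S * ln(target).
    """
    interval_days = -stability_days * math.log(target_retention)
    return last_review + timedelta(days=interval_days)


def update_stability(
    stability_days: float,
    answered_correctly: bool,
    growth: float = 2.0,   # hypothetical: stability doubles on success
    initial: float = 1.0,  # hypothetical: reset value after a mistake
) -> float:
    """Grow stability after a correct answer, reset it after a mistake."""
    return stability_days * growth if answered_correctly else initial
```

With these defaults, a freshly missed question (S = 1 day) comes back after roughly 0.22 days, and the gap widens each time it is answered correctly.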
We utilized a robust microservice approach to ensure fault tolerance during the live demo:
- Frontend: React + Vite + Tailwind CSS.
- Backend: Python FastAPI for heavy asynchronous AI processing and data routing.
- Database: MongoDB (Motor Async Driver).
- AI Routing:
  - Groq API (Llama 3 70B): Dedicated exclusively to the heavy Multimodal Content Processor for blazing-fast text generation and translation.
  - Google Gemini 2.5 Flash: Powers the dynamic Quiz Generation and Vault summarization.
- Node.js (v18+)
- Python (3.10+)
- MongoDB (Local or Atlas URI)
```bash
git clone https://github.com/yourusername/your-repo-name.git
cd your-repo-name
```
Open a terminal and navigate to the backend folder:
```bash
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install fastapi uvicorn groq google-generativeai youtube-transcript-api gTTS motor pydantic
cp .env.example .env
```
Configure your .env file:
```env
MONGO_URI=mongodb://localhost:27017
GROQ_API_KEY=your_groq_key_here
GEMINI_API_KEY=your_gemini_key_here
```
Start the Server:
```bash
uvicorn main:app --reload --port 8000
```
Open a terminal and navigate to the frontend folder:
```bash
cd frontend
npm install
npm run dev
```
Ensure the TinyFaceDetector neural network weights are present.
Download tiny_face_detector_model-weights_manifest.json and tiny_face_detector_model.weights.bin and place them inside the frontend's public/models directory.
- The WASM Bundler War: We initially attempted to build our proctoring engine using WebGazer.js. However, modern Vite bundlers clashed violently with Google's MediaPipe WASM binaries, throwing continuous 404 errors. We engineered a hard pivot to `@vladmandic/face-api`, trading exact pupil-tracking for highly accurate head-pose (phone-checking) tracking, which integrated flawlessly with React.
- Audio-Video Synchronization: Making an AI-generated `.mp3` file pause, play, and seek perfectly in sync with a third-party YouTube Iframe required writing custom React `useRef` hooks to forcefully tether the HTML5 audio element's clock to the Iframe's emitted progress events.
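The synchronization rule itself is simple to state: mirror the video's play/pause state, and re-seek the audio only when it drifts past a threshold. The project implements this in React; the Python below is only a sketch of the decision logic. It assumes the dubbed audio shares the video's timeline, and `plan_sync`, `SyncCommand`, and the 0.3 s drift threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncCommand:
    set_playing: Optional[bool]  # True = play, False = pause, None = leave as is
    seek_to: Optional[float]     # audio position to jump to in seconds, or None


def plan_sync(
    video_time: float,
    audio_time: float,
    video_playing: bool,
    audio_playing: bool,
    max_drift_s: float = 0.3,    # hypothetical tolerance before re-seeking
) -> SyncCommand:
    """Decide how to bring the audio back in line with one video progress event."""
    # Mirror play/pause only when the two players disagree.
    set_playing = video_playing if video_playing != audio_playing else None
    # Re-seek only on noticeable drift, so small jitter does not cause stutter.
    drift = abs(video_time - audio_time)
    seek_to = video_time if drift > max_drift_s else None
    return SyncCommand(set_playing, seek_to)
```

A user scrubbing the video shows up as a large drift on the next progress event, so the same rule covers seeking without a separate code path.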
Built with late nights and a lot of caffeine by Team kaala khatta.
- Rahul Sahu
- Shourya Sinha
- Ashutosh Behera
Note: The backend is currently configured to use an instant-text testing route via youtube-transcript-api to avoid live stage-demo timeout risks associated with downloading raw YouTube audio.