Manasa Deshagouni ManasaDeshagouni

`~/about`

I'm Manasa, a Master's student in Computer Science at San José State University and a Graduate Research Assistant working on zero-shot malware attribution with metric-learned embeddings.

Previously, I spent 2+ years at Optum (UnitedHealth Group) shipping production ML, GenAI features, and backend systems for a secure file-transfer platform.

I care about the part of ML that actually gets used: retrieval, inference, guardrails, evaluation, rollout safety, and the systems that turn predictions into action.

🔬 Currently researching: zero-shot retrieval, metric learning, FAISS-based search
🧠 Interested in: production ML, GenAI/RAG, embeddings, backend systems
🤝 Open to: SWE, MLE, Applied ML, and Research Engineer roles

`projects/featured`

🧠 Digital Life Curator

  ╭──────────────────────────────────────────╮
  │  🔍  "What was that paper about          │
  │       attention mechanisms I read         │
  │       last month?"                        │
  │                                           │
  │  Found 3 results in 47ms                  │
  │  ├── 📄 attention_is_all_you_need.pdf     │
  │  │   ✦ 0.94 relevance · chunk 3/12       │
  │  ├── 📝 transformer_notes.md              │
  │  │   ✦ 0.89 relevance · tagged: #nlp     │
  │  └── 🖼️ architecture_diagram.png          │
  │      ✦ 0.81 relevance · CLIP matched     │
  ╰──────────────────────────────────────────╯

Your second brain, with semantic superpowers.

A local-first AI agent that ingests everything (PDFs, notes, receipts, images, code snippets) and makes it all searchable by meaning, not keywords. It combines multimodal embeddings (MiniLM for text, CLIP for images), a FAISS HNSW index, and adaptive retrieval with temporal reranking.

What it handles	How it performs
100k+ documents indexed	< 500ms retrieval
5 features: search, Q&A, summarize, rank, discover	Metadata filtering + temporal reranking
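The exact temporal reranking formula is not given above. As a minimal sketch, assume the similarity score from the index is blended with an exponential recency decay. The `Hit` type, the 30-day half-life, and the 0.2 blend weight are hypothetical values for illustration, not the project's settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    similarity: float  # score from the vector index, assumed to lie in [0, 1]
    age_days: float    # days since the document was last modified


def temporal_rerank(hits, half_life_days=30.0, recency_weight=0.2):
    """Reorder hits by a blend of similarity and recency (assumed form)."""
    def score(hit):
        # Recency halves every `half_life_days`; 1.0 for a brand-new document.
        recency = 0.5 ** (hit.age_days / half_life_days)
        return (1.0 - recency_weight) * hit.similarity + recency_weight * recency

    return sorted(hits, key=score, reverse=True)
```

With `recency_weight=0` this reduces to a plain similarity ranking, so the weight can be tuned per query type (for example, higher for "last month" style questions).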

🔔 NotifyOps

  03:14:22 ⚠  ALERT  disk_full on prod-db-03
  03:14:22 ⚠  ALERT  disk_full on prod-db-03   ← duplicate, suppressed
  03:14:23 ⚠  ALERT  disk_full on prod-db-03   ← duplicate, suppressed
  03:14:24 🔔 PAGED  @sarah (on-call: infra)    ← 1.9s from first alert
  03:14:26 ✅ ACK    @sarah acknowledged         ← 220ms ack→resolve
  
  ┌─────────────────────────────┐
  │ 3 alerts → 1 page → 1 ack  │
  │ 58% noise eliminated       │
  └─────────────────────────────┘

Pages the right engineer. Kills the noise.

Multi-tenant on-call alerting SaaS with real-time dedup, correlation, and idempotent workers. JWT/HMAC-secured ingest. React + WebSocket console for live ack/resolve. Runs standalone or as a pre-filter ahead of PagerDuty/Opsgenie.
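One way the HMAC-secured ingest and dedup steps could look, using only the Python standard library. This is a sketch under assumptions: the SHA-256 digest, the `(tenant, alert, host)` dedup key, and the 300-second sliding window are hypothetical choices, and `Deduplicator` keeps state in memory where a real multi-tenant service would likely use a shared store.

```python
import hashlib
import hmac
import time


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 signature over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, signature_hex)


class Deduplicator:
    """Suppress repeats of the same alert inside a sliding time window."""

    def __init__(self, window_seconds=300.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._last_seen = {}

    def should_page(self, tenant, alert_name, host):
        key = (tenant, alert_name, host)
        now = self.clock()
        last = self._last_seen.get(key)
        # Every sighting refreshes the window, so a steady stream stays quiet.
        self._last_seen[key] = now
        return last is None or (now - last) > self.window
```

In the `disk_full on prod-db-03` trace above, the first alert would return `True` from `should_page` and the two repeats would return `False`.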

Metric	Result
First-notify p95	1.9s @ 350 req/s
Duplicate suppression	58%
Delivery success	77% → 94% (retries + backoff + DLQ < 0.6%)
Ack→resolve p95	220ms
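The delivery row mentions retries, backoff, and a dead-letter queue (DLQ). A minimal sketch of that pattern follows, assuming capped exponential backoff with full jitter. The attempt count, base delay, and cap are hypothetical, and `send` stands in for whatever notification channel is used.

```python
import random
import time


def backoff_delays(max_attempts=4, base=0.5, cap=8.0, rng=random.random):
    """Delays (seconds) to wait before each retry: capped exponential, full jitter."""
    return [rng() * min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def deliver(message, send, dead_letters, max_attempts=4, sleep=time.sleep):
    """Try `send` up to `max_attempts` times; park the message in the DLQ on failure."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            if attempt < len(delays):
                sleep(delays[attempt])
    dead_letters.append(message)
    return False
```

Messages that land in `dead_letters` can be inspected or replayed later, which keeps a failing channel from blocking the worker.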

🎭 TruthReaper

          AUDIO                          TEXT
            │                              │
   ┌────────▼────────┐          ┌──────────▼──────────┐
   │  Acoustic feats  │          │  DistilBERT + cues  │
   │  → BiLSTM        │          │  → text embedding    │
   └────────┬────────┘          └──────────┬──────────┘
            │         confidence            │
            └──────────┐  ┌────────────────┘
                    ┌──▼──▼──┐
                    │ XGBoost │  ← late fusion
                    │  Fuser  │
                    └────┬───┘
                         │
                    ┌────▼────┐
                    │ TRUTH or │
                    │ DECEPTION│
                    └─────────┘
                    
   accuracy: 89.4%  ·  precision: 93.5%

Your voice says more than your words.

Multimodal deception detection that fuses what you say with how you say it. Temporal acoustic features encoded via BiLSTM, transcript text encoded via DistilBERT with explicit lexical/linguistic cues, late-fused through XGBoost for robust classification on short, noisy clips.
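Late fusion means each modality is scored on its own and only those scores are combined. The sketch below shows the shape of that step under assumptions: the feature layout (two probabilities plus their disagreement) is hypothetical, and `mean_fuser` is a plain placeholder standing in for the trained XGBoost fuser, not a reproduction of it.

```python
def fusion_features(audio_prob, text_prob):
    """Stack per-modality deception probabilities into one feature row."""
    # The disagreement term is an assumed extra feature, not from the project.
    return [audio_prob, text_prob, abs(audio_prob - text_prob)]


def late_fuse(audio_prob, text_prob, fuser, threshold=0.5):
    """Combine modality outputs with `fuser` and return the final label."""
    score = fuser(fusion_features(audio_prob, text_prob))
    return "DECEPTION" if score >= threshold else "TRUTH"


def mean_fuser(features):
    """Placeholder fuser: averages the two modality probabilities."""
    audio_prob, text_prob, _disagreement = features
    return (audio_prob + text_prob) / 2.0
```

Any callable that maps a feature row to a probability can be passed as `fuser`, which is where a trained gradient-boosted model would plug in.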

🎮 More builds → QuizChronicles

QuizChronicles

  ┌─ ROOM: "algo-arena" ──────── 4/10 players ─┐
  │                                              │
  │  🟢 alice    142 pts   solving Q3...         │
  │  🟢 bob      138 pts   submitted ✓  180ms   │
  │  🟡 charlie  120 pts   idle                  │
  │  🟢 you      155 pts   🏆 leading            │
  │                                              │
  │  ⏱️ 02:34 remaining                          │
  │  fan-out: 120ms p95 across 800 sockets       │
  └──────────────────────────────────────────────┘

Interactive coding + quiz platform with modular Spring Boot backend, sandboxed code execution, React + Monaco editor, real-time rooms, leaderboards, proctor controls over WebSockets, and a timed game-themed solo coding mode.

Metric	Result
Submission p95	180ms @ 200 users
Fan-out p95	120ms @ 800 sockets
Dropped updates	< 0.5%
Cache speedup	230 → 140ms (−39%)

`work/production`

Note: I can't open-source the proprietary code, but here's what I built and what it did.

⚡ Predictive Reliability Engine

Kafka → FastAPI+ONNX → Spring Boot gates
     scoring: p95 85ms @ 1.2k msgs/s

shadow mode → canary (14d, 0 FP) → prod
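The shadow → canary → prod progression can be expressed as a small promotion gate. This is a sketch only: the 14-day, zero-false-positive canary criterion comes from the line above, while the 7-day shadow period, the `StageStats` type, and the function name are hypothetical. Rollback from prod is not modeled.

```python
from dataclasses import dataclass


@dataclass
class StageStats:
    stage: str            # "shadow", "canary", or "prod"
    days_observed: int
    false_positives: int


def next_stage(stats, min_shadow_days=7, min_canary_days=14, max_false_positives=0):
    """Return the stage the model should be in after reviewing `stats`."""
    clean = stats.false_positives <= max_false_positives
    if stats.stage == "shadow":
        # Shadow criteria are an assumption; the source only describes canary.
        ready = clean and stats.days_observed >= min_shadow_days
        return "canary" if ready else "shadow"
    if stats.stage == "canary":
        ready = clean and stats.days_observed >= min_canary_days
        return "prod" if ready else "canary"
    return "prod"
```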

🤖 GenAI Product Features

config/logs → sanitize → RAG retrieve
  → LLM (LLaMA-2 / Mistral-7B + LoRA)
  → validate schema → serve
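The "validate schema" step guards against malformed model output before anything is served. A minimal sketch using only the standard library follows; the field names in `REQUIRED_FIELDS` are hypothetical, since the real schema is proprietary and not shown here.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"summary": str, "severity": str, "actions": list}


def validate_llm_output(raw_text):
    """Return the parsed dict if it matches the schema, otherwise None."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data
```

A `None` result can trigger a retry or a safe fallback response instead of passing unchecked text downstream.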

🔧 ECG Platform Services

20+ UIs + Spring Boot APIs
dual-schema rollout → 0 breakages
correlation IDs: UI→API→workers

`research/active`

🦠 Neural Fingerprints for Malware

Graduate Research Assistant @ SJSU

malware binary → image → ResNet encoder
  → L2 normalize → hypersphere embedding
  → FAISS HNSW → nearest family match

  train: 47 families (MalNet + MalImg)
  test:  17 unseen families (zero-shot)
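The last two pipeline steps (L2 normalization, then nearest-family lookup) can be sketched in plain Python. A brute-force scan stands in for the FAISS HNSW index here, and the family labels in the usage example are hypothetical. On unit vectors the inner product equals cosine similarity, and squared Euclidean distance is 2 − 2·cos, so both metrics rank neighbors identically.

```python
import math


def l2_normalize(vec):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def nearest_family(query, gallery):
    """Return (label, cosine similarity) of the closest gallery embedding.

    `gallery` is a list of (family_label, embedding) pairs.
    """
    q = l2_normalize(query)
    best_label, best_sim = None, -2.0  # cosine similarity is always >= -1
    for label, emb in gallery:
        sim = sum(a * b for a, b in zip(q, l2_normalize(emb)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim
```

Usage: `nearest_family([3.0, 0.1], [("family_a", [1.0, 0.0]), ("family_b", [0.0, 1.0])])` returns `"family_a"` as the label. Because lookup only needs embeddings of labeled reference samples, a new family can be added to the gallery without retraining the encoder.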

Learning domain-robust embeddings with ProxyAnchor, Triplet, and SupCon losses so unseen malware can be clustered and retrieved — without retraining.

Key insight: Strong in-domain separation ≠ cross-domain generalization. The bottleneck is representation stability under dataset shift, not loss function choice.


Metric	Result
Cross-domain retrieval	88.02% (MalNet→MalImg)
Strict zero-shot	57–67% (17 unseen families)
Best loss	ProxyAnchor

🏃 Cross-Domain HAR

Zero-shot Pocket Activity Recognition

4 source datasets (heterogeneous phones)
  → unified calibration pipeline
  → physics-aware temporal model
  → 3 zero-shot target datasets

  standing recall: 0% ──fix pipeline──→ ~99%

Multi-source domain adaptation for Sitting / Standing / Walking. Physics-aware calibrator auto-detects sampling rate, units, and orientation of unseen sensors.
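Two of the calibrator's checks (sampling rate and units) can be sketched with simple heuristics. These are assumptions for illustration, not the project's method: the rate is taken from the median timestamp gap, and the unit is guessed by whether the median acceleration magnitude sits closer to 1 (g) or 9.80665 (m/s²), which presumes gravity dominates the recording. Orientation detection is omitted.

```python
import math
from statistics import median

STANDARD_GRAVITY = 9.80665  # m/s^2


def estimate_sampling_rate_hz(timestamps_s):
    """Estimate sampling rate from the median gap between timestamps (seconds)."""
    deltas = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    return 1.0 / median(deltas)


def detect_accel_unit(samples):
    """Guess 'g' or 'm/s^2' from the median magnitude of (x, y, z) samples."""
    med = median(math.sqrt(x * x + y * y + z * z) for x, y, z in samples)
    return "g" if abs(med - 1.0) < abs(med - STANDARD_GRAVITY) else "m/s^2"


def to_m_per_s2(samples):
    """Convert accelerometer samples to m/s^2 if they appear to be in g."""
    scale = STANDARD_GRAVITY if detect_accel_unit(samples) == "g" else 1.0
    return [(x * scale, y * scale, z * scale) for x, y, z in samples]
```

Using the median rather than the mean keeps a few dropped samples or motion spikes from skewing either estimate.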

Key insight: Standing recall collapsed to ~0% from preprocessing-distribution mismatch — not model weakness. Fixing the pipeline fixed the model.


Metric	Result
Source-domain F1	94.1% (subject-disjoint)
Zero-shot transfer	~95.9% (UTwente)
Standing recovery	0% → ~99%

`papers/`

Year	Paper	Venue
🏆 2024	Brain Tumor Detection using Machine Learning	ICCDS 2024 · Best Paper Award
2023	Deep Learning Techniques for Detection of Deepfakes	IJSRSET (ICSCR 2023)

`stack.yml`

languages:
  - Java
  - Python
  - Go
  - C++
  - TypeScript
  - SQL

ml_and_ai:
  - PyTorch
  - TensorFlow
  - Transformers
  - ONNX Runtime
  - FAISS
  - PEFT / LoRA
  - LangChain
  - scikit-learn
  - XGBoost

backend_and_systems:
  - Spring Boot
  - Spring Security
  - FastAPI
  - Kafka
  - Redis
  - PostgreSQL
  - MongoDB
  - Docker
  - Kubernetes
  - GCP
  - AWS

frontend:
  - React
  - Angular
  - Tailwind
  - Vite

observability_and_testing:
  - Grafana
  - Prometheus
  - Selenium / Cucumber
  - JUnit
  - k6

"Research matters. Production proves it."
