saikat107/README.md

🌟 Profile Summary

saikat107

👋 About me

Hi, I am Saikat, a Senior Researcher at the Research in Software Engineering (RiSE) group at Microsoft Research, working on reliability of large language models for code and post-training. I bring 10 years of experience in training and evaluating code models, with a focus on improving the correctness and fidelity of generated programs under real-world constraints.

My work guides code generation models through static and dynamic correctness signals (via tests, program analysis, and verification) and uses these signals through fine-tuning and reinforcement learning. I view reliability as fundamentally a training problem, driven by structured and composable feedback.

Earlier, I graduated with a Ph.D. in Computer Science from Columbia University, advised by Professor Baishakhi Ray. I wrote my Ph.D. thesis on Learning to Edit Code.

🌐 Website: saikatc.info

👀 Research Focus

  • Post-training for code: SFT, RLHF/GRPO, reward modeling, reranking, retrieval-augmented fine-tuning
  • Correctness supervision: Reward design using test generation, execution feedback, mutation testing, specification inference, and program analysis
  • Agent-driven testing: DeepTest, which combines symbolic analysis + LLMs for testing production code at scale
  • Formal verification: DeepProof, post-training models for theorem proving (F*, Rocq, Lean)
  • Systems: PyTorch, Megatron-LM, Ray, distributed GPU clusters, Kubernetes

📒 Selected Highlights

  • πŸ† ICSE'25 β€” Neural Synthesis for Proof-Oriented Programming [Distinguished Paper Award]
  • πŸ† ISSTA'23 β€” Contrastive Learning for Code Understanding [Distinguished Paper Award]
  • πŸ“„ ACL'25 β€” Teaching an Old LLM Secure Coding via Localized Preference Optimization
  • πŸ“„ EMNLP'23 β€” Ranking LLM-Generated Loop Invariants
  • πŸ“„ ICSE'24 β€” Causal Learning for Code Understanding
  • πŸ“„ FSE'22 β€” NatGen: Semantic Rewriting for Pretraining of Code Models
  • πŸ“„ NAACL'21 β€” Unified Pretraining for Code Understanding and Generation

πŸ‘ Open to Collaboration

  • Use of LLMs for program synthesis, editing, and verification
  • Reinforcement learning with execution and correctness feedback for code
  • Formal methods meet machine learning (proof generation, specification mining)

✨ Connect with me

Website   Google Scholar   LinkedIn   Twitter

Pinned

  1. Journal (Public)

    A lightweight, self-hosted scrum journal for daily standups: log updates, blockers, and asks with GitHub integration, AI-powered summaries, and multi-database support (SQLite, PostgreSQL, MySQL, M…

    Python

  2. NatGen (Public template)

    Python · 41 stars · 10 forks

  3. microsoft/NeuralInvariantRanker (Public)

    Ranking LLM-Generated Loop Invariants for Program Verification.

    Slash · 12 stars · 5 forks

  4. Devign (Public)

    Python · 89 stars · 34 forks

  5. C-Code-Slicer (Public)

    Python · 21 stars · 9 forks