I study LLM safety and evaluation, with a focus on process-level diagnosis of instruction-following failures and refusal dynamics.

I am an undergraduate at Chung-Ang University, majoring in Art and Technology with a convergence major in Cyber Security. My work asks when, where, and how LLM failures form before they appear in final outputs. I like projects that turn research questions into reproducible evaluation pipelines, diagnostic signals, and working systems.
- LLM safety and evaluation
- Logit dynamics, refusal/compliance margins, and early-token behavior
- Jailbreak and instruction-following failure analysis
- Benchmark design, LLM-as-Judge evaluation, and automated experiment pipelines
- Applied AI systems, privacy-aware ML, and retrieval-based reasoning

| Project | Description |
|---|---|
| Logit-Margin_Score | Temporal analysis of LLM safety activation via logit-margin scores. |
| Persona_Attack | Jailbreak experiments against LLMs using incremental memory injection. |
| GraphRAG | Graph-based retrieval framework for financial-security regulation interpretation. |
| FinSec-LLM-PostTraining | RAG and QLoRA post-training pipeline for Korean financial-security and regulatory QA. |
| SafeAI_FInal | Machine unlearning experiment on privacy, fairness, and eye-coordinate regression. |
| AutoValetParking | Centralized autonomous valet parking simulation with reservation-based path planning. |

- I prefer benchmark design and reproducible workflows over one-off results.
- I keep model weights, private datasets, generated artifacts, and credentials out of public repositories.
- I document assumptions around data, evaluation conditions, metrics, and compute constraints.
- Portfolio: [2betforyou.github.io](https://2betforyou.github.io)
- GitHub: [github.com/2betforyou](https://github.com/2betforyou)
