🧑‍🦯
Poking around in current projects



Christian Lysenstøen

Second-year AI student at Inland Norway University of Applied Sciences, currently on exchange at UC Berkeley. I build ML systems that work under real-world constraints — crash-prone hardware, tight budgets, noisy quantum backends — and I publish the results honestly, including when things don't work.

Papers on arXiv:

  • Training-Free Lexical–Dense Fusion for Conversational-Memory Retrieval — Training-free, CPU-only score-level fusion of BM25 with turn-level late-interaction dense retrieval on the LoCoMo conversational-memory benchmark. 0.752 Hit@1 vs. 0.640 for BM25 (+8.8–17.2 pp over late interaction alone across six encoders). Includes the negative results: a cross-encoder reranker hurts, and the gain fades on LongMemEval-S. (arXiv:2606.04194, June 2026)
  • Feasible-First Exploration for Constrained ML Deployment Optimization — Crash-aware TBA→TPE hybrid optimizer. 80% discovery rate of the globally optimal model vs. 30% for standalone TPE; reduced wasted trials from 74% to 42%. Benchmarked on DeployBench: 5 architectures × 3 backends × 3 quantizations × 6 batch sizes across 5 NVIDIA GPUs (H100, A100, RTX 5080, L4, T4), 10 seeds each. 46/46 tests passing. (arXiv:2604.25073, April 2026)
  • SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving — Two-phase TBA→TPE optimizer for vLLM tuning. 150-trial A100 study: 75/75 feasibility, zero crashes; statistically tied with random search on best latency (p=0.84) but 4.4× tighter cross-seed variance under concurrent load. (arXiv:2604.17627, April 2026)
  • Hidden Device Heterogeneity in Constrained ML Deployment — PyTorch's INT8 dynamic quantization silently moves inference from the GPU to the CPU, producing a 39% feasibility flip rate. (submitted April 2026)

Currently working on

  • Expanding multi-GPU benchmark results for the TBA deployment optimizer (H100, A100, RTX 5080, L4, T4)
  • Waiting on D-Wave LaunchPad QPU access to complete the quantum annealing benchmark
  • Cross-hardware validation of SLO-Guard on non-A100 GPUs
  • Iterating on the personal-assistant retrieval stack — speaker-aware ranking and operational event-log feedback as signals for retrieval quality
  • Extending the lexical–dense fusion retrieval study (opsem, arXiv:2606.04194) — encoder scaling and additional conversational-memory benchmarks beyond LoCoMo and LongMemEval-S
  • Building Solon, an autonomous ML-research agent — verification-first architecture (skeleton-constrained authoring, receipt-traced claims, a 2σ + out-of-sample-persistence credibility gate) with a FunSearch-style evolutionary search engine, MAP-Elites quality-diversity, and a holdout gate against selection-bias overfit
  • Coursework at UC Berkeley (CS 61C, concurrent with research)

Coursework (UC Berkeley, Spring 2026)

Enrolled:

  • CS 61C — Great Ideas in Computer Architecture
  • EECS 127 — Optimization Models in Engineering
  • ENGIN 183 — Technology Innovation and Entrepreneurship
  • ASTRON C12 — The Planets (astrophysics breadth)

Auditing:

  • CS 152/252A — Computer Architecture and Engineering
  • CS 170 — Efficient Algorithms and Intractable Problems
  • CS 185/285 — Deep Reinforcement Learning, Decision Making, and Control
  • CS 61B — Data Structures

Focus areas: systems architecture, optimization theory, and deep RL — chosen to complement my ML deployment research with hardware-level understanding and formal optimization foundations.


Research interests

ML systems and deployment optimization — How do you find the best inference configuration (backend, quantization, batch size) when most of the search space crashes or violates constraints? I built a two-phase optimizer (Thermal Budget Annealing → constrained TPE) that treats crashes as data and maps feasible regions before exploiting them. I also packaged one finding from that work into deploy-doctor — a small PyTorch CLI that catches the silent failure where an int8 "GPU" model actually runs on the CPU.
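A minimal sketch of the feasible-first pattern described above, under loudly stated assumptions: the search space, the crash rules, and the latency model inside the hypothetical `evaluate` function are all invented for illustration. Phase 1 is an annealing walk that keeps every crash as a recorded data point; phase 2 only proposes neighbours of the best feasible configs found so far. This is not the TBA or TPE code (there is no density model here), only the two-phase shape.

```python
import math
import random

# Toy search space (assumed for illustration): backend x quantization x batch size.
SPACE = {
    "backend": ["pytorch", "onnxruntime", "tensorrt"],
    "quant": ["fp32", "fp16", "int8"],
    "batch": [1, 2, 4, 8, 16, 32],
}
KEYS = list(SPACE)


class Crash(Exception):
    """Raised by the hypothetical evaluator when a config fails to run."""


def evaluate(cfg):
    """Hypothetical benchmark stand-in: returns latency in ms or raises Crash."""
    if cfg["quant"] == "int8" and cfg["backend"] == "tensorrt":
        raise Crash("unsupported kernel")  # invented crash rule
    if cfg["batch"] >= 16 and cfg["quant"] == "fp32":
        raise Crash("out of memory")  # invented crash rule
    base = {"pytorch": 10.0, "onnxruntime": 7.0, "tensorrt": 5.0}[cfg["backend"]]
    scale = {"fp32": 1.0, "fp16": 0.6, "int8": 0.5}[cfg["quant"]]
    return base * scale + 0.3 * cfg["batch"]


def neighbour(cfg, rng):
    """Change one dimension of the config to a different random value."""
    new = dict(cfg)
    key = rng.choice(KEYS)
    new[key] = rng.choice([v for v in SPACE[key] if v != cfg[key]])
    return new


def run(cfg, history):
    """Evaluate once; a crash is stored as latency=None instead of being dropped."""
    key = tuple(cfg[k] for k in KEYS)
    if key not in history:
        try:
            history[key] = evaluate(cfg)
        except Crash:
            history[key] = None
    return history[key]


def feasible_first_search(budget=40, explore_frac=0.5, seed=0):
    """budget counts proposals; repeated configs are answered from history."""
    rng = random.Random(seed)
    history = {}
    cfg = {k: rng.choice(v) for k, v in SPACE.items()}
    cur = run(cfg, history)
    n_explore = int(budget * explore_frac)
    # Phase 1: annealing walk that maps feasible and crashing regions.
    for step in range(n_explore):
        temp = max(1e-3, 1.0 - step / n_explore)
        cand = neighbour(cfg, rng)
        val = run(cand, history)
        if val is None:
            continue  # crash: remembered in history, the walk stays put
        if cur is None or val < cur or rng.random() < math.exp((cur - val) / temp):
            cfg, cur = cand, val
    # Phase 2: exploit around the three best feasible configs seen so far.
    for _ in range(budget - n_explore):
        feasible = sorted((v, k) for k, v in history.items() if v is not None)
        if not feasible:
            break
        _, best_key = feasible[rng.randrange(min(3, len(feasible)))]
        run(neighbour(dict(zip(KEYS, best_key)), rng), history)
    feasible = {k: v for k, v in history.items() if v is not None}
    best = min(feasible, key=feasible.get) if feasible else None
    crash_rate = sum(v is None for v in history.values()) / len(history)
    return best, crash_rate


best, crash_rate = feasible_first_search()
```

Keeping crashes in `history` is the design point: the walk never re-pays for a known crash, and the crash rate falls out of the same record.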

Quantum-classical benchmarking — Fair comparisons between classical simulated annealing, D-Wave quantum annealing, and QAOA on IBM hardware. I design standardized solver interfaces and report negative results when the hypothesis doesn't hold.

Agentic AI and applied CV — LLM-powered tool-calling agents for domain-specific automation, and competition-grade object detection pipelines with ONNX inference and ensemble methods. The same applied-CV thread runs through an honest exploration framework for underwater aquaculture net-damage detection, where I worked the full pipeline end to end: foundation-model and self-supervised anomaly detection (PatchCore, DINOv2, from-scratch SimCLR), synthetic-to-real domain-gap handling, adversarial and out-of-distribution evaluation, temporal video reasoning, and ONNX/FastAPI deployment — with, deliberately, no validated real-world claims, since all damage is synthetic (net-inspection-cv, private repo).
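A stripped-down sketch of PatchCore-style scoring, assuming patch features have already been extracted by a frozen backbone (the toy 2-D vectors below stand in for them). The memory bank of intact-net patches is subsampled with k-center greedy selection, and a test image scores as its largest nearest-neighbour distance. PatchCore's random-projection speed-up and score reweighting are omitted, and none of this is taken from the private repo.

```python
import math


def greedy_coreset(features, k):
    """k-center greedy: repeatedly keep the feature farthest from everything kept so far."""
    selected = [features[0]]
    min_d = [math.dist(f, features[0]) for f in features]
    while len(selected) < min(k, len(features)):
        idx = max(range(len(features)), key=min_d.__getitem__)
        selected.append(features[idx])
        min_d = [min(d, math.dist(f, features[idx])) for d, f in zip(min_d, features)]
    return selected


def anomaly_score(test_patches, memory_bank):
    """Per-patch nearest-neighbour distance to the bank; the image score is the maximum."""
    patch_scores = [min(math.dist(p, m) for m in memory_bank) for p in test_patches]
    return max(patch_scores), patch_scores  # patch scores double as a coarse anomaly map


# Toy 2-D "features" from intact-net patches (real ones would be high-dimensional).
bank = greedy_coreset([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)], k=3)
intact_score, _ = anomaly_score([(0.05, 0.0), (0.0, 0.05)], bank)
torn_score, _ = anomaly_score([(0.05, 0.0), (1.0, 1.0)], bank)
```

Because the bank only contains nominal patches, no damage labels are needed; that is what makes this family of detectors label-free.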

Personal AI and grounded retrieval — How does a local-first assistant remember what matters across years of notes and conversations, and ground its answers without hallucinating? I'm building one (private repo): JARVIS, a voice-driven personal "cortex" running entirely on a local Windows machine — an always-listening wake-word HUD, a dual-brain runtime that hot-swaps between the Claude API and a local Ollama model for a zero-network privacy mode, 60+ tools for actually controlling the computer (apps, shell, media, mail/calendar), and a markdown memory vault rendered as a navigable 3D memory atlas. Under the hood: a hybrid BM25 + dense-embedding + reciprocal-rank-fusion retrieval stack, a classifier-driven subsystem distilling raw conversation logs into a queryable knowledge vault, and a fixed-query eval harness so retrieval changes are measured, not hand-waved. The retrieval recipe is published separately as opsem (arXiv:2606.04194).
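Reciprocal rank fusion itself fits in a few lines. This is a generic sketch, not JARVIS code: `k = 60` is the constant from the original RRF paper and a common default, and the note ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["note_3", "note_1", "note_7"]
dense_ranking = ["note_1", "note_9", "note_3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

RRF only uses ranks, so BM25 and embedding scores never need to share a scale; a document ranked well by both retrievers (`note_1` here) rises to the top.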

Autonomous research agents and verifiable AI discovery — Can an AI agent run the scientific loop end to end — propose, implement, evaluate, and report a result you can actually trust? I'm building one (private repo): Solon, a verification-first autonomous ML-research agent. Since most LLM-agent ML results are fabricated or invalidated, the writer can't invent numbers: every metric is parsed from real stdout, every claim traces to a reproducibility receipt, and a credibility gate certifies an effect only if it clears 2σ and survives fresh seeds. On that spine sits a FunSearch/AlphaEvolve-style evolutionary engine — a MAP-Elites archive of diverse "stepping stones" plus verified-fragment memory, so discoveries compound across runs. Pointed at the real LoCoMo benchmark (same as opsem), it produced an honest null: a holdout gate caught a seed-lucky +14 pp Hit@1 that reversed to −2.3 pp on unseen seeds — exactly the selection-bias overfit it exists to stop. The lesson: the bottleneck isn't the model, it's the objective and the rigor.
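A minimal sketch of a gate with that shape. The exact criteria are assumptions, not Solon's code: "clears 2σ" is read as a mean per-seed improvement larger than two standard errors on the search seeds, and "survives fresh seeds" as a positive mean improvement on held-out seeds. The per-seed numbers below are made up.

```python
import statistics


def clears_two_sigma(deltas):
    """True if the mean per-seed improvement exceeds 2 standard errors (assumed reading of '2σ')."""
    if len(deltas) < 2:
        return False
    se = statistics.stdev(deltas) / len(deltas) ** 0.5
    return statistics.mean(deltas) > 2 * se


def credibility_gate(search_deltas, holdout_deltas):
    """Certify only if the effect is significant on search seeds AND keeps its sign on unseen seeds."""
    if not clears_two_sigma(search_deltas):
        return "rejected: below 2σ on search seeds"
    if statistics.mean(holdout_deltas) <= 0:
        return "rejected: did not persist on holdout seeds"
    return "certified"


# Made-up per-seed deltas (pp of Hit@1): strong on search seeds, reversed on fresh ones.
verdict = credibility_gate([9.0, 11.0, 10.0], [-1.0, -2.0])
```

The holdout check is what catches a seed-lucky result: selection over many candidates can push an effect past 2σ on the seeds it was selected on, but cannot make it persist on seeds it never saw.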


Pinned repositories

Constrained-ML-Deployment Two research papers sharing the DeployBench infrastructure. (1) TBA: crash-aware two-phase optimizer for constrained ML deployment. (2) Hidden Device Heterogeneity: empirical study showing INT8 dynamic quantization silently moves inference to CPU, creating stochastic feasibility boundaries. 2,150 measurement trials, 5 GPU types, full reproducibility.

SLO-Guard Crash-aware autotuner for vLLM serving. Optimizes vLLM configs (batching, memory, execution mode) under hard latency/memory SLOs. Crashes are encoded as constraint violations and replayed into a warm-started TPE phase, so failed trials inform subsequent search. 150-trial A100 study on Qwen2-1.5B: 75/75 feasibility, zero crashes; statistically tied with random search on best latency (Mann-Whitney p=0.84) but with 4.4× tighter cross-seed variance under concurrent load. Paper at arXiv:2604.17627. Both sequential and concurrent harness datasets published for replication.
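A small sketch of the crash-as-constraint encoding, assuming a per-trial violation score where values at or below zero mean feasible (the same sign convention Optuna's `constraints_func` uses). The sentinel value, the relative-overshoot formula, and all names here are illustrative assumptions, not SLO-Guard's code. In Optuna itself, such records could be replayed into a new study with `optuna.trial.create_trial` and `Study.add_trial`.

```python
from dataclasses import dataclass
from typing import Optional

CRASH_VIOLATION = 1e6  # assumed sentinel: a crash is maximally infeasible but still a data point


@dataclass
class TrialRecord:
    config: dict
    latency_ms: Optional[float]  # None when the server crashed
    peak_mem_gb: Optional[float]
    violation: float  # <= 0 feasible; > 0 SLO broken or crash


def encode_trial(config, result, slo_latency_ms, slo_mem_gb):
    """Turn a raw measurement, or a crash (result=None), into a constrained-search record."""
    if result is None:
        return TrialRecord(config, None, None, CRASH_VIOLATION)
    latency, mem = result
    # Violation = worst relative overshoot across SLOs (negative means slack remains).
    violation = max(latency / slo_latency_ms - 1.0, mem / slo_mem_gb - 1.0)
    return TrialRecord(config, latency, mem, violation)


def split_for_replay(records):
    """Partition records for warm-starting the next phase; nothing is discarded."""
    feasible = sorted((r for r in records if r.violation <= 0), key=lambda r: r.latency_ms)
    infeasible = [r for r in records if r.violation > 0]
    return feasible, infeasible
```

Encoding a crash as a large violation, rather than skipping the trial, lets the next phase learn where the crash region is instead of sampling it again.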

opsem Reproduction code and paper for Training-Free Lexical–Dense Fusion for Conversational-Memory Retrieval. Fuses BM25 with turn-level late-interaction (max-sim over per-turn vectors) dense retrieval at the score level — no training, runs on CPU, one leave-one-conversation-out weight. On LoCoMo: 0.752 Hit@1 vs. 0.640 BM25, +8.8–17.2 pp over late interaction alone across six encoders. Every number in the paper has a JSON + Markdown receipt; honest leave-one-conversation-out cross-validation throughout. Paper at arXiv:2606.04194.
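A minimal sketch of the fusion recipe in pure Python. Assumptions, none taken from the paper: candidates are sessions made of turns, each turn already has a nonzero embedding, BM25 scores are precomputed, and both score lists are min-max normalized per query before the weighted sum. The weight `w` is picked by grid search on every conversation except the held-out one, which keeps the evaluation leave-one-conversation-out.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def max_sim(query_vec, turn_vecs):
    """Turn-level late interaction: a session scores as its best-matching turn."""
    return max(cosine(query_vec, t) for t in turn_vecs)


def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def fuse(bm25_scores, dense_scores, w):
    """Score-level fusion: w * BM25 + (1 - w) * dense, after per-query normalization."""
    return [w * b + (1 - w) * d for b, d in zip(min_max(bm25_scores), min_max(dense_scores))]


def hit_at_1(queries, w):
    """queries: list of (bm25_scores, dense_scores, gold_index)."""
    hits = 0
    for bm25, dense, gold in queries:
        fused = fuse(bm25, dense, w)
        hits += max(range(len(fused)), key=fused.__getitem__) == gold
    return hits / len(queries)


def loco_weight(conversations, held_out, grid=None):
    """Choose w on every conversation except `held_out` (leave-one-conversation-out)."""
    grid = grid or [i / 10 for i in range(11)]
    train = [q for name, qs in conversations.items() if name != held_out for q in qs]
    return max(grid, key=lambda w: hit_at_1(train, w))
```

Nothing here is trained: the only fitted quantity is the single scalar `w`, which is why the recipe can run on CPU.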

deploy-agent Productized version of TBA. CLI + FastAPI dashboard + MCP server for automated ML deployment optimization. Given a model and hardware constraints, it searches backends, quantization settings, and batch sizes, then returns the best feasible config with full evidence. Crash handling, structured JSON logs, live WebSocket charts.

deploy-doctor A small PyTorch CLI that flags silent device-placement footguns — e.g. an int8 model that quietly runs on the CPU instead of the GPU you asked for. GPU-free static diagnosis, CI-friendly. MIT.

dwave-benchmark Classical SA vs D-Wave quantum annealing on Max-Cut and spin glass problems. Phase 1 complete: all classical solvers converge to identical solutions up to n=500 with zero quality gap. Phase 2 (QPU) pending D-Wave access. Common solver interface, reproducible seeds, timing analysis showing Neal SA ~400x faster than pure Python SA.
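A sketch of what a common solver interface can look like, assuming spins in {-1, +1} and weighted edge lists; the `MaxCutSolver` protocol and both classes are hypothetical, not the repo's API. The pure Python SA uses single-spin flips with a geometric cooling schedule, and a brute-force reference solver with the same `solve` signature makes quality gaps directly measurable on small graphs. A compiled sampler such as Neal, or a QPU sampler, would sit behind the same signature through a thin wrapper (not shown).

```python
import itertools
import math
import random
from typing import Protocol

Edge = tuple[int, int, float]  # (u, v, weight)


class MaxCutSolver(Protocol):
    """Shared interface: every solver maps a graph to a +/-1 spin assignment."""

    def solve(self, n: int, edges: list[Edge], seed: int) -> list[int]: ...


def cut_value(spins, edges):
    """Total weight of edges whose endpoints sit on opposite sides of the cut."""
    return sum(w for u, v, w in edges if spins[u] != spins[v])


class PurePythonSA:
    """Single-spin-flip simulated annealing with geometric cooling; keeps the best state seen."""

    def __init__(self, sweeps=200, t_start=2.0, t_end=0.05):
        self.sweeps, self.t_start, self.t_end = sweeps, t_start, t_end

    def solve(self, n, edges, seed):
        rng = random.Random(seed)
        adj = [[] for _ in range(n)]
        for u, v, w in edges:
            adj[u].append((v, w))
            adj[v].append((u, w))
        spins = [rng.choice((-1, 1)) for _ in range(n)]
        cur = cut_value(spins, edges)
        best, best_val = spins[:], cur
        for sweep in range(self.sweeps):
            t = self.t_start * (self.t_end / self.t_start) ** (sweep / max(1, self.sweeps - 1))
            for i in range(n):
                # Flipping i cuts its same-side edges (+w) and uncuts its cut edges (-w).
                gain = sum(w if spins[i] == spins[j] else -w for j, w in adj[i])
                if gain >= 0 or rng.random() < math.exp(gain / t):
                    spins[i] = -spins[i]
                    cur += gain
                    if cur > best_val:
                        best, best_val = spins[:], cur
        return best


class BruteForce:
    """Exact reference for tiny graphs; same interface, so gaps are directly comparable."""

    def solve(self, n, edges, seed):
        return list(max(itertools.product((-1, 1), repeat=n), key=lambda s: cut_value(s, edges)))


ring = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
for solver in (PurePythonSA(), BruteForce()):
    print(type(solver).__name__, cut_value(solver.solve(4, ring, seed=0), ring))
```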

qaoa-benchmark Negative-result study: budget-aware classical optimizers vs COBYLA/SPSA for QAOA parameter tuning under noisy simulation. Finding: shallow QAOA landscapes are too smooth — COBYLA with fixed defaults matched or beat the learning optimizer. 375 total runs, 3 graphs, 5 budget levels, 5 seeds.
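For reference, SPSA itself is short. The sketch below uses Spall's commonly recommended gain exponents (0.602 and 0.101); the remaining constants are assumptions, and a toy noisy quadratic over two angles stands in for a noisy shallow-QAOA energy, so nothing here touches Qiskit or reproduces the study.

```python
import random


def spsa_minimize(f, x0, iterations=200, a=0.2, c=0.1, A=20, alpha=0.602, gamma=0.101, seed=0):
    """Simultaneous Perturbation Stochastic Approximation: two evaluations per step in any dimension.
    A is set near 10% of the iteration budget, following Spall's rule of thumb."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iterations):
        ak = a / (k + 1 + A) ** alpha  # decaying step size
        ck = c / (k + 1) ** gamma  # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in x]  # Rademacher perturbation
        plus = f([xi + ck * di for xi, di in zip(x, delta)])
        minus = f([xi - ck * di for xi, di in zip(x, delta)])
        g = [(plus - minus) / (2 * ck * di) for di in delta]  # stochastic gradient estimate
        x = [xi - ak * gi for xi, gi in zip(x, g)]
    return x


noise_rng = random.Random(1)


def noisy_energy(x):
    """Toy stand-in for a noisy QAOA energy (assumed smooth landscape, minimum at (0.5, -0.3))."""
    return (x[0] - 0.5) ** 2 + (x[1] + 0.3) ** 2 + noise_rng.gauss(0.0, 0.01)


x_best = spsa_minimize(noisy_energy, [0.0, 0.0])
```

On a landscape this smooth, almost any reasonable optimizer converges, which is consistent with the repo's finding that a smarter budget-aware optimizer had little room to help.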

net-inspection-cv (private repo) Honest, research-grade framework for flagging damage (holes/tears) in aquaculture net footage. Benchmarks five detectors (classical, anomaly, label-free PatchCore, supervised YOLOv8 detect/seg, and a det∧seg ensemble), closing the synthetic-to-real gap by compositing labelled damage onto real SINTEF SOLAQUA ROV frames — localisation F1 0.12 → 0.50 → 0.78 → 0.97. Adversarial "is it cheating?" eval, OOD review gate, temporal confirmation, SSL backbone ablation (DINOv2 / from-scratch SimCLR), ROS-bag ingestion, ONNX export, and a FastAPI/Streamlit service. Reports its failures too, and claims no validated real-world numbers, since all damage is synthetic.
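One simple way to implement temporal confirmation, assuming a k-of-n rule over per-frame detections (the repo's actual rule and window sizes are not stated here): an alert fires only when damage was detected in at least `k` of the last `n` frames, which suppresses single-frame flickers from particles or glare.

```python
from collections import deque


class TemporalConfirmer:
    """Confirm damage only if it was detected in at least k of the last n frames (assumed rule)."""

    def __init__(self, k=3, n=5):
        if not 1 <= k <= n:
            raise ValueError("need 1 <= k <= n")
        self.k = k
        self.window = deque(maxlen=n)  # oldest frame drops out automatically

    def update(self, detected: bool) -> bool:
        self.window.append(bool(detected))
        return sum(self.window) >= self.k


confirmer = TemporalConfirmer(k=3, n=5)
alerts = [confirmer.update(d) for d in [True, True, False, True, True]]
```

Raising `k` trades detection latency for fewer false alarms; the window length bounds how stale a supporting detection can be.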

Norwegian-AI-Championship NM i AI 2026 competition entry (Team INNBerkeley). Three tasks: YOLOv8x object detection with ONNX inference, multi-scale TTA, and WBF ensembling; a FastAPI accounting agent using Gemini 2.5 Flash + Tripletex API; and an A* pathfinding agent for Norse world prediction.
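A simplified, single-class sketch of weighted boxes fusion (WBF) as commonly described: boxes from all models are sorted by confidence, clustered greedily by IoU with each cluster's current fused box, fused by confidence-weighted coordinate averaging, and finally down-weighted when fewer models than the ensemble size contributed. The 0.55 IoU threshold is a common default, not necessarily the team's setting, and a real pipeline would more likely use an existing implementation such as the `ensemble_boxes` package.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def weighted_boxes_fusion(predictions, iou_thr=0.55):
    """predictions: one list per model of (x1, y1, x2, y2, score), single class, scores > 0."""
    n_models = len(predictions)
    boxes = sorted((b for model in predictions for b in model), key=lambda b: -b[4])
    clusters = []  # each: {"members": [...], "fused": (x1, y1, x2, y2, score)}
    for box in boxes:
        match, best = None, iou_thr
        for cl in clusters:  # join the cluster whose fused box overlaps most
            o = iou(cl["fused"], box)
            if o > best:
                match, best = cl, o
        if match is None:
            clusters.append({"members": [box], "fused": box})
            continue
        match["members"].append(box)
        m = match["members"]
        total = sum(b[4] for b in m)
        coords = tuple(sum(b[i] * b[4] for b in m) / total for i in range(4))
        match["fused"] = coords + (total / len(m),)  # score = mean member confidence
    # Down-weight boxes that only some of the models agreed on.
    return [c["fused"][:4] + (c["fused"][4] * min(len(c["members"]), n_models) / n_models,)
            for c in clusters]
```

Unlike non-maximum suppression, which keeps one box and discards the rest, WBF averages overlapping boxes, so every model's localisation contributes to the final box.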

ML2-Exam Machine Learning 2 exam work.


Stack

Python, PyTorch, vLLM, ONNX Runtime, Optuna, Qiskit, D-Wave Ocean SDK, FastAPI, Docker, LaTeX, SciPy, NetworkX, Matplotlib, NumPy, scikit-learn, Qiskit Aer, Google Colab, sentence-transformers, Hugging Face Transformers, Hugging Face Datasets, BM25 / rank fusion, Ollama, SQLite, OpenCV, Ultralytics YOLOv8, torchvision, scikit-image, Streamlit, rosbags, Pandas, Pillow, Modal, MAP-Elites / quality-diversity search, PyTorch quantization, pytest, GitHub Actions, Ruff


Contact

UC Berkeley (exchange 2025–2026) · INN Norway (home institution)
