Hey 👋🏽, I'm cpuimage
AI engineer working on AIGC, inference optimization, and audio/video/image algorithms.
I build real-world AI systems, accelerate models, and share open-source work here on GitHub.
If my projects help you, feel free to buy me a coffee. ☕️
- AIGC engineering (Stable Diffusion, FLUX, SDXL, high-res synthesis)
- Inference optimization (TensorRT, FP16, Flash Attention, async pipelines)
- Audio/video/image algorithms (TTS, matting, OpenGL effects)
- Training stability & numerical optimization
- CTO experience at several AI companies
- 👨🏽‍💻 Worked at leading tech companies including Baidu, KingSoft, and others.
- 🧩 Served as CTO at several AI companies (AIGC, image generation, inference optimization).
- 📱 Developed algorithms for multiple applications: ToolWiz Photos, Mypic, DOUPAI.
- 💡 Delivered production-grade custom AI solutions and technical consulting services.
I work across Stable Diffusion, inference acceleration, training stability, and audio/video algorithms.
Main research areas: large language models (LLMs), generative AI, training stability, inference acceleration, and audio/video algorithms.
- Efficient Semi-supervised Learning via Structural Regularization for Consistent Reasoning in LLMs
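- Uses semi-supervised structural regularization that constrains features and logits to evolve consistently, suppressing rote memorization and improving reasoning stability.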
- One-Pass LLM: From-Scratch Pre-training and SFT with Adaptive Gradient Modulation
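- Introduces an adaptive gradient modulation mechanism for an efficient single-pass training scheme that performs from-scratch pre-training and SFT at the same time.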
- Memory-Efficient LLM Training
- A GPU-memory-optimized training scheme for large language models.
- MozzyTokenizer: Adaptive Byte-Level Tokenizer
- An adaptive byte-level tokenizer that improves input encoding efficiency.
- LLM from Scratch with PyTorch
- Builds a large language model architecture from scratch in PyTorch.
- Training-Free Universal High-Resolution Synthesis for Any Vision Model
- A training-free, universal high-resolution synthesis technique that works with any vision model.
- FLUX.1 FP16 Inference Deployment + Low-Memory LoRA Training
- End-to-end deployment of FLUX.1 models plus low-VRAM LoRA training optimization.
- Stable Diffusion Architectural Distillation
- Architectural distillation and lightweight variants for the Stable Diffusion model family.
- Image Synthesis and Semantic Manipulation Using Stable Diffusion Networks
- Image synthesis and deep semantic manipulation using SD networks.
- Super-Resolution / Video Editing Solutions based on Stable Diffusion
- Diffusion-based super-resolution reconstruction and video editing solutions.
- Porting SDXL 1.0, SD X4 Upscaler, PromptGen to TensorFlow/ONNX (FP16 Support)
- Cross-framework ports of the core SD models, with FP16-specific performance optimization.
- Robustness and Speed: An Adaptive, Efficient Optimizer for Stable Training
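- An all-in-one efficient optimizer that combines learning-rate-free and warmup-free training, gradient accumulation correction, and long-tail gradient mitigation.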
- Numerical Stability via Scalable Parallel Compensated Reductions
- Improved numerical stability for large-scale parallel computation.
- Adaptive Moving-Average BatchNorm Stabilization
- BatchNorm stabilization based on an adaptive moving average.
- Loss Regularization / Parameter-Free Weight Regularization
- Loss terms and parameter-free weight regularization techniques that improve generalization.
- Dynamic Loss Weighting for Multi-Task Learning
- A dynamic loss weighting strategy for multi-task learning.
- Chunked Flash Attention in Keras
- Implements chunked Flash Attention in Keras to support long-sequence processing.
- Accelerate Stable Diffusion FP16 Inference Deployment with TensorRT
- TensorRT-based inference acceleration for diffusion models.
- Stable Diffusion Architecture Optimization and Deployment on Mobile Devices
- SD architecture optimization and deployment for on-device mobile environments.
- A Plug-And-Play Algorithm for Asynchronous Inference with Frequency-Domain Reconstruction
- A plug-and-play asynchronous inference algorithm based on frequency-domain reconstruction.
- A Trimap-Free Solution for Real-Time Automatic Portrait Matting on Mobile Devices
- A real-time, trimap-free automatic portrait matting algorithm for mobile devices.
- Enhanced FaceFusion: Decoupled Modules & Optimized Inference
- An enhanced FaceFusion with decoupled modules and an optimized inference pipeline.
- Ultra High-Resolution Portrait Retouching
- Ultra-high-definition portrait retouching and texture enhancement algorithms.
- Arbitrary Resolution Super-Resolution for Real-World Images
- Arbitrary-resolution super-resolution for real-world scene images.
- Content-aware 3-view Synthesis for Game Art
- Content-aware three-view synthesis for game art asset development.
- Real-time MMSE-STSA speech enhancement (embedded implementation)
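To give a feel for what "compensated reductions" in the numerical-stability item above means, here is a minimal Python sketch of the general technique, not the implementation from that project. It assumes TwoSum-based compensated summation inside each chunk and a pairwise tree merge of the per-chunk `(sum, compensation)` pairs; the function names (`two_sum`, `chunk_sum`, `combine`, `parallel_compensated_sum`) are hypothetical, and the chunks run sequentially here rather than on real workers.

```python
import math
from typing import List, Tuple


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum: returns (s, e) where s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e


def chunk_sum(values: List[float]) -> Tuple[float, float]:
    """Compensated sum of one chunk, returned as (running sum, accumulated rounding error)."""
    s, c = 0.0, 0.0
    for v in values:
        s, e = two_sum(s, v)
        c += e  # collect the rounding error lost by each addition
    return s, c


def combine(x: Tuple[float, float], y: Tuple[float, float]) -> Tuple[float, float]:
    """Merge two partial (sum, compensation) pairs without dropping the merge error."""
    s, e = two_sum(x[0], y[0])
    return s, x[1] + y[1] + e


def parallel_compensated_sum(values: List[float], num_chunks: int = 4) -> float:
    # Hypothetical setup: each chunk could be summed on a separate worker.
    n = len(values)
    size = max(1, math.ceil(n / num_chunks))
    partials = [chunk_sum(values[i:i + size]) for i in range(0, n, size)] or [(0.0, 0.0)]
    # Pairwise tree reduction over the partial results.
    while len(partials) > 1:
        merged = [combine(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            merged.append(partials[-1])  # odd one out moves up a level unchanged
        partials = merged
    s, c = partials[0]
    return s + c  # apply the compensation once at the end
```

With `vals = [1.0, 1e100, 1.0, -1e100] * 1000`, the built-in `sum(vals)` returns `0.0` because every `1.0` is absorbed by `1e100`. `parallel_compensated_sum(vals)` returns `2000.0`, because the lost `1.0`s are carried in the compensation term. Workers only need to exchange two floats each during the merge.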
I’m open to collaboration on AIGC, inference optimization, and audio/image algorithms.
Reach me on:
For paid technical services or consulting:


