gaozhihan cpuimage

Hey 👋🏽, I'm cpuimage

AI engineer working on AIGC, inference optimization, and audio/video/image algorithms.
I build real-world AI systems, accelerate models, and share open-source work here on GitHub.

If my projects help you, feel free to buy me a coffee. ☕️

⚡ What I Do

AIGC engineering (Stable Diffusion, FLUX, SDXL, high-res synthesis)
Inference optimization (TensorRT, FP16, Flash Attention, async pipelines)
Audio/video/image algorithms (TTS, matting, OpenGL effects)
Training stability & numerical optimization
CTO experience at several AI companies

🧠 Professional Experience

👨🏽‍💻 Worked at leading tech companies including Baidu, Kingsoft, and others.
🧩 Served as CTO at several AI companies (AIGC, image generation, inference optimization).
📱 Developed algorithms for multiple applications: ToolWiz Photos, Mypic, DOUPAI.
💡 Delivered production-grade custom AI development and technical consulting services.

🚀 Research Progress & Achievements

I work across large language models (LLMs), generative AI (including Stable Diffusion), training stability, inference acceleration, and audio/video algorithms.

🧠 Large Language Models & Reasoning

Efficient Semi-supervised Learning via Structural Regularization for Consistent Reasoning in LLMs
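- Semi-supervised structural regularization that constrains features and logits to evolve consistently, suppressing rote memorization and improving reasoning stability.
One-Pass LLM: From-Scratch Pre-training and SFT with Adaptive Gradient Modulation
- Introduces an adaptive gradient modulation mechanism for an efficient single-pass scheme that runs from-scratch pre-training and SFT together.
Memory-Efficient LLM Training
- GPU-memory-optimized training for large language models.
MozzyTokenizer: Adaptive Byte-Level Tokenizer
- An adaptive byte-level tokenizer that improves input encoding efficiency.
LLM from Scratch with PyTorch
- Builds a large language model architecture from scratch in PyTorch.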
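
The internals of MozzyTokenizer are not described here, so the following is only a generic sketch of the byte-level starting point such tokenizers usually share: text is encoded as UTF-8 bytes, so any input (Chinese included) is covered by a 256-symbol base vocabulary, and the most frequent adjacent pair is then merged repeatedly, byte-level-BPE style. The helpers `train_byte_bpe`, `encode`, and `decode` are hypothetical names, and the adaptive behavior of MozzyTokenizer is not modeled.

```python
from collections import Counter


def _merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules on top of the 256 raw byte ids."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in the order learned
    for new_id in range(256, 256 + num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # stop once no adjacent pair repeats
            break
        merges[pair] = new_id
        ids = _merge(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Start from UTF-8 bytes, then replay the learned merges in order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = _merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand merged ids back to bytes; UTF-8 decoding restores the text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train_byte_bpe("low lower lowest 低低低", num_merges=10)
assert decode(encode("lower 低", merges), merges) == "lower 低"
```

Because the base vocabulary is raw bytes, there is no out-of-vocabulary case: unseen text simply falls back to shorter (byte-level) pieces.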

🎨 Generative AI & Architecture Optimization

Training-Free Universal High-Resolution Synthesis for Any Vision Model
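- A training-free, universal high-resolution synthesis technique that works with any vision model.
FLUX.1 FP16 Inference Deployment + Low-Memory LoRA Training
- End-to-end FLUX.1 deployment and low-memory LoRA training optimization.
Stable Diffusion Architectural Distillation
- Architectural distillation and lightweighting for the Stable Diffusion model family.
Image Synthesis and Semantic Manipulation Using Stable Diffusion Networks
- Image synthesis and deep semantic manipulation with SD networks.
Super-Resolution / Video Editing Solutions based on Stable Diffusion
- Diffusion-based super-resolution reconstruction and video editing.
Porting SDXL 1.0, SD X4 Upscaler, PromptGen to TensorFlow/ONNX (FP16 Support)
- Cross-framework ports of core SD models, with performance optimizations for FP16.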
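
LoRA keeps training memory low by freezing the base weight W and learning only a rank-r update, so an adapted layer computes y = W x + (alpha / r) · B(A x). Below is a minimal plain-Python sketch of that idea, assuming the usual initialization from the LoRA paper (A random, B zero, so training starts from the base model's output). `LoRALinear` and `trainable_params` are illustrative names, and the 3072-wide, rank-16 numbers are assumed for illustration, not taken from the FLUX.1 project.

```python
import random


class LoRALinear:
    """y = W x + (alpha / rank) * B (A x); W is frozen, only A and B would be trained."""

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight                      # frozen base weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: (rank, d_in) with small random values; B: (d_out, rank) all zeros.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)
        update = self._matvec(self.B, self._matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Values in A and B, versus d_out * d_in for full fine-tuning of the layer."""
    return rank * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=1.0)
print(layer.forward([1.0, 1.0]))  # B starts at zero, so this equals W x
print(trainable_params(3072, 3072, 16), 3072 * 3072)
```

With the illustrative numbers in the last line, a 3072 × 3072 projection at rank 16 trains 98,304 values instead of 9,437,184 (about 1%), and optimizer state such as Adam's moment estimates shrinks by the same factor.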

⚡ Training Optimization, Stability & Regularization

Robustness and Speed: An Adaptive, Efficient Optimizer for Stable Training
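- An all-round, efficient optimizer that needs no learning-rate tuning or warmup, corrects gradient accumulation, and mitigates long-tail gradients.
Numerical Stability via Scalable Parallel Compensated Reductions
- Improved numerical stability for large-scale parallel computation.
Adaptive Moving-Average BatchNorm Stabilization
- BatchNorm stabilization based on an adaptive moving average.
Loss Regularization / Parameter-Free Weight Regularization
- Loss terms and parameter-free weight regularization that improve generalization.
Dynamic Loss Weighting for Multi-Task Learning
- A dynamic loss-weighting strategy for multi-task learning.
Chunked Flash Attention in Keras
- Chunked Flash Attention implemented in Keras to handle long sequences.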
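
Chunked (Flash-style) attention never materializes a full row of attention scores: it walks over key/value chunks and keeps a running maximum, a running softmax denominator, and a running weighted sum of values, rescaling both accumulators whenever the maximum grows. The sketch below shows that online-softmax recurrence for a single query in plain Python rather than Keras, next to a direct softmax for comparison; `naive_attention`, `chunked_attention`, and `chunk_size` are hypothetical names, not the author's Keras code.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q·k / sqrt(d)) over all keys at once, then weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def chunked_attention(q, keys, values, chunk_size):
    """Same result, but only one chunk of scores exists at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0                    # sum of exp(score - running_max) seen so far
    acc = [0.0] * len(values[0])   # sum of exp(score - running_max) * value
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Re-express the old accumulators relative to the new maximum.
        correction = math.exp(running_max - new_max)  # exp(-inf) == 0.0 on the first chunk
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
print(chunked_attention(q, keys, values, chunk_size=2), naive_attention(q, keys, values))
```

Score memory drops from the full sequence length to `chunk_size`; the correction factor exp(old_max - new_max) is what lets the chunked result match the unchunked softmax up to rounding.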

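For the "Numerical Stability via Scalable Parallel Compensated Reductions" item, a useful reference point is textbook compensated summation: each addition recovers its rounding error with an error-free transformation and carries it in a separate compensation term. The sketch below uses Neumaier's variant of Kahan summation inside each chunk and merges the per-chunk (sum, compensation) pairs with Knuth's TwoSum, so chunks can be reduced independently and combined in a tree. It illustrates the general technique only; `neumaier_partial`, `merge`, and `parallel_compensated_sum` are hypothetical names, and this is not the author's implementation.

```python
def two_sum(a, b):
    """Knuth's error-free transformation: a + b == s + err exactly."""
    s = a + b
    b_virtual = s - a
    err = (a - (s - b_virtual)) + (b - b_virtual)
    return s, err


def neumaier_partial(xs):
    """Compensated (Neumaier) sum of one chunk, returned as (sum, compensation)."""
    s, c = 0.0, 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x   # low-order bits of x were lost in t
        else:
            c += (x - t) + s   # low-order bits of s were lost in t
        s = t
    return s, c


def merge(p, q):
    """Combine two (sum, compensation) partials, keeping the rounding error of the add."""
    s, err = two_sum(p[0], q[0])
    return s, p[1] + q[1] + err


def parallel_compensated_sum(xs, chunk_size=1024):
    # Map step: chunks are independent, so these could run on separate workers.
    partials = [neumaier_partial(xs[i:i + chunk_size]) for i in range(0, len(xs), chunk_size)]
    if not partials:
        return 0.0
    # Pairwise tree reduction: depth grows as log2(number of chunks).
    while len(partials) > 1:
        nxt = [merge(partials[i], partials[i + 1]) for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            nxt.append(partials[-1])
        partials = nxt
    s, c = partials[0]
    return s + c


print(parallel_compensated_sum([1e16, 1.0, -1e16], chunk_size=2))  # prints 1.0
```

Because every partial carries its own rounding error, the result is far less sensitive to chunk size and merge order than a plain floating-point reduction, which is what makes this kind of reduction safe to parallelize.
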
🚀 Inference Acceleration & Mobile Deployment

Accelerate Stable Diffusion FP16 Inference Deployment with TensorRT
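- TensorRT-based inference acceleration for diffusion models.
Stable Diffusion Architecture Optimization and Deployment on Mobile Devices
- SD architecture optimization and deployment for on-device mobile environments.
A Plug-And-Play Algorithm for Asynchronous Inference with Frequency-Domain Reconstruction
- A plug-and-play asynchronous inference algorithm based on frequency-domain reconstruction.
A Trimap-Free Solution for Real-Time Automatic Portrait Matting on Mobile Devices
- Real-time, trimap-free automatic portrait matting on mobile devices.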

🖼️ Computer Vision & Multimedia

Enhanced FaceFusion: Decoupled Modules & Optimized Inference
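- An enhanced FaceFusion with decoupled modules and an optimized inference pipeline.
Ultra High-Resolution Portrait Retouching
- Ultra-high-definition portrait retouching and texture enhancement.
Arbitrary Resolution Super-Resolution for Real-World Images
- Arbitrary-resolution super-resolution for real-world images.
Content-aware 3-view Synthesis for Game Art
- Content-aware three-view synthesis for game art asset development.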

📊 Statistical Algorithms

Real-time MMSE-STSA speech enhancement (embedded implementation)

🤝 Collaboration & Contact

I’m open to collaboration on AIGC, inference optimization, and audio/image algorithms.

Reach me on:

For paid technical services or consulting:
