cpuimage/README.md

Hey 👋🏽, I'm cpuimage

AI engineer working on AIGC, inference optimization, and audio/video/image algorithms.
I build real-world AI systems, accelerate models, and share open-source work here on GitHub.

If my projects help you, feel free to buy me a coffee. ☕️


⚡ What I Do

  • AIGC engineering (Stable Diffusion, FLUX, SDXL, high-res synthesis)
  • Inference optimization (TensorRT, FP16, Flash Attention, async pipelines)
  • Audio/video/image algorithms (TTS, matting, OpenGL effects)
  • Training stability & numerical optimization
  • Served as CTO at multiple AI companies

🧠 Professional Experience

  • 👨🏽‍💻 Worked at leading tech companies such as Baidu and KingSoft.
  • 🧩 Served as CTO at multiple AI companies (AIGC, image generation, inference optimization).
  • 📱 Developed algorithms for multiple applications: ToolWiz Photos, Mypic, DOUPAI.
  • 💡 Delivered production-grade custom AI solutions and consulting services.

🚀 Research Progress & Achievements

I work across Stable Diffusion, inference acceleration, training stability, and audio/video algorithms.

Main research areas: large language models (LLMs), generative AI, training stability, inference acceleration, and audio/video algorithms.

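🧠 LLMs & Reasoning

  • Efficient Semi-supervised Learning via Structural Regularization for Consistent Reasoning in LLMs
    • Semi-supervised structural regularization that constrains features and logits to evolve consistently, suppressing rote memorization and improving reasoning stability.
  • One-Pass LLM: From-Scratch Pre-training and SFT with Adaptive Gradient Modulation
    • Introduces an adaptive gradient modulation mechanism for an efficient single-pass training scheme in which from-scratch pre-training and SFT proceed simultaneously.
  • Memory-Efficient LLM Training
    • Memory-optimized training scheme for large-scale language models.
  • MozzyTokenizer: Adaptive Byte-Level Tokenizer
    • Adaptive byte-level tokenizer that improves encoding efficiency on the input side.
  • LLM from Scratch with PyTorch
    • Builds a large language model architecture from scratch in PyTorch.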

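🎨 Generative AI & Architecture

  • Training-Free Universal High-Resolution Synthesis for Any Vision Model
    • Training-free, general-purpose super-resolution synthesis technique applicable to all kinds of vision models.
  • FLUX.1 FP16 Inference Deployment + Low-Memory LoRA Training
    • End-to-end deployment of FLUX.1 models and low-memory LoRA training optimization.
  • Stable Diffusion Architectural Distillation
    • Architectural distillation and lightweighting of the Stable Diffusion model family.
  • Image Synthesis and Semantic Manipulation Using Stable Diffusion Networks
    • Image synthesis and deep semantic manipulation using SD networks.
  • Super-Resolution / Video Editing Solutions based on Stable Diffusion
    • Super-resolution reconstruction and video editing solutions based on diffusion models.
  • Porting SDXL 1.0, SD X4 Upscaler, PromptGen to TensorFlow/ONNX (FP16 Support)
    • Cross-framework ports of core SD models with FP16-specific performance optimization.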

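⚡ Optimization & Stability

  • Robustness and Speed: An Adaptive, Efficient Optimizer for Stable Training
    • All-in-one efficient optimizer: learning-rate-free and warmup-free, with gradient-accumulation correction and long-tailed gradient mitigation.
  • Numerical Stability via Scalable Parallel Compensated Reductions
    • Improved numerical stability for large-scale parallel computation.
  • Adaptive Moving-Average BatchNorm Stabilization
    • BatchNorm stabilization based on an adaptive moving average.
  • Loss Regularization / Parameter-Free Weight Regularization
    • Loss terms and parameter-free weight regularization techniques that improve generalization.
  • Dynamic Loss Weighting for Multi-Task Learning
    • Dynamic loss-weight allocation strategy for multi-task learning.
  • Chunked Flash Attention in Keras
    • Chunked Flash Attention implemented in Keras to support long-sequence processing.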

🚀 Inference & Mobile Deployment

  • Accelerate Stable Diffusion FP16 Inference Deployment with TensorRT
    • TensorRT-based inference acceleration for diffusion models.
  • Stable Diffusion Architecture Optimization and Deployment on Mobile Devices
    • SD architecture optimization and deployment for on-device mobile environments.
  • A Plug-And-Play Algorithm for Asynchronous Inference with Frequency-Domain Reconstruction
    • Plug-and-play asynchronous inference algorithm based on frequency-domain reconstruction.
  • A Trimap-Free Solution for Real-Time Automatic Portrait Matting on Mobile Devices
    • Real-time, trimap-free automatic portrait matting algorithm for mobile devices.

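🖼️ Computer Vision & Multimedia

  • Enhanced FaceFusion: Decoupled Modules & Optimized Inference
    • Enhanced FaceFusion with decoupled modules and an optimized inference pipeline.
  • Ultra High-Resolution Portrait Retouching
    • Ultra-high-definition portrait retouching and texture enhancement algorithm.
  • Arbitrary Resolution Super-Resolution for Real-World Images
    • Arbitrary-resolution super-resolution for real-world images.
  • Content-aware 3-view Synthesis for Game Art
    • Content-aware three-view synthesis for game art asset development.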

📊 Statistical Algorithms

  • Real-time MMSE-STSA speech enhancement (embedded implementation)

🤝 Collaboration & Contact

I’m open to collaboration on AIGC, inference optimization, and audio/image algorithms.

Reach me on:

  • Telegram
  • WeChat
  • QQ

For paid technical services or consulting:

  • Email

Pinned

  1. chunked-flash-attention-keras (Public)

    Implementation of Chunked Flash Attention in Keras

    Python · 1 star

  2. CelebAHairMask-HQ (Public)

    A large-scale face dataset for hair segmentation, hair recognition, and GANs for hair generation and editing.

    89 stars · 7 forks

  3. minSDXLTF (Public)

    Stable Diffusion XL Inference With PyTorch Weights And More Features Like Stable Diffusion Web UI In Keras 3.x

    Python · 8 stars · 1 fork

  4. minSDTF (Public)

    Stable Diffusion V1.5 Inference With PyTorch Weights And More Features Like Stable Diffusion Web UI In Keras 3.x

    Python · 16 stars · 2 forks

  5. resampler (Public)

    A Simple and Efficient Audio Resampler Implementation in C

    C · 157 stars · 69 forks

  6. WebRTC_NS (Public)

    Noise Suppression Module Port From WebRTC

    C · 345 stars · 160 forks