Changes from all commits (206 commits)
ee0c008
Fix Cloud Run deployment: bypass DDP init, use /tmp for writes
claude Feb 7, 2026
25aabad
Fix model loading: use renamed .pth weight file and wav2vec2 config.json
claude Feb 7, 2026
9286246
Add gourmet-sp frontend with lip sync fixes
claude Feb 8, 2026
14f48e5
Add concierge.zip avatar (copy of p2-1.zip sample for testing)
claude Feb 8, 2026
c8af922
Fix avatar path: use concierge.zip instead of p2-1.zip
claude Feb 8, 2026
5772bae
Fix lip sync artifacts and reduce TTS-to-speech latency
claude Feb 8, 2026
448d8c4
Fix TTS playback blocked by expression API timeout
claude Feb 8, 2026
db3e205
Fix: make expression generation fully non-blocking (fire-and-forget)
claude Feb 8, 2026
8276013
Integrate audio2exp into backend TTS endpoint, remove frontend direct…
claude Feb 8, 2026
0194d02
Proxy architecture: TTS returns audio only, expression via fire-and-f…
claude Feb 8, 2026
678b30a
Fix lip sync: clear buffer per segment + pre-fetch expression for rem…
claude Feb 8, 2026
7ea0364
Expression in TTS response: zero-delay lip sync from frame 0
claude Feb 8, 2026
7fe6a06
Async expression: background thread + polling (TTS returns instantly)
claude Feb 8, 2026
96b0a4b
Sync expression + ack/LLM parallel: zero-delay lip sync with ~700ms f…
claude Feb 8, 2026
4ea78d1
Fix: skip unlockAudioParams when ack is playing (prevents ttsPlayer i…
claude Feb 8, 2026
fb7d439
Fix 2nd interaction TTS: resolve pendingAckPromise on pause + stopCur…
claude Feb 8, 2026
d9a11ed
Enlarge avatar display area for 512x512 avatar optimization
claude Feb 8, 2026
eb36e5b
Adjust camera to fill avatar head in canvas (reduce empty space above)
claude Feb 8, 2026
85bd7b9
Fix camera: lower lookAt target to show full face (head to chin)
claude Feb 8, 2026
8752cb8
Camera tweak: lower avatar 10% in frame + natural slight top-down angle
claude Feb 8, 2026
5272100
Add files via upload
mirai-gpro Feb 11, 2026
e3dc7c4
Add concierge_modal.py: Modal + Gradio PoC for concierge.zip generation
claude Feb 11, 2026
958d19a
feat: add custom motion video support with VHAP tracking
claude Feb 11, 2026
99d8345
fix: download FLAME models during container build instead of local copy
claude Feb 12, 2026
225c86b
refactor: use local model files instead of HuggingFace download
claude Feb 12, 2026
2c4588f
fix: move path setup to runtime to fix Modal add_local_dir ordering
claude Feb 12, 2026
7f10e1a
fix: symlink entire model_zoo -> assets for unified layout
claude Feb 12, 2026
94115cc
fix: download LAM-20K weights and GLB templates during image build
claude Feb 12, 2026
e6838bf
fix: assets overwritten by mount + gradio schema TypeError
claude Feb 12, 2026
7fc5dd0
fix: use official LAM-assets repo + patch gradio schema bug
claude Feb 12, 2026
6ed1ca9
fix: download sample_oac separately from Alibaba OSS
claude Feb 12, 2026
929d77c
fix: allow concurrent inputs to keep Gradio on single container
claude Feb 12, 2026
769a254
fix: suppress clang narrowing error when building nvdiffrast
claude Feb 12, 2026
8f77df0
fix: patch nvdiffrast source for JIT narrowing error
claude Feb 12, 2026
0c74277
fix: use sed to patch nvdiffrast ops.py for JIT narrowing flag
claude Feb 12, 2026
7cb005a
fix: runtime monkey-patch torch cpp_extension for nvdiffrast build
claude Feb 12, 2026
7301880
fix: suppress torch._dynamo errors to fall back to eager mode
claude Feb 12, 2026
3f2cb1a
fix: install FBX SDK Python bindings in Modal image
claude Feb 12, 2026
4bebb91
fix: use @modal.concurrent and fix FBX wheel install
claude Feb 12, 2026
91d6b23
feat: add /download-zip direct download endpoint
claude Feb 12, 2026
1b795f9
fix: pass motion_choice to pipeline, add tracked face preview
claude Feb 12, 2026
c9c6025
fix: robust Blender GLB pipeline with per-request temp files & valida…
claude Feb 12, 2026
fc9f9fa
fix: clean stale FLAME data + add diagnostics for bird monster debugging
claude Feb 12, 2026
7ac6602
fix: download missing FLAME tracking & parametric models (bird monste…
claude Feb 12, 2026
96d16ff
fix: use absolute script paths for Blender subprocess + capture stdou…
claude Feb 12, 2026
7a89328
fix: inline Blender subprocess calls to bypass Modal module resolution
claude Feb 12, 2026
857958c
fix: remove deprecated Blender 4.2 glTF export params (export_colors)
claude Feb 12, 2026
19b3a5b
fix: stable ZIP path for /download-zip + robust download fallback
claude Feb 13, 2026
825f0dc
fix: Gradio file serving — allowed_paths + stable output dir
claude Feb 13, 2026
d7a43ce
fix: Gradio file serving 404 in Modal ASGI — add FastAPI fallback
claude Feb 13, 2026
923c9cb
debug: add diagnostic info to /download-zip error response
claude Feb 13, 2026
ce12f72
chore: switch GPU from A10G to T4 to reduce costs
claude Feb 13, 2026
60e5026
chore: switch GPU from T4 to L4 for better cost-performance
claude Feb 13, 2026
a3416b3
fix: add allowed_paths to mount_gradio_app to fix ZIP download
claude Feb 13, 2026
f7cc25f
fix: re-encode preview video for browser playback compatibility
claude Feb 13, 2026
6edfe6a
fix: revert GPU to A10G — L4 caused generation pipeline failure
claude Feb 13, 2026
3e91bbc
fix: restore temp paths for gr.File/gr.Video to fix download & preview
claude Feb 13, 2026
7e9df5d
feat: persist ZIP to Modal Volume + lightweight CPU download server
claude Feb 13, 2026
a4e2433
refactor: split GPU generation from CPU UI — GPU L4, idle auto-stop
claude Feb 13, 2026
af8e521
fix: Modal 1.0 compat + download server missing fastapi
claude Feb 13, 2026
c5136e7
fix: add heartbeat to prevent SSE timeout during long VHAP tracking
claude Feb 14, 2026
9c7ac70
fix: replace streaming with fire-and-forget + Volume polling
claude Feb 14, 2026
f38a184
fix: CUDA arch 8.6→8.9 for L4 GPU + add mesh/GPU diagnostics
claude Feb 14, 2026
5808fcb
fix: reorder offset.ply vertices to match GLB vertex order (bird mons…
claude Feb 14, 2026
2885a1f
revert: restore TORCH_CUDA_ARCH_LIST to 8.6 to reuse cached image
claude Feb 14, 2026
3386303
improve error reporting: show full traceback in Gradio UI
claude Feb 14, 2026
789318b
fix: add explicit directory entry to concierge.zip
claude Feb 14, 2026
1f05bd7
Add files via upload
mirai-gpro Feb 14, 2026
912bfe4
fix: increase timeouts to 2 hours for 300-frame VHAP tracking
claude Feb 14, 2026
66c692b
fix: disable vertex reorder, add ZIP/PLY diagnostics
claude Feb 14, 2026
4acb053
Add files via upload
mirai-gpro Feb 14, 2026
9af2be8
fix: strip materials/textures from GLB to match working 3.6MB size
claude Feb 14, 2026
27c3f42
Add files via upload
mirai-gpro Feb 15, 2026
a5535bc
Add files via upload
mirai-gpro Feb 15, 2026
eb67cf1
fix: disable normals/texcoords/morph-normals in GLB export to match ~…
claude Feb 15, 2026
6cf64b5
Delete concierge_now.zip
mirai-gpro Feb 15, 2026
280367d
Add files via upload
mirai-gpro Feb 15, 2026
f3d9460
fix: inline convertFBX2GLB.py script to bypass Modal image cache
claude Feb 15, 2026
b7a24b6
fix: use lightweight image for Gradio UI to stop credit drain
claude Feb 15, 2026
f52d4b5
feat: add CLI mode (modal run) with auto-download and auto-stop
claude Feb 15, 2026
a248686
fix: generate vertex_order.json from same Blender mesh as GLB export
claude Feb 15, 2026
b8352fe
fix: move ui_image/dl_image definitions before web() to fix NameError
claude Feb 15, 2026
ed94bb0
fix: move dl_image/ui_image to top-level (L292) to guarantee definiti…
claude Feb 15, 2026
813b086
fix: resolve absolute paths and improve file-not-found messages in CLI
claude Feb 15, 2026
5a6168b
fix: modal run without args launches Gradio UI instead of requiring l…
claude Feb 15, 2026
5db89eb
fix: remove transform_apply() that destroys skinning on armature-boun…
claude Feb 15, 2026
35abf87
Delete concierge_now.zip
mirai-gpro Feb 15, 2026
6190abf
Add files via upload
mirai-gpro Feb 15, 2026
bd8ca68
fix: disable torch.compile (dynamo) + add Gaussian quality diagnostics
claude Feb 15, 2026
d5a62e2
fix: format string error in Gaussian quality diagnostics
claude Feb 15, 2026
bd54b56
Replace manual weight loading with load_state_dict + missing-key dete…
claude Feb 15, 2026
90828ed
Add encoder feature sanity check (NaN/Inf/stats) before inference
claude Feb 15, 2026
c38edbc
Add checkpoint file size verification against HuggingFace reference
claude Feb 15, 2026
7c39ea0
Separate FLAME buffer keys from critical missing keys in diagnostics
claude Feb 15, 2026
13453e8
Add --smoke-test: minimal GPU inference test bypassing our pipeline
claude Feb 15, 2026
b8bf7b1
Fix bird monster: add xformers + upgrade PyTorch to match official LA…
claude Feb 15, 2026
9c14d1f
Add PyTorch/xformers version to DIAGNOSTICS output
claude Feb 15, 2026
5a1e8bd
Add HF Spaces Docker deployment (Modal-free alternative)
claude Feb 15, 2026
17afde2
Update concierge_modal.py
mirai-gpro Feb 16, 2026
ff0cd19
Finalize fixes in concierge_modal.py for GLB export
mirai-gpro Feb 16, 2026
3d0e991
Replace inline Blender script with official generate_glb pipeline
claude Feb 16, 2026
73aa5ae
Fix stale output: mount all local source dirs + cache-bust + volume c…
claude Feb 16, 2026
33718c5
Redesign Gradio UI: professional theme, progress tracking, error hand…
claude Feb 16, 2026
8becf54
Finalize concierge_modal.py with fixes and optimizations
mirai-gpro Feb 16, 2026
385a4e5
Add files via upload
mirai-gpro Feb 26, 2026
e190f5a
Add batch ZIP model generator and fix Modal build environment
claude Feb 26, 2026
4d936c7
Add comprehensive technical document for LAM ZIP model generation
claude Feb 26, 2026
e0475ca
Add troubleshooting section for modal deploy errors (2026-02-26)
claude Feb 26, 2026
a1847df
Fix lam_avatar_batch.py: add result download, stale data cleanup
claude Feb 26, 2026
3a9b1cf
Fix bird-monster artifact: neutralize @torch.compile on DINOv2 encoder
claude Feb 26, 2026
fcb9ed2
Fix bird-monster: stop overwriting vertex_order.json with sequential …
claude Feb 26, 2026
e6cf9ea
Fix bird-monster v3: remove @torch.compile at build time + comprehens…
claude Feb 26, 2026
3478768
Add files via upload
mirai-gpro Feb 26, 2026
3f720aa
Fix bird-monster v4: remove @torch.compile from local lam/ source files
claude Feb 26, 2026
7bf52e0
Migrate CUDA 11.8 → 12.1: update all GPU dependencies
claude Feb 26, 2026
90e3387
Migrate concierge_modal.py Modal Image to CUDA 12.1
claude Feb 27, 2026
dc9b2d8
Fix 3 CUDA 12.1 migration bugs found in compatibility audit
claude Feb 27, 2026
1b7a1ce
Fix diff-gaussian-rasterization build: add --recursive to git clone
claude Feb 27, 2026
d1748aa
Fix chumpy NumPy 1.24+ compatibility: patch removed numpy.bool/int/fl…
claude Feb 27, 2026
8936fc8
Fix chumpy patch: use find_spec to locate file without triggering bro…
claude Feb 27, 2026
5d2eeb9
fix: delete chumpy __pycache__ after sed patch to prevent stale bytecode
claude Feb 27, 2026
14203b3
fix: patch cpu_nms.pyx to replace deprecated np.int with np.intp
claude Feb 27, 2026
9755e46
test: switch to CUDA 11.8 + official LAM versions to verify CUDA comp…
claude Feb 27, 2026
54b4e3a
fix: add explicit float32 dtype conversion to LAM model before inference
claude Feb 27, 2026
f931d69
revert: restore CUDA 12.1 environment (undo unauthorized 11.8 downgrade)
claude Feb 27, 2026
5525533
fix: correct Usage to input.jpg and remove redundant _setup_model_pat…
claude Feb 27, 2026
af0d0ea
Add files via upload
mirai-gpro Feb 27, 2026
0f3b587
Align Modal pipeline with official ModelScope app.py to fix bird-monster
claude Feb 28, 2026
4f7bda0
Align lam/ model code with working ModelScope demo
claude Feb 28, 2026
8d99994
Add ModelScope migration handoff document
claude Feb 28, 2026
8b1d853
fix: auto-detect image extension (png/jpg/jpeg) in --image-path
claude Feb 28, 2026
27a590c
fix: add --recursive to git clone for diff-gaussian-rasterization
claude Feb 28, 2026
0d48b76
fix: add symlink for pretrained_models/human_model_files path
claude Feb 28, 2026
91f73aa
fix: patch HuggingFace config.json human_model_path before model load
claude Feb 28, 2026
0a06821
fix: align 2 remaining differences with official ModelScope app.py
claude Feb 28, 2026
391e477
revert: restore concierge_modal.py and lam_avatar_batch.py to last wo…
claude Feb 28, 2026
da39749
refactor: align lam_avatar_batch.py with official ModelScope app.py p…
claude Feb 28, 2026
fb0a314
Change default motion to GEM and add --motion-name CLI option
claude Feb 28, 2026
ee2e428
Fix diff-gaussian-rasterization build: add --recursive to git clone
claude Feb 28, 2026
fb784b7
revert: restore concierge_modal.py and lam_avatar_batch.py to pre-ses…
claude Feb 28, 2026
b46b443
docs: add modification plan for aligning Modal files with official Mo…
claude Feb 28, 2026
11a1c7e
docs: update modification plan v2 with app_concierge.py comparison
claude Feb 28, 2026
bd1ed9d
fix: align Modal pipeline with official app.py (Choice A - Gemini dir…
claude Feb 28, 2026
872bbc3
fix: correct OAC generation order comment in lam_avatar_batch.py
claude Feb 28, 2026
c3b3330
docs: add session handoff document for next Claude session
claude Feb 28, 2026
d355db9
docs: update handoff - root cause was ignoring official app.py reference
claude Feb 28, 2026
c4f8fab
feat: use lam-storage volume for model weights instead of image bake-in
claude Mar 1, 2026
99300df
feat: remove app_lam.py dependency, use official ModelScope app.py ap…
claude Mar 1, 2026
762303b
fix: add --recursive to diff-gaussian-rasterization clone for GLM sub…
claude Mar 1, 2026
28d288f
fix: explicitly clone GLM for diff-gaussian-rasterization build
claude Mar 1, 2026
54a12f1
fix: bridge pretrained_models/human_model_files for from_pretrained c…
claude Mar 1, 2026
6b9a602
fix: add pretrained_models to volume symlinks + diagnostic for flame2…
claude Mar 1, 2026
e79febf
add diagnostic script to inspect volume directory structure
claude Mar 1, 2026
042b1b9
add setup_volume.py: download all LAM model files to Modal volume
claude Mar 1, 2026
a171a63
add sample_oac.tar download from Alibaba OSS for template_file.fbx
claude Mar 1, 2026
5ca1dd4
fix: use urllib instead of wget for OAC download
claude Mar 1, 2026
ae7afa7
fix: add robust pretrained_models/human_model_files symlink bridge
claude Mar 1, 2026
9a57592
fix: remap human_model_path in hf_hub.py for Modal volume layout
claude Mar 1, 2026
cb18c9f
fix: remove unsupported max_squen_length arg from prepare_motion_seqs…
claude Mar 1, 2026
a5cc433
fix: use shutil.make_archive instead of zip CLI for OAC ZIP creation
claude Mar 1, 2026
1cf4c5f
revert lam/ to upstream and switch to official _build_model weight lo…
claude Mar 1, 2026
da308d5
Add Gemini instruction doc for faithful ModelScope app.py -> Modal mi…
claude Mar 1, 2026
d5d2fa2
Rename output file from concierge_modal.py to app_modal.py in instruc…
claude Mar 1, 2026
a9bffb4
Add files via upload
mirai-gpro Mar 1, 2026
d50f7b3
Add local_entrypoint for batch processing via modal run
claude Mar 1, 2026
63f5d74
Remove broken Gradio web endpoint (batch processing only for now)
claude Mar 1, 2026
c751059
Fix bird-monster: never restore torch.compile after model loading
claude Mar 1, 2026
f2913c1
Switch lam_avatar_batch.py from concierge_modal to app_modal
claude Mar 1, 2026
a447188
Revert app_modal.py to original uploaded version
claude Mar 1, 2026
968c660
Force full Modal image rebuild by adding .force_build() to imported i…
claude Mar 1, 2026
0a61938
Fix ImportError: define STORAGE_VOL_PATH locally instead of importing…
claude Mar 1, 2026
c972197
Remove .force_build() - use CLI flag --force-build instead
claude Mar 1, 2026
7fb3cb4
Fix FBX SDK install: use Volume whl instead of unreachable Alibaba CDN
claude Mar 1, 2026
583fdfd
Fix typo: pixlwise.py → pixelwise.py in sed command
claude Mar 1, 2026
d149197
Fix bird-monster: add lam.eval() and match original model loading
claude Mar 1, 2026
c058322
Revert app_modal.py to original uploaded version (undo d149197)
claude Mar 1, 2026
e2fc53f
Force full rebuild: add .force_build() to skip Modal cache
claude Mar 1, 2026
b287b8e
Remove .force_build() - restore app_modal.py to exact original
claude Mar 1, 2026
a594322
Add .force_build() in lam_avatar_batch.py to skip Modal cache
claude Mar 1, 2026
4494b3d
Revert .force_build() - use env var instead
claude Mar 1, 2026
c40c8ee
fix: correct typo pixlwise.py → pixelwise.py in torch.compile disable…
claude Mar 1, 2026
600c8cc
revert: restore app_modal.py to previous state (do not touch)
claude Mar 1, 2026
5c90bf7
feat: add Colab notebook for LAM inference without Modal
claude Mar 2, 2026
d10fbcd
fix: correct enlarge_ratio typo [1.0, 1,0] → [1.0, 1.0] in Colab note…
claude Mar 2, 2026
c77553e
fix: match enlarge_ratio with LAM_Large_Avatar_Model/app.py
claude Mar 2, 2026
6c4f522
feat: add Google Drive wheel cache for CUDA builds
claude Mar 2, 2026
9d1deb0
fix: replace GitHub clone with ModelScope source in Colab notebook
claude Mar 2, 2026
d6ebf72
fix: use mirai-gpro/LAM_gpro lam-large-upload branch instead of Model…
claude Mar 2, 2026
1fce68d
fix: remove numpy==1.26.4 downgrade to avoid binary incompatibility o…
claude Mar 2, 2026
04fc745
fix: resolve numpy binary incompatibility (dtype size changed) on Colab
claude Mar 2, 2026
5eca87f
fix: wrap torch._dynamo import in try/except for Colab PyTorch versio…
claude Mar 2, 2026
cc50064
fix: make CUDA wheel cache torch-version-aware to prevent ABI mismatch
claude Mar 2, 2026
f6cf5f7
fix: force-reinstall scipy alongside numpy to prevent RecursionError
claude Mar 2, 2026
f7da580
fix: harden numpy/scipy version pinning across all install cells
claude Mar 2, 2026
b6af04e
fix: force-reinstall numpy/scipy AFTER onnxruntime-gpu install
claude Mar 2, 2026
64cbae4
fix: subprocess-based numpy verification + xformers CUDA auto-detect
claude Mar 2, 2026
7b2656a
fix: defer numpy import until after all pip installs to avoid restart
claude Mar 2, 2026
33ea4c3
fix: add sys.path guard to cells importing from lam package
claude Mar 2, 2026
a4e335b
style: add [section.number] labels to all code cells for Colab naviga…
claude Mar 2, 2026
c3378d3
fix: add dependency check in [4.4] for missing cfg variable
claude Mar 2, 2026
87fe3d7
fix: patch chumpy inspect.getargspec for Python 3.12 compatibility
claude Mar 2, 2026
334aafb
Add files via upload
mirai-gpro Mar 2, 2026
0b1e348
refactor: consolidate setup cells [0.1]-[2.3] into single [Setup] cell
claude Mar 2, 2026
998ce21
fix: auto-restart kernel after [Setup] to resolve numpy version mismatch
claude Mar 2, 2026
c48f89b
fix: patch sys.argv before FlameTrackingSingleImage to avoid argparse…
claude Mar 2, 2026
2d14692
fix: patch torch.load for PyTorch 2.6 compatibility (weights_only def…
claude Mar 2, 2026
199 changes: 199 additions & 0 deletions ANALYSIS_REQUEST.md
@@ -0,0 +1,199 @@
# LAM_Audio2Expression Analysis & Implementation Request

## Background

After more than 48 hours and 40+ attempts to deploy the Audio2Expression service to Google Cloud Run, the model still stays in "mock" mode and never initializes correctly. Repeated symptomatic fixes have not resolved it, so the approach needs a fundamental rethink.

## Lessons from the Previous AI

**Important**: the previous AI (Claude) suffered from the following problems:

1. **Relying on inference from a stale knowledge base**
   - It applied generic "Cloud Run deployment" patterns
   - It did not understand the design assumptions specific to LAM_Audio2Expression

2. **Surface-level code comprehension**
   - It read the code but not why it was designed that way
   - It never considered which environment and use case the code was originally written for

3. **Repeated symptomatic fixes**
   - An endless loop of: find an error in the logs, patch it, deploy, hit the next error
   - It kept fixing visible symptoms without identifying the root cause

4. **Unexamined assumptions**
   - It had decided that "model loading or initialization is failing"
   - The real problem may not be there at all, but a more fundamental error of approach

**When performing this analysis, be careful not to fall into the traps above.**

## Code Under Analysis

### Key Files

**1. audio2exp-service/app.py** (current service implementation)
- Web service built on FastAPI
- Endpoints: `/health`, `/debug`, `/api/audio2expression`, `/ws/{session_id}`
- Model lifecycle managed by the `Audio2ExpressionEngine` class

**2. LAM_Audio2Expression/engines/infer.py**
- `InferBase` class: base class for model construction
- `Audio2ExpressionInfer` class: audio-to-expression inference
- `infer_streaming_audio()`: real-time streaming inference

**3. LAM_Audio2Expression/models/network.py**
- `Audio2Expression` class: the PyTorch neural network
- Architecture: wav2vec2 encoder + identity encoder + decoder

**4. LAM_Audio2Expression/engines/defaults.py**
- `default_config_parser()`: config file loading
- `default_setup()`: derives batch size and related settings
- `create_ddp_model()`: distributed data parallel wrapper

## Specific Questions

### Q1: Root cause of the incomplete model initialization

```python
# Initialization in app.py
self.infer = INFER.build(dict(type=cfg.infer.type, cfg=cfg))
self.infer.model.eval()
```

Identify why this step never completes in the Cloud Run environment.

Candidate causes:
- [ ] Out of memory (is 8 GiB not enough?)
- [ ] Limitations of CPU-only execution
- [ ] Distributed-processing setup misbehaving on a single instance
- [ ] Filesystem write permissions
- [ ] Timeout (cold-start duration)
- [ ] Something else
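
Before choosing among these candidates, it may help to instrument the startup path so the failure mode becomes observable. The sketch below is a hypothetical helper (`timed_init` is not part of the repo) that wraps any initialization step, for example `INFER.build(...)`, and reports wall time and peak RSS; an OOM kill, a cold-start timeout, and a plain exception each leave a different signature.

```python
import resource
import time
import traceback

def timed_init(name, fn, *args, **kwargs):
    """Run one init step; report duration and peak memory so failures are attributable."""
    t0 = time.monotonic()
    result = None
    try:
        result = fn(*args, **kwargs)
    except Exception:
        traceback.print_exc()
    elapsed = time.monotonic() - t0
    # ru_maxrss is reported in KiB on Linux; convert to MiB
    peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    print(f"[init] {name}: {elapsed:.1f}s, peak RSS ~{peak_mb:.0f} MiB, ok={result is not None}")
    return result, elapsed, peak_mb
```

Logged from the FastAPI startup hook, these numbers would show whether the build step approaches the 8 GiB limit or simply outlives the startup probe window.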

### Q2: Problems in default_setup()

```python
# defaults.py
def default_setup(cfg):
    world_size = comm.get_world_size()  # 1 on Cloud Run
    cfg.num_worker = cfg.num_worker if cfg.num_worker is not None else mp.cpu_count()
    cfg.num_worker_per_gpu = cfg.num_worker // world_size
    assert cfg.batch_size % world_size == 0  # could this fail?
```

Check whether this setup causes problems at inference time.
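
To check that arithmetic in isolation, here is a minimal re-implementation of the excerpt (a sketch mirroring the names above, not the project's code). With `world_size == 1` the divisibility assert cannot fail, which suggests looking instead at whether `comm.get_world_size()` itself misbehaves when no process group was ever initialized.

```python
import multiprocessing as mp

def default_setup_sketch(batch_size, num_worker=None, world_size=1):
    """Mirror of default_setup() under Cloud Run's single-process world."""
    num_worker = num_worker if num_worker is not None else mp.cpu_count()
    # With world_size == 1 this assert is always true, so it cannot be the blocker.
    assert batch_size % world_size == 0
    return {
        "num_worker_per_gpu": num_worker // world_size,
        "batch_size_per_gpu": batch_size // world_size,
    }
```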

### Q3: Logger setup

```python
# infer.py
self.logger = get_root_logger(
    log_file=os.path.join(cfg.save_path, "infer.log"),
    file_mode="a" if cfg.resume else "w",
)
```

Check whether creating this log file can fail on Cloud Run's filesystem.
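
On Cloud Run Gen 2 only `/tmp` is reliably writable, so a `cfg.save_path` pointing anywhere else makes this call raise before the model ever loads. A minimal sketch of a guard (the helper name `writable_save_path` is invented here) that could run before the logger is constructed:

```python
import os
import tempfile

def writable_save_path(preferred):
    """Return preferred if it is writable, else fall back to a fresh /tmp directory."""
    try:
        os.makedirs(preferred, exist_ok=True)
        probe = os.path.join(preferred, ".write_probe")
        with open(probe, "w"):
            pass
        os.remove(probe)
        return preferred
    except OSError:
        return tempfile.mkdtemp(prefix="infer_logs_")
```

Usage would be a one-liner before logger construction, e.g. `cfg.save_path = writable_save_path(cfg.save_path)`.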

### Q4: wav2vec2 model loading

```python
# network.py
if os.path.exists(pretrained_encoder_path):
    self.audio_encoder = Wav2Vec2Model.from_pretrained(pretrained_encoder_path)
else:
    config = Wav2Vec2Config.from_pretrained(wav2vec2_config_path)
    self.audio_encoder = Wav2Vec2Model(config)  # random weights!
```

- Is the wav2vec2-base-960h folder laid out correctly?
- Are any files missing that would require a download from HuggingFace?
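
Both questions reduce to whether the folder actually contains weights, because the `else` branch above silently builds a randomly initialized encoder, which would look exactly like a stuck "mock" mode. A startup guard along these lines could fail fast instead (hypothetical helper; the filenames are the standard HuggingFace ones and are an assumption about this checkpoint):

```python
import os

# Standard HuggingFace filenames (assumed): at least one must hold the weights.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")

def wav2vec2_dir_ok(path):
    """True only if the folder has a config and at least one weight file."""
    if not os.path.isfile(os.path.join(path, "config.json")):
        return False
    return any(os.path.isfile(os.path.join(path, w)) for w in WEIGHT_FILES)
```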

### Q5: Appropriate deployment target

If Cloud Run is unsuitable, consider these alternatives:
- Google Compute Engine (GPU instance)
- Cloud Run Jobs (batch processing)
- Vertex AI Endpoints
- Kubernetes Engine

## Expected Deliverables

### 1. Analysis
- Identification of the root cause
- An explanation of why 40+ attempts failed to resolve it

### 2. Corrected code
```
audio2exp-service/
├── app.py           # fixed version
├── Dockerfile       # fix if needed
└── cloudbuild.yaml  # fix if needed
```

### 3. Verification steps
```bash
# Health check
curl https://<service-url>/health
# Expected response: {"model_initialized": true, "mode": "inference", ...}

# Inference test
curl -X POST https://<service-url>/api/audio2expression \
  -H "Content-Type: application/json" \
  -d '{"audio_base64": "...", "session_id": "test"}'
```
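
The same health check can be scripted for CI or a deploy gate. A sketch that assumes only the response shape shown above:

```python
import json

def is_ready(health_body: str) -> bool:
    """True once /health reports a real (non-mock) initialized model."""
    try:
        data = json.loads(health_body)
    except ValueError:
        return False
    return data.get("model_initialized") is True and data.get("mode") == "inference"
```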

## Technical Specs

### Model
| Item | Value |
|------|-------|
| Input sample rate | 24 kHz (API) / 16 kHz (internal) |
| Output frame rate | 30 fps |
| Output dimensions | 52 (ARKit blendshapes) |
| Model file size | ~500 MB (LAM) + ~400 MB (wav2vec2) |
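
These numbers imply a simple shape contract that actual outputs can be checked against. A sketch (the function is illustrative; the frame count depends only on duration, so the internal 24 kHz to 16 kHz resampling does not change it):

```python
def expected_output_shape(num_samples, sample_rate=24000, fps=30, dims=52):
    """Expected (frames, blendshapes) shape for a clip of API-rate audio."""
    duration_s = num_samples / sample_rate
    return (int(duration_s * fps), dims)
```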

### Deployment environment
| Item | Value |
|------|-------|
| Platform | Cloud Run Gen 2 |
| Region | asia-northeast1 |
| Memory | 8 GiB |
| CPU | 4 |
| max-instances | 4 |

### Dependencies (requirements.txt)
```
torch==2.0.1
torchaudio==2.0.2
transformers==4.30.2
librosa==0.10.0
fastapi==0.100.0
uvicorn==0.23.0
numpy==1.24.3
scipy==1.11.1
pydantic==2.0.3
```

## File Locations

```bash
# Project root
cd /home/user/LAM_gpro

# Main service
cat audio2exp-service/app.py

# Inference engine
cat audio2exp-service/LAM_Audio2Expression/engines/infer.py

# Neural network
cat audio2exp-service/LAM_Audio2Expression/models/network.py

# Config
cat audio2exp-service/LAM_Audio2Expression/engines/defaults.py
cat audio2exp-service/LAM_Audio2Expression/configs/lam_audio2exp_config_streaming.py
```

---

Thank you in advance for taking this on.
172 changes: 172 additions & 0 deletions Dockerfile
@@ -0,0 +1,172 @@
# ============================================================
# Dockerfile for HF Spaces Docker SDK (GPU)
# ============================================================
# Reproduces the exact environment from concierge_modal.py's
# Modal Image definition, but as a standard Dockerfile.
#
# Build: docker build -t lam-concierge .
# Run: docker run --gpus all -p 7860:7860 lam-concierge
# HF: Push to a HF Space with SDK=Docker, Hardware=GPU
# ============================================================

FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# System packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.10 python3.10-dev python3.10-venv python3-pip \
    git wget curl ffmpeg tree \
    libgl1-mesa-glx libglib2.0-0 libusb-1.0-0 \
    build-essential ninja-build clang llvm libclang-dev \
    xz-utils libxi6 libxxf86vm1 libxfixes3 \
    libxrender1 libxkbcommon0 libsm6 \
    && rm -rf /var/lib/apt/lists/*

# Make python3.10 the default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1 && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1

# Upgrade pip
RUN python -m pip install --upgrade pip setuptools wheel

# numpy first (pinned for compatibility — must stay <2.0 for PyTorch 2.4 + mediapipe)
RUN pip install 'numpy==1.26.4'

# ============================================================
# PyTorch 2.4.0 + CUDA 12.1
# ============================================================
RUN pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 \
    --index-url https://download.pytorch.org/whl/cu121

# ============================================================
# xformers — CRITICAL for DINOv2 MemEffAttention
# Without it, model produces garbage output ("bird monster").
# ============================================================
RUN pip install xformers==0.0.27.post2 \
    --index-url https://download.pytorch.org/whl/cu121

# CUDA build environment
ENV FORCE_CUDA=1
ENV CUDA_HOME=/usr/local/cuda
ENV MAX_JOBS=4
ENV TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
ENV CC=clang
ENV CXX=clang++

# CUDA extensions (require no-build-isolation)
RUN pip install chumpy==0.70 --no-build-isolation

# pytorch3d — build from source (C++17 required for CUDA 12.1)
ENV CXXFLAGS="-std=c++17"
RUN pip install git+https://github.com/facebookresearch/pytorch3d.git --no-build-isolation

# diff-gaussian-rasterization — patch CUDA 12.1 header issues then build
RUN git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization.git /tmp/dgr && \
    find /tmp/dgr -name '*.cu' -exec sed -i '1i #include <cfloat>' {} + && \
    find /tmp/dgr -name '*.h' -path '*/cuda_rasterizer/*' -exec sed -i '1i #include <cstdint>' {} + && \
    pip install /tmp/dgr --no-build-isolation && \
    rm -rf /tmp/dgr

# simple-knn — patch cfloat for CUDA 12.1 then build
RUN git clone https://github.com/camenduru/simple-knn.git /tmp/simple-knn && \
    sed -i '1i #include <cfloat>' /tmp/simple-knn/simple_knn.cu && \
    pip install /tmp/simple-knn --no-build-isolation && \
    rm -rf /tmp/simple-knn

# nvdiffrast — JIT compilation at runtime (requires -devel image)
RUN pip install git+https://github.com/ShenhanQian/nvdiffrast.git@backface-culling --no-build-isolation

# ============================================================
# Python dependencies
# ============================================================
RUN pip install \
    "gradio==4.44.0" \
    "gradio_client==1.3.0" \
    "fastapi" \
    "uvicorn" \
    "omegaconf==2.3.0" \
    "pandas" \
    "scipy<1.14.0" \
    "opencv-python-headless==4.9.0.80" \
    "imageio[ffmpeg]" \
    "moviepy==1.0.3" \
    "rembg" \
    "scikit-image" \
    "pillow" \
    "huggingface_hub>=0.24.0" \
    "filelock" \
    "typeguard" \
    "transformers==4.44.2" \
    "diffusers==0.30.3" \
    "accelerate==0.34.2" \
    "tyro==0.8.0" \
    "mediapipe==0.10.21" \
    "tensorboard" \
    "rich" \
    "loguru" \
    "Cython" \
    "PyMCubes" \
    "trimesh" \
    "einops" \
    "plyfile" \
    "jaxtyping" \
    "ninja" \
    "patool" \
    "safetensors" \
    "decord" \
    "numpy==1.26.4"

# onnxruntime-gpu for CUDA 12 — MUST be installed AFTER rembg to prevent
# rembg from pulling in the PyPI default (CUDA 11) build
RUN pip install onnxruntime-gpu==1.18.1 \
    --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

# FBX SDK Python bindings (for OBJ -> FBX -> GLB avatar export)
RUN pip install https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/aigc3d/data/LAM/fbx-2020.3.4-cp310-cp310-manylinux1_x86_64.whl

# ============================================================
# Blender 4.2 LTS (for GLB generation)
# ============================================================
RUN wget -q https://download.blender.org/release/Blender4.2/blender-4.2.0-linux-x64.tar.xz -O /tmp/blender.tar.xz && \
    mkdir -p /opt/blender && \
    tar xf /tmp/blender.tar.xz -C /opt/blender --strip-components=1 && \
    ln -sf /opt/blender/blender /usr/local/bin/blender && \
    rm /tmp/blender.tar.xz

# ============================================================
# Clone LAM repo and build cpu_nms
# ============================================================
RUN git clone https://github.com/aigc3d/LAM.git /app/LAM

# Build cpu_nms for FaceBoxesV2
RUN cd /app/LAM/external/landmark_detection/FaceBoxesV2/utils/nms && \
python -c "\
from setuptools import setup, Extension; \
from Cython.Build import cythonize; \
import numpy; \
setup(ext_modules=cythonize([Extension('cpu_nms', ['cpu_nms.pyx'])]), \
include_dirs=[numpy.get_include()])" \
build_ext --inplace

# ============================================================
# Download model weights (cached in Docker layer)
# ============================================================
COPY download_models.py /app/download_models.py
RUN python /app/download_models.py

# ============================================================
# Copy application code (after model download for cache)
# ============================================================
WORKDIR /app/LAM

# Copy our app into the container
COPY app_concierge.py /app/LAM/app_concierge.py

# HF Spaces expects port 7860
EXPOSE 7860
ENV GRADIO_SERVER_NAME=0.0.0.0
ENV GRADIO_SERVER_PORT=7860

CMD ["python", "app_concierge.py"]