SceneIntruderMCP is no longer just a lightweight scene-analysis demo. The current product is a unified creative workspace built around one Go backend and one React SPA, with four connected modules:
- Interactive scenes — analyze text into scene data, characters, items, context, and branching story flows.
- Comics Studio — a 5-step workflow for analysis, prompts, key elements, references, image generation, and export.
- Video Studio — build timeline data from comics results, generate clip assets asynchronously, inspect recovery state, and export bundles.
- New Script — create writing projects, generate initial drafts, revise chapters, and export manuscripts.
The system also centralizes LLM / Vision / Video provider configuration, long-running task tracking via SSE, and file-based persistence under data/.
- Go + Gin server
- SPA hosting from the same binary
- Unified config in `data/config.json`
- Encrypted API key storage via `data/.encryption_key` or `CONFIG_ENCRYPTION_KEY`
- SSE progress endpoint: `GET /api/progress/:taskID`
- Plain WebSocket endpoints for scene/user realtime channels
- File-based storage for scenes, stories, comics, scripts, exports, and users
- `/` — scenes home
- `/settings` — LLM, Vision, Video settings
- `/scenes/:id` — scene detail
- `/scenes/:id/story` — story mode
- `/scenes/:id/comic` — Comics Studio
- `/scenes/:id/comic/video` — Video Studio
- `/scripts/`
- `/scripts/:id` — script workspace
The backend currently registers these providers:
`openai`, `anthropic`, `google`, `deepseek`, `qwen`, `mistral`, `grok`, `glm`, `githubmodels`, `openrouter`, `nvidia`
Notes:
- Reasoning / thinking mode is now default-off across the LLM layer for structured analysis safety.
- Provider-specific default suppression is applied where supported, including Google, Qwen, and NVIDIA.
- The frontend Settings UI still contains an `ollama` option, but it is not part of the current backend-supported provider matrix and is therefore not documented here as an officially available path.
`placeholder`, `sdwebui`, `dashscope`, `gemini`, `ark`, `openai`, `glm`
Default model and model catalog are delivered through GET /api/settings via vision_default_model, vision_models, and vision_model_providers.
`dashscope`, `kling`, `google`, `vertex`, `ark`, `mock`
Default video model routing is also exposed through GET /api/settings via video_default_model, video_models, and video_model_providers.
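As a sketch of how a client might consume the model-routing fields from `GET /api/settings` (the struct shape and the model → provider direction of the `*_model_providers` maps are assumptions for illustration, not the documented schema):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// settings mirrors only the model-routing fields listed above; the real
// GET /api/settings payload contains more.
type settings struct {
	VisionDefaultModel   string            `json:"vision_default_model"`
	VisionModels         []string          `json:"vision_models"`
	VisionModelProviders map[string]string `json:"vision_model_providers"`
	VideoDefaultModel    string            `json:"video_default_model"`
	VideoModels          []string          `json:"video_models"`
	VideoModelProviders  map[string]string `json:"video_model_providers"`
}

func parseSettings(raw []byte) (settings, error) {
	var s settings
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	// In a real client this body comes from GET /api/settings; a literal
	// keeps the sketch runnable without a live server.
	raw := []byte(`{
		"vision_default_model": "glm-image",
		"vision_model_providers": {"glm-image": "glm"},
		"video_default_model": "wan2.6-i2v-flash",
		"video_model_providers": {"wan2.6-i2v-flash": "dashscope"}
	}`)
	s, err := parseSettings(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.VisionDefaultModel, "->", s.VisionModelProviders[s.VisionDefaultModel])
	fmt.Println(s.VideoDefaultModel, "->", s.VideoModelProviders[s.VideoDefaultModel])
}
```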
```
frontend (React + Vite + MUI)
  │
  ├─ REST (/api/*)
  ├─ SSE (/api/progress/:taskID)
  └─ WS (/ws/*)
  │
backend (Go + Gin)
  │
  ├─ config / auth / rate limit / API handlers
  ├─ LLM / Vision / Video / Story / Script / Comic services
  └─ file storage under data/
```
Core source directories:
- `cmd/server` — server entry
- `internal/api` — router, handlers, middleware
- `internal/app` — application bootstrapping and provider registration
- `internal/config` — runtime config and encryption handling
- `internal/llm` — provider abstraction and reasoning control
- `internal/services` — business logic
- `internal/vision` — vision providers
- `frontend` — SPA client
- Go 1.21+
- Node.js 18+
- npm 9+
- At least one usable provider credential
```sh
go mod download
cd frontend
npm install
cd ..
```

```sh
cd frontend
npm run build
cd ..
```

```sh
go run ./cmd/server
```

Default address: `http://localhost:8080`
- Home: `http://localhost:8080/`
- Settings: `http://localhost:8080/settings`
The runtime config is persisted to data/config.json.
Important top-level fields:
- `llm_provider`, `llm_config`
- `vision_provider`, `vision_default_model`, `vision_config`
- `video_provider`, `video_default_model`, `video_config`
- `vision_models`, `vision_model_providers`
- `video_models`, `video_model_providers`
Minimal example:
```json
{
  "port": "8080",
  "data_dir": "data",
  "static_dir": "frontend/dist/assets",
  "templates_dir": "frontend/dist",
  "log_dir": "logs",
  "debug_mode": true,
  "llm_provider": "nvidia",
  "llm_config": {
    "api_key": "",
    "base_url": "https://integrate.api.nvidia.com/v1",
    "default_model": "moonshotai/kimi-k2.5"
  },
  "vision_provider": "glm",
  "vision_default_model": "glm-image",
  "vision_config": {
    "endpoint": "https://open.bigmodel.cn/api/paas/v4",
    "api_key": ""
  },
  "video_provider": "dashscope",
  "video_default_model": "wan2.6-i2v-flash",
  "video_config": {
    "endpoint": "https://dashscope.aliyuncs.com/api/v1",
    "api_key": "",
    "public_base_url": "https://your-domain.example"
  }
}
```

- If `CONFIG_ENCRYPTION_KEY` is absent in development mode, the app creates `data/.encryption_key` automatically.
- Keep `data/.encryption_key` together with `data/config.json`.
- Deleting the key file invalidates previously encrypted credentials.
- Build frontend assets.
- Start backend.
- Open Settings and configure one working LLM provider.
- Optionally configure Vision and Video providers.
- Create a scene or a standalone comic workspace.
Analysis, prompt generation, image generation, script generation, and video generation are asynchronous. The usual pattern is:
- Start a job and receive a `task_id`
- Subscribe to `GET /api/progress/:taskID`
- Fetch the final result from the corresponding `GET` endpoint
- Many scene-oriented routes degrade to `console_user` when auth is missing or invalid.
- User-scoped routes under `/api/users/:user_id/...` require authenticated ownership.
- Scripts routes require authenticated access.
Some video providers need a publicly reachable reference image URL. In practice, that means video_config.public_base_url should usually be configured for deployed environments.
Backend:

```sh
go test ./...
go run ./cmd/server
```

Frontend:

```sh
cd frontend
npm run dev
npm test
npm run lint
npm run build
```

This repository already contains significantly more than an initial prototype. The maintained documentation now treats it as:
- a multi-workspace creative platform,
- with provider-configurable AI services,
- job-based asynchronous generation,
- and documented operational deployment requirements.
That is the baseline future changes should preserve.
