Bilifan is a local Bilibili video learning-note generator.
The current implementation is a first-stage local MVP. It accepts one Bilibili
or YouTube public video URL, processes only the current P/video, downloads audio, builds a transcript,
chunks the transcript, asks codex exec for structured learning-note chapters,
and renders an offline report.html. If Chrome is available it also tries to
export report.pdf.
- Bilifan runs locally and processes only URLs you provide.
- Bilifan does not provide a hosted scraping service.
- Bilifan does not include Bilibili cookies.
- Bilifan does not bypass login, payment, region, risk-control, DRM, or access controls.
- You are responsible for having permission to access and summarize the content.
- If you provide cookies, they must only be used locally for content your account can already view.
- Codex-backed summarization sends transcript chunks to the model service configured in your local Codex environment.
- The first-stage MVP does not generate SVG diagrams, extract video screenshots, or run a batch queue.
From an installed package:
python -m bilifan summarize "https://www.bilibili.com/video/BV...?p=2" \
--yes-i-understand \
--out ./outputs
From a source checkout without installing the package:
PYTHONPATH=src python -m bilifan summarize "https://www.bilibili.com/video/BV...?p=2" \
--yes-i-understand \
--out ./outputs
Common options:
python -m bilifan summarize "https://www.bilibili.com/video/BV...?p=1" \
--format html,pdf \
--transcriber auto \
--language auto \
--llm-provider codex-exec \
--llm-model gpt-5.5 \
--yes-i-understand \
--out ./outputs
Useful flags:
- --force-whisper: skip Bilibili subtitles and force local Whisper.
- --language auto|zh|en: control Whisper fallback language/model selection. Bilibili subtitles are still preferred unless --force-whisper is used.
- --require-pdf: return non-zero if Chrome cannot export PDF.
- --allow-long-video: allow videos longer than 180 minutes.
- --cookies-file / --cookies-from-browser: pass cookies to yt-dlp; Bilifan does not store the cookie file name in reports or config.
- --overwrite: replace a run if the timestamp collides.
This creates a run directory like:
outputs/
  BV..._p2/
    latest.json
    runs/
      <timestamp>/
        diagnostics.json
        metadata.json
        transcript.json
        transcript.txt
        transcript.srt
        chunks.json
        partial_summaries/
          chunk_001.json
        chapters.json
        notes.md
        report.html
        report.pdf
        content_bundle.json
The command prints a relative run path such as:
Prepared Bilifan run: BV..._p2/runs/<timestamp>
report.html is the success artifact. report.pdf is best effort unless
--require-pdf is passed. Successful transcript and summarization stages may
also write transcript.txt, transcript.srt, and notes.md for easier reading
or reuse outside Bilifan.
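The artifact rules above can be checked from a script. This is a rough sketch under the assumption that the presence of report.html is enough to treat a run as successful; the function name `summarize_run` and the returned dictionary keys are hypothetical and not part of Bilifan.

```python
from pathlib import Path

# Artifacts the README describes as best effort or convenience exports.
OPTIONAL_ARTIFACTS = ["report.pdf", "transcript.txt", "transcript.srt", "notes.md"]


def summarize_run(run_dir: str) -> dict:
    """Report whether a run directory holds the success artifact and which extras exist."""
    run = Path(run_dir)
    return {
        # report.html is the success artifact.
        "succeeded": (run / "report.html").is_file(),
        # Optional files that are actually present, in the order listed above.
        "optional_present": [
            name for name in OPTIONAL_ARTIFACTS if (run / name).is_file()
        ],
    }
```

A caller that needs the PDF can check whether "report.pdf" appears in `optional_present`, or pass --require-pdf so the CLI itself fails.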
Successful runs write content_bundle.json. This is the stable integration
artifact for downstream tools such as Nabaichuan. It contains source metadata,
chapter summaries, transcript segments, artifact links, and provenance without
local absolute paths.
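A downstream tool could verify the "no local absolute paths" property before ingesting a bundle. The sketch below is a heuristic and deliberately makes no assumption about the bundle's key names, since the schema is not described here; `find_absolute_paths` and `check_bundle` are hypothetical helpers. It flags any string value that begins like a POSIX or Windows absolute path, so a value that is merely a URL path starting with "/" would also be reported.

```python
import json
import re

# Matches POSIX absolute paths ("/home/...") and Windows drive paths ("C:\...").
_ABSOLUTE = re.compile(r"^(/|[A-Za-z]:[\\/])")


def find_absolute_paths(value, trail="$"):
    """Walk a decoded JSON value and return (location, string) pairs that look absolute."""
    hits = []
    if isinstance(value, dict):
        for key, item in value.items():
            hits += find_absolute_paths(item, f"{trail}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits += find_absolute_paths(item, f"{trail}[{index}]")
    elif isinstance(value, str) and _ABSOLUTE.match(value):
        hits.append((trail, value))
    return hits


def check_bundle(path):
    """Load a content_bundle.json file and return any suspicious string values."""
    with open(path, encoding="utf-8") as handle:
        return find_absolute_paths(json.load(handle))
```

An empty list from `check_bundle` means no string in the bundle looked like an absolute path under this heuristic.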
Use retry to regenerate downstream artifacts without downloading audio or rerunning Whisper:
python -m bilifan retry outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000 --from bundle
python -m bilifan retry outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000 --from render
python -m bilifan retry outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000 --from summarization
Supported retry stages are bundle, render, and summarization.
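When retries are driven from a script, it can help to validate the stage name before invoking the CLI. This is a small sketch, assuming only the command shape and the three stage names given above; `build_retry_command` is a hypothetical helper, not a Bilifan API.

```python
import sys

# Stage names taken from the README; anything else is rejected up front.
RETRY_STAGES = ("bundle", "render", "summarization")


def build_retry_command(run_dir: str, stage: str) -> list[str]:
    """Return an argv list for `python -m bilifan retry <run_dir> --from <stage>`."""
    if stage not in RETRY_STAGES:
        raise ValueError(
            f"unsupported retry stage {stage!r}; expected one of {RETRY_STAGES}"
        )
    # sys.executable keeps the retry in the same interpreter/virtualenv as the caller.
    return [sys.executable, "-m", "bilifan", "retry", run_dir, "--from", stage]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; building a list instead of a shell string avoids quoting problems with unusual run paths.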
YouTube support is limited to public ordinary videos with youtube.com/watch
or youtu.be URLs. Bilifan prefers subtitles and falls back to Whisper when
needed. Playlists, private videos, members-only videos, age-restricted videos,
cookies, live streams, and Shorts-specific behavior are outside this phase.
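To illustrate the URL shapes described above, here is a rough pre-check a caller might run before invoking the CLI. It is an assumption-laden sketch and not Bilifan's own validation: the function name is hypothetical, only the hosts youtube.com, www.youtube.com, and youtu.be are accepted, and it cannot tell whether a video is private, members-only, or age-restricted.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_supported_youtube_url(url: str) -> bool:
    """Heuristically accept youtube.com/watch?v=... and youtu.be/<id> URLs only."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if host in ("youtube.com", "www.youtube.com"):
        # Ordinary watch pages carry the video id in the "v" query parameter.
        return parts.path == "/watch" and bool(parse_qs(parts.query).get("v"))
    if host == "youtu.be":
        # Short links carry the video id as the path.
        return len(parts.path.strip("/")) > 0
    return False
```

Playlist and Shorts URLs are rejected because their paths are not /watch; a watch URL that also carries a playlist parameter would still pass this check.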
See docs/nabaichuan-integration.md for the bundle mapping and JSONL converter:
python examples/content_bundle_to_nabaichuan.py \
outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000/content_bundle.json \
--out /tmp/nabaichuan.jsonl
Start the local Web UI with:
python -m bilifan serve --no-open
By default bilifan serve binds only to 127.0.0.1, uses ./outputs as the
fixed history/output directory, generates a one-time access token, and prints a
URL like:
http://127.0.0.1:8765/?token=<token>
Open that URL in your browser. Without --no-open, the CLI will try to open it
automatically with your local default browser.
Current MVP boundaries:
- Web UI access is protected by the printed token in the URL.
- The server only supports local 127.0.0.1 binding in this MVP.
- Web UI job outputs are always read from and written to ./outputs.
- The Web UI can show TXT/SRT/MD export links and can ask macOS to open a run's local folder.
- The Web UI does not support entering cookies.
- The CLI still supports --cookies-file and --cookies-from-browser for local runs.
- Python 3.11+
- yt-dlp
- ffmpeg and ffprobe
- openai-whisper for local fallback transcription
- Codex CLI authenticated in the local environment
- Chrome for PDF export
Project dependencies are declared in pyproject.toml and can be installed with
uv sync --extra dev.
Unit tests:
PYTHONDONTWRITEBYTECODE=1 .venv/bin/python -m pytest -p no:cacheprovider -q
Live metadata smoke for the three public test URLs:
BILIFAN_METADATA_SMOKE=1 PYTHONDONTWRITEBYTECODE=1 \
.venv/bin/python -m pytest -p no:cacheprovider tests/test_metadata_smoke.py -q
End-to-end smoke uses the real network, Bilibili, Whisper/Codex if needed, and Chrome PDF export. Prefer a short public video first.