rox1694125-bit/bilifan

Bilifan

Bilifan is a local Bilibili video learning-note generator.

The current implementation is a first-stage local MVP. It accepts one public Bilibili or YouTube video URL and processes only the current part (the P selected by the URL) or video. The pipeline downloads audio, builds a transcript, chunks the transcript, asks codex exec for structured learning-note chapters, and renders an offline report.html. If Chrome is available, it also tries to export report.pdf.
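
The chunking step can be pictured with a minimal sketch. It is an illustration only, not Bilifan's actual code: the Segment type, the chunk_segments name, and the max_chars budget are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; field names are assumptions."""
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_segments(segments: list[Segment], max_chars: int = 2000) -> list[list[Segment]]:
    """Group consecutive segments so each chunk holds at most max_chars of text.

    A single segment longer than max_chars still gets a chunk of its own.
    """
    chunks: list[list[Segment]] = []
    current: list[Segment] = []
    size = 0
    for seg in segments:
        # Close the current chunk before it would exceed the budget.
        if current and size + len(seg.text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(seg)
        size += len(seg.text)
    if current:
        chunks.append(current)
    return chunks
```

Splitting on segment boundaries rather than raw character offsets keeps each chunk's timestamps intact, which is useful if chapters need to point back into the video.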

Boundaries

  • Bilifan runs locally and processes only URLs you provide.
  • Bilifan does not provide a hosted scraping service.
  • Bilifan does not include Bilibili cookies.
  • Bilifan does not bypass login, payment, region, risk-control, DRM, or access controls.
  • You are responsible for having permission to access and summarize the content.
  • If you provide cookies, they must only be used locally for content your account can already view.
  • Codex-backed summarization sends transcript chunks to the model service configured in your local Codex environment.
  • First-stage MVP does not generate SVG diagrams, extract video screenshots, or run a batch queue.

Usage

From an installed package:

python -m bilifan summarize "https://www.bilibili.com/video/BV...?p=2" \
  --yes-i-understand \
  --out ./outputs

From a source checkout without installing the package:

PYTHONPATH=src python -m bilifan summarize "https://www.bilibili.com/video/BV...?p=2" \
  --yes-i-understand \
  --out ./outputs

Common options:

python -m bilifan summarize "https://www.bilibili.com/video/BV...?p=1" \
  --format html,pdf \
  --transcriber auto \
  --language auto \
  --llm-provider codex-exec \
  --llm-model gpt-5.5 \
  --yes-i-understand \
  --out ./outputs

Useful flags:

  • --force-whisper: skip Bilibili subtitles and force local Whisper.
  • --language auto|zh|en: control Whisper fallback language/model selection. Bilibili subtitles are still preferred unless --force-whisper is used.
  • --require-pdf: return non-zero if Chrome cannot export PDF.
  • --allow-long-video: allow videos longer than 180 minutes.
  • --cookies-file / --cookies-from-browser: pass cookies to yt-dlp; Bilifan does not store the cookie file name in reports or config.
  • --overwrite: replace a run if the timestamp collides.

This creates a run directory like:

outputs/
  BV..._p2/
    latest.json
    runs/
      <timestamp>/
        diagnostics.json
        metadata.json
        transcript.json
        transcript.txt
        transcript.srt
        chunks.json
        partial_summaries/
          chunk_001.json
        chapters.json
        notes.md
        report.html
        report.pdf
        content_bundle.json

The command prints a relative run path such as:

Prepared Bilifan run: BV..._p2/runs/<timestamp>
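
The relationship between the URL and the BV..._p2 directory name can be sketched as follows. This is a guess at the naming scheme based on the examples in this README; the run_key function is hypothetical, and defaulting to part 1 when the p parameter is absent is an assumption.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the BV identifier in a /video/<BV id> path.
_BV_ID = re.compile(r"/video/(BV[0-9A-Za-z]+)")


def run_key(url: str) -> str:
    """Derive a '<BV id>_p<part>' key from a Bilibili video URL (hypothetical helper)."""
    parts = urlparse(url)
    match = _BV_ID.search(parts.path)
    if match is None:
        raise ValueError(f"no BV id found in {url!r}")
    # Assumption: a URL without ?p= refers to part 1.
    page = parse_qs(parts.query).get("p", ["1"])[0]
    if not (page.isascii() and page.isdigit()) or int(page) < 1:
        raise ValueError(f"invalid part number: {page!r}")
    return f"{match.group(1)}_p{int(page)}"
```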

report.html is the success artifact. report.pdf is best effort unless --require-pdf is passed. Successful transcript and summarization stages may also write transcript.txt, transcript.srt, and notes.md for easier reading or reuse outside Bilifan.

Content Bundle

Successful runs write content_bundle.json. This is the stable integration artifact for downstream tools such as Nabaichuan. It contains source metadata, chapter summaries, transcript segments, artifact links, and provenance without local absolute paths.
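
A consumer that wants to verify the "no local absolute paths" property can do so without relying on the bundle schema, which is not specified here. The following sketch walks any parsed JSON value and flags strings that look like absolute paths. The regular expression is a heuristic and the function names are hypothetical.

```python
import re
from typing import Any, Iterator

# Heuristic: POSIX absolute paths, home-relative paths, and Windows drive paths.
_ABS_PATH = re.compile(r"^(/|~|[A-Za-z]:[\\/])")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found in a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_absolute_paths(bundle: Any) -> list[str]:
    """Return string values that look like local absolute paths."""
    return [s for s in iter_strings(bundle) if _ABS_PATH.match(s)]
```

To use it, load the file with json.load and pass the result to find_absolute_paths; an empty list means nothing suspicious was found. Because the check is a prefix heuristic, free text that happens to start with "/" or "~" would also be flagged.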

Retry

Use retry to regenerate downstream artifacts without downloading audio or rerunning Whisper:

python -m bilifan retry outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000 --from bundle
python -m bilifan retry outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000 --from render
python -m bilifan retry outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000 --from summarization

Supported retry stages are bundle, render, and summarization.

YouTube Scope

YouTube support is limited to public ordinary videos with youtube.com/watch or youtu.be URLs. Bilifan prefers subtitles and falls back to Whisper when needed. Playlists, private videos, members-only videos, age-restricted videos, cookies, live streams, and Shorts-specific behavior are outside this phase.
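
The URL scope above can be approximated with a small pre-check. This is a sketch of one reading of the rule, not Bilifan's own validation: the accepted host list, including m.youtube.com, is an assumption, and it cannot detect private, members-only, or age-restricted videos, which only become visible when the video is fetched.

```python
from urllib.parse import parse_qs, urlparse

# Assumed set of hosts that serve youtube.com/watch URLs.
_WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def is_supported_youtube_url(url: str) -> bool:
    """Accept only youtube.com/watch?v=... and youtu.be/<id> style URLs."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if host in _WATCH_HOSTS:
        # Playlist and Shorts pages use other paths and are rejected here.
        return parts.path == "/watch" and bool(parse_qs(parts.query).get("v"))
    if host == "youtu.be":
        return len(parts.path) > 1
    return False
```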

Nabaichuan

See docs/nabaichuan-integration.md for the bundle mapping and JSONL converter:

python examples/content_bundle_to_nabaichuan.py \
  outputs/BV1abcDEF12G_p1/runs/2026-06-09_120000/content_bundle.json \
  --out /tmp/nabaichuan.jsonl

Local Web UI

Start the local Web UI with:

python -m bilifan serve --no-open

By default bilifan serve binds only to 127.0.0.1, uses ./outputs as the fixed history/output directory, generates a one-time access token, and prints a URL like:

http://127.0.0.1:8765/?token=<token>

Open that URL in your browser. Without --no-open, the CLI will try to open it automatically with your local default browser.

Current MVP boundaries:

  • Web UI access is protected by the printed token in the URL.
  • The server only supports local 127.0.0.1 binding in this MVP.
  • Web UI job outputs are always read from and written to ./outputs.
  • The Web UI can show TXT/SRT/MD export links and can ask macOS to open a run's local folder.
  • The Web UI does not support entering cookies.
  • The CLI still supports --cookies-file and --cookies-from-browser for local runs.

Runtime Requirements

  • Python 3.11+
  • yt-dlp
  • ffmpeg and ffprobe
  • openai-whisper for local fallback transcription
  • Codex CLI authenticated in the local environment
  • Chrome for PDF export

Project dependencies are declared in pyproject.toml and can be installed with uv sync --extra dev.
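
A quick preflight check for the command-line tools above might look like the following sketch. It assumes the executables are named yt-dlp, ffmpeg, ffprobe, and codex on PATH. It does not check Whisper or Chrome, whose installation and binary names vary by platform. The missing_requirements helper is hypothetical.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence

# Assumed executable names for the tools listed in this README.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg", "ffprobe", "codex")


def missing_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
    version: Sequence[int] = sys.version_info,
) -> list[str]:
    """Return labels for requirements that appear to be missing."""
    missing: list[str] = []
    if tuple(version[:2]) < (3, 11):
        missing.append("python>=3.11")
    # shutil.which returns None when the executable is not on PATH.
    missing.extend(tool for tool in REQUIRED_TOOLS if which(tool) is None)
    return missing
```

Calling missing_requirements() with no arguments inspects the current interpreter and PATH; the parameters exist so the check can be exercised without the tools installed.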

Verification

Unit tests:

PYTHONDONTWRITEBYTECODE=1 .venv/bin/python -m pytest -p no:cacheprovider -q

Live metadata smoke for the three public test URLs:

BILIFAN_METADATA_SMOKE=1 PYTHONDONTWRITEBYTECODE=1 \
  .venv/bin/python -m pytest -p no:cacheprovider tests/test_metadata_smoke.py -q

An end-to-end smoke test uses the real network, Bilibili, Whisper/Codex if needed, and Chrome PDF export. Prefer a short public video first.
