Turn high-density social media videos into readable knowledge.
Capture the transcript. Clean it. Summarize it. Reality-check it.
When a friend, teacher, or coworker sends you a long, high-density video from social media, the problem is usually not "can I open the link?"
The real problem is:
- you don't have time to watch it now
- the video may contain useful ideas buried inside dense speech
- you still want to know what it says, what matters, and what to trust
TLDR.skill solves that full value chain:
social media video link -> transcript capture -> transcript cleanup -> summary -> Reality Check
That is the core value:
- get the video into text from Xiaohongshu / Douyin / YouTube
- clean the transcript so it becomes readable
- summarize the content so you get the key takeaway fast
- add a Reality Check so you know what sounds solid vs what needs caution
So the output is not just more readable — it is more usable.
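The value chain above can be sketched end to end. Everything below is an illustrative stub — the function names are hypothetical, not the actual API in `src/tldr_skill/`:

```python
# Illustrative sketch of the digest pipeline. All stage functions are
# hypothetical stand-ins for the real capture / cleanup / LLM stages.

def capture_transcript(url: str) -> str:
    """Fetch captions or run ASR on the video's audio (stubbed here)."""
    return "raw asr text without punctuation"

def clean_transcript(raw: str) -> str:
    """Restore punctuation and paragraphs (stubbed here)."""
    return raw.capitalize() + "."

def summarize(cleaned: str) -> str:
    """Distill the key takeaway (stubbed here)."""
    return "Summary: " + cleaned[:40]

def reality_check(cleaned: str) -> str:
    """Flag credible vs questionable claims (stubbed here)."""
    return "Reality Check: verify specific numbers."

def digest(url: str) -> dict:
    """Chain the stages: link -> transcript -> cleanup -> summary + check."""
    raw = capture_transcript(url)
    cleaned = clean_transcript(raw)
    return {
        "transcript": cleaned,
        "summary": summarize(cleaned),
        "reality_check": reality_check(cleaned),
    }
```

The point of the sketch is the shape of the chain: the summary and the Reality Check both run on the *cleaned* transcript, not the raw ASR output.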
Input: a video URL from
- Xiaohongshu / 小红书
- Douyin / 抖音
- YouTube
Output:
- Summary — the distilled takeaway
- Reality Check — what seems credible vs what needs caution
- Cleaned Transcript — punctuated, paragraphized, easier to read than raw ASR
A typical pipeline stops at:

video -> raw transcript -> summary

TLDR.skill adds a cleanup stage:

video -> raw transcript -> cleaned transcript -> summary + reality check
That extra cleanup stage matters a lot in practice, especially for:
- mixed Chinese / English terminology
- ASR mistakes around terms like token / AIDC / Agent
- creator videos with poor punctuation after transcription
- short-form video content that is fast, noisy, and context-heavy
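One concrete cleanup concern is spacing between Chinese text and Latin terms like "token" or "Agent". A minimal regex sketch of just that concern — the repo's actual cleanup logic may work quite differently:

```python
import re

# Minimal sketch: insert a space between CJK characters and Latin/digit
# runs, one of the mixed Chinese / English terminology issues the
# cleanup stage has to handle. Not the repo's actual implementation.

CJK = r"\u4e00-\u9fff"

def space_mixed_terms(text: str) -> str:
    # CJK character followed by a Latin letter or digit
    text = re.sub(rf"([{CJK}])([A-Za-z0-9])", r"\1 \2", text)
    # Latin letter or digit followed by a CJK character
    text = re.sub(rf"([A-Za-z0-9])([{CJK}])", r"\1 \2", text)
    return text
```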
This is the most common case:
- a friend sends you a creator video
- a teacher shares a lecture clip
- a coworker drops a social-media link in chat
The video may be valuable, but it is long, dense, and inconvenient to watch right now.
What you really need is:
- the transcript
- the cleaned version
- the summary
- the Reality Check
You see a video about AI, startups, monetization, or distribution and want the point without spending 10 minutes watching it.
Instead of sending “watch this,” send:
- summary
- reality check
- transcript
Turn noisy video links into Markdown you can archive, grep, reuse, and export.
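The archive-friendly output is plain Markdown. A minimal sketch of rendering a digest into such a file — the section names and structure here are simplified, not the tool's exact template:

```python
def render_digest(title: str, summary: str,
                  reality_check: str, transcript: str) -> str:
    # Simplified Markdown rendering; the actual template carries more
    # metadata (platform, URL, transcription provider) and subsections.
    return "\n\n".join([
        f"# Digest - {title}",
        "## Summary",
        summary,
        "## Reality Check",
        reality_check,
        "## Transcript",
        transcript,
    ])
```

Because the result is a single flat `.md` string, it drops straight into a notes vault and stays searchable with ordinary text tools.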
Transcript capture:
- official / existing captions via `youtube-transcript-api`
- `yt-dlp` auto subtitles
- media download + ASR fallback

For Douyin, the capture flow is:
- Playwright + local Chrome
- capture the real audio request
- download the audio
- run STT

For YouTube, the fallback is:
- `yt-dlp` download
- STT
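The provider fallback described above can be sketched as a first-success loop. The orchestration helper and the stub providers below are illustrative — they stand in for the real `youtube-transcript-api` / `yt-dlp` / ASR calls, which need network access and installed tools:

```python
from typing import Callable, Iterable, Tuple

# Illustrative fallback chain: try each transcript provider in order
# and return the first one that succeeds, remembering which it was.

def first_success(providers: Iterable[Callable[[str], str]],
                  url: str) -> Tuple[str, str]:
    errors = []
    for provider in providers:
        try:
            return provider.__name__, provider(url)
        except Exception as exc:  # a real implementation would narrow this
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stubs standing in for the real capture backends:
def official_captions(url: str) -> str:
    raise RuntimeError("no captions published")

def ytdlp_auto_subtitles(url: str) -> str:
    return "auto subtitle text"
```

Recording which provider succeeded matters because the digest reports its transcription source in the output metadata.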
The generated digest follows this Markdown template (视频转录 = video transcription; 平台 = platform; 链接 = link; 转录来源 = transcription source; 总结 = Summary; 一句话结论 = one-line conclusion; 核心要点 = key points; 关键信号 = key signals; 核心判断 = core judgment; 哪些点相对可信 = which points are relatively credible; 哪些点需要谨慎 = which points need caution; 最终结论 = final conclusion; 转录稿 = transcript):

```markdown
# 视频转录 Digest - <title>

- 平台:<platform>
- 链接:<url>
- 转录来源:<provider>

## 总结

### 一句话结论
...

### 核心要点
- ...

### 关键信号
- ...

## Reality Check

### 核心判断
...

### 哪些点相对可信
- ...

### 哪些点需要谨慎
- ...

### 最终结论
...

## 转录稿
...
```

Setup:

```shell
python3.11 -m venv venv
source venv/bin/activate
pip install -e ".[video]"
```

Usage:

```shell
tldr-skill "https://www.youtube.com/watch?v=Mfzucn4f9Xk" --output /tmp/video_digest.md
tldr-skill "https://v.douyin.com/xxxx/" --format json
```

Project layout:

```
src/tldr_skill/
  cli.py
  llm.py
  transcription.py
  video_digest.py
tests/
  test_video_digest.py
skills/
  video-link-transcript-digest/SKILL.md
```
This repo is now intentionally lean. It contains the actual TLDR.skill implementation, not a full Hermes snapshot.
If a target platform needs browser cookies:

```shell
export HERMES_YTDLP_COOKIES_FROM_BROWSER=chrome
export HERMES_YTDLP_BROWSER_HOME=/Users/your-username
```

To override the browser path for Douyin capture:

```shell
export TLDR_SKILL_CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
```

To override the LLM endpoint / model:

```shell
export OPENAI_API_KEY=...
export TLDR_SKILL_MODEL=gpt-4.1-mini
# optional
export TLDR_SKILL_BASE_URL=...
```

Run the tests:

```shell
pytest -q
```

Inside Hermes, the intended triggers are:

```
转录 <url>
转录这个视频 <url>
总结这个视频 <url>
```

(转录 = "transcribe"; 转录这个视频 = "transcribe this video"; 总结这个视频 = "summarize this video")
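The environment overrides documented earlier can be read with plain `os.environ` lookups. A minimal sketch — the default model value mirrors the example above, but the defaults and the `Settings` shape are assumptions, not the repo's actual config code:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Sketch of reading the documented override variables. The Settings
# dataclass and its defaults are illustrative assumptions.

@dataclass
class Settings:
    model: str
    base_url: Optional[str]
    chrome_path: Optional[str]

def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    return Settings(
        model=env.get("TLDR_SKILL_MODEL", "gpt-4.1-mini"),
        base_url=env.get("TLDR_SKILL_BASE_URL"),
        chrome_path=env.get("TLDR_SKILL_CHROME_PATH"),
    )
```

Accepting an explicit mapping keeps the loader easy to unit-test without mutating the process environment.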
License: MIT.