📑 YouTube Research Assistant

This project automates the workflow of finding YouTube videos related to an article, extracting transcripts, and saving metadata into structured JSON files.

It combines:

Scraping (scrapper.py)
Article parsing (TXT, PDF, DOCX)
Keyword generation via OpenRouter GPT API
YouTube search & transcripts (via yt_dlp + youtube_transcript_api)
Logging system (avoid re-processing the same videos)

🚀 Features

Load research articles (.txt, .pdf, .docx)
Extract a smart keyword phrase using OpenRouter GPT
Search YouTube for top 3 relevant videos
Fetch transcripts:
- First try youtube_transcript_api
- Fallback to yt_dlp subtitles
- Auto-translate to English if needed
Save results as JSON (results/<article>.json)
Maintain logs:
- Processed videos → logs/videos.log
- Scraper logs → logs/scraper.log

📂 Project Structure

.
├── Article/              # Input folder (research articles)
│   └── example.pdf
├── results/              # Output folder (JSON metadata & transcripts)
├── logs/                 # Logging (processed videos & scraper logs)
│   ├── videos.log
│   └── scraper.log
├── scrapper.py           # Custom scraper script (provided separately)
├── main.py               # Main script (this file)
└── README.md             # Documentation

⚙️ Installation

Clone repo and install dependencies:

git clone <repo-url>
cd <repo>
pip install -r requirements.txt

Create .env file with your OpenRouter API key:
```
OPENROUTER_API_KEY=your_api_key_here
```
Add at least one article file (.txt, .pdf, or .docx) in Article/.

▶️ Usage

Run the main script:

python main.py

Steps performed:

Runs the web scraper (scrapper.py)
Loads the first article from Article/
Extracts best YouTube search phrase
Fetches top videos & transcripts
Saves results to results/<article>.json

📜 Example Output (`results/example.json`)

[
  {
    "title": "AI Research Breakthrough 2025",
    "url": "https://www.youtube.com/watch?v=abcd1234",
    "views": 120394,
    "published": "20250801",
    "duration": 600,
    "channel": "AI News",
    "transcript": "In this video, we explore the latest research..."
  }
]

📝 Logs

logs/videos.log → Keeps track of already processed videos (avoids duplicates)
logs/scraper.log → Handled internally by scrapper.py

⚠️ Notes

If no OPENROUTER_API_KEY is found, it defaults to "latest research update" as the query.
If no transcript is available, a placeholder message is saved.
scrapper.py must be implemented with a function scrape_links() (imported in main.py).

📌 Requirements

Python 3.8+

Libraries:

yt-dlp
deep-translator
youtube-transcript-api
python-dotenv
PyPDF2
python-docx
requests

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
Readme.md		Readme.md
links.txt		links.txt
main.py		main.py
requirements.txt		requirements.txt
scrapper.py		scrapper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📑 YouTube Research Assistant

🚀 Features

📂 Project Structure

⚙️ Installation

▶️ Usage

📜 Example Output (`results/example.json`)

📝 Logs

⚠️ Notes

📌 Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

📑 YouTube Research Assistant

🚀 Features

📂 Project Structure

⚙️ Installation

▶️ Usage

📜 Example Output (results/example.json)

📝 Logs

⚠️ Notes

📌 Requirements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

📜 Example Output (`results/example.json`)

Packages