feat: add Crunchyroll, HBO Max and TVB Anywhere support, fix Youku extractor #33
3cf350f to bbd9476
bbd9476 to 1a277cc
- Add hbomax_extractor to extract episode data from HBO Max website
- Add tvbanywhere_extractor to extract episode data from TVB Anywhere website
- Update extractor.py to recognize HBO Max and TVB Anywhere domains
- Fix youku_extractor to handle cases where API does not return show field
- Use Playwright to extract showId from page when API fails
- Use pure numeric episode numbering (1, 2, 3...) matching project conventions
3de5316 to d623b21
d623b21 to a980a54
4d32fce to 3ad208a
… retry mechanism for CAPTCHA handling
3ad208a to 6a7a700
Pull request overview
This pull request adds support for three new streaming platforms (Crunchyroll, HBO Max, and TVB Anywhere) and fixes an issue in the Youku extractor where the API doesn't always return show information.
Changes:
- Added Crunchyroll extractor with authorization token capture via Playwright and multi-language support
- Added HBO Max extractor with Playwright-based data scraping from Next.js page data
- Added TVB Anywhere extractor with language-aware API integration
- Fixed Youku extractor to handle missing show data by falling back to Playwright-based page scraping
- Updated extractor.py to register the three new streaming platform domains
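The Youku fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `resolve_show_id`, the `showId` key inside `show`, and the `scrape_show_id` callable (standing in for the Playwright-based page scraping) are all assumed names.

```python
from typing import Any, Callable, Dict, Optional


def resolve_show_id(
    api_data: Dict[str, Any],
    scrape_show_id: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer the show ID from the API payload; otherwise fall back to scraping.

    `scrape_show_id` is a hypothetical stand-in for the Playwright-based
    extraction; it is only invoked when the API payload has no usable show.
    """
    show = api_data.get("show")
    if isinstance(show, dict):
        show_id = show.get("showId")  # hypothetical key name
        if show_id:
            return str(show_id)
    # The API did not return a usable `show` field: use the page fallback.
    return scrape_show_id()
```

Passing the fallback as a callable keeps the slow browser path lazy, so it runs only when the API response is incomplete.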
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| tmdb-import/extractors/youku.py | Enhanced with fallback Playwright extraction when API doesn't return show information |
| tmdb-import/extractors/tvbanywhere.py | New extractor for TVB Anywhere with programme ID extraction and multi-language support |
| tmdb-import/extractors/hbomax.py | New extractor using Playwright to extract from Next.js page data |
| tmdb-import/extractors/crunchyroll.py | New extractor with authorization token capture for accessing Crunchyroll's API |
| tmdb-import/extractor.py | Registered new domain handlers for Crunchyroll, HBO Max, and TVB Anywhere |
835a6ff to 6a7a700
…andling and improved error handling
- Remove manual API request construction and locale handling
- Use page.on('response') to capture dynamic API responses from webpage
- Add comprehensive error handling with try-except block
- Extract episode backdrop logic to separate helper function
- Add localStorage/sessionStorage clearing to avoid language caching
- Improve code structure and follow project conventions
- Remove unused imports (urlparse)
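The `page.on('response')` approach listed above might look roughly like the sketch below. The helper names `make_collector` and `capture`, and the URL-fragment filter, are assumptions for illustration; only `sync_playwright`, `page.on("response", ...)`, `page.goto`, and the `url`, `status`, and `json()` members of a response come from Playwright's Python API.

```python
from typing import Any, Callable, Dict, List


def make_collector(url_fragment: str, sink: List[Dict[str, Any]]) -> Callable[[Any], None]:
    """Build a response handler that stores matching JSON payloads in `sink`."""

    def on_response(response: Any) -> None:
        # Ignore unrelated requests and failed responses.
        if url_fragment not in response.url or response.status != 200:
            return
        try:
            sink.append(response.json())
        except Exception:
            # Non-JSON or unreadable bodies are skipped rather than raised.
            pass

    return on_response


def capture(page_url: str, url_fragment: str) -> List[Dict[str, Any]]:
    """Load a page and return the JSON bodies of matching API responses."""
    # Imported lazily so the collector above is usable without Playwright.
    from playwright.sync_api import sync_playwright

    captured: List[Dict[str, Any]] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.on("response", make_collector(url_fragment, captured))
            page.goto(page_url, wait_until="networkidle")
        finally:
            browser.close()
    return captured
```

Letting the page issue its own API calls avoids rebuilding request URLs and locale parameters by hand, at the cost of a slower, browser-dependent run.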
…ormat
- Fix issue where only first season episodes were extracted by actively requesting episodes API for each season
- Use captured API template with correct locale parameter to request all seasons' episode data
- Implement smart episode numbering: pure numbers for single season (1,2,3...), S{season}E{episode} for multi-season
- Remove continuous episode numbering logic, use actual episodeNumber from API data
fix(hbomax): fix episode numbering to match project conventions
- Implement smart episode numbering: pure numbers for single season, S{season}E{episode} for multi-season
- Remove continuous episode_count logic, use actual episodeNumber from API data
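The numbering rule in these two commits can be illustrated with a small sketch. The function name `label_episodes` and the `(season, episode)` tuple input are assumptions; the unpadded `S{season}E{episode}` format follows the commit message.

```python
from typing import Iterable, List, Tuple


def label_episodes(episodes: Iterable[Tuple[int, int]]) -> List[str]:
    """Label episodes given (season_number, episode_number) pairs.

    Single-season shows get plain numbers ("1", "2", ...); shows with more
    than one season get "S{season}E{episode}" labels.
    """
    pairs = list(episodes)
    multi_season = len({season for season, _ in pairs}) > 1
    labels = []
    for season, number in pairs:
        if multi_season:
            labels.append(f"S{season}E{number}")
        else:
            # Use the episode number reported by the API, not a running counter.
            labels.append(str(number))
    return labels
```

For example, `label_episodes([(1, 1), (1, 2)])` yields `["1", "2"]`, while `label_episodes([(1, 1), (2, 1)])` yields `["S1E1", "S2E1"]`.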
…ere_extractor
- Remove language parameter from function signature since language code is always derived from URL path
- Update extractor.py call to match new signature
- TVB Anywhere API language is determined by URL structure (/en/, /tc/, /sc/), not by parameter
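Deriving the language code from the URL path could be sketched like this. The function name `language_from_url`, the choice to inspect only the first path segment, and returning `None` when no code is found are assumptions, not the extractor's confirmed behaviour.

```python
from typing import Optional
from urllib.parse import urlparse

# Language codes named in the commit message: /en/, /tc/, /sc/.
SUPPORTED_LANGUAGES = ("en", "tc", "sc")


def language_from_url(url: str) -> Optional[str]:
    """Return the language code found in the first path segment, if any."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if segments and segments[0].lower() in SUPPORTED_LANGUAGES:
        return segments[0].lower()
    return None
```

With this shape, callers no longer need to pass a language argument: the URL alone decides which localized API variant is requested.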
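- Restore the original category_map check so that only items whose category_map[1] is '正片' (main feature) are extracted
- Remove the title fallback logic so that an empty second_title no longer falls back to title (the name of the whole series)
- Refine the episode_counter logic so that bonus (side-story) episodes do not overwrite main-story episode numbers
- Add title deduplication: when all titles are identical, clear all of them
- Support content without explicit episode numbers, such as variety shows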
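The title deduplication step in this commit can be sketched as below. The `name` key, the dict-of-dicts episode shape, and the decision to leave a lone episode untouched are assumptions made for illustration.

```python
from typing import Any, Dict


def clear_uniform_titles(episodes: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Blank out every episode title when all episodes share the same title.

    Identical titles usually mean the series name leaked into each episode,
    so they carry no per-episode information. A single episode is left
    unchanged (an assumption; the commit message does not cover this case).
    """
    titles = {episode.get("name", "") for episode in episodes.values()}
    if len(episodes) > 1 and len(titles) == 1:
        for episode in episodes.values():
            episode["name"] = ""
    return episodes
```

The function mutates the episode dicts in place and returns the same mapping for convenience.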
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 16 comments.
            soureData = json.loads(open_url(apiRequest))
        except Exception as e:
            logging.error(f"Failed to request TVB Anywhere API: {e}")
            return {}

        episodes = {}

        programme_name = soureData.get('programme_name', '')
        programme_desc = soureData.get('programme_desc', '')
        total_episodes = soureData.get('no_of_episode', 0)
Copilot
AI
Jan 15, 2026
The variable name 'soureData' should be corrected to 'sourceData' for consistency with the corrected variable name above.
Suggested change:

    -        soureData = json.loads(open_url(apiRequest))
    -    except Exception as e:
    -        logging.error(f"Failed to request TVB Anywhere API: {e}")
    -        return {}
    -    episodes = {}
    -    programme_name = soureData.get('programme_name', '')
    -    programme_desc = soureData.get('programme_desc', '')
    -    total_episodes = soureData.get('no_of_episode', 0)
    +        sourceData = json.loads(open_url(apiRequest))
    +    except Exception as e:
    +        logging.error(f"Failed to request TVB Anywhere API: {e}")
    +        return {}
    +    episodes = {}
    +    programme_name = sourceData.get('programme_name', '')
    +    programme_desc = sourceData.get('programme_desc', '')
    +    total_episodes = sourceData.get('no_of_episode', 0)
            soureData = json.loads(open_url(apiRequest))
        except Exception as e:
            logging.error(f"Failed to request TVB Anywhere API: {e}")
            return {}

        episodes = {}

        programme_name = soureData.get('programme_name', '')
        programme_desc = soureData.get('programme_desc', '')
        total_episodes = soureData.get('no_of_episode', 0)
Copilot
AI
Jan 15, 2026
There is a typo in the variable name 'soureData' which should be 'sourceData' for consistency and correct spelling.
Suggested change:

    -        soureData = json.loads(open_url(apiRequest))
    -    except Exception as e:
    -        logging.error(f"Failed to request TVB Anywhere API: {e}")
    -        return {}
    -    episodes = {}
    -    programme_name = soureData.get('programme_name', '')
    -    programme_desc = soureData.get('programme_desc', '')
    -    total_episodes = soureData.get('no_of_episode', 0)
    +        sourceData = json.loads(open_url(apiRequest))
    +    except Exception as e:
    +        logging.error(f"Failed to request TVB Anywhere API: {e}")
    +        return {}
    +    episodes = {}
    +    programme_name = sourceData.get('programme_name', '')
    +    programme_desc = sourceData.get('programme_desc', '')
    +    total_episodes = sourceData.get('no_of_episode', 0)