Bug Description
The arXiv fetcher (fetchers/arxiv_fetcher.py) fails to fetch any papers because requests to the URL format https://arxiv.org/list/{category}/new?skip=0&show={max_results} now return HTTP 400.
Root Cause
arXiv no longer supports the ?skip=0&show=N query parameters on the /list/{category}/new page. The base URL without query parameters (e.g. https://arxiv.org/list/cs.AI/new) still works and returns the listing normally.
Line 11 in fetchers/arxiv_fetcher.py:
url = f"https://arxiv.org/list/{category}/new?skip=0&show={max_results}"
Reproduction
import requests
# This returns 400:
r1 = requests.get("https://arxiv.org/list/cs.AI/new?skip=0&show=5")
print(r1.status_code) # 400
# This works fine:
r2 = requests.get("https://arxiv.org/list/cs.AI/new")
print(r2.status_code) # 200
Suggested Fix
- Remove the query parameters from the URL
- Apply the max_results limit in Python after parsing:
url = f"https://arxiv.org/list/{category}/new"
# ... parse papers ...
return papers[:max_results]
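A minimal sketch of the suggested fix, assuming the fetcher's existing HTML parsing can be passed in as a function. The names build_new_listing_url, download_html, fetch_new_papers and parse_papers are hypothetical (not taken from fetchers/arxiv_fetcher.py), and the sketch uses the standard-library urllib instead of requests so it has no third-party dependencies; the User-Agent string is also made up. It requests the bare /new URL and truncates to max_results only after parsing.

```python
from typing import Callable, List
from urllib.request import Request, urlopen

# Base of arXiv's listing pages; the /new page is requested without skip/show.
ARXIV_LIST_BASE = "https://arxiv.org/list"


def build_new_listing_url(category: str) -> str:
    """Return the /new listing URL for a category, with no query parameters."""
    return f"{ARXIV_LIST_BASE}/{category}/new"


def download_html(url: str, timeout: float = 30.0) -> str:
    """Fetch a page with urllib; raises urllib.error.HTTPError on 4xx/5xx."""
    # Hypothetical User-Agent value, used only for illustration.
    req = Request(url, headers={"User-Agent": "arxiv-fetcher-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def fetch_new_papers(
    category: str,
    max_results: int,
    parse_papers: Callable[[str], List[dict]],
    get_html: Callable[[str], str] = download_html,
) -> List[dict]:
    """Fetch the full /new listing, parse it, then limit results in Python."""
    if max_results < 0:
        raise ValueError("max_results must be non-negative")
    html = get_html(build_new_listing_url(category))
    papers = parse_papers(html)  # placeholder for the fetcher's own parser
    return papers[:max_results]
```

Injecting get_html keeps the URL construction and truncation testable without network access; in the real fetcher, requests.get(url).text could be passed in its place.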