arXiv fetcher broken: query params ?skip=0&show=N return HTTP 400 #2

@lyy0323

Bug Description

The arXiv fetcher (fetchers/arxiv_fetcher.py) fails to fetch any papers because the URL format https://arxiv.org/list/{category}/new?skip=0&show={max_results} now returns HTTP 400.

Root Cause

arXiv no longer supports the ?skip=0&show=N query parameters on the /list/{category}/new page. The base URL without query params (https://arxiv.org/list/cs.AI/new) works fine and returns papers normally.

Line 11 in fetchers/arxiv_fetcher.py:

url = f"https://arxiv.org/list/{category}/new?skip=0&show={max_results}"

Reproduction

import requests
# This returns 400:
r1 = requests.get("https://arxiv.org/list/cs.AI/new?skip=0&show=5")
print(r1.status_code)  # 400

# This works fine:
r2 = requests.get("https://arxiv.org/list/cs.AI/new")
print(r2.status_code)  # 200

Suggested Fix

  1. Remove the query parameters from the URL.
  2. Limit the results in Python after parsing, by slicing the parsed list:
url = f"https://arxiv.org/list/{category}/new"
# ... parse papers ...
return papers[:max_results]
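
A slightly fuller sketch of the same idea, in case it helps with review. The names `build_list_url`, `limit_results`, `fetch_new_papers`, and the `parse_papers` / `http_get` parameters are hypothetical and are not taken from `fetchers/arxiv_fetcher.py`; the existing HTML parsing is assumed to stay unchanged and is simply passed in as a callable.

```python
from typing import Callable, List, Optional

# Base listing URL without the ?skip=...&show=... query string.
ARXIV_LIST_URL = "https://arxiv.org/list/{category}/new"


def build_list_url(category: str) -> str:
    """Return the plain /list/{category}/new URL, with no query parameters."""
    return ARXIV_LIST_URL.format(category=category)


def limit_results(papers: List, max_results: Optional[int]) -> List:
    """Truncate the parsed list client-side; None means 'no limit'."""
    if max_results is None:
        return list(papers)
    return list(papers[:max(max_results, 0)])


def fetch_new_papers(
    category: str,
    max_results: Optional[int],
    parse_papers: Callable[[str], List],  # hypothetical: the fetcher's existing parser
    http_get: Optional[Callable] = None,  # injectable so the function can be tested offline
    timeout: float = 30.0,
) -> List:
    """Fetch the listing page, parse it, then apply max_results in Python."""
    if http_get is None:
        import requests  # imported lazily; only needed for real network calls
        http_get = requests.get
    response = http_get(build_list_url(category), timeout=timeout)
    response.raise_for_status()  # surface a 400 instead of silently returning nothing
    return limit_results(parse_papers(response.text), max_results)
```

Calling `raise_for_status()` is an optional addition: it would make a future change in arXiv's URL handling show up as an explicit error rather than an empty result.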
