11 changes: 11 additions & 0 deletions discogs_client/client.py
@@ -19,6 +19,7 @@ def __init__(self, user_agent, consumer_key=None, consumer_secret=None, token=No
         self.user_agent = user_agent
         self.verbose = False
         self._fetcher = RequestsFetcher()
+        self._trust_per_page = True # Default: True

         if consumer_key and consumer_secret:
             self.set_consumer_key(consumer_key, consumer_secret)
@@ -219,3 +220,13 @@ def set_timeout(self,
         """
         self._fetcher.connect_timeout = connect
         self._fetcher.read_timeout = read
+
+    @property
+    def trust_per_page(self) -> bool:
Contributor

More of an observation, but we don't have many method-level docstrings at the moment. Maybe we should go through the models at some point and document the main methods, at least the ones mentioned on our Read the Docs page.

@JOJ0 (Contributor), May 9, 2026

We certainly should, and I've also been thinking lately that we should start using automatic linters (e.g. ruff). I'll open a new issue for that.

Do you mind adding docstrings to all newly added methods? I'm kind of somewhere else at the moment. Otherwise I'll do it when I find a minute in the next few days. I think whenever we touch something, we should at least complete the docstrings around it.

Or maybe @GFlores17, you have a minute to fire off a commit with docstrings for all the new methods and properties that don't have one yet. Thanks!

Author

Sounds good. I'll write up some docstrings for all the new methods and props that I added. I'll look at other methods/props as well.

+        return self._trust_per_page
+
+    @trust_per_page.setter
+    def trust_per_page(self, value: bool) -> None:
+        if not isinstance(value, bool):
+            raise ValueError("trust_per_page must be a bool")
+        self._trust_per_page = value
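As a rough illustration of how the new setter behaves, here is a minimal sketch that mirrors only the property from the hunk above. `MiniClient` is a hypothetical stand-in, not the real `Client` class, which has more state and a different constructor.

```python
class MiniClient:
    """Hypothetical stand-in that mirrors only the trust_per_page property."""

    def __init__(self) -> None:
        # Same default as in the PR: trust the reported per_page value.
        self._trust_per_page = True

    @property
    def trust_per_page(self) -> bool:
        return self._trust_per_page

    @trust_per_page.setter
    def trust_per_page(self, value: bool) -> None:
        # Strict check: only real bools pass, so 0, 1 and "false" are rejected.
        if not isinstance(value, bool):
            raise ValueError("trust_per_page must be a bool")
        self._trust_per_page = value


client = MiniClient()
client.trust_per_page = False  # accepted
```

One consequence of the `isinstance` check is that truthy or falsy non-bool values such as `0` or `"no"` raise `ValueError` instead of being coerced.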
42 changes: 33 additions & 9 deletions discogs_client/models.py
@@ -354,18 +354,42 @@ def _transform(self, item):
         return item

     def __getitem__(self, index):
@AnssiAhola (Contributor), Apr 5, 2026

As you pointed out, this would query each page starting from page 1, which isn't ideal.
Could we compare the offset with the actual page item count and walk backwards if the actual count is less than the offset? This way we could dramatically reduce requests for large indexes.

Also this change in logic could/should(?) be opt-in (like our "exponential backoff" feature) and could be enabled with something like

# Different ideas for enabling/disabling this feature

client.safe_pagination = True # Default: False.
client.unsafe_pagination = False # Default: True.
client.safe_pagination_indexing = True # Default: False
client.unsafe_pagination_indexing = False # Default: True

# ... Other options?

@JOJ0 Any ideas/suggestions on this?

Also, please add tests that at least reproduce this bug so we can make sure it is fixed and stays fixed. Look in tests/test_models.py for examples of how to mock API responses.

@JOJ0 (Contributor), Apr 7, 2026

Hi @GFlores17 thanks for the submission, it is a good idea to finally find a solution for this even though it's not optimal. I agree with all @AnssiAhola mentioned. We need:

  • A test to reproduce
  • It should be configurable
  • And we need to document it

If you are still motivated to move this PR forward, we can assist with the details.

I asked an AI to draft some tests that might be a starting point. @AnssiAhola, please have a look and see whether they are good enough or too much. Thanks!

@JOJ0 (Contributor), Apr 7, 2026

@GFlores17 I rebased your 2 commits into one and force pushed. Please reset your branch to upstream. I also pushed the drafted tests.

@JOJ0 (Contributor), Apr 7, 2026

For the config option, what about:

trust_per_page = False  # Default: True

Contributor

Tests seem fine, even though a bit verbose, as is tradition for AI.

> For the config option, what about:
>
> trust_per_page = False  # Default: True

This sounds good to me. With some documentation this should be clear enough.

I also realized that my "reverse walk" suggestion wouldn't work, so the original idea of always starting from page 1 is pretty much necessary.

-        page_index = index // self.per_page + 1
-        offset = index % self.per_page
+        """Retrieve an item by its index.

-        try:
-            page = self.page(page_index)
-        except HTTPError as e:
-            if e.status_code == 404:
-                raise IndexError(e.msg)
-            else:
+        By default, uses the API's ``per_page`` value to calculate the page
+        containing the item directly. If the API returns fewer items per page
+        than reported, this may yield incorrect results — set
+        ``client.trust_per_page = False`` to fall back to a sequential page
+        walk at the cost of performance.
+        """
+        if self.client._trust_per_page:
+            page_index = index // self.per_page + 1
+            offset = index % self.per_page
+
+            try:
+                page = self.page(page_index)
+            except HTTPError as e:
+                if e.status_code == 404:
+                    raise IndexError(e.msg) from e
                 raise
+
+            return page[offset]
+
+        # Fallback to sequential page loading if we're not trusting the per_page parameter
+        current = 0
+        page_index = 1
+        while True:
+            try:
+                page = self.page(page_index)
+            except HTTPError as e:
+                if e.status_code == 404:
+                    raise IndexError(e.msg) from e
+                raise
+            if current + len(page) > index:
+                return page[index - current]
+            current += len(page)
+            page_index += 1

-        return page[offset]

     def __len__(self):
         return self.count
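The difference between the two lookup strategies in this hunk can be sketched without the library. The snippet below is an illustration only: `PAGES`, `PER_PAGE`, `fetch_page`, `get_direct` and `get_sequential` are hypothetical names, and the data mirrors the under-filled pages mocked in the test (a reported `per_page` of 50, but only 2, 1 and 2 items actually returned).

```python
from typing import List

# Hypothetical stand-in for the API: three under-filled pages, although the
# reported per_page value is 50.
PAGES: List[List[int]] = [[101, 102], [103], [104, 105]]
PER_PAGE = 50


def fetch_page(page_index: int) -> List[int]:
    """Return a 1-based page; IndexError past the last page stands in for a 404."""
    if not 1 <= page_index <= len(PAGES):
        raise IndexError("page out of range")
    return PAGES[page_index - 1]


def get_direct(index: int) -> int:
    """Trust per_page: jump straight to the computed page and offset."""
    page_index = index // PER_PAGE + 1
    offset = index % PER_PAGE
    return fetch_page(page_index)[offset]


def get_sequential(index: int) -> int:
    """Do not trust per_page: walk from page 1, counting the items actually seen."""
    seen = 0
    page_index = 1
    while True:
        page = fetch_page(page_index)
        if seen + len(page) > index:
            return page[index - seen]
        seen += len(page)
        page_index += 1
```

With this data, `get_direct(2)` looks for offset 2 on page 1, which holds only two items, and fails with `IndexError`, while `get_sequential(2)` walks to page 2 and finds item 103.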
40 changes: 40 additions & 0 deletions discogs_client/tests/test_core.py
@@ -111,6 +111,46 @@ def test_pagination(self):
         results.per_page = 10
         self.assertTrue(results._num_pages is None)

+    def test_pagination_index_without_trust_per_page_and_short_pages(self):
+        """Without trust_per_page, sequential page walking handles under-filled pages correctly."""
+        client = Client('ua')
+        client.trust_per_page = False # opt into sequential walk
+        client._base_url = ''
+        client._trust_per_page = False
+        client._fetcher = MemoryFetcher({
+            '/artists/1': (
+                b'{"id": 1, "name": "Badger", "releases_url": "/artists/1/releases"}',
+                200,
+            ),
+            '/artists/1/releases?page=1&per_page=50': (
+                b'{"pagination": {"per_page": 50, "pages": 3, "items": 5}, '
+                b'"releases": ['
+                b'{"id": 101, "type": "release", "title": "A"},'
+                b'{"id": 102, "type": "release", "title": "B"}'
+                b']}',
+                200,
+            ),
+            '/artists/1/releases?page=2&per_page=50': (
+                b'{"pagination": {"per_page": 50, "pages": 3, "items": 5}, '
+                b'"releases": [{"id": 103, "type": "release", "title": "C"}]}',
+                200,
+            ),
+            '/artists/1/releases?page=3&per_page=50': (
+                b'{"pagination": {"per_page": 50, "pages": 3, "items": 5}, '
+                b'"releases": ['
+                b'{"id": 104, "type": "release", "title": "D"},'
+                b'{"id": 105, "type": "release", "title": "E"}'
+                b']}',
+                200,
+            ),
+        })
+
+        results = client.artist(1).releases
+        self.assertEqual(results[0].id, 101) # page 1
+        self.assertEqual(results[2].id, 103) # page 2, cross-page
+        self.assertEqual(results[4].id, 105) # page 3, last item
+        self.assertRaises(IndexError, lambda: results[5])
+
     def test_timeout_defaults_to_none(self):
         # Need to create client without LoggingDelegator here
         # self.d would throw AttributeError trying to access timeout properties on LoggingDelegator
22 changes: 22 additions & 0 deletions docs/source/optional_configuration.md
@@ -30,3 +30,25 @@ client.set_timeout(
```

_Timeouts support integer and float values, you can also set either value to `None` to disable timeout for connect or read separately_

## Trust Discogs per page value

When accessing paginated results by index (e.g. `results[42]`),
python3-discogs-client uses the `per_page` value from the API response to
calculate which page contains the item directly. This is fast, but assumes the
API always returns exactly `per_page` items per page.

If the API returns fewer items per page than the reported `per_page` value,
this calculation can be off. In such cases, disable this behaviour to fall
back to a sequential page walk:

```python
>>> import discogs_client
>>> d = discogs_client.Client('ExampleApplication/0.1')
>>> d.trust_per_page = False
```

:::{attention}
The sequential fallback is slower for large result sets, as it must fetch pages
one by one until it reaches the requested index.
:::
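To get a feel for the cost, the following sketch estimates how many page requests a sequential walk needs before it reaches a given index. It is an illustration only: `pages_fetched` is a hypothetical helper, not part of python3-discogs-client, and it assumes that pages are not cached between lookups.

```python
from itertools import accumulate
from typing import Sequence


def pages_fetched(index: int, page_sizes: Sequence[int]) -> int:
    """Number of page requests a sequential walk needs to reach `index`.

    `page_sizes` holds the real item count of each page, in page order.
    """
    # The walk stops at the first page whose running total exceeds the index.
    for page_number, total in enumerate(accumulate(page_sizes), start=1):
        if total > index:
            return page_number
    raise IndexError("index beyond the last item")


# 20 full pages of 50 items: reaching index 975 takes 20 requests,
# whereas the direct calculation needs a single request.
requests_needed = pages_fetched(975, [50] * 20)
```

Under these assumptions the number of requests grows linearly with the index, so it may be worth disabling `trust_per_page` only for endpoints where under-filled pages have actually been observed.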