Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 16, 2025

Continuation of #128214. This PR

  • Improves performance of bytes creation from a list or tuple
  • Fixes a free-threading bug
  • Reduces some duplicated code between list and tuple handling

Comment on lines 2871 to 2872
/* Py_None as a fallback sentinel to the slow path */
bytes = Py_None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Py_None as a fallback sentinel to the slow path */
bytes = Py_None;
/* Py_None as a fallback sentinel to the slow path */
Py_INCREF(Py_None);
bytes = Py_None;

Is needed, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_None is immortal, so not needed

@eendebakpt eendebakpt marked this pull request as draft April 16, 2025 12:41
eendebakpt and others added 4 commits April 16, 2025 14:58
…e-128213.Y71jDi.rst

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@eendebakpt eendebakpt marked this pull request as ready for review April 16, 2025 14:09
goto error;
PyObject *const *items = PySequence_Fast_ITEMS(x);
for (Py_ssize_t i = 0; i < size; i++) {
if (!PyLong_Check(items[i])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're interested in speed, PyLong_CheckExact will allow you to use _PyLong_IsNonNegativeCompact and _PyLong_CompactValue to get the C int in a few cycles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. For the common/happy path we now we call PyNumber_AsSsize_t, which calls PyLong_AsSsize_t. That methods does another PyLong_Check and then a call to _PyLong_IsCompact.

So we have some options:

  1. We could remove the call to PyLong_Check (as it is already covered by PyLong_AsSsize_t), that improves performance a bit for all cases.
  2. We could use PyLong_CheckExact with _PyLong_IsNonNegativeCompact. Fastest for exact ints, but non-exact ints become slower. I suspect non-exact ints are rare though.
  3. We add a fast path using PyLong_CheckExact and a fallback fast path using PyNumber_AsSsize_t for the non-exacts ints. Fast for all cases, but takes a but more code.

@markshannon Any preference? I am happy to work out any of the above.

@markshannon
Copy link
Member

Tuples are immutable, so why does creating a bytes object from a tuple require synchronization?

@eendebakpt
Copy link
Contributor Author

Tuples are immutable, so why does creating a bytes object from a tuple require synchronization?

Tuples indeed do not require synchronization.

In this PR exact lists and tuples use the path (using synchronization with Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST). We could split the path, but I suspect the performance gain for tuples would be minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants