@rdspring1
Collaborator

@rdspring1 rdspring1 commented Jan 8, 2026

Library size: nanobind is 52.6% smaller than pybind11.

Size (MB)  Library
1.060      nanobind
2.238      pybind11
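For anyone checking the headline number, here is a minimal sketch of the arithmetic, assuming the percentage is measured relative to the pybind11 size. The function names are hypothetical and only serve this illustration.

```cpp
#include <cmath>

// Percentage by which `candidate_mb` is smaller than `baseline_mb`.
double percentSmaller(double candidate_mb, double baseline_mb) {
  return (baseline_mb - candidate_mb) / baseline_mb * 100.0;
}

// Round to one decimal place, matching the precision quoted above.
double roundToTenth(double x) {
  return std::round(x * 10.0) / 10.0;
}
```

With the sizes from the table, `roundToTenth(percentSmaller(1.060, 2.238))` reproduces the quoted 52.6%.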

@rdspring1 rdspring1 added the Direct Bindings label (Python extension with direct mapping to NvFuser CPP objects.) Jan 8, 2026
@github-actions

github-actions bot commented Jan 8, 2026

Review updated until commit f264148

Description

  • Migrate library from pybind11 to nanobind for 52.6% size reduction

  • Fix GitHub workflow configuration for new binding system

  • Resolve define_tensor_error_generator compatibility issues

  • Optimize from_pysequence implementation for nanobind

  • Handle IntegerProxy type conversion properly

Changes walkthrough

Relevant files

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review
API Migration Completeness

The migration appears comprehensive but needs verification that all pybind11 APIs have been correctly translated to nanobind equivalents. Pay special attention to complex template instantiations, return value policies, and STL container bindings. The convertToVal function and macro definitions should be thoroughly tested.

if (std::holds_alternative<Val*>(value)) {
  Val* v = std::get<Val*>(value);
  NVF_ERROR(
      dtype == std::nullopt || v == nullptr ||
      std::get<PrimDataType>(v->dtype().type) == dtype.value());
  return std::get<Val*>(value);
}
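The dtype check in the snippet above could be exercised in isolation with something like the following sketch. `PrimDataType`, `Val`, and `dtypeCompatible` here are simplified, hypothetical stand-ins, not the nvFuser definitions. The sketch also ignores the case where the real dtype, assuming it is a `std::variant`, holds a non-primitive alternative, in which `std::get` would throw `std::bad_variant_access`.

```cpp
#include <optional>

// Hypothetical stand-ins for the nvFuser types; not the real definitions.
enum class PrimDataType { Int, Double, Bool };

struct Val {
  PrimDataType dtype;
};

// Mirrors the three-way condition in the quoted NVF_ERROR check:
// no dtype was requested, there is no value to inspect, or the dtypes agree.
bool dtypeCompatible(const Val* v, std::optional<PrimDataType> dtype) {
  return !dtype.has_value() || v == nullptr || v->dtype == *dtype;
}
```

A test for the real function would presumably cover the same four cases: no requested dtype, a null `Val*`, a matching dtype, and a mismatching dtype that should trip the error.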
Type Conversion Robustness

The new toPolymorphicValue function replaces torch::jit::toIValue. This is a significant change in type conversion logic that could affect how Python objects are converted to C++ types. Verify that all expected Python types (tensors, scalars, complex numbers) are properly handled and that error cases throw appropriate exceptions.

PolymorphicValue toPolymorphicValue(const nb::handle& obj) {
  if (nb::isinstance<nb::ndarray<nb::pytorch>>(obj)) {
    return PolymorphicValue(nb::cast<at::Tensor>(obj));
  } else if (nb::isinstance<nb::bool_>(obj)) {
    return PolymorphicValue(nb::cast<bool>(obj));
  } else if (nb::isinstance<nb::int_>(obj)) {
    return PolymorphicValue(nb::cast<int64_t>(obj));
  } else if (nb::isinstance<nb::float_>(obj)) {
    return PolymorphicValue(nb::cast<double>(obj));
  } else if (nb::isinstance<std::complex<double>>(obj)) {
    return PolymorphicValue(nb::cast<std::complex<double>>(obj));
  }
  NVF_THROW("Cannot convert provided nb::handle to a PolymorphicValue.");
}
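One property worth verifying in this chain is the order of the checks. Python's `bool` is a subclass of `int`, so a bool object also passes an integer check, and the bool branch has to come first (as it does in the quoted code). The sketch below models that with plain C++ only; `FakePyObject`, `Scalar`, `toScalar`, and the `make*` helpers are hypothetical and do not use nanobind.

```cpp
#include <complex>
#include <cstdint>
#include <stdexcept>
#include <variant>

// Hypothetical model of a Python object: which type checks it passes
// and the numeric payload it carries.
struct FakePyObject {
  bool is_bool = false;
  bool is_int = false;
  bool is_float = false;
  bool is_complex = false;
  std::complex<double> payload{};
};

using Scalar = std::variant<bool, int64_t, double, std::complex<double>>;

FakePyObject makeBool(bool b) {
  FakePyObject obj;
  obj.is_bool = true;
  obj.is_int = true;  // Python's bool is a subclass of int.
  obj.payload = b ? 1.0 : 0.0;
  return obj;
}

FakePyObject makeInt(int64_t i) {
  FakePyObject obj;
  obj.is_int = true;
  obj.payload = static_cast<double>(i);
  return obj;
}

// Same branch order as the quoted function: bool, int, float, complex.
Scalar toScalar(const FakePyObject& obj) {
  if (obj.is_bool) {
    return Scalar(obj.payload.real() != 0.0);
  }
  if (obj.is_int) {
    return Scalar(static_cast<int64_t>(obj.payload.real()));
  }
  if (obj.is_float) {
    return Scalar(obj.payload.real());
  }
  if (obj.is_complex) {
    return Scalar(obj.payload);
  }
  throw std::invalid_argument("Cannot convert object to a Scalar.");
}
```

If the bool and int branches were swapped, `makeBool(true)` would come back as the integer 1 rather than as a bool. A Python-level test passing `True`, `1`, `1.0`, and `1j` through the real binding would catch the same regression.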
Module Initialization

The module definition changed from PYBIND11_MODULE to NB_MODULE. While this looks correct, ensure that the initialization function nvfuser::python::initNvFuserPythonBindings is compatible with the nanobind module interface and that all submodule bindings work correctly.

NB_MODULE(PYTHON_DIRECT_EXTENSION, m) {
  m.doc() = "Python bindings for NvFuser Direct CPP API";
  nvfuser::python::initNvFuserPythonBindings(m);
}

Test failures

  • (High, 46) System-wide NCCL NVLink-SHARP (NVLS) binding failure in multidevice nvFuser/distributed tests on dlcluster_viking_ci

    Test Name H100 (dist.) Source
    tests.python.multidevice.test_communication.test_allgather
    tests.python.multidevice.test_communication.test_allgather_expanded_broadcast
    tests.python.multidevice.test_communication.test_allreduce
    tests.python.multidevice.test_communication.test_reduce_scatter
    tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous
    tests.python.multidevice.test_dtensor.test_column_parallel_linear
    tests.python.multidevice.test_dtensor.test_plus_one
    tests.python.multidevice.test_dtensor.test_row_parallel_linear
    tests.python.multidevice.test_expert_parallel.test_dispatch_and_combine
    tests.python.multidevice.test_matmul.test_column_parallel_grouped_mm
    ... with 36 more test failures omitted. Check internal logs.
  • (Medium, 18) nvFuser define_tensor() argument mismatch in OpInfo legacy error tests (multiple dtypes)

    Test Name A100 GB200 H100 Source
    tests.python.opinfo.test_legacy_ops.test_errors_define_tensor_complex128
    tests.python.opinfo.test_legacy_ops.test_errors_define_tensor_complex64
    tests.python.opinfo.test_legacy_ops.test_errors_define_tensor_float32
    tests.python.opinfo.test_legacy_ops.test_errors_define_tensor_float64
    tests.python.opinfo.test_legacy_ops.test_errors_define_tensor_int32
    tests.python.opinfo.test_legacy_ops.test_errors_define_tensor_int64
  • (Medium, 1) NCCL invalid usage error in tests/python/multidevice overlap test

    Test Name H100 (dist.) Source
    tests.python.multidevice.test_overlap.test_overlap_allgather_matmul_shard_outermost[backend_type=CommunicatorBackend.cuda]

@rdspring1 rdspring1 force-pushed the nanobind_direct branch 3 times, most recently from 1373f72 to d5b6860 on January 9, 2026 05:18
@rdspring1
Collaborator Author

!test
