warp-pypose

⚡ NVIDIA Warp-accelerated Lie group operations for PyPose

Features • Installation • Quick Start • Operations • Benchmarks • Development

warp-pypose provides a high-performance NVIDIA Warp-based backend for PyPose LieTensor operations. It offers significant speedups for Lie group computations on both CPU and CUDA, with full support for automatic differentiation.

Features

🚀 Drop-in acceleration — Seamlessly swap PyPose backends with a single function call
⚡ Warp-powered kernels — Optimized parallel implementations for CPU and CUDA
🔄 Full autodiff support — Analytical gradients for all operations with PyTorch integration
📐 Comprehensive Lie group coverage — SE(3), SO(3), se(3), so(3) algebras and groups
🎯 FP16/FP32/FP64 precision — Multi-precision support with numerically stable implementations
📊 Arbitrary batch dimensions — Full broadcasting support up to 4D batches

Installation

Requirements

Python 3.10+
PyTorch 2.0+
CUDA 12.0+ (for GPU acceleration)

Install from source

git clone https://github.com/MAC-VO/warp-pypose.git
cd warp-pypose
pip install -e .

Dependencies

pip install torch pypose warp-lang

Quick Start

Basic Usage

import torch
import pypose as pp
import pypose_warp

# Create a standard PyPose SE3 LieTensor
poses = pp.randn_SE3(1000, device="cuda", dtype=torch.float32)
points = torch.randn(1000, 3, device="cuda", dtype=torch.float32)

# Convert to Warp backend for accelerated computation
poses_warp = pypose_warp.to_warp_backend(poses)

# Use exactly like PyPose — all operations are accelerated
transformed = poses_warp.Act(points)      # Apply SE3 to points
matrices = poses_warp.matrix()            # Convert to 4x4 matrices
logs = poses_warp.Log()                   # Logarithm map to se3
composed = poses_warp @ poses_warp.Inv()  # Compose transformations
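
To make the `Act` call above concrete, here is a plain-Python reference sketch of what applying an SE(3) element to a point computes. It assumes the usual PyPose storage order `(tx, ty, tz, qx, qy, qz, qw)`; the helper names `quat_rotate` and `se3_act` are hypothetical and are not part of `pypose_warp`, whose kernels may be organised quite differently.

```python
import math

def quat_rotate(q, p):
    """Rotate 3D point p by unit quaternion q = (qx, qy, qz, qw)."""
    qx, qy, qz, qw = q
    # t = 2 * (q_vec x p)
    tx = 2.0 * (qy * p[2] - qz * p[1])
    ty = 2.0 * (qz * p[0] - qx * p[2])
    tz = 2.0 * (qx * p[1] - qy * p[0])
    # p' = p + qw * t + q_vec x t
    return (
        p[0] + qw * tx + (qy * tz - qz * ty),
        p[1] + qw * ty + (qz * tx - qx * tz),
        p[2] + qw * tz + (qx * ty - qy * tx),
    )

def se3_act(X, p):
    """Apply X = (tx, ty, tz, qx, qy, qz, qw) to point p, i.e. R(q) p + t."""
    r = quat_rotate(X[3:], p)
    return (r[0] + X[0], r[1] + X[1], r[2] + X[2])

# 90-degree rotation about z combined with translation (1, 2, 3)
s = math.sqrt(0.5)
X = (1.0, 2.0, 3.0, 0.0, 0.0, s, s)
print(se3_act(X, (1.0, 0.0, 0.0)))  # approximately (1.0, 3.0, 3.0)
```

The x axis is rotated onto the y axis and then shifted by the translation, which is the per-element work a batched `Act` repeats for every pose and point pair.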

Gradient Computation

import torch
import pypose as pp
from pypose_warp import to_warp_backend

# Enable gradients
poses = pp.randn_SE3(100, device="cuda", requires_grad=True)
poses_warp = to_warp_backend(poses)
points = torch.randn(100, 3, device="cuda", requires_grad=True)

# Forward pass with Warp backend
result = poses_warp.Act(points)

# Backward pass — analytical gradients computed via Warp kernels
loss = result.sum()
loss.backward()

# Gradients available on original tensors
print(poses.grad.shape)   # torch.Size([100, 7])
print(points.grad.shape)  # torch.Size([100, 3])
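
As an illustration of what "analytical gradients" means here, the sketch below writes the forward and backward pass of `Act` with respect to the points by hand and checks the result against central finite differences. It is plain Python under the assumption of the `(tx, ty, tz, qx, qy, qz, qw)` layout; the function names are hypothetical, and the gradient with respect to the pose itself is left out because its convention is not spelled out in this README.

```python
import math

def quat_to_matrix(q):
    """3x3 rotation matrix of unit quaternion q = (qx, qy, qz, qw)."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def act_forward(X, p):
    """out = R(q) p + t."""
    R = quat_to_matrix(X[3:])
    return [sum(R[i][j] * p[j] for j in range(3)) + X[i] for i in range(3)]

def act_backward_points(X, grad_out):
    """Analytical d(loss)/dp = R(q)^T grad_out."""
    R = quat_to_matrix(X[3:])
    return [sum(R[i][j] * grad_out[i] for i in range(3)) for j in range(3)]

def numeric_grad_points(X, p, eps=1e-6):
    """Central finite differences of loss = sum(act_forward(X, p)) w.r.t. p."""
    grad = []
    for j in range(3):
        hi, lo = list(p), list(p)
        hi[j] += eps
        lo[j] -= eps
        grad.append((sum(act_forward(X, hi)) - sum(act_forward(X, lo))) / (2 * eps))
    return grad

s = math.sqrt(0.5)
X = (1.0, 2.0, 3.0, 0.0, 0.0, s, s)
p = (0.3, -1.2, 2.5)
analytic = act_backward_points(X, [1.0, 1.0, 1.0])  # loss = out.sum() gives grad_out = 1
numeric = numeric_grad_points(X, p)
print(analytic, numeric)
```

The two lists should agree to within finite-difference error. The same kind of comparison is a convenient sanity check when writing a new backward kernel.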

Backend Conversion

import pypose as pp
from pypose_warp import to_warp_backend, to_pypose_backend, is_warp_backend

# Check and convert backends
poses = pp.randn_SE3(100)

if not is_warp_backend(poses):
    poses = to_warp_backend(poses)  # Convert to Warp for speed

# Convert back to PyPose if needed (e.g., for unsupported operations)
poses = to_pypose_backend(poses)

Supported Operations

SE(3) Group — Rigid Body Transformations

Operation	Description	Method
Act	Apply transform to 3D points	`X.Act(p)`
Act4	Apply transform to homogeneous points	`X.Act(p)` (4D)
Mul	Compose two SE3 transforms	`X @ Y`
Inv	Invert transformation	`X.Inv()`
Log	Logarithm map to se(3)	`X.Log()`
Adj	Adjoint action on se(3)	`X.Adj(a)`
AdjT	Transpose adjoint action	`X.AdjT(a)`
Jinvp	Inverse left Jacobian action	`X.Jinvp(p)`
matrix	Convert to 4×4 matrix	`X.matrix()`
add_	In-place update via Exp	`X.add_(delta)`
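
The `Mul` and `Inv` rows can be read as the following plain-Python reference sketch, which composes and inverts single transforms stored as `(tx, ty, tz, qx, qy, qz, qw)`. The layout is assumed to match PyPose, and `quat_mul`, `se3_mul` and `se3_inv` are hypothetical names used only for illustration, not functions exported by this package.

```python
import math

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def quat_rotate(q, p):
    """Rotate p by unit quaternion q via q * (p, 0) * conj(q)."""
    conj = (-q[0], -q[1], -q[2], q[3])
    x, y, z, _ = quat_mul(quat_mul(q, (p[0], p[1], p[2], 0.0)), conj)
    return (x, y, z)

def se3_mul(X, Y):
    """X @ Y: rotation qX * qY, translation R(qX) tY + tX."""
    t = quat_rotate(X[3:], Y[:3])
    q = quat_mul(X[3:], Y[3:])
    return (t[0] + X[0], t[1] + X[1], t[2] + X[2]) + q

def se3_inv(X):
    """Inverse: rotation conj(q), translation -R(conj(q)) t."""
    qi = (-X[3], -X[4], -X[5], X[6])
    t = quat_rotate(qi, X[:3])
    return (-t[0], -t[1], -t[2]) + qi

s = math.sqrt(0.5)
X = (1.0, 2.0, 3.0, 0.0, 0.0, s, s)
identity = se3_mul(X, se3_inv(X))
print(identity)  # approximately (0, 0, 0, 0, 0, 0, 1)
```

Composing a transform with its inverse returns the identity element up to rounding, which makes a simple regression check for both operations.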

SO(3) Group — 3D Rotations

Operation	Description	Method
Act	Rotate 3D points	`R.Act(p)`
Act4	Rotate homogeneous points	`R.Act(p)` (4D)
Mul	Compose rotations	`R @ S`
Log	Logarithm map to so(3)	`R.Log()`
Adj	Adjoint action on so(3)	`R.Adj(a)`
AdjT	Transpose adjoint action	`R.AdjT(a)`
Jinvp	Inverse left Jacobian action	`R.Jinvp(p)`
matrix	Convert to 3×3 matrix	`R.matrix()`
add_	In-place update via Exp	`R.add_(delta)`
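
For the `Log` row, a minimal plain-Python sketch of the logarithm map from a unit quaternion `(qx, qy, qz, qw)` to a rotation vector is shown below. The sign flip for `qw < 0` and the small-angle threshold are choices made for this illustration; they are assumptions and may differ from what PyPose or the Warp kernels actually do.

```python
import math

def so3_log(q):
    """Map a unit quaternion (qx, qy, qz, qw) to a rotation vector (axis * angle)."""
    qx, qy, qz, qw = q
    if qw < 0.0:
        # q and -q encode the same rotation; keep the angle in [0, pi]
        qx, qy, qz, qw = -qx, -qy, -qz, -qw
    n = math.sqrt(qx * qx + qy * qy + qz * qz)
    if n < 1e-8:
        # Small-angle limit: 2 * atan2(n, qw) / n tends to 2 / qw as n tends to 0
        scale = 2.0 / qw
    else:
        scale = 2.0 * math.atan2(n, qw) / n
    return (scale * qx, scale * qy, scale * qz)

s = math.sqrt(0.5)
print(so3_log((0.0, 0.0, s, s)))      # approximately (0, 0, pi/2)
print(so3_log((0.0, 0.0, 0.0, 1.0)))  # identity maps to the zero vector
```

The branch on `n` avoids dividing by a vanishing vector norm near the identity, which is the kind of case a numerically stable implementation has to treat separately.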

se(3) Algebra — SE(3) Tangent Space

Operation	Description	Method
Exp	Exponential map to SE(3)	`xi.Exp()`
Mat	Twist to 4×4 matrix	`xi.matrix()`

so(3) Algebra — SO(3) Tangent Space

Operation	Description	Method
Exp	Exponential map to SO(3)	`w.Exp()`
Mat	Angular velocity to 3×3 matrix	`w.matrix()`
Jr	Right Jacobian	`w.Jr()`
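
For reference, the closed forms usually associated with `Exp` and `Jr` on so(3) are written out below. They assume a unit quaternion stored as vector part followed by scalar part, and a right Jacobian defined through Exp(w + δ) ≈ Exp(w) · Exp(J_r(w) δ). Whether the kernels in this repository use exactly these expressions, or a different small-angle treatment, is an assumption.

```latex
\theta = \lVert \mathbf{w} \rVert, \qquad
\mathrm{Exp}(\mathbf{w}) =
\begin{bmatrix}
  \dfrac{\sin(\theta/2)}{\theta}\,\mathbf{w} \\[4pt]
  \cos(\theta/2)
\end{bmatrix}

\mathbf{J}_r(\mathbf{w}) = \mathbf{I}
  - \frac{1-\cos\theta}{\theta^{2}}\,[\mathbf{w}]_\times
  + \frac{\theta-\sin\theta}{\theta^{3}}\,[\mathbf{w}]_\times^{2}

\mathbf{J}_r(\mathbf{w}) \approx \mathbf{I}
  - \tfrac{1}{2}\,[\mathbf{w}]_\times
  + \tfrac{1}{6}\,[\mathbf{w}]_\times^{2}
  \qquad (\theta \to 0)
```

The last line is the Taylor limit of the coefficients, which is what an implementation would typically fall back to when θ is too small for the divisions to be reliable.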

Benchmarks

Run the benchmark suite to compare Warp vs PyPose performance:

# Run all benchmarks (generates PNG charts)
python -m bench

# Run specific operator benchmarks
python -m bench.SE3_group
python -m bench.SO3_group
python -m bench.SE3_algebra
python -m bench.SO3_algebra

# Run individual operator with custom settings
python -m bench.SE3_group.Act --device cuda --dtype fp32 --size 10000

Benchmarks test across:

Devices: CPU, CUDA
Data types: FP16, FP32, FP64
Batch sizes: 128 to 32,768
Modes: Forward and backward passes

Results are saved as PNG charts in the respective benchmark directories.
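
The measurement idea behind such a comparison can be sketched with the standard library alone: run a few warm-up calls, time repeated calls, and report the median. This is not the code in `bench/`; `time_op` and its `sync` parameter are hypothetical names for this illustration, and the workload is a stand-in for a call such as `poses_warp.Act(points)`.

```python
import statistics
import time

def time_op(fn, *, warmup=3, repeats=10, sync=lambda: None):
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-off costs such as kernel compilation
    sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # on CUDA, pass torch.cuda.synchronize so queued kernels are included
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in CPU workload used only to exercise the harness
elapsed = time_op(lambda: sum(i * i for i in range(10_000)))
print(f"median: {elapsed * 1e3:.3f} ms")
```

Synchronising before reading the clock matters on GPU because kernel launches return before the work finishes, so an unsynchronised timer would mostly measure launch overhead.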

Development

Docker Environment

The recommended development environment uses Docker with NVIDIA GPU support:

# Auto-detect CUDA version and start container
./launch.sh

# Force specific CUDA version
FORCE_CUDA=12 ./launch.sh

# Mount additional paths
./launch.sh /path/to/dataset /path/to/models

Supported configurations:

Linux x86_64: CUDA 12.x, CUDA 13.x
Jetson Orin: CUDA 12.x (aarch64)
Jetson Thor: CUDA 13.x (aarch64)

Running Tests

# Run full test suite
pytest tests/ -v

# Run specific test file
pytest tests/test_SE3_group_Act.py -v

# Run with specific device/dtype
pytest tests/ -v -k "cuda and fp32"

# Run with coverage
pytest tests/ --cov=pypose_warp --cov-report=html

Project Structure

warp-pypose/
├── pypose_warp/
│   ├── __init__.py          # Backend conversion utilities
│   ├── ltype/
│   │   ├── SE3_group/        # SE(3) Lie group operations
│   │   ├── SO3_group/        # SO(3) Lie group operations
│   │   ├── SE3_algebra/      # se(3) Lie algebra operations
│   │   ├── SO3_algebra/      # so(3) Lie algebra operations
│   │   └── common/           # Shared kernel utilities
│   └── utils/
├── bench/                    # Benchmark suite
│   ├── SE3_group/
│   ├── SO3_group/
│   ├── SE3_algebra/
│   └── SO3_algebra/
├── tests/                    # Comprehensive test suite
├── docker/                   # Docker development environment
└── launch.sh                 # Container launch script

Adding New Operations

Each operator follows a consistent pattern:

Forward kernel (fwd.py): Warp kernel implementing the operation
Backward kernel (bwd.py): Warp kernel for analytical gradients
Autograd wrapper (__init__.py): PyTorch Function connecting both

Example structure for SE3_Act:

# fwd.py - Forward pass
@wp.kernel
def se3_act_kernel(...):
    # Warp kernel implementation
    
def SE3_Act_fwd(X, p):
    # Prepare tensors, launch kernel, return result

# bwd.py - Backward pass  
@wp.kernel
def se3_act_bwd_kernel(...):
    # Gradient computation kernel

def SE3_Act_bwd(X, out, grad_output):
    # Compute gradients

# __init__.py - PyTorch integration
class SE3_Act(torch.autograd.Function):
    @staticmethod
    def forward(ctx, X, p):
        return SE3_Act_fwd(X, p)
    
    @staticmethod
    def backward(ctx, grad_output):
        return SE3_Act_bwd(...)

License

This project is licensed under the MIT License — see the LICENSE file for details.

Acknowledgments

PyPose — Differentiable Lie groups for robotics
NVIDIA Warp — High-performance simulation and graphics programming
