159 changes: 158 additions & 1 deletion README.md
@@ -1,3 +1,160 @@
![GitHub Latest Tag](https://badgen.net/github/tag/RusselWebber/arrowsqlbcpy) ![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/RusselWebber/arrowsqlbcpy/ci.yml) ![PyPI Python Versions](https://img.shields.io/pypi/pyversions/arrowsqlbcpy)

# arrowsqlbcpy

A tiny library that uses .Net SqlBulkCopy to enable fast data loading to SQL Server. Apache Arrow is used to serialise data between Python and the native DLL. .Net AOT compilation is used to generate the native DLL.
A tiny library that uses .Net SqlBulkCopy to enable fast data loading to Microsoft SQL Server. Apache Arrow is used to serialise data between Python and the native DLL. .Net Native Library AOT compilation is used to generate the native DLL.
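
As an illustration of the serialisation step, the sketch below shows how pandas data can be turned into Arrow IPC bytes with pyarrow, the kind of contiguous buffer that can be handed across an FFI boundary. This is illustrative only; it does not show the actual native entry point used by the library.

```python
# Illustrative sketch only: serialising a pandas DataFrame to Arrow IPC bytes,
# the kind of buffer that can be passed to a native library. The actual FFI
# entry point used by arrowsqlbcpy is not shown here.
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"a": [1, 2, 3]})
table = pa.Table.from_pandas(df, preserve_index=False)

sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)

ipc_bytes = sink.getvalue().to_pybytes()  # contiguous bytes for the native side
```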

This library is _much_ faster than any other Python solution, including bcpandas, pyodbc and pymssql. See the benchmark results below.

![Performance plot](performance.png)

## Installation

Binary wheels are available from PyPI and can be installed using your preferred package manager:

> pip install arrowsqlbcpy

or

> uv add arrowsqlbcpy

## Usage

Connection strings for .Net are documented [here](https://www.connectionstrings.com/microsoft-data-sqlclient/).

```python

import pandas as pd
from arrowsqlbcpy import bulkcopy_from_pandas

# Create a connection string
cn = r"Server=myServerAddress;Database=myDataBase;Trusted_Connection=True;"
# The table to load into must exist and have the same column names and types as the pandas df
tablename = "test"

df = pd.DataFrame({"a":[1]*10000, "b":[2]*10000, "c":[3]*10000})

bulkcopy_from_pandas(df, cn, tablename)

```

When testing, it can be useful to have pandas create the table for you; see [tests/test_load.py](https://github.com/RusselWebber/arrowsqlbcpy/blob/main/tests/test_load.py) for an example.
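
For example, a minimal sketch of that pattern (the SQLAlchemy engine URL is a placeholder; SQLAlchemy and pyodbc are assumed to be installed):

```python
# Minimal sketch: let pandas create an empty table with matching column names
# and types, then bulk load with arrowsqlbcpy. The engine URL is a placeholder.
import pandas as pd
from sqlalchemy import create_engine
from arrowsqlbcpy import bulkcopy_from_pandas

cn = r"Server=myServerAddress;Database=myDataBase;Trusted_Connection=True;"
engine = create_engine("mssql+pyodbc://myServerAddress/myDataBase?driver=ODBC+Driver+17+for+SQL+Server")

df = pd.DataFrame({"a": [1] * 10_000, "b": [2] * 10_000, "c": [3] * 10_000})

# Create the (empty) target table from the dataframe schema, then bulk load.
df.head(0).to_sql(name="test", con=engine, index=False, if_exists="replace")
bulkcopy_from_pandas(df, cn, "test")
```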

## Requirements

Wheels are available for the latest versions of Windows 64-bit, macOS ARM 64-bit and Ubuntu 64-bit.

Wheels are available for Python 3.9-3.13.

### Linux support

The Ubuntu wheels _may_ work on other Linux distros. Building C# native libraries and then packaging them appropriately for multiple Linux distros is not straightforward. The simplest solution for most Linux distros is to pull the source from GitHub and build locally. These are the high-level steps:

1. Install .NET
https://learn.microsoft.com/en-us/dotnet/core/install/linux
2. Clone the source
> git clone https://github.com/RusselWebber/arrowsqlbcpy
3. Install uv
https://docs.astral.sh/uv/getting-started/installation/
4. Build the wheel locally
> uv build --wheel
5. Install the wheel
> pip install dist/wheel_file.whl

## Benchmarks

The benchmarks were run using the [richbench](https://github.com/tonybaloney/rich-bench) package. Tests were run repeatedly to get stable benchmarks.

> richbench ./benchmarks

The benchmarks load a parquet file of New York taxi data containing 3 million rows. Times are recorded for loading 1 000 rows, 10 000 rows, 100 000 rows, 1 000 000 rows and finally all 3 000 000 rows.

The benchmark baseline is pandas `to_sql()` using SQLAlchemy with pyodbc, a common solution for loading pandas dataframes into SQL Server. A batch size of 10 000 rows was used in the benchmarks.
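
For reference, the baseline load has roughly the following shape (mirroring `default_to_sql` in `benchmarks/bench_loadperf.py`; the connection URL and file path are placeholders):

```python
# Sketch of the baseline load used in the benchmarks; the URL is a placeholder.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://myServer/test?driver=ODBC+Driver+17+for+SQL+Server")
df = pd.read_parquet("yellow_tripdata_2024-01.parquet")

# Append the dataframe to an existing table in batches of 10 000 rows.
df.to_sql(name="test", con=engine, index=False, chunksize=10_000, if_exists="append")
```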

The benchmarks show the time taken to load using various alternative strategies:

| Label | Description |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| fast_executemany=True | Use pandas `to_sql()` with SQLAlchemy and pyodbc, setting the fast_executemany=True option as discussed [here](https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc) |
| bcpandas | Use the [bcpandas](https://github.com/yehoshuadimarsky/bcpandas) package to load the dataframes. The package writes temp files and spawns bcp processes to load them |
| arrowsqlbcpy | This package, which uses .Net SqlBulkCopy |

The richbench tables show the min, max and mean time in seconds for the baseline in the left three columns, followed by the min, max and mean time in seconds for the alternative strategy in the right three columns.

For example, this row:

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
| ---------------------------------- | --- | --- | ---- | -------- | -------- | -------- |
| 1 000 rows - fast_executemany=True | 1.0 | 1.0 | 1.0 | 0.5 (2x) | 0.5 (2x) | 0.5 (2x) |

should be interpreted as: setting fast_executemany=True gave a 2x speedup over the baseline when loading 1 000 rows, reducing the mean load time from 1.0 s to 0.5 s.

### Windows 11 (local db)

**Summary results**

| | 1000 | 10000 | 100000 | 1000000 | 3000000 |
| --------------------- | ---------------- | ---------------- | ---------------- | ---------------- | ----------------- |
| df.to_sql() | 0.055 | 0.495 | 4.601 | 46.648 | 198.57 |
| arrowsqlbcpy | 0.106 (-1.9x) | **0.101 (4.9x)** | **0.933 (4.9x)** | **8.864 (5.3x)** | **26.048 (7.6x)** |
| bcpandas | 0.156 (-3.0x) | 0.336 (1.5x) | 2.567 (1.8x) | 24.627 (1.9x) | 72.353 (2.7x) |
| fast_executemany=True | **0.035 (2.4x)** | 0.235 (2.3x) | 2.246 (2.3x) | 22.044 (2.1x) | 65.344 (3.0x) |

**Detailed richbench results**

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
| -------------------------------------- | ------- | ------- | ------- | ------------- | ------------- | ------------- |
| 1 000 - arrowsqlbcp | 0.053 | 0.056 | 0.055 | 0.015 (3.6x) | 0.198 (-3.5x) | 0.106 (-1.9x) |
| 10 000 rows - arrowsqlbcp | 0.489 | 0.502 | 0.495 | 0.099 (4.9x) | 0.103 (4.9x) | 0.101 (4.9x) |
| 100 000 rows - arrowsqlbcp | 4.587 | 4.616 | 4.601 | 0.922 (5.0x) | 0.944 (4.9x) | 0.933 (4.9x) |
| 1 000 000 rows - arrowsqlbcp | 46.558 | 46.738 | 46.648 | 8.842 (5.3x) | 8.886 (5.3x) | 8.864 (5.3x) |
| 3 000 000 rows - arrowsqlbcp | 198.464 | 198.676 | 198.570 | 26.016 (7.6x) | 26.079 (7.6x) | 26.048 (7.6x) |
| 1 000 - bcpandas | 0.051 | 0.052 | 0.052 | 0.121 (-2.4x) | 0.190 (-3.6x) | 0.156 (-3.0x) |
| 10 000 rows - bcpandas | 0.499 | 0.500 | 0.500 | 0.333 (1.5x) | 0.339 (1.5x) | 0.336 (1.5x) |
| 100 000 rows - bcpandas | 4.543 | 4.547 | 4.545 | 2.565 (1.8x) | 2.570 (1.8x) | 2.567 (1.8x) |
| 1 000 000 rows - bcpandas | 45.298 | 46.443 | 45.871 | 24.581 (1.8x) | 24.674 (1.9x) | 24.627 (1.9x) |
| 3 000 000 rows - bcpandas | 197.292 | 197.699 | 197.496 | 72.301 (2.7x) | 72.405 (2.7x) | 72.353 (2.7x) |
| 1 000 - fast_executemany=True | 0.052 | 0.116 | 0.084 | 0.030 (1.7x) | 0.041 (2.9x) | 0.035 (2.4x) |
| 10 000 rows - fast_executemany=True | 0.513 | 0.550 | 0.531 | 0.233 (2.2x) | 0.237 (2.3x) | 0.235 (2.3x) |
| 100 000 rows - fast_executemany=True | 5.018 | 5.374 | 5.196 | 2.239 (2.2x) | 2.253 (2.4x) | 2.246 (2.3x) |
| 1 000 000 rows - fast_executemany=True | 45.470 | 45.582 | 45.526 | 22.036 (2.1x) | 22.051 (2.1x) | 22.044 (2.1x) |
| 3 000 000 rows - fast_executemany=True | 194.152 | 194.523 | 194.337 | 65.153 (3.0x) | 65.534 (3.0x) | 65.344 (3.0x) |

### Ubuntu (WSL2) (local db in docker container)

**Summary results**

| | 1000 | 10000 | 100000 | 1000000 | 3000000 |
| --------------------- | ---------------- | ---------------- | ---------------- | ----------------- | ----------------- |
| df.to_sql() | 0.070 | 0.506 | 5.074 | 50.089 | 208.811 |
| arrowsqlbcpy | 0.154 (-2.2x) | **0.120 (4.2x)** | **1.070 (4.7x)** | **10.572 (4.7x)** | **30.673 (6.8x)** |
| bcpandas | 0.158 (-2.4x) | 0.438 (1.2x) | 3.383 (1.5x) | 32.774 (1.5x) | 95.200 (2.2x) |
| fast_executemany=True | **0.059 (1.6x)** | 0.323 (1.7x) | 3.039 (1.6x) | 29.810 (1.7x) | 87.419 (2.4x) |

**Detailed richbench results**

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
| -------------------------------------- | ------- | ------- | ------- | ------------- | ------------- | ------------- |
| 1 000 - arrowsqlbcp | 0.069 | 0.071 | 0.070 | 0.028 (2.4x) | 0.280 (-3.9x) | 0.154 (-2.2x) |
| 10 000 rows - arrowsqlbcp | 0.503 | 0.510 | 0.506 | 0.115 (4.4x) | 0.126 (4.0x) | 0.120 (4.2x) |
| 100 000 rows - arrowsqlbcp | 5.062 | 5.085 | 5.074 | 1.064 (4.8x) | 1.076 (4.7x) | 1.070 (4.7x) |
| 1 000 000 rows - arrowsqlbcp | 49.746 | 50.433 | 50.089 | 10.566 (4.7x) | 10.578 (4.8x) | 10.572 (4.7x) |
| 3 000 000 rows - arrowsqlbcp | 208.669 | 208.953 | 208.811 | 30.364 (6.9x) | 30.982 (6.7x) | 30.673 (6.8x) |
| 1 000 - bcpandas | 0.066 | 0.068 | 0.067 | 0.149 (-2.2x) | 0.167 (-2.5x) | 0.158 (-2.4x) |
| 10 000 rows - bcpandas | 0.500 | 0.508 | 0.504 | 0.431 (1.2x) | 0.444 (1.1x) | 0.438 (1.2x) |
| 100 000 rows - bcpandas | 5.016 | 5.028 | 5.022 | 3.369 (1.5x) | 3.397 (1.5x) | 3.383 (1.5x) |
| 1 000 000 rows - bcpandas | 49.771 | 50.535 | 50.153 | 32.603 (1.5x) | 32.945 (1.5x) | 32.774 (1.5x) |
| 3 000 000 rows - bcpandas | 208.104 | 208.350 | 208.227 | 95.057 (2.2x) | 95.343 (2.2x) | 95.200 (2.2x) |
| 1 000 - fast_executemany=True | 0.068 | 0.116 | 0.092 | 0.049 (1.4x) | 0.069 (1.7x) | 0.059 (1.6x) |
| 10 000 rows - fast_executemany=True | 0.514 | 0.557 | 0.535 | 0.322 (1.6x) | 0.324 (1.7x) | 0.323 (1.7x) |
| 100 000 rows - fast_executemany=True | 4.934 | 4.961 | 4.948 | 3.023 (1.6x) | 3.056 (1.6x) | 3.039 (1.6x) |
| 1 000 000 rows - fast_executemany=True | 49.298 | 50.658 | 49.978 | 29.783 (1.7x) | 29.836 (1.7x) | 29.810 (1.7x) |
| 3 000 000 rows - fast_executemany=True | 207.245 | 213.096 | 210.171 | 87.219 (2.4x) | 87.620 (2.4x) | 87.419 (2.4x) |

Benchmarks for the typical case of a remote DB still need to be added.

## Limitations

`bulkcopy_from_pandas()` establishes its own database connection to load the data; reusing existing connections or transactions is not supported.

Only basic macOS testing has been done.
89 changes: 89 additions & 0 deletions benchmarks/bench_loadperf.py
@@ -0,0 +1,89 @@
import pandas as pd
from arrowsqlbcpy import bulkcopy_from_pandas
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL
from bcpandas import SqlCreds, to_sql
from functools import partial

cn = r"Server=PC\SQLEXPRESS;Database=test;Trusted_Connection=True;Encrypt=false;"
tablename = "test"
max_chunksize = 10_000
df = pd.read_parquet(r"C:\Users\russe\Downloads\yellow_tripdata_2024-01.parquet")

connection_url = URL.create(
    "mssql+pyodbc",
    host=r"PC\SQLEXPRESS",
    database="test",
    query={
        "driver": "SQL Server Native Client 11.0",
        "Encrypt": "yes",
        "TrustServerCertificate": "yes",
    },
)
engine = create_engine(connection_url)
fast_executemany_engine = create_engine(connection_url, echo=False, fast_executemany=True)
creds = SqlCreds.from_engine(engine)

# Create the table
df.head(1).to_sql(name=tablename, con=engine, index=False, if_exists="replace")

def default_to_sql(nrows=None):
    with engine.begin() as conn:
        conn.execute(text(f"TRUNCATE TABLE {tablename}"))
    local_df = df.iloc[:nrows] if nrows else df
    with engine.begin() as conn:
        local_df.to_sql(name=tablename, con=conn, index=False, chunksize=max_chunksize, if_exists="append")

def fast_executemany__to_sql(nrows=None):
    with fast_executemany_engine.begin() as conn:
        conn.execute(text(f"TRUNCATE TABLE {tablename}"))
    local_df = df.iloc[:nrows] if nrows else df
    with fast_executemany_engine.begin() as conn:
        local_df.to_sql(name=tablename, con=conn, index=False, chunksize=max_chunksize, if_exists="append")

def arrow_to_sql(nrows=None):
    with engine.begin() as conn:
        conn.execute(text(f"TRUNCATE TABLE {tablename}"))
    local_df = df.iloc[:nrows] if nrows else df
    bulkcopy_from_pandas(local_df, cn, tablename, max_chunksize=max_chunksize)

def bcpandas_to_sql(nrows=None):
    with engine.begin() as conn:
        conn.execute(text(f"TRUNCATE TABLE {tablename}"))
    local_df = df.iloc[:nrows] if nrows else df
    to_sql(local_df, tablename, creds, index=False, if_exists="append", batch_size=min(max_chunksize, local_df.shape[0]))

default_to_sql_1000 = partial(default_to_sql, 1_000)
fast_executemany__to_sql_1000 = partial(fast_executemany__to_sql, 1_000)
arrow_to_sql_1000 = partial(arrow_to_sql, 1_000)
bcpandas_to_sql_1000 = partial(bcpandas_to_sql, 1_000)
default_to_sql_10000 = partial(default_to_sql, 10_000)
fast_executemany__to_sql_10000 = partial(fast_executemany__to_sql, 10_000)
arrow_to_sql_10000 = partial(arrow_to_sql, 10_000)
bcpandas_to_sql_10000 = partial(bcpandas_to_sql, 10_000)
default_to_sql_100000 = partial(default_to_sql, 100_000)
fast_executemany__to_sql_100000 = partial(fast_executemany__to_sql, 100_000)
arrow_to_sql_100000 = partial(arrow_to_sql, 100_000)
bcpandas_to_sql_100000 = partial(bcpandas_to_sql, 100_000)
default_to_sql_1000000 = partial(default_to_sql, 1_000_000)
fast_executemany__to_sql_1000000 = partial(fast_executemany__to_sql, 1_000_000)
arrow_to_sql_1000000 = partial(arrow_to_sql, 1_000_000)
bcpandas_to_sql_1000000 = partial(bcpandas_to_sql, 1_000_000)

__benchmarks__ = [
    (default_to_sql_1000, fast_executemany__to_sql_1000, "1e3 rows - fast_executemany=True"),
    (default_to_sql_1000, bcpandas_to_sql_1000, "1e3 rows - bcpandas"),
    (default_to_sql_1000, arrow_to_sql_1000, "1e3 rows - arrowsqlbcp"),
    (default_to_sql_10000, fast_executemany__to_sql_10000, "1e4 rows - fast_executemany=True"),
    (default_to_sql_10000, bcpandas_to_sql_10000, "1e4 rows - bcpandas"),
    (default_to_sql_10000, arrow_to_sql_10000, "1e4 rows - arrowsqlbcp"),
    (default_to_sql_100000, fast_executemany__to_sql_100000, "1e5 rows - fast_executemany=True"),
    (default_to_sql_100000, bcpandas_to_sql_100000, "1e5 rows - bcpandas"),
    (default_to_sql_100000, arrow_to_sql_100000, "1e5 rows - arrowsqlbcp"),
    (default_to_sql_1000000, fast_executemany__to_sql_1000000, "1e6 rows - fast_executemany=True"),
    (default_to_sql_1000000, bcpandas_to_sql_1000000, "1e6 rows - bcpandas"),
    (default_to_sql_1000000, arrow_to_sql_1000000, "1e6 rows - arrowsqlbcp"),
    (default_to_sql, fast_executemany__to_sql, "3e6 rows - fast_executemany=True"),
    (default_to_sql, bcpandas_to_sql, "3e6 rows - bcpandas"),
    (default_to_sql, arrow_to_sql, "3e6 rows - arrowsqlbcp")
]
Binary file added performance.png
20 changes: 19 additions & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "arrowsqlbcpy"
license = {text = "MIT"}
description = "Fast bcp from pandas to SQL Server using .Net SqlBulkCopy"
description = "A tiny library that uses .Net SqlBulkCopy to enable fast data loading to Microsoft SQL Server. Apache Arrow is used to serialise data between Python and the native DLL. .Net Native Library AOT compilation is used to generate the native DLL."
readme = "README.md"
keywords = ["bcp", "sql", "pandas"]
authors = [
@@ -13,6 +13,22 @@ dependencies = [
"pyarrow>=19.0.0",
]
dynamic = ["version"]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Programming Language :: C#",
    "Programming Language :: Python :: Implementation :: CPython",
    "Topic :: Database",
    "Topic :: Database :: Database Engines/Servers"
]

[project.urls]
Repository = "https://github.com/RusselWebber/arrowsqlbcpy.git"

[build-system]
requires = ["setuptools>=61", "wheel"]
@@ -33,9 +49,11 @@ addopts = [

[dependency-groups]
dev = [
    "bcpandas>=2.6.5",
    "pymssql>=2.3.2",
    "pyodbc>=5.2.0",
    "pytest>=8.3.4",
    "richbench>=1.0.3",
    "ruff>=0.9.3",
    "sqlalchemy>=2.0.37",
    "wheel>=0.45.1",
4 changes: 2 additions & 2 deletions src/arrowsqlbcpy/__init__.py
@@ -14,11 +14,11 @@
sqllibname = "Microsoft.Data.SqlClient.SNI.dll"
elif is_mac:
libname = "ArrowSqlBulkCopyNet.dylib"
sqllibname = None
sqllibname = None
else:
libname = "ArrowSqlBulkCopyNet.so"
sqllibname = None

func_name = "write"
error_size = 1000
