Merged
289 commits
5d60fd7
[SPARK-56672][SQL] Rename TableViewCatalog.listRelationSummaries to l…
gengliangwang Apr 30, 2026
c251011
[SPARK-56684][K8S] Expose `KubernetesClusterSchedulerBackend.kubernet…
dongjoon-hyun Apr 30, 2026
4b3f8c3
[SPARK-56568][SQL] Add id() to DSv2 Column to detect drop-and-re-add …
longvu-db Apr 30, 2026
174a230
[SPARK-56639][SQL] frozen path semantics
srielau May 1, 2026
294f6c1
[SPARK-56689][K8S] Improve `ExecutorResizePlugin` to reuse `Kubernete…
dongjoon-hyun May 1, 2026
f4b320e
[SPARK-56669][SQL] Implement group filtering for WriteDelta row level…
aokolnychyi May 1, 2026
8e37824
[SPARK-56693][K8S] Support built-in K8s `ExecutorPVCResizePlugin`
dongjoon-hyun May 1, 2026
586ac79
[SPARK-56594][SQL] Add time_bucket scalar function
vranes May 1, 2026
520835a
[SPARK-56598][SQL] Custom metrics support for TruncatableTable
ZiyaZa May 1, 2026
3a685b1
[SPARK-56678][SQL] Use structured Catalog/Namespace/Table rows in DES…
cloud-fan May 1, 2026
87dd933
[SPARK-56699][K8S] Default `ExecutorPVCResizePlugin` interval to `5mi…
dongjoon-hyun May 1, 2026
9967250
[SPARK-56694][SQL] Fix `DynamicPruningSubquery` canonicalization for …
peter-toth May 1, 2026
3416213
[SPARK-56702][K8S] Restrict `ExecutorPVCResizePlugin` to `direct` pod…
dongjoon-hyun May 1, 2026
8457567
[SPARK-56677][SQL] Propagate filter conditions through `Join` nodes i…
peter-toth May 1, 2026
e6ff38e
[SPARK-56686][SQL] Support streaming row-level CDC post-processing
gengliangwang May 1, 2026
6bfe0ef
[SPARK-56633][SQL][TESTS] Add comprehensive Parquet vectorized-reader…
LuciferYang May 1, 2026
2df302d
[SPARK-56607][PYTHON][FOLLOWUP] Use pyspark.sql.DataFrame to support …
gaogaotiantian May 1, 2026
0eb4fc1
[SPARK-56687][SQL] Support netChanges for DSv2 CDC streaming reads
gengliangwang May 2, 2026
096afb5
[SPARK-56686][FOLLOWUP][SQL] Mark CDC streaming rewrite via attribute…
gengliangwang May 2, 2026
b54dad7
[SPARK-56688][INFRA][PYTHON] Reorganize pyspark tests to empty a CI slot
gaogaotiantian May 4, 2026
a528186
[SPARK-56696][BUILD] Exclude old `junit:junit` from the dependency tr…
sarutak May 4, 2026
d57f1de
[SPARK-56675][BUILD] Make sbt publishLocal work
sarutak May 4, 2026
2afd35f
[SPARK-56711][SQL] Restrict CDC `_commit_version` column to LongType …
gengliangwang May 4, 2026
c119d8d
[SPARK-55810][UI] Fix missing spacing between table and pagination co…
XdithyX May 4, 2026
359e444
[SPARK-56570][SQL][FOLLOWUP] `PlanMerger` correctness fix and code cl…
peter-toth May 4, 2026
928b3e1
[SPARK-56712][SQL][DOCS] Document pushdown contract for CDC Changelog…
gengliangwang May 4, 2026
b8d43ff
[SPARK-56022][SQL] Preserve SparkThrowable error classes in parmap/aw…
linhongliu-db May 5, 2026
805fca6
[SPARK-52857] Add is_valid_variant expression
chenhao-db May 5, 2026
4e3e96f
[SPARK-55794][FOLLOWUP][SQL] Wrap outer star expansion results in Ali…
mihailotim-db May 5, 2026
718e0e7
[SPARK-55658][PYTHON] SparkSessionBuilder.create in PySpark classic s…
jonmio May 5, 2026
a579aa9
[SPARK-56588][CORE] Fix `PathOutputCommitProtocol` dynamic partition …
peter-toth May 5, 2026
8485a91
[SPARK-56714][SQL] Remove `__metadata_col` metadata from aggregated a…
mihailoale-db May 5, 2026
3d3df32
[MINOR][DOCS] Clarify when to ask the user before running tests in AG…
cloud-fan May 6, 2026
f35fa1f
[SPARK-56647][SQL] Optimize storage of SparkSQL Last Attempt Metrics
juliuszsompolski May 6, 2026
13e690b
[SPARK-56700][SS] Make DataStreamReader.name public
ericm-db May 6, 2026
322a687
[SPARK-56736][K8S] Add `sparkVersion` method to `KubernetesConf` abst…
dongjoon-hyun May 6, 2026
fe506a7
[SPARK-36082][SQL] Restrict null-aware anti joins to broadcastable ri…
jayanth86 May 6, 2026
04a2e7a
[SPARK-56737][SQL] Avoid binary-incompatible mode method on `Broadcas…
viirya May 6, 2026
4857ab1
[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs
chenwang-databricks May 7, 2026
d37b376
[SPARK-56454][DOCS][FOLLOWUP] Document supported SRIDs in geospatial …
pratham76 May 7, 2026
7ac9c56
[SPARK-56663][SQL] Restore fast path for date_trunc MINUTE/HOUR/DAY
Licht-T May 7, 2026
cebd28b
[SPARK-56762][PYTHON][TESTS] Speed up test_crossvalidator_on_pipeline
gaogaotiantian May 7, 2026
3225ce9
[SPARK-56482][SQL][4.2] Enable whole-stage codegen fusion for `UnionE…
LuciferYang May 7, 2026
bd39316
[SPARK-56395][SQL] Add NEAREST BY top-K ranking join (catalyst-side)
zhidongqu-db May 7, 2026
4476f13
[SPARK-56546][SQL] Block-chunked segment-tree window frame for non-in…
yaooqinn May 8, 2026
d2acbe9
[SPARK-56770][DOCS] Update PROJ version to 9.8.1 in the documentation…
uros-db May 8, 2026
aa564a5
[SPARK-56482][SQL][FOLLOWUP] Simplify UnionExec codegen and narrow pa…
cloud-fan May 8, 2026
496b82b
[SPARK-56768][PYTHON][INFRA] Share SBT compile artifact across pyspar…
zhengruifeng May 8, 2026
c7d903e
[SPARK-56765][INFRA] Fix mypy attr-defined errors with PyArrow 24+
sarutak May 8, 2026
fa9039b
[SPARK-56682][GEO][CONNECT][PYTHON][SQL] Extend the ST_AsBinary funct…
uros-db May 8, 2026
707012b
[SPARK-56771][SQL] Enable geospatial support by default
uros-db May 8, 2026
d0917d6
[SPARK-56792][SQL] Support pan and zoom for SQL plan visualization
yaooqinn May 9, 2026
8348141
[SPARK-56755][SQL] Fix SHOW CREATE TABLE for v2 table partitioned by …
pan3793 May 9, 2026
3a7f360
[SPARK-53327][SQL] Workaround datasketches-memory Java 25 support
pan3793 May 11, 2026
a3209b0
[MINOR][INFRA][4.2] Fix `branch` input default in build_and_test.yml
zhengruifeng May 11, 2026
9cc4a77
[SPARK-56586][CONNECT][TESTS] Bound and retry the flaky python foreac…
LuciferYang May 11, 2026
78c9adf
[SPARK-56814][SQL][TESTS] Add lateral join tests for outer attribute …
mihailotim-db May 11, 2026
36cc5f6
[SPARK-56815][SPARK-55852][DOCS] Document Java 25 support
pan3793 May 11, 2026
f34a014
[SPARK-55897][SQL] Handle UserDefinedType in ColumnarRow, ColumnarBat…
james-willis May 11, 2026
a985468
[SPARK-56551][SQL][FOLLOW-UP] Fix setting `numDeletedRows` metric as -1
ZiyaZa May 11, 2026
23d271e
[SPARK-56798][SQL][DOCS] Clarify streaming CDC emission timing and ne…
gengliangwang May 11, 2026
9f65c53
[SPARK-55978][SQL] Add TABLESAMPLE SYSTEM block sampling with DSv2 pu…
stanyao May 12, 2026
49af916
[SPARK-56812][INFRA] Fix URL of get-pip.py in dev/infra/Dockerfile fo…
sarutak May 12, 2026
8dffe02
[SPARK-56831][INFRA][R] Share SBT precompile artifact with sparkr CI job
zhengruifeng May 12, 2026
52d3c2e
[SPARK-56793][K8S] Avoid cluster-wide LIST in executor pods polling
TongWei1105 May 12, 2026
85b593f
[SPARK-56833][TESTS] Add `-XX:+EnableDynamicAgentLoading` to test JVM…
dongjoon-hyun May 12, 2026
defde6b
[SPARK-56799][SQL] Search and highlight nodes in SQL plan visualization
yaooqinn May 12, 2026
ff7b7b6
[SPARK-56815][DOCS][FOLLOWUP] Add Java 25 to `building-spark.md`
dongjoon-hyun May 12, 2026
52f2673
[SPARK-56834][K8S] Use Java `25-jre` instead of `21-jre` image in K8s…
dongjoon-hyun May 12, 2026
716f3d2
[SPARK-56835][K8S][DOCS][INFRA] Upgrade Volcano to 1.14.2
dongjoon-hyun May 12, 2026
f619d4d
[SPARK-56813][DOCS] Refine the documentation for geospatial types and…
uros-db May 13, 2026
0c75fa8
[SPARK-49671][SQL] Remove the RTRIM collation config
uros-db May 13, 2026
6aea601
[SPARK-56546][SQL][FOLLOWUP] Address review comments in segment-tree …
cloud-fan May 13, 2026
fecd8c3
[SPARK-56832][INFRA] Surface fatal javadoc errors in unidoc log summa…
cloud-fan May 13, 2026
1c99423
[SPARK-54735][SQL][REVERT] Revert column comments preservation in vie…
miland-db May 13, 2026
4ecc831
[SPARK-56811][UI] Restore sub-execution grouping on the SQL tab listing
yaooqinn May 13, 2026
c70c91b
[SPARK-56843][BUILD] Update `checkJavaVersion` to ban 25.0.[0-2] prop…
dongjoon-hyun May 13, 2026
b5d5e41
[SPARK-56680][SQL] DSv2 INSERT and Insert-Only MERGE Metrics
ZiyaZa May 13, 2026
58ee7e9
[SPARK-56844][SQL] Support ArrayType / MapType / StructType in Consta…
mzhang May 14, 2026
2c356ac
[SPARK-56395][CONNECT][PYTHON] Add NEAREST BY DataFrame API
dilipbiswal May 14, 2026
75580b2
[SPARK-56395][SQL][FOLLOWUP] Fix missing comma in MimaExcludes on bra…
cloud-fan May 14, 2026
37d5798
[SPARK-56840][SQL] Avoid unresolved NullIf type lookup
sunchao May 14, 2026
edebd63
[SPARK-56817][BUILD][4.2] Upgrade Netty to 4.2.13.Final
LuciferYang May 14, 2026
17f9b9d
[SPARK-56756][SQL] Add error class for recursiveFileLookup conflict w…
markj-db May 14, 2026
e5e8678
[SPARK-56152][SQL] Enable implicit cast from STRING to TIME type
uros-db May 14, 2026
2910e7f
[SPARK-56840][SQL][FOLLOW-UP] Add a real NullIf repro test
sunchao May 15, 2026
595ecd5
[SPARK-56809][UI] Show SQL description and metadata on the execution …
yaooqinn May 15, 2026
601ac5a
[SPARK-56001][SQL][FOLLOWUP] Reject table alias for INSERT ... REPLAC…
cloud-fan May 15, 2026
9105985
[SPARK-56881][CORE][TESTS] Improve `FsHistoryProviderSuite` to be mor…
dongjoon-hyun May 15, 2026
bb97b9a
[SPARK-56879][CORE][TESTS] Improve `JavaUtilsSuite` to be more robust
dongjoon-hyun May 15, 2026
5850019
[SPARK-56888][PYTHON][DOCS] Simplify `get_caller_source_code_location…
dongjoon-hyun May 16, 2026
8340dbb
[SPARK-56885][CORE][TESTS] Make `CompressionCodecSuite` be independen…
dongjoon-hyun May 16, 2026
bc87577
[SPARK-56886][CORE][TESTS] Improve `UtilsSuite` to be more robust
dongjoon-hyun May 16, 2026
dff2e58
[SPARK-56872][SQL][4.2] Fix NPE in DowncastLongUpdater.decodeSingleDi…
LuciferYang May 16, 2026
89cf745
[SPARK-56448][CONNECT] Fix NPE on Spark Connect client restart due to…
yadavay-amzn May 16, 2026
405a6aa
[SPARK-56904][SQL] Fix Int overflow in LongToUnsafeRowMap page size c…
viirya May 17, 2026
4bdb5e4
[SPARK-56864][INFRA][PYTHON][4.2] Consolidate python-ps-minimum image…
zhengruifeng May 18, 2026
e42a8ee
[SPARK-56883][SQL] DESCRIBE FUNCTION for SQL UDFs
srielau May 19, 2026
ce67547
[SPARK-50610][SQL] Fix decimal precision in HiveInspectors.toInspector
shrirangmhalgi May 19, 2026
d0503dd
[SPARK-56862][SQL] Preserve SQL UDF call-site origin in input Cast fo…
mikhailnik-db May 19, 2026
6f70be6
[SPARK-56873][CORE] Fix potential race condition in bounded k-way mer…
ivoson May 19, 2026
637953a
[SPARK-33902][SQL][FOLLOWUP] Add createTableLike delegation to Delega…
sarutak May 19, 2026
bd8872a
[SPARK-56938][CONNECT][TESTS] Initialize base session in `AddArtifact…
sarutak May 19, 2026
dc0edc1
[SPARK-56943][INFRA] Share SBT precompile artifact with JVM build matrix
zhengruifeng May 20, 2026
1c26164
[SPARK-56934][4.2][INFRA] Make build_infra_images_cache workflow erro…
zhengruifeng May 20, 2026
33bfb52
[SPARK-54119][SQL][FOLLOWUP] Format METRIC_VIEW DESCRIBE output disti…
cloud-fan May 20, 2026
e81309c
[SPARK-56931][SQL] Support complex constant metadata in row materiali…
mzhang May 20, 2026
0dd24fa
[SPARK-46625][SQL] Place IDENTIFIER placeholder in command name slot
cloud-fan May 20, 2026
9e27d31
[SPARK-56550][SQL][4.2] Support source with fewer columns/fields in I…
szehon-ho May 20, 2026
236f040
[SPARK-56643][SQL][TESTS] Add DSv2 temp view with stored plan tests
longvu-db May 20, 2026
4728743
[SPARK-56676][SQL][DML] DSv2 Transactional Streaming Writes need to V…
andreaschat-db May 20, 2026
e04b2be
[SPARK-56955][CONNECT] Replace -Dio.netty.noUnsafe=false with --sun-m…
attilapiros May 20, 2026
699f10b
[SPARK-56743][SQL] Use SQLLastAttemptMetric for DSv2 UPDATE/DELETE/ME…
juliuszsompolski May 20, 2026
c473089
[SPARK-56838][SDP] Introduce AutoCDC parameters dataclass
AnishMahto May 20, 2026
3125935
[SPARK-56572][SDP] Inject Spark session into Python files
anew May 20, 2026
6a5cedc
[SPARK-56933][SQL] Cache SQL metrics in MergeRowsExec interpreted ite…
szehon-ho May 21, 2026
22f018b
[SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF NOT EXISTS o…
cloud-fan May 21, 2026
b6bd005
[SPARK-56920][SQL][FOLLOWUP] Add CreateMetricView logical plan and pr…
cloud-fan May 21, 2026
0253f02
[SPARK-56650][SDP][CONNECT] Add AutoCDC Spark Connect APIs
anew May 21, 2026
f070aae
[SPARK-56856][SDP] Implement SCD1 Batch Processor; Microbatch Dedupli…
AnishMahto May 21, 2026
d2797e2
[SPARK-56977][SQL] RewriteNearestByJoin should respect joinType in th…
zhidongqu-db May 21, 2026
b15d2db
[SPARK-56964][INFRA] Share Maven precompile artifact across maven_tes…
zhengruifeng May 22, 2026
1ddac97
[SPARK-56681][SQL] PATH cleanup
srielau May 4, 2026
0d76818
[SPARK-56750] default path config
srielau May 12, 2026
15777cb
[SPARK-56853] Improve PATH Tests
srielau May 15, 2026
8968023
[SPARK-56939][SQL] Resolve deadlock between USE and function lookup
srielau May 20, 2026
7ee145d
[SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in InsertI…
haoyangeng-db May 22, 2026
5dacfe3
[SPARK-56654][SQL] Reject unpaired UTF-16 surrogates in Variant JSON …
NJAHNAVI2907 May 22, 2026
c259da2
[SPARK-48091][SQL] Preserve aliases inside lambda when ExtractGenerat…
shrirangmhalgi May 22, 2026
c7ccc7d
[SPARK-56870][SDP] Implement SCD1 Batch Processor; Extend Microbatch …
AnishMahto May 22, 2026
cb88e1c
[SPARK-56998] Add SECURITY.md + AGENTS.md Security section for scan-a…
potiuk May 22, 2026
e235cf6
[SPARK-56882][SDP] Implement SCD1 Batch Processor; Target Column Proj…
AnishMahto May 22, 2026
8680857
[SPARK-56249][SDP] Implement SCD1 Batch Processor; Merge Tombstones o…
AnishMahto May 22, 2026
edfd030
[SPARK-56719][SS] Add DataStreamWriter.name() API for sink evolution
ericm-db May 23, 2026
df8f7c0
[SPARK-56961][SQL] Pass all options while loading changelog
aokolnychyi May 23, 2026
8067dbb
[SPARK-55846][DOCS] Update Web UI documentation for UI modernization
yaooqinn May 24, 2026
cb28819
[SPARK-56953][SDP] Implement SCD1 Batch Processor; foreachBatch Callback
AnishMahto May 25, 2026
44818a7
[SPARK-56697][SQL][DML] Refactor Catalog Manager
andreaschat-db May 25, 2026
fec42df
[SPARK-55978][SQL][FOLLOWUP] Don't block V2 join pushdown when pushed…
cloud-fan May 25, 2026
539cbe8
[SPARK-55601][SS][FOLLOWUP] Cache offsetLog.getLatest() to avoid redu…
cloud-fan May 26, 2026
6a91b1d
[SPARK-56921][SQL] Fix CTE ID normalization for nested CTEs
puneetdixit200 May 26, 2026
45ac51d
[SPARK-56643][CONNECT][TEST] Add DSv2 temp view with stored plan test…
longvu-db May 26, 2026
4aafef1
[SPARK-56956][SDP] Introduce AutoCDC Flow Dataclasses
AnishMahto May 26, 2026
278ce4b
[SPARK-56651][CONNECT][SDP] Add Python APIs for Auto CDC SCD Type 1
AnishMahto May 26, 2026
57a77e5
[SPARK-57041][SQL] Fix deadlock between waitForSubqueries and lazy va…
jiwen624 May 26, 2026
c98470a
[SPARK-57040][SQL] JDBC connector supports pushdown TABLESAMPLE SYSTEM
pan3793 May 26, 2026
dfdf66e
[SPARK-57037][CORE][TESTS] Force GC before allocating large array in …
gengliangwang May 26, 2026
47ca28f
[SPARK-56619][SQL][TEST] Add DSv2 repeated table access tests with in…
longvu-db May 27, 2026
d103710
[SPARK-57070][PYTHON][DOC] Add `time_(from|to)_*` functions to Python…
zhengruifeng May 27, 2026
b69d0e6
[SPARK-57072][PYTHON][DOC] Add missing 4.2 methods to PySpark API ref…
zhengruifeng May 27, 2026
f423462
[SPARK-57088][SQL] Allow non-deterministic ranking expression for EXA…
zhidongqu-db May 27, 2026
5d76c00
[SPARK-57066][SECURITY] Use constant-time comparison for authenticati…
sarutak May 27, 2026
165c6c4
[SPARK-54918][SQL] Normalize floating numbers in array set operations
asugranyes May 27, 2026
d1a157d
[SPARK-57083][SQL] Preserve geography SRID across encoders, Parquet r…
uros-db May 27, 2026
637265a
[SPARK-56984][DOC] Document the SQL PATH feature
srielau May 27, 2026
6a8ee7a
[SPARK-56957][SDP] AutoCDC Flow Execution; Introduce and Integrate SC…
AnishMahto May 27, 2026
9da4192
[SPARK-57080][SDP] Register AutoCDC Flows from `PipelinesHandler`
AnishMahto May 27, 2026
0d43764
[SPARK-57035][DOCS] Always target /docs/latest/ in DocSearch index
gengliangwang May 27, 2026
ee64680
[SPARK-56975][SS] Reject user-specified schema in DataStreamReader.ta…
PorridgeSwim May 21, 2026
9f90cd4
[SPARK-57098][4.2][UI] Worker UI JSON endpoint redaction
peter-toth May 27, 2026
e7c40b7
[SPARK-56991][SQL] Add helper method for deriving Changelog Deduplica…
aokolnychyi May 27, 2026
a415bd9
[SPARK-56120][PYTHON][TEST][FOLLOWUP] Make _WindowAggArrowBenchMixin …
viirya May 28, 2026
db9a293
[SPARK-57116][SQL][PYTHON][DOC] Fix versionadded/@since for kll_merge…
zhengruifeng May 28, 2026
4ceb4f5
[SPARK-55934][PYTHON][TEST][FOLLOWUP] Fix MAP_ARROW_ITER bench UDF re…
viirya May 28, 2026
f9a3a9b
[SPARK-55754][PYTHON][TEST][FOLLOWUP] Fix pure_ints type mismatch in …
viirya May 28, 2026
afb6253
[SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s in…
zhengruifeng May 28, 2026
f1f1be9
[SPARK-56426][SQL] Fix SQL failure when LATERAL VIEW column alias con…
jiwen624 May 28, 2026
a3ceb14
[SPARK-57005][SQL] Fix None.get in RewritePredicateSubquery when subq…
yadavay-amzn May 28, 2026
1ec1a31
[SPARK-54022][SPARK-56617][SQL][TEST] Add DSv2 CACHE TABLE impact on …
longvu-db May 28, 2026
8336339
[SPARK-57081][TESTS] Fix percent-encoding issue in SparkTestUtils cla…
sarutak May 28, 2026
25ffaf7
[SPARK-57113][SDP] Prevent AutoCDC keys from changing across SDP runs
AnishMahto May 28, 2026
9964d3c
[SPARK-53890][SDP] Test (and fix) read/readstream options are respect…
AnishMahto May 28, 2026
7c54f73
[SPARK-55724][PYTHON][TEST][FOLLOWUP] Unify wide_values to wide_cols …
viirya May 28, 2026
2f411c1
[SPARK-56132][SS] Call pruneColumns on V2 streaming to fix metadata r…
zikangh May 28, 2026
24a4229
[SPARK-57137][PYTHON][TEST] Share base mixin across Arrow/Pandas sibl…
viirya May 28, 2026
5ec90f0
[SPARK-57115][INFRA] Add Java 17 workflow file
gaogaotiantian May 29, 2026
49db669
[SPARK-57075][INFRA] Share precompile Coursier cache with host-runner…
zhengruifeng May 29, 2026
3172427
[SPARK-57138][PYTHON][TEST] Share base mixin across Window and Cogrou…
viirya May 29, 2026
055d99e
[SPARK-57054][SQL] Make view collation sticky across ALTER SCHEMA DEF…
ilicmarkodb May 29, 2026
bda4bd2
[SPARK-57139][SQL] Skip deriving PartitionPredicate for partially pus…
szehon-ho May 29, 2026
3e40e71
Revert "[SPARK-56975][SS] Reject user-specified schema in DataStreamR…
May 30, 2026
425bfab
[SPARK-57109][SQL] SYSTEM.SESSION should not be part of SYSTEM_PATH
srielau May 31, 2026
cfcf173
[SPARK-53454][SQL] Handle AlwaysTrue/AlwaysFalse in JDBCSQLBuilder
shrirangmhalgi May 31, 2026
421ae2a
[SPARK-52812][SQL] Make Spark Connect Catalog.createTable eager
rishav23 May 26, 2026
c107747
[SPARK-52812][CONNECT] Preserve spark.sql.sources.default for eager c…
haoyangeng-db May 31, 2026
3e2be31
[SPARK-56618][SQL][TEST] Add DSv2 join refresh tests for incrementall…
longvu-db May 31, 2026
19ba15f
[SPARK-56917][TEST][CONNECT] Expand Connect-specific tests for DataFr…
zhengruifeng May 31, 2026
0d68d8a
[SPARK-56462][SQL] Fix MERGE UPDATE */INSERT * schema evolution failu…
jiwen624 May 31, 2026
b75044e
[SPARK-55326][PYTHON][CONNECT][FOLLOWUP] Skip cleanup RPCs in _on_exi…
cloud-fan Jun 1, 2026
098cbbd
[SPARK-57185][SQL] Use thread-local ICU collators to fix lock content…
dejankrak-db Jun 1, 2026
f56d2bf
[SPARK-56956][SPARK-56651][CONNECT][SDP][FOLLOWUP] Address review com…
cloud-fan Jun 1, 2026
0129584
[SPARK-56032][SQL][FOLLOWUP] Skip FilterExec subexpression eliminatio…
cloud-fan Jun 2, 2026
6be4fe5
[SPARK-57058][SQL] Introduce BinaryView and migrate geo types to it f…
cloud-fan Jun 2, 2026
079eda6
[SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as …
dejankrak-db Jun 2, 2026
5ecea32
[SPARK-57188][SQL] Parameterless function takes precedence over UDF p…
dejankrak-db Jun 2, 2026
1a1cc2e
[SPARK-57200][SQL] Fix JVM Codegen Bug - NULL for 3-arg form with col…
rgyhuang Jun 2, 2026
4a330ce
[SPARK-53454][SQL][FOLLOWUP] Parenthesize AlwaysTrue/AlwaysFalse SQL …
cloud-fan Jun 2, 2026
adf9799
[SPARK-57186][SQL] Handle NullType in ExtractValue to return NULL ins…
dejankrak-db Jun 2, 2026
3eace3d
[SPARK-56032][SQL][FOLLOWUP] Add conf to gate subexpression eliminati…
cloud-fan Jun 2, 2026
4ba170b
[SPARK-57218][PYTHON] Pin pandas and pandas stub version for lint image
gaogaotiantian Jun 3, 2026
71a6adf
[SPARK-57239][PYTHON][TEST] Use sqlite uri for mlflow model
gaogaotiantian Jun 3, 2026
105ad9a
[SPARK-57183][4.2][SS] Close LRUCache on RocksDB.close() in unbounded…
kete1987 Jun 3, 2026
f5d54ed
[SPARK-57196][SQL] Make UnionExec whole-stage codegen thread-safe
gengliangwang Jun 3, 2026
5ced709
[SPARK-57113][SDP][FOLLOWUP] Cleanup AutoCDC Flow code
szehon-ho Jun 4, 2026
ff5ea1b
[SPARK-51262][SQL] Fix exceptAll after dropDuplicates with subset
shrirangmhalgi Jun 4, 2026
fc52c9a
[SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job
zhengruifeng Jun 4, 2026
daa6f59
[SPARK-57191][YARN] Fix driver hang when MonitorThread encounters une…
shrirangmhalgi Jun 4, 2026
778f09c
[SPARK-57144][INFRA] Unify Coursier cache to a single key across all …
zhengruifeng Jun 4, 2026
48e2c8d
[SPARK-49798][DOC] Fix inaccurate documentation of RuntimeConfig.get
brijrajk Jun 4, 2026
da095f9
[SPARK-56663][SQL][FOLLOWUP] Fix silent overflow in date_trunc fast p…
chenhao-db Jun 4, 2026
06b6a2a
[SPARK-56695][SQL][DML] Remove Path Based Table support in SQL
andreaschat-db Jun 4, 2026
ab3ec61
[SPARK-57262][SQL][WEBUI] Job description derived from a query should…
sarutak Jun 6, 2026
090930a
Revert "[SPARK-57262][SQL][WEBUI] Job description derived from a quer…
dongjoon-hyun Jun 6, 2026
49b438d
[SPARK-57297][SQL][TESTS] Add a test that SQL execution description r…
dongjoon-hyun Jun 7, 2026
0f99c26
[SPARK-57262][SQL][WEBUI] Job description derived from a query should…
sarutak Jun 8, 2026
06fad36
[SPARK-57278][INFRA] Install zstd in CI container images to fix GitHu…
zhengruifeng Jun 8, 2026
5afc5a3
[SPARK-57277][INFRA] Make CI cache keys OS-specific
zhengruifeng Jun 8, 2026
ae62b5b
[SPARK-56830][INFRA] Share SBT compile artifact with python hosted ru…
zhengruifeng Jun 8, 2026
9640153
[SPARK-56995][SQL][DML] Allow dataframe caching in the DSv2 Transacti…
andreaschat-db Jun 8, 2026
3e4ea3b
[SPARK-57254][INFRA] Put CI-unrelated files in a module so CI won't b…
gaogaotiantian Jun 8, 2026
47782ed
Preparing Spark release v4.2.0-rc1
Jun 8, 2026
367f4a1
Preparing development version 4.2.1-SNAPSHOT
Jun 8, 2026
5e1ce65
[SPARK-57298][SQL] collect_set fails to dedupe float/double NaN/-0.0 …
jiwen624 Jun 9, 2026
9952e45
[SPARK-57287][SQL] Escape backslash in LIKE pattern for STARTS_WITH/E…
shrirangmhalgi Jun 9, 2026
c5259ac
[SPARK-57330][INFRA] Switch shared CI compile artifacts to zstd compr…
zhengruifeng Jun 9, 2026
20f27e8
[SPARK-54876][SQL] Fix splitSemiColon dropping statement ending with …
yadavay-amzn Jun 9, 2026
f5b8305
[SPARK-57281][SQL][SS] Remove @Experimental annotation from Real-time…
jerrypeng Jun 9, 2026
6a9bd14
[MINOR][PYTHON][DOC] Fix broken See Also links in pyspark.sql.functions
zhengruifeng Jun 10, 2026
3a64906
[SPARK-57348][PYTHON][TESTS] Replace sql_keywords doctest show() with…
sarutak Jun 10, 2026
dce33ae
[SPARK-57344][INFRA] Ensure tests for `pipelines` module triggered wh…
LuciferYang Jun 10, 2026
eca0447
[SPARK-57325][CONNECT] Stop streaming queries registered while the Co…
dbtsai Jun 10, 2026
80c7f6b
[SPARK-57355][PYTHON] Fix __module__ check in udf profiler
gaogaotiantian Jun 10, 2026
06d5602
[SPARK-56995][SQL][DML][TESTS][FOLLOWUP] Fix AutoCdcScd1FullRefreshSu…
sarutak Jun 10, 2026
4c70294
[SPARK-57332][SQL] Fix MySQL backslash escaping in LIKE predicate pus…
cloud-fan Jun 10, 2026
0681ae7
[SPARK-57313][SQL] Fix SampleExec numOutputRows metric when whole-sta…
jiwen624 Jun 11, 2026
e8eae5c
Merge tag 'v4.2.0-rc1' into feature/NGSOK-1703
giggsoff Jun 11, 2026
c6f3012
[SPARK-57332][SQL][FOLLOWUP] Fix line length exceeding 100 characters…
sarutak Jun 11, 2026
18ee2f0
[SPARK-57383][SQL][PYTHON] Honor configured Arrow zstd compression le…
viirya Jun 11, 2026
6213179
[SPARK-57073][SS][PYTHON][TEST] Catch AnalysisException for test_pari…
gaogaotiantian Jun 11, 2026
2b62e1e
[SPARK-57393] Build: PySpark and SparkR source distributions are miss…
huaxingao Jun 11, 2026
8e1deb8
Preparing Spark release v4.2.0-rc2
Jun 12, 2026
7d9a2e9
Preparing development version 4.2.1-SNAPSHOT
Jun 12, 2026
9a6bd1e
[SPARK-57388][INFRA] Pin downstream actions/checkout to a single reso…
zhengruifeng Jun 12, 2026
89906d7
[SPARK-57397][PYTHON] Raise an exception when parsed UDT is the wrong…
gaogaotiantian Jun 12, 2026
560dc9d
Preparing Spark release v4.2.0-rc3
Jun 12, 2026
b7ebbdb
CI fix/update
giggsoff Jun 11, 2026
9b0507c
chore: bump version to 4.2.0.1-4.3.0-1
giggsoff Jun 11, 2026
0f2f06f
Merge tag 'v4.2.0-rc2' into feature/NGSOK-1703
giggsoff Jun 13, 2026
415d891
Merge tag 'v4.2.0-rc3' into feature/NGSOK-1703
giggsoff Jun 13, 2026
88e07cc
NGSOK-1703 Reduce RocksDBStateStoreIntegrationSuite bounded memory test
giggsoff Jun 13, 2026
57 changes: 30 additions & 27 deletions .github/workflows/ci.yml
@@ -30,13 +30,10 @@ jobs:
  include:
  - name: "core / utils / tags"
  slug: "core-utils-tags"
- modules: ":spark-core_2.13,:spark-launcher_2.13,:spark-network-common_2.13,:spark-network-shuffle_2.13,:spark-network-yarn_2.13,:spark-unsafe_2.13,:spark-kvstore_2.13,:spark-tags_2.13,:spark-sketch_2.13,:spark-common-utils_2.13"
- - name: "graphx / examples / repl"
- slug: "graphx-examples-repl"
- modules: ":spark-graphx_2.13,:spark-examples_2.13,:spark-repl_2.13"
- - name: "catalyst / sql-api / hive-thriftserver"
- slug: "catalyst-sql-api-hive-thriftserver"
- modules: ":spark-sql-api_2.13,:spark-catalyst_2.13,:spark-hive-thriftserver_2.13"
+ modules: ":spark-core_2.13,:spark-launcher_2.13,:spark-network-common_2.13,:spark-network-shuffle_2.13,:spark-network-yarn_2.13,:spark-unsafe_2.13,:spark-kvstore_2.13,:spark-tags_2.13,:spark-sketch_2.13,:spark-common-utils_2.13,:spark-common-utils-java_2.13,:spark-udf-worker-core_2.13"
+ - name: "catalyst / sql-api / hive-thriftserver / pipelines / graphx / examples / repl"
+ slug: "catalyst-graphx"
+ modules: ":spark-sql-api_2.13,:spark-catalyst_2.13,:spark-hive-thriftserver_2.13,:spark-pipelines_2.13,:spark-graphx_2.13,:spark-examples_2.13,:spark-repl_2.13"
  - name: "sql - extended tests"
  slug: "sql"
  modules: ":spark-sql_2.13"
@@ -52,10 +49,19 @@ jobs:
  - name: "hive"
  slug: "hive"
  modules: ":spark-hive_2.13"
- - name: "streaming / mllib / yarn / k8s / connect / protobuf / kafka / avro"
- slug: "streaming-mllib-yarn-k8s-connect-protobuf-kafka-avro"
- modules: ":spark-streaming_2.13,:spark-sql-kafka-0-10_2.13,:spark-streaming-kafka-0-10_2.13,:spark-token-provider-kafka-0-10_2.13,:spark-mllib-local_2.13,:spark-mllib_2.13,:spark-yarn_2.13,:spark-kubernetes_2.13,:spark-hadoop-cloud_2.13,:spark-connect_2.13,:spark-connect-common_2.13,:spark-connect-client-jvm_2.13,:spark-protobuf_2.13,:spark-avro_2.13,:spark-assembly_2.13"
+ - name: "mllib"
+ slug: "mllib"
+ modules: ":spark-mllib-local_2.13,:spark-mllib_2.13"
+ - name: "connect / protobuf"
+ slug: "connect-protobuf"
+ modules: ":spark-connect_2.13,:spark-connect-common_2.13,:spark-connect-client-jvm_2.13,:spark-connect-client-jdbc_2.13,:spark-protobuf_2.13"
  extra: -Dtest.exclude.tags=org.apache.spark.tags.AmmoniteTest
+ - name: "streaming / kafka / avro"
+ slug: "streaming-kafka-avro"
+ modules: ":spark-streaming_2.13,:spark-sql-kafka-0-10_2.13,:spark-streaming-kafka-0-10_2.13,:spark-token-provider-kafka-0-10_2.13,:spark-avro_2.13"
+ - name: "yarn / k8s / hadoop-cloud / assembly"
+ slug: "yarn-k8s-hadoop-cloud-assembly"
+ modules: ":spark-yarn_2.13,:spark-kubernetes_2.13,:spark-hadoop-cloud_2.13,:spark-assembly_2.13"
  steps:
  - uses: actions/checkout@v6
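
A quick way to sanity-check the job split in this hunk is to confirm that no module dropped out of CI. The sketch below is illustrative only and is not part of the repository. It assumes the single "streaming / mllib / yarn / k8s / connect / protobuf / kafka / avro" entry is the removed side of the hunk and the four smaller entries are the added side; the helper name `parse_modules` is hypothetical.

```python
def parse_modules(spec: str) -> set[str]:
    """Split a comma-separated Maven module list into a set of module names."""
    return {m.strip() for m in spec.split(",") if m.strip()}


# Module list of the single combined job (assumed to be the removed side).
old = parse_modules(
    ":spark-streaming_2.13,:spark-sql-kafka-0-10_2.13,:spark-streaming-kafka-0-10_2.13,"
    ":spark-token-provider-kafka-0-10_2.13,:spark-mllib-local_2.13,:spark-mllib_2.13,"
    ":spark-yarn_2.13,:spark-kubernetes_2.13,:spark-hadoop-cloud_2.13,:spark-connect_2.13,"
    ":spark-connect-common_2.13,:spark-connect-client-jvm_2.13,:spark-protobuf_2.13,"
    ":spark-avro_2.13,:spark-assembly_2.13"
)

# Module lists of the four replacement jobs (assumed to be the added side), keyed by slug.
new_jobs = {
    "mllib": ":spark-mllib-local_2.13,:spark-mllib_2.13",
    "connect-protobuf": ":spark-connect_2.13,:spark-connect-common_2.13,"
    ":spark-connect-client-jvm_2.13,:spark-connect-client-jdbc_2.13,:spark-protobuf_2.13",
    "streaming-kafka-avro": ":spark-streaming_2.13,:spark-sql-kafka-0-10_2.13,"
    ":spark-streaming-kafka-0-10_2.13,:spark-token-provider-kafka-0-10_2.13,:spark-avro_2.13",
    "yarn-k8s-hadoop-cloud-assembly": ":spark-yarn_2.13,:spark-kubernetes_2.13,"
    ":spark-hadoop-cloud_2.13,:spark-assembly_2.13",
}

covered = set().union(*(parse_modules(spec) for spec in new_jobs.values()))
missing = old - covered   # modules the old job tested that no new job tests
added = covered - old     # modules that only the new jobs test

print("missing:", sorted(missing))
print("added:", sorted(added))
```

Under those assumptions, nothing from the combined entry is lost, and the only module the new entries introduce is `:spark-connect-client-jdbc_2.13`.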

@@ -77,8 +83,8 @@ jobs:
  run: |
  python3 -m pip install --upgrade pip
  python3 -m pip install 'numpy>=1.20.0' 'pyarrow' 'pandas' 'scipy' \
- 'unittest-xml-reporting' 'grpcio==1.56.0' 'protobuf==4.25.3' \
- 'grpcio-status==1.56.0' 'googleapis-common-protos==1.56.4' \
+ 'unittest-xml-reporting' 'grpcio==1.76.0' 'protobuf==6.33.5' \
+ 'grpcio-status==1.76.0' 'googleapis-common-protos==1.71.0' \
  'zstandard==0.25.0'

  - name: Build dependent modules (compile main+tests, install incl. test-jars)
@@ -149,23 +155,19 @@ jobs:
  matrix:
  include:
  - name: sql
- modules: pyspark-sql,pyspark-resource,pyspark-testing
- - name: core
- modules: pyspark-core,pyspark-streaming
+ modules: pyspark-sql,pyspark-resource,pyspark-testing,pyspark-core,pyspark-errors,pyspark-logger
  - name: ml
- modules: pyspark-mllib,pyspark-ml
+ modules: pyspark-mllib,pyspark-ml,pyspark-ml-connect,pyspark-pipelines
+ - name: streaming
+ modules: pyspark-streaming,pyspark-structured-streaming,pyspark-structured-streaming-connect
+ - name: connect
+ modules: pyspark-connect
  - name: pandas
  modules: pyspark-pandas
  - name: pandas-slow
  modules: pyspark-pandas-slow
- - name: connect
- modules: pyspark-connect
- - name: pandas-connect
- modules: pyspark-pandas-connect
- - name: pandas-slow-connect
- modules: pyspark-pandas-slow-connect
- - name: errors
- modules: pyspark-errors
+ - name: pandas-connect-and-slow
+ modules: pyspark-pandas-connect,pyspark-pandas-slow-connect
  env:
  MODULES_TO_TEST: ${{ matrix.modules }}
  PYTHON_TO_TEST: python3.10
@@ -192,11 +194,12 @@ jobs:
  'numpy==1.26.4' 'pyarrow==18.0.0' 'pandas==2.2.0' 'scipy' \
  'unittest-xml-reporting' 'coverage' \
  'memory-profiler' 'plotly<6' 'matplotlib' \
- 'grpcio==1.56.0' 'grpcio-status==1.56.0' \
- 'protobuf==4.25.3' 'googleapis-common-protos==1.56.4' \
+ 'grpcio==1.76.0' 'grpcio-status==1.76.0' \
+ 'protobuf==6.33.5' 'googleapis-common-protos==1.71.0' \
  'graphviz>=0.20' 'openpyxl' \
- 'scikit-learn==1.1.*' 'mlflow==3.12.0' \
- 'torch==2.0.1' 'torchvision==0.15.2' 'torcheval'
+ 'torch==2.5.1' 'torchvision==0.20.1' 'torcheval' \
+ 'zstandard==0.25.0'

  - name: Build Spark (full reactor including assembly)
  env:
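
Both `pip install` steps in this workflow bump the gRPC and protobuf pins together, so it can be worth checking that the two steps stay in agreement. The following is a minimal sketch, not part of the repository: it assumes the `1.76.0` / `6.33.5` / `1.71.0` lines are the added side of each hunk, it only covers the requirements visible in the diff (the start of the second list lies outside the hunk), and the names `parse_pins`, `first_install`, and `second_install` are hypothetical.

```python
import re


def parse_pins(requirements: list[str]) -> dict[str, str]:
    """Return {package: version} for requirements pinned with '=='; others are skipped."""
    pins = {}
    for req in requirements:
        match = re.fullmatch(r"([A-Za-z0-9._-]+)==(.+)", req)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


# Requirements from the first install step, as shown in the diff (assumed added side).
first_install = [
    "numpy>=1.20.0", "pyarrow", "pandas", "scipy", "unittest-xml-reporting",
    "grpcio==1.76.0", "protobuf==6.33.5", "grpcio-status==1.76.0",
    "googleapis-common-protos==1.71.0", "zstandard==0.25.0",
]

# Visible requirements from the second install step (assumed added side).
second_install = [
    "numpy==1.26.4", "pyarrow==18.0.0", "pandas==2.2.0", "scipy",
    "unittest-xml-reporting", "coverage", "memory-profiler", "plotly<6", "matplotlib",
    "grpcio==1.76.0", "grpcio-status==1.76.0", "protobuf==6.33.5",
    "googleapis-common-protos==1.71.0", "graphviz>=0.20", "openpyxl",
    "scikit-learn==1.1.*", "mlflow==3.12.0", "torch==2.5.1", "torchvision==0.20.1",
    "torcheval", "zstandard==0.25.0",
]

first = parse_pins(first_install)
second = parse_pins(second_install)

# Packages pinned in both steps whose versions disagree.
mismatches = {
    name: (first[name], second[name])
    for name in first.keys() & second.keys()
    if first[name] != second[name]
}

print("mismatched pins:", mismatches)
print("grpcio matches grpcio-status:", first["grpcio"] == first["grpcio-status"])
```

A check of this kind only compares exact `==` pins; range specifiers such as `numpy>=1.20.0` are deliberately ignored rather than resolved.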
8 changes: 7 additions & 1 deletion AGENTS.md
@@ -22,7 +22,7 @@ Avoid introducing non-ASCII characters in code or comments. String literals may

  ## Build and Test

- Build and tests can take a long time. Before running tests, ask the user if they have more changes to make.
+ Build and tests can take a long time. If the user explicitly asked to run tests, run them. Otherwise (you are running tests on your own to verify a change), first ask the user if they have more changes to make.

  Prefer SBT over Maven for faster incremental compilation. Module names are defined in `project/SparkBuild.scala`.

@@ -128,3 +128,9 @@ DO NOT push to the upstream repo. Always push to the personal fork. Open PRs aga
  DO NOT force push or use `--amend` on pushed commits unless the user explicitly asks. If the remote branch has new commits, fetch and rebase before pushing.

  Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI.
+
+ ## Security
+
+ Security model: [SECURITY.md](./SECURITY.md)
+
+ Agents that scan this repository should consult `SECURITY.md` for the project's threat model, in-scope / out-of-scope declarations, and known non-findings before reporting issues.
4 changes: 4 additions & 0 deletions R/.gitignore
@@ -6,3 +6,7 @@ pkg/man
  pkg/html
  SparkR.Rcheck/
  SparkR_*.tar.gz
+ # Transient copies made by dev/make-distribution.sh while building the package.
+ pkg/LICENSE
+ pkg/NOTICE
+ DESCRIPTION.orig
2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
  Package: SparkR
  Type: Package
- Version: 4.2.0.1-4.3.0-0
+ Version: 4.2.0.1-4.3.0-1
  Title: R Front End for 'Apache Spark'
  Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
  Authors@R:
1 change: 0 additions & 1 deletion README.md
@@ -52,7 +52,6 @@ This README file only contains basic setup instructions.
  | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml) |
  | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.14_nogil.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.14_nogil.yml) |
  | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_minimum.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_minimum.yml) |
- | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_ps_minimum.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_ps_minimum.yml) |
  | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_connect40.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_connect40.yml) |
  | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_connect.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_connect.yml) |
  | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_sparkr_window.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_sparkr_window.yml) |
13 changes: 13 additions & 0 deletions SECURITY.md
@@ -0,0 +1,13 @@
# Security policy

## Reporting a vulnerability

To report a security vulnerability in Apache Spark, follow the ASF security process at <https://www.apache.org/security/>. Reports can be sent privately to security@apache.org or to private@spark.apache.org.

## Threat model

Apache Spark's security guidance for operators — covering authentication, encryption, access control, Kerberos integration, trust boundaries, environmental assumptions, security properties provided and not provided, and downstream operator responsibilities — is documented at:

<https://spark.apache.org/docs/latest/security.html>

Automated security scanning agents should consult that document for the project's in-scope / out-of-scope declarations before reporting issues.
2 changes: 1 addition & 1 deletion assembly/pom.xml
Expand Up @@ -21,7 +21,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion common/kvstore/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion common/network-common/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.*;
import static org.junit.jupiter.api.Assumptions.assumeFalse;

public class JavaUtilsSuite {

Expand Down Expand Up @@ -52,6 +53,10 @@ public void testCreateDirectory() throws IOException {
// 4. The parent directory cannot write
assertTrue(testDir.canWrite());
assertTrue(testDir.setWritable(false));
// Skip when setWritable(false) has no effect (e.g. running as root,
// or on a filesystem that ignores POSIX write bits).
assumeFalse(testDir.canWrite(),
"setWritable(false) had no effect; skipping write-denied scenario");
assertThrows(IOException.class,
() -> JavaUtils.createDirectory(testDirPath, "scenario4"));
assertTrue(testDir.setWritable(true));
Expand Down
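The guard added in this hunk can be illustrated outside JUnit. The following is a minimal sketch, not the suite's code: `WriteDeniedProbe` and `canDenyWrites` are hypothetical names introduced here, and only standard `java.io.File` calls are used. It shows the same check that `assumeFalse(testDir.canWrite(), ...)` relies on: clear the write bit, then ask whether the directory is still writable by the current process.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, for illustration only; not part of Spark.
final class WriteDeniedProbe {

  /**
   * Returns true if clearing the write bit on {@code dir} makes it
   * non-writable for the current process. The write bit is restored
   * before returning.
   */
  static boolean canDenyWrites(File dir) {
    if (!dir.setWritable(false)) {
      // The permission change itself was rejected.
      return false;
    }
    try {
      // When running as root, or on a filesystem that ignores POSIX
      // write bits, canWrite() may still report true here.
      return !dir.canWrite();
    } finally {
      dir.setWritable(true);
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("write-denied-probe").toFile();
    try {
      if (canDenyWrites(dir)) {
        System.out.println("write bit is enforced; the write-denied scenario can run");
      } else {
        System.out.println("write bit is not enforced; skip the write-denied scenario");
      }
    } finally {
      dir.delete();
    }
  }
}
```

In the test itself the skip is expressed with `assumeFalse`, so JUnit reports the scenario as aborted rather than failed when the write bit has no effect.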
2 changes: 1 addition & 1 deletion common/network-shuffle/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion common/network-yarn/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion common/sketch/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion common/tags/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion common/unsafe/pom.xml
Expand Up @@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.13</artifactId>
<version>4.2.0.1-4.3.0-0</version>
<version>4.2.0.1-4.3.0-1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ public record CollationMeta(
public static class Collation {
public final String collationName;
public final String provider;
private final Collator collator;
private final ThreadLocal<Collator> threadLocalCollator;
public final Comparator<UTF8String> comparator;

/**
Expand Down Expand Up @@ -187,7 +187,7 @@ public static class Collation {
public Collation(
String collationName,
String provider,
Collator collator,
ThreadLocal<Collator> threadLocalCollator,
Comparator<UTF8String> comparator,
String version,
Function<UTF8String, byte[]> sortKeyFunction,
Expand All @@ -197,7 +197,7 @@ public Collation(
boolean supportsSpaceTrimming) {
this.collationName = collationName;
this.provider = provider;
this.collator = collator;
this.threadLocalCollator = threadLocalCollator;
this.comparator = comparator;
this.version = version;
this.sortKeyFunction = sortKeyFunction;
Expand All @@ -216,7 +216,7 @@ public Collation(
}

public Collator getCollator() {
return collator;
return threadLocalCollator != null ? threadLocalCollator.get() : null;
}

/**
Expand Down Expand Up @@ -1016,29 +1016,40 @@ protected Collation buildCollation() {
builder.setUnicodeLocaleKeyword("ks", "level1");
}
ULocale resultLocale = builder.build();
Collator collator = Collator.getInstance(resultLocale);
// Freeze ICU collator to ensure thread safety.
collator.freeze();

// Use thread-local Collator instances to avoid lock contention.
// A frozen RuleBasedCollator serializes all threads through a ReentrantLock on its
// internal collation buffer (used by getCollationKey/compare). By creating independent
// per-thread instances via Collator.getInstance(), each thread operates on its own
// buffer without locking. Each instance is frozen as a mutation guard so that any
// accidental call to setStrength() or similar throws immediately.
ThreadLocal<Collator> threadLocalCollator = ThreadLocal.withInitial(
() -> {
Collator collator = Collator.getInstance(resultLocale);
collator.freeze();
return collator;
});

Comparator<UTF8String> comparator;
Function<UTF8String, byte[]> sortKeyFunction;

if (spaceTrimming == SpaceTrimming.NONE) {
comparator = (s1, s2) ->
collator.compare(s1.toValidString(), s2.toValidString());
sortKeyFunction = s -> collator.getCollationKey(s.toValidString()).toByteArray();
threadLocalCollator.get().compare(s1.toValidString(), s2.toValidString());
sortKeyFunction = s ->
threadLocalCollator.get().getCollationKey(s.toValidString()).toByteArray();
} else {
comparator = (s1, s2) -> collator.compare(
comparator = (s1, s2) -> threadLocalCollator.get().compare(
applyTrimmingPolicy(s1, spaceTrimming).toValidString(),
applyTrimmingPolicy(s2, spaceTrimming).toValidString());
sortKeyFunction = s -> collator.getCollationKey(
sortKeyFunction = s -> threadLocalCollator.get().getCollationKey(
applyTrimmingPolicy(s, spaceTrimming).toValidString()).toByteArray();
}

return new Collation(
normalizedCollationName(),
PROVIDER_ICU,
collator,
threadLocalCollator,
comparator,
ICU_VERSION,
sortKeyFunction,
Expand Down
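The per-thread collator pattern in the hunk above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Spark implementation: it uses the JDK's `java.text.Collator` as a stand-in for ICU's `Collator` so that it runs without extra dependencies, the class and member names are hypothetical, and there is no `freeze()` call because the JDK class has no such method. The comparator and sort-key function fetch the collator through `ThreadLocal.get()` on every call, so each thread works with its own instance.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch; java.text.Collator stands in for ICU's Collator.
final class ThreadLocalCollatorSketch {

  // One Collator per thread, created lazily on that thread's first get().
  static final ThreadLocal<Collator> THREAD_LOCAL_COLLATOR =
      ThreadLocal.withInitial(() -> {
        Collator collator = Collator.getInstance(Locale.US);
        // Primary strength ignores case differences, similar in spirit
        // to the "ks=level1" keyword in the diff.
        collator.setStrength(Collator.PRIMARY);
        return collator;
      });

  // Both lambdas resolve the collator at call time, never at creation
  // time, so they can be shared safely across threads.
  static final Comparator<String> COMPARATOR =
      (s1, s2) -> THREAD_LOCAL_COLLATOR.get().compare(s1, s2);

  static final Function<String, byte[]> SORT_KEY =
      s -> THREAD_LOCAL_COLLATOR.get().getCollationKey(s).toByteArray();

  // Returns the Collator instance observed by a freshly started thread.
  static Collator collatorSeenByNewThread() throws InterruptedException {
    Collator[] seen = new Collator[1];
    Thread t = new Thread(() -> seen[0] = THREAD_LOCAL_COLLATOR.get());
    t.start();
    t.join(); // join() makes the write to seen[0] visible here
    return seen[0];
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(COMPARATOR.compare("abc", "ABC"));
    System.out.println(SORT_KEY.apply("abc").length);
    // Different threads should observe different Collator objects.
    System.out.println(THREAD_LOCAL_COLLATOR.get() != collatorSeenByNewThread());
  }
}
```

One consequence of this design, visible in the diff's `getCollator()` change, is that callers receive whichever instance belongs to the calling thread, so a returned collator should not be handed to another thread.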