
Add hooks into velox core to initialize cudf-exchange components.#2

Open
dan13bauer wants to merge 2 commits into root_frag from exchange_hooks

@dan13bauer
Owner

No description provided.

@zoltan
Collaborator

zoltan commented Sep 11, 2025

@majetideepak this is the hook PR. We know it doesn't work in its current form. I think the best solution would be to make callback registration possible for Task, so that it knows which components need initialization and teardown.
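
A minimal sketch of what such callback registration could look like. `TaskLifecycleHooks` and `TaskLifecycleRegistry` are hypothetical names invented for illustration and do not exist in Velox; a real version would also need synchronization if registration can race with task creation.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a pair of callbacks that a component (CPU or GPU side)
// registers once at startup.
struct TaskLifecycleHooks {
  std::function<void(const std::string& taskId)> onInit;
  std::function<void(const std::string& taskId)> onTeardown;
};

// Hypothetical registry that a task would consult on start and on end.
class TaskLifecycleRegistry {
 public:
  void registerHooks(TaskLifecycleHooks hooks) {
    hooks_.push_back(std::move(hooks));
  }

  // Runs the init callbacks in registration order.
  void initAll(const std::string& taskId) const {
    for (const auto& hooks : hooks_) {
      if (hooks.onInit) {
        hooks.onInit(taskId);
      }
    }
  }

  // Runs the teardown callbacks in reverse registration order.
  void teardownAll(const std::string& taskId) const {
    for (auto it = hooks_.rbegin(); it != hooks_.rend(); ++it) {
      if (it->onTeardown) {
        it->onTeardown(taskId);
      }
    }
  }

 private:
  std::vector<TaskLifecycleHooks> hooks_;
};
```

With this shape the task only knows about the registry, not about any specific CPU or GPU component.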

@majetideepak

majetideepak commented Sep 12, 2025

@zoltan Is it accurate that CudfOutputQueueManager is the GPU counterpart of the CPU-side OutputBufferManager?
One approach is to turn OutputBufferManager into an interface and derive CudfOutputQueueManager from it. We could then register multiple OutputBufferManager implementations through a registration mechanism, as you mentioned, and initialize all of them in the Task.
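
A rough sketch of the interface-plus-registry approach, under stated assumptions: `OutputManagerInterface`, `CpuManagerSketch`, `GpuManagerSketch`, and `OutputManagerRegistry` are hypothetical names, and the method signatures are simplified stand-ins rather than the real OutputBufferManager API.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface for CPU and GPU output managers.
class OutputManagerInterface {
 public:
  virtual ~OutputManagerInterface() = default;
  virtual void initializeTask(const std::string& taskId) = 0;
  virtual void removeTask(const std::string& taskId) = 0;
  virtual int numTasks() const = 0;
};

// Stand-in for the CPU-side manager.
class CpuManagerSketch : public OutputManagerInterface {
 public:
  void initializeTask(const std::string& taskId) override { tasks_.insert(taskId); }
  void removeTask(const std::string& taskId) override { tasks_.erase(taskId); }
  int numTasks() const override { return static_cast<int>(tasks_.size()); }

 private:
  std::set<std::string> tasks_;
};

// Stand-in for the GPU-side manager.
class GpuManagerSketch : public OutputManagerInterface {
 public:
  void initializeTask(const std::string& taskId) override { queues_.insert(taskId); }
  void removeTask(const std::string& taskId) override { queues_.erase(taskId); }
  int numTasks() const override { return static_cast<int>(queues_.size()); }

 private:
  std::set<std::string> queues_;
};

// Hypothetical registry: the task forwards each call to every manager.
class OutputManagerRegistry {
 public:
  void add(std::shared_ptr<OutputManagerInterface> manager) {
    managers_.push_back(std::move(manager));
  }
  void initializeTask(const std::string& taskId) {
    for (auto& manager : managers_) {
      manager->initializeTask(taskId);
    }
  }
  void removeTask(const std::string& taskId) {
    for (auto& manager : managers_) {
      manager->removeTask(taskId);
    }
  }

 private:
  std::vector<std::shared_ptr<OutputManagerInterface>> managers_;
};
```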

@zoltan
Collaborator

zoltan commented Sep 12, 2025

Yes, it's the GPU counterpart. The thing is, almost everything has a GPU counterpart.

I think we should instead define a common interface, have both the CPU and GPU components implement it, and let Task register multiple callbacks of that interface. That sounds more extensible to me.

Do the Velox folks frown upon multiple inheritance and generic designs like this? :)

lga-zurich referenced this pull request in lga-zurich/velox-exchange Sep 19, 2025
@GregoryKimball GregoryKimball removed this from libcudf Oct 17, 2025
dan13bauer pushed a commit that referenced this pull request Feb 2, 2026
Summary:
Fixes an OSS ASan SEGV caused by calling `->as<>()` on a nullptr.

```
=================================================================
==4058438==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000a563a4 bp 0x7ffd54ee5bc0 sp 0x7ffd54ee5aa0 T0)
==4058438==The signal is caused by a READ memory access.
==4058438==Hint: address points to the zero page.
    #0 0x000000a563a4 in facebook::velox::FlatVector<int>* facebook::velox::BaseVector::as<facebook::velox::FlatVector<int>>() /velox/./velox/vector/BaseVector.h:116:12
    #1 0x000000a563a4 in facebook::velox::test::(anonymous namespace)::FlatMapVectorTest_encodedKeys_Test::TestBody() /velox/velox/vector/tests/FlatMapVectorTest.cpp:156:5
    #2 0x70874f90ce0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70874f8ed825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70874f8ed9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70874f8edaf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70874f8fcfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70874f8fa7c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70877c073153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70874f48460f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70874f4846bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x00000044c1b4 in _start (/velox/_build/debug/velox/vector/tests/velox_vector_test+0x44c1b4) (BuildId: 6da0b0d1074134be8f4d4534e5dbac9eeb9d482b)
```
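
A minimal sketch of the kind of guard that avoids this crash, assuming simplified stand-in types: `Base`, `FlatInts`, and `firstValue` are hypothetical and only mimic the shape of an `as<T>()` downcast. The actual change in the test is not shown in this commit message.

```cpp
#include <memory>

// Hypothetical stand-in for a polymorphic vector base with an as<T>() helper.
struct Base {
  virtual ~Base() = default;

  template <typename T>
  T* as() {
    return dynamic_cast<T*>(this);
  }
};

// Hypothetical stand-in for a flat vector of ints.
struct FlatInts : Base {
  int first = 0;
};

// Returns true and writes 'out' only when 'vector' is non-null and has the
// expected dynamic type. Calling vector->as<>() on a null pointer is what
// produces a read from the zero page.
inline bool firstValue(const std::shared_ptr<Base>& vector, int& out) {
  if (vector == nullptr) {
    return false;
  }
  auto* flat = vector->as<FlatInts>();
  if (flat == nullptr) {
    return false;
  }
  out = flat->first;
  return true;
}
```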

Reviewed By: peterenescu

Differential Revision: D91275269

fbshipit-source-id: 0806aa7562dc8cf4ad708fc6a8e4b29409507745
dan13bauer pushed a commit that referenced this pull request Feb 2, 2026
Summary:
Pull Request resolved: facebookincubator#16102

Fixes an ASan error in S3Util.cpp. See the stack trace below:

```
==4125762==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000006114ff at pc 0x70aa17bc0120 bp 0x7ffe905f3030 sp 0x7ffe905f3028
READ of size 1 at 0x0000006114ff thread T0
    #0 0x70aa17bc011f in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>) /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16
    #1 0x00000055790b in facebook::velox::filesystems::S3UtilTest_parseAWSRegion_Test::TestBody() /velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:147:3
    #2 0x70aa2e89be0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70aa2e87c825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70aa2e87c9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70aa2e87caf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70aa2e88bfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70aa2e8897c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70aa2e8ba153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70aa01ceb60f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70aa01ceb6bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x000000408684 in _start (/velox/_build/debug/velox/connectors/hive/storage_adapters/s3fs/tests/velox_s3file_test+0x408684) (BuildId: bbf3099c9a66a548c6da234b17ad1b631e9ed649)

0x0000006114ff is located 33 bytes before global variable '.str.135' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:126' (0x000000611520) of size 46
  '.str.135' is ascii string 'isHostExcludedFromProxy(hostname, pair.first)'
0x0000006114ff is located 1 bytes before global variable '.str.133' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:122' (0x000000611500) of size 1
  '.str.133' is ascii string ''
0x0000006114ff is located 42 bytes after global variable '.str.132' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:121' (0x0000006114c0) of size 21
  '.str.132' is ascii string 'localhost,foobar.com'
AddressSanitizer: global-buffer-overflow /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16 in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>)
Shadow bytes around the buggy address:
```
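
The report shows a 1-byte read located just before an empty string literal, which is consistent with touching the last character of an empty `std::string_view`. The actual fix is not shown here; the following is only a sketch of that class of guard, and `stripTrailingSlash` is a hypothetical helper, not a function from S3Util.cpp.

```cpp
#include <string_view>

// Hypothetical helper: removes one trailing '/' if present.
// The empty() check matters: back() on an empty view is undefined behavior
// and reads the byte before the underlying buffer.
inline std::string_view stripTrailingSlash(std::string_view s) {
  if (!s.empty() && s.back() == '/') {
    s.remove_suffix(1);
  }
  return s;
}
```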

Reviewed By: pedroerp

Differential Revision: D91278230

fbshipit-source-id: 05283bc8408069fa3f5ab8a7840b2bd0835fa7d6
dan13bauer pushed a commit that referenced this pull request Mar 6, 2026
The DataAndMetadata struct has a stream field that is used in onData()
to create the PackedTableWithStream, but the stream was never stored
after being obtained from the pool. This meant onData() would use an
uninitialized stream view.

Set ptr->stream = stream immediately after obtaining the stream from
the global pool, before the allocation try/catch.

Review: @wence- comment #2
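
A simplified sketch of the ordering described above, with stand-in types: `StreamView`, `StreamPool`, and `allocate` are hypothetical, and the `DataAndMetadata` here is reduced to two fields. It only illustrates storing the stream before the allocation that may throw.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for a non-owning stream view; -1 means "unset".
struct StreamView {
  int id = -1;
};

// Reduced stand-in for the struct named in the commit message.
struct DataAndMetadata {
  StreamView stream;
  std::vector<char> data;
};

// Hypothetical pool handing out stream views.
struct StreamPool {
  int next = 0;
  StreamView get() { return StreamView{next++}; }
};

inline std::unique_ptr<DataAndMetadata> allocate(StreamPool& pool, std::size_t bytes) {
  auto ptr = std::make_unique<DataAndMetadata>();
  StreamView stream = pool.get();
  // Store the stream right away so later consumers never see an unset view.
  ptr->stream = stream;
  try {
    ptr->data.resize(bytes);
  } catch (const std::bad_alloc&) {
    return nullptr;
  }
  return ptr;
}
```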
dan13bauer pushed a commit that referenced this pull request Mar 17, 2026
… mode (facebookincubator#16401)

Summary:
Pull Request resolved: facebookincubator#16401

This diff fixes data races detected by ThreadSanitizer (TSAN) in the barrier processing code under multi-threaded execution mode.

**Race condition #1**: Between `Driver::startBarrier()` and `Driver::hasBarrier()`
- Write: `startBarrier()` setting `barrier_` state
- Read: `hasBarrier()` (via `isDraining()`) checking barrier state
- These accesses happen concurrently from different driver threads.

**Race condition #2**: Between `Driver::dropInput()` and `Driver::shouldDropOutput()`
- Write: `dropInput()` modifying `barrier_.dropInputOpId` (called from a different driver's thread via `Task::dropInputLocked()`)
- Read: `shouldDropOutput()` reading `barrier_.dropInputOpId` (called from this driver's own thread)

**Fix approach:**
1. Added atomic flag `hasBarrier_` to track whether barrier processing is active, with `memory_order_acquire` on reads and `memory_order_release` on writes.
2. Changed `dropInputOpId` from `std::optional<int32_t>` to `std::atomic_int32_t` with sentinel value `kNoDropInput = -1` for thread-safe cross-driver access.
3. Added `BarrierState::reset()` method to cleanly reset barrier state.
4. Note that `barrier_` state is only meaningful when `hasBarrier_` is true.
5. Added `waitForAllTasksToBeDeleted()` in `barrierAfterNoMoreSplits` and `MergeJoinTest.barrier` tests to ensure all driver threads complete before test iterations end.

The acquire-release memory ordering ensures proper synchronization: any thread that reads `hasBarrier_` as `true` is guaranteed to see the fully initialized `barrier_` state.
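
The fix approach can be sketched roughly as follows. `DriverSketch` and `finishBarrier()` are hypothetical, the comparison in `shouldDropOutput()` is an assumption about its semantics, and the default sequentially consistent ordering on `dropInputOpId` is a choice made for this sketch, not something the summary specifies.

```cpp
#include <atomic>
#include <cstdint>

constexpr int32_t kNoDropInput = -1;

struct BarrierState {
  // Sentinel-based atomic replaces std::optional<int32_t>.
  std::atomic_int32_t dropInputOpId{kNoDropInput};

  void reset() { dropInputOpId.store(kNoDropInput); }
};

class DriverSketch {
 public:
  void startBarrier() {
    barrier_.reset();
    // Release: publishes the barrier_ writes above to acquiring readers.
    hasBarrier_.store(true, std::memory_order_release);
  }

  bool hasBarrier() const {
    return hasBarrier_.load(std::memory_order_acquire);
  }

  // May be called from another driver's thread.
  void dropInput(int32_t operatorId) {
    barrier_.dropInputOpId.store(operatorId);
  }

  // Called from this driver's own thread.
  bool shouldDropOutput(int32_t operatorId) const {
    const int32_t drop = barrier_.dropInputOpId.load();
    return drop != kNoDropInput && operatorId < drop;  // assumed semantics
  }

  // Hypothetical: ends barrier processing.
  void finishBarrier() {
    hasBarrier_.store(false, std::memory_order_release);
    barrier_.reset();
  }

 private:
  std::atomic_bool hasBarrier_{false};
  BarrierState barrier_;
};
```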

Reviewed By: kunigami, srsuryadev

Differential Revision: D93355327

fbshipit-source-id: 5d7d3c636bef62f58daaa036089f41ea01572d3d
dan13bauer pushed a commit that referenced this pull request Mar 23, 2026
…sh (facebookincubator#16830)

Summary:
Pull Request resolved: facebookincubator#16830

## Root Cause Analysis

The production crash was a SIGSEGV at page-aligned address
`0x7fa369c00000` in `PatternStringIterator::charAt()` during
`LikeGeneric::apply()`. The root cause is that `StringView` is a
non-owning pointer — when the backing memory (likely memory-mapped file
pages from a scan operator) was reclaimed/unmapped under memory pressure
(1.96GB peak spill), the pointer became dangling.

This is a known class of bugs in Velox — the DWRF FlatMap writer had an
identical issue (`TestFlatMapDanglingStringViewKeyOnRehash`), where
`StringView` keys in an F14 map dangled after the input vector's buffer
was freed.

**Why the buffer can be freed during `apply()`:**
- Memory arbitration can be triggered by any pool allocation (e.g.,
  `context.ensureWritable()`)
- The arbitrator reclaims from OTHER operators/tasks by spilling them
- The current operator is protected by `NonReclaimableSectionGuard`, but
  operators that produced the input vectors are NOT
- When those operators' memory is reclaimed, the mmap'd pages backing
  string data can be munmap'd
- `DecodedVector` stores raw pointers (NOT shared_ptr) to the base
  vector's data, so it doesn't prevent reclamation

## Fix

Copy the pattern string to a local `std::string` before passing it to
`determinePatternKind()`, eliminating the dependency on the original
buffer's lifetime. The performance cost is minimal: one heap allocation
per row in the already-slow non-constant pattern path.
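
A minimal sketch of the defensive-copy idea. `PatternKind`, `classify`, and `classifyWithCopy` are hypothetical and heavily simplified; they are not the real `determinePatternKind()` logic.

```cpp
#include <string>
#include <string_view>

// Hypothetical, reduced set of pattern kinds.
enum class PatternKind { kExact, kPrefix, kGeneric };

// Hypothetical classifier that reads the pattern bytes.
inline PatternKind classify(std::string_view pattern) {
  const auto firstWildcard = pattern.find_first_of("%_");
  if (firstWildcard == std::string_view::npos) {
    return PatternKind::kExact;
  }
  if (firstWildcard == pattern.size() - 1 && pattern.back() == '%') {
    return PatternKind::kPrefix;
  }
  return PatternKind::kGeneric;
}

inline PatternKind classifyWithCopy(std::string_view pattern) {
  // Owning copy: all reads below go to memory this function controls, so
  // they no longer depend on the lifetime of the caller's buffer.
  const std::string owned(pattern);
  return classify(owned);
}
```

The copy costs one allocation for patterns beyond the small-string capacity, which matches the trade-off described above.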

## Tests

- `likeGenericWithLongPatterns`: Exercises the full `LikeGeneric::apply`
  code path with patterns >12 bytes (non-inline StringView) across all
  optimized pattern kinds (fixed, prefix, suffix, substring, generic).

- `likePatternCopyProtectsAgainstDanglingPointer`: Uses `mmap`/`munmap`
  to precisely reproduce the production crash scenario where
  memory-mapped pages are unmapped. Verifies the defensive copy protects
  against the dangling pointer. Death test guarded with
  `#ifndef RE2_BUILDING_WITH_SAN` following the established Velox pattern
  from `ThreadDebugInfoDeathTest`.

```
W0317 10:45:17.338729   962 [MemoryCheckerTh] PeriodicMemoryChecker.cpp:171] System used memory 98.92GB exceeded limit: 98.00GB
I0317 10:45:17.338799   962 [MemoryCheckerTh] AsyncDataCache.cpp:883] Try to shrink cache to free up 8.92GB  memory
I0317 10:45:18.426632   962 [MemoryCheckerTh] AsyncDataCache.cpp:912] Freed 8.92GB cache memory, spent 1.09s
AsyncDataCache:
Cache size: 47.83GB tinySize: 104.37MB large size: 47.73GB
Cache entries: 631619 read pins: 9 write pins: 0 pinned shared: 3.76MB pinned exclusive: 0B
 num write wait: 2214463 empty entries: 24640747
Cache access miss: 1015380044 hit: 689290647 hit bytes: 297.70TB eviction: 1014733085 savable eviction: 41203463 eviction checks: 210822364904 aged out: 15340 stales: 0
Prefetch entries: 66813 bytes: 299.18MB
Alloc Megaclocks 187071714
Allocated pages: 17191373 cached pages: 12511075
Backing: Memory Allocator[MMAP total capacity 79.00GB free capacity 13.42GB allocated pages 17191373 mapped pages 18370872 external mapped pages 4642968
[size 1: 80775(315MB) allocated 126234 mapped]
[size 2: 81815(639MB) allocated 168151 mapped]
[size 4: 53466(835MB) allocated 85150 mapped]
[size 8: 25931(810MB) allocated 41224 mapped]
[size 16: 23254(1453MB) allocated 31144 mapped]
[size 32: 15357(1919MB) allocated 19535 mapped]
[size 64: 28495(7123MB) allocated 34663 mapped]
[size 128: 10600(5300MB) allocated 10600 mapped]
[size 256: 30620(30620MB) allocated 30845 mapped]
]
SSD: Ssd cache IO: Write 18.22TB read 4.24TB Size 1.44TB Occupied 1.37TB 3745K entries (max 9765K).
GroupStats: <dummy FileGroupStats>
I0317 10:45:18.555488   962 [MemoryCheckerTh] PeriodicMemoryChecker.cpp:228] Memory pushback shrunk 8.92GB Effective bytes shrunk: 9.01GB
I0317 10:45:18.669054  1527 BcAdaptiveTokenManager.cpp:976] BcAdaptiveTokenManager[RX]: AIMD adjustment - UNDERUTILIZED (1.624 GB/s -> 1.65 GB/s)
I0317 10:45:19.350598  1457 BcAdaptiveTokenManager.cpp:974] BcAdaptiveTokenManager[TX]: AIMD adjustment - UNDERUTILIZED (2.619 GB/s -> 2.619 GB/s)
E0317 10:45:23.854575 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=123, timeout=60000ms, path /v1/task/20260317_174241_33921_32fmu.2.0.156.0/results/79
E0317 10:45:23.855425 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=219, timeout=60000ms, path /v1/task/20260317_174241_33921_32fmu.3.0.156.0/results/79
E0317 10:45:27.578008 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=2569, timeout=60000ms, path /v1/task/20260317_174245_33922_32fmu.2.0.29.0/results/7
I0317 10:45:28.113592   954 [clean_old_tasks] TaskManager.cpp:1003] cleanOldTasks: Cleaned 66 old task(s) in 0 ms
E0317 10:45:28.130775   954 [clean_old_tasks] TaskManager.cpp:313] There are 1 zombie Task that satisfy cleanup conditions but could not be cleaned up, because the Task are referenced by more than 1 owners. RUNNING[0] FINISHED[0] CANCELED[0] ABORTED[1] FAILED[0]  Sample task IDs (shows only 20 IDs):
E0317 10:45:28.130795   954 [clean_old_tasks] TaskManager.cpp:323] Zombie Task [1/1]: Extra Refs: 1, 20260317_172201_33407_32fmu.1.0.16.0
I0317 10:45:32.531476   951 [report_spill_st] PeriodicStatsReporter.cpp:264] Spill memory usage: current[0B] peak[2.24GB]
E0317 10:45:36.031675 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=295, timeout=60000ms, path /v1/task/20260317_174249_33924_32fmu.4.0.15.0/results/30
E0317 10:45:36.049696 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=179, timeout=60000ms, path /v1/task/20260317_174249_33924_32fmu.3.0.15.0/results/30
E0317 10:45:36.054131 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=59, timeout=60000ms, path /v1/task/20260317_174249_33924_32fmu.2.0.15.0/results/30
E0317 10:45:36.055419 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=207, timeout=60000ms, path /v1/task/20260317_174249_33924_32fmu.5.0.15.0/results/30
E0317 10:45:36.084476 843003 [ExchangeCPU3305] PrestoExchangeSource.cpp:550] Abort results failed: proxygen::HTTPException: ingress timeout, streamID=123, timeout=60000ms, path /v1/task/20260317_174249_33924_32fmu.9.0.15.0/results/30
I0317 10:45:36.100535 842798 [HTTPSrvCpu24857] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.20.0.12.0
E0317 10:45:36.100556 842798 [HTTPSrvCpu24857] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.100762 842710 [HTTPSrvCpu24854] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.24.0.94.0
E0317 10:45:36.100787 842710 [HTTPSrvCpu24854] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.100847 842786 [HTTPSrvCpu24855] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.21.0.12.0
E0317 10:45:36.100879 842786 [HTTPSrvCpu24855] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.101078 842800 [HTTPSrvCpu24857] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.23.0.12.0
E0317 10:45:36.101096 842800 [HTTPSrvCpu24857] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.101325 842816 [HTTPSrvCpu24858] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.22.0.12.0
E0317 10:45:36.101351 842816 [HTTPSrvCpu24858] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.101361 842408 [HTTPSrvCpu24850] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.25.0.81.0
E0317 10:45:36.101398 842408 [HTTPSrvCpu24850] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.101961 843007 [HTTPSrvCpu24859] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.27.0.12.0
E0317 10:45:36.101987 843007 [HTTPSrvCpu24859] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.102123 842791 [HTTPSrvCpu24856] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.29.0.94.0
E0317 10:45:36.102156 842791 [HTTPSrvCpu24856] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.102209 843009 [HTTPSrvCpu24859] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.28.0.12.0
E0317 10:45:36.102236 843009 [HTTPSrvCpu24859] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.107154 843015 [HTTPSrvCpu24860] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.16.0.1.0
E0317 10:45:36.107189 843015 [HTTPSrvCpu24860] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.107321 843014 [HTTPSrvCpu24859] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.18.0.12.0
E0317 10:45:36.107344 843014 [HTTPSrvCpu24859] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.108820 842799 [HTTPSrvCpu24857] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.8.0.12.0
E0317 10:45:36.108840 842799 [HTTPSrvCpu24857] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.109087 843036 [HTTPSrvCpu24862] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.10.0.84.0
E0317 10:45:36.109143 843036 [HTTPSrvCpu24862] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.109302 842807 [HTTPSrvCpu24858] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.9.0.94.0
E0317 10:45:36.109320 842807 [HTTPSrvCpu24858] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.109992 843029 [HTTPSrvCpu24861] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.12.0.12.0
E0317 10:45:36.110028 843029 [HTTPSrvCpu24861] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.110075 842709 [HTTPSrvCpu24854] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.13.0.12.0
E0317 10:45:36.110100 842709 [HTTPSrvCpu24854] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.110234 843032 [HTTPSrvCpu24861] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.14.0.94.0
E0317 10:45:36.110255 843032 [HTTPSrvCpu24861] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.271823 843028 [HTTPSrvCpu24861] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.15.0.84.0
I0317 10:45:36.271857 843022 [HTTPSrvCpu24860] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.7.0.12.0
I0317 10:45:36.271853 843027 [HTTPSrvCpu24861] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.2.0.12.0
E0317 10:45:36.271932 843027 [HTTPSrvCpu24861] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
E0317 10:45:36.272019 843022 [HTTPSrvCpu24860] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.271862 843033 [HTTPSrvCpu24861] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.1.0.12.0
I0317 10:45:36.271858 842793 [HTTPSrvCpu24856] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.5.0.12.0
E0317 10:45:36.272382 843033 [HTTPSrvCpu24861] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.271858 843018 [HTTPSrvCpu24860] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.3.0.12.0
E0317 10:45:36.272401 842793 [HTTPSrvCpu24856] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
E0317 10:45:36.271872 843028 [HTTPSrvCpu24861] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
E0317 10:45:36.272423 843018 [HTTPSrvCpu24860] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:36.271854 843019 [HTTPSrvCpu24860] TaskManager.cpp:877] Deleting task 20260317_174453_33962_32fmu.6.0.12.0
E0317 10:45:36.272475 843019 [HTTPSrvCpu24860] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:38.938845   964 [Announcement] PeriodicServiceInventoryManager.cpp:130] Announcement succeeded: HTTP 202. State: active.
I0317 10:45:41.508436 842050 [HTTPSrvCpu24849] TaskManager.cpp:877] Deleting task 20260317_174452_33961_32fmu.3.0.93.0
E0317 10:45:41.508466 842050 [HTTPSrvCpu24849] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
I0317 10:45:41.510092 843020 [HTTPSrvCpu24860] TaskManager.cpp:877] Deleting task 20260317_174452_33961_32fmu.1.0.109.0
E0317 10:45:41.510118 843020 [HTTPSrvCpu24860] Exceptions.h:53] Line: fbcode/velox/exec/Task.cpp:2468, Function:terminate, Expression:  Aborted for external error, Source: RUNTIME, ErrorCode: INVALID_STATE
 *** Aborted at 1773769541 (Unix time, try 'date -d 1773769541') ***
 *** Signal 11 (SIGSEGV) (0x7facb7e00000) received by PID 113 (pthread TID 0x7faeb1fff000) (linux TID 838941) (code: address not mapped to object), stack trace: ***
    @ 0000000017d3365a folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*) [clone .__uniq.302291754384189453301783370447166124111]
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:552
    @ 0000000017d335c7 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*) [clone .__uniq.302291754384189453301783370447166124111] [clone .llvm.3532004345868697328]
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:573
    @ 000000000004455f (unknown)
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
    @ 0000000047329fde facebook::velox::functions::determinePatternKind(std::basic_string_view<char, std::char_traits<char> >, std::optional<char>)
                       ./fbcode/velox/functions/lib/Re2Functions.cpp:1988
    @ 0000000047329d4f facebook::velox::functions::(anonymous namespace)::LikeGeneric::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const::{lambda(facebook::velox::StringView const&, facebook::velox::StringView const&, std::optional<char> const&)#2}::operator()(facebook::velox::StringView const&, facebook::velox::StringView const&, std::optional<char> const&) const [clone .__uniq.96824254187906811421847846720006205206]
                       ./fbcode/velox/functions/lib/Re2Functions.cpp:947
    @ 0000000015cbed20 _ZNK8facebook5velox17SelectivityVector15applyToSelectedIZNS0_4exec7EvalCtx22applyToSelectedNoThrowIZNKS0_9functions12_GLOBAL__N_111LikeGeneric5applyERKS1_RSt6vectorISt10shared_ptrINS0_10BaseVectorEESaISE_EERKSC_IKNS0_4TypeEERS4_RSE_EUlT_E_ZNS4_22applyToSelectedNoThrowISQ_EEvSA_SP_EUlSP_E_EEvSA_SP_T0_EUlSP_E_EEvSP_.__uniq.96824254187906811421847846720006205206
                       ./fbcode/velox/functions/lib/Re2Functions.cpp:1010
    @ 0000000047102acd facebook::velox::functions::(anonymous namespace)::LikeGeneric::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const [clone .__uniq.96824254187906811421847846720006205206]
                       fbcode/velox/expression/EvalCtx.h:299
    @ 0000000046c466b0 facebook::velox::exec::Expr::applyFunction(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
                       ./fbcode/velox/expression/Expr.cpp:1604
    @ 0000000046c457fa facebook::velox::exec::Expr::evalWithNulls(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
                       ./fbcode/velox/expression/Expr.cpp:1519
    @ 0000000046c8e7d0 facebook::velox::exec::ConjunctExpr::evalSpecialForm(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
                       fbcode/velox/expression/Expr.cpp:1149
    @ 0000000046c4572d facebook::velox::exec::Expr::evalWithNulls(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
                       ./fbcode/velox/expression/Expr.cpp:1646
    @ 0000000046c41399 facebook::velox::exec::Expr::eval(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::exec::ExprSet const*)
                       ./fbcode/velox/expression/Expr.cpp:1149
    @ 0000000046c40015 facebook::velox::exec::ExprSet::eval(int, int, bool, facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
                       ./fbcode/velox/expression/Expr.cpp:2064
    @ 0000000046dda7e2 facebook::velox::exec::FilterProject::getOutput()
                       ./fbcode/velox/exec/FilterProject.cpp:282
    @ 0000000046c5f19d facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
                       ./fbcode/velox/exec/Driver.cpp:486
    @ 0000000046cfed97 facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>)
                       ./fbcode/velox/exec/Driver.cpp:802
    @ 0000000046cfdb98 folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)
                       fbcode/velox/exec/Driver.cpp:281
    @ 00000000162fd84d void std::__invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&>(std::__invoke_memfun_deref, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&)
                       fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:74
    @ 00000000000df5b4 execute_native_thread_routine
                       /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:82
    @ 000000000009abc8 start_thread
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
    @ 000000000012ce4b __clone3
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```

Reviewed By: spershin

Differential Revision: D97223914

fbshipit-source-id: f334cac3f7ed3885d951fab857717d2ab370812b
dan13bauer added a commit that referenced this pull request Apr 17, 2026
PR facebookincubator#16037 review comment #2: endpoints_ does not need a mutex because
all access paths (assocEndpointRef, listenerCallback, removeEndpointRef)
run on the single Communicator thread. Document this invariant on the
member variable.

Remove endpoints_.size() from stop() log message since stop() runs on
an external thread — reading a non-thread-safe container from outside
the owning thread is a data race. Also fix typo workQueue_._size().
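
One way to make such a single-thread invariant checkable in debug builds is sketched below. `CommunicatorSketch` is hypothetical, and it assumes the owning thread is the one that constructs the object, which may not match the real Communicator.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <unordered_map>

class CommunicatorSketch {
 public:
  CommunicatorSketch() : owner_(std::this_thread::get_id()) {}

  void addEndpoint(int id) {
    assertOnOwner();
    ++endpoints_[id];
  }

  void removeEndpoint(int id) {
    assertOnOwner();
    endpoints_.erase(id);
  }

  std::size_t endpointCount() const {
    assertOnOwner();
    return endpoints_.size();
  }

 private:
  // Fails in debug builds when called from any thread other than the owner.
  void assertOnOwner() const {
    assert(std::this_thread::get_id() == owner_);
  }

  const std::thread::id owner_;
  // No mutex: only accessed on the owner_ thread.
  std::unordered_map<int, int> endpoints_;
};
```

A method that runs on an external thread, such as `stop()`, would then simply not call any of these accessors.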
4 participants