Commit d4a4544

[ML] Fix flaky CIoManagerTest/testFileIoGood on linux-x86_64 (#3017)
The reader hits a premature EOF at 8192 bytes (two filesystem pages) while the remaining ~1.8KB of data hasn't been flushed from the kernel buffer yet. With MAX_EOF_RETRIES=10 and 40ms sleeps (400ms total), the reader gives up before the flush completes on loaded CI agents.

Increase MAX_EOF_RETRIES from 10 to 50 (2 seconds total patience), which should be ample time for the kernel to flush a few KB of data.

Fixes #2890

Made-with: Cursor
1 parent 4d4abf7 commit d4a4544

1 file changed

Lines changed: 27 additions & 1 deletion

lib/test/CThreadDataReader.cc

@@ -65,7 +65,22 @@ void CThreadDataReader::run() {
 
     static const std::streamsize BUF_SIZE{512};
     char buffer[BUF_SIZE];
-    while (strm.good()) {
+
+    // For regular files the reader can open the file while the writer
+    // is still flushing, hit a premature EOF, and stop too early.
+    // After hitting EOF we clear the stream state and retry a limited
+    // number of times, sleeping between each attempt. Any successful
+    // read resets the counter so we only give up after the writer has
+    // truly finished.
+    //
+    // 50 retries * 40ms sleep = 2 seconds total patience after the last
+    // successful read. The previous value of 10 (~400ms) was too low
+    // for loaded CI agents where kernel buffer flushing can be delayed.
+    // See https://github.com/elastic/ml-cpp/issues/2890.
+    static const std::size_t MAX_EOF_RETRIES{50};
+    std::size_t eofRetries{0};
+
+    for (;;) {
         if (m_Shutdown) {
             return;
         }
@@ -75,6 +90,7 @@ void CThreadDataReader::run() {
             return;
         }
         if (strm.gcount() > 0) {
+            eofRetries = 0;
             core::CScopedLock lock(m_Mutex);
             // This code deals with the test character we write to
             // detect the short-lived connection problem on Windows
@@ -88,6 +104,16 @@ void CThreadDataReader::run() {
                 m_Data.append(copyFrom, copyLen);
             }
         }
+        if (strm.eof()) {
+            if (strm.gcount() == 0) {
+                ++eofRetries;
+                if (eofRetries > MAX_EOF_RETRIES) {
+                    break;
+                }
+                std::this_thread::sleep_for(std::chrono::milliseconds(m_SleepTimeMs));
+            }
+            strm.clear();
+        }
     }
 }
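The retry-on-EOF logic in this change can be read in isolation as the sketch below. It is a minimal standalone approximation, not the ml-cpp code: the function name `readWithEofRetries` and its parameters are hypothetical, and the shutdown flag, mutex, error logging and Windows test-character handling from `CThreadDataReader::run()` are deliberately left out.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <thread>

// Hypothetical helper (not part of ml-cpp). Reads a regular file that a
// writer may still be appending to. It gives up only after more than
// maxEofRetries consecutive empty reads at EOF, sleeping between attempts.
std::string readWithEofRetries(const std::string& path,
                               std::size_t maxEofRetries,
                               std::chrono::milliseconds sleepTime) {
    std::string data;
    std::ifstream strm(path, std::ios::binary);
    if (!strm.is_open()) {
        return data; // nothing to read; the real test retries the open instead
    }

    constexpr std::streamsize BUF_SIZE{512};
    char buffer[BUF_SIZE];
    std::size_t eofRetries{0};

    for (;;) {
        strm.read(buffer, BUF_SIZE);
        if (strm.bad()) {
            break; // unrecoverable I/O error
        }
        if (strm.gcount() > 0) {
            // Any successful read resets the patience counter.
            eofRetries = 0;
            data.append(buffer, static_cast<std::size_t>(strm.gcount()));
        }
        if (strm.eof()) {
            if (strm.gcount() == 0) {
                // Empty read at EOF: the writer may not have caught up yet.
                ++eofRetries;
                if (eofRetries > maxEofRetries) {
                    break;
                }
                std::this_thread::sleep_for(sleepTime);
            }
            // Clear eofbit/failbit so the next read() is attempted.
            strm.clear();
        }
    }
    return data;
}
```

Called as `readWithEofRetries(path, 50, std::chrono::milliseconds(40))`, the loop sleeps at most 50 times after the last successful read, which matches the 2 seconds of patience described in the commit message. A short final read (non-zero `gcount()` with `eofbit` set) is not counted as a retry; only empty reads are.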
