The file_reader class provides efficient, buffered access to file data with support for byte-level and line-level
reading. It's optimized for reading large files while maintaining low memory overhead by using a fixed-size internal
buffer (1MB by default).
- Overview
- Core Features
- Basic Usage
- Advanced Reading Methods
- Block Views and Validity
- Iterator Interface
- File Information
- Shallow Copy Optimization
- Complete Examples
- Concepts and Traits
file_reader is a C++26 streaming file reader designed for efficient, sequential access to file data. Unlike in-memory
readers (e.g., buffer_reader), it reads data in configurable chunks (default: 1MB) to handle arbitrarily large files
without loading everything into memory at once.
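The chunked strategy can be illustrated with the standard library alone. The following is a minimal sketch, not fil's implementation. `scan_in_chunks` and `chunk_stats` are hypothetical names, and the chunk size is a parameter here rather than being fixed at READER_BUFFER_SIZE. Memory use is bounded by the single reusable buffer, whatever the size of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names; not part of fil.
struct chunk_stats {
    std::size_t loads = 0;        // chunk reads that returned at least one byte
    std::size_t total_bytes = 0;  // bytes seen across all chunks
};

// Reads `in` through one fixed-size buffer, so the payload held in memory
// never exceeds chunk_size bytes, however large the input is.
chunk_stats scan_in_chunks(std::istream& in, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<char> buffer(chunk_size);
    chunk_stats stats;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const auto got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // nothing left to read
        ++stats.loads;
        stats.total_bytes += got;
        // Process buffer[0, got) here; it is overwritten on the next iteration.
    }
    return stats;
}
```

For example, a 10-byte stream scanned with a chunk size of 4 takes three loads (4, 4 and 2 bytes).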
- Buffered Reading: Uses a 1MB buffer by default for efficient I/O operations
- Lazy Loading: Data is only loaded when needed
- Cursor Management: Maintains position tracking for random access to loaded data
- Shallow Copy Support: Enables efficient backtracking in parsing operations
- View-Based API: Returns block_view objects that safely reference buffer data
- Line-by-line iteration with a standard C++ iterator interface
- Random line access via read_line(line_number)
- Byte-level sequential access with next_byte() and previous_byte()
- Predicate-based reading with read_until(predicate)
- File metadata (size, path, existence checks)
- Load tracking to detect when buffer reloads occur
- Shallow copy support for cheap state saving and backtracking, which enables efficient integration with the
Copa parser
#include <fil/file/file_reader.hh>
#include <filesystem>
#include <iostream>
int main() {
// Create a file reader from a file path
fil::file_reader reader(std::filesystem::path("large_file.txt"));
// Check that the file exists
if (!reader.exists()) {
std::cerr << "File not found!" << std::endl;
return 1;
}
return 0;
}
The most common usage pattern is iterating through the lines of a file:
#include <fil/file/file_reader.hh>
#include <iostream>
int main() {
fil::file_reader reader("data.txt");
// Iterate through all lines
for (auto line_iter = reader.make_line_iterator(); line_iter != reader.end(); ++line_iter) {
auto line = *line_iter;
// Check if the line is still valid (important for large files with buffer reloads)
if (line.is_valid()) {
std::cout << "Line " << line_iter.line() << ": " << line.get() << std::endl;
}
}
return 0;
}
Note: Always check is_valid() before using a block_view from file_reader, as the underlying buffer may be
reloaded.
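One way to picture this validity rule is as a generation check. In this minimal sketch, the buffer owner increments a load counter on every reload, and each view records the counter value at the moment it was created. `toy_buffer` and its nested `view` are hypothetical and are not fil's block_view. is_valid() compares the two values, and get() returns an empty view on a mismatch, which matches the documented behavior. The sketch covers only the reload case, not detection of a destroyed reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical buffer owner; not fil's implementation.
class toy_buffer {
public:
    struct view {
        const toy_buffer* owner;  // buffer the view points into
        std::size_t generation;   // owner's load_counter when the view was made
        std::string_view data;

        // Valid only while no reload has happened since the view was created.
        bool is_valid() const { return owner->load_counter() == generation; }
        // Empty when invalid, so stale bytes are never exposed.
        std::string_view get() const { return is_valid() ? data : std::string_view{}; }
    };

    // Replaces the buffer contents; every outstanding view becomes stale.
    void load(std::string chunk) {
        data_ = std::move(chunk);
        ++load_counter_;
    }

    std::size_t load_counter() const { return load_counter_; }

    view slice(std::size_t pos, std::size_t len) const {
        assert(pos <= data_.size());
        return view{this, load_counter_, std::string_view(data_).substr(pos, len)};
    }

private:
    std::string data_;
    std::size_t load_counter_ = 0;
};
```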
For sequential byte-by-byte reading (useful in parsing):
#include <fil/file/file_reader.hh>
#include <cstdint>
#include <iostream>
#include <optional>
int main() {
fil::file_reader reader("input.bin");
std::optional<std::uint8_t> byte;
while ((byte = reader.next_byte()).has_value()) {
std::cout << static_cast<char>(*byte);
}
return 0;
}
Access any line in the file directly (performs a seek to that line):
#include <fil/file/file_reader.hh>
#include <iostream>
int main() {
fil::file_reader reader("data.txt");
// Jump directly to line 42
auto line_42 = reader.read_line(42);
if (line_42.is_valid()) {
std::cout << "Line 42: " << line_42.get() << std::endl;
}
return 0;
}
Performance Note: read_line() performs file seeks, which makes it slower than the iterator interface for sequential
access. When iterating over lines, use the iterator interface.
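Seeks are costly for this access pattern because, without an index of line offsets, reaching line N means starting over and skipping N - 1 newlines. Each call therefore costs time proportional to the line's position. The sketch below shows that approach with a `std::istream`. `read_line_by_scan` is a hypothetical name, the 1-based numbering mirrors the `make_line_iterator(1)` example, and fil's own strategy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: random line access without an index of line offsets.
// Every call seeks back to the start and skips line_number - 1 lines, so the
// cost grows with how far into the stream the requested line is.
std::optional<std::string> read_line_by_scan(std::istream& in, std::size_t line_number) {
    assert(line_number >= 1);  // 1-based, like make_line_iterator(1)
    in.clear();                // drop eof/fail flags left by an earlier call
    in.seekg(0);               // the "seek": restart from the beginning
    std::string line;
    for (std::size_t current = 1; std::getline(in, line); ++current) {
        if (current == line_number) return line;
    }
    return std::nullopt;       // fewer lines than requested
}
```

Reading lines 1 through N this way rescans the stream N times, whereas a single pass with an iterator reads it once.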
Read data from the current position until a predicate condition is met (receiving accumulated std::string_view):
#include <fil/file/file_reader.hh>
#include <iostream>
#include <string_view>
int main() {
fil::file_reader reader("data.txt");
// Read until we find a closing brace
auto block = reader.read_until([](std::string_view accumulated) {
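// Guard against an empty view: back() on an empty string_view is undefined behavior
return !accumulated.empty() && accumulated.back() == '}';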
});
if (!block.get().empty()) {
std::cout << "Block: " << block.get() << std::endl;
}
return 0;
}
Read data until a predicate on individual characters is satisfied:
#include <fil/file/file_reader.hh>
#include <iostream>
int main() {
fil::file_reader reader("data.txt");
// Read until we encounter a newline
auto line = reader.read_until([](char c) {
return c == '\n';
});
if (!line.get().empty()) {
std::cout << "Line: " << line.get() << std::endl;
}
return 0;
}
Both read_until variants support a minimum_size parameter to control when buffer reloading occurs:
// Read until condition, but only reload if less than 500 bytes remain
auto block = reader.read_until(
[](std::string_view sv) { return sv.find(';') != std::string_view::npos; },
500 // minimum_size parameter
);
A block_view is a safe wrapper around a std::string_view that points to data in the file_reader's internal buffer.
Since the buffer can be reloaded during reading, views may become invalid.
struct block_view {
/**
* @return true if the referenced data is still valid in the buffer
*
* Returns false if:
* - The buffer has been reloaded (load_counter has changed)
* - The reader has been destroyed
*/
[[nodiscard]] bool is_valid() const;
/**
* @return string_view to the block data, or empty if invalid
*
* Always check is_valid() before calling get()
*/
[[nodiscard]] std::string_view get() const;
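};
Because views can go stale, copy any data you need to keep into an owning std::string as soon as you read it:
#include <fil/file/file_reader.hh>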
#include <vector>
#include <string>
int main() {
fil::file_reader reader("large_file.txt");
std::vector<std::string> lines;
// ✓ CORRECT: Copy the data immediately
for (auto line_iter = reader.make_line_iterator(); line_iter != reader.end(); ++line_iter) {
auto line = *line_iter;
if (line.is_valid()) {
lines.push_back(std::string(line.get())); // Copy the view to a string
}
}
// ✗ WRONG: Storing the view directly (will become invalid after buffer reload)
// std::vector<fil::file_reader::block_view> bad_lines;
// for (auto line_iter = reader.make_line_iterator(); ...) {
// bad_lines.push_back(*line_iter); // Dangerous!
// }
return 0;
}
The file_reader::line_iterator provides standard C++ iterator semantics for line-based iteration:
#include <fil/file/file_reader.hh>
#include <iostream>
int main() {
fil::file_reader reader("data.txt");
// Create an iterator starting at line 1
auto begin = reader.make_line_iterator(1);
auto end = reader.end();
// Iterate like any other iterator pair, counting lines as we go
int line_count = 0;
for (auto it = begin; it != end; ++it) {
++line_count;
std::cout << "Line " << it.line() << std::endl;
}
std::cout << "Total: " << line_count << " lines" << std::endl;
return 0;
}
The basic line_iterator operations:
// Create iterator at specific line
auto iter = reader.make_line_iterator(10);
// Get current line data
auto block = *iter;
auto line_num = iter.line();
// Advance to next line
++iter;
// Check for end
if (iter != reader.end()) {
// More lines to read
}
Use file_reader_line for convenient range-based for loops:
#include <fil/file/file_reader.hh>
#include <iostream>
int main() {
fil::file_reader reader("data.txt");
// Range-based iteration starting from line 1
for (const auto& line : fil::file_reader_line(reader, 1)) {
if (line.is_valid()) {
std::cout << line.get() << std::endl;
}
}
// Start from line 100
for (const auto& line : fil::file_reader_line(reader, 100)) {
if (line.is_valid()) {
std::cout << line.get() << std::endl;
}
}
return 0;
}
The file_reader provides access to file metadata:
#include <fil/file/file_reader.hh>
#include <iostream>
int main() {
fil::file_reader reader("data.txt");
// File path
auto path = reader.get_path();
std::cout << "File: " << path << std::endl;
// File size in bytes
std::size_t file_size = reader.size();
std::cout << "Size: " << file_size << " bytes" << std::endl;
// Check existence
if (reader.exists()) {
std::cout << "File exists" << std::endl;
}
// Current position in file stream
auto file_pos = reader.get_file_cursor();
// Current position in buffer
std::size_t buffer_pos = reader.get_buffer_cursor();
// Equivalent: reader_cursor() returns buffer position
std::size_t cursor = reader.reader_cursor();
// Number of buffer reloads that have occurred
std::size_t loads = reader.load_counter();
std::cout << "Buffer loaded " << loads << " times" << std::endl;
return 0;
}
When parsing with backtracking (e.g., using or_rule in Copa), the parser needs to save reader state. Without shallow
copy optimization, this would require copying the entire file buffer, which is extremely inefficient for large files.
The file_reader implements shallow_copy to:
- Create a new file stream positioned at the same location
- Copy only the cursor position, not the buffer data
- Maintain reference to the same buffer accessor
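The cost difference between a deep copy and a shallow copy can be sketched with a shared, immutable buffer. Here `toy_cursor_reader` is a hypothetical in-memory stand-in. Copying it copies a `std::shared_ptr` and a cursor, never the buffer bytes, and `restore()` implements backtracking by resetting the cursor. Unlike fil's shallow_copy as described above, it does not open a new file stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical in-memory reader; not fil's file_reader.
class toy_cursor_reader {
public:
    explicit toy_cursor_reader(std::string data)
        : buffer_(std::make_shared<std::string>(std::move(data))) {}

    // Shallow copy: shares the buffer (reference count +1) and copies the cursor.
    toy_cursor_reader shallow_copy() const { return *this; }

    // Backtrack to a state saved earlier from the same buffer.
    void restore(const toy_cursor_reader& saved) {
        assert(buffer_ == saved.buffer_);
        cursor_ = saved.cursor_;
    }

    std::optional<std::uint8_t> next_byte() {
        if (cursor_ >= buffer_->size()) return std::nullopt;
        return static_cast<std::uint8_t>((*buffer_)[cursor_++]);
    }

    std::size_t reader_cursor() const { return cursor_; }

    bool shares_buffer_with(const toy_cursor_reader& other) const {
        return buffer_ == other.buffer_;
    }

private:
    std::shared_ptr<const std::string> buffer_;
    std::size_t cursor_ = 0;
};
```

A parser would take `shallow_copy()` before trying an alternative and call `restore()` if that alternative fails.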
The file_reader satisfies the fil::meta::bytes_reader concept:
static_assert(fil::meta::bytes_reader<fil::file_reader>);
Required interface:
- next_byte() → std::optional<std::uint8_t>
- previous_byte() → std::optional<std::uint8_t>
- peek() → std::optional<std::uint8_t>
- reader_cursor() → std::size_t
The file_reader satisfies the fil::meta::line_reader concept:
static_assert(fil::meta::line_reader<fil::file_reader>);
This allows seamless integration with parsing libraries that require sequential line access.
The default buffer size is 1MB (READER_BUFFER_SIZE = 1024 * 1024 + 1 bytes). This size works well for most use cases:
- Files larger than the buffer: multiple buffer loads, but memory use stays bounded
- Files smaller than the buffer: the entire file fits in a single load
- Line-heavy parsing: a good balance between the number of I/O operations and memory use
- Buffer reloads happen automatically when reading beyond the current buffer
- Previous unread data is preserved (cursor adjusted on reload)
- File stream position is carefully managed to minimize seeks
- Line iteration requires scanning for newline characters
- Consider byte-level reading for non-line-oriented formats
- Load counter helps detect when buffer has been reloaded
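The reload behavior in the list above (unread data preserved, cursor adjusted, load counter incremented) can be sketched as a sliding window. `toy_window` is a hypothetical type, not fil's buffer. On reload it moves the unread tail to the front with `std::memmove`, fills the remaining space from the stream, resets the cursor to 0, and increments `load_counter`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical sliding-window buffer; not fil's implementation.
struct toy_window {
    std::vector<char> buffer;
    std::size_t size = 0;          // bytes currently held
    std::size_t cursor = 0;        // index of the next unread byte
    std::size_t load_counter = 0;  // incremented on every reload

    explicit toy_window(std::size_t capacity) : buffer(capacity) { assert(capacity > 0); }

    // Keeps unread bytes, refills the rest, and returns how many fresh bytes arrived.
    std::size_t reload(std::istream& in) {
        assert(cursor <= size && size <= buffer.size());
        const std::size_t unread = size - cursor;
        // Source and destination may overlap, so memmove rather than memcpy.
        std::memmove(buffer.data(), buffer.data() + cursor, unread);
        in.read(buffer.data() + unread, static_cast<std::streamsize>(buffer.size() - unread));
        const auto fresh = static_cast<std::size_t>(in.gcount());
        size = unread + fresh;
        cursor = 0;                // the old cursor position now sits at index 0
        ++load_counter;
        return fresh;
    }
};
```

With a 4-byte window over "abcdefgh", consuming two bytes and reloading leaves "cdef" in the buffer.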
Q: Why is my block_view empty?
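- get() returns an empty view once the block is no longer valid, which happens when the buffer has been reloaded.
Check is_valid() before using get(), and copy any data you need to keep.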
Q: Can I access a line at the end of a large file efficiently?
- Use read_line(line_number) for random access, but be aware that it performs file seeks.
Q: Is byte-level reading efficient for large files?
- Yes, it uses the same buffering mechanism as line reading. Buffer reloads are transparent.