File Reader Documentation

The file_reader class provides efficient, buffered access to file data with support for byte-level and line-level reading. It's optimized for reading large files while maintaining low memory overhead by using a fixed-size internal buffer (1MB by default).

Table of Contents

Overview
Core Features
Basic Usage
Advanced Reading Methods
- Reading Specific Lines
- Reading Until a Condition
Block Views and Validity
Iterator Interface
File Information
Shallow Copy Optimization
Complete Examples
Concepts and Traits

Overview

file_reader is a C++26 streaming file reader designed for efficient, sequential access to file data. Unlike in-memory readers (e.g., buffer_reader), it reads data in configurable chunks (default: 1MB) to handle arbitrarily large files without loading everything into memory at once.

Key Design Principles

Buffered Reading: Uses a 1MB buffer by default for efficient I/O operations
Lazy Loading: Data is only loaded when needed
Cursor Management: Maintains position tracking for random access to loaded data
Shallow Copy Support: Enables efficient backtracking in parsing operations
View-Based API: Returns block_view objects that safely reference buffer data
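
The buffered, lazy-loading idea above can be sketched with nothing but the standard library. This is a minimal illustration, not the file_reader implementation: the names chunked_source, load_next, chunk, and count_bytes are hypothetical, and the sketch reads from any std::istream rather than a file path.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical illustration type; not part of the fil library.
class chunked_source {
public:
    chunked_source(std::istream& in, std::size_t chunk_size)
        : in_(in), buffer_(chunk_size) {
        assert(chunk_size > 0);
    }

    // Loads the next chunk into the fixed-size buffer.
    // Returns false once the stream has no more bytes.
    bool load_next() {
        in_.read(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
        filled_ = static_cast<std::size_t>(in_.gcount());
        if (filled_ == 0) return false;
        ++loads_;
        return true;
    }

    // View of the bytes currently held; the next load_next() overwrites them.
    std::string_view chunk() const { return {buffer_.data(), filled_}; }

    // Number of successful loads so far.
    std::size_t loads() const { return loads_; }

private:
    std::istream& in_;
    std::vector<char> buffer_;
    std::size_t filled_ = 0;
    std::size_t loads_ = 0;
};

// Counts the bytes in a stream while holding at most chunk_size bytes in memory.
inline std::size_t count_bytes(std::istream& in, std::size_t chunk_size) {
    chunked_source source(in, chunk_size);
    std::size_t total = 0;
    while (source.load_next()) total += source.chunk().size();
    return total;
}

// Convenience overload for in-memory text, handy for experiments.
inline std::size_t count_bytes(const std::string& text, std::size_t chunk_size) {
    std::istringstream in(text);
    return count_bytes(in, chunk_size);
}
```

Memory use is bounded by chunk_size regardless of the input length, which is the property the 1MB default buffer is meant to provide.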

Core Features

Line-by-line iteration with standard C++ iterator interface
Random line access via read_line(line_number)
Byte-level sequential access with next_byte() and previous_byte()
Predicate-based reading with read_until(predicate)
File metadata (size, path, existence checks)
Load tracking to detect when buffer reloads occur
Shallow copy support for cheap state saving and backtracking (for efficient integration with the Copa parser)

Basic Usage

Creating a File Reader

#include <fil/file/file_reader.hh>
#include <filesystem>
#include <iostream>

int main() {
    // Create a file reader from a file path
    fil::file_reader reader(std::filesystem::path("large_file.txt"));

    // Check if the file exists and is readable
    if (!reader.exists()) {
        std::cerr << "File not found!" << std::endl;
        return 1;
    }

    return 0;
}

Line-by-Line Iteration

The most common usage pattern is iterating through lines in a file:

#include <fil/file/file_reader.hh>
#include <iostream>

int main() {
    fil::file_reader reader("data.txt");

    // Iterate through all lines
    for (auto line_iter = reader.make_line_iterator(); line_iter != reader.end(); ++line_iter) {
        auto line = *line_iter;
        
        // Check if the line is still valid (important for large files with buffer reloads)
        if (line.is_valid()) {
            std::cout << "Line " << line_iter.line() << ": " << line.get() << std::endl;
        }
    }

    return 0;
}

Note: Always check is_valid() before using a block_view from file_reader, as the underlying buffer may be reloaded.

Byte-Level Reading

For sequential byte-by-byte reading (useful in parsing):

#include <fil/file/file_reader.hh>
#include <cstdint>
#include <iostream>
#include <optional>

int main() {
    fil::file_reader reader("input.bin");

    std::optional<std::uint8_t> byte;
    while ((byte = reader.next_byte()).has_value()) {
        std::cout << static_cast<char>(*byte);
    }

    return 0;
}

Reading Specific Lines

Access any line in the file directly (performs a seek to that line):

#include <fil/file/file_reader.hh>
#include <iostream>

int main() {
    fil::file_reader reader("data.txt");

    // Jump directly to line 42
    auto line_42 = reader.read_line(42);
    
    if (line_42.is_valid()) {
        std::cout << "Line 42: " << line_42.get() << std::endl;
    }

    return 0;
}

Performance Note: read_line() performs file seeks, making it slower for sequential access. For iteration, use the iterator interface.
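
To see why random line access tends to cost more than iteration, consider one plausible way to locate a line in a plain stream. The helper below is hypothetical (nth_line is not a library function) and assumes 1-based line numbers; it rewinds and rescans on every call, whereas the actual read_line may seek more cleverly.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper, not the library's read_line: returns the 1-based line
// `line_number`, or std::nullopt when the stream has fewer lines.
inline std::optional<std::string> nth_line(std::istream& in, std::size_t line_number) {
    assert(line_number >= 1);
    in.clear();                  // drop any EOF/fail state left by an earlier scan
    in.seekg(0, std::ios::beg);  // every lookup starts over from the beginning
    std::string line;
    for (std::size_t current = 1; std::getline(in, line); ++current) {
        if (current == line_number) return line;
    }
    return std::nullopt;
}

// Convenience wrapper for in-memory text.
inline std::optional<std::string> nth_line_of(const std::string& text, std::size_t line_number) {
    std::istringstream in(text);
    return nth_line(in, line_number);
}
```

Fetching lines 1 through N this way rescans the start of the input N times, while a single forward pass touches each byte once.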

Reading Until a Condition (String-based Predicate)

Read data from the current position until a predicate returns true. The predicate receives the text accumulated so far as a std::string_view:

#include <fil/file/file_reader.hh>
#include <iostream>
#include <string_view>

int main() {
    fil::file_reader reader("data.txt");

    // Read until we find a closing brace
    auto block = reader.read_until([](std::string_view accumulated) {
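        // Guard against an empty view: back() on an empty string_view is undefined
        return !accumulated.empty() && accumulated.back() == '}';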
    });

    if (!block.get().empty()) {
        std::cout << "Block: " << block.get() << std::endl;
    }

    return 0;
}

Reading Until a Condition (Char-based Predicate)

Read data until a predicate on individual characters is satisfied:

#include <fil/file/file_reader.hh>
#include <iostream>

int main() {
    fil::file_reader reader("data.txt");

    // Read until we encounter a newline
    auto line = reader.read_until([](char c) {
        return c == '\n';
    });

    if (!line.get().empty()) {
        std::cout << "Line: " << line.get() << std::endl;
    }

    return 0;
}

Custom Minimum Buffer Size

Both read_until variants support a minimum_size parameter to control when buffer reloading occurs:

// Read until the condition holds, but only reload if fewer than 500 bytes remain
auto block = reader.read_until(
    [](std::string_view sv) { return sv.find(";") != std::string_view::npos; },
    500  // minimum_size parameter
);

Block Views and Validity

Understanding `block_view`

A block_view is a safe wrapper around a std::string_view that points to data in the file_reader's internal buffer. Since the buffer can be reloaded during reading, views may become invalid.

struct block_view {
    /**
     * @return true if the referenced data is still valid in the buffer
     * 
     * Returns false if:
     * - The buffer has been reloaded (load_counter has changed)
     * - The reader has been destroyed
     */
    [[nodiscard]] bool is_valid() const;

    /**
     * @return string_view to the block data, or empty if invalid
     * 
     * Always check is_valid() before calling get()
     */
    [[nodiscard]] std::string_view get() const;
};
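
The load-counter check described in the comments can be sketched as a generation counter shared between a buffer and its views. The types toy_buffer and toy_view are hypothetical stand-ins, and the sketch covers only the reload case: detecting a destroyed reader would need extra machinery (for example a weak reference) that is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a reader's reloadable buffer; not the fil types.
class toy_buffer {
public:
    // Replaces the buffer contents and bumps the generation counter.
    void reload(std::string data) {
        data_ = std::move(data);
        ++generation_;
    }
    std::size_t generation() const { return generation_; }
    std::string_view data() const { return data_; }

private:
    std::string data_;
    std::size_t generation_ = 0;
};

// A view remembers the generation it was created in.
class toy_view {
public:
    toy_view(const toy_buffer& owner, std::size_t offset, std::size_t length)
        : owner_(&owner), generation_(owner.generation()),
          offset_(offset), length_(length) {
        assert(offset + length <= owner.data().size());
    }

    // Valid only while the owner has not reloaded since the view was made.
    bool is_valid() const { return owner_->generation() == generation_; }

    // Returns the referenced text, or an empty view once stale.
    std::string_view get() const {
        return is_valid() ? owner_->data().substr(offset_, length_)
                          : std::string_view{};
    }

private:
    const toy_buffer* owner_;
    std::size_t generation_;
    std::size_t offset_;
    std::size_t length_;
};
```

The check is a single integer comparison, so validating a view before every get() costs very little compared with the I/O that triggers a reload.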

Best Practices with `block_view`

#include <fil/file/file_reader.hh>
#include <vector>
#include <string>

int main() {
    fil::file_reader reader("large_file.txt");
    std::vector<std::string> lines;

    // ✓ CORRECT: Copy the data immediately
    for (auto line_iter = reader.make_line_iterator(); line_iter != reader.end(); ++line_iter) {
        auto line = *line_iter;
        if (line.is_valid()) {
            lines.push_back(std::string(line.get()));  // Copy the view to a string
        }
    }

    // ✗ WRONG: Storing the view directly (will become invalid after buffer reload)
    // std::vector<fil::file_reader::block_view> bad_lines;
    // for (auto line_iter = reader.make_line_iterator(); ...) {
    //     bad_lines.push_back(*line_iter);  // Dangerous!
    // }

    return 0;
}

Iterator Interface

Line Iterator

The file_reader::line_iterator provides standard C++ iterator semantics for line-based iteration:

#include <fil/file/file_reader.hh>
#include <iostream>

int main() {
    fil::file_reader reader("data.txt");

    // Create an iterator starting at line 1
    auto begin = reader.make_line_iterator(1);
    auto end = reader.end();

    // Walk the range with an explicit loop, counting lines along the way
    int line_count = 0;
    for (auto it = begin; it != end; ++it) {
        ++line_count;
        std::cout << "Line " << it.line() << std::endl;
    }

    return 0;
}

Iterator Operations

// Create iterator at specific line
auto iter = reader.make_line_iterator(10);

// Get current line data
auto block = *iter;
auto line_num = iter.line();

// Advance to next line
++iter;

// Check for end
if (iter != reader.end()) {
    // More lines to read
}

Range-Based Iteration

Use file_reader_line for convenient range-based for loops:

#include <fil/file/file_reader.hh>
#include <iostream>

int main() {
    fil::file_reader reader("data.txt");

    // Range-based iteration starting from line 1
    for (const auto& line : fil::file_reader_line(reader, 1)) {
        if (line.is_valid()) {
            std::cout << line.get() << std::endl;
        }
    }

    // Start from line 100
    for (const auto& line : fil::file_reader_line(reader, 100)) {
        if (line.is_valid()) {
            std::cout << line.get() << std::endl;
        }
    }

    return 0;
}

File Information

The file_reader provides access to file metadata:

#include <fil/file/file_reader.hh>
#include <cstddef>
#include <iostream>

int main() {
    fil::file_reader reader("data.txt");

    // File path
    auto path = reader.get_path();
    std::cout << "File: " << path << std::endl;

    // File size in bytes
    std::size_t file_size = reader.size();
    std::cout << "Size: " << file_size << " bytes" << std::endl;

    // Check existence
    if (reader.exists()) {
        std::cout << "File exists" << std::endl;
    }

    // Current position in file stream
    auto file_pos = reader.get_file_cursor();
    
    // Current position in buffer
    std::size_t buffer_pos = reader.get_buffer_cursor();
    
    // Equivalent: reader_cursor() returns buffer position
    std::size_t cursor = reader.reader_cursor();

    // Number of buffer reloads that have occurred
    std::size_t loads = reader.load_counter();
    std::cout << "Buffer loaded " << loads << " times" << std::endl;

    return 0;
}

Shallow Copy Optimization

Why Shallow Copy Matters

When parsing with backtracking (e.g., using or_rule in Copa), the parser needs to save reader state. Without shallow copy optimization, this would require copying the entire file buffer, which is extremely inefficient for large files.

The file_reader implements shallow_copy to:

Create a new file stream positioned at the same location
Copy only the cursor position, not the buffer data
Maintain reference to the same buffer accessor
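
The cursor-only copy can be illustrated with a shared buffer and a per-copy position. This is a sketch under stated assumptions: cursor_reader and try_match are hypothetical names, the buffer is an in-memory string held by std::shared_ptr, and the file-stream repositioning that the real shallow_copy performs is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical reader: the buffer is shared, only the cursor is per-copy.
class cursor_reader {
public:
    explicit cursor_reader(std::string data)
        : buffer_(std::make_shared<const std::string>(std::move(data))) {}

    std::optional<char> next_byte() {
        if (cursor_ >= buffer_->size()) return std::nullopt;
        return (*buffer_)[cursor_++];
    }

    // Copies the shared_ptr and the cursor; the buffer bytes are not duplicated.
    cursor_reader shallow_copy() const { return *this; }

    std::size_t cursor() const { return cursor_; }
    bool shares_buffer_with(const cursor_reader& other) const {
        return buffer_ == other.buffer_;
    }

private:
    std::shared_ptr<const std::string> buffer_;
    std::size_t cursor_ = 0;
};

// Tries to consume `literal`; on mismatch the reader returns to its saved state,
// which is the backtracking step an alternative-style parser rule relies on.
inline bool try_match(cursor_reader& reader, std::string_view literal) {
    const cursor_reader saved = reader.shallow_copy();
    for (char expected : literal) {
        const std::optional<char> got = reader.next_byte();
        if (!got || *got != expected) {
            reader = saved;  // backtrack: restores the cursor only
            return false;
        }
    }
    return true;
}
```

Saving and restoring state here costs one reference-count update and one integer copy, independent of the buffer size.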

Concepts and Traits

Bytes Reader Concept

The file_reader satisfies the fil::meta::bytes_reader concept:

static_assert(fil::meta::bytes_reader<fil::file_reader>);

Required interface:

next_byte() → std::optional<std::uint8_t>
previous_byte() → std::optional<std::uint8_t>
peek() → std::optional<std::uint8_t>
reader_cursor() → std::size_t

Line Reader Concept

The file_reader satisfies the fil::meta::line_reader concept:

static_assert(fil::meta::line_reader<fil::file_reader>);

This allows seamless integration with parsing libraries that require sequential line access.

Performance Considerations

Buffer Size

The default buffer size is 1MB (READER_BUFFER_SIZE = 1024 * 1024 + 1, that is, 1 MiB plus one byte). This default suits most use cases:

Larger files: Multiple buffer loads, but memory-efficient
Smaller files: An entire file fits in one load
Line-heavy parsing: Good balance between I/O operations and memory

I/O Strategy

Buffer reloads happen automatically when reading beyond the current buffer
Previous unread data is preserved (cursor adjusted on reload)
File stream position is carefully managed to minimize seeks
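
The second point, preserving unread data and adjusting the cursor, can be sketched as a compact-then-refill step. The struct reload_state and the functions reload and unread_view are hypothetical names for illustration; the library's internal layout is not documented here and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>  // std::istringstream is convenient for trying this out
#include <string_view>
#include <vector>

// Hypothetical buffer state; field names are illustrative, not the library's.
struct reload_state {
    std::vector<char> buffer;  // fixed capacity, sized once up front
    std::size_t filled = 0;    // number of valid bytes in buffer
    std::size_t cursor = 0;    // next unread position
    std::size_t loads = 0;     // how many reloads have happened
};

// Moves the unread tail to the front, then tops the buffer up from the stream.
// Returns the number of fresh bytes read.
inline std::size_t reload(reload_state& s, std::istream& in) {
    assert(!s.buffer.empty());
    assert(s.cursor <= s.filled && s.filled <= s.buffer.size());
    const std::size_t unread = s.filled - s.cursor;
    std::memmove(s.buffer.data(), s.buffer.data() + s.cursor, unread);
    s.cursor = 0;  // cursor adjusted: unread data now starts at index 0
    in.read(s.buffer.data() + unread,
            static_cast<std::streamsize>(s.buffer.size() - unread));
    const std::size_t fresh = static_cast<std::size_t>(in.gcount());
    s.filled = unread + fresh;
    ++s.loads;
    return fresh;
}

// The bytes that have been loaded but not yet consumed.
inline std::string_view unread_view(const reload_state& s) {
    return {s.buffer.data() + s.cursor, s.filled - s.cursor};
}
```

Any view taken before reload() points at bytes that have since moved or been overwritten, which is why a load counter is bumped on every reload.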

Iterator Overhead

Line iteration requires scanning for newline characters
Consider byte-level reading for non-line-oriented formats
Load counter helps detect when buffer has been reloaded

Troubleshooting

Q: Why is my block_view empty?

Check is_valid() before using get(). The buffer may have been reloaded.

Q: Can I access a line at the end of a large file efficiently?

Use read_line(line_number) for random access, but be aware it performs file seeks.

Q: Is byte-level reading efficient for large files?

Yes, it uses the same buffering mechanism as line reading. Buffer reloads are transparent.
