- Add FileByteType enum to represent validation strictness modes (default: Strict)
- Extend IFormFileTypeProvider.FindValidatedTypeAsync with optional `validationType` parameter
- Extend IValidator.IsValidAsync with optional `validationType` parameter
- Update FormFileTypeProvider to pass `validationType` through to the validator
- Update Validator to prefer FileByteFilter.Matches(byte[], FileByteType) when available, with fallback to IFileType.Matches(byte[])
- Keep existing behavior for all callers that do not pass `validationType` (Strict remains the default)
- Keep strict PDF validation by requiring %%EOF at the physical end of the file (existing EndsWithAnyOf variants)
- Add relaxed PDF validation mode using TailContains(1024, "%%EOF") to accept PDFs with trailing bytes after EOF
- Route strict vs. relaxed behavior through FileByteType (Strict vs. Default) within a single Pdf.cs implementation
- Improve compatibility with PDFs tolerated by common viewers while preserving strict validation as the default
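The difference between the two PDF rules can be sketched as follows. This is a minimal illustration, not the code from Pdf.cs: the `PdfEofSketch` class and its helpers are hypothetical, the enum is a stand-in for the PR's `FileByteType` using the `Strict`/`Lazy` names from the PR description, and the three strict suffixes are an assumption, since the commit text does not list the actual EndsWithAnyOf variants.

```csharp
using System;
using System.Text;

// Stand-in for the PR's FileByteType enum (assumed shape).
public enum FileByteType { Strict, Lazy }

public static class PdfEofSketch
{
    private static readonly byte[] EofMarker = Encoding.ASCII.GetBytes("%%EOF");

    // Assumed strict variants: the marker alone, or followed by one line break.
    private static readonly byte[][] StrictSuffixes =
    {
        Encoding.ASCII.GetBytes("%%EOF"),
        Encoding.ASCII.GetBytes("%%EOF\n"),
        Encoding.ASCII.GetBytes("%%EOF\r\n"),
    };

    // True when data ends with exactly the given suffix bytes.
    public static bool EndsWith(byte[] data, byte[] suffix)
    {
        if (data.Length < suffix.Length) return false;
        int offset = data.Length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++)
        {
            if (data[offset + i] != suffix[i]) return false;
        }
        return true;
    }

    // True when marker occurs completely inside the last windowSize bytes of data.
    public static bool TailContains(byte[] data, int windowSize, byte[] marker)
    {
        int start = Math.Max(0, data.Length - windowSize);
        for (int i = start; i + marker.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < marker.Length; j++)
            {
                if (data[i + j] != marker[j]) { match = false; break; }
            }
            if (match) return true;
        }
        return false;
    }

    // Only the end-of-file rule is modelled; header checks are left out of the sketch.
    public static bool EofMatches(byte[] data, FileByteType mode = FileByteType.Strict)
    {
        if (mode == FileByteType.Lazy)
        {
            return TailContains(data, 1024, EofMarker);
        }
        foreach (var suffix in StrictSuffixes)
        {
            if (EndsWith(data, suffix)) return true;
        }
        return false;
    }
}
```

A file with a few bytes of trailing data after `%%EOF` fails the strict rule but passes the relaxed one, while a marker that sits more than 1024 bytes before the end fails both.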
private readonly List<ByteCheck> _neededByteChecks = [];
private readonly List<ByteCheck[]> _oneOfEachByteChecks = [];
private readonly List<byte?[]> _anywhereByteChecks = [];
private readonly List<TailContainsCheck> _tailContainsChecks = [];
The multiple list fields are intentional. I store checks by check kind (fixed offset / one-of / anywhere / tail-window) and by validation mode.
• The “base” lists are global and always evaluated. This keeps existing formats backwards compatible, because they continue to register checks without specifying a mode.
• The Strict lists are evaluated only when Matches(..., Strict) is requested.
• The Lazy lists are evaluated only when Matches(..., Lazy) is requested.
This avoids mutating filter instances at runtime and allows a single format (e.g. PDF) to define strict and relaxed rules side-by-side without duplicating file type classes.
I’m happy to adjust this if there’s a cleaner approach (e.g. grouping the lists into a small CheckSet per mode or using a dictionary-based structure) - suggestions welcome.
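One possible shape for the "CheckSet per mode" idea mentioned above is sketched here. It is an illustration under assumptions rather than a proposal for the actual class: `CheckSet` and `ModeAwareFilterSketch` are hypothetical names, every check kind is collapsed into a simple predicate over the file bytes instead of the PR's `ByteCheck` / `TailContainsCheck` types, and the enum is a stand-in for `FileByteType`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the PR's FileByteType enum (assumed shape).
public enum FileByteType { Strict, Lazy }

// Hypothetical container: all checks that belong to one scope (base or one mode).
public sealed class CheckSet
{
    private readonly List<Func<byte[], bool>> _checks = new();

    public void Add(Func<byte[], bool> check) => _checks.Add(check);

    // An empty set passes trivially.
    public bool AllPass(byte[] data) => _checks.All(check => check(data));
}

public sealed class ModeAwareFilterSketch
{
    // Base checks are always evaluated; per-mode checks only for the requested mode.
    private readonly CheckSet _base = new();
    private readonly Dictionary<FileByteType, CheckSet> _perMode = new();

    // mode == null registers a base check, mirroring formats that never specify a mode.
    public ModeAwareFilterSketch Require(Func<byte[], bool> check, FileByteType? mode = null)
    {
        if (mode is null)
        {
            _base.Add(check);
            return this;
        }
        if (!_perMode.TryGetValue(mode.Value, out var set))
        {
            set = new CheckSet();
            _perMode[mode.Value] = set;
        }
        set.Add(check);
        return this;
    }

    public bool Matches(byte[] data, FileByteType mode = FileByteType.Strict)
        => _base.AllPass(data)
           && (!_perMode.TryGetValue(mode, out var set) || set.AllPass(data));
}
```

With this layout the filter holds one base set plus one dictionary, instead of one list per check kind and per mode, and a further mode would need no additional fields.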
- Update tests for new IsValidAsync `validationType` parameter (default: Strict)
- Add tests covering Strict mode EOF-at-end requirement
- Add tests covering Default mode TailContains(1024) behavior (trailing bytes + EOF window)
- Ensure PDFs with invalid header/EOF markers are still rejected
- Update README to document FileByteType validation modes (Strict vs Lazy) and new optional parameters
- Add usage examples for selecting validationType in IFormFileTypeProvider and Validator
- Document mode-specific magic byte configuration on FileByteFilter (optional FileByteType parameter)
- Bump MagicBytesValidator package version in csproj to reflect the new API/documentation
19b6810 to 02834a7
Summary
This PR introduces validation modes for magic-bytes checks to support strict vs. lazy ("relaxed") validation rules for certain formats (currently: PDF). The main motivation is that many real-world PDFs are still considered valid by common viewers even if the `%%EOF` marker is not the very last bytes of the file, as long as it appears close to the end.
Motivation
Our current PDF validation requires `%%EOF` to be at the physical end of the file, which rejects PDFs that are tolerated by common PDF readers. A more compatible rule is: `%%EOF` must occur somewhere within the last 1024 bytes of the file. This PR adds a mode-aware validation path to support that without duplicating the PDF file type implementation.
What changed
- Added `FileByteType` validation modes:
  - `Strict` (intended for strict validation)
  - `Lazy` (intended for relaxed / viewer-compatible validation)
- Extended public APIs to accept an optional validation mode:
  - `IFormFileTypeProvider.FindValidatedTypeAsync(..., FileByteType validationType = Strict)`
  - `IValidator.IsValidAsync(..., FileByteType validationType = Strict)`
- Updated `FileByteFilter` so magic-byte checks can be registered either:
  - globally (evaluated in every mode), or
  - per mode (`Strict` or `Lazy`)
  by using the existing fluent methods with an optional `FileByteType` parameter.
PDF behavior
- `Strict` keeps the previous behavior: the file must end with one of the supported `%%EOF` variants.
- `Lazy` allows `%%EOF` to appear anywhere within the last 1024 bytes of the file (`TailContains(1024, "%%EOF")`).
Tests
- Updated tests for the new `validationType` parameter.
- Added tests for relaxed (`Lazy`) behavior, including trailing bytes and the “last 1024 bytes” window.
Notes / Backwards compatibility
- If callers do not pass `validationType`, the default is used (intended: `Strict`).
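How the optional parameter keeps existing call sites unchanged can be sketched as below. The interface here is a hypothetical, trimmed-down stand-in: the PR elides the remaining parameters of `IsValidAsync` ("..."), so a single `Stream` parameter is assumed, and `IValidatorSketch`, `RecordingValidator`, and `Callers` are illustration-only names.

```csharp
using System.IO;
using System.Threading.Tasks;

// Stand-in for the PR's FileByteType enum (assumed shape).
public enum FileByteType { Strict, Lazy }

// Hypothetical interface with an assumed parameter list.
public interface IValidatorSketch
{
    Task<bool> IsValidAsync(Stream file, FileByteType validationType = FileByteType.Strict);
}

// Test double that records which mode a caller ended up requesting.
public sealed class RecordingValidator : IValidatorSketch
{
    public FileByteType LastMode { get; private set; }

    public Task<bool> IsValidAsync(Stream file, FileByteType validationType = FileByteType.Strict)
    {
        LastMode = validationType;
        return Task.FromResult(true);
    }
}

public static class Callers
{
    // Call site written before the change: no mode argument, so Strict is used.
    public static Task<bool> ExistingCaller(IValidatorSketch validator, Stream file)
        => validator.IsValidAsync(file);

    // Call site that opts into the relaxed rules explicitly.
    public static Task<bool> RelaxedCaller(IValidatorSketch validator, Stream file)
        => validator.IsValidAsync(file, FileByteType.Lazy);
}
```

Existing source compiles unchanged and keeps strict validation. Note that in C# the default value is baked into the caller at compile time and the method signature itself changes, so already-compiled consumers and custom implementations of the interface would likely need to be rebuilt against the new package version.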