Skip to content

Conversation

@jrosskopf
Copy link
Contributor

@jrosskopf jrosskopf commented Jan 21, 2026

Summary

  • Add IFileProvider interface abstracting file operations (ReadFile, FileExists, ListFiles, IsRemotePath)
  • Implement LocalFileProvider using std::filesystem for local file operations
  • Add PathSchemeUtils for URI scheme detection (s3://, gs://, az://, azure://, http://, https://, file://)
  • Add FileProviderFactory for creating appropriate providers based on path scheme
  • Include comprehensive unit tests (17 test cases, 108 assertions)
  • Add design document outlining the full VFS feature roadmap

Test plan

  • All 17 VFS adapter unit tests pass (108 assertions)
  • Full test suite passes (310/312 - 2 pre-existing macOS symlink failures unrelated to this PR)
  • Debug build compiles successfully
  • Manual verification: Existing functionality unaffected

Implementation Notes

This PR implements the first task (flapi-272) of the VFS epic (flapi-gwy). The abstraction layer is designed to:

  1. Maintain backward compatibility - LocalFileProvider wraps existing std::filesystem operations
  2. Enable future cloud storage - Interface designed for DuckDBVFSProvider (next task)
  3. Support URI-style paths - Scheme detection ready for s3://, gs://, az://, https://

Next steps (future PRs):

  • flapi-6f0: Implement DuckDBVFSProvider with httpfs extension
  • flapi-pc9: Integrate VFSAdapter into ConfigLoader
  • flapi-snt: Integrate VFSAdapter into SQLTemplateProcessor

Closes #10 (partial - first phase of multi-phase implementation)

…action (#10)

Implement the foundational abstraction layer for file operations that will
enable reading config and SQL files from cloud storage (S3, GCS, Azure, HTTP).

- Define IFileProvider interface with ReadFile, FileExists, ListFiles, IsRemotePath
- Implement LocalFileProvider using std::filesystem for local operations
- Add PathSchemeUtils for URI scheme detection (s3://, gs://, az://, https://)
- Add FileProviderFactory for creating appropriate providers based on path
- Include comprehensive unit tests (17 test cases, 108 assertions)
- Add design document for the VFS abstraction feature

This is the first step in the VFS epic. Future tasks will add DuckDBVFSProvider
for remote storage and integrate with ConfigLoader/SQLTemplateProcessor.

Part of #10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jrosskopf added a commit that referenced this pull request Jan 21, 2026
jrosskopf and others added 7 commits January 21, 2026 17:11
Implement DuckDBVFSProvider that uses DuckDB's FileSystem API to read files
from remote storage (S3, GCS, Azure, HTTP/HTTPS). This enables loading
configuration and SQL files from cloud storage via httpfs extension.

- Add DuckDBVFSProvider class using DuckDB's FileSystem interface
- Implement ReadFile, FileExists, ListFiles for remote paths
- Update FileProviderFactory to return DuckDBVFSProvider for remote URIs
- Add constructor that obtains FileSystem from DatabaseManager singleton
- Wrap exceptions from DatabaseManager in FileOperationError for clean API
- Update tests to handle both initialized and uninitialized DatabaseManager

Part of #10
#10)

- Add IFileProvider dependency injection to ConfigLoader
- Update loadYamlFile() to use provider for file operations
- Add isRemoteConfig() method for remote path detection
- Add readFile() and getFileProvider() accessor methods
- Add support for remote path resolution (s3://, gs://, https://, etc.)
- Add comprehensive VFS integration tests with MockFileProvider
- Maintain full backward compatibility with existing local file usage
…ate support (#10)

- Add getFileProvider() method to ConfigManager for VFS access
- Update SQLTemplateProcessor::loadTemplateContent() to use IFileProvider
- Update getFullTemplatePath() to handle remote paths (s3://, gs://, https://)
- Remote template source paths are used directly (not combined with base)
- Local absolute paths are preserved unchanged
- Add VFS integration tests for remote path resolution
Add comprehensive path validation to prevent security vulnerabilities:
- Path traversal attack prevention (blocking '..' sequences)
- URL-encoded traversal detection (%2e%2e = ..)
- Prefix-based access control for directory whitelisting
- URL scheme whitelisting (default: file, https)
- Path canonicalization with Windows/Unix separator normalization

Security tests cover:
- OWASP path traversal patterns
- URL-encoded and double-encoded attacks
- Remote path validation with scheme checking
- Edge cases (empty paths, whitespace-only)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive integration tests for the VFS abstraction feature:
- TestVFSLocalBaseline: Verify local file paths continue to work
- TestVFSHttpServer: Test HTTP server can serve config/templates
- TestVFSErrorHandling: Test error handling for unreachable URLs
- TestVFSPathSecurity: Verify path traversal handling
- TestVFSMixedPaths: Test mixed local/remote configurations
- TestVFSS3Integration: Placeholder tests for S3 (requires LocalStack)

Tests verify backward compatibility with local filesystem paths
while enabling remote file access via DuckDB's VFS.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add section 2.11 "Storage Configuration (VFS)" to CONFIG_REFERENCE.md
- Create comprehensive CLOUD_STORAGE_GUIDE.md for users
- Add example configurations for S3, Azure, and HTTPS in examples/cloud-native/
- Fix ConfigLoader path resolution tests to use canonical() for macOS
  /var -> /private/var symlink handling
- Fix ConfigManager endpoint test to match actual template resolution behavior
jrosskopf added a commit that referenced this pull request Jan 22, 2026
8 issues created:
- flapi-1jr: P0 - Double-encoded path traversal bypass
- flapi-2c5: P1 - azure:// scheme not recognized as remote
- flapi-bax: P1 - Case-sensitive scheme handling mismatch
- flapi-0po: P1 - Symlink escape vulnerability
- flapi-b33: P2 - Unstable DuckDB internal APIs
- flapi-p9h: P2 - Dangling pointer risk
- flapi-96u: P3 - No file size limits in ReadFile
- flapi-bpi: P3 - FileExists swallows exceptions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security fixes identified by OpenAI Codex review of VFS abstraction layer:

P0 Critical:
- Fix double-encoded path traversal bypass: UrlDecode now iteratively decodes
  up to 3 times to catch %252e%252e (double-encoded ..) attacks

P1 High:
- Fix azure:// not treated as remote: Added "azure" to remote_schemes set
- Fix case-sensitive scheme handling: PathSchemeUtils now uses case-insensitive
  scheme matching to prevent S3:// vs s3:// routing inconsistencies
- Fix symlink escape vulnerability: Added resolve_symlinks config option using
  std::filesystem::weakly_canonical for security-critical paths

P2 Medium:
- Document unstable DuckDB API usage: Added TODO comments for capi_internal.hpp
  and reinterpret_cast usage that may break on DuckDB upgrades
- Document dangling pointer risk: Added TODO noting _file_system can dangle
  if DatabaseManager is reset

P3 Low:
- Add file size limits: DuckDBVFSProvider::ReadFile now enforces 10MB max
  to prevent memory spikes from accidentally loading large files
- Fix silent exception swallowing: FileExists now logs warnings via
  CROW_LOG_WARNING for failed operations instead of silently returning false

Tests updated to reflect new security-enhanced behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jrosskopf added a commit that referenced this pull request Jan 22, 2026
Closed 8 bug tickets identified by Codex code review of PR #12:
- flapi-1jr: P0 Double-encoded path traversal bypass (fixed)
- flapi-2c5: P1 azure:// not recognized as remote (fixed)
- flapi-bax: P1 Case-sensitive scheme handling (fixed)
- flapi-0po: P1 Symlink escape vulnerability (fixed)
- flapi-b33: P2 Unstable DuckDB APIs (documented)
- flapi-p9h: P2 Dangling pointer risk (documented)
- flapi-96u: P3 No file size limits (fixed)
- flapi-bpi: P3 FileExists exception swallowing (fixed)

All fixes implemented in feature/gh-10-vfs-abstraction branch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update conftest.py to detect universal binary and architecture-specific
  build directories on macOS (build/universal/, build/release-arm64/)
- Update Makefile integration-test-ci target to use universal binary on macOS

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jrosskopf jrosskopf merged commit c024669 into main Jan 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Support DuckDB Virtual File System for Configuration and SQL Files

2 participants