-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Add VFS abstraction layer for cloud storage support #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…action (#10) Implement the foundational abstraction layer for file operations that will enable reading config and SQL files from cloud storage (S3, GCS, Azure, HTTP). - Define IFileProvider interface with ReadFile, FileExists, ListFiles, IsRemotePath - Implement LocalFileProvider using std::filesystem for local operations - Add PathSchemeUtils for URI scheme detection (s3://, gs://, az://, https://) - Add FileProviderFactory for creating appropriate providers based on path - Include comprehensive unit tests (17 test cases, 108 assertions) - Add design document for the VFS abstraction feature This is the first step in the VFS epic. Future tasks will add DuckDBVFSProvider for remote storage and integrate with ConfigLoader/SQLTemplateProcessor. Part of #10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jrosskopf
added a commit
that referenced
this pull request
Jan 21, 2026
Implement DuckDBVFSProvider that uses DuckDB's FileSystem API to read files from remote storage (S3, GCS, Azure, HTTP/HTTPS). This enables loading configuration and SQL files from cloud storage via httpfs extension. - Add DuckDBVFSProvider class using DuckDB's FileSystem interface - Implement ReadFile, FileExists, ListFiles for remote paths - Update FileProviderFactory to return DuckDBVFSProvider for remote URIs - Add constructor that obtains FileSystem from DatabaseManager singleton - Wrap exceptions from DatabaseManager in FileOperationError for clean API - Update tests to handle both initialized and uninitialized DatabaseManager Part of #10
#10) - Add IFileProvider dependency injection to ConfigLoader - Update loadYamlFile() to use provider for file operations - Add isRemoteConfig() method for remote path detection - Add readFile() and getFileProvider() accessor methods - Add support for remote path resolution (s3://, gs://, https://, etc.) - Add comprehensive VFS integration tests with MockFileProvider - Maintain full backward compatibility with existing local file usage
…ate support (#10) - Add getFileProvider() method to ConfigManager for VFS access - Update SQLTemplateProcessor::loadTemplateContent() to use IFileProvider - Update getFullTemplatePath() to handle remote paths (s3://, gs://, https://) - Remote template source paths are used directly (not combined with base) - Local absolute paths are preserved unchanged - Add VFS integration tests for remote path resolution
Add comprehensive path validation to prevent security vulnerabilities: - Path traversal attack prevention (blocking '..' sequences) - URL-encoded traversal detection (%2e%2e = ..) - Prefix-based access control for directory whitelisting - URL scheme whitelisting (default: file, https) - Path canonicalization with Windows/Unix separator normalization Security tests cover: - OWASP path traversal patterns - URL-encoded and double-encoded attacks - Remote path validation with scheme checking - Edge cases (empty paths, whitespace-only) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive integration tests for the VFS abstraction feature: - TestVFSLocalBaseline: Verify local file paths continue to work - TestVFSHttpServer: Test HTTP server can serve config/templates - TestVFSErrorHandling: Test error handling for unreachable URLs - TestVFSPathSecurity: Verify path traversal handling - TestVFSMixedPaths: Test mixed local/remote configurations - TestVFSS3Integration: Placeholder tests for S3 (requires LocalStack) Tests verify backward compatibility with local filesystem paths while enabling remote file access via DuckDB's VFS. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add section 2.11 "Storage Configuration (VFS)" to CONFIG_REFERENCE.md - Create comprehensive CLOUD_STORAGE_GUIDE.md for users - Add example configurations for S3, Azure, and HTTPS in examples/cloud-native/
- Fix ConfigLoader path resolution tests to use canonical() for macOS /var -> /private/var symlink handling - Fix ConfigManager endpoint test to match actual template resolution behavior
jrosskopf
added a commit
that referenced
this pull request
Jan 22, 2026
8 issues created: - flapi-1jr: P0 - Double-encoded path traversal bypass - flapi-2c5: P1 - azure:// scheme not recognized as remote - flapi-bax: P1 - Case-sensitive scheme handling mismatch - flapi-0po: P1 - Symlink escape vulnerability - flapi-b33: P2 - Unstable DuckDB internal APIs - flapi-p9h: P2 - Dangling pointer risk - flapi-96u: P3 - No file size limits in ReadFile - flapi-bpi: P3 - FileExists swallows exceptions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security fixes identified by OpenAI Codex review of VFS abstraction layer: P0 Critical: - Fix double-encoded path traversal bypass: UrlDecode now iteratively decodes up to 3 times to catch %252e%252e (double-encoded ..) attacks P1 High: - Fix azure:// not treated as remote: Added "azure" to remote_schemes set - Fix case-sensitive scheme handling: PathSchemeUtils now uses case-insensitive scheme matching to prevent S3:// vs s3:// routing inconsistencies - Fix symlink escape vulnerability: Added resolve_symlinks config option using std::filesystem::weakly_canonical for security-critical paths P2 Medium: - Document unstable DuckDB API usage: Added TODO comments for capi_internal.hpp and reinterpret_cast usage that may break on DuckDB upgrades - Document dangling pointer risk: Added TODO noting _file_system can dangle if DatabaseManager is reset P3 Low: - Add file size limits: DuckDBVFSProvider::ReadFile now enforces 10MB max to prevent memory spikes from accidentally loading large files - Fix silent exception swallowing: FileExists now logs warnings via CROW_LOG_WARNING for failed operations instead of silently returning false Tests updated to reflect new security-enhanced behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jrosskopf
added a commit
that referenced
this pull request
Jan 22, 2026
Closed 8 bug tickets identified by Codex code review of PR #12: - flapi-1jr: P0 Double-encoded path traversal bypass (fixed) - flapi-2c5: P1 azure:// not recognized as remote (fixed) - flapi-bax: P1 Case-sensitive scheme handling (fixed) - flapi-0po: P1 Symlink escape vulnerability (fixed) - flapi-b33: P2 Unstable DuckDB APIs (documented) - flapi-p9h: P2 Dangling pointer risk (documented) - flapi-96u: P3 No file size limits (fixed) - flapi-bpi: P3 FileExists exception swallowing (fixed) All fixes implemented in feature/gh-10-vfs-abstraction branch. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update conftest.py to detect universal binary and architecture-specific build directories on macOS (build/universal/, build/release-arm64/) - Update Makefile integration-test-ci target to use universal binary on macOS Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
IFileProviderinterface abstracting file operations (ReadFile, FileExists, ListFiles, IsRemotePath)LocalFileProviderusing std::filesystem for local file operationsPathSchemeUtilsfor URI scheme detection (s3://, gs://, az://, azure://, http://, https://, file://)FileProviderFactoryfor creating appropriate providers based on path schemeTest plan
Implementation Notes
This PR implements the first task (
flapi-272) of the VFS epic (flapi-gwy). The abstraction layer is designed to:Next steps (future PRs):
flapi-6f0: Implement DuckDBVFSProvider with httpfs extensionflapi-pc9: Integrate VFSAdapter into ConfigLoaderflapi-snt: Integrate VFSAdapter into SQLTemplateProcessorCloses #10 (partial - first phase of multi-phase implementation)