# S3 Mirror

S3 Mirror is a production-ready Python utility for synchronizing buckets and objects between S3-compatible endpoints. Built on boto3, it provides enterprise-grade reliability with comprehensive logging, parallelized transfers, and automation-friendly operation.
- Overview
- Key Features
- Prerequisites
- Installation
- Configuration
- Usage
- Logging
- Safety Considerations
- Continuous Integration
- Project Structure
- Contributing
- License
## Overview

**Motivation:** While MinIO's `mc` client has served as a capable mirroring tool, recent upstream changes and the deprecation of essential features raised concerns about long-term reliability and availability. S3 Mirror addresses this gap by providing:
- Complete independence from proprietary tooling ecosystems
- Foundation on boto3, the industry-standard AWS SDK for Python
- Full transparency and auditability of synchronization operations
- Universal S3 compatibility across AWS, MinIO, Ceph, Backblaze B2, Wasabi, and other providers
This tool is designed for infrastructure engineers and DevOps teams requiring dependable, scriptable S3 replication without vendor lock-in.
## Key Features

- **Multi-Endpoint Synchronization** – Mirror buckets and objects between any S3-compatible services
- **Performance Optimization** – Configurable parallelization with multipart upload support
- **True Mirroring** – Optional deletion of extraneous destination objects for exact replication
- **Flexible Configuration** – YAML/JSON config files with CLI flag overrides
- **Production Logging** – Multiple logging modes, including cron-friendly file output with silent console operation
- **Automation Ready** – Idempotent design for reliable scheduled execution
- **CI/CD Validated** – Automated linting and formatting across Python 3.10–3.13
- **Dependency Management** – Automated security updates via Dependabot
## Prerequisites

- **Python 3.10 or higher** (tested through 3.13)
- **S3 credentials**: AWS access keys or IAM credentials for both source and destination endpoints
- **Network access**: Connectivity to both S3 endpoints (including proxy/firewall configuration if required)
## Installation

Clone the repository and set up the Python environment:

```shell
git clone https://github.com/soakes/s3mirror.git
cd s3mirror

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

## Configuration

S3 Mirror uses YAML or JSON configuration files to define connection parameters and synchronization behavior. Create a config file based on the template below:
```yaml
source:
  endpoint_url: "https://s3.source.example.com"
  aws_access_key_id: "SOURCE_ACCESS_KEY"
  aws_secret_access_key: "SOURCE_SECRET_KEY"
  region_name: "us-east-1"
  verify_ssl: false

destination:
  endpoint_url: "https://s3.destination.example.com"
  aws_access_key_id: "DEST_ACCESS_KEY"
  aws_secret_access_key: "DEST_SECRET_KEY"
  region_name: "us-east-1"
  verify_ssl: false

performance:
  max_workers: 20               # Parallel transfer threads
  multipart_threshold: 8388608  # 8 MB - larger files trigger multipart upload
  multipart_chunksize: 8388608  # 8 MB - chunk size for multipart uploads
  max_concurrency: 10           # Concurrent S3 operations per thread
  max_pool_connections: 50      # HTTP connection pool size

sync:
  delete_extraneous: true       # Remove objects in destination not present in source
  exclude_buckets: []           # Bucket names to skip during mirroring
```

**Source/Destination Blocks:**

- `endpoint_url`: S3-compatible API endpoint URL
- `aws_access_key_id` / `aws_secret_access_key`: Authentication credentials
- `region_name`: AWS region identifier (required even for non-AWS endpoints)
- `verify_ssl`: SSL certificate verification (disable for self-signed certificates)
**Performance Tuning:**

- Adjust `max_workers` based on available CPU cores and network bandwidth
- Set `multipart_threshold` and `multipart_chunksize` according to typical object sizes
- Increase `max_pool_connections` for high-throughput scenarios

**Sync Behavior:**

- `delete_extraneous`: Enable true mirroring by removing destination-only objects
- `exclude_buckets`: Skip specific buckets (useful for test/temporary buckets)
## Usage

Execute a synchronization using your configuration file:

```shell
./s3mirror.py --config config.yaml
```

Common options:

```shell
# Silent mode (console shows errors only)
./s3mirror.py --config config.yaml --quiet

# Log to file with silent console (ideal for cron jobs)
./s3mirror.py --config config.yaml --log-file /var/log/s3mirror.log

# Debug mode with verbose output
./s3mirror.py --config config.yaml --debug

# Disable deletion of extraneous objects
./s3mirror.py --config config.yaml --no-delete
```

Add to your crontab for scheduled synchronization:

```shell
# Run daily at 2:00 AM with file logging
0 2 * * * /path/to/s3mirror/.venv/bin/python /path/to/s3mirror/s3mirror.py --config /path/to/config.yaml --log-file /var/log/s3mirror.log --quiet
```

## Logging

S3 Mirror provides multiple logging modes tailored to different operational contexts:
| Mode | Console Output | File Output | Use Case |
|---|---|---|---|
| Normal | Human-readable progress messages | None | Interactive execution |
| Debug | Colorized `[LEVEL]` messages with details | None | Troubleshooting |
| File Log | Errors only | Full DEBUG with timestamps | Production automation |
| Quiet | None (unless errors occur) | None | Minimal output scenarios |
**Recommendation:** For production cron jobs, use `--log-file` with `--quiet` to maintain detailed logs while preventing unnecessary console output.
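The "File Log" row of the table can be approximated with Python's standard `logging` module: a DEBUG-level file handler alongside an ERROR-only console handler. This is a sketch of the pattern, not the tool's actual implementation; the logger name and format string are illustrative:

```python
import logging


def setup_logging(log_file=None, quiet=False):
    """Full DEBUG detail to a file, errors only on the console."""
    logger = logging.getLogger("s3mirror-example")
    logger.setLevel(logging.DEBUG)

    if log_file:
        fh = logging.FileHandler(log_file)
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
        )
        logger.addHandler(fh)

    ch = logging.StreamHandler()
    # Console stays silent except for errors when quiet or file logging is on.
    ch.setLevel(logging.ERROR if (quiet or log_file) else logging.INFO)
    logger.addHandler(ch)
    return logger
```

With this shape, `logger.debug(...)` calls land in the file with timestamps while cron's mailbox only ever sees genuine errors.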
## Safety Considerations

When `delete_extraneous: true` is set, S3 Mirror removes objects from the destination that do not exist in the source. This ensures exact replication but requires careful consideration.
**Best Practices:**
- Test in non-production environments first to validate configuration
- Enable deletion only when true mirroring is required (vs. one-way copying)
- Use `exclude_buckets` to protect specific buckets from synchronization
- Review logs regularly to identify unexpected deletions or errors
- Maintain backup copies of critical data before enabling deletion
To disable deletion while still copying new/changed objects:

```shell
./s3mirror.py --config config.yaml --no-delete
```

Or set `delete_extraneous: false` in the configuration file.
## Continuous Integration

Every commit and pull request is automatically validated through GitHub Actions across Python 3.10 through 3.13:
**Linting** (`lint.yml`):

- **Pylint**: Static code analysis for code quality and standards compliance
- **Cross-version testing**: Validates compatibility across all supported Python versions

**Formatting** (`format.yml`):

- **Black**: Code formatting verification (PEP 8 conformance)
- **isort**: Import statement organization
- Consistent style enforcement across the entire codebase

**Dependabot** (`dependabot.yml`):

- Automated dependency updates for security patches and version bumps
- Weekly scanning of Python packages and GitHub Actions
- Auto-merge workflow (`dependabot-auto-merge.yml`) for patch and minor updates
The CI pipeline ensures code quality, security, and cross-version compatibility, providing confidence for production deployment.
## Project Structure

```
s3mirror/
├── .github/
│   ├── dependabot.yml                 # Dependabot configuration
│   └── workflows/
│       ├── dependabot-auto-merge.yml  # Auto-merge for dependency updates
│       ├── format.yml                 # Code formatting checks (Black, isort)
│       └── lint.yml                   # Linting workflow (Pylint)
├── .pylintrc           # Pylint configuration and standards
├── LICENSE             # MIT License
├── README.md           # This documentation
├── requirements.txt    # Python dependencies (boto3, PyYAML, etc.)
└── s3mirror.py         # Main synchronization script
```
## Contributing

Contributions are welcome and appreciated. To contribute:
- Fork the repository on GitHub
- Create a feature branch (`git checkout -b feature/your-feature`)
- Implement your changes with appropriate tests
- Ensure CI passes (run `pylint`, `black`, and `isort` locally)
- Submit a pull request with a clear description of changes
Run code quality checks before committing:
```shell
# Format code
black s3mirror.py

# Sort imports
isort s3mirror.py

# Run linter
pylint s3mirror.py
```

Areas for contribution:
- Bug fixes and reliability improvements
- Performance optimizations
- Enhanced error handling and recovery
- Documentation improvements
- Additional S3-compatible endpoint testing
Please open an issue before starting work on major features to discuss implementation approach.
## License

This project is licensed under the MIT License. See the `LICENSE` file for complete details.
Developed by Simon Oakes
Infrastructure Engineer | Open Source Contributor
© 2025