Batch processing for multiple images with parallel execution support. Manages a collection of ImageContext instances and provides efficient batch operations including filtering, transformations, statistics, and ML dataset conversion.
Module: image_toolkit.batch_handler
Import: from image_toolkit.batch_handler import BatchImageHandler
| Method | Purpose | Returns |
|---|---|---|
__init__(paths) |
Initialize with image paths | None |
from_directory(dir_path, pattern) |
Create batch from directory | BatchImageHandler |
from_glob(pattern) |
Create batch from glob pattern | BatchImageHandler |
filter_valid(parallel) |
Remove invalid images | self |
filter_by_size(min_width, max_width, ...) |
Filter by dimensions | self |
filter_by_aspect_ratio(min_ratio, max_ratio) |
Filter by aspect ratio | self |
filter_by_file_size(min_size, max_size) |
Filter by file size | self |
analyze_duplicates(hash_method, threshold) |
Analyze duplicate groups | dict |
filter_duplicates(hash_method, threshold) |
Remove duplicates | self |
sample(n, random_sample) |
Sample n images | self |
map(func, parallel) |
Apply custom function | self |
apply_transform(transform_name, **kwargs) |
Apply ImageHandler method | self |
chain_transforms(transforms) |
Apply multiple transforms | self |
resize(width, height) |
Resize all images | self |
adjust(brightness, contrast) |
Adjust brightness/contrast | self |
to_grayscale(keep_2d) |
Convert to grayscale | self |
save(output_dir, prefix, suffix) |
Save all images | self |
get_batch_stats() |
Get aggregated statistics | dict |
detect_outliers(metric, threshold) |
Detect outlier images | List[Path] |
verify_uniformity(check_size, ...) |
Check batch uniformity | dict |
to_dataset(format, normalized) |
Convert to ML dataset | array/tensor |
split_dataset(train_ratio, val_ratio, ...) |
Split into train/val/test | dict |
create_grid(rows, cols, cell_size) |
Create grid visualization | Image |
unload() |
Free memory | self |
process_in_chunks(chunk_size, func) |
Process in chunks | self |
Legend:
- Methods returning
selfsupport chaining - See individual method documentation for detailed parameters
class BatchImageHandler:
"""
Batch processing for multiple images with parallel execution support.
Manages a collection of ImageContext instances and provides efficient batch operations
including filtering, transformations, statistics, and ML dataset conversion.
"""
def __init__(self, paths: List[Union[str, Path]]):
"""Initialize BatchImageHandler with list of image paths."""
...Parameters:
paths: List of paths to image files
Attributes:
_contexts: List of ImageContext instances_errors: List of (path, exception) tuples encountered during processing_progress_callback: Optional progress callback function_executor: ParallelExecutor for parallel processing
Initialize BatchImageHandler with list of image paths.
Parameters:
paths: List of paths to image files
Example:
paths = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]
batch = BatchImageHandler(paths)Create batch from all images in a directory.
Parameters:
dir_path: Directory pathpattern: File pattern (e.g., ".jpg", ".png", "jpg") (default:"*")
Returns: BatchImageHandler instance
Example:
batch = BatchImageHandler.from_directory("photos/", "*.jpg")Create batch from glob pattern.
Parameters:
pattern: Glob pattern (e.g., "photos/**/*.jpg")
Returns: BatchImageHandler instance
Example:
batch = BatchImageHandler.from_glob("photos/**/*.jpg")Remove corrupted/invalid images from the batch.
Parameters:
parallel: Use parallel processing (default:True)
Returns: self (supports chaining)
Example:
batch.filter_valid()filter_by_size(min_width: Optional[int] = None, max_width: Optional[int] = None, min_height: Optional[int] = None, max_height: Optional[int] = None) -> BatchImageHandler
Filter images by dimensions.
Parameters:
min_width: Minimum width in pixels (optional)max_width: Maximum width in pixels (optional)min_height: Minimum height in pixels (optional)max_height: Maximum height in pixels (optional)
Returns: self (supports chaining)
Example:
batch.filter_by_size(min_width=800, max_width=2000, min_height=600)filter_by_aspect_ratio(min_ratio: Optional[float] = None, max_ratio: Optional[float] = None) -> BatchImageHandler
Filter by aspect ratio (width/height).
Parameters:
min_ratio: Minimum aspect ratio (optional)max_ratio: Maximum aspect ratio (optional)
Returns: self (supports chaining)
Example:
batch.filter_by_aspect_ratio(min_ratio=1.3)filter_by_file_size(min_size: Optional[Union[int, str]] = None, max_size: Optional[Union[int, str]] = None) -> BatchImageHandler
Filter images by file size on disk.
Parameters:
min_size: Minimum file size (int: bytes, str: "50KB", "1.5MB", "2GB") (optional)max_size: Maximum file size (same format as min_size) (optional)
Returns: self (supports chaining)
Raises:
ValueError: If size format is invalid
Example:
batch.filter_by_file_size(min_size='50KB', max_size='5MB')analyze_duplicates(hash_method: str = 'dhash', hash_size: int = 8, threshold: Optional[int] = None, parallel: bool = True) -> Dict[str, Any]
Analyze images to identify duplicate groups and singleton images.
Parameters:
hash_method: Hashing algorithm ('dhash', 'phash', 'ahash', 'whash') (default:'dhash')hash_size: Hash size (8 = 64-bit, 16 = 256-bit) (default:8)threshold: Hamming distance threshold (None = use recommended default) (optional)parallel: Use parallel processing (default:True)
Returns: dict containing duplicate_groups, singleton_groups, hash_map, and stats
Example:
analysis = batch.analyze_duplicates(hash_method='dhash', threshold=5)filter_duplicates(hash_method: str = 'dhash', hash_size: int = 8, threshold: Optional[int] = None, keep: str = 'first', parallel: bool = True) -> BatchImageHandler
Filter duplicate using perceptual hashing and keep one from each group.
Parameters:
hash_method: Hashing algorithm ('dhash', 'phash', 'ahash', 'whash') (default:'dhash')hash_size: Hash size (8 = 64-bit, 16 = 256-bit) (default:8)threshold: Maximum Hamming distance to consider as duplicate (optional)keep: Which duplicate to keep ('first', 'last', 'largest', 'smallest') (default:'first')parallel: Use parallel processing (default:True)
Returns: self (supports chaining)
Example:
batch.filter_duplicates(hash_method='dhash', threshold=8)Sample n images from the batch.
Parameters:
n: Number of images to samplerandom_sample: If True, random sampling; if False, takes first n (default:True)
Returns: self (supports chaining)
Example:
batch.sample(50, random_sample=True)map(func: Callable[[ImageHandler], ImageHandler], parallel: bool = True, workers: Optional[int] = None) -> BatchImageHandler
Apply custom function to each image.
Parameters:
func: Function that takes ImageHandler and returns ImageHandlerparallel: Use parallel processing (default:True)workers: Number of worker threads (auto if None) (optional)
Returns: self (supports chaining)
Example:
def my_pipeline(img):
img.resize_aspect(width=512)
return img
batch.map(my_pipeline, parallel=True)apply_transform(transform_name: str, parallel: bool = True, workers: Optional[int] = None, **kwargs) -> BatchImageHandler
Apply any ImageHandler method to all images.
Parameters:
transform_name: Name of the ImageHandler methodparallel: Use parallel processing (default:True)workers: Number of worker threads (optional)**kwargs: Arguments to pass to the method
Returns: self (supports chaining)
Example:
batch.apply_transform('resize_aspect', width=800)Apply multiple transformations in sequence.
Parameters:
transforms: List of (method_name, kwargs) tuplesparallel: Use parallel processing (default:True)
Returns: self (supports chaining)
Example:
transforms = [
('resize_aspect', {'width': 800}),
('adjust', {'brightness': 1.2})
]
batch.chain_transforms(transforms)resize(width: Optional[int] = None, height: Optional[int] = None, parallel: bool = True, **kwargs) -> BatchImageHandler
Resize all images.
Parameters:
width: Target width (optional)height: Target height (optional)parallel: Use parallel processing (default:True)**kwargs: Additional arguments for resize_aspect
Returns: self (supports chaining)
Example:
batch.resize(width=800)Adjust brightness/contrast for all images.
Parameters:
brightness: Brightness factor (default:1.0)contrast: Contrast factor (default:1.0)parallel: Use parallel processing (default:True)
Returns: self (supports chaining)
Example:
batch.adjust(brightness=1.2, contrast=1.1)Convert all images to grayscale.
Parameters:
keep_2d: Keep single channel mode (default:False)parallel: Use parallel processing (default:True)
Returns: self (supports chaining)
Example:
batch.to_grayscale(keep_2d=False)save(output_dir: Union[str, Path], prefix: str = "", suffix: str = "", quality: int = 95, overwrite: bool = False) -> BatchImageHandler
Save all processed images.
Parameters:
output_dir: Output directory pathprefix: Prefix to add to filenames (default:"")suffix: Suffix to add to filenames (before extension) (default:"")quality: JPEG quality (1-100) (default:95)overwrite: If True, overwrites existing files (default:False)
Returns: self (supports chaining)
Example:
batch.save("output/", prefix="processed_", suffix="_800w")Aggregate statistics across all images.
Returns: dict with aggregated statistics
Example:
stats = batch.get_batch_stats()Detect outlier images based on statistics.
Parameters:
metric: Metric to use ('size', 'width', 'height', 'aspect_ratio') (default:'size')threshold: Number of standard deviations (default:2.0)
Returns: List[Path] - List of paths to outlier images
Example:
outliers = batch.detect_outliers(metric='size', threshold=2.5)verify_uniformity(check_size: bool = True, check_format: bool = True, check_mode: bool = True, check_aspect_ratio: bool = False, aspect_tolerance: float = 0.01) -> Dict[str, Any]
Verify uniformity of images in the batch.
Parameters:
check_size: Check if all images have the same dimensions (default:True)check_format: Check if all images have the same format (default:True)check_mode: Check if all images have the same color mode (default:True)check_aspect_ratio: Check if all images have similar aspect ratios (default:False)aspect_tolerance: Tolerance for aspect ratio comparison (default:0.01)
Returns: dict containing uniformity report
Example:
report = batch.verify_uniformity(check_size=True, check_format=True)Plot distribution of image properties.
Parameters:
save_path: If provided, saves plot to this path instead of showing (optional)
Example:
batch.visualize_distribution(save_path="distribution.png")Convert batch to ML dataset format.
Parameters:
format: Output format ('torch', 'numpy', or 'list') (default:'torch')normalized: Scale to [0.0, 1.0] (default:True)channels_first: Return (C, H, W) instead of (H, W, C) (default:True)
Returns: Stacked tensor/array or list depending on format
Example:
dataset = batch.to_dataset(format='torch', normalized=True)split_dataset(train_ratio: float = 0.8, val_ratio: float = 0.1, test_ratio: float = 0.1, shuffle: bool = True, random_seed: Optional[int] = None) -> Dict[str, BatchImageHandler]
Split images into train/validation/test sets.
Parameters:
train_ratio: Proportion of data for training (0.0-1.0) (default:0.8)val_ratio: Proportion of data for validation (0.0-1.0) (default:0.1)test_ratio: Proportion of data for testing (0.0-1.0) (default:0.1)shuffle: Whether to shuffle before splitting (default:True)random_seed: Random seed for reproducibility (optional)
Returns: dict with keys 'train', 'val', 'test' containing BatchImageHandler instances
Raises:
ValueError: If ratios don't sum to 1.0 or are invalid
Example:
splits = batch.split_dataset(train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)create_grid(rows: int, cols: int, cell_size: Tuple[int, int] = (200, 200), padding: int = 5, background_color: Tuple[int, int, int] = (255, 255, 255)) -> Image.Image
Create a grid visualization of images.
Parameters:
rows: Number of rowscols: Number of columnscell_size: (width, height) of each cell (default:(200, 200))padding: Padding between cells in pixels (default:5)background_color: RGB color for background (default:(255, 255, 255))
Returns: Image.Image - Single PIL Image containing the grid
Example:
grid = batch.create_grid(4, 4, cell_size=(200, 200))Free memory for all loaded images.
Returns: self (supports chaining)
Example:
batch.unload()process_in_chunks(chunk_size: int, func: Callable[[ImageHandler], ImageHandler], parallel: bool = True) -> BatchImageHandler
Process large batches in smaller chunks to manage memory.
Parameters:
chunk_size: Number of images per chunkfunc: Processing function (accepts and returns ImageHandler)parallel: Use parallel processing within chunks (default:True)
Returns: self (supports chaining)
Example:
def transform(img):
img.resize_aspect(width=800)
return img
batch.process_in_chunks(100, transform)Set custom progress callback.
Parameters:
callback: Function(current, total) called during processing
Returns: self (supports chaining)
Example:
def my_progress(current, total):
print(f"Progress: {current}/{total}")
batch.set_progress_callback(my_progress)Get list of errors encountered during processing.
Returns: List[Tuple[Path, Exception]] - List of (path, exception) tuples
Example:
errors = batch.get_errors()Clear error log.
Returns: self (supports chaining)
Example:
batch.clear_errors()Create an independent copy of this BatchImageHandler.
Parameters:
deep: If True, deep copy all loaded image pixel data (default:True)
Returns: BatchImageHandler - New BatchImageHandler instance with copied state
Example:
copy1 = original.copy(deep=True)Returns the number of images in the batch.
Returns: int - Number of images
Example:
print(len(batch))Access images by index or slice.
Parameters:
key: Integer index or slice object
Returns: For integer: ImageHandler; For slice: BatchImageHandler
Raises:
IndexError: If index is out of rangeTypeError: If key is not int or slice
Example:
img = batch[0]
first_10 = batch[0:10]Iterate over images in the batch.
Yields: ImageHandler instances for each image
Example:
for img in batch:
print(img.path)String representation.
Returns: str - String representation
Example:
print(batch)Context manager support.
Returns: self
Example:
with BatchImageHandler.from_directory("photos/") as batch:
batch.resize(width=800)Context manager cleanup.
Example:
with batch:
batch.resize(width=800)- BatchImageHandler Usage Guide - Learn patterns, workflows, and best practices
- ImageHandler API Reference - Single image operations reference
- Module Documentation - Complete module reference
ImageHandler- Single image processing interfaceParallelExecutor- Parallel execution engineImageContext- Core image context object