I'd like to propose a new CsvFilter class to filter rows based on a comparison of specific column values. Currently, we don't have a feature for this kind of filtering.
The class should handle filtering on numerical and datetime values.
It should keep rows where the specified conditions are met and discard the rest. It would be great if we could apply multiple conditions simultaneously.
For datetime values, which are typically passed as strings in a CSV, it would be beneficial to allow users to specify a format string to handle various formats. While we could use dateutil.parser.parse to automatically parse dates, an explicit format passed to datetime.strptime avoids ambiguous guesses (for example, whether 01/02/2025 means January 2 or February 1) and would make processing more robust.
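To illustrate the parsing idea, here is a rough sketch of how a single cell could be converted before comparison. The helper name parse_cell and the "try number first, then date" order are my assumptions, not a settled design.

```python
from datetime import datetime


def parse_cell(raw, date_fmt=None):
    """Convert a CSV cell (always a string) into a float or a datetime.

    Hypothetical helper: numbers are tried first, then the explicit
    strptime format. A mismatch raises ValueError instead of guessing.
    """
    try:
        return float(raw)
    except ValueError:
        pass
    if date_fmt is None:
        raise ValueError(f"{raw!r} is not a number and no date_fmt was given")
    # strptime raises ValueError if the string does not match the format.
    return datetime.strptime(raw, date_fmt)


number = parse_cell("3.5")
moment = parse_cell("2025-09-01 00:00:00", "%Y-%m-%d %H:%M:%S")
```

One caveat with this order: a date format that looks numeric, such as "%Y", would be read as a number, so a per-column type setting might be safer.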
I've outlined a potential configuration structure below. The names of the fields are just examples; feel free to suggest more appropriate names if you have better ideas.
Required arguments are "src_dir", "src_pattern", and at least one of "gt", "ge", "lt", "le", "eq", or "ne".
arguments:
  src_dir: "/path/to/dir"
  src_pattern: "pattern.csv"
  operator: and # must be "and" or "or"; defaults to "and"
  date_fmt: "%Y-%m-%d %H:%M:%S" # format string passed to datetime.strptime
  gt: # greater than
    column_a: 1
    column_b: "2025-09-01 00:00:00"
  ge: # greater than or equal to
    column_c: 3
  lt: # less than
    column_a: 5 # the same column can appear in multiple conditions
  le: # less than or equal to
    column_d: "2025-12-31 23:59:59"
  eq: # equal to
    column_e: 100
  ne: # not equal to
    column_f: -273.15
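As a rough sketch of how the configuration above could drive the filtering, assuming the arguments have already been loaded into a dict: the function names (to_comparable, build_conditions, filter_rows) are placeholders I made up, and file handling via src_dir and src_pattern is left out.

```python
import csv
import io
import operator
from datetime import datetime

# Map each config key to the matching comparison function.
OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
       "le": operator.le, "eq": operator.eq, "ne": operator.ne}


def to_comparable(value, date_fmt):
    """Numbers become float; anything else is parsed with date_fmt."""
    try:
        return float(value)
    except (TypeError, ValueError):
        if date_fmt is None:
            raise ValueError(f"{value!r} is not a number and date_fmt is unset")
        return datetime.strptime(str(value), date_fmt)


def build_conditions(args):
    """Flatten the config into (column, comparison, threshold) triples."""
    date_fmt = args.get("date_fmt")
    conditions = [
        (column, OPS[name], to_comparable(threshold, date_fmt))
        for name in OPS
        for column, threshold in args.get(name, {}).items()
    ]
    if not conditions:
        raise ValueError("at least one of gt, ge, lt, le, eq, ne is required")
    return conditions


def filter_rows(rows, args):
    """Yield the rows that satisfy the conditions, combined with and/or."""
    mode = args.get("operator", "and")
    if mode not in ("and", "or"):
        raise ValueError('operator must be "and" or "or"')
    combine = all if mode == "and" else any
    date_fmt = args.get("date_fmt")
    conditions = build_conditions(args)
    for row in rows:
        # A number compared against a datetime raises TypeError, which
        # surfaces a column/threshold type mismatch instead of hiding it.
        if combine(op(to_comparable(row[col], date_fmt), threshold)
                   for col, op, threshold in conditions):
            yield row


# Usage with an in-memory CSV standing in for a real file.
sample = io.StringIO(
    "column_a,column_b\n"
    "0,2025-08-31 00:00:00\n"
    "3,2025-09-02 12:00:00\n"
    "7,2025-10-01 00:00:00\n"
)
args = {
    "date_fmt": "%Y-%m-%d %H:%M:%S",
    "gt": {"column_a": 1, "column_b": "2025-09-01 00:00:00"},
    "lt": {"column_a": 5},
}
kept = list(filter_rows(csv.DictReader(sample), args))
```

With the default "and" operator, only the row where column_a is 3 is kept. Switching operator to "or" would keep all three rows, since each one satisfies at least one condition.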