Feature Request: Add a CsvFilter class #517

@skxeve

I'd like to propose a new CsvFilter class to filter rows based on a comparison of specific column values. Currently, we don't have a feature for this kind of filtering.

The class should handle filtering on numerical and datetime values.
It should keep rows where the specified conditions are met and discard the rest. It would be great if we could apply multiple conditions simultaneously.
Datetime values are typically stored as strings in a CSV, so it would be useful to let users specify a format string to handle various formats. While we could use dateutil.parser.parse to parse dates automatically, an explicit method like datetime.strptime would make processing more robust, since a value that does not match the format raises an error instead of being guessed at.
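As a rough illustration of the parsing idea above, here is a minimal sketch. The helper name `parse_cell` and its fallback order (number first, then datetime) are assumptions made for this example, not part of the proposal.

```python
from datetime import datetime


def parse_cell(raw, date_fmt=None):
    """Convert a CSV cell (a string) into a float or a datetime.

    Hypothetical helper: tries a numeric conversion first, then falls
    back to datetime.strptime with the user-supplied format string.
    """
    try:
        return float(raw)
    except ValueError:
        pass
    if date_fmt is None:
        raise ValueError(f"not a number and no date_fmt given: {raw!r}")
    # strptime raises ValueError when the value does not match the format,
    # so malformed dates are reported rather than silently misread.
    return datetime.strptime(raw, date_fmt)
```

With this approach a cell such as "2025-09-01 00:00:00" only becomes a datetime when it matches date_fmt exactly; anything else fails loudly.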

I've outlined a potential configuration structure below. The names of the fields are just examples; feel free to suggest more appropriate names if you have better ideas.
The required arguments are "src_dir", "src_pattern", and at least one of "gt", "ge", "lt", "le", "eq", or "ne".

arguments:
  src_dir: "/path/to/dir"
  src_pattern: "pattern.csv"
  operator: and # must be "and" or "or"; default is "and"
  date_fmt: "%Y-%m-%d %H:%M:%S" # specify datetime.strptime format
  gt: # greater than
    column_a: 1
    column_b: "2025-09-01 00:00:00"
  ge: # greater than or equal to
    column_c: 3
  lt: # less than
    column_a: 5 # can target the same column in multiple conditions
  le: # less than or equal to
    column_d: "2025-12-31 23:59:59"
  eq: # equal to
    column_e: 100
  ne: # not equal to
    column_f: -273.15
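
To make the intended semantics concrete, here is a minimal sketch of how the conditions above might be evaluated. It is only one possible reading of the proposal: the names `filter_rows`, `COMPARATORS`, `_convert`, and the `combine` parameter (standing in for the `operator` field) are hypothetical, and file handling via src_dir and src_pattern is left out.

```python
import csv
import io
import operator as op
from datetime import datetime

# Map the proposed config keys onto standard comparison functions.
COMPARATORS = {"gt": op.gt, "ge": op.ge, "lt": op.lt,
               "le": op.le, "eq": op.eq, "ne": op.ne}


def _convert(value, date_fmt):
    """Turn a config threshold or CSV cell into a float or a datetime."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(value)
    except ValueError:
        return datetime.strptime(value, date_fmt)


def filter_rows(rows, conditions, combine="and",
                date_fmt="%Y-%m-%d %H:%M:%S"):
    """Return the rows (dicts) that satisfy the conditions.

    conditions looks like {"gt": {"column_a": 1}, "lt": {"column_a": 5}}.
    A numeric cell compared against a datetime threshold raises TypeError;
    this sketch does not try to recover from that.
    """
    if combine not in ("and", "or"):
        raise ValueError('combine must be "and" or "or"')
    checks = [
        (COMPARATORS[key], column, _convert(threshold, date_fmt))
        for key, columns in conditions.items()
        for column, threshold in columns.items()
    ]
    if not checks:
        raise ValueError("at least one condition is required")
    reducer = all if combine == "and" else any
    return [
        row for row in rows
        if reducer(compare(_convert(row[column], date_fmt), threshold)
                   for compare, column, threshold in checks)
    ]


sample = io.StringIO(
    "column_a,column_b\n"
    "0,2025-08-01 00:00:00\n"
    "3,2025-10-01 00:00:00\n"
    "7,2025-10-01 00:00:00\n"
)
conditions = {
    "gt": {"column_a": 1, "column_b": "2025-09-01 00:00:00"},
    "lt": {"column_a": 5},
}
kept = filter_rows(csv.DictReader(sample), conditions)
print([row["column_a"] for row in kept])  # ['3']
```

Keying the conditions by comparison first and column second mirrors the proposed config, which is what lets the same column appear under both gt and lt.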

Labels: difficulty: easy (requires ability to implement a single module or few small modules, and make unit tests for it); feat (a new feature)