
Wrong input column for exploded blocking columns when expand_length not set #145

@riley-harper

Description

When working on #142, I noticed that in hlink/linking/matching/link_step_explode.py, if expand_length is not set for a blocking column, we run the following code:

explode_col_expr = explode(col(exploding_column_name))

However, the rest of the code treats exploding_column_name as the output column name and derived_from_column as the input column name. So I think there is a bug here. This should be

explode_col_expr = explode(col(derived_from_column))

instead, unless I am misunderstanding something. This is probably a low-impact bug, since you only hit it when blocking on an input column of array type. I believe that most exploded columns are integer columns with expand_length set.
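To make the two code paths concrete, here is a minimal pure-Python model of the explode step. It is not hlink's actual code. Rows are plain dicts standing in for a Spark DataFrame. It assumes two things. First, when expand_length is set, an integer input column is expanded to the range value ± expand_length before exploding. Second, when expand_length is not set, the input column already holds an array. The function explode_blocking_column, its use_buggy_input flag, and the column names in the test are all hypothetical.

```python
# Hypothetical pure-Python model of the explode step (not hlink's code).
# Each row is a dict; exploding yields one output row per element of a list,
# similar to what pyspark.sql.functions.explode does for an array column.
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def explode_blocking_column(
    rows: List[Row],
    exploding_column_name: str,  # output column name
    derived_from_column: str,  # input column name
    expand_length: Optional[int] = None,
    use_buggy_input: bool = False,
) -> List[Row]:
    """Return one row per exploded value, stored under exploding_column_name.

    Assumed behavior: with expand_length set, the integer input value is
    expanded to [value - expand_length, value + expand_length]. Without it,
    the input value is assumed to already be a list (an array column).
    use_buggy_input reproduces the reported bug by reading the output name.
    """
    exploded: List[Row] = []
    for row in rows:
        if expand_length is not None:
            center = row[derived_from_column]
            values = list(range(center - expand_length, center + expand_length + 1))
        else:
            # The bug: reading the output column name instead of the input one.
            source = exploding_column_name if use_buggy_input else derived_from_column
            # Raises KeyError if the output column does not exist in the input.
            values = row[source]
        for value in values:
            exploded.append({**row, exploding_column_name: value})
    return exploded
```

With use_buggy_input=True, the lookup uses a column that only exists in the output, so the call fails. In Spark, the analogous expression would presumably fail to resolve the column. If a column with that name already happened to exist in the input, it would silently explode the wrong data instead.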
