_delta_log folder is treated as a partition by the Hudi reader (without metadata table) when checkpoint parquet files are present #813

@parisni

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

It turns out that when _delta_log contains checkpoint parquet files such as 00000000000000000150.checkpoint.parquet, Hudi's filesystem-backed metadata treats the folder as a Hudi partition and fails hard.

In Hudi, the metadata table is not the source of truth, and it sometimes gets corrupted. In that case one can rebuild it, but the presence of the _delta_log folder then corrupts the Hudi table, which can no longer be read or written.

I have mitigated the problem in our Hudi fork by skipping both the metadata and _delta_log folders, but I can't see any pure XTable mitigation so far, other than taking care to purge both the _delta_log and metadata folders when recreating the metadata table (MDT).
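
To illustrate the kind of skip logic described above, here is a minimal Python sketch. It is not the actual patch from the fork, and it is not Hudi or XTable code: discover_partitions, build_sample_table and SKIPPED_DIRS are hypothetical names, and the rule "any directory holding a .parquet file is a partition" is a simplifying assumption standing in for filesystem-backed partition listing.

```python
import os
import tempfile

# Hypothetical skip list (an assumption, not taken from Hudi or XTable):
# the folders named in this issue, plus Hudi's own .hoodie folder.
SKIPPED_DIRS = frozenset({".hoodie", "_delta_log", "metadata"})


def discover_partitions(base_path, skipped=frozenset()):
    """Return relative paths of directories under base_path that hold parquet files.

    Naive stand-in for filesystem-based partition discovery: a directory
    counts as a partition as soon as it contains a .parquet file.
    """
    partitions = []
    for root, dirs, files in os.walk(base_path):
        # Prune skipped directories in place so os.walk never descends into them.
        dirs[:] = sorted(d for d in dirs if d not in skipped)
        if root != base_path and any(f.endswith(".parquet") for f in files):
            partitions.append(os.path.relpath(root, base_path))
    return partitions


def build_sample_table(base_path):
    """Create an empty-file layout resembling the table described in this issue."""
    layout = {
        ".hoodie": ["hoodie.properties"],
        "_delta_log": ["00000000000000000150.checkpoint.parquet"],
        "metadata": ["v1.metadata.json"],
        "date=2024-01-01": ["data-file-0.parquet"],
    }
    for folder, names in layout.items():
        os.makedirs(os.path.join(base_path, folder), exist_ok=True)
        for name in names:
            open(os.path.join(base_path, folder, name), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        build_sample_table(base)
        # Without a skip list, _delta_log is reported as a partition.
        print(discover_partitions(base))
        # With the skip list, only the real data partition remains.
        print(discover_partitions(base, skipped=SKIPPED_DIRS))
```

In this sketch the checkpoint parquet file is what makes _delta_log look like a partition; pruning by directory name before descending avoids that without touching the files on disk, unlike the purge workaround.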

Are you willing to submit PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Labels: bug