
ADH-8300: Fix ORC timestamp migration schema mapping#23

Open
iamlapa wants to merge 1 commit into develop/4.3.0/1.10.1.1 from bugfix/ADH-8300


iamlapa (Collaborator) commented Jun 15, 2026

Fixes metadata-only migration of ORC/Hive tables with TIMESTAMP columns.
For ORC-backed Spark V1 source tables, BaseTableCreationSparkAction now converts Spark TimestampType to TimestampNTZType only for the staged Iceberg schema. This matches the physical ORC TIMESTAMP type and prevents reads from failing with:
Can not promote TIMESTAMP type to TIMESTAMP
The change is scoped to ORC-backed sources detected via CatalogTable.provider() or storage().serde(). It does not change global Spark type conversion, SQL syntax, migration parameters, name mapping, partition transforms, or file import behavior.
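The scoping rule above can be sketched in plain Java. This is a minimal model under stated assumptions, not the code in BaseTableCreationSparkAction. SparkType, Field, isOrcBacked and stagedSchema are hypothetical names that stand in for Spark's StructType and CatalogTable. The ORC check (provider equal to "orc", or a Hive serde class name containing "orc") is an assumed shape of the provider()/storage().serde() detection, and only top-level fields are converted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public final class OrcTimestampStaging {

    // Hypothetical stand-ins for Spark data types; this is NOT the Spark API.
    enum SparkType { TIMESTAMP, TIMESTAMP_NTZ, STRING, LONG }

    // Hypothetical flat field model; the real schema is a Spark StructType.
    record Field(String name, SparkType type) {}

    // Assumption: a source counts as ORC-backed if its provider is "orc",
    // or if its Hive serde class name mentions ORC (e.g. ...orc.OrcSerde).
    static boolean isOrcBacked(Optional<String> provider, Optional<String> serde) {
        boolean orcProvider = provider.map(p -> p.equalsIgnoreCase("orc")).orElse(false);
        boolean orcSerde = serde.map(s -> s.toLowerCase(Locale.ROOT).contains("orc")).orElse(false);
        return orcProvider || orcSerde;
    }

    // Builds the schema used only for staging the Iceberg table.
    // The source schema list is never modified; non-ORC sources pass through unchanged.
    static List<Field> stagedSchema(List<Field> source, boolean orcBacked) {
        if (!orcBacked) {
            return source;
        }
        List<Field> staged = new ArrayList<>(source.size());
        for (Field f : source) {
            // Only TIMESTAMP (with zone) is rewritten; every other type is kept as-is.
            SparkType t = f.type() == SparkType.TIMESTAMP ? SparkType.TIMESTAMP_NTZ : f.type();
            staged.add(new Field(f.name(), t));
        }
        return List.copyOf(staged);
    }

    public static void main(String[] args) {
        List<Field> hiveSchema = List.of(
                new Field("id", SparkType.LONG),
                new Field("event_ts", SparkType.TIMESTAMP));
        boolean orc = isOrcBacked(
                Optional.of("hive"),
                Optional.of("org.apache.hadoop.hive.ql.io.orc.OrcSerde"));
        System.out.println(stagedSchema(hiveSchema, orc));
    }
}
```

Returning a converted copy rather than altering the source schema mirrors the description's point that only the staged Iceberg schema changes, while global Spark type conversion is left alone.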

Behavior

Existing ORC/Hive TIMESTAMP columns migrated by snapshot/migrate are represented as Iceberg timestamp without time zone, i.e. Spark TIMESTAMP_NTZ.
Setting write.format.default=parquet still affects only new writes; existing imported ORC files are not rewritten.

Convert Spark TimestampType to TimestampNTZType when staging Iceberg tables for ORC-backed Spark V1 sources. This preserves ORC/Hive TIMESTAMP semantics during snapshot/migrate and avoids ORC projection failures caused by mapping existing ORC TIMESTAMP columns to Iceberg timestamp-with-zone. Add Spark 3.5/4.0 regression tests for ORC TIMESTAMP migration via both SparkActions and system.migrate, plus an ORC projection guard test.
giggsoff (Collaborator) commented

Please don't forget to validate against Spark v4.2 with #27 merged.



3 participants