Just to clarify the workflow I followed:
I first ran infer-schema on the full archive:
Input:
D:\MP_cad\MPBhulekh_MP_Survey_Cadastrals.geojsonl.7z.001
Command:
uvx iomaps cli infer-schema ^
-i D:\MP_cad\MPBhulekh_MP_Survey_Cadastrals.geojsonl.7z.001 ^
-o D:\MP_cad\mp_full.schema.json ^
-g Polygon
This command completed successfully after a full scan of the archive
(~26 hours) and generated mp_full.schema.json.
I then used the same schema file (mp_full.schema.json) for filtering
the same input archive:
uvx iomaps cli filter-7z ^
-i D:\MP_cad\MPBhulekh_MP_Survey_Cadastrals.geojsonl.7z.001 ^
-o C:\MP_work\MP_full.gpkg ^
-b "73.3,18.8,84.5,28.9" ^
-s D:\MP_cad\mp_full.schema.json ^
-g Polygon ^
--no-clip
However, this fails with:
ValueError: Record does not match collection schema
So even though the schema was inferred from a full scan of this same input file,
some records still fail to match it when filter-7z writes features
(presumably because of extra or missing properties).
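One way to confirm which records are affected is to compare each feature's property keys against the schema before writing anything. The sketch below is a minimal diagnostic, not part of iomaps. It assumes the schema file is a Fiona-style dict with "geometry" and "properties" keys (the error text resembles the one Fiona raises, but that link is an assumption), and the helper names and sample field names are hypothetical.

```python
import json


def schema_mismatch(record, schema):
    """Compare one GeoJSON feature against a Fiona-style schema dict.

    Assumes `schema` looks like {"geometry": "Polygon", "properties": {...}};
    the real layout of mp_full.schema.json may differ.
    Returns (missing_keys, extra_keys, geometry_ok).
    """
    expected = set(schema["properties"])
    actual = set(record.get("properties") or {})
    geom = record.get("geometry") or {}
    geometry_ok = geom.get("type") == schema.get("geometry")
    return sorted(expected - actual), sorted(actual - expected), geometry_ok


def report_mismatches(lines, schema, limit=10):
    """Scan GeoJSONL lines and collect up to `limit` mismatching records."""
    found = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        missing, extra, geometry_ok = schema_mismatch(json.loads(line), schema)
        if missing or extra or not geometry_ok:
            found.append((lineno, missing, extra, geometry_ok))
            if len(found) >= limit:
                break
    return found


# Hypothetical schema and records, for illustration only.
sample_schema = {"geometry": "Polygon",
                 "properties": {"khasra_no": "str", "village": "str"}}
sample_lines = [
    '{"type": "Feature", "properties": {"khasra_no": "1", "village": "A"},'
    ' "geometry": {"type": "Polygon", "coordinates": []}}',
    '{"type": "Feature", "properties": {"khasra_no": "2"},'
    ' "geometry": {"type": "Polygon", "coordinates": []}}',
    '{"type": "Feature", "properties": {"khasra_no": "3", "village": "B", "note": "x"},'
    ' "geometry": {"type": "MultiPolygon", "coordinates": []}}',
]

for lineno, missing, extra, geometry_ok in report_mismatches(sample_lines, sample_schema):
    print(lineno, "missing:", missing, "extra:", extra, "geometry ok:", geometry_ok)
```

Running this over a decompressed sample of the archive would show whether the failures come from property keys or from a geometry type other than the one passed with -g.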
This suggests that even a full-archive schema inference is not sufficient
to guarantee schema consistency during streaming writes.
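If the mismatch does turn out to be extra or missing property keys, one possible workaround is to coerce every record to the schema's key set before it is written. This is only a sketch of that idea under the same assumption of a Fiona-style schema dict. `conform_to_schema` is a hypothetical helper, not an iomaps function, and dropping unknown properties loses data, so it may not suit every use.

```python
def conform_to_schema(record, schema):
    """Return a copy of `record` whose properties match the schema keys exactly.

    Missing properties are filled with None and properties not listed in the
    schema are dropped. Assumes `schema["properties"]` maps field names to
    type strings (hypothetical layout).
    """
    props = record.get("properties") or {}
    fixed = {name: props.get(name) for name in schema["properties"]}
    # Build a new dict so the caller's record is left untouched.
    return {**record, "properties": fixed}


# Hypothetical schema and record, for illustration only.
example_schema = {"geometry": "Polygon",
                  "properties": {"khasra_no": "str", "village": "str"}}
example_record = {
    "type": "Feature",
    "properties": {"khasra_no": "3", "note": "x"},
    "geometry": {"type": "Polygon", "coordinates": []},
}

conformed = conform_to_schema(example_record, example_schema)
print(conformed["properties"])
```

Filling with None rather than a default value keeps the output honest about which fields were absent in the source record.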
From @kushkamal84-eng at ramSeraph/indian_cadastrals#8 (reply in thread)