Some starch files contain an extra entry for a chromosome named null.
Example:
$ unstarch --list /gpfs/data/cegs/mapped//FCHWMG7BGXK/capture/Bl6_Cast_Igf2_Igf2_inverteddEnhdCTCFinICR_G12-BS15522A/Bl6_Cast_Igf2_Igf2_inverteddEnhdCTCFinICR_G12-BS15522A.cegsvectors_LP300.reads.starch
chr |filename |compressedSize |uncompressedLineCount |uncompressedLineMaxStrLength |totalNonUniqueBases |totalUniqueBases |duplicateElementExists |nestedElementExists |signature
LPv5_backbone |LPv5_backbone.pid80866.isgnodepdc005.bz2 |195 |72 |38 |72 |41 |true |false |hALcQvsm9fo392PoqCobW1oHyZM=
null |null.pid80866.isgnodepdc005.bz2 |14 |0 |0 |0 |0 |false |false |null
This can manifest as a hash validation error for that chromosome:
$ unstarch --verify-signature /gpfs/data/cegs/mapped//FCHWMG7BGXK/capture/Bl6_Cast_Igf2_Igf2_inverteddEnhdCTCFinICR_G12-BS15522A/Bl6_Cast_Igf2_Igf2_inverteddEnhdCTCFinICR_G12-BS15522A.cegsvectors_LP300.reads.starch
Expected and observed data integrity signatures match for chromosome [LPv5_backbone]
ERROR: Specified chromosome record may be corrupt -- observed and expected signatures do not match for chromosome [null]
This happens most often with the .genotypes.starch files, but I also see it in the following example files, which cover a variety of mapped genomes and outputs:
./FCHWMG7BGXK/dna/Water_Control-BS15460A/Water_Control-BS15460A.mm10.genotypes.starch
./FCHWMG7BGXK/dna/Water_Control-BS15460A/hotspot2/Water_Control-BS15460A.mm10.hotspots.fdr0.005.starch
./FCHWMG7BGXK/capture/Bl6_Cast_LP300[Igf2_1_4]_D5_dLsp1_C3-BS15534A/Bl6_Cast_LP300[Igf2_1_4]_D5_dLsp1_C3-BS15534A.cegsvectors_LP300.reads.starch
./FCHWMG7BGXK/capture/Bl6_Cast_LP300[Igf2_1_4]_D5_dHIDAD_C7-BS15529A/Bl6_Cast_LP300[Igf2_1_4]_D5_dHIDAD_C7-BS15529A.cegsvectors_pSpCas9GFP.coverage.allreads.starch
./FCHWMG7BGXK/capture/Bl6_Cast_LP300[Igf2_1_4]_D5_dSyt8_H5-BS15531A/Bl6_Cast_LP300[Igf2_1_4]_D5_dSyt8_H5-BS15531A.cegsvectors_pSpCas9GFP.coverage.starch
With the exception of the hotspot file, these are all created in callsnpsMerge.sh.
The .genotypes.starch file is created differently (directly by starch - rather than by starchcat).
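
To find other affected files, something like the following might help. It is only a sketch: it assumes the pipe-delimited `unstarch --list` layout shown above (header row first, chromosome name in the first column), and the function names has_null_chrom and scan_for_null are made up for illustration.

```shell
# Succeeds (exit 0) if `unstarch --list` output on stdin contains a row
# whose first column is "null". Assumes the header is the first line.
has_null_chrom() {
    awk -F'|' '
        NR > 1 { gsub(/[ \t]/, "", $1); if ($1 == "null") found = 1 }
        END    { exit !found }
    '
}

# Hypothetical helper: print every .starch file under a directory
# (default: current directory) that has a null chromosome entry.
scan_for_null() {
    find "${1:-.}" -name '*.starch' -print | while IFS= read -r f; do
        if unstarch --list "$f" | has_null_chrom; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running scan_for_null /gpfs/data/cegs/mapped/FCHWMG7BGXK would then list the affected files, one per line. Parsing the --list table keeps the check independent of --verify-signature, which only reports the problem as a signature mismatch.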