odc compare to support a tolerance threshold? #23

@matthewrmshin


Is your feature request related to a problem? Please describe.

In a recent use of odc compare to compare a reference file with a test output, I got a "Files differ" exception because the two files differ by a tiny amount in one value of a column. While the exception is correct, the difference between the values is so small that the error message does not actually show what it is:

000 2024-08-28 11:55:03 (I) Comparator::compare: (1) ref.odb to (2) out.odb
000 2024-08-28 11:55:08 (E) Exception: Values different in column obsbias: 0.00219108 is not equal 0.00219108
000 2024-08-28 11:55:08 (E)  
000 2024-08-28 11:55:08 (I) While comparing rows number 3953389, columns 36 found different.
000 2024-08-28 11:55:08 (I)  Values different in column obsbias: 0.00219108 is not equal 0.00219108

000 2024-08-28 11:55:08 (I)  data1[36] = 2.191084e-03
000 2024-08-28 11:55:08 (I)  data2[36] = 2.191084e-03
000 2024-08-28 11:55:08 (I)  md1[36] = name: obsbias, type: REAL, codec: short_real2, range=<-1003.738464,5.535852>, hasMissing=false
000 2024-08-28 11:55:08 (I)  md2[36] = name: obsbias, type: REAL, codec: short_real2, range=<-1003.738464,5.535852>, hasMissing=false
000 2024-08-28 11:55:08 (E) Exception: Files differ.  
000 2024-08-28 11:55:08 (I) Comparing files ref.odb and out.odb: 4 seconds elapsed, 4 seconds cpu
000 2024-08-28 11:55:08 (E) ** Files differ.  Caught in  (/home/matt/opt/eckit/src/eckit/runtime/Tool.cc +32 start)
000 2024-08-28 11:55:08 (E) ** Exception terminates odc

Using odc sql to print out the columns in the files, I got this diff:

--- ref.odb.sql	2024-08-28 13:43:19.000000000 +0000
+++ out.odb.sql	2024-08-28 13:47:35.000000000 +0000
@@ -3953387,7 +3953387,7 @@
 -0.21317523717880249
 -0.44805592298507690
 0.08006642758846283
-0.00219108420424163
+0.00219108397141099
 0.04074956104159355
 0.09717535227537155
 0.29601469635963440
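
For reference, the size of the difference can be checked with a short Python sketch. The two values are copied from the diff above. Treating them as single-precision numbers is an assumption made only for the ULP calculation; the odc output does not state the in-memory type.

```python
import struct

ref = 0.00219108420424163  # value from ref.odb.sql
out = 0.00219108397141099  # value from out.odb.sql

def float32_bits(x):
    """Return the bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

abs_diff = abs(ref - out)
rel_diff = abs_diff / max(abs(ref), abs(out))
# For two positive floats, the difference of the bit patterns is the
# number of representable float32 values separating them.
ulp_distance = abs(float32_bits(ref) - float32_bits(out))

print(f"absolute difference: {abs_diff:.3e}")
print(f"relative difference: {rel_diff:.3e}")
print(f"float32 ULP distance: {ulp_distance}")
# Six significant digits, as in the log message, hide the difference.
print(f"{ref:g} vs {out:g}")
```

The absolute difference is of the order of 1e-10 and the relative difference of the order of 1e-7, which is why both values print identically at six significant digits.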

Describe the solution you'd like

It would be desirable for odc compare to support a user-specified tolerance threshold, so that negligible differences are ignored and odc compare returns success (0). For example, this would be useful for comparing reference outputs and test outputs in CI tests on alternate platforms.
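
A minimal sketch of the kind of check meant here, written in Python rather than in odc's own code. The function name `first_mismatch` and its `rel_tol` and `abs_tol` parameters are hypothetical and are not existing odc options; the comparison itself uses the standard `math.isclose`.

```python
import math

def first_mismatch(ref_values, out_values, rel_tol=0.0, abs_tol=0.0):
    """Return (row, ref, out) for the first pair outside tolerance, else None.

    Hypothetical helper, not part of odc. With both tolerances at 0.0 the
    comparison is exact.
    """
    if len(ref_values) != len(out_values):
        raise ValueError("columns have different lengths")
    for row, (r, o) in enumerate(zip(ref_values, out_values)):
        if not math.isclose(r, o, rel_tol=rel_tol, abs_tol=abs_tol):
            return row, r, o
    return None

# Three neighbouring values of the column, copied from the diff above.
ref_col = [0.08006642758846283, 0.00219108420424163, 0.04074956104159355]
out_col = [0.08006642758846283, 0.00219108397141099, 0.04074956104159355]

print(first_mismatch(ref_col, out_col))                # exact: row 1 differs
print(first_mismatch(ref_col, out_col, rel_tol=1e-6))  # within tolerance: None
```

In a CI test, a `None` result would map to exit status 0. A relative tolerance alone behaves poorly for values near zero, so an absolute tolerance is worth exposing alongside it.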

Describe alternatives you've considered

Otherwise, odc compare should print a better error message that shows the differing values to their full significant figures.
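
To illustrate what fuller output could look like, the Python sketch below formats the two differing values from the diff above with 6, 9 and 17 significant digits. Nine and 17 digits are enough to round-trip IEEE 754 single and double precision respectively. This only demonstrates the formatting idea; it says nothing about how odc builds its message internally.

```python
ref = 0.00219108420424163  # value from ref.odb.sql
out = 0.00219108397141099  # value from out.odb.sql

for digits in (6, 9, 17):
    # 6 digits matches the precision seen in the log message;
    # 9 and 17 digits round-trip float32 and float64 respectively.
    print(f"{digits:2d} digits: {ref:.{digits}g} is not equal {out:.{digits}g}")
```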

Additional context

No response

Organisation

Met Office

Labels

enhancement (New feature or request)
