
Difficulty Matching Code Results with the Leaderboard #118

Description

@ZEXUANW

I find that my code run produces more metrics (and more informative ones) than the leaderboard reports, and I am wondering how to reconcile the two.

Could you please advise on how to match the results produced by the code to the leaderboard? Each metric appears to have three versions in the code, whereas only one value is reported on the leaderboard. The full output from the code is below:

```yaml
- dataset_id: op
  date_created: 17-12-2025
  file_size: 23048
  method_id: pearson_corr
  metric_ids:
    - sem_grn
    - sem_hvg
    - sem
  metric_values:
    - '0.0987688622048502'
    - '0.07334421631793887'
    - '0.08605653926139453'
- dataset_id: op
  date_created: 17-12-2025
  file_size: 23048
  method_id: pearson_corr
  metric_ids:
    - tfb_precision
    - tfb_recall
    - tfb_f1
  metric_values:
    - '0.04776565481165778'
    - '0.02756733797997705'
    - '0.03495870537547806'
- dataset_id: op
  date_created: 17-12-2025
  file_size: 23048
  method_id: pearson_corr
  metric_ids:
    - gs_precision
    - gs_recall
    - gs_f1
    - gs_n_active
  metric_values:
    - '0.70174168789988'
    - '0.3279813455378404'
    - '0.40692143797401853'
    - '348.6666666666667'
- dataset_id: op
  date_created: 17-12-2025
  file_size: 23048
  method_id: pearson_corr
  metric_ids:
    - r2_raw
    - r_precision
    - r_recall
    - r_f1
  metric_values:
    - '0.6934894982541678'
    - '0.49783144316145583'
    - '0.273573899861934'
    - '0.35310538256265267'
- dataset_id: op
  date_created: 17-12-2025
  file_size: 23048
  method_id: pearson_corr
  metric_ids:
    - vc_grn
    - vc_hvg
    - vc
  metric_values:
    - '0.6184723973274231'
    - '0.6489825248718262'
    - '0.6337274610996246'
- dataset_id: op
  date_created: 17-12-2025
  file_size: 23048
  method_id: pearson_corr
  metric_ids:
    - rc_tf_act
  metric_values:
    - '0.35853068099355406'
```
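
One pattern I noticed while trying to reconcile the three versions (my own arithmetic against the dump above, not documented benchmark behavior): the unsuffixed metric appears to be the plain mean of its `_grn` and `_hvg` variants, so the leaderboard may be reporting that combined value. A minimal check:

```python
# Sanity check against the dump above: does the unsuffixed metric equal the
# plain mean of its _grn and _hvg variants? (my guess, not documented behavior)
sem_grn, sem_hvg, sem = 0.0987688622048502, 0.07334421631793887, 0.08605653926139453
vc_grn, vc_hvg, vc = 0.6184723973274231, 0.6489825248718262, 0.6337274610996246

assert abs((sem_grn + sem_hvg) / 2 - sem) < 1e-12
assert abs((vc_grn + vc_hvg) / 2 - vc) < 1e-12
print("unsuffixed metric == mean(_grn, _hvg) for sem and vc")
```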

In addition, we replicated the Pearson correlation–based evaluation. However, although TF binding precision, recall, and F1 score are intended to reflect transcription factor binding relevance, none of the reproduced values match the leaderboard result (reported as 0.09). Specifically, we obtained TF binding precision = 0.0478, recall = 0.0276, and F1 score = 0.0350.
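
For what it's worth, our reproduced numbers are at least internally consistent: the F1 we obtained is exactly the harmonic mean of our precision and recall, so the gap to 0.09 is not an aggregation error on our side. A minimal check:

```python
# Cross-check: F1 should be the harmonic mean of precision and recall.
# Values are our reproduced TF-binding metrics from the run above.
precision = 0.04776565481165778
recall = 0.02756733797997705

f1 = 2 * precision * recall / (precision + recall)
assert abs(f1 - 0.03495870537547806) < 1e-12
print(f"tfb_f1 = {f1:.17f}")  # consistent with our reported value
```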
