Request of enhancements of Influenza w/ Entrez data #922

@joverlee521

Hello GenSpectrum folks, I've been looking into using INSDC data for Nextstrain's seasonal-flu analyses via your influenza data. It's super helpful to pull directly from your API and get records with linked segments!

Nextstrain analyses will need a little more metadata than what's currently available in GenSpectrum, so I'm working on merging in the data from Entrez. I'm hoping this data can be added upstream during GenSpectrum's ingest:

  1. Include the incomplete collection date as submitted instead of defaulting it to the first day of the month or year. See Influenza: incomplete collection dates default to the first day of the month/year #930
  2. Include more strain names. We are pulling the strain field from Entrez to supplement this since it's not available via NCBI Datasets.
  3. Include passage history. This is also not standardized in NCBI Datasets. I've seen it scattered across different fields from Entrez (isolation_source, lab_host, note). Passage history would also be helpful for ensuring that segments with different passage histories are not linked to the same record.
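
To make item 3 a bit more concrete, here is a rough sketch of the kind of consolidation I have in mind. It is only an illustration: it assumes each segment's Entrez metadata has already been parsed into a plain dict keyed by field name, and the function names (`collect_passage_history`, `segments_share_passage`) are hypothetical, not part of any existing GenSpectrum or Nextstrain code. Since `note` is free text, a real ingest would likely need to filter or normalize the values before comparing them.

```python
from typing import Optional

# Entrez fields where passage history has been seen (from item 3 above).
PASSAGE_FIELDS = ("isolation_source", "lab_host", "note")


def collect_passage_history(record: dict) -> Optional[str]:
    """Join the non-empty candidate fields, tagged by the field they came from.

    `record` is assumed to be one segment's metadata as a plain dict.
    Returns None when none of the candidate fields has a value.
    """
    parts = []
    for field in PASSAGE_FIELDS:
        value = (record.get(field) or "").strip()
        if value:
            parts.append(f"{field}={value}")
    return "; ".join(parts) if parts else None


def segments_share_passage(segments: list[dict]) -> bool:
    """True when all segments that carry passage info agree on it.

    Segments without any passage info are ignored rather than treated
    as a conflict (an assumption; a stricter ingest might reject them).
    """
    histories = {collect_passage_history(s) for s in segments}
    histories.discard(None)
    return len(histories) <= 1
```

A check like `segments_share_passage` could run before segments are linked, so that segments with conflicting passage information are not grouped into the same record.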
