
[RFE] New corpus schema #1

Description

@ingvagabund
  • Store artefacts as compressed tarballs. Compressed JSON files are much smaller than uncompressed ones.
  • Consolidate artefacts generated by the same extractor. E.g. the symbols extractor extracts three pieces of data: API, static allocations and contracts. They should be stored in the same tarball.
  • Store relevant artefacts per package rather than per project, where that is sane. E.g. the API of some projects can be huge. It should be stored per project package so one does not have to read the entire JSON to get a subset of the data it contains.
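The three points above could be sketched roughly as follows. This is only an illustration, not the actual storage code: the function names (`pack_artefacts`, `read_part`, `pack_per_package`), the member names (`api`, `allocations`, `contracts`) and the in-memory layout are all assumptions made for the example.

```python
import io
import json
import tarfile


def pack_artefacts(parts):
    """Bundle named JSON-serialisable parts into one gzip-compressed tarball.

    `parts` maps a (hypothetical) part name such as "api" to its data.
    Returns the tarball as bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(parts.items()):
            payload = json.dumps(data, sort_keys=True).encode("utf-8")
            info = tarfile.TarInfo(name=name + ".json")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_part(blob, name):
    """Read one part back; the other members are never JSON-parsed."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        member = tar.extractfile(name + ".json")
        return json.loads(member.read().decode("utf-8"))


def pack_per_package(symbols_by_package):
    """Return {package path: tarball bytes}, one tarball per package."""
    return {pkg: pack_artefacts(parts) for pkg, parts in symbols_by_package.items()}
```

With this layout, a consumer interested in the API of a single package opens only that package's tarball and parses only the `api.json` member, instead of loading one project-wide JSON file.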

Some available artefacts:

  • golang-distribution-snapshot
  • golang-ipprefix-to-rpm
  • golang-project-content-metadata
  • golang-project-distribution-exported-api
  • golang-project-distribution-packages
  • golang-project-exported-api
  • golang-project-packages
  • golang-project-repository-commit
  • golang-project-repository-info

The artefacts can be broken down into the following categories:

  • project metadata:
    • golang-project-content-metadata
    • golang-project-repository-commit
    • golang-project-repository-info
  • snapshots:
    • golang-distribution-snapshot
    • golang-upstream-snapshot (TBD)
  • project data:
    • golang-project-distribution-exported-api
    • golang-project-distribution-packages
    • golang-project-exported-api
    • golang-project-packages
  • distribution:
    • golang-ipprefix-to-rpm
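
For illustration, the breakdown above could be written down as a lookup table. The category keys and artefact names are taken from the lists in this issue; the names `ARTEFACT_CATEGORIES` and `category_of` are hypothetical and only serve the sketch.

```python
# Category -> artefact names, copied from the breakdown in this issue.
ARTEFACT_CATEGORIES = {
    "project metadata": [
        "golang-project-content-metadata",
        "golang-project-repository-commit",
        "golang-project-repository-info",
    ],
    "snapshots": [
        "golang-distribution-snapshot",
        "golang-upstream-snapshot",  # marked TBD in the issue
    ],
    "project data": [
        "golang-project-distribution-exported-api",
        "golang-project-distribution-packages",
        "golang-project-exported-api",
        "golang-project-packages",
    ],
    "distribution": [
        "golang-ipprefix-to-rpm",
    ],
}


def category_of(artefact):
    """Return the category an artefact belongs to; raise KeyError if unknown."""
    for category, artefacts in ARTEFACT_CATEGORIES.items():
        if artefact in artefacts:
            return category
    raise KeyError(artefact)
```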
