Skip to content

Run zero-shot baseline on selected models and question set #48

@eemilkos

Description

@eemilkos

Use the implemented evaluation pipeline to run zero-shot prompts on the selected model set using the prepared question set.

This issue is about execution, not infrastructure design.

The run should specify:

  • which models are included
  • which question set version is used
  • which prompt template is used
  • where outputs are stored

The goal is to produce the first baseline dataset for later scoring and analysis.

Acceptance criteria

  • The selected models for this run are listed.
  • The question set version used for the run is identified.
  • Zero-shot prompts are executed on all selected models or a clearly stated subset.
  • Outputs are stored in the agreed structured format.
  • Any failed requests are preserved in the logs rather than silently dropped.
  • A short result summary is added to the issue or repository notes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Master Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions