Use the implemented evaluation pipeline to run zero-shot prompts on the selected model set using the prepared question set.
This issue is about execution, not infrastructure design.
The run should specify:
- which models are included
- which question set version is used
- which prompt template is used
- where outputs are stored
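The four fields above could be captured in a small run manifest saved next to the outputs, so each baseline run is reproducible. This is only a sketch; every field name and value below is hypothetical, not the pipeline's actual schema:

```python
import json

# Hypothetical run manifest -- model names, version string, template name,
# and output path are all illustrative placeholders.
run_manifest = {
    "models": ["model-a", "model-b"],         # which models are included
    "question_set_version": "v1",             # which question set version is used
    "prompt_template": "zero_shot_default",   # which prompt template is used
    "output_dir": "runs/baseline-001/",       # where outputs are stored
}

def save_manifest(manifest: dict, path: str) -> None:
    """Write the run manifest to disk so the run can be reproduced later."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```

Storing the manifest in the same directory as the outputs keeps the run self-describing for later scoring and analysis.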
The goal is to produce the first baseline dataset for later scoring and analysis.
Acceptance criteria
- The selected models for this run are listed.
- The question set version used for the run is identified.
- Zero-shot prompts are executed on all selected models or a clearly stated subset.
- Outputs are stored in the agreed structured format.
- Any failed requests are preserved in the logs rather than silently dropped.
- A short result summary is added to the issue or repository notes.