Define standard benchmark tasks for evaluating model performance.