When generating completion responses, the generator currently uses all available tokens.
To increase realism, it would be useful to be able to control the number of tokens generated via config, e.g. to enable generating a range of response sizes.
TODO: determine what the configuration should look like
- specify the mean tokens to use as a percentage of the max_tokens specified in the request (with a config for the default if not present in the request) and an associated std dev?
- should this be a single configuration option? Per model? Per deployment?
(from stuartleeks/aoai-simulated-api#35)
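One possible shape for the first option above is a minimal sketch, assuming hypothetical config names (DEFAULT_MAX_TOKENS, MEAN_TOKENS_RATIO, TOKENS_STD_DEV_RATIO — none of these exist in the simulator yet): sample the response size from a normal distribution centred on a percentage of max_tokens, clipped to the valid range.

```python
import random

# Hypothetical config values -- names are illustrative, not part of the simulator.
DEFAULT_MAX_TOKENS = 512      # fallback when the request omits max_tokens
MEAN_TOKENS_RATIO = 0.6       # mean response size as a fraction of max_tokens
TOKENS_STD_DEV_RATIO = 0.15   # std dev as a fraction of max_tokens


def sample_completion_tokens(request_max_tokens=None):
    """Sample a response size from a normal distribution, clipped to [1, max_tokens]."""
    max_tokens = request_max_tokens or DEFAULT_MAX_TOKENS
    mean = MEAN_TOKENS_RATIO * max_tokens
    std_dev = TOKENS_STD_DEV_RATIO * max_tokens
    sampled = int(random.gauss(mean, std_dev))
    # Clip so we never exceed the request limit or return an empty response
    return max(1, min(sampled, max_tokens))
```

If the per-model or per-deployment question is answered with "yes", the three constants could become a lookup keyed by deployment name, with a global default.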