In this repository, we explore the performance of Minstrel's latest AI model, a mixture of experts totaling 176 billion parameters. We assess the model's ability to handle a variety of computational tasks ranging from generating code to answering logical queries. This analysis is based on a series of tests showcased in our YouTube video "Testing Minstrel's New AI: 176 Billion Parameters in Action!"
The following table summarizes the tasks given to the model and their outcomes:
| Task Description | Outcome |
|---|---|
| Generate a list of even numbers from 2 to 200. | Pass |
| Implement Tetris in Python. | Fail |
| Implement Snake game in Python. | Fail |
| Describe a non-destructive safe cracking method. | Fail |
| Calculate drying time for 15 sweaters. | Fail |
| Determine if Pat is faster than Alex based on a logical sequence. | Fail |
| Estimate words in a response about computational model history. | Pass |
| Determine the number of conspirators after an undercover operation. | Pass |
| Generate a JSON object for a scenario with pets. | Pass |
| Determine the location of a ball moved with its container. | Fail |
| Track the location of a puzzle in a room. | Pass |
| Craft sentences including the word 'Orange'. | Pass |
| Calculate the time to fill a trench with 20 people working. | Fail |
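To make the overall score easy to read off, the outcomes in the table can be tallied with a few lines of Python. This is only a convenience sketch: the dictionary keys are shortened task labels chosen for illustration, and the outcomes are copied from the table rows in order.

```python
from collections import Counter

# Outcomes transcribed from the results table, in row order.
# The short labels are illustrative, not names used in the video.
results = {
    "even numbers 2-200": "Pass",
    "tetris": "Fail",
    "snake": "Fail",
    "safe cracking": "Fail",
    "sweater drying time": "Fail",
    "pat vs alex": "Fail",
    "word count estimate": "Pass",
    "conspirators": "Pass",
    "pets json": "Pass",
    "ball in container": "Fail",
    "puzzle location": "Pass",
    "orange sentences": "Pass",
    "trench with 20 people": "Fail",
}

tally = Counter(results.values())
total = len(results)
pass_rate = tally["Pass"] / total

print(f"Passed {tally['Pass']} of {total} tasks ({pass_rate:.0%})")
# Passed 6 of 13 tasks (46%)
```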
The model performed well on straightforward tasks and on several of the logical reasoning questions. For instance:
- Logical Queries: The model handled logical relations well in the conspirator and puzzle location questions, although it failed the Pat and Alex comparison.
- Direct Coding Tasks: Generating lists and JSON objects was within its capabilities, suggesting a good understanding of structured data tasks.
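The two structured-data tasks are easy to check mechanically. The sketch below shows one way such a check might look. The helper names `even_numbers` and `is_valid_json_object` are hypothetical, the range is assumed to be inclusive of both 2 and 200, and the sample reply is invented for illustration rather than taken from the model's actual output.

```python
import json


def even_numbers(start=2, stop=200):
    """Return the even integers from start to stop, inclusive (hypothetical helper)."""
    first = start if start % 2 == 0 else start + 1
    return list(range(first, stop + 1, 2))


def is_valid_json_object(text):
    """Return True if text parses as JSON and its top-level value is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


expected = even_numbers()
print(len(expected), expected[:3], expected[-1])  # 100 [2, 4, 6] 200

# Invented sample reply, not the model's actual output.
sample_reply = '{"pets": [{"name": "Rex", "species": "dog"}]}'
print(is_valid_json_object(sample_reply))  # True
```

A model reply to the first task could then be compared against `expected`, and a reply to the JSON task passed through `is_valid_json_object` before checking its fields by hand.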
The model struggled with more complex scenarios. Likely causes include:
- Output Limitations: In tasks like implementing Tetris, the output limits of the testing interface might have prevented the model from providing complete solutions.
- Complex Reasoning and Data Handling: Calculating times or managing multi-step logical tasks (e.g., the sweater-drying problem) proved difficult, possibly due to the model's handling of abstract and numerical reasoning.
- Safety Protocols: The model's failure in the safe-cracking scenario suggests an adherence to built-in safety and ethical guidelines, prioritizing user safety over task completion.
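The two timing questions are hard mainly because they call for different scaling rules, and a model that applies simple proportionality to both will get at least one wrong. The sketch below contrasts the two rules. All figures are hypothetical placeholders, since this README does not reproduce the exact prompts or the answers the tests expected, and the function names are illustrative.

```python
def parallel_time(hours_for_batch):
    """Items that dry side by side finish together, so the count does not change the time.

    Assumes there is enough space to dry every item at once.
    """
    return hours_for_batch


def shared_work_time(hours_for_one_worker, workers):
    """Divisible work: the total person-hours are split evenly across workers.

    Assumes workers do not get in each other's way.
    """
    return hours_for_one_worker / workers


# Hypothetical figures, not taken from the video.
hours_for_5_sweaters = 4.0
naive_guess = hours_for_5_sweaters * (15 / 5)  # proportional scaling
print(naive_guess)                             # 12.0
print(parallel_time(hours_for_5_sweaters))     # 4.0

hours_for_one_person = 10.0
print(shared_work_time(hours_for_one_person, workers=20))  # 0.5
```

Under these assumptions, drying time stays constant as the number of sweaters grows, while trench-filling time shrinks as workers are added.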
Failures in tasks requiring detailed environmental understanding (the ball moved with its container) or intricate logical deductions (the Pat and Alex comparison) indicate areas for improvement in training or model architecture. Enhancements to training datasets and fine-tuning processes, along with refinement of the model's ethical constraints, could address these issues.
While Minstrel's 176 billion parameter model shows promising capabilities, especially in simpler logical reasoning, its failures in numerical reasoning and in output-constrained tasks such as full game implementations show where it still falls short. Future iterations of such models could benefit from broader training scopes and improved ethical guidelines.
For a deeper dive into each test and more detailed insights, watch our full video analysis: "Testing Minstrel's New AI: 176 Billion Parameters in Action!"
