This task challenges an LLM to implement a fused SwiGLU activation layer in PyTorch. SwiGLU is a core component of modern Transformer architectures such as Llama.
- Model Tested: `claude-3-haiku-20240307`
- Sample Size: 100 iterations
- Pass Rate: about 25%
- Key Challenge: The model must not only implement the mathematical formula $SwiGLU(x) = SiLU(xW + b) \otimes (xV + c)$, where $\otimes$ is the element-wise product, but also fulfill the engineering constraint of weight fusion (using a single `nn.Linear` for both $W$ and $V$).
- Set `ANTHROPIC_API_KEY`.
- Run `uv run main.py`.
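
For reference, the weight-fusion constraint named under Key Challenge can be sketched as follows. This is a minimal illustration, not the reference solution used by the task: the class name `FusedSwiGLU`, the attribute `proj`, the parameters `in_features` and `hidden_features`, and the choice to treat the first half of the fused output as the gate are all assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedSwiGLU(nn.Module):
    """Hypothetical sketch: SwiGLU(x) = SiLU(xW + b) * (xV + c) with W and V fused."""

    def __init__(self, in_features: int, hidden_features: int, bias: bool = True):
        super().__init__()
        # A single nn.Linear holds both W and V (and both biases b and c),
        # so its output width is 2 * hidden_features.
        self.proj = nn.Linear(in_features, 2 * hidden_features, bias=bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the fused projection along the last dimension.
        # Assumption: the first half is the gate (xW + b), the second is (xV + c).
        gate, value = self.proj(x).chunk(2, dim=-1)
        # Element-wise product of the SiLU-activated gate and the value branch.
        return F.silu(gate) * value


layer = FusedSwiGLU(in_features=8, hidden_features=16)
x = torch.randn(4, 8)
y = layer(x)
print(y.shape)  # torch.Size([4, 16])
```

A single matrix multiply followed by `chunk` replaces two separate projections, which is the point of fusing the weights. The output's last dimension is `hidden_features`, half the width of the fused projection.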