I would be quite interested in how to construct the DeepSeek-R1 model with Burn, and in how it performs.