We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both support a context length of one million tokens.
It would be great to run DeepSeek V4 with llama.cpp!