Development Roadmap (H4) #259

@shijieliu

Here is the development roadmap for 2025 H4 (December 2025 to February/March 2026).

Focus

  • Enhance DynamicEmb with a no-eviction table and full inference support, allowing users to train with DynamicEmb and deploy seamlessly in Python or C++.
  • Strengthen inference capabilities, particularly for the C++ runtime, by adding support for DynamicEmb and HSTU inference examples using the C++ pipeline.
  • Enable context parallelism for HSTU training to support ultra-long sequences and address the imbalance issue in HSTU training, laying the groundwork for the ultra-large model benchmark.
  • Introduce and optimize the first Semantic ID-based model, including improvements to beam search for better evaluation/inference efficiency.

Roadmap

  Dec Release | Jan Release | Feb/March Release | Long-Term
DynamicEmb
  • Replace HKV with scored hash table ([FEA] Replacement of HKV with ScoredHashTable #252)
  • Deterministic mode ([FEA] support deterministic mode for dynamicemb #238)
  • Dynamic resizing + no-eviction table ([FEA] Dynamic resizing of embedded tables #243)
  • Python/C++ inference support
  • Incremental dump checkpointing ([FEA] New checkpoint format of dynamicemb #258)
  • Fuse multiple tables with the same dimension
HSTU attention
  • Use FBGEMM HSTU attention, including Blackwell and FP8 support ([FEA] Use hstu kernel in FBGEMM #60)
HSTU example training
  • Dynamic batching ([FEA] Balanced jagged compute&memory among DP ranks #207)
  • Context parallelism ([FEA] Support HSTU Context parallelism in training #7)
HSTU example inference
  • NVIDIA Triton Server HSTU model support ([FEA] HSTU inference TritonServer support #161)
  • KV cache manager optimization ([FEA] HSTU KVCache version 2 #237)
  • C++ runtime support for HSTU inference ([FEA] HSTU Inference in Torch Cpp Runtime #221)
  • GEMM + SiLU fusion
  • Multi-stream KV cache manager support
Semantic ID example training & inference
  • Decoder-based semantic model training ([FEA] [SID] Add decorder-only sequential gr example #240)
  • Support vanilla attention + custom mask + jagged
  • Beam search optimization