Question about run_unit_test.sh test results #12

@PXHnb666

run_unit_test.sh
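Hello, I modified run_unit_test.sh (for gqa in 4; do, and --kv-heads 8 \) and then ran the tests on an A100 with for seqlen in 2048 8192 16384 32768 65536 131072; do
My measured results are below. I have two questions:
1. At 8k, FSA is slower than NSA in Forward, Backward, and 1F1B Total, and at 2k it is also slower than NSA. Is something wrong with my test file?
2. Figure 2 of the FSA paper shows that FSA has a clear advantage in memory access volume, but does that advantage not carry over to GPU memory footprint? In my results, FSA's Memory Usage is never lower than NSA's.

I am new to attention mechanisms, so please forgive anything in these questions that is imprecise.
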
Configuration: seqlen=2048, block-size=64, topk=16, gqa=4
ℹ️ 📊 Performance Breakdown (ms):
┌─────────────┬──────────┬──────────┬─────────────┐
│ Phase │ NSA │ FSA │ Speedup │
├─────────────┼──────────┼──────────┼─────────────┤
│ Forward │ 9.36 │ 21.47 │ 0.44x │
│ Backward │ 10.17 │ 33.62 │ 0.30x │
│ 1F1B Total │ 19.54 │ 55.09 │ 0.35x │
└─────────────┴──────────┴──────────┴─────────────┘

ℹ️ 💾 Memory Usage Analysis:
ℹ️ Forward Memory Usage: NSA=0.38GB, FSA=0.39GB (FSA uses: 0.00GB more memory)
ℹ️ 1F1B Memory Usage: NSA=0.45GB, FSA=0.45GB (FSA uses: 0.00GB more memory)

Configuration: seqlen=8192, block-size=64, topk=16, gqa=4
📊 Performance Breakdown (ms):
┌─────────────┬──────────┬──────────┬─────────────┐
│ Phase │ NSA │ FSA │ Speedup │
├─────────────┼──────────┼──────────┼─────────────┤
│ Forward │ 22.97 │ 26.06 │ 0.88x │
│ Backward │ 34.79 │ 50.45 │ 0.69x │
│ 1F1B Total │ 57.76 │ 76.51 │ 0.75x │
└─────────────┴──────────┴──────────┴─────────────┘

ℹ️ 💾 Memory Usage Analysis:
ℹ️ Forward Memory Usage: NSA=1.11GB, FSA=1.11GB (FSA uses: 0.00GB more memory)
ℹ️ 1F1B Memory Usage: NSA=1.17GB, FSA=1.18GB (FSA uses: 0.00GB more memory)

Configuration: seqlen=16384, block-size=64, topk=16, gqa=4
ℹ️ 📊 Performance Breakdown (ms):
┌─────────────┬──────────┬──────────┬─────────────┐
│ Phase │ NSA │ FSA │ Speedup │
├─────────────┼──────────┼──────────┼─────────────┤
│ Forward │ 46.98 │ 40.42 │ 1.16x │
│ Backward │ 73.51 │ 68.89 │ 1.07x │
│ 1F1B Total │ 120.49 │ 109.30 │ 1.10x │
└─────────────┴──────────┴──────────┴─────────────┘

ℹ️ 💾 Memory Usage Analysis:
ℹ️ Forward Memory Usage: NSA=2.09GB, FSA=2.11GB (FSA uses: 0.02GB more memory)
ℹ️ 1F1B Memory Usage: NSA=2.14GB, FSA=2.15GB (FSA uses: 0.01GB more memory)

Configuration: seqlen=32768, block-size=64, topk=16, gqa=4
ℹ️ 📊 Performance Breakdown (ms):
┌─────────────┬──────────┬──────────┬─────────────┐
│ Phase │ NSA │ FSA │ Speedup │
├─────────────┼──────────┼──────────┼─────────────┤
│ Forward │ 99.58 │ 79.07 │ 1.26x │
│ Backward │ 162.84 │ 137.36 │ 1.19x │
│ 1F1B Total │ 262.41 │ 216.43 │ 1.21x │
└─────────────┴──────────┴──────────┴─────────────┘

ℹ️ 💾 Memory Usage Analysis:
ℹ️ Forward Memory Usage: NSA=4.08GB, FSA=4.37GB (FSA uses: 0.29GB more memory)
ℹ️ 1F1B Memory Usage: NSA=4.07GB, FSA=4.09GB (FSA uses: 0.02GB more memory)

Configuration: seqlen=65536, block-size=64, topk=16, gqa=4
ℹ️ 📊 Performance Breakdown (ms):
┌─────────────┬──────────┬──────────┬─────────────┐
│ Phase │ NSA │ FSA │ Speedup │
├─────────────┼──────────┼──────────┼─────────────┤
│ Forward │ 239.90 │ 199.96 │ 1.20x │
│ Backward │ 375.04 │ 294.62 │ 1.27x │
│ 1F1B Total │ 614.94 │ 494.59 │ 1.24x │
└─────────────┴──────────┴──────────┴─────────────┘

ℹ️ 💾 Memory Usage Analysis:
ℹ️ Forward Memory Usage: NSA=8.06GB, FSA=9.72GB (FSA uses: 1.67GB more memory)
ℹ️ 1F1B Memory Usage: NSA=7.94GB, FSA=7.97GB (FSA uses: 0.03GB more memory)

Configuration: seqlen=131072, block-size=64, topk=16, gqa=4
ℹ️ 📊 Performance Breakdown (ms):
┌─────────────┬──────────┬──────────┬─────────────┐
│ Phase │ NSA │ FSA │ Speedup │
├─────────────┼──────────┼──────────┼─────────────┤
│ Forward │ 639.56 │ 590.68 │ 1.08x │
│ Backward │ 891.85 │ 711.37 │ 1.25x │
│ 1F1B Total │ 1531.41 │ 1302.04 │ 1.18x │
└─────────────┴──────────┴──────────┴─────────────┘

ℹ️ 💾 Memory Usage Analysis:
ℹ️ Forward Memory Usage: NSA=16.01GB, FSA=23.50GB (FSA uses: 7.48GB more memory)
ℹ️ 1F1B Memory Usage: NSA=15.68GB, FSA=17.64GB (FSA uses: 1.97GB more memory)
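
To make the two questions easier to read off the logs, the sketch below recomputes the 1F1B speedup and the extra forward memory from the numbers reported above. It assumes the Speedup column is NSA time divided by FSA time, which matches the printed values; the names `results`, `speedup`, and `crossover` are hypothetical helpers and not part of run_unit_test.sh. Because the inputs are already rounded, the derived differences can be off by 0.01 from the log lines.

```python
# 1F1B Total (ms) and Forward Memory Usage (GB), copied from the logs above.
results = {
    2048:   {"nsa_ms": 19.54,   "fsa_ms": 55.09,   "nsa_gb": 0.38,  "fsa_gb": 0.39},
    8192:   {"nsa_ms": 57.76,   "fsa_ms": 76.51,   "nsa_gb": 1.11,  "fsa_gb": 1.11},
    16384:  {"nsa_ms": 120.49,  "fsa_ms": 109.30,  "nsa_gb": 2.09,  "fsa_gb": 2.11},
    32768:  {"nsa_ms": 262.41,  "fsa_ms": 216.43,  "nsa_gb": 4.08,  "fsa_gb": 4.37},
    65536:  {"nsa_ms": 614.94,  "fsa_ms": 494.59,  "nsa_gb": 8.06,  "fsa_gb": 9.72},
    131072: {"nsa_ms": 1531.41, "fsa_ms": 1302.04, "nsa_gb": 16.01, "fsa_gb": 23.50},
}


def speedup(nsa_ms, fsa_ms):
    """Assumed definition of the Speedup column: NSA time / FSA time."""
    return nsa_ms / fsa_ms


def crossover(table):
    """Smallest seqlen at which FSA is at least as fast as NSA, else None."""
    for seqlen in sorted(table):
        row = table[seqlen]
        if speedup(row["nsa_ms"], row["fsa_ms"]) >= 1.0:
            return seqlen
    return None


for seqlen in sorted(results):
    row = results[seqlen]
    extra_gb = row["fsa_gb"] - row["nsa_gb"]          # extra forward memory used by FSA
    extra_pct = 100.0 * extra_gb / row["nsa_gb"]      # same, relative to NSA
    print(f"seqlen={seqlen:>6}  1F1B speedup={speedup(row['nsa_ms'], row['fsa_ms']):.2f}x  "
          f"extra fwd mem={extra_gb:+.2f}GB ({extra_pct:+.1f}%)")

print("first seqlen where FSA >= NSA:", crossover(results))
```

On this data the crossover falls at seqlen=16384, while the extra forward memory used by FSA grows with sequence length.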
