Tencent Stem Cuts LLM Latency 3.6x
Tencent's new Stem sparse attention algorithm reduces first-token latency by 3.6x while maintaining near-dense accuracy …