@techhe That doesn't sound normal. I have 32 GB of RAM, and Qwen3.6-35B-A3B-4bit runs at a decent speed for me.
oMLX - LLM inference, optimized for your Mac
Benchmark Model: Qwen3.6-35B-A3B-4bit
Engine: Auto
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS       tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1265.0    15.99     809.5 tok/s  63.0 tok/s  3.296   349.5 tok/s  19.20 GB
pp4096/tg128  6128.9    16.64     668.3 tok/s  60.6 tok/s  8.242   512.5 tok/s  19.89 GB
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS       pp TPS/req   TTFT(ms)  E2E(s)
1x     63.0 tok/s   1.00x    809.5 tok/s  809.5 tok/s  1265.0    3.296
2x     119.1 tok/s  1.89x    461.2 tok/s  230.6 tok/s  2847.9    6.591
4x     235.4 tok/s  3.74x    370.4 tok/s  92.6 tok/s   6131.4    13.234
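
If you want to sanity-check your own run against these numbers, here is a rough Python sketch that recomputes the derived columns from TTFT and TPOT. The formulas are my own inference from the figures above, not taken from oMLX's source: I'm assuming the first generated token arrives at TTFT (so only tg - 1 tokens are paced by TPOT) and that Throughput counts prompt plus generated tokens over E2E. The function names are made up for illustration.

```python
def single_request_metrics(pp, tg, ttft_ms, tpot_ms):
    """Recompute derived benchmark columns (assumed relationships, not oMLX's code)."""
    ttft_s = ttft_ms / 1000.0
    # Assumption: the first output token lands at TTFT, so only the
    # remaining tg - 1 tokens are spaced TPOT apart.
    decode_s = (tg - 1) * tpot_ms / 1000.0
    e2e_s = ttft_s + decode_s
    return {
        "pp_tps": pp / ttft_s,                # prompt tokens per second
        "tg_tps": tg / decode_s,              # generated tokens per second
        "e2e_s": e2e_s,                       # end-to-end latency in seconds
        "throughput_tps": (pp + tg) / e2e_s,  # all tokens over total time
    }


def batch_scaling(tg_tps_by_batch):
    """Return {batch: (speedup, efficiency)} relative to the batch-1 tg TPS."""
    base = tg_tps_by_batch[1]
    return {
        batch: (tps / base, tps / base / batch)
        for batch, tps in tg_tps_by_batch.items()
    }


# Values copied from the benchmark output above.
for pp, ttft_ms, tpot_ms in [(1024, 1265.0, 15.99), (4096, 6128.9, 16.64)]:
    m = single_request_metrics(pp, 128, ttft_ms, tpot_ms)
    print(f"pp{pp}/tg128:", {k: round(v, 3) for k, v in m.items()})

for batch, (speedup, eff) in batch_scaling({1: 63.0, 2: 119.1, 4: 235.4}).items():
    print(f"{batch}x: speedup {speedup:.2f}x, efficiency {eff:.2f}")
```

Under those assumptions the recomputed values agree with the single-request table to within rounding, and the batching efficiency (speedup divided by batch size) comes out at about 0.95 for 2x and 0.93 for 4x. So generation scales almost linearly with batch size here, while TTFT per request grows quickly.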