abhikul0 2 days ago

Ran llama-bench on my M3 Pro with `--n-depth 0,8192,16384 --n-prompt 2048 --n-gen 256 --batch-size 2048 -ub 2048`:

| model | size | params | backend | threads | n_ubatch | test | t/s |
| ------------------------------- | ---------: | ---------: | ---------- | ------: | -------: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | pp2048 | 512.97 ± 0.33 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | tg256 | 25.92 ± 0.23 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | pp2048 @ d8192 | 397.20 ± 2.32 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | tg256 @ d8192 | 22.56 ± 0.36 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | pp2048 @ d16384 | 313.67 ± 0.63 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | tg256 @ d16384 | 20.45 ± 0.04 |

I sure do want that silicon now haha.
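A quick back-of-the-envelope on the numbers above: going from depth 0 to d16384, prompt processing loses roughly 39% of its throughput and generation roughly 21%. A minimal sketch, using only the mean t/s values from the table (error bars ignored):

```python
# Mean t/s from the llama-bench table above, keyed by test and context depth.
results = {
    "pp2048": {0: 512.97, 8192: 397.20, 16384: 313.67},
    "tg256":  {0: 25.92,  8192: 22.56,  16384: 20.45},
}

for test, by_depth in results.items():
    base = by_depth[0]  # depth-0 run is the baseline
    for depth, tps in by_depth.items():
        slowdown = 100 * (1 - tps / base)  # % throughput lost vs. depth 0
        print(f"{test} @ d{depth}: {tps:.2f} t/s ({slowdown:.1f}% below d0)")
```

Generation degrades more gently than prompt processing here, which fits the usual pattern: pp is compute-bound and pays for attention over the whole context, while tg is mostly memory-bandwidth-bound.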