Show HN: 1gbps Tokenizer written in Assembly. 20x faster than HuggingFace (github.com)
3 points by dogmaticdev a day ago
dogmaticdev a day ago
I wrote this tokenizer using SSE2 SIMD instructions. It takes text, removes whitespace, and separates strings using a null terminator.
I didn't bother making it multithreaded, since it is already very fast. Maybe I will one day.
Stats:
10448 bytes in 11302 nanoseconds: 10448 ÷ 0.000011302 = 923620933.5 bytes/s, or 923 MB/s
31346 bytes in 32241 nanoseconds: 31346 ÷ 0.000032241 = 972240315.1 bytes/s, or 972 MB/s
As you can see, it approaches 1 byte per nanosecond as more text is parsed.
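For anyone who wants to redo the arithmetic, here is a minimal sketch in C of the throughput calculation. The function name `bytes_per_second` is my own and not taken from the repo. Going by the figures as posted, the first run works out closer to 924 MB/s than 923, so one of the posted numbers may contain a small typo; the second run matches at about 972 MB/s.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in bytes per second for `bytes` processed in `nanos`
   nanoseconds. Hypothetical helper, named here for illustration only. */
double bytes_per_second(uint64_t bytes, uint64_t nanos) {
    assert(nanos > 0); /* avoid dividing by zero */
    return (double)bytes * 1e9 / (double)nanos;
}
```

Calling `bytes_per_second(31346, 32241)` gives the second figure from the post; dividing the result by 1e6 converts it to MB/s.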
aetherspawn 14 hours ago
You should have a go at writing it with SSE intrinsics. You might find that letting the compiler's optimiser have a crack at it makes it even faster. Or at least it will be easier to call.
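To make the intrinsics suggestion concrete, here is a rough sketch in C of what such a version could look like. It is not the author's assembly, and several things are assumptions on my part: the name `tokenize_sse2`, the whitespace set (space, tab, LF, CR), and the choice to collapse each run of whitespace into a single null terminator. It checks 16 bytes at a time with SSE2 compares, copies the block straight through when it contains no whitespace, and falls back to a byte-by-byte loop otherwise.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Assumed whitespace set: space, tab, LF, CR. */
static int is_ws(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Scalar step: copy a token byte, or end the current token with '\0'. */
static size_t emit_byte(char *dst, size_t out, char c, int *in_token) {
    if (!is_ws((unsigned char)c)) {
        dst[out++] = c;
        *in_token = 1;
    } else if (*in_token) {
        dst[out++] = '\0';
        *in_token = 0;
    }
    return out;
}

/* Writes the tokens of src[0..len) into dst, each followed by '\0'.
   dst must have room for len + 1 bytes. Returns the bytes written. */
size_t tokenize_sse2(const char *src, size_t len, char *dst) {
    assert(src != NULL && dst != NULL);
    const __m128i sp  = _mm_set1_epi8(' ');
    const __m128i tab = _mm_set1_epi8('\t');
    const __m128i lf  = _mm_set1_epi8('\n');
    const __m128i cr  = _mm_set1_epi8('\r');
    size_t i = 0, out = 0;
    int in_token = 0;

    while (i + 16 <= len) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        /* 0xFF in every lane that holds a whitespace byte. */
        __m128i ws = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(v, sp), _mm_cmpeq_epi8(v, tab)),
            _mm_or_si128(_mm_cmpeq_epi8(v, lf), _mm_cmpeq_epi8(v, cr)));
        if (_mm_movemask_epi8(ws) == 0) {
            /* Fast path: no whitespace, copy all 16 bytes at once. */
            _mm_storeu_si128((__m128i *)(dst + out), v);
            out += 16;
            in_token = 1;
        } else {
            /* Slow path: handle this block one byte at a time. */
            for (size_t k = 0; k < 16; k++)
                out = emit_byte(dst, out, src[i + k], &in_token);
        }
        i += 16;
    }
    /* Tail shorter than 16 bytes. */
    for (; i < len; i++)
        out = emit_byte(dst, out, src[i], &in_token);
    if (in_token)
        dst[out++] = '\0'; /* terminate the final token */
    return out;
}
```

Since it is a plain C function, it can be called from any C or C++ program without extra glue, which is the "easier to call" part. Whether it would beat hand-written assembly is something only a benchmark on the same input can tell.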