Python, Rust, Zig
Tech Blog
Short Stories Writer
Sr Data Engineer @ Singaporean Bank
Twitter Handle
Why Is Python Slow?
Use Zig/C to bypass CPython restrictions
Python
C
Let's understand with a few examples
add.c
add.py
Result
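The add example follows the usual ctypes pattern: load a shared library, declare the C signature, then call the function like a Python callable. The compiled add library is not reproduced here, so the sketch below applies the same pattern to `abs` from the C standard library. It assumes a POSIX system (Linux or macOS) where the C runtime can be loaded this way.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system. If find_library returns None, CDLL(None) exposes
# the symbols already loaded into the current process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: int abs(int)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-42)
print(result)
```

For a library built from add.c, the same three steps would apply, with the path to that shared library passed to `ctypes.CDLL`.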
concat.c
concat_c.py
Result
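Strings need more care than integers, because C writes into raw memory and Python `bytes` objects are immutable. As a rough sketch of that idea, the code below uses `strcat` from the C standard library as a stand-in for the concat example (the `concat` helper name is hypothetical, and a POSIX system is assumed). The destination is a writable buffer sized for both strings plus the terminating NUL.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system, libc reachable through ctypes.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strcat(char *dest, const char *src)
libc.strcat.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.strcat.restype = ctypes.c_char_p


def concat(left: str, right: str) -> str:
    """Concatenate two strings by letting C append into a writable buffer."""
    left_b = left.encode("utf-8")
    right_b = right.encode("utf-8")
    # Mutable buffer initialised with `left`, with room for `right` and the NUL.
    buf = ctypes.create_string_buffer(left_b, len(left_b) + len(right_b) + 1)
    libc.strcat(buf, right_b)
    return buf.value.decode("utf-8")


print(concat("Hello, ", "World"))
```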
prime.c
prime.py
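A prime check is a typical CPU-bound loop where the interpreter overhead of CPython shows up clearly. The pure-Python baseline below is only a sketch, assuming the example counts primes by trial division; the function names `is_prime` and `count_primes` are hypothetical and are not taken from prime.c or prime.py.

```python
def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def count_primes(limit: int) -> int:
    """Count the primes strictly below `limit`."""
    return sum(1 for n in range(limit) if is_prime(n))


print(count_primes(100))
```

Every iteration of the inner loop goes through the bytecode interpreter, which is the kind of work a C or Zig function called through ctypes avoids.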
Let's build the fastest LLM tokenizer library using ctypes
[3957, 735, 37432, 18, 555, 2394, 1003, 519, 841, 301, 11007, 949]
Ref: https://ziglang.org
Note: the size param in token_ranker is the length of the name string
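When a string crosses into C together with an explicit length, the length that matters is the encoded byte count, which differs from the Python character count for non-ASCII text. The helper below is a small sketch of that preparation step. It assumes `token_ranker` takes a character pointer and a byte count; the helper name `name_args` and that calling convention are assumptions, not the library's documented signature.

```python
from typing import Tuple


def name_args(name: str) -> Tuple[bytes, int]:
    """Return the UTF-8 bytes of `name` and their length in bytes."""
    raw = name.encode("utf-8")
    return raw, len(raw)


raw, size = name_args("naïve")
# Hypothetical call shape (not executed here): lib.token_ranker(raw, size)
print(len("naïve"), size)
```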
Performance
File | Size | No. of Words |
---|---|---|
Naatu Naatu Song Lyrics | 1.5KB | 202 |
Pycon Code of Conduct | 5.2KB | 757 |
Oxford Dictionary | 4.5MB | 731K |
Note: Zig uses PCRE2 for Regex operations
Note: All implementations are run on a single thread
The Rust implementation is the Tiktoken library
Rust (Tiktoken) uses fancy-regex for regex operations
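For single-threaded comparisons like these, a simple way to time each implementation is to call it several times on the same input and keep the fastest run. The harness below is a minimal sketch of that approach, not the setup used to produce the numbers in this talk; `best_time` is a hypothetical helper and `str.split` merely stands in for a tokenizer's encode function.

```python
import time
from typing import Callable


def best_time(fn: Callable[[str], object], text: str, repeats: int = 5) -> float:
    """Run fn(text) `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text)
        best = min(best, time.perf_counter() - start)
    return best


sample = "naatu naatu " * 100
elapsed = best_time(str.split, sample)
print(f"best of 5: {elapsed:.6f} s")
```

Taking the minimum rather than the mean reduces the influence of background noise on short runs.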