Ask HN: Can tokenizers go from fixed-length tokens to varying-length ones?

3 points by jinen83 - 276 Days, 7 Hours ago Hacker News

I am going through a workshop on building an LLM from the ground up. While studying tokenizers like BPE, I got curious: why not use ideas from encoding techniques like Huffman coding to make better-optimized encoders? As it stands, a token like '.' has the same size as the word 'Algorithm'. Could taking their frequency of occurrence and their size into account save us some GPUs?
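To make the idea concrete, here is a minimal sketch of Huffman coding applied to token frequencies. The token stream and the helper name `huffman_code_lengths` are made up for illustration; this is not how BPE or any real tokenizer assigns IDs, just the textbook Huffman construction run over a toy vocabulary.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if len(freqs) == 1:
        # A single symbol still needs one bit to be written down.
        return {sym: 1 for sym in freqs}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them gets one bit deeper.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical toy token stream: "." is common, "Algorithm" is rare.
tokens = ["."] * 8 + ["the"] * 4 + ["is"] * 2 + ["Algorithm"] + ["token"]
freqs = Counter(tokens)
lengths = huffman_code_lengths(freqs)

huffman_bits = sum(freqs[t] * lengths[t] for t in freqs)
fixed_bits = len(tokens) * 3  # 5 distinct tokens need 3 bits each at fixed width

print(lengths)
print(huffman_bits, "bits with Huffman vs", fixed_bits, "bits at fixed width")
```

In this toy case '.' gets a 1-bit code and 'Algorithm' a 4-bit one, so the encoded stream is smaller. One caveat I am fairly sure of: this shrinks the stored or transmitted token IDs, while the model itself still processes one embedding per token, so whether it would translate into GPU savings is exactly the open part of the question.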
