Add get_encoding_name_for_model to tiktoken #136

noseworthy · 2025-02-16T16:01:40Z

The tiktoken-js library includes a very helpful function,
getEncodingNameForModel(). In the Rust-based tiktoken package, the
equivalent lookup is buried in the implementation of
encoding_for_model().

This function is very useful when implementing an encoding cache keyed
on the model in use. Mapping model -> encoding name and then caching by
encoding name conserves resources, since many models reuse the same
encoding.
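
As a rough illustration of that caching pattern, here is a minimal sketch. The names `lookupEncodingName`, `loadEncoding`, and `getCachedEncodingForModel`, the `Encoding` type, and the small model table are hypothetical stand-ins so the snippet runs on its own; in real code the lookup would be the library's model-to-encoding-name function and the loader would build the actual tokenizer.

```typescript
// Hypothetical stand-in for a tokenizer instance (expensive to build in practice).
type Encoding = { name: string };

// Illustrative subset of a model -> encoding name table (not the library's full list).
const MODEL_TO_ENCODING = new Map<string, string>([
  ["gpt-4", "cl100k_base"],
  ["gpt-3.5-turbo", "cl100k_base"],
  ["gpt-4o", "o200k_base"],
]);

// Stand-in for the library's model -> encoding name lookup; throws on unknown models.
function lookupEncodingName(model: string): string {
  const name = MODEL_TO_ENCODING.get(model);
  if (name === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return name;
}

// Stand-in for the expensive step; counts how many encodings were actually built.
let loadCount = 0;
function loadEncoding(name: string): Encoding {
  loadCount += 1;
  return { name };
}

// Cache keyed by encoding name, so models that share an encoding share one instance.
const encodingCache = new Map<string, Encoding>();

function getCachedEncodingForModel(model: string): Encoding {
  const name = lookupEncodingName(model);
  let encoding = encodingCache.get(name);
  if (encoding === undefined) {
    encoding = loadEncoding(name);
    encodingCache.set(name, encoding);
  }
  return encoding;
}
```

With this shape, requesting "gpt-4" and then "gpt-3.5-turbo" builds the encoding once, because both resolve to the same cache key. A cache keyed on the model name would have built it twice.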

I've exposed a new get_encoding_name_for_model() function that behaves
similarly to the one in the tiktoken-js package, and encoding_for_model()
now calls it internally.

Finally, I've also added a test to ensure that this function can be
called from TypeScript code, and that it throws an exception for
invalid model names.

Fixes: #123

noseworthy · 2025-02-17T16:11:24Z

Hey, @dqbd and @jens-f 👋

I think this should fix #123. It's a feature we've been looking for as well, so I figured I'd take a crack at it.

I'm not a Rust developer, so please excuse any obvious blunders on my part. I'd love to know what you think. It'd be awesome to have this functionality exposed!

Thanks in advance for your consideration of the PR 🙏

noseworthy force-pushed the expose-encoding-name-for-model branch from e405f8b to e512c0d · February 17, 2025 16:22
