
My assumption is that this gives you the ability to encode vectors locally. That's useful if you're not using an API service to build your vectors.
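For a sense of what local encoding looks like in practice, here's a minimal sketch using a sentence-transformers-style workflow; the library and model name are illustrative, not taken from the submission:

  # Sketch of local embedding: weights are downloaded once, then
  # encoding runs on your own machine with no API call per query.
  # Library and model name are assumptions, not from the thread.
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")
  vectors = model.encode(["no API key or network call needed at query time"])
  print(vectors.shape)  # (1, 384) for this particular model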



Transformer inference is ~60 lines of NumPy [0] (closer to 500 once you add tokenization, etc.). It would be nice to have just that and not all of PyTorch and Transformers.

[0] https://jaykmody.com/blog/gpt-from-scratch/
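For reference, the core of that post is scaled dot-product attention. A rough NumPy sketch in the same spirit (not the post's exact code; the weight names and toy shapes here are made up):

  import numpy as np

  def softmax(x):
      # numerically stable softmax over the last axis
      e = np.exp(x - x.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  def attention(q, k, v, mask):
      # scaled dot-product attention: softmax(q k^T / sqrt(d) + mask) v
      return softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask) @ v

  def causal_self_attention(x, w_qkv, w_out):
      # x: [seq_len, d_model]; w_qkv projects to queries, keys, values
      q, k, v = np.split(x @ w_qkv, 3, axis=-1)
      # causal mask: position i only attends to positions <= i
      mask = (1 - np.tri(x.shape[0])) * -1e10
      return attention(q, k, v, mask) @ w_out

  # toy usage with random weights: 4 tokens, model dim 8
  rng = np.random.default_rng(0)
  x = rng.standard_normal((4, 8))
  out = causal_self_attention(x, rng.standard_normal((8, 24)), rng.standard_normal((8, 8)))
  print(out.shape)  # (4, 8)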


It's 60 lines for CPU-only inference, which will be slow. If you want GPU acceleration, it'll be a lot more than 60 lines.


What about models besides GPT? Most of the popular vector-encoding models are BERT-style encoders, not decoder-only models like GPT.

If you really don't want PyTorch/Transformers at inference time, you could export your models to ONNX and run them with ONNX Runtime (https://github.com/microsoft/onnxruntime).
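A rough sketch of what running an exported encoder looks like, assuming the model has already been exported to a file named model.onnx and that its first output is the token-level hidden states; the model name, file name, and input/output layout are assumptions, not from the thread:

  # Run an already-exported sentence encoder with onnxruntime only;
  # the tokenizer comes from transformers but needs no torch install.
  import numpy as np
  import onnxruntime as ort
  from transformers import AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
  session = ort.InferenceSession("model.onnx")  # hypothetical exported file

  def encode(texts):
      enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
      # assumes the ONNX graph's input names match the tokenizer's keys
      # (input_ids, attention_mask, ...) and its first output is
      # hidden states of shape [batch, seq, dim]
      hidden = session.run(None, dict(enc))[0]
      mask = enc["attention_mask"][:, :, None]
      # mean-pool over tokens, ignoring padding positions
      return (hidden * mask).sum(axis=1) / mask.sum(axis=1)

  print(encode(["local embeddings without pytorch"]).shape)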



