Show HN: Deploy a translation API for 100+ languages with machine learning

calebkaiser · on Dec 9, 2020

Hey HN!

This is a small project I've been tinkering with for a while. In simplest terms, it is an open source web service that enables on-demand translation between over 100 languages using machine learning.

You can deploy it on your own AWS account if you want to try it out. It uses Cortex (an open source ML deployment project I help maintain) to automate deployment and implement multi-model caching, which lets the API pull models from a remote S3 bucket and cache them in memory as needed. As a result, it only needs one EC2 instance to run, despite serving predictions from more than 1,000 different models of roughly 300 MB each.
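To make the caching idea concrete, here is a minimal sketch of a least-recently-used model cache in Python. It is not Cortex's actual implementation; `ModelCache`, `loader`, and `max_models` are hypothetical names, and the loader stands in for whatever fetches a model from remote storage such as S3.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

M = TypeVar("M")


class ModelCache(Generic[M]):
    """Keep at most `max_models` models in memory, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], M], max_models: int = 3) -> None:
        if max_models < 1:
            raise ValueError("max_models must be at least 1")
        self._loader = loader  # hypothetical: fetches a model by name from remote storage
        self._max_models = max_models
        self._models: "OrderedDict[str, M]" = OrderedDict()

    def get(self, name: str) -> M:
        if name in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load the model, then evict the oldest entry if over capacity.
        model = self._loader(name)
        self._models[name] = model
        if len(self._models) > self._max_models:
            self._models.popitem(last=False)
        return model

    def cached(self) -> List[str]:
        """Names currently held in memory, least recently used first."""
        return list(self._models)
```

With a cache like this, a request for a language pair that is already in memory skips the download, and a request for a new pair pushes out whichever model has gone unused the longest.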

There’s a ton of room for optimization and customization here. Please feel free to hack it however you’d like. The API uses models trained by the Language Technology Research Group at the University of Helsinki, hosted publicly by Hugging Face.
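For anyone who wants to poke at a single model outside the API, here is a rough sketch using the Hugging Face `transformers` library. It assumes the Helsinki models follow the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming pattern on the Hugging Face Hub and that `transformers` and PyTorch are installed; `model_id` and `translate` are hypothetical helpers, not part of this project, and not every language pair has a published model.

```python
def model_id(src: str, tgt: str) -> str:
    """Build a Hub model name for a language pair (assumed naming pattern)."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(text: str, src: str, tgt: str) -> str:
    """Translate `text` from `src` to `tgt`, downloading the model on first use."""
    # Imported here so the helper above works even without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    name = model_id(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    # Tokenize, generate the translation, and decode it back to text.
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```

Calling `translate("Hello, world!", "en", "de")` would download the English-to-German model on first use, so expect a delay and a few hundred MB of disk usage.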

mendeza · on Dec 10, 2020

Wow, this is really cool! Is inference running on GPU or CPU? I wonder how well this would run if model storage and inference were done on a Jetson Nano!

calebkaiser · on Dec 10, 2020

Thanks! Inference currently runs on a GPU (a g4dn.xlarge instance), but it can be switched to CPU by simply changing the compute request in the configuration YAML.

In theory, a Jetson Nano has more than enough memory and storage to run it, but the current implementation is tied so closely to AWS that I don't know how well it would run on one without some hackery. Admittedly, I've not done much with the Nano, so I don't want to come off as overly confident here.