Mozilla uses GitHub to host repositories for nearly all of its projects (with the possible exception of Firefox itself). Anyone familiar with Mozilla's open source efforts knows this, so saying that I didn't see the source on their preferred platform is perfectly reasonable.
Your comment is pedantic and unhelpful. It is effectively the same as overhearing someone asking another person for a Kleenex, then choosing to interject and lecture that person on the difference between Kleenex and tissue paper when the other person does have actual Kleenex-brand tissue. Yes, I know what open source is. Yes, Mozilla uses GitHub. I even provided links to relevant GitHub repositories by Mozilla once I found what I was looking for.
Thank you for giving the extension a try! This is a very interesting feature, and I'd like to consider it. Would you mind filing an issue in the repo [1] so we can track and discuss it there?
+1. This was the first thing I tried to do and was surprised this feature doesn't exist. Most often, I don't encounter entire webpages in foreign languages, but rather small snippets of text.
It looks like both the web page version and the browser extension download from storage.googleapis.com, which is not ideal (though obviously much better than sending the text to be translated). IMO, this should be clarified in the extension description. Are there any plans for Mozilla to host this data? (Well, Amazon I guess, to match addons.mozilla.org.)
I noticed the web page has Polish but not the extension. Is it not considered sufficiently high quality yet? (Looks like that might be the case from a quick test.)
Like others, I first tried to translate a bit of text, not the whole page. Another super helpful feature, likely already on a to-do list, is the ability to see alternative translations.
Thanks again, this is great and I'm surprised it isn't a larger download. I am curious about any size/quality tradeoffs being made.
Also, playing with the web page a bit (English->Spanish), I see it translates the whole text at once, so adding new sentences affects the translation of earlier ones, sometimes in very odd ways (with one combination of words it translated "Spanish->English" to "español-esplén"). It sometimes produces "Spanish" words that don't seem to be actual Spanish words, as best I can tell and even according to its own Spanish->English translation. A way to mark a section to be translated as a unit might be helpful.
> Are there any plans for Mozilla to host this data? (Well, Amazon I guess to match addons.mozilla.org)
I'll bring this up internally for discussion.
> I noticed the web page has Polish but not the extension. Is it not considered sufficiently high quality yet? (Looks like that might be the case from a quick test.)
The Polish model finished training after we had the extension reviewed and signed for distribution. That's why we could already integrate it into the website, which is controlled solely by my team, but not yet into the extension. We'll ship it in the next version, which might come out next week.
I consider the feature you requested interesting and important. Would you mind filing an issue in the repo [1] so we can track it?
We did not find any large quality issues after quantizing the models.
Regarding the Spanish neologisms: yes, that is a known issue that the consortium is aware of and working to fix.
Thanks! Looks like I picked a bad test case for Polish. I tried a few more: some look perfect (boring corporate text) and others are around the same as Google Translate (I don't know Polish or Spanish and mostly looked at song lyrics). I don't know if specific examples are helpful, but it seems to have a particularly tough time with "W poprzek wpław", and I think there are a few more issues in:
I'll add a few feature requests to the issue tracker (if not there already) hopefully later today. Including a request to put that web page in the extension :).
Could you share details about the machine translation engine that is used (or where to find out more about it)? Are there any plans to open source the extension code (with the WebAssembly optimizations mentioned in the article)?
You can find the engine used here [1], the API built around it here [2], its WASM port here [3], and the WebAssembly matrix multiplication optimizations here [4].
Hi! This is an amazing project and will be really useful! Thank you!
I understand that the project is funded by the EU, so the focus is on European languages, but are there any plans to add CJK or other languages?
Yes, that's something we've been discussing internally and is being considered. In the meantime, please feel free to file an issue in the repo [1] so we can track it:
Yes, as I said above, that's something we've been discussing internally and is being considered. In the meantime, please feel free to file an issue in the repo [1] so we can track it:
[1] https://github.com/mozilla/firefox-translations/issues
Hello, your tool is great! I have some questions about the project's future. Are there plans to add more languages? The European Union grant ends in June; will the project continue to be developed and gain more languages after that?
Thanks for using it! The best way currently is to keep using it and reporting issues on [1]. You can see how the models are trained on [2] and file issues there too.
Google Translate code is embedded on many websites to provide automatic translation of text. Could your translation code be uploaded to a server and embedded in a web page to provide the same functionality?
I'm not aware of any actively maintained projects that give you this out of the box, but these two could be starting points for such a project.
Mozilla implemented a REST service based on (an earlier version of) bergamot-translator [1]. You could use that as a replacement for the WASM component in the addon's code.
I also know of some full-page translation demo code that uses the Python bindings of bergamot-translator [2]. That's basically a web proxy à la Google Translate.
Lastly, marian, the translation software that's being used, has a web server as well [3]. It does not support HTML though.
EDIT: see also my earlier comment for using it with Node or Python [4], which you could use to implement a simple web API.
Sure. As I mentioned in the article, you can embed the engine and the models in any web page and run them in any browser with proper WebAssembly and SIMD support.
You can find an example of how we did it here [1] and test it here (I recommend using Firefox) [2].
That way you don't need a server and everything is processed in the browser, so you no longer need Google Translate or any other cloud service to embed translations in a website.
All of them are freely available. Most of them through mtdata [1]. The exact list of the datasets is in the firefox-translations-training pipeline configuration file [2].