GitHub open sources Linguist (github.com/blog)
153 points by DanielRibeiro on June 27, 2011 | 2 comments



Linguist is GitHub's language identifier that determines which syntax highlighter should be used when you view a file. Some excerpts from the docs:

> Most languages are detected by their file extension. This is the fastest and most common situation. For script files, which are usually extensionless, we do "deep content inspection"™ and check the shebang of the file. Checking the file's contents may also be used for disambiguating languages. C, C++ and Obj-C all use .h files. Looking for common keywords, we are usually able to guess the correct language.

> The actual syntax highlighting is handled by our Pygments wrapper, Albino. Linguist provides a Lexer abstraction that determines which highlighter should be used on a file.

It also provides other features like generating stats on a repository.
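To make the detection order from the quoted docs concrete, here is a minimal sketch in Python. It is not Linguist's actual code (Linguist is a Ruby library): the function names, lookup tables, and keyword patterns below are all hypothetical and chosen only to illustrate the three steps, which are extension lookup, shebang inspection for extensionless scripts, and keyword-based disambiguation for `.h` files.

```python
import os
import re

# Hypothetical extension table; not Linguist's real data.
EXTENSIONS = {".rb": "Ruby", ".py": "Python", ".c": "C", ".cpp": "C++", ".m": "Objective-C"}

# Hypothetical interpreter table used for shebang lookups.
INTERPRETERS = {"ruby": "Ruby", "python": "Python", "sh": "Shell", "bash": "Shell"}

# Hypothetical keyword hints for the ambiguous .h extension, checked in order.
HEADER_HINTS = [
    ("Objective-C", re.compile(r"@(interface|end|property|protocol)\b")),
    ("C++", re.compile(r"\b(class|namespace|template)\b|std::")),
]


def from_shebang(content):
    """Guess a language from the first line's shebang, or return None."""
    lines = content.splitlines()
    first = lines[0] if lines else ""
    if not first.startswith("#!"):
        return None
    parts = first[2:].split()
    if not parts:
        return None
    name = os.path.basename(parts[0])
    # "#!/usr/bin/env python3" names the interpreter in the second token.
    if name == "env" and len(parts) > 1:
        name = parts[1]
    # Drop a trailing version suffix, e.g. "python3" becomes "python".
    name = re.sub(r"[\d.]+$", "", name)
    return INTERPRETERS.get(name)


def detect_language(filename, content=""):
    """Extension first, then keywords for .h files, then the shebang."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".h":
        for language, pattern in HEADER_HINTS:
            if pattern.search(content):
                return language
        return "C"  # Assumed default when no hint matches.
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    if not ext:
        return from_shebang(content)
    return None
```

With these assumed tables, `detect_language("deploy", "#!/usr/bin/env python3\n")` takes the shebang path, while `detect_language("foo.h", "@interface Foo : NSObject\n@end\n")` takes the keyword path. Real keyword matching would need to be far more careful than these few patterns.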


So in order to get syntax highlighting for a new language, you have to fork Pygments, write the lexer, and issue a pull request, then fork Linguist, add the new language/lexer, and issue a second pull request?



