Hacker News
A First Excercise in Natural Language Processing with Python: Counting Hapaxes (catswhisker.xyz)
90 points by cristoperb on Sept 8, 2017 | 8 comments



I get that the point is to be an introduction to the libraries and whatnot, but was I the only one who immediately thought of just using Counter?

    from collections import Counter
    import re

    # \w+ rather than \w*, so that empty matches are not counted as words
    [word for word, count in Counter(re.findall(r'\w+', text.lower())).items() if count == 1]


As the article states in the introductory paragraph, this problem encompasses more than just counting strings. It also involves "some fundamental tasks of natural language processing (NLP): tokenization (dividing a text into words), stemming, and part-of-speech tagging for lemmatization", so a little more work is required here.
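To make that concrete, here is a minimal sketch of how surface-form counting and normalized counting can disagree. The `crude_stem` function is a made-up toy stand-in for a real stemmer (it only strips a plural "s"), and the sample sentence is invented for illustration; neither comes from the article.

```python
from collections import Counter
import re

def crude_stem(word):
    """Toy stand-in for a real stemmer: strip a plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def hapaxes(tokens):
    """Return the sorted list of tokens that occur exactly once."""
    counts = Counter(tokens)
    return sorted(w for w, c in counts.items() if c == 1)

text = "The cat chased the cats while a dog watched the chase."
tokens = re.findall(r"\w+", text.lower())

# Counting raw word forms treats "cat" and "cats" as two different hapaxes.
surface_hapaxes = hapaxes(tokens)

# After the toy normalization, "cat" occurs twice and is no longer a hapax.
stemmed_hapaxes = hapaxes(crude_stem(t) for t in tokens)
```

Note that "chase" and "chased" still count separately here, since the toy rule knows nothing about verb inflection. That gap is the kind of thing proper stemming or lemmatization is meant to close.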


Yep, if you don't need any of the fancy NLP features of a library, then something like this is the most straightforward approach. (In the article I did give a plain Python solution that uses split() to tokenize and then Counter to get the hapaxes, in the function called "word_form_hapaxes".)
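For readers who want to see the general shape of a split()-plus-Counter approach, here is a rough sketch. It is not the article's "word_form_hapaxes" function; the name `split_hapaxes`, the punctuation stripping, and the sample text are all assumptions made for illustration.

```python
from collections import Counter
import string

def split_hapaxes(text):
    """Return words occurring exactly once, using whitespace tokenization."""
    # split() only breaks on whitespace, so strip surrounding punctuation by hand.
    words = (w.strip(string.punctuation) for w in text.lower().split())
    # Skip tokens that were nothing but punctuation.
    counts = Counter(w for w in words if w)
    return sorted(w for w, c in counts.items() if c == 1)

sample = "Brave Sir Robin ran away. Bravely ran away, away!"
result = split_hapaxes(sample)
```

Without the strip step, "away.", "away," and "away!" would each be counted as a separate word form, which is one reason a real tokenizer can be worth the extra dependency.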


For anyone interested in more good beginner resources, I really enjoyed this YouTube playlist on Python NLTK: https://www.youtube.com/watch?v=OGxgnH8y2NM&list=PLQVvvaa0Qu...

Edit: I accidentally linked to another good playlist, but here's the first video of the NLTK playlist from the same user: https://www.youtube.com/watch?v=FLZvOKSCkxY


Hapax Legomenon is such a satisfying phrase to say. Even the opportunity to look at it makes my eyes happy.


I counted word n-grams up to length 6 in a corpus of 6 billion words with Madoka, a Count-Min sketch library.

https://pypi.python.org/pypi/madoka
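
For anyone unfamiliar with the idea, below is a minimal pure-Python sketch of a Count-Min sketch applied to word n-grams. It does not use Madoka or reflect its API; the `CountMinSketch` class, its `width` and `depth` defaults, and the `ngrams` helper are all hypothetical and chosen only to illustrate the technique.

```python
import hashlib

class CountMinSketch:
    """Approximate counter: estimates may overcount but never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # Derive one column per row by salting the hash with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by hash collisions.
        return min(self.table[row][col] for row, col in self._indexes(item))

def ngrams(tokens, n):
    """Yield space-joined word n-grams of length n."""
    return (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "to be or not to be".split()
sketch = CountMinSketch()
for n in (1, 2):
    for gram in ngrams(tokens, n):
        sketch.add(gram)

to_be_estimate = sketch.estimate("to be")
```

The memory used is fixed by `width * depth` regardless of how many distinct n-grams are seen, which is what makes this kind of structure attractive for very large corpora. The trade-off is that counts are approximate, so an estimate of 1 does not strictly prove an n-gram is a hapax.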


Author here. The misspelling in the title is embarrassing, but luckily not very noticeable (I've fixed it on the site).


It's so beautifully put together and so easy to understand. Thank you very much; it helped greatly in my learning.



