In this type of blog post comparing many languages, you usually see some unnatural approach for your favorite language.
However, in this case, the Python example is idiomatic.
Since the article calls for better approaches, I would suggest taking the opportunity to use the walrus operator and rsplit() for the optimized version. Something like this:
import sys
from collections import Counter

remaining = ""
c = Counter()
while (chunk := sys.stdin.read(64 * 1024)):
    pre, post = chunk.lower().rsplit('\n', 1)
    c.update((remaining + pre).split())
    remaining = post
It should not affect performance too much, and the code gets more expressive.
However, performance will vary quite a bit depending on the Python version you use. Interestingly, the Python 3.11 beta, which comes with a lot of performance tweaks, is slower for this exercise, while being reported to be faster on real-life tasks.
Your code assumes there is at least one newline in each 64K chunk, and assumes there are no words past the final newline.
It will fail on "abc" and give the wrong answer for "abc\ndef".
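A quick illustration of both failure modes (a minimal sketch; chunk stands in for a single 64K read):

```python
# A chunk with no newline: rsplit('\n', 1) returns a one-element
# list, so unpacking into two names raises ValueError.
chunk = "abc"
try:
    pre, post = chunk.rsplit('\n', 1)
except ValueError as e:
    print("fails:", e)

# A final chunk with words after the last newline: "def" ends up
# in post and is never fed to the Counter.
chunk = "abc\ndef"
pre, post = chunk.rsplit('\n', 1)
print(pre, post)  # → abc def
```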
I prefer rpartition over rsplit to handle the first case, and the loop-and-a-half construct instead of the while+walrus operator to handle the second, as in this modified version of your code:
import sys
from collections import Counter

remaining = ""
c = Counter()
while True:
    chunk = sys.stdin.read(64 * 1024)
    if not chunk:
        if not remaining:
            break
        pre = post = ""
    else:
        pre, mid, post = chunk.lower().rpartition("\n")
    c.update((remaining + pre).split())
    remaining = post
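To check that this version handles both inputs from the earlier examples, the loop can be wrapped in a helper that reads from any file-like object instead of sys.stdin (the count_words name and signature are my own, not from the article):

```python
import io
from collections import Counter

def count_words(stream, chunk_size=64 * 1024):
    # Hypothetical wrapper around the loop above, taking a file-like
    # object so it can be exercised without piping data into stdin.
    remaining = ""
    c = Counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            if not remaining:
                break
            # Loop-and-a-half: one extra pass to count trailing words.
            pre = post = ""
        else:
            # rpartition returns ("", "", chunk) when no newline exists,
            # so a newline-free chunk just accumulates in remaining.
            pre, mid, post = chunk.lower().rpartition("\n")
        c.update((remaining + pre).split())
        remaining = post
    return c

print(count_words(io.StringIO("abc")))       # → Counter({'abc': 1})
print(count_words(io.StringIO("abc\ndef")))  # → Counter({'abc': 1, 'def': 1})
```

The extra pass when the stream is exhausted is what flushes the words after the final newline into the Counter.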