I recently had a use case for this where I needed a naively created bag of words along with the frequency of words. Having Counter made this extremely easy, as I could simply split the string on whitespace and pass the result to a Counter object.
Another useful feature is that you can take the intersection of two Counter objects. It's a really nice data structure to have!
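For instance (a minimal sketch; the sample strings are made up):

from collections import Counter

words = Counter("the quick brown fox jumps over the lazy dog the end".split())
print(words.most_common(1))   # [('the', 3)]

# & takes the intersection, keeping the minimum count for each element
print(words & Counter("the dog and the cat".split()))
# Counter({'the': 2, 'dog': 1})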
_ in python is commonly used as a variable name for values you want to throw away. For example, let's say you have a log file split on newlines, with records like this:
logline = "127.0.0.1 localhost.example.com GET /some/url 404 12413"
You want to get all the URLs that are 404s, but you don't care about who requested them, etc. You could do this:
_, _, _, url, returncode, _ = logline.split(' ')
There's no special behaviour for _ in this case; in fact, normally in the interactive interpreter it's used to store the result of the last evaluated line, like so:
>>> SomeModel.objects.all()
[SomeModel(…), SomeModel(…), SomeModel(…), …]
>>> d = _
>>> print d
[SomeModel(…), SomeModel(…), SomeModel(…), …]
Which I think is basically the same behaviour; you run some code, you don't assign it, so the interpreter dumps it into _ and goes on about its day.
Normally you would place a variable there, such as x or y. But in this case, the variable doesn't matter; you aren't using it in the function call. You're telling the reader of the program that the for-loop variable doesn't matter.
It's just a name of a variable. There's nothing special about it (in this context), but it's conventionally used when you don't care about the value assigned to it.
"Because I was so used to statically typed languages (where this idiom would be ambiguous), it never occurred to me to put two operators in the same expression. In many languages, 4 > 3 > 2 would return as False, because (4 > 3) would be evaluated as a boolean, and then True > 2 would be evaluated as False."
The second half of this is correct, but it has nothing to do with whether the language is statically or dynamically typed. It's a tweak to the parser, mostly.
It's not just a tweak to the parser, and it does have to do with the type system, but you're right that it's not about static typing.
The issue is that there are languages (like C) where typing is static but weak, so e.g. booleans are also integers and can have integer operations like '>' applied to them. In other words, the problem is that in C True == 1 and 1 > 2 is a valid expression. In Python, which has strong(er) types, this expression can have an unambiguous meaning if you want to add the feature to your parser.
You can implement it entirely in the parser if you can avoid name capture - it may or may not be implemented entirely as a tweak to the parser in practice, but it's fundamentally a syntactic thing.
Your discussion of types here is all wrong - it's true that C treats booleans as if they were integers, but Python does, too:
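>>> True == 1
True
>>> True > 2
False
>>> (4 > 3) > 2    # the "parenthesized" reading evaluates True > 2
False
>>> 4 > 3 > 2      # the chained reading
True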
In terms of coding style, I agree with you that following the number line is usually going to be clearest. I think there might be some situations where descending is better than ascending, but certainly both are radically better than what I did above (low > high < lower).
As an example for what I was specifically trying to show here, though, that doesn't let me distinguish things quite as clearly.
I believe that (a < x < b) gives the same value as at least one of (a < x) and (x < b) for any value of x.
x <= a:
    (a < x) gives false
    (a < x < b) gives false
a < x < b:
    everything gives true
b <= x:
    (x < b) gives false
    (a < x < b) gives false
So there's no way to get both of the parenthesized versions to disagree with the unparenthesized version in a single example.
In fact, Python just has a non-binary AST with regard to operators, i.e. the expression "a < b < c" is not parsed as CompOp([CompOp([a, b], <), c], <) but instead as CompOp([a, b, c], [<, <]). The same holds by the way for Boolean operators, "a and b and c" is represented as BoolOp([a, b, c], [and, and]). See https://docs.python.org/3/library/ast.html#abstract-grammar for details.
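You can check this directly (output below is from CPython 3.8; newer versions format it slightly differently):

>>> import ast
>>> ast.dump(ast.parse('a < b < c', mode='eval'))
"Expression(body=Compare(left=Name(id='a', ctx=Load()), ops=[Lt(), Lt()], comparators=[Name(id='b', ctx=Load()), Name(id='c', ctx=Load())]))"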
Yes, you really can just tweak the parser; just (a) don't have a rule allowing comparisons to appear as children of other comparisons, and (b) add a rule that permits chains of comparisons. Types have zilch to do with this.
You are absolutely correct that it could be implemented as 100% a tweak to the parser.
Assuming filmor is correct, in practice as it happens to be implemented in the Python code base it is not 100% a tweak to the parser - changing the structure of the produced AST means tweaks down the line. I think the change to the parser is still the most meaningful piece, though, even there.
I was lucky to watch this video while first learning the language. Every beginner (coming from another language) should watch this to understand the idioms of Python.
Then there's numpy, which has its own array type, needed because Python's array math is slow. Numpy arrays have different semantics - add and multiply perform the usual numeric operations.
You can mix numpy arrays and built-in lists with "+" and "*". Mixed expressions are evaluated using numpy's semantics. Since Python doesn't have static type checking, it's possible to pass the wrong kind of numeric array to a function and have it treated with the wrong semantics.
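A quick illustration of the two semantics side by side (nothing exotic, just the built-in and numpy behaviours):

import numpy as np

print([1, 2] + [3, 4])             # [1, 2, 3, 4] -- list concatenation
print(np.array([1, 2]) + [3, 4])   # [4 6] -- elementwise addition
print([1, 2] * 2)                  # [1, 2, 1, 2] -- repetition
print(np.array([1, 2]) * 2)        # [2 4] -- scalar multiplication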
Moral: when designing your language, using "+" for concatenation is tempting, but generalizing that concept comes back to bite you.
You went from "not what you want for numerical work" to "generalizing that concept comes back to bite you". I don't think you can make that step.
I do non-numeric scientific computing. (Meaning, I touch numpy about once a year.) My own code does things like
[0] * N # could be replaced with something like numpy.zeros()
[QUERIES_FPS] * 501
to_trans = [None]*256 # constructing a 256 byte translation table
# (I could use zeros(), but used None to force an exception
# if I missed a cell)
self.assertEqual(self._get_records(simple.record*2),
                 [(simple.title, simple.record)]*2)
# I parse a string containing two records and test I should
# be able to get the (id, record) for each one
["--queries", SIMPLE_FPS, "-k", "3", "--threshold",
"0.8"] + self.extra_args # Construct a command-line from 2 lists
These idioms also exist in the standard library.
So while I think you are correct, in that "+" causes confusion across multiple domains with different meanings for "+", I think the moral is that operator overloading is intrinsically confusing and should be avoided for all but the clearest of use cases.
There is no best generalization to "+". For example, if you pick the vector math meaning, then regular Python would have that:
["A", "B"] + ["C", "D"] == ["AC", "BD"]
which has its own logic, but is likely not what most people who haven't done vector math expect.
You think it's a "bad" design decision because you think that a Python list should represent a vector of numbers (not even an array - a mathematical vector).
But a list is much more than that - conceptually it's any ordered collection of things, and not necessarily even the same type of thing. Overloading `+` to mean concatenation and `*` to mean replication means that the operators can work with any kind of list, not just lists that are mathematical vectors.
If you do want a mathematical vector, you should use a numpy array - not only are you making it clear that you have a vector of numbers, but your operations will be more efficient (because using a numpy array guarantees that the elements in it are all of the same type, so you don't have to do type dispatching on every element).
The reason why numpy lets you do math operations on each element in an array is that you can safely assume that each element is a number. You can assume absolutely nothing about the types of the elements in a list.
because the print function knows how to handle recursive definitions. Do all of the element-wise operations need to handle cyclical cases like this? I think numpy can get away with not worrying about this precisely because, as wodenokoto pointed out, it can assume a flat structure.
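For context, the cyclical case looks like this; the built-in repr detects the self-reference:

>>> a = [1, 2]
>>> a.append(a)    # the list now contains itself
>>> a
[1, 2, [...]]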
You apparently want lst * intval to be equivalent to map(lambda n: n * intval, lst) or [n * intval for n in lst]. Since Python has a convenient built-in and even syntactic sugar for doing what you want, why not let the operator overloading handle a different case?
(also, your issue is not with "nonsense semantics", it's with "my idea of how this operator should've been overloaded is different from their idea", and perhaps is even a beef with the idea of operator overloading in general, though if you like numpy I think you wouldn't like losing operator overloading)
This is because you're assuming 'array' is supposed to mean 'vector' (as in the linear-algebraic vector). It isn't: it's a list -- it's meant to be a container. In this case, add meaning concatenate and multiply meaning self-concatenate multiple times makes sense.
Even worse IMHO are the semantics of strings being implicitly iterable. Often you're intending to iterate over something like:
for item in orders:
    do_something_with(item)
So if `orders` is usually `[Order(...), Order(...), ...]` but, due to a bug elsewhere, sometimes `orders` is "some string", you get a mysterious exception somewhere down in `do_something_with` or one of its callees at run time, all because the above snippet calls do_something_with('s'), do_something_with('o'), etc.
In my experience, this behavior is so seldom what is wanted that it should be removable (with a from __future__ style declaration) or just off by default.
I use "for c in s", to read characters in a string, pretty often. Here's an example from Python3.2's quopri.py:
def unhex(s):
    """Get the integer value of a hexadecimal number."""
    bits = 0
    for c in s:
        c = bytes((c,))
        if b'0' <= c <= b'9':
            i = ord('0')
        elif b'a' <= c <= b'f':
            i = ord('a')-10
        elif b'A' <= c <= b'F':
            i = ord(b'A')-10
        else:
            assert False, "non-hex digit "+repr(c)
        bits = bits*16 + (ord(c) - i)
    return bits
Here's another example of iterating over characters in a string, from pydoc.py:
if any((0xD800 <= ord(ch) <= 0xDFFF) for ch in name)
It seems like a pretty heavy-weight prohibition for little gain. After all, you could pass foo = open("/etc/passwd") and still end up with a large gap between making the bug and its consequences.
Not sure what it adds, but I don't quite understand it yet and perhaps there's something magic in the context of MIME quoted printable that I'm missing.
Based on my reading, there's nothing magic. The context is:
elif i+2 < n and ishex(line[i+1:i+2]) and ishex(line[i+2:i+3]):
    new = new + bytes((unhex(line[i+1:i+3]),)); i = i+3
I tweaked it to
new = new + bytes((int(line[i+1:i+3], 16),)); i = i+3
and the self-tests still pass. (I also changed the 16 to 15 to double-check that the tests were actually exercising that code.)
It's not part of the public API, so it looks like it can simply be removed.
Do you want to file the bug report? Or perhaps it's best to update http://bugs.python.org/issue21869 ("Clean up quopri, correct method names encodestring and decodestring")?
* on lists can also mean elementwise multiplication, dot or cross product if you treat them as vectors. There's no way to choose the objectively best meaning. I'd even argue that vector math isn't the most popular use for lists in python, not because of + and * semantics, but because of performance.
So it was a good design decision not to bother with math semantics for a general-use data structure.
And besides Python has nice general syntax for elementwise operations if you don't care about performance:
[x*y for (x,y) in zip(xs,ys)]
I agree it would be better not to implement + for lists at all.
I'm very aware of what you mentioned but...all I "wanted" in this case is a visual separator in my terminal when I'm working with lots of output. I don't care whether each "* " refers to the same object, I just want a line :)
With that being said, if I want to merge two lists and apply an operation on each, I don't see what the issue is with:
In [1]: a = [1,2,3]
In [2]: b = [5,6,7]
In [3]: c = a+b
In [4]: c
Out[4]: [1, 2, 3, 5, 6, 7]
In [5]: d = [x*4 for x in c]
In [6]: d
Out[6]: [4, 8, 12, 20, 24, 28]
Haskell also uses <> for combining any monoid, but of course in Python that was once not-equal... Maybe a dot? It's string concatenation in Perl, and function composition in Haskell. Interestingly, both of those are monoids...
I'll be sure to make great use of the dictionary get method - I'm embarrassed to admit how many thousands of times I could have used that, and didn't know it existed.
Another good/great source of Python tricks/idioms is Raymond Hettinger's "Idiomatic Python". The slides/videos are really great. I highly recommend them.
Your defaultdict approach and dict.get with a default specified are not really equivalent. In the defaultdict case, when you encounter a non-existing key, it adds a new entry with that key to the dict, i.e. your dict will start growing,
whereas dict.get with a default value keeps returning the default value without touching your dict.
re: "Your defaultdict approach and the dict.get with a default specified is not really equivalent. In the defaultdict case when you encounter a non existing key it adds a new entry with that key into the dict. i.e. your dict will start growing."
europa - With dict.get, the dictionary is never modified; you're just passively querying it. With a defaultdict, though, even a plain [] lookup of a missing key is not passive: the lookup calls the default factory and stores the result, so it's the lookup itself that grows the dictionary, not just assignment.
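The actual behaviour is easy to check in the interpreter:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d.get('a', 0)    # .get never inserts
0
>>> d
defaultdict(<class 'int'>, {})
>>> d['b']           # a plain [] lookup of a missing key inserts the default
0
>>> d
defaultdict(<class 'int'>, {'b': 0})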
Yes, as SixSigma says, it's called a jump table; I've also seen it called a dispatch table (because you dispatch control to the right function based on a key). It's quite an old technique, dating from the earliest programming languages and even used in assembly language, using indirect addressing and suchlike techniques.
Edit: Looked it up, dispatch table is in Wikipedia:
Do you consider that to be idiomatic? I've been out of touch with the Python community for a few years, but back then I wouldn't have considered that remotely idiomatic, and if I was on a team writing software, I would have argued that we shouldn't be writing code like that.
If your dict keys are just numbers, then no, probably not. But strings mapping to functions, and in some cases objects and other things, are often used to substitute for numerous if and elif statements.
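A minimal sketch of the idiom (the handler names here are made up):

def handle_start(job):
    print("starting", job)

def handle_stop(job):
    print("stopping", job)

# strings mapping to functions, in place of an if/elif chain
handlers = {"start": handle_start, "stop": handle_stop}

handlers["start"]("backup")    # prints: starting backup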
I'm not the poster you posed that question to. But for me, the one big drawback of using that idiom is that the function signatures have to be identical. So you either have to resort to args/kwargs, or you have an additional intermediary method between the actual "guts" of what you're calling, and the "switch" statement.
Or you live with the fact that you're passing unused/unnecessary parameters to your functions.
I don't think it's more powerful in any way than an if-elif-elif-else which, for what it's worth, I consider the Pythonic way.
Having said that, just because I consider something more Pythonic doesn't mean I prefer it. I've worked in a lot of languages over the years and still work in several in addition to Python. I really enjoy Python, but I prefer techniques that are more universal in many cases. For example, I prefer the idiom of looping over a list index to using Python's "enumerate" in most cases, because index looping is a common cross-language idiom and enumerate usually doesn't offer any benefit I value more than universal obviousness.
Other things such as Python's `for item in items` looping style are both VERY Pythonic and much nicer than, say, index looping, so I would almost always prefer such idioms.
The above switch -> function pointerish thing is clear to me from years of C/C++, but it is both less generally applicable across languages than if-elif... and less Pythonic, so I would prefer the if-elif... approach.
Obviously a matter of preferences, but since you asked....
It's strange that you would prefer `for item in items` but not `for i, item in enumerate(items)`. So in that case you'd manually do the `for i in range(len(items))...`?
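For reference, the two spellings side by side:

items = ['a', 'b', 'c']

# index looping, the cross-language idiom
for i in range(len(items)):
    print(i, items[i])

# enumerate, the Pythonic equivalent
for i, item in enumerate(items):
    print(i, item)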
I sometimes see this in code and think to myself, whoever wrote this needs to learn themselves some idiomatic Python :) I don't think there's much point trying to force stuff that makes sense in one language into another language. Play to a language's strengths and all that.
I think you're right about if-elif generally being more powerful than the jump table. Though the jump table is useful in that you can define it in one place and use it in another.
I personally don't like this style of using multiple strings. It makes radical changes to the text cumbersome.
I think in most cases it's better to use triple quotes. And if the content of these variables isn't exclusively shown in the shell, you should use translation files anyway.
$ cat triple.py
def foo():
    print """this is a triple quoted string
this is a continuation of a triple quoted string"""

if __name__ == '__main__':
    foo()
$ python triple.py
this is a triple quoted string
this is a continuation of a triple quoted string
This is really warty. In bash you can mostly get around this with <<- for here documents (which removes leading indentation, so even if you're defining a here doc at a place that itself is indented, you don't have to align the text to the left column yourself). The man page in my version suggests it only works for leading tabs, not spaces, though.
e.g.
$ function usage() {
	cat <<-END
		this is a here document that
		spans multiple lines and is indented
	END
}
$ usage
this is a here document that
spans multiple lines and is indented
$
I don't have an opinion positive or negative on it, but since many other design decisions have already been mentioned, here is Haskell's design decision for multi-line string literals. It allows "tidy indenting", but like the first design decision, interacts badly with "reflowing / reformatting / filling".
from __future__ import print_function
import textwrap
t = """
Hello there
This is aligned
"""
# Need strip to get rid of extra NLs
print(textwrap.dedent(t).strip())
Dedent is nice, but then you still have to deal with removing single newlines (e.g. for error messages) and removing leading and trailing spaces. Ultimately nothing more than `re.sub(r'(?<!\n)\n(?!\n)', ' ', textwrap.dedent(s).strip())`, but kind of annoying to have to throw this in your code all over the place.
I would put all messages in variables at the top of the file or even in a separate one and print them in the appropriate places.
Having multi-line prints in functions adds a lot of noise, in my opinion. When I read code, I don't normally care about the content of messages being printed.
(< a b c d) ;; T if a < b < c < d
(<= a b c d) ;; T if a < b < c < d
Also:
(lcm a b c d ...) ;; least common multiple
(+) -> 0
(+ a) -> a
(+ a b) -> a + b
(+ a b c) -> (a + b) + c
(*) -> 1
(* a) -> a
(* a b) -> a * b
(* a b c) -> (a * b) * c
Is it just syntactic sugar? (< a b c) evaluates a, b and c only once, which matters if they are expensive external function calls or have side effects.
(and (< (foo) (bar))
(< (bar) (baz)))
isn't the same as
(< (foo) (bar) (baz))
By the way, this could be turned into a short-circuiting operator: more semantic variation. Suppose < is allowed to control evaluation. Then an expression like (< a b c d) could avoid evaluating the c and d terms, if the a < b comparison fails.
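For what it's worth, Python's chained comparisons already behave this way: each operand is evaluated at most once, and evaluation short-circuits as soon as one comparison fails. A quick demonstration:

>>> def a(): print('a'); return 1
>>> def b(): print('b'); return 0
>>> def c(): print('c'); return 2
>>> a() < b() < c()    # 1 < 0 fails, so c() is never called
a
b
False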
and for making = actually useful (works on nested structures properly).
"Is this sorted", and "are these equal" are intuitive and useful concepts in programming and you shouldn't need to reimplement them each time you need them.
In a Lisp-1 dialect like Scheme, rel could easily and conveniently be a function. The call (rel a < b <= c) simply evaluates its arguments. The arguments < and <= are functions. The rel function then just operates on these values.
I assume the above was intended to be a Python expression. The answer, to the best of my knowledge, is "not really". There is no built in or reasonably standard function that lets you check simultaneously that a is less than b which in turn is greater than or equal to c. Not that we couldn't define one ad-hoc, although making it reusable in a way that's reasonably idiomatic might be a challenge...
Obviously, you can express it slightly more verbosely: (and (< a b) (>= b c)).
This isn't the same thing -- the Python syntax is asking for a < b and b >= c, while only evaluating b once. There are many infix math libraries for lisps (e.g. this one from 1995: https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/...) that allow things like #I(a < b >= c), but the trick is only evaluating b once. (The linked lib will expand into two evaluations if b is an expression.)
Which means it could be done, but the relational operators (any one of which could be the leftmost constituent) have to all be defined as macros which analyze the rest of the expression with alternative evaluation rules.
I think syntactic sugar is vastly undervalued by programmers. Anything that lets me more naturally read & write code should lead to fewer bugs and more productive software development, as well as making programming more enjoyable.
SQL has “x BETWEEN y AND z” as a special case which does “y <= x <= z”, which in practice is quite often what you actually want to use this feature for.
I'm not sure if other languages have it, but I have to say that pycharm is excellent at suggesting chained comparisons to you. I didn't know this existed before I switched IDEs.
If PyCharm really had brains it would tell you that (2 > 3 and 2 < 7) reduces to (False and True), then to False, at analysis time, so the code wrapped in the if is unreachable.
I'm not sure about pycharm, but i know for certain IntelliJ warns you about constant conditions like that. That said, he could have picked that example just to illustrate a point, rather than something it literally suggests.
I came across this when I was first learning Python and it has always impressed me:
from random import shuffle
deck = ['%s of %s' % (number, suit)
        for number in '2 3 4 5 6 7 8 9 10 Jack Queen King Ace'.split(' ')
        for suit in 'Hearts Clubs Diamonds Spades'.split(' ')]
shuffle(deck)
I do it entirely, exclusively, only, purely because it requires less punctuation typing. It returns a list anyway. The performance hit is virtually unnoticeable in almost every use case (unless this is a function taking in input strings formatted this way many times per second, but in that case you've got way worse to worry about first...).
As DaFranker points out, it's just easier to type than
('Hearts', 'Diamonds', 'Spades', 'Clubs')
and has less opportunity for typos and syntax errors. If I was concerned about performance I would replace it with a tuple, but it was Good Enough for a quick example.
It's like the old Perl idiom where Python's 'ab cd ef'.split() would be written as qw(ab cd ef), which probably looks nicer -- 'qw' stands for 'quote words', I think.
Can someone direct me to a comparison of subprocess and os? I keep hearing subprocess is better, but I haven't really read any explanation as to why or when it is better.
(I'm glad I'm not the only one who was thrilled to discover enumerate()!)
The os module interacts directly with the OS rather than abstracting it, so a lot of the functions in it carry the "may not be available on all platforms" apology.
subprocess uses os under the hood but offers an abstraction that mostly works on all platforms; e.g. the way that "communicate" is implemented on Windows differs considerably from how it's implemented on Unix.
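As a minimal sketch of the difference in flavour (the command and path are arbitrary examples):

import os
import subprocess

# os-level: you get a bare exit status, and shell quoting is your problem
status = os.system("ls -l /tmp")

# subprocess: the argument list sidesteps shell quoting, and the
# output is captured portably
output = subprocess.check_output(["ls", "-l", "/tmp"])
print(output)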
I was grateful for the example of multilined strings, mysterious as it is. The lack of any way to do this has been an annoyance of mine for quite some time.
1. Am I the only one that really loves that `print` is a statement and not a function? Call me lazy, but I don't mind not having to type additional parentheses.
5. Dict comprehensions can be dangerous, as keys that appear twice will be silently overridden:
elements = [('a', 1), ('b', 2), ('a', 3)]
{key: value for key, value in elements} == {'a': 3, 'b': 2}
# same happens with the dict() constructor
dict(elements) == {'a': 3, 'b': 2}
7. I see
D.get(key, None)
way too often.
8. Unpacking works in many situations, basically whenever a new variable is introduced.
for i, el in enumerate(['a', 'b']):
    print i, el
{key: value for (key, value) in [('a', 1), ('b', 2), ('a', 3)]}
map(lambda (x, y): x + y, [(1, 2), (5, -1)])
Note: the last example (`lambda`) requires parentheses in `(x, y)`, as `lambda x, y:` would declare a two-argument function, whereas `lambda (x, y):` is a one-argument function, that expects the argument to be a 2-tuple.
On the subject of "call me lazy", I really like the % syntax for string interpolation. I'd like perl-style interpolation even more. "".format() is going in completely the wrong direction for me. (I don't think % is being removed, but I think it's discouraged.)
The first really has nothing to do with `print` being a statement - it could be parsed and passed as a function in "non-statement" positions regardless.
The rest could probably be special-cased in a backwards-compatible way as well. This, for instance, is currently not valid Python 2 syntax:

print("error", file=sys.stderr)
well, as a statement it's also a reserved keyword, so you can't override or mock it (AFAIK), and I suspect changing the parser to identify context and operate accordingly might be rather painful.
Honestly, I'm reasonably happy with the split; there don't seem to be many compelling reasons to shoehorn all the extra bits back into statement-print other than
a) removing a single pair of parens
b) being backwards compatible (but then the old code wouldn't be using those new features anyway, and would still have to support that nasty bitshift hack).
> 1. Am I the only one that really loves that `print` is a statement and not a function? Call me lazy, but I don't mind not having to type additional parentheses.
If you think the parentheses are bad, why wouldn't you prefer a language like Ruby where you can omit them generally? Leaving them out for just one special construct seems so insufficient as a cure.
In general, Ruby's syntax (or rather, semantics) is ambiguous (you don't know if a statement without parentheses is calling a function or just accessing a value). I prefer unambiguous syntax.
However, Python's `print` statement is (1) well-known, (2) very useful (for prototyping, debugging, in the REPL), and (3) shouldn't in general be present in production code (logging should be used instead). Therefore, omitting parentheses would help ease debugging and exploring (REPL prototyping), while not making code more ambiguous in general. Yes, it's a special case, but `print` also has a very special, very specific use-case.
"7" May be because getattr (at least) works the other way around, instead raising an exception if not found and no default specified. I'm sure many people can't always remember which works which way.
There would be no purpose in the `get()` function if it raised an exception if the key wasn't there - that's how `[]` works. On the other hand, `getattr` is IMO mostly used for situations where you don't know what properties exist on an object, so you can't just use the dot notation.
with print as a keyword, you can't have modules defining a print function (for example, APIs where print makes sense as a function name -- [he]xchat's Python support comes to mind)
additionally, if you want to override print in Python 2, you need to replace the stdout stream with your own in-between buffer object, which also has the downside of being global
I work with python full time, and the last (#10 string chaining) is one of the few times the syntax had caused me grief, due to missed commas in what were supposed to be tuples of strings. The chaining rules are one of the few sources of apparent ambiguity in the syntax, especially when you include the multiline versions.
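The trap looks like this (a made-up illustration of the missed comma):

colors = ('red',
          'green'   # oops, missing comma
          'blue')
# colors is ('red', 'greenblue'), and no error is raised at any point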
When I first started using Python around 1999, it didn't even have list comprehensions. Code was extremely consistent across projects and programmers because there really was only one way to do things. It was refreshing, especially compared to Perl. It was radical simplicity.
Over the decade and a half since then, the Python maintainers have lost sight of the language's original elegance, and instead have pursued syntactical performance optimizations and sugar. It turns out that Python has been following the very same trail blazed by C++ and Perl, just a few years behind.
(At this point Python (especially with the 2 vs. 3 debacle) has become so complex, so rife with multiple ways to do even simple things that for a small increase in complexity, I can just use C++ and solve bigger problems faster.)
It's certainly up for debate whether named tuples and enums, various kinds of metaprogramming and decorators might be making the language more complex for fairly little gain... but this article talks about the `enumerate` function, about string formatting and dictionary comprehensions. Simple, straightforward stuff with no downsides.
But syntax matters! When you're writing stuff all day,
a = [_ * 2 for _ in range(10)]
is a lot more pleasant than:
a = []
for _ in range(10):
    a.append(_ * 2)
It also gives Python a lot more information about your actual intent. Suppose "range(10)" were actually "giant_list". Hypothetically, the list comprehension could pre-allocate len(giant_list) elements instead of calling list.append that many times. That's potentially a huge performance win.
You see Python as getting more complex. I see it as getting less complex by giving concise alternatives to common idioms.
There is a sweet in India called phirni [0]: as you eat from the outer layer to the inner layer, you feel like you're walking in heaven. Right now you're at the outer layer. I hope you get to the inner layers and feel Python still. :)
"Missing from this list are some idioms such as list comprehensions and lambda functions, which are very Pythonesque and very efficient and very cool, but also very difficult to miss because they're mentioned on StackOverflow every other answer!"
Can anyone link to good explanations of list comprehensions and lambda functions?
I think he or she meant a specialized set data structure that stores sets of numbers which can be written as finite unions of intervals. Typically, besides set operations, you'd also want (1) only the endpoints of the intervals to be stored, so that the structure is compactly represented in memory, and (2) to be able to recover the canonical representation of the set as a sorted union of disjoint intervals.
I'm not entirely sure how to interpret the PEP header. It dates back to 2001 and was updated in 2012.
It's probably been in Python since 2.3, but maybe 2.7 (2010) / 3.0 (2008).
The nice thing about a Counter is that it will take a collection of things and... count them:
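>>> from collections import Counter
>>> Counter('abracadabra')
Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})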
Docs: https://docs.python.org/2/library/collections.html#counter-o...