Python Standard Library changes in recent years (antonz.org)
277 points by nalgeon on May 21, 2022 | 128 comments



I love the memoization/caching stuff.

I know someone will blurt out “Haskell!” or whatnot, but I wish memoization was more first class in popular languages. It’s a great way to maintain performance without having to reason about the implementation details of your code. A decorator in the stdlib gets pretty close.


The problem with making memoization too easy is that you generally want a sane cache policy for the problem, and that's generally problem dependent. Otherwise you blow up program memory usage.


I only need invalidation policies if i want to cache functions that aren't referentially transparent.


You also need invalidation policies if the number of items you cache might end up too large in practice. That has nothing to do with referential transparency.


tbh i never ran into such an issue with my applications and it sounds to me like a physical constraint: i don't expect to be able to keep something cached in memory that is larger than the available memory. if i would see myself optimizing for the best "maxsize" property (assuming lru_cache), then i would rather change to a probabilistic data structure like a bloom filter or cuckoo filter (or whatever the contemporary approach there is).

can you elaborate the problem a bit more please?


> can you elaborate the problem a bit more please?

Simplest example: take any function you memoized with @functools.memoize. The more it's called, the bigger the cache can grow. Run it on a stream of inputs and you're practically guaranteed to run out of RAM.

This obviously has nothing to do with referential transparency.


memoization is a means of simple optimization to speed up a program, just like GP wrote. you exchange the cost of executing a function with the cost of memory utilization.

say you memoize a function that is referentially transparent, needs 10secs to finish its execution and return its result (eg. factorial). then the memoization would just cache the arguments plus the return value to reduce the 10secs into a simple comparison. the worst that could happen is a missed hit which would invoke the expensive function again - just like what would happen without the optimization. that isn't a big concern, as the function is referentially transparent (without side-effects). memoization lives in this perfect mathematical world where memory for maxsize entries will always be available.

but if your expensive function is referentially opaque, but still pure, then you can't only depend on the arguments of the function, but you need to take more context into account and the additional context requires a form of cache invalidation. the context can be counter-based (only re-evaluate every nth call), time-based (re-evaluate at most once every t time, throttling or debouncing), or both (only allow for at most n/t re-evaluations over time, rate-limiting) etc. in my book that isn't memoization anymore, it's just caching. and you need to define a policy that defines its invalidation.

if your stream of inputs exceeds the available memory, it lives outside the perfect mathematical world and i'd guess you have other problems than a speed optimization through memoization. you'd then need to consider space-efficiency and i would argue that memoization is the wrong kind of optimization for that problem.
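
as a sketch of the time-based variant described above, a minimal TTL cache might look like this (all names here are hypothetical):

    import time
    from functools import wraps

    def ttl_cache(seconds):
        # time-based invalidation: re-evaluate at most once per `seconds` per argument tuple
        def decorator(f):
            cache = {}  # args -> (timestamp, value)
            @wraps(f)
            def wrapper(*args):
                now = time.monotonic()
                hit = cache.get(args)
                if hit is None or now - hit[0] > seconds:
                    cache[args] = (now, f(*args))
                return cache[args][1]
            return wrapper
        return decorator

    @ttl_cache(seconds=60)
    def load_config(path):
        # referentially opaque: the file can change on disk, so the cache must expire
        with open(path) as fh:
            return fh.read()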


If your lang has 1st class functions (edit: inc. closures[1]) you can write memoisers trivially and use them like

   var fm = add_memoisation_to(f);  // used generics for c#/scala
Done this in javascript, scala, C#. Can't remember how I handled params, it's not a big deal.
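
In Python terms, a minimal sketch of such a memoiser (assuming positional, hashable arguments; the names are made up):

    def add_memoisation_to(f):
        # closure-based memoiser; the cache lives in the enclosing scope
        cache = {}
        def memoised(*args):
            if args not in cache:
                cache[args] = f(*args)
            return cache[args]
        return memoised

    def slow_square(n):
        return n * n              # stand-in for an expensive function

    fast_square = add_memoisation_to(slow_square)
    fast_square(12)               # computed
    fast_square(12)               # served from the cache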

@dataflow: you build in cache limits, either by time or space, and instrument it so you can get feedback (eg. hit rate, cache size, whether the cache turnover indicates thrashing). Very little code, a page or two.

[1] and if it doesn't, just use objects. Slightly less streamlined code but otherwise identical semantics.


Nobody is claiming you can't come up with some way to limit the cache size if you wanted to. What I'm saying is (a) you usually need to have the user do something about the cache (most people want to just set @memoize and forget about cache invalidation), and (b) the cache policy (and note this isn't necessarily just about the size of the cache, but about which entries are evicted) is frequently problem-dependent, and a general-purpose one may not end up being all that useful. I can say from personal experience that, while I cache stuff all the time, the number of times I recall seeing @functools.memoize (or the equivalent in any language) used in any real-world code (i.e. not classroom exercises) is exactly zero, whether in my own code or that of anyone else—despite the fact that caching is incredibly common. Just look at common Python libraries (even the standard library itself) and count how many times they use @memoize anywhere. It will be minuscule if not exactly zero. And the fact that it is not widespread is not due to lack of knowledge of the technique.

Notice this is the same reason why you don't see a general-purpose "tree" or "graph" class in many languages' standard libraries [1]: you certainly can make one, but design has strong dependence on the actual problem you're solving, and it's hard to come up with a general-purpose one that's handy and useful. Of course, some people try anyway, and it occasionally gets used, but nowhere remotely as often as people need trees or graphs. Same thing here.

[1] https://stackoverflow.com/a/2982420


If you want something other than LRU or time-to-expire then I take your point, but those are IME the most common types, so support them. The cache in python's functools defaults to 128 items, LRU (from memory).

If you really want a specialised caching policy then you can have that as an extra object/function passed in to the cache (edit: a predicate or predicate closure), but I've never needed it. LRU all the way for me and most others.
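
For reference, the stdlib version with the 128-entry LRU default and its built-in instrumentation (the function body here is just a placeholder):

    from functools import lru_cache

    @lru_cache(maxsize=128)       # 128 is the default; maxsize=None makes it unbounded
    def lookup(key):
        return key * 2            # stand-in for an expensive computation

    lookup(21)
    lookup.cache_info()           # CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)
    lookup.cache_clear()          # manual invalidation when needed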

> Just look at common Python libraries (even the standard library itself) and count how many times they use @memoize anywhere.

Obviously! They are general purpose so they can't make any guess, and if they need caching that's up to the user, who can then add @functools.memoize in one line. That's what it's there for!

> And the fact that it is not widespread is not due to lack of knowledge of the technique.

IME it very much is.

> Notice this is the same reason why you don't see a general-purpose "tree" or "graph" class in many languages' standard libraries

But then caching is far simpler than graphs, and anyway from your link: "(Out of more than a dozen projects in various domains I have been involved in so far, I recount two which actually needed graphs.) So I guess there was no really big pressure from the Java community in general to have a Graph in the Collection Framework."


Have been using easy decorator memoization in python for at least a decade. This is not new and far from impossible to do, even using much older versions of python. I implemented it in 2.5 IIRC.


It's a built-in decorator now called `functools.cache`[0].

[0] https://docs.python.org/3/library/functools.html#functools.c...
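
Usage is a one-liner (3.9+):

    from functools import cache

    @cache                        # unbounded; equivalent to lru_cache(maxsize=None)
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    fib(100)                      # fast, since every subcall is memoized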


I was responding to "was more first class in popular languages". I know about python's functools lib and its memoisation decorator. Point was, if you didn't have it and wanted it, it's easy; no need for the language to provide it.


Don't let this guy see math.tau.


This guy sees your point and this guy does indeed wonder why it's there when 2*Pi will do. But if it's about python the answer is simple - unlike many other languages python does fold constants, but doesn't fold expressions using variables even if the variable is never changed, so there's an unavoidable cost unless you write out the value literally, or have it as a constant.
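
A rough way to see the difference with dis (exact bytecode varies by version):

    import dis, math

    dis.dis(lambda: 2 * 3.141592653589793)   # folded to a single constant load
    dis.dis(lambda: 2 * math.pi)             # name lookup, attribute load and multiply on every call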

Or - in PEP 628 - it's just a bit of a larf https://peps.python.org/pep-0628/

Perhaps this guy should grow a sense of humour.


> Perhaps this guy should grow a sense of humour.

Agreed!


Now I'm wondering, can this be done in c++ with variadic templates?



Thanks this looks quite nice (though unfortunately some projects are still stuck on c++11)


Are folks still excited about py? I see talk of rust and Julia and it all goes over my head, but when I watch a talk given by Raymond about basic data types and iterators (his creation), I get excited again (I’m glad to see he’s still contributing! Boy does that man have a talent for teaching)

I think while the experienced here might scoff at it, it still remains one of the most accessible and -fun- languages. Also I just adore iterators and any language without them is a no-go.


I don't like python the language, it has a lot of warts that are unlikely to be fixed at this point, but I sure am excited by python the ecosystem: there's very little you can't do with it, interacting with different systems, databases, networks, etc. with a few python scripts is very easy and productive.

> I think while the experienced here might scoff at it, it still remains one of the most accessible and -fun- languages.

This is a shit attitude, and this comes from someone who mostly does C++ and Java at work, you know, "real languages" with braces and builds and whatnot. Python being easy and accessible isn't a bug, it's a goddamn feature.


I am. Python is a joy to work with. Moreover, new features are being delivered at incredible speed.


Python just gets out of your way like almost nothing else. I work in graph ML, and Python allows us to glue together runtimes and technologies with ease (eg PySpark).


Not that much to be honest. Python is the language that gave dynamic typing its bad name. When I code in Python I must have unit tests even for the simplest of scripts because it's so easy to get a NoneType where you don't expect one.

I'm not sure dynamic typing is the fault by itself, I guess it's from a lack of compilation phase, as other languages such as Elixir seem to protect you from typing mistakes much more than Python does. My running theory is that dynamic typing is a spectrum, and Python is on the side of "it's very easy to get it wrong" compared to other languages. I haven't used the recent versions with type hints, so it might've gotten better.

Also, I'm not a fan of how it feels. It is afraid of functional constructs: come on, itertools.accumulate is a simple reduce summing the elements. Why is Python so against map, reduce and ergonomic lambdas?

Like Go, it seems Python wants to be simple enough to be understood in large teams of varying developer skills, but unlike Go, Python is much harder to understand and debug in hairier codebases when you add metaclasses, weird backward compatible behaviours, monkey patching, typing woes and an often substandard stdlib.


> Python is the language that gave dynamic typing its bad name.

I don't think this is true at all. There have been people against dynamic typing since it was first introduced, and their justifications haven't changed due to Python. There's nothing about Python's approach to dynamic types that is particularly egregious, in my opinion. In fact, since type annotations were added to the syntax in 3.6 (I think), there is now support for external tools to do some static type-checking (though these are not enforced by the language proper). If I were to point to an in-use dynamically typed language with dynamic type problems, I'd be pointing at JavaScript. The extent of implicit conversions is truly heinous (again, just my opinion).

> I'm not sure dynamic typing is the fault by itself, I guess it's from a lack of compilation phase

I'm not sure I understand this. For a language to be "dynamically typed", it must necessarily not check types during compilation no matter what, otherwise it would be "statically typed". So I don't see how adding a distinct compilation phase to Python's evaluation model would affect anything.

Also, Python does have a compilation phase; it just isn't distinct from the interpretation phase. There are no longer interpreters separate from compilers, as far as mainstream languages are concerned.

> Why is Python so against map, reduce and ergonomic lambdas?

The actual answer to this is that Guido van Rossum didn't want to make them first-class. That's the whole justification.

> Python is much harder to understand and debug in hairier codebases when you add metaclasses, weird backward compatible behaviours, monkey patching, typing woes and an often substandard stdlib.

Most of these sound like problems with dynamic types, honestly.


> Python is the language that gave dynamic typing its bad name

I'm pretty sure that honor falls upon PHP and perhaps Perl as well as JS.


> Python is on the side of "it's very easy to get it wrong"

> Python is much harder to understand and debug in hairier codebases when you add metaclasses, weird backward compatible behaviours, monkey patching, typing woes and an often substandard stdlib.

Indeed, Python (it's not just about dynamic typing, but more about how the underlying constructs of the language are exposed as first-class citizens) makes it possible to write very bad programs that no one would want to work with.

But this is also what made it possible to build great libraries that could abstract a lot from the user, particularly because of how much you can twist it to your needs. It's also great when writing tests.

My take is that there was a time when monkeypatching and doing this nasty stuff everywhere was common (early days of python becoming mainstream). Some libraries might still do this, but they are well tested and supported, so not an issue for them. But most python code being written now does not make much use of these features.

About functional constructs, I mostly agree with you, but the various comprehensions are a joy to work with, and provide enough for the most common usecases; the recent `match` construct also makes it possible to simulate ADT, which is great.

Finally, typing is very mainstream now, great libraries such as dataclasses or pydantic make classes with typed fields very easy to use, with minimal boilerplate.


> Python is the language that gave dynamic typing its bad name.

You never saw a line of assembler, did you? You also just discounted 40 years of computer science (EDIT: I mean programming, CS has a longer history of course) that happened before the advent of Python. Good job.


There's nothing useful to be accomplished by conflating untyped languages with dynamic ones. I've written a bit of assembly language and more Forth; it's not Python, it's not even relevant to Python.


Note: edited to show more care.

Well, separating them is just as arbitrary. From the perspective of type theory, both kinds are just unityped. I was responding to claims about dynamic typing - and to that, assembler is relevant, as it uses the same typing discipline.

Moreover, as others noted, the division between static and dynamic typing fans probably started with Lisp sometime in the '50s. The arguments for and against each typing discipline are mostly the same as they were way, way before Python existed. Some technological breakthroughs did change some aspects of a discussion, but the debate is mainly philosophical and treated as a social activity more than an engineering one. That helped it remain frozen in time for so long.

In any case, Python is most assuredly not what gave dynamic typing a bad name. Instead, what gave dynamic typing a bad name were developers with a strong preference for static typing. And on the other hand, what gave static typing a bad reputation - and it also does have one like that - were developers who liked dynamic typing more.

That statement is ahistorical and is impossible to defend without intensive goalpost-moving.


There's no point in talking to you if you don't show the care needed to figure out that I'm not the person you're yelling at.


Seriously? Even if my arguments are sound? Just because I didn't notice the change in the nick, because I was replying from the threads page, where the parent post of my first comment was not displayed? Well... Ok, I guess. And what yelling, could you please tell me? I've read my previous post probably 20 times by now, and I honestly cannot see what I wrote that deserves to be called "yelling"...


I was really excited to see graphlib in stdlib (despite the very limited use so far), but wow, is it me or is that API... awful? I guess it's super focused on parallel task processing, but that has to be one of the least Pythonic and user-friendly APIs I've seen. Create a class, then call .prepare(), then loop over .is_active() - what the hell were they thinking? I was hoping to have a nice alternative to networkx, which is big and slow, but this is honestly a mess.
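
For anyone who hasn't seen it, the protocol in question looks roughly like this:

    from graphlib import TopologicalSorter

    graph = {"c": {"a", "b"}, "b": {"a"}}            # node -> set of predecessors
    list(TopologicalSorter(graph).static_order())    # e.g. ['a', 'b', 'c']; raises CycleError on cycles

    # the parallel-processing protocol:
    ts = TopologicalSorter(graph)
    ts.prepare()
    while ts.is_active():
        for node in ts.get_ready():
            ...                                      # hand node off to a worker here
            ts.done(node)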

Also is ZoneInfo basically stdlib version of pytz? Why not just put it under datetime.timezone like utc?


> Also is ZoneInfo basically stdlib version of pytz? Why not just put it under datetime.timezone like utc?

A good thing about Python is that pretty much every decision and rationale is documented by PEP, and this is no exception: https://peps.python.org/pep-0615/#using-datetime-zoneinfo-in...

In short, the heavy implementation of zoneinfo disfavors a nested module, since the norm in Python is that importing A also makes A.B available, which would force datetime to eagerly import zoneinfo.


I’ve only read the top level part of the PEP, but it sounds like they’d rather keep the Python source code clean than reuse `datetime.timezone`.

They optimised for, what, 20 people instead of optimising for thousands. I am not impressed at all.

“Optimising” here is quite a euphemism too. The omission of timezones from the standard library, alongside allowing timezone-naive datetimes, has been a major implementation flaw for a long time.


I see where you come from, but a timezone database presents several challenging design problems to date and time libraries (I have written a popular date & time library in the past).

One such problem is keeping the database up to date, which is anything but trivial. I'm very confident that the OS-level time zone data is the only database you can reasonably expect to be up to date, and different OSes expose (sometimes accidentally) different bits of data. Any other solutions are not very automated and tend to drift; I've seen systems where pytz or Java's TZUpdater was neglected for a while. I hoped TZDIST [1] would become a thing, but it is not widespread enough, to say the least.

In this sense PEP 615 is not exactly the solution but a stopgap; it essentially admits that we can't exactly solve this problem, but we can at least include the existing half-solution into the batteries.

[1] https://www.rfc-editor.org/rfc/rfc7808.html


Reusing datetime.timezone would not make sense. datetime.timezone represents fixed UTC offsets, while ZoneInfo objects represent IANA timezones (which generally don't have a fixed offset).
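
A small illustration of the difference (assumes the IANA data is available, e.g. via the OS or the tzdata package):

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo

    fixed = timezone(timedelta(hours=-5))           # a fixed UTC offset, nothing more
    ny = ZoneInfo("America/New_York")               # a full IANA zone with DST rules

    datetime(2022, 1, 1, tzinfo=ny).utcoffset()     # -5:00 (EST)
    datetime(2022, 7, 1, tzinfo=ny).utcoffset()     # -4:00 (EDT), the offset moves with DST
    datetime(2022, 7, 1, tzinfo=fixed).utcoffset()  # always -5:00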


I believe GP wants something like `datetime.timezone.from_tzid("Etc/UTC")`, which is by itself a fine backward-compatible upgrade from the original interface.


I would disagree that this is a fine interface. It's not obvious with "Etc/UTC", but what is the offset of datetime.timezone.from_tzid("America/New_York")? You need a datetime to go with the TZ id to answer that question. And experience has shown "just assume the current time" is a bit of a footgun.


As currently defined, datetime.timezone doesn't directly expose the zone offset as a whole and always requires some reference datetime object to compute it, even though the built-in implementation would simply ignore it for now [1]. So it can be surely upgraded.

[1] https://docs.python.org/3/library/datetime.html#datetime.tim...


I guess they wanted to raise an exception if the graph has cycles, while at the same time still being able to use the object partially. So the constructor cannot raise.

Also, you don't have to call is_active(), the object itself is convertible to bool. Not sure if that is an improvement.

graphlib is definitely underwhelming at the moment. A standard Python vocabulary for graphs would be nice.


Similar: Ned Batchelder recently posted about major Python changes introduced in every point-release -

https://nedbatchelder.com/text/which-py.html


I think Python needs much fewer packages in its standard library:

- array: everyone uses numpy for the same purposes

- bisect: very niche feature, not a good fit for standard library

- glob: should probably be part of os

- graphlib: why is it part of the standard library? even if you need to work with graphs, chances are you have different requirements or data format

- shlex: again, not really necessary in standard library (unless it's used in shutil, I don't know)

- statistics: should be handled by scipy

And that's just the modules mentioned in this article. There are tons of other modules that are outdated, unused, or don't fit into the standard library: getopt, curses, urllib (everyone uses requests), xml*, html, tkinter.

What Python should be doing is deprecating those modules and/or moving them outside of the standard library. Not adding more of them.


Sorry to pile on with the disagreements, but I also try to keep my scripts as external-dependency-free as possible (at the expense of reinventing the wheel a bit) so that I minimize supply-chain attacks, and to keep things as future-proof as possible.

For that, some of the included libraries really ease the pain of having to reinvent things like text-wrapping, special cases when iterating over things, creating TUIs, etc.

The more you read the docs, the more you appreciate the thought and effort of the Python developers in making things useful and convenient for everyone.


Python package management is a nightmare. A script that only uses the standard library can be shared as a single file.

To me, the breadth of the standard library is one of Python’s main strengths.


Strange selection you got there.

This is what will be (eventually) removed: https://docs.python.org/3/library/superseded.html

> urllib (everyone uses requests)

iirc requests is actually a wrapper for urllib. And there are certainly many scripts using urllib directly to avoid having an out-of-stdlib dependency. The API is not that bad.


requests uses urllib3, which was forked or split out of the standard library, presumably due to missing features or design flaws.


Thanks for the link! It's nice to see that some cleanup is ongoing.


They do sort out dead batteries [1]. But pretty much every case you've listed is not exactly dead, but more like a philosophical disagreement.

For example some modules are not what they seem to be; graphlib doesn't purport to be a general graph library and statistics doesn't mean to replace scipy. They are just groups for otherwise independent bits of code, named so that similar future code can be grouped together. In some cases you have a waaaaaay too high threshold for "unused" modules, for example [1] kept getopt because it mirrors a popular C library.

[1] https://peps.python.org/pep-0594/


> kept getopt because it mirrors a popular C library

It's an excellent reason to keep it as a separately maintained module. Having in the standard library two modules for the same purpose just adds to the confusion.


Strong disagree on 'statistics', I'm not pulling in the entirety of SciPy just so I can call stdev in a small script.
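
For reference, the stdlib version is a single import:

    import statistics

    data = [2.5, 3.1, 2.8, 3.4, 2.9]
    statistics.mean(data)      # ~2.94
    statistics.stdev(data)     # sample standard deviation
    statistics.median(data)    # 2.9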


You can also use numpy to compute stdev, and i would argue that in most contexts where you need to compute stdev you are also likely to use other numpy and scipy features


It is truly the jQuery of the Python world.


I personally love the (extra)-batteries included approach to the standard library.

Having them included by default is much more convenient; I may want to do some very simple stats without needing all of scipy, etc.


Array - What's the use case here? (And I'm not being snarky or even critical.)

My usage is generally lists for small stuff where I don't care about performance, and pandas/numpy for analysis. So maybe array is either if I don't want numpy, or if I am doing something real and large and list performance would truly be a problem?

Also, I looked for a PEP to answer these questions and couldn't find it. (The one I found was very old). Or any stats on performance improvements?


I agree with some of these, and some could be consolidated, but the features of `glob` and `shlex` are useful for writing scripts, which I think is a common use for Python. Seems to me that leaving them in is reasonable.

I also think having basic statistic computations is useful. There are people who may want to calculate simple statistics without adding a whole dependency.


I use the array module to make the Python/C extension interface easier, to save space for big data sets, and (with from/to), simplify I/O.

I don't want a dependency on NumPy, especially as I don't need any other feature of NumPy beyond storing a homogeneous, resizeable vector of simple data types.


> What Python should be doing is deprecating those modules and/or moving them outside of the standard library. Not adding more of them.

No, thanks.


No. The "batteries included" is awesome for many reasons. Removing in-use modules would also create serious upgrade problem.


Yes, but it includes wrong batteries. If anything, it should be including numpy or pandas, not arrays.


> array: everyone uses numpy for the same purposes

It still is used in situations where numpy doesn't fit (serialization of data, etc)

Agree with shlex and glob

> statistics: should be handled by scipy

It was just added. I guess for people who want to import numpy to calculate the mean of an array? (please don't do this - you don't need to do this)

getopt is a needed evil and I think requests uses a lot of builtin stuff

Python does deprecate some modules once in a while


Are you trolling?

For one thing, you can pry Tkinter from my cold dead hands. For another, I used array just last week.

This whole "no one is using it so throw it away" culture is part of the improving-to-death of the modern Python ecosystem. I think it must come from the Python 3 fiasco. It's become fashionable in a subset of Python community to aggressively deprecate working code.


Given how many warts remain in the language, not aggressive enough.


binary search is standard imo.


I agree, Python's stdlib doesn't have a lot of standard algorithms and data structures; bisect is a very welcome exception.


glob is part of pathlib in Python 3. Pathlib is great.


But, unfortunately, very slow.

https://youtu.be/tFrh9hKMS6Y


glob works on more than paths, though


Like what for example?


They probably mean fnmatch, which is what glob actually uses under the hood. That's indeed a useful module for simple wildcard filtering.


Exactly! I use fnmatch a lot to translate glob strings to regexes (in a VFX company, I'm a developer for a Houdini and in-house software pipeline).

Being able to provide solid globbing options to users in any scenario that is 'path-y' can provide massively more power to them, in lots of different cases I'd say.


My bad, I did mix it up with fnmatch.


glob could be part of os (or maybe even pathlib), that is true. But I have used all of these except graphlib at one point or another, with shlex being the most regular.


In the case of graphlib, I somewhat agree - graph functions are necessary for many modern applications, and the choice of functions supported by graphlib is... interesting. It would probably make more sense to either extend them or just replace it with a more useful library.


Did not know about functools.singledispatchmethod, this could lead to some clean code in specific situations. I know I’ve missed overloading a few times!
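
A small hypothetical example of the pattern, for anyone who hasn't run into it (class and method names made up):

    from functools import singledispatchmethod

    class Formatter:
        @singledispatchmethod
        def format(self, value):
            raise NotImplementedError(f"no formatter for {type(value)!r}")

        @format.register
        def _(self, value: int):
            return f"{value:d}"

        @format.register
        def _(self, value: float):
            return f"{value:.2f}"

    Formatter().format(3)        # '3'
    Formatter().format(3.14159)  # '3.14'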


For a property setter, you have to give the method the same name as the original property, otherwise it mysteriously breaks. Here for singledispatchmethod they give it the name _. Why no consistency? Such things seem ad-hoc in python. (This criticism comes from an avid Python user.)


I guess to indicate that the method the @mydispatch.register() decorator wraps will not be added to the class namespace; it will only be accessible via the function wrapped by @singledispatchmethod. As is typical in Python, an underscore indicates an anonymous/unused name.


This explanation doesn't seem to cover the case with @property def foo(self) and the following @foo.setter def foo(self, value) - I wonder why the name of the foo.setter method matters, it doesn't make sense to me.


`singledispatchmethod` definitely looks useful.

The name isn't great, though. It actually offers double dispatch, not single dispatch! `divider.divide(10, 2)` first dispatches on `divider`, then on `10`.

As so often with Python, I'm left wondering why they didn't follow through and properly complete the design, in this case perhaps offering multiple dispatch (multimethods) instead of just double dispatch. Casting an eye over the functools code, it looks like they have all the necessary C3 infrastructure in there.


Python sort of has multimethods, but only for type-annotations, not for actually dispatching to code.


Definitely useful.

That code looks ugly as fuck though


this feels really unpythonic, would be very surprised if I saw this in code


Though I'm guessing it won't get as much use now that pattern matching has arrived and is a more general way of achieving the same thing.


I've been making extensive use of it lately; it's treating me quite well.


Question about this part:

    key = operator.itemgetter("name")
    people = sorted([p1, p2, p3], key=key)
What's the advantage of using the operator module instead of just

    key=lambda x: x['name']

?


It's faster (better performance). It could also be preferred just as a coding style.


Holy shit, you're right. It's about 33% faster on Python 3.10 on my machine:

  import operator
  n1 = operator.itemgetter('name')
  n2 = lambda x: x['name']
  x = {'name': 'abcdef'}
  import timeit
  print(timeit.timeit('n1(x)', globals=globals()))
  print(timeit.timeit('n2(x)', globals=globals()))
I'm so surprised because I went and read (what I thought was) the source for operator.itemgetter, which boils down to a wrapper object around the plain version.

It turns out that this is not the source code. As with so many other things in Python, there's a special case you have to know about: there's a separate C implementation of the operator module. :-/


> It turns out that this is not the source code. As with so many other things in Python, there's a special case you have to know about: there's a separate C implementation of the operator module. :-/

I actually kinda like that Python often maintains 100% compatible pure-Python implementations of various C modules. It's definitely niche, but I have sometimes found myself trying to work out what a module is doing (e.g. with the fancy new PEG parser) and I find it far easier to read Python than C.


I'd expect it to be an even bigger improvement inside sort, but maybe my hunch is wrong. The significance is greater there, at least.


This makes me wonder, couldn't python have a peephole optimization that replaces certain easy lambdas like this with operator functions? Maybe it's too noticeable to work as a transparent optimization (the function kind and name etc. change).


Such optimizations can't be transparent in python (at least not easily) because you could very well reassign everything in `operator` to do whatever you want.

PyPy can do it with its jit because it supports invalidating compiled code when such changes are made.


Assuming the implementation has access to the "real" operator.itemgetter, I can't think of a case where replacing lambda x: x['name'] with operator.itemgetter("name") would give a different result.

... except for the stack trace:

  >>> f = lambda x: x["A"]
  >>> f({"A": 3})
  3
  >>> f({"B": 3})
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in <lambda>
  KeyError: 'A'

  >>> import operator
  >>> f = operator.itemgetter("A")
  >>> f({"A": 3})
  3
  >>> f({"B": 3})
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  KeyError: 'A'
So if 'x' implemented its own getitem and it inspected the stack (eg, how Pandas uses the 'level' parameter in 'eval' and 'query' methods to get locals() and globals()), then this change would cause different results.


Oh, I see.

I think you are right.


I generally agree that it's not worth importing operator just to get access to itemgetter. operator module has a few other nice functions though.


What's your favourite operator feature?


I'm not the same person, but there's not much in operator module that can't be done by simply using an operator symbol. It's useful to use in e.g. map(), but you could use lambdas instead (for some performance overhead as mentioned).

operator.countOf() is perhaps the hardest to reimplement as a lambda, as it's more reliable than something like lambda seq, x: seq.count(x) - for example it will work for dictionary.values() as well.
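
For example:

    import operator

    scores = {"a": 1, "b": 2, "c": 1}
    operator.countOf(scores.values(), 1)   # 2 -- works even though dict_values has no .count()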


FWIW I prefer itemgetter to a lambda


TIL about dict reverse mapping in 3.10. This will be rather useful:

    people = {
        "Diane": 70,
        "Bob": 78,
        "Emma": 84
    }

    keys = people.keys()
    # dict_keys(['Diane', 'Bob', 'Emma'])

    keys.mapping["Bob"]
    # 78
EDIT: I actually misunderstood this; see comments. The keys.mapping field does not point to the original dictionary. I guess this is actually just about consistency with the way other mappings work? Not so useful (to me) after all.


Why would you want a collection of keys to behave like the mapping they came from? Why not pass around the dict itself, when you need access to the values? This seems like it extends the responsibilities of "keys" beyond what they should care about, and the mental overhead grows with it.

Or is this about read-only access? There are better, more explicit ways to accomplish that.


I do not see the use for this, over the standard access methods.

    people = {
        "Diane": 70,
        "Bob": 78,
        "Emma": 84
    }

    people.get("Bob")
    # 78
The common usecase is:

    for key in people.keys():
        people.get(key)
vs

    keys = people.keys()
    # dict_keys(['Diane', 'Bob', 'Emma'])

    for key in keys:
        keys.mapping[key]
What is the advantage?


It gives you access to the original dictionary from a view. I've not played with it yet but I guess that means you can modify it.

EDIT: apparently not:

    >>> keys.mapping
    mappingproxy({'Diane': 70, 'Bob': 78, 'Emma': 84})
In this case I actually don't know what the use case is :shrug:


The rationale was here.

https://github.com/python/cpython/issues/85067#issuecomment-...

It seems a bit vague: « Exposing this attribute would help with introspection, making it possible to write efficient functions that operate on dict views. »


I don’t quite get the usecase for it. You’ve got the original dictionary that gives you exactly the same thing. There’s obviously something I’m not thinking of here but I’m not sure why I would want to trade the dictionary for a set of keys, that’s also the dictionary.


I hope the new itertools.pairwise could be added to strings and lists with split()

e.g.

'abcdef'.split(2) == ['ab', 'cd', 'ef'] # not the same as .split("2")

[1, 2, 3, 4, 5, 6].split(3) == [[1,2,3], [4,5,6]]

Or one can use the `grouper` from the recipe

https://docs.python.org/3/library/itertools.html#itertools-r...
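
That grouper recipe is roughly the following (the exact version in the docs has changed over releases):

    from itertools import zip_longest

    def grouper(iterable, n, fillvalue=None):
        # collect data into fixed-length chunks or blocks
        args = [iter(iterable)] * n
        return zip_longest(*args, fillvalue=fillvalue)

    [''.join(g) for g in grouper('abcdef', 2)]         # ['ab', 'cd', 'ef']
    [list(g) for g in grouper([1, 2, 3, 4, 5, 6], 3)]  # [[1, 2, 3], [4, 5, 6]]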


> 'abcdef'.split(2) == ['ab', 'cd', 'ef'] # not the same as .split("2")

Pairwise would give you ['ab', 'bc', 'cd', 'de', 'ef']. It's not a split.
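
For comparison, pairwise (3.10+) yields overlapping pairs:

    from itertools import pairwise

    [''.join(p) for p in pairwise('abcdef')]
    # ['ab', 'bc', 'cd', 'de', 'ef'] -- overlapping pairs, not chunks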


int.bit_count, but str.removesuffix -- consistency has never been a strength of the Python standard library.


Python's style guide prefers words separated by underscores, but makes an exception to match the prevailing style of the module. `removesuffix` is consistent with the naming style of the rest of the `str` methods. `str` is such a fundamental class that it likely predates the current style guide. (All of the `int` named methods seem to be new to Python 3.)


`bit_count` is a terrible name for a function that returns the number of one-bits. Zero-bits are bits too!


Yeah, good point! I hadn't thought about that until you mentioned it but totally agree. I'd suggest something like `num_enabled_bits` or `enabled_bits_count` might have made things a bit clearer?


The operation's name is "popcount"; there's an x86 asm instruction, "POPCNT".

https://duckduckgo.com/?t=ffcm&q=popcount&ia=web


What do you mean? That one's a verb and the other isn't? I think it's fine here. The latter returns a modified version of the string, whereas the former is more like a property of the int, like its "length", although Python also doesn't have .length so I guess that's a bad example. Think of it like datetime.year vs datetime.replace


I think the problem is that the first separates words by underscore and the second doesn’t


Well count could be a verb here. And one has an underscore while the other smashes two words together.


similarly, I was just looking at functools.cached_property. In the usual style of Python, it's a smart special case but not the same as the composition of @functools.cache and @property. Maybe it should have had a different name, for that reason, and the issue would be less prominent.
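
A minimal usage sketch (the class here is made up):

    from functools import cached_property

    class Dataset:
        def __init__(self, rows):
            self.rows = rows

        @cached_property
        def total(self):
            # computed on first access, then stored on the instance itself
            return sum(self.rows)

    d = Dataset([1, 2, 3])
    d.total        # 6, computed once per instance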


What is the use case for int.bit_count [1] which returns the number of ones in the binary representation of the integer?

[1]: https://docs.python.org/3/library/stdtypes.html#int.bit_coun...


If you are using integers to represent subsets then int.bit_count is the number of elements of the set. There is a machine instruction to count the number of ones in the binary representation, maybe popcnt is the name, I am not sure.

Edited, more info: AMD's Barcelona architecture introduced the advanced bit manipulation (ABM) ISA introducing the POPCNT instruction as part of the SSE4a extensions in 2007. Intel Core processors introduced a POPCNT instruction with the SSE4.2 instruction set extension, first available in a Nehalem-based Core i7 processor, released in November 2008.

https://www.rosettacode.org/wiki/Population_count
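
A small sketch of the subsets-as-bits idea (positions chosen arbitrarily):

    subset = 0b10110010        # encodes the set {1, 4, 5, 7}
    subset.bit_count()         # 4 -- size of the subset (3.10+)
    bin(subset).count("1")     # 4 -- the pre-3.10 spelling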


A common use is the Hamming distance, which is the number of different bits between two streams. It comes up in things like the census transform, which is used for image matching.

https://www.researchgate.net/figure/Census-transform-and-Ham...


Sometimes you might use bits in an integer to encode data, e.g.;

- A bitset index

- Bitflags (e.g. to represent logging levels)

- Feature flags

`bit_count` (aka `popcount`) gives you an efficient way to figure out how many bits within those structures are set and not set.


https://en.wikipedia.org/wiki/Hamming_weight

I guess it's included as an easy way to access the fast hardware supported implementation, on architectures where it is available.


Sadly it does not call into the hardware implementation: https://github.com/python/cpython/blob/f62ad4f2c4214fdc05cc4...


It does. Note that it calls popcount_digit on each digit of the integer (a 'digit' here is 32 bits), which calls _Py_popcount32, which does map to a hardware instruction if possible: https://github.com/python/cpython/blob/f62ad4f2c4214fdc05cc4...


Ah, you are correct. Although I still take issue with "fast" in the comment I was replying to ;)


Never used it in python, but I had some C++ code where I used bit sets to identify which values in an array were valid; in that context the number of ones was the number of valid entries.


It’s surprisingly useful, and shows up everywhere from chess to molecular biology - but the CPU instruction probably exists because of its use in code-breaking.

This article has a lot of interesting examples. https://vaibhavsagar.com/blog/2019/09/08/popcount/


Got ‘em!!!

    (0.2).as_integer_ratio()
    # (3602879701896397, 18014398509481984)
    # oopsie


thank you for this practical, hype-free guide!!! couldn't agree more with your sentiment about yawnsync.io. i wonder how the speed/dimensionality of array compares to numpy


Oh, come on, who did the graphlib? It's not cycles, it's strongly connected components.


I'm disappointed base32hex isn't called duotrigesimal...



