Python discovers immutable data, and gets it wrong. Frozendict is a blunt instrument - instead of a mutable free-for-all, it just locks the data for the entire lifecycle. Brace for the wave of code littered with deep copies, proudly proclaiming how functional it all is.
If you want real immutable data structures, not a cheap imitation, check out pyrsistent.
If your dicts are frozen, you shouldn't need to deep-copy. The point of immutability is that if you want a new frozendict based on another one, you just rebuild the indirection data structure up top and leave the values it references alone.
You're absolutely right: an "indirection data structure" is necessary. Freezing the data is the least interesting part - it doesn't give you any of the benefits typically associated with immutable data structures in functional languages. That's my point - Python is shipping a half solution that's being mistaken for a proper one.
You think Python developers are going to roll their own HAMT on top of frozendicts? Or are they just gonna make copies? Personally, I'd just use pyrsistent which seems to get it right.
Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.
This is rarely the case in practice, most dictionaries and dict operations are small, if you have a huge dict, you probably should be chunking your load or delegating that to infra.
Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion.
You'd better have an incredible ROI to justify that.
Where's the code? Have you observed the bottleneck call?
> Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.
> This is rarely the case in practice
Where's the stats on the actual practice?
> You'd better have an incredible ROI to justify that.
The ROI being: fearless API design where 1) multiple instances of high level components are truly independent and could easily parallelize, 2) calling sites know that they keep the original data intact and that callees behave within the immutability constraints, 3) default func inputs and global scope objects are immutable without having to implement another PEP, 4) collections are hashable in general.
It's a little early to say that "Python is shipping" anything related to this. The PEP is still in draft status and it may be modified (as it was three hours ago) or even rejected. The article says it is likely to be accepted, but that has not happened yet. That also means there is time to comment on the PEP in the discussion it links to if you'd like.
The main trick is to store everything in a tree with a large branching factor.
The elements are on your leaves.
When you want to make an edit you need to create only the nodes from the root to the actual leaf. Then you return the new root. Everything else you can share.
This is a log operation, normally implemented with branch 32.
It works with vectors as well as dicts.
Creating a copy is constant, it's immutable can be shared freely (especially on GC'd languages).
Insert/Delete are O(log32(n)) instead of O(n) they'd be with a full copy.
And, yet, somehow, functional languages have been doing this since forever? I jest, I jest! There's an entire field of study focused on this! I'm no expert, but imagine you've got a singly-linked list, and we're going to allow only push_back() and pull_back() (no insert()). Let's go ahead and let multiple owners have at this list. If A does "pull_back()" what really happens is A has a local "tail" pointer that moves back along the end of the list and has a new belief about the end of the list. When A does "push_back()" it begins attaching links at its new tail. Since the links just "point backwards" without modifying the old links, this doesn't mutate the list "as is", it only mutates A's version of the list. So, if B is also looking at the list, it could just start doing "push_back()" and add its own "tail". The result is a data-structure that's a "bunch of shared singly linked lists" in a bush structure that's more akin to a tree than a singly linked list.
You can do the same thing with binary trees and other structures. It's more fiddly, and there's definitely places where single-ownership allows mutability "under the hood" for real performance gains, but that's the basic idea.
Basically, a proxy. You don't need to deep copy; you just need to return a proxy object that falls back to the original dict if the key you are requesting has not been found or modified.
Functional data structures essentially create a proxy on every write. This can be inefficient if you make writes in batches, and you only need immutability between batches.
If you want real immutable data structures, not a cheap imitation, check out pyrsistent.