I don't think I subscribe to the idea that we need to instrument everything to collect data for eventual ML (especially if that data is going to be privately owned). I can see why that would be appealing, especially if you are a large software company involved with ads.
If there really is a killer ML model for a particular IoT device then let me opt in/out and the data collecting can be anonymous, no login required. Being online should not be required for its function.
edit 2: realized I did not address the point on updates
Self-updating devices I am more on the fence about. Ideally I would have some control over what is getting updated and when, plus the ability to revert things and schedule the updates.
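To make that concrete, here is a rough sketch of the kind of control I mean: the owner decides whether auto-install is on, picks an install window, and can revert a bad version so it is not offered again. Every name here (`UpdatePolicy`, `Device`, `offer_update`, `revert`) is made up for illustration and is not any vendor's real API.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class UpdatePolicy:
    """Owner-controlled update settings (hypothetical names)."""
    auto_install: bool = False            # updates are opt-in by default
    window_start: time = time(2, 0)       # assumed window: 02:00 to 04:00
    window_end: time = time(4, 0)         # window must not cross midnight
    blocked_versions: set = field(default_factory=set)

    def allows(self, version: str, now: time) -> bool:
        # Refuse if auto-install is off or the owner rejected this version.
        if not self.auto_install or version in self.blocked_versions:
            return False
        return self.window_start <= now < self.window_end


@dataclass
class Device:
    policy: UpdatePolicy
    installed: str = "1.0.0"
    history: list = field(default_factory=list)  # previous versions, oldest first

    def offer_update(self, version: str, now: time) -> bool:
        """Install only when the owner's policy permits it."""
        if not self.policy.allows(version, now):
            return False
        self.history.append(self.installed)
        self.installed = version
        return True

    def revert(self) -> bool:
        """Roll back to the previous version and block the one just removed."""
        if not self.history:
            return False
        self.policy.blocked_versions.add(self.installed)
        self.installed = self.history.pop()
        return True
```

With `auto_install=True`, an update offered at noon is refused, the same update at 03:00 is installed, and after `revert()` that version is not installed again.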
It would also be great if these devices used SSL/TLS and signatures for updates. There are horror stories of them opening up TFTP and using HTTP in the clear.
If this is too hard for IoT makers to get right, maybe hubs are the way to go? Not sure, but the fewer things phoning home via the internet on my network the better, I guess.
Maybe the current feature set is enough; it is what the customer bought after all.
ML algos can be pretrained before shipment and the model baked into the device. Online training can be gamed just like search optimization games Google search; in fact it is a pretty big security hole.
Some internet/local network server on the network should provide software to do that. This software should be competed on separately from the individual IoT devices (and should probably have open source interfaces and implementations; an industry consortium like the Khronos group - OpenGL, OpenCL, Vulkan, etc. - would be perfect). You could buy a pre-built box for it; it might get folded into the future all-in-one router/modem (website) or console/TV box; or you could have your laptop or old desktop do it, etc.
That way there is a single authoritative device to secure and control everything with, a single device to reset if it becomes compromised, etc. Updates can be in cryptographically signed bundles from their manufacturers. And any smart algorithms can be run through the server. Arbitrary code execution on devices would be strongly discouraged by industry practice.
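A minimal sketch of the "verify before install" step the hub would do, under a big assumption: Python's standard library has no public-key signatures, so this uses HMAC-SHA256 with a per-vendor shared key as a stand-in. A real design would use an asymmetric scheme (for example Ed25519) so the hub only holds public keys. The function names and the `keys` table are hypothetical.

```python
import hashlib
import hmac


def sign_bundle(payload: bytes, key: bytes) -> bytes:
    """Manufacturer side: compute a tag over the bundle (HMAC stand-in)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_bundle(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Hub side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def install_if_valid(payload: bytes, tag: bytes, keys: dict, vendor: str) -> bool:
    """Accept a bundle only if it comes from a known vendor and verifies."""
    key = keys.get(vendor)
    if key is None or not verify_bundle(payload, tag, key):
        return False  # unknown vendor or tampered bundle: refuse
    # A real hub would push the payload to the device here.
    return True
```

The point is only the ordering: nothing reaches a device unless the hub has checked it against a key it already trusts.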