I disagree that there is no practical benefit when you add the creative elements of 3D to the display of traditional 2D content. Reading a line of text is fundamentally the same in whatever format you consume it - but there are serious interactions with that line of text which only become possible in an animated free space. Consider learning to read in a foreign language, where HMD eye tracking is used to infer difficulty with a certain word and that triggers additional supporting materials. There are thousands of examples yet to be explored, and the impact on well-established existing 2D information systems will be dramatic. Implementing 3D layers will offer a double win of increased functional utility combined with a nicer, more human-fitting and artfully expressed interface.
You are making points about potential creative applications - BTW I thoroughly disagree on your language example. Comp learning people are constantly making this mistake: we don't need better visualization to learn foreign languages, it is really all about practice, which unfortunately for you has nothing to do with your scenario. The 'triggering additional supporting materials' has nothing to do with the user actually practicing; at best you have an overly complex hyper-micro-optimization that will bombard the user with more unhelpful material...
The issue in the parent comment had to do with user interfaces, e.g. for an OS - I think the concept is doomed in the general sense. People live in houses, which are mostly reduced to a set of 2D interfaces - having houses in your house is not helpful, it's literally just more confusing.
This is the general issue - if the assumption is that 3D is better than 2D, then we should aspire to do everything in 4D, 5D, etc. Apart from obvious physical limitations, there is a good reason we don't do this. Our computer systems already are n-dimensional - dimensions are useful for storing complexity. We crave simplicity, though, and this is why 2D is so popular: we reduce complex n-dimensional models to 2-dimensional ones.
As programmers, we even reduce it to a single, textual dimension - being able to follow a single thread is often all we can easily reason about. Many, many people prefer reading or listening to audio over watching pictures - TV shows can be nice to veg out to, but they are much harder and more complex to dig into and really engage with.
That's why there is no good use case for a 3D OS shell: for the majority of people it doesn't provide adequate visualization value for the added complexity. To a systems engineer, there could be some value in viewing OS components as parts of a car engine, perhaps; indeed, a lot of useful tooling seeks to visualize this type of stuff as much as possible. But your average Joe just needs email and maybe pictures and video - sticking them in a 3D environment just makes them more difficult to use.