> Fortunately, my session database is just 'JSON(B) in a single table', so I was able to add those additional fields without the need for any migrations. And SQLite's `json_extract` makes adding indexes after-the-fact super-easy.
Our solution for a similar situation involving semi-structured data (in postgres) was to double it up: put all the json we send/receive with a vendor into a json field, then anything we actually need to work on gets extracted into regular table/columns. We get all the safety/performance guarantees the database would normally give us, plus historical data for debugging or to extract into a new column if we now need it. The one thing we had to monitor in code reviews was to never use the json field directly for functionality.
This is exactly what I've tried (and failed at) doing! Can I ask how you handle normalization from vendor data when it contains relationships and multilevel nesting? How do you know when to create a new child table, and which ones to create, and their relationships etc. I haven't found a good balance yet.
Basically what the other reply said - handle it the same as you would any complex data. You just don't need to handle all of the json immediately, only the parts you plan on using for the moment.
Odd-shaped miscellaneous data that you only need to retrieve is a good candidate for a JSON field. Once you're heavily using some piece of data, or if you need to index it (which means you are heavily using it), you should insert the data in the database "properly".
If some vendor is giving you a list of categories you don't care about, there's no need to make a vendor categories table and a many-to-many link table until you actually need them.
The point is that putting data properly in the database lets you use database features on it and get database performance.
Our solution for a similar situation involving semi-structured data (in postgres) was to double it up: put all the json we send/receive with a vendor into a json field, then anything we actually need to work on gets extracted into regular table/columns. We get all the safety/performance guarantees the database would normally give us, plus historical data for debugging or to extract into a new column if we now need it. The one thing we had to monitor in code reviews was to never use the json field directly for functionality.