
> 5. Never remove a name.
>
> Removing a named schema component at any level is a breaking change for programs that depend on that name. Never remove a name.

I agree with this in theory and have seen it go oh so very wrong in practice. Tables with dozens of columns, some of which may be unused, invalid, actively deceiving, or at the very least confusing. Then a new developer joins and goes "A-ha! This is the way to get my data." ... except it's not, and now their query is lying to users, analysts, leadership, anyone who thinks they're looking at the right data but isn't.

You absolutely have to make time to deprecate and remove parts of the schema that are no longer valid. Even if it means breaking a few eggs (hopefully during a thorough test run or phased rollout).
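One low-risk shape for that phased rollout is to rename the column first, so anything still depending on it fails loudly in tests rather than silently reading stale data, and only drop it after a quiet period. A minimal sketch with hypothetical table and column names, using SQLite for illustration (RENAME COLUMN needs SQLite >= 3.25, DROP COLUMN >= 3.35; other engines have their own ALTER TABLE caveats):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, legacy_score REAL)"
)

# Phase 1: rename the suspect column. Queries that still reference
# legacy_score now error out instead of returning misleading data.
conn.execute(
    "ALTER TABLE users RENAME COLUMN legacy_score TO deprecated_legacy_score"
)

# Phase 2: after a monitoring period with no breakage, drop it for real.
conn.execute("ALTER TABLE users DROP COLUMN deprecated_legacy_score")

cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(cols)  # ['id', 'email']
```

The rename phase is what makes the rollback cheap: if something breaks, you rename the column back rather than restoring data.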




This x100. The most miserable and frustrating periods of my career have been in places that never deprecated anything. You could spend hours doing something that looked quite sensible, get a draft that seemed to work, and then be told "oh yeah, that's deprecated, that data isn't even populated anymore, those rows just _happened_ to have data in dev." Then you either start sanity checking everything before doing anything and your velocity sucks, or just keep stepping on landmines and losing whole afternoons.

Edited to add: docs can help, but only so much. Environments that cluttered also tend to have layers of docs that are equally misleading.


There are few things more important than comprehensive and up to date database documentation. Otherwise you don't even know what your data means. An organization that cannot produce documentation like that is somewhere between amateurish and waiting for a disaster to happen, unfortunately.


I don’t really know how to screen for that before joining a company but I’d say 20% of companies seem to be at that point.


Reclaiming the physical storage of an unused column is often a costly and sometimes impossible operation, which is why many legacy applications end up with the equivalent of my_column_final_final_v2. Database administration requires compromises like this sometimes in the name of uptime and data integrity. Big migrations are always inherently a little risky, and from the view of many DBAs, why even risk it just for a bit of clean up? Your schema shouldn't be totally transparent to your application's business logic anyway, so there are better places to enforce naming hygiene.


I believe in most relational databases you can just alter a column to allow null values and run a series of transactions in the background to set that column value to null, and that will quite effectively free up most of the physical overhead of the column in question. I would be reluctant to delete, rename, or even clear all the data out of a column without providing an alias though.
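A minimal sketch of that background-backfill approach, using SQLite and hypothetical table/column names; on a real production engine you would also watch lock contention and replication lag between batches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, old_blob TEXT)"
)
conn.executemany(
    "INSERT INTO events (payload, old_blob) VALUES (?, ?)",
    [(f"p{i}", "x" * 100) for i in range(1000)],
)
conn.commit()

BATCH = 100  # small transactions keep each lock window short on a live system
while True:
    cur = conn.execute(
        "UPDATE events SET old_blob = NULL WHERE id IN "
        "(SELECT id FROM events WHERE old_blob IS NOT NULL LIMIT ?)",
        (BATCH,),
    )
    conn.commit()
    if cur.rowcount == 0:  # nothing left to clear
        break

remaining = conn.execute(
    "SELECT COUNT(*) FROM events WHERE old_blob IS NOT NULL"
).fetchone()[0]
print(remaining)  # 0
```

The column itself stays in place, so nothing that merely names it breaks, but the bulk of its storage is released as pages are rewritten.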


Yeah, this is how you grow to the point of destruction. Your schema is half noise and nobody understands it. Then someone says you need to start from scratch.


Worse, I've seen supposedly unused columns be repurposed for something else, and then existing analytics fall apart.


Hyrum's law in action.


But then why not address the real problem? If a table has a few columns which are unused or invalid or deceiving, why did we let developers introduce them? Lack of planning? Lack of peer review? Lack of talent?

I understand these “ten rules” as: as long as you have a decent codebase and decent engineers, these ten rules will make your life easier.

These rules are nothing if you are dealing with crap codebases (they can help, sure, but they will just be patches).


Because sometimes you make assumptions that are seemingly correct but eventually found to be wrong or based on flawed inputs from sources beyond your control.

Any system that ultimately relies on "engineers need to always do the right thing" is a flawed, brittle, ineffectual system. Because even the best engineers will make a mistake somewhere, and because you can't exclusively hire "the best" engineers.

Let's spend our time figuring out how to recover from mistakes rather than trying to pretend they'll never happen.


I've worked with some databases that are 20+ years old and have outlived multiple application iterations. There's always going to be cruft in this kind of situation; it just comes with the territory of supporting applications with real production users for a long time.


Even the best team makes design decisions that turn out to be suboptimal when the requirements change.

Also, even the best team will sometimes make mistakes.

DB schemas are unforgiving.


Requirements change over time. Domain understanding changes over time. Businesses change over time. Environments change over time. Unless you are a seer with perfect precognition, most of what you have done will be invalidated over time.

Hence: make your code and data easy to change, and keep them simple, as you cannot predict in what way they will change.


> Unless you are a seer with perfect precognition

Even then ain't nobody in a 10 person seed-stage startup got time, resources, or need to build the database you'll want to have when you're a 600 person Series C monster.


Developers without special training should generally not do database design for the sort of databases that are intended to last decades. It is a similar task to developing a complex file format that is usable twenty years later - not something to be done off the cuff, and if you want schema stability database design requires more care than most file formats.


While I agree with you, unfortunately this is unrealistic. Unless a startup happens to have someone skilled with schema design, they’re going to make do with what they can, and it’s very unlikely that they’d waste headcount on a dedicated DBA / DBRE at a young stage.

The immediate effect of that, of course, is that they also won’t try to hire any such person until the DB is a problem they can’t scale via throwing money at it.



