Let me add a few more from my 16+ years of DB coding:
1. Many field types will change over time. Booleans will become multiple choice. Character fields will become memos/text. Date fields will start to have year, month, or day parts optional.
2. All tables with tax-logic will be compounded by code hacks. So just a table with county tax rates and order amount ranges will not be sufficient to calculate the applicable tax because you will need to keep adding special conditions of tax to apply when pencils are sold on 3rd week of every month when school is in session.
3. Management will never understand how it could be that complex to "just change the database" to enable payroll calculation for hourly employees who work overtime just because the database already handles the fixed payroll for salaried employees. In other words, databases with existing data, logic, triggers, and reports will always be difficult to change.
4. The biggest rule is that business logic never ever translates to structured database tables without lots of conditions in code and strict limitations in scope of application. I'm not talking about pretty to-do lists, address books, or blog engines. I'm talking custom ERP apps, MRP modules, production forecast algos and my favorite, scheduling for production and employees.
I guess the take-home message in all of this is that don't get into this field if you don't like challenges of this sort. In addition to coding up DB apps, I also make a lot of non-database apps that keep me sane and balance things out. However, in terms of sheer complexity and project planning, good DB apps are pretty high on my list of "exciting" things.
I've seen too many places where reasoning similar to that of 2 results in developers prematurely punting to quick and dirty hacks to get a trouble ticket closed faster. It may not be possible to get by with anything but a one-off conditional, especially in tax calculation , but I worry about the claim that it is inevitable. It provides a justification to not look for more maintainable alternatives first.
The same thing with 4. The only thing that bothers me is the sense of inevitablility. There are places where specific cases need to be dealt with in code, and it is an entirely different argument of how they should be dealt with once that much is assumed. But the inevitability bothers me, if only because I have run into several cases where things were claimed inevitable, and upon closer inspection caused large amounts of pain that could have been avoided if the existence of alternatives was considered.
You're right - especially about 4 - but the absoluteness is in practice: your solutions to 2 and 4 require a critical mass of people to understand the data being modelled (and thus - or perhaps the other way around - the processes creating/manipulating the data) and this is so rare as to be effectively non-existent.
The idea of putting any logic in the database never did make much sense to me for these exact reasons. The beauty of software is it is soft (i.e. easily changed) and us hackers have learned how to manage change with tools and whatnot. Databases are robust and solid because they have to be. A lot of the time they're the app's foundation. You go chipping away at the foundation and you end up with a trailer home.
I spend 100% of my time writing "database apps." I'm not saying I'm right, but I use the database to store data, and my app to maintain consistency and provide logic. You can't see the future, so why pretend? I mean, who would have thought a single person can be considered either male, female, or both depending on the context and date... That happened to one of my apps... No one would have had the forethought to handle that case during the initial design.
> The idea of putting any logic in the database never did make much sense to me for these exact reasons.
I'm a big proponent of enforcing business rules right into the data model. Constraints such as not null, reference constraints, or even conditional unique constraints (yes, there's such a thing!) should be enabled wherever possible.
Just remember, your app will not be the final app that makes use of the data. Your app comes and goes, along with many other apps written by many others. But the data lives "forever."
In my opinion the notion of implementing business logic outside of the database and acknowledging the longevity of the data are not necessarily in conflict. You need to have a well defined model and you can choose whether the database is a full implementation of that model or just a representation of the state of that model.
I don't find databases well suited for expressing complex behavior - programming languages are far better at that.
Business rules applied to the data are good (constraints, etc.) because they force the developer to fix the old data when they introduce a new rule. Doing data migration from old systems is quite painful when required fields aren't there. It's hard work to introduce a foreign key constraint when there are a million rows that have to be updated...
However, much of what people call business rules are much more complex logic than that. I like the store things like the state transition mappings in the database, but ultimately the application is the more flexible place to execute the state transitions.
> Just remember, your app will not be the final app that makes use of the data. Your app comes and goes, along with many other apps written by many others. But the data lives "forever."
Here here - that quote is so close to what Tom Kyte always says I reckon you are either Tom himself or an Oracle DB programmer :-)
> I spend 100% of my time writing "database apps." I'm not saying I'm right, but I use the database to store data, and my app to maintain consistency and provide logic. You can't see the future, so why pretend?
The weird thing about that is in some organizations (like my own), databases are considered easier to change than the software. Hence the push by management to move the logic out of the view or controller layers and into the "model" (i.e. use of stored procedures, constants fetched from the DB instead of in config files or headers, etc.) Where I work, they only pay one person to do database work, but three to write code. Yet the DB guy is always crushed by work, since he gets change/reporting requests all the time.
That breaks as soon as you realise apps:databases isn't 1:1 but n:1 and you can't scale without having core logic implemented in one place and one place only.
The solution to this is a multi-tier architecture. Only one application that implements the domain model talks to the DB and all other apps talk to that application. This doesn't give you the raw performance of a model living purely in the DB but may be a viable design decision. You trade flexibility for performance.
depending on the size of your model, right? If your dataset is huge, your domain model app can shard the data and provide a nice entry point to tune performance without touching any of the n apps who are dependent on the data. I'm not sure i've ever seen a database that lets you plug in memcached to speed up certain queries.
The model may still become a bottleneck if objects are tightly coupled and you can't scale out the model in a horizontal way. The low level nature and conceptual simplicity of the relational model allows database vendors to pull some pretty amazing tricks with scalability and clustering.
People aren't building big database centric applications because they don't know any better (so I would hope). It all depends on the context. It shouldn't be the default because it can be inflexible.
Umm, yeah, then you buy a third-party application that talks to your database just fine but doesn't know a thing about your "one application". Oh noes!
Is it realistic to assume that a 3rd party application that can be customized to work with my database model can't be customized to work with a model exposed in some other way (web service, J2E, .Net or whatever)?
No, it isn't. Database interfaces are very simple, text and table based, have been used for many years, are well tested, work reasonably well and fast, there are tools, bindings for every language in existence, and thousands of trained programmers. To change them for web services (I don't know how .NET or J2EE are equivalent) is not realistic at all.
I've been at my company for just shy of 4 years and I had no official training or schooling in database design or programming for that matter. When I started we were roughly 3 years out of a migration from MS Access to SQL Server - all done without rewrites or restructuring, but rather a simple recreation of the existing schema.
4 years later, the 4 "programmers" that were there are now long gone and we've cycled another 5 or 6 since I've been there and have managed to keep another 10 on staff. Not one a DBA.
So you start with a horrible database design in MS Access. Couple that with the migration to SQL. Add in 8 years of "maintenance" by nearly 2 dozen programmers who were forced to do web design, web development, application programming, hardware maintenance and everything else under sun because management doesn't know the difference. Finally, you mix that up with a sales driven CEO & COO and no real CIO, which means that deadlines and client satisfaction are king and properly designed architectures take a back seat to "just add a column so we can get it done".
What does leave you? A 12 Million dollar INC 500 company, backed by a dozen programmers with an average age in the mid 20's and average salaries just BELOW the average Level 1 programmers salary in our area. It also leaves you with a 9 hour day with your Management Studio open writing alters, updates, deletes and selects just to handle the relentless tickets that come in from the rest of the support staff. And it also leaves you with an 86 GB database consisting of 5000 tables, 5700 views, 3500 stored procedures with little to no data warehousing and enough transactional processes to choke an 8 Core, 24 GB Dell Server on a daily basis.....
Did I mention we don't have load balancing or proper offsite backups? We've been teetering on the brink of absolute chaos and collapse for so long that routine 30,000 row updates I won't even blink at. And there's no end in sight.
This article is great, but it doesn't even scratch the surface. ;)
I really find it refreshing that more people start questioning "relational model" databases. The discussion here inspired me to write a little on what the problem actually is with relationships in RDBMS, read it here:
The modern database paradigm is defined by relations, so of course that’s what falls apart as soon as you get an app into production.
Only poorly designed data base schemas "fall apart".
If you find yourself making all fields optional or needing only many-to-many relationships, that's a sure sign that you already did something wrong when you designed the data base.
Surely you must have known about the guy from Guam or the one living in Virginia but commuting to D.C. when you analyzed your app, right? And even if you didn't, you did design a data base structure flexible enough to accomodate the natural evoluation of the app, right?
Please don't confuse the limitations of 3rd and 4th normal form with the shortcomings of data bases built with them.
> If you find yourself making all fields optional or needing only many-to-many relationships, that's a sure sign that you already did something wrong when you designed the data base.
Not really. Business rules change because the nature of business changes. Over-engineering databases to account for future changes takes a lot of time and effort. If the company does not sell outside of US and does not have plans to sell outside of US anytime soon, I would not spend a week implementing Norwegian, UAE, and Indian addressing and tax/tariff system.
When I designed the payroll system for my company, I took into account that hourly employees can get promoted and become salaried. This happens once a quarter or two and the system can deal with it. Now if a salaried employee wants to become an hourly employee, my system cannot auto-calculate the change in their vacation days etc. It would require a considerable amount of time for me to build that functionality in. However, it is possible for this situation to happen but in five years, it has only happened once. In that instance, I manually edited the records and documented the entries. So does that mean I designed the database schema poorly? No it just means I did not want to make it so rich and extensive that it can handle every remote case, just most of them.
For what it's worth, the "what falls apart" line was intended to be tongue-in-cheek.
Surely you must have known about the guy from Guam or the one living in Virginia but commuting to D.C. when you analyzed your app, right?
Nope. For most of the apps I've worked on recently, we're pulling incomplete datasets that are augmented from other, slightly-incompatible sources and manual entry and must be designed and built to meet the deadline for a newspaper story. So I do the best I can to capture the data and then change the schema as these uncaptured/new requirements make themselves known.
The analysis for the app was incomplete (for whatever reason). I'm not looking to place blame, I'm just saying that a well designed data base by definition must handle the wear and tear of the app.
Some of the worst crap I ever had to maintain was originally considered a success because it met its original deadline. Sigh.
> I'm just saying that a well designed data base by definition must handle the wear and tear of the app
Impossible. This is pure theory and has no ground on reality. Name me one database that was designed to handle Sarbanes-Oxley needs in the last decade.
Business requirements change, and so does the app and the database.
Tell me how you would deal with this situation in your current database designs: A few years ago, a born male person here in Brazil got his sex legally changed to female (also had a sex-change operation). So, s/he's a female for most of legal concerns, but s/he's got the genome presented on males, which means:
1) He needs to be a "female" on your employees records;
2) She needs to be a "male" on your company health insurance records;
Sorry if I sounded abnoxius, I can't argue softly in English :p
Something a little like this happens on our elections app: politicians have a name and party, but sometimes run in an election under a different name (eg. "Robert" runs as "Bob") and party (eg. Joe Lieberman lost his primary but won in the general as an independent). And sometimes they're doing this in multiple elections at once.
Impossible. This is pure theory and has no ground on reality.
I have apps still running well after 15 years that would beg to differ.
You example is clearly an extreme outlier.
I never said that the business requirements don't change. That would be silly. I simply said that a rigorously designed data base structure should easily handle the changes you expect in your apps.
Did you build in audits? Transactions? Date effectivity? Sequentially numbered keys? Multi-part keys? I can go on and on, but you get the idea. I've made a career rewriting data base applications where the original designer failed to anticipate the simplest of new requirements.
Your English is fine. And you don't sound obnoxious. I just found your "no ground on reality" a bit at odds with my opinion a millions lines of code later.
I think all the Lightweight (agile) methods in sw development put emphasize in being able to prototype ideas easily.
DB simply don't do that.
In HN there are often questions on How to manage schema changes in the DB. I know there are tools for this but the mere fact, that is still a common question is an indication of the inflexibility of database schemas.
These are great examples towards the fact that the relational model isn't always the best way to go. Especially with regards to an evolving model.
In object oriented models, polymorphism can be employed to extend an existing model with rather minimal impact. Introducing, for instance, a subtype for an address in a relational model isn't easily done.
I'd rather model in OO land and use the DB as dumb object store.
I haven't implemented a serious project based on an real OO database. Any experiences you guys would want to share with regards to model flexibility with an OODB?
I actually use MySQL ;) Its pretty web two point oh-y.
Honestly, when I say something like "single table inheritance" I wish people would use say a search engine and type it in and see what comes out. Its a pretty unique phrase that gets you what you want... don't enable people who expect everything to be explained to them! (Not uninformed-ist mind you.)
No way. If you discover your database model is incorrect, fix it. Data kludges and hacks are not professional solutions.
The individual who wrote this is dead wrong and either headed for a lifetime of maintenance pain or a well-deserved reputation of being "that guy who caused all the problems" after he's moved on to cause pain elsewhere.
Nonsense like this is why corporate DBAs are such hard-asses when it comes to changing the schema.
Hi, I'm the individual who wrote this. So far I have never earned a reputation for causing problems, but I have earned one for fixing them. (Yes, I am in contact with former coworkers.)
I think we may be in violent agreement: I also think kludges and hacks are unprofessional. In the post I'm writing about my experience with the way that models are exposed as imperfect over time and they're fixed to deal with it. As much as I gripe about exceptions and outliers, capturing them is a big part of design.
...Except the 'fixes' you've given aren't fixes: They're patchwork on-the-spot kludges which let you ignore the problem until it crops up again. A few years of that will give you spaghetti code that is a nightmare to maintain.
How about that user who has two locations? Sure, you overlooked that in the initial design, but a permanent fix needs to be made to ensure the system will accommodate the next one which arrives.
A profile for someone located outside the US? Pretty likely you'll see that happen again if the system grows. Why not account for it correctly rather than rely upon duplicate data or another such hack?
Your urge to 'fudge and hack' is natural but the professional solution is to resolve the root cause by correcting the data model, not fiddling with the data to 'just make it work'.
After the initial glory of creating Software v1.0, much of the remaining work involves fixing the system to better reflect reality. The reason mature software takes 'exceptions and outliers' in stride is that it's been coded to do so after a great deal of trial and error in on-going analysis and discovery.
Ignoring those exceptions and working around them rather than fixing them permanently leads to madness.
The correct fix for the guy with two locations was to remove the location_id column and create a join table from People to Locations. The correct fix for the guy outside was to alter the Location model to accomodate that, which included making US postal codes optional.
Your urge to 'fudge and hack' is natural but the professional solution is to resolve the root cause by correcting the data model, not fiddling with the data to 'just make it work'.
We agree again, that's why I wrote that they're "like freezing a bottle of water, it always grows and breaks worse in the end."
The impression I gained from your blog post was that of willy-nilly denormalization and data duplication to accommodate edge-cases (which often become anything but) as they're discovered to make problems quickly go away. After working on a few development, refactoring and data conversion projects, I was astounded to discover how often those sorts of kludges crop up and how costly they become.
Perhaps you could write a sequel pointing out the pitfalls of what appear to be easy database fixes to design errors but turn out in the long-run to be anything but.
Sorry I left you with that impression, I certainly didn't mean to. I only mentioned those as great ways to shoot yourself in the foot when you don't have the patience/skill to do the Right Thing.
Sounds like you're better prepared to write that followup than me, but I'll ponder it. Maybe "Database Anti-patterns". Actually, just typing the name makes me think it likely someone's done this sort of thing, but I don't have time to check.
If you discover your database model is incorrect, fix it.
A nice sentiment, but once a transactional system is in production, it becomes very difficult to change, IME. The company I work at has a legacy schema we've been wanting to retire for years, but we haven't been able to due to number of (often hidden) constraints and overall complexity. I realize this fact suggests a badly designed system, but much of real world database work consists of dealing with poorly designed legacy systems.
If there was one single belief in the software world that I could change, this would be it.
Changing the database is only hard if you are afraid of changing the database. The belief that change is difficult leads people to expect that the database will never change. Which leads to programming practices that aren't flexible enough to change large sections of code rapidly. Which leads to software that is difficult to maintain in the face of schema changes.
Start from the assumption that the database will change often, however, and you'll build refactorability in from the start. If Junior dev Jimmy drops a NOT NULL constraing on a column someplace, then suddenly the continuous build will break on seven unrelated projects, and those teams will spend 5 minutes each patching the code, testing and getting the build fixed.
It's really not hard. You just have to stop being afraid!
Over a decade ago, I was working at a company that sold insurance software. It seemed like the database got designed only about halfway, then the rest of everything was kludgy hacks. Like, if this certain number field is negative, then it's really the absolute value, but it meant that this row was the active one for the account, unless the next timestamp in the same account has a value "i" in this other column...
On and on like that. In fact, there were entire internal courses to teach these kludgy schemes, and they had fancy sounding official names. (Yes, entire week long courses, not just 1-day workshops.)
Prove to me that improved design or increased implementation elegance in an existing corporate application can have a sufficient near-term effect on the revenue numbers, and we'll talk.
DBAs aren't the only hard-nosed folks involved here.
I am curious about the implications of Rails migrations here. It seems that migrations give you a pretty good level of softness in your data models when you need it. Of course, I think a lot of Rails apps do end up with much of their logic in the M part of MVC. And understandably, some folks are not very comfortable with that.
Can anyone speak from experience on this? Talking about mature Rails apps now. Are Rails migrations a help or hindrance in terms of 'app aging'?
In a number of the apps we've worked on, we've found that migrations are no longer useful in order to build a database from scratch (say, you're getting a new employee started and are setting up his development rig).
Mainly because the actual Model code changes so much in between someone running the "rails" command and the "mature app" stage - the validation logic and callbacks being the main culprit.
Eventually, it's easier to bootstrap the schema from schema.rb (rake db:schema:load IIRC) than it is to maintain migrations from a zero state.
Then you've just got to make sure you keep up with the deltas :)
In rails the database is mostly a storage space for freeze-dried model objects. Your typical enterprise RDBMS woes don't apply.
Migrations just make sure that the database scheme is consistent with your actual model. As far as writing Rails applications is concerned the actual technology used to persist objects is (mostly) abstracted away from the task of application design and development.
1. Many field types will change over time. Booleans will become multiple choice. Character fields will become memos/text. Date fields will start to have year, month, or day parts optional.
2. All tables with tax-logic will be compounded by code hacks. So just a table with county tax rates and order amount ranges will not be sufficient to calculate the applicable tax because you will need to keep adding special conditions of tax to apply when pencils are sold on 3rd week of every month when school is in session.
3. Management will never understand how it could be that complex to "just change the database" to enable payroll calculation for hourly employees who work overtime just because the database already handles the fixed payroll for salaried employees. In other words, databases with existing data, logic, triggers, and reports will always be difficult to change.
4. The biggest rule is that business logic never ever translates to structured database tables without lots of conditions in code and strict limitations in scope of application. I'm not talking about pretty to-do lists, address books, or blog engines. I'm talking custom ERP apps, MRP modules, production forecast algos and my favorite, scheduling for production and employees.
I guess the take-home message in all of this is that don't get into this field if you don't like challenges of this sort. In addition to coding up DB apps, I also make a lot of non-database apps that keep me sane and balance things out. However, in terms of sheer complexity and project planning, good DB apps are pretty high on my list of "exciting" things.