Hacker News

Most people don't need dedicated DBAs, and worse, the DBA often lacks sufficient domain knowledge of the problem to make an intelligent suggestion anyway.

DB performance is complicated, yes, but the vast majority of it is extremely simple. It's just that none of this simple stuff gets learnt until it's too late, when the cost of fixing it has dramatically increased.

There's a certain level where you need a DBA and that bar has been getting higher for years.

What we really need is to demystify DB performance, which for the most part is fairly simple.

What you need to do is teach your developers how to read those query plans. Show them how to find the expensive queries. Give them tools to easily see what queries their ORM is spitting out. Show them how to use a SQL profiler.
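To make that concrete, here's a minimal sketch of the "read the query plan" habit. It uses Python with SQLite purely as a self-contained stand-in (the thread is about MySQL and Oracle, but the workflow is the same: run EXPLAIN, look at what the planner chose); the table and index names are made up for the example.

```python
import sqlite3

# Self-contained demo database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

# Without an index, the planner has no choice but a full table scan.
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(before)  # detail column reads "SCAN orders" (older SQLite: "SCAN TABLE orders")

# Add an index and re-check: the plan switches to an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(after)  # detail column now mentions "USING INDEX idx_orders_customer"
```

The whole loop (run EXPLAIN, spot the scan, add the index, confirm the plan changed) is exactly the kind of thing a developer can pick up in an afternoon.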

Tell your developers how query plan caching actually works. How a clustered index works and what you should and shouldn't put it on. Explain how indexes work. Explain how DB pages actually work and then it's obvious why certain indexes are a bad idea. Explain how relational keys are very important for the DB engine and leaving them off is not an 'oops', it's a serious mistake with long term consequences.
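As one concrete lesson from the "explain how indexes work" bucket: column order in a composite index matters, because the index is sorted by its leading column first. A hedged sketch, again using SQLite in Python as a stand-in with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT, ts INTEGER)")
# Composite index: sorted by user_id first, then ts within each user.
conn.execute("CREATE INDEX idx_user_ts ON events (user_id, ts)")

def detail(sql):
    # Concatenate the planner's description of how it will run the query.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[3] for r in rows)

# The leading column is filtered, so the index can be searched.
print(detail("SELECT * FROM events WHERE user_id = 7 AND ts > 100"))

# Only the trailing column is filtered: the sort order of the index
# doesn't help, so the planner falls back to scanning the table.
print(detail("SELECT * FROM events WHERE ts > 100"))
```

Once a developer has seen this once, "why is my query slow even though the column is indexed?" mostly answers itself.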

And that's at most a few days work. So why do you need that DBA?

Aside from that, you need someone who knows how to maintain a DB, but again that's not particularly complicated, and once it's done you can forget about it apart from the occasional sanity check that it's all working properly.
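The "occasional sanity check" can itself be a one-liner. A sketch of the idea, using SQLite's integrity check as a stand-in (MySQL has CHECK TABLE and mysqlcheck for the same purpose):

```python
import sqlite3

# Open the database and ask the engine to verify its own structures
# (B-tree pages, index consistency, and so on).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
result = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(result)  # "ok" when no corruption is found
```

Wire something like this, plus a backup-restore test, into a scheduled job and you have most of the routine maintenance story covered.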




> What we really need is to demystify DB performance, which for the most part is fairly simple.

Without getting into the rest of your argument, I'd like to quickly address this.

No, it really isn't.

In your typical MySQL database, your performance for a simple "select * from x where y" is going to go through a lot of complicated machinery (most of which can be tuned for performance), a few points of which I will enumerate below.

    1) Acquire a query cache lock & see if this query is there
    2) Run the query through the optimizer
      2a) Perform multiple shallow dives into a table to look at the cardinality of the filtered columns
      2b) Identify the best indexes based on the shallow dives
      2c) Create a query plan
    3) Push the query plan down into InnoDB
    4) Load the index into memory, if it's not already there
    5) Load the potential rows into memory, if they are not already there
      5a) If there's not enough memory, load a few into memory, and be ready to push those rows out of memory in favor of more rows when needed
      5b) Load rows that are still in the insert tree but not yet part of the regular buffer pool or pages on disk
    6) Loop through the candidate rows for matches to the filter
    7) Return the data to MySQL
    8) Acquire the query cache lock & update it
    9) Return the data to the client
Any and all of these can (and often should) be tuned. There are 600+ page books and very old (and oft-updated) blogs dedicated to this topic... it's not something you can teach a developer in a couple of days.
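Step 2a above, the optimizer sampling column cardinality, can be observed in miniature. This sketch uses SQLite's ANALYZE in Python as a stand-in: it gathers per-index statistics the planner consults, analogous to MySQL's index dives (the table and index names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 5) for i in range(1000)])
conn.execute("CREATE INDEX idx_a ON t (a)")   # high cardinality: 1000 distinct values
conn.execute("CREATE INDEX idx_b ON t (b)")   # low cardinality: 5 distinct values

# ANALYZE populates sqlite_stat1 with "<total rows> <avg rows per
# distinct key>" per index; the planner uses this to judge selectivity.
conn.execute("ANALYZE")
for row in conn.execute("SELECT idx, stat FROM sqlite_stat1 ORDER BY idx"):
    print(row)
# Roughly "1000 1" for idx_a and "1000 200" for idx_b, which is what
# tells the planner idx_a is far more selective.
```

Whether a working developer needs to know this level of detail is, of course, exactly what this thread is arguing about.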

As an example, I attended an introductory course to being a MySQL DBA; it lasted 5 days of 8-5 teaching & running examples. And it only scratched the surface of what I do on a daily basis.


This sort of nonsense is exactly the sort of mysticism I'm talking about.

Why, for example, do you think every query needs to be tweaked for this one:

> Push the query plan down into InnoDB

So why did you include it on the list?

5a + 5b: is MySQL so bad at memory management that you tweak it for every query?

So why did you include it?

Virtually everything on that list is an implementation detail that the vast majority of applications and developers don't need to even think about.


> Virtually everything on that list is an implementation detail that the vast majority of applications and developers don't need to even think about.

Particularly if you hire a DBA.

But that's not what you meant... and you're right. If your business doesn't generate enough query traffic and data to stress your database, then no, you don't need more than the indexing and smart query development which your average developer can learn.

The moment your business hits that wall, however, you'll be scrambling to get someone who can dig into those "implementation details" and wring every ounce of performance out of your database.

> [...] do you think every query needs to be tweaked for this one: Push the query plan down into InnoDB

Nope. However, it does matter for that one table that is better served by TokuDB, or MyISAM, or the Archive engine (yes, there are actually use cases for using MyISAM tables instead of InnoDB).

> 5a + 5b, Is MySQL so bad at memory management that you tweak it for every query?

Again, for every query? No. Also, 5a and 5b have nothing to do with memory management, and everything to do with the size of the dataset you have to pull into memory to identify and return the results. Knowing how this affects the buffer pool LRUs, disk usage patterns, and how to optimize the interaction between the two can be vital.


Look, you're doing the equivalent of looking at a car and saying that every single one needs to be hand tuned.

Just because you're a mechanic.

I said there is a bar when you will need a DBA. It is much, much, much higher than you're making out.

And the things you're talking about: most of the people mucking around with them probably shouldn't be. They've probably made things worse. And the tiny few who actually need to? They're a tiny few.


"Most people don't need dedicated DBAs"

That's a pretty easy statement to defend. But I'll respond by saying that most companies running Oracle 11g with more than a couple of terabytes of data require a competent DBA, particularly if disaster recovery/transaction rollback is important.

" the DBA often never has sufficient domain knowledge of the problem "

Of the half dozen or so truly high level DBAs I've worked with (and managed on occasion), I can say they had incredible domain knowledge of Oracle Database Server, and worked extraordinarily hard to have next to zero knowledge of the application running on it. Their focus was to keep the database running, defend it from engineers and users, and recover it when things went really awry.

"DBs performance is complicated, yes, but the vast majority of it is extremely simple."

Any time you see the phrase, "Extremely Simple" when discussing a domain in which the expert practitioners routinely make $250K/year or more without any form of market manipulation, you need to reconsider why, exactly, these technicians are being paid so much to do something, "Extremely Simple."

"What you need to do is teach your developers how to read those query plans. "

Completely agree here, but there are two perspectives on this topic. There is "engineers are ultimately responsible for the efficiency of their query plans, and should be educated/trained to take that responsibility," and then there is "we can't train our engineers to be query plan experts; just keep them from shooting themselves in the foot and let the query optimizer handle the rest - it's up to the DBA to manage stats gathering to keep DBMS_STATS healthy."

I think we tend to see the second approach more frequently in the enterprise, where your engineers are likely making less money, and the company is keen to leverage its many 10s of millions of dollars of Oracle technology.

Finally, when a company is paying 10s of millions of dollars a year in Oracle licenses, it considers it a worthwhile investment to have a few high-level DBAs to fully leverage that investment.


Why would someone run Oracle in this day and age? "We have 100k lines of established PL/SQL running our business" is a reason, of course, but you could make a similar argument that COBOL development is alive. PL/SQL pretty much always leads to an unmaintainable mess, and if you need BIG DATA, you can go much bigger without Oracle than you can with it, and cheaper to boot.


Guh - every few months this comes up, and it's fairly hilarious to me how much anti-oracle dogma there is here.

Oracle is the most feature-rich database in the industry, and the most expensive, and the most complicated. This comes with benefits and curses.

It is freakishly powerful. If you want, you can roll back just your view of the database to three hours ago. You can show different versions of the same schemas to different clients. You can dynamically scale out your database to hundreds of servers without having to manually shard. You can do multi-master replication, master-to-slave, master-to-slave-to-master, any combination you can think of.

Your database clients can automatically detect when a primary has gone down and fail over to a standby, which will automatically be brought online, no admin intervention necessary.

If you give Oracle raw disks, you can tell it to automatically use the faster parts of the spindles for more commonly used data. Or give it some SSDs, and it'll use it for cache. Or, if you buy Oracle storage servers, it'll actually offload basic query execution to the storage.

With that power comes great cost and complexity, which is why many web companies don't bother with it - when you get to google or facebook scale you build these types of capabilities into your application tier.

But I know of a ton of multi-petabyte Oracle implementations at big traditional companies, and they love it. Because they don't want to have to build all of that functionality at the application tier, and they trust Oracle's reliability.


Well, I'll admit that I haven't used Oracle in 4 years, but back then, the automatic-sharding thing (goldengate IIRC?) just plain didn't work.

I guess I'm just biased towards solving scalability problems at the application level. It seems like an uphill battle to take a declarative/descriptive language like SQL and tune it to execute a query the way you want it executed -- it seems a lot more straightforward to just write code that does what you tell it to.


GoldenGate is for multi-master replication and shares a lot of the challenges you'll find with any multi-master replication solution.

No, I was referring to Oracle RAC, which does away with the need for sharding. All of your nodes see a complete and comprehensive picture of the data on a shared set of disks.

> I guess I'm just biased towards solving scalability problems at the application level.

That's a totally valid strategy, and indeed, an option that many companies go for. But you're just shifting complexity from one place to another. Either you're going to be building in complexity to make your application aware of data distribution, movement, sharding, and so on - or you're going to use a more complex data storage platform like Oracle.

And SQL tuning is a skillset, much like writing good code. If you are good at SQL tuning, it's not that hard.

My point is just that it depends on your business requirements. I personally think Oracle's price point is so high that I would never use it. But if money were no object, and I was designing an application, why would I want to have my application have to think about where the data lives?

Wouldn't it be a lot simpler to just say, "go here for your data", and let the dedicated application deal with that?

Or put another way, in the same way it seems insane to shove 100k lines of business logic into the database layer, why doesn't it seem insane to shove 100k lines of data management logic into the application tier?


Because at a certain size of business, paying developers to make patches against your favorite OSS database doesn't scale. It's cheaper to pay for a, let's be honest, damned fine piece of software.

Oracle, despite being the very definition of evil, has one hell of a software product. It is performant, has built-in features that make Postgres look like BerkeleyDB, has HA solutions, and perhaps most importantly, has the backing and support of a multi-billion dollar company.


Eh. Different strokes for different folks, I guess. Most of those built-in features are anti-features IMO, and none of them make up for having to maintain long-ass PL/SQL functions as opposed to a reasonable programming language. And as far as performance goes, SSDs are cheaper than Oracle licenses, by a lot.

Oracle doesn't have any magic pixie dust that changes your underlying hardware. If you want raw performance, BerkeleyDB will wipe the floor with Oracle, because it's that much simpler. You've just got a much smaller feature-set as a result.

You probably see things differently than me, though. Related question since you're a DBA -- what are the best practices for testing PL/SQL, you have a link or anything? Every place I've seen it done was a nightmare and involved a lot of crossed fingers during releases.


I converted one of these ugly, unmaintainable PL/SQL messes from Oracle to the MySQL query language.

Because the boss told us MySQL was cheaper.

Because of MySQL limitations, I think it is even more unmaintainable now. In real world terms, it is ugly and about ten people out of 500 can fix stuff there.


Well, IMO, you're saying that you basically trapped yourself in a box there. If you're completely determined to embed your business logic into a relational query language, then PL/SQL is probably the best there is.

I'm saying that your business logic shouldn't be there. You've got a whole universe of programming languages, potential disk formats and all that available to you. If you're writing huge ugly queries, and I have, that's when I generally step away from the computer for a bit and think if that complexity is better managed someplace else.


The stored procedures are not actually big ugly queries. But they surely have business logic.

So far, besides the ugly MySQL workarounds for some missing features, it has been a good decision.

And well, it was a company decision, they are the ones trapped. I'm free from maintaining it, while I had to maintain the first version with all business logic outside the DB.


"DBA often never has sufficient domain knowledge"

"often never"?

I've met some DBAs who were quite versed in the business domain. And some that weren't. This is no different to the majority of the developers I run into, so I'm not sure why you're drawing a line there, except that the article was about database stuff.


So basically train them to be DBAs?


No, train them to use the database properly. A DBA is the person who keeps the database running, much like a sysadmin is the person who keeps the server running.


That's a naive view of both DBAs and sysadmins these days. They both do much more than just keep things running. Sysadmin has mostly turned into opsdev. Same with DBA.


> Sysadmin has mostly turned into opsdev.

Not in places large enough to value loose coupling (e.g., if keeping your servers running comes out of a different budget than improving your applications).


Opsdev != dev


If you are willing to agree that's what constitutes a DBA, then I don't see much value to the idea of hiring a "dedicated" DBA.

(I don't personally accept the first clause, but I also agree that hiring a dedicated specialist is now something only a rarified few need to do, really.)


Database administration is not a fire-and-forget task. It dismays me greatly that so many developers do not see beyond the code in their IDE and think they can just bring in a DBA for a few days and all will be well.

It may also surprise some developers just how much return on investment a good DBA can bring: they can and do learn about the business, the data, and the processes, and can then help get the best from the database as a result of that knowledge. But they can also serve as an SME on the database engine technology, perhaps pointing out where it is not being used in the right or optimal way. The biggest gains in performance generally come when they help a team of developers who were treating the database as a dumb data store and not making any use of the features offered.



