While a lot of people people have disagreed with this guy in the past, I find it very hard to disagree with him in most of his posts.
My exposure, as far as SQL goes, has been MsSQL, PostgreSQL and MySQL. With absolutely loads of MsSQL, and pretty hairy data structures at that (they have an aversion to the polyglot approach where I work) - SQL is one of the most enjoyable things that you can do once you get past the elementary "SELECT WHERE" (and stop using graphical development tools).
I have been saying what Markus said for a long time, MySQL is why RDBMS has a bad name and why the "not using SQL" movement exists. I really hope the MariaDB team spend their time where it is needed most (a join engine that doesn't suck), if they haven't already.
Unfortunately thats only a year ago. I believe MySQL added transactions over a decade ago and you still run into people who insist MySQL doesn't have transactions, because five years ago they read something written seven years ago that was years out of date at that time.
You just don't see behavior like that outside the database community. Hey did you know that BIND doesn't support IPv6? Yeah version 4.9.3 doesn't support ipv6. Don't bother me with facts like version 9.0 was released more than thirteen years ago in 2000.
So once every installed and reasonably security patched mysql server in the world supports hash joins, the world has at minimum a decade of people who have no idea what a hash join is, and no idea how or why to use it, none the less demanding that mysql doesn't support it.
(Bitter? Yeah maybe a little. You can get good advice about OS, or compilers, off the internet. But overall the internet is useless for database advice, a couple stars don't outshine the greater cesspool)
I have long said: "MySQL is the PHP of databases."
In my experience, the database is capable, but the database itself has some quirks (whether these are due to the database itself or the developers I do not know). Some things I do know for sure:
- I worked on a product which supported MsSQL, Oracle, Postgres, and MySQL. Only MySQL compared character columns in a case insensitive way which caused us some extra work, but we did fix this using the proper collation.
- I worked on a Django project. When using MySQL, a Django ORM query would pull the entire result set and hold it in memory while you iterated over it, sometimes consuming gigs of ram if we weren't careful. Iterating over a result set from Postgres would pull (presumably) only one result at a time. Whatever the case, it didn't cause memory problems like MySQL. This is not MySQL's fault, but again, it was the exceptionally problematic database.
- I'm currently working with a MySQL database which is quite troublesome because foreign-keys are not enforced. See a non-null foreign-key? You might expect it to refer to an existing record, but you'd be wrong. Again, failure of the developers, not the database.
- I also am currently working with a MsSQL database. We had to add a column to a MsSQL table; no locking, no problem. Then I had to add a column to a MySQL table, which locked the entire table for the length of the operation. MySQL was, again, problematic where another database was not.
There are other problems I have only heard of, but I will not mention those. All the above are personal experiences where MySQL was causing trouble where other databases were not. It parallels my experience with PHP, both are great tools if used right, but they seem to be misused more often then other technologies.
"MySQL database which is quite troublesome because foreign-keys are not enforced"
Not sure what this means. Innodb will enforce them and has been the default for "awhile" although if you don't configure FKs it won't automagically do it for you (superficially, a DBMS that assumed any column named user_id is a FK for table user column id, although I bet that would create some amazing bugs occasionally) If you use myisam you can hack in FKs using triggers, which is icky/foul but you only have to do it once in the design phase. Or just don't use myisam unless you need its sometimes somewhat higher speed.
Something I say a lot is its a philosophical outlook difference. Outta the box, MySQL is a cub scout who always does his best, which may or may not have anything to do with what you think is a reasonable goal. On the other hand outta the box some DB like Postgres have no sense of humor and will go on strike / curl up and die if you ask it do something somewhat imperfectly or without the proper precise etiquette. Its somewhat easy to spend some time configuring either toward your project's philosophy or your personal philosophy.
This philosophy thing is why your comparisons were insensitive... the default outta the box collation was (is?) something weird like latin1_swedish_ci and if you merely change it to latin1_general_cs or whatever it "just works" and becomes case sensitive. Well what does the helpful cub scout mysql do, it makes sure you get all the results you could possibly want, in addition to some you probably don't.
I have run into slow schema mod problem before. Solution, don't change your schema.
MySQL has been my go-to database for years because it was fast, reliable and effective. Unfortunately it hasn't been evolving as quickly as other databases, Postgres in particular.
There's a lot of quirks in MySQL that were excusable a decade a go, but now they're problems that've been solved in virtually every other database engine. Working with Sybase was never a pleasant experience, but versions a decade old have features that MySQL still doesn't have.
Every so often you'll hit the wall with MySQL, little things like applying a LIMIT in a subselect, or row length limitations, that will make you wonder why you're using MySQL at all.
Actually that's what I thought, but it's not always the case. Sometimes a good chunk of the TEXT type column is saved in the row itself (~768 characters), and when the length exceeds that, the remainder is stored in the blob area of the table store. (http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-si...)
This depends on your row format, an obscure technical issue most never have to deal with, but it can be a problem.
I agree, and it's unfortunate, but you also see this happen in the programming language community as well, just look at the wealth of perpetuated ignorance around PHP for instance.
Fair enough, don't ask the internet for advice on databases, language selection, and how about editor. Maybe not as unique as I initially claimed but the situation still sucks.
The upthread comparison was to PostgreSQL (which fully supports transactional DDL), not Oracle (which does not.)
(Oracle 11g R2 provides something like Transactional DDL via Edition Based Redefinition. MS SQL has partial support for transactional DDL; PostgreSQL, DB2, Firebird, Informix, and Sybase Adaptive Server all support Transactional DDL [1].)
A conspiracy theory would be putting an intentionally buggy one in their free offering. It would have to be buggy by design such that it would be incompatible with a later patch. Then put non buggy design 2.0 in the paid product. Perhaps the inability to design a bad design like that is why that scenario hasn't been initiated. Sometimes it is hard to think of intentionally bad code, although it seems easy enough to generate when I'm trying to write good code.
They're doing all kinds of awesome and the greater community is starting to see this, which is evident with Archlinux* for example whom has dropped MySQL in favour of MariaDB.
Slowly but surely MySQL will fall to the waysides, which I can't help but think is what Oracle wants.
even if you kill 80% of free signups to MySQL to get 10% of those converted to premium offerings, it would be worth it.. the rest would go to other MySQL-compatible drop-in replacement solutions, like MariaDB
I may be biased because I work on it, but I think TokuMX solves or will soon solve all of the "big data scaling" problems that exist in MongoDB.
There is definitely still room for polyglot strategies: Zookeeper is for tiny but crucially consistent data, Riak is for fancy distributed systems availability guarantees when you can afford a simple data model, and I believe Redis has value as a sophisticated programming model for things you can fit in RAM (but I actually have no Redis experience personally).
But in the past few months, I've grown to be really impressed with the document model, and the aggregation framework is getting more and more powerful with each release. I think that's the important part of MongoDB (sharding's a little messy and Riak seems to have their heads on straighter with that), and TokuMX takes that and sands off the rough edges you see with big data sets and concurrency, and I think that's going to end up dominating MongoDB and being a really compelling point in the NoSQL space.
People like to give MongoDB shit, but is actually pretty fun to use. I wouldn't use it as a "big data platform." At scale I'd use Cassandra. For relational data I'd use postgresql. For memory caching, Redis. So when do I use MongoDB? For prototyping. Why? Because it's fun to use.
Consider RethinkDB. It has all the advantages of MongoDB, scales better and is better behaved. It's not production safe yet (particularly, you have to dump/reload on updates), but it's expected to become so by the end of the year.
MongoDB is only fun to program against. Not fun to admin, even on a prototype instance. RethinkDB is fun to admin.
I've been nothing but an 'ops guy' for 20 years now, and I've been enthusiastically using MongoDB for my 'devops' work since 2009.
Having said that, I've never used MongoDB 'on the large'. I've always used MongoDB against datasets that mostly fit into memory, and use some other solution for 'bigger data'.
I suspect most of the MongoDB dislike is rooted in three points:
1. The 'Mongo People' really over-hyped it early on.
2. The decision to not have single-server disk safety (but providing very elegant multi-server data safety) was a poorly understood decision. I personally agree with their direction. Your data isn't really, really safe until it's on more than one system.
3. Based on what I've read, I don't think MongoDB scales up (even with their built-in sharding) as well as some other solutions. Perhaps this is limited to some kinds of use cases, and not others.
In my opinion, for people who do day to day fast development with dynamic languages, like myself, MongoDB, used correctly, is really fantastic, at least for many use cases.
really? all the ops people that I've worked with have been pretty enthusiastic about it, more so than a mysql instance with bugs in 5.5 that created huge performance issues with multi-table joins (i.e. http://dba.stackexchange.com/questions/13817/mysql-5-5-stran...)
In all fairness the only tools that are not (speaking from a devops position at least) hated is chef or puppet. Everything else will at some point go down at 2am.
To be taken seriously when making performance claims about 'relational databases' rather than 'mysql', yes, you may need to venture into products that aren't as simple as mysql.
At work we have an MS SQL Server instance managing ~1TB of data spread among various databases.
No one database is > 200GB so it's not "web scale" by any means but sometimes you need to pull out tricks like indexed/materialised views.
I use Transact-SQL directly via ADO in C++ and ADO .NET in C# and indirectly via EF and NHibernate.
It's easy to manage schema with SQL Server Data Tools and diagnosing a poorly performing query is straightforward with graphical execution plans.
Quite frankly, I don't care that there are alternative RDBMS' or alternatives to traditional RDBMS.
But ... I keep hearing about problem after problem with MySQL. What's the trick to using it successfully? Is the barrier to entry too low, and problems are from legions of rank amateurs? i.e. is it as simple as constructing your schema thoughtfully, with tables that are "well-it's-almost-3NF" and making some educated guesses about which indexes might be needed in advance?
What are you actually trying to say? Not everything in your post is a full truth.
MS's graphical query plan tool is actually pretty awful and hasn't been updated since forever, quite often hiding the most important information, like estimated rows vs. actual rows. It takes quite a while to learn how to actually read it unless someone shows you. Like months. And it's really just a wrapper around SHOW PLAN, which MySQL has with EXPLAIN.
I'm not convinced that SQL Server Data Tools easily manages schemas, I've only been using it a few months but it seems a massive hack to me which relies on constant schema compares.
MS also constantly push you to use the graphical tools, but if you play in the tools and then try to make something repeatable it's a right PITA. Your only option is to turn on the incredibly annoying 'Autogenerate change scripts' whereas in MySQL you can just look at the history tab in Workbench which shows every command you've run. Imagine that. Everything as SQL scripts you can modify and re-run!
Finally, SQL Server Management studio is an absolute dog performance wise, I actually still miss Query Analyser which was much better IMO.
And you can't actually use SQL Server with something like Ruby or Python without quite a massive amount of effort. I know because I've tried, there's no decent, easy to install, equivalent of ADO.Net without mucking around with ODBC connections. At least a year ago there wasn't.
As for the problems with MySQL, he mentions in the post the problem with MySQL is the lack of certain types of operations.
Also there's a bunch of gotchas, sub-selects perform terribly in MySQL, which makes the EF unusable on it.
Still, I've used both and SQL Server is much better if you're developing with C#.
I work with SQL Server and its parts (SSIS, SSAS, etc.) pretty much exclusively at my day job, and I occasionally find myself hacking up Python scripts to automate various tasks.
I don't know about a year ago, but several months ago I found pyodbc [1] and it's been wonderful. Easy to set up, works a lot like the mysql functions in PHP.
MySQL is pretty good in practice, I think many complaints are indeed just user error. That said:
- Much like MongoDB, the defaults sacrifice safety for performance, e.g. MyISAM tables (which everyone everywhere will tell you never to use, but are IIRC still the default) ignore things like foreign key constraints
- Lots of "error" conditions silently complete as best as they can, rather than failing with an error. E.g. trying to store the wrong type of value, or trying to make a table using a disabled storage engine, or trying to store a string in an encoding that can't represent all its characters. A lot of this is legacy that is slowly being improved on, but reputation naturally lags slightly behind
- The query planner is a bit simplistic, so if you want performance you do need to put a little more thought into tables so that you're not doing 5-way joins all the time
- Some more advanced "features" of other databases like triggers just don't exist. IMO they are bad features and shouldn't be used, but their absence provides an easy thing to criticize mysql for if you're that way inclined
It's perfectly possible to make a good product using MySQL, quite possibly better than you would with other databases. But you do need to be very scrupulous about checking for warnings after every statement.
How easy is it to put a lock on a file while writing on it? What happens in an environment where you have multiple frontend servers that need to write to the same file? Are you going to have like one server that does the reading/writing from a file and have like a protocol with which other servers can send commands?
It's really funny when I get asked that question, really:
PostgreSQL is a pain to setup, the SQL parser is very finnicky and for the past times I spent too much time trying to work around every idiosyncrasy of Postgres when I could have a similar MySQL instance running in a shorter time
So, MySQL doesn't waste my time with stupid crap.
Now please call me ignorant and downvote me since PostgreSQL fanboys can't take a critique and think their loved database is perfect.
Not calling you ignorant, but for workloads that a simple, untuned mysql installation is sufficient you can just build yourself a simple, default postgresql installation. Use chef, puppet, terraform,... to make that repeatable and the setup pain goes away.
True, Chef and Puppet kind of take the pain away between different linux distributions (which is good). But this would be aiming at a deployment.
However, one common use case (for me) is having to have a one-off install, which unfortunately kind of prevents that.
If my job needed frequently installs of DBs I would have probably gone this way, but it isn't, so today it's PostgreSQL 9.1 on Mac OS, tomorrow (more likely, a year or two from now) it's MySQL on Fedora, etc
Chef and Puppet can deploy just fine into a development machine - either a VM (i'd recommend this) or even on your host machine. I can't imagine any workload that's simple enough to be handled by a vanilla, untuned mysql instance and at the same time not simple enough to be handled by a scripted postgres installation. Things get different once you're getting to more complex workloads, but the effort of tuning a mysql instance to handle those is in the same ballpark as the effort of tuning a postgres instance.
PostgreSQL setup is as easy as yum install postgresql. You may have to tweak pg_hba.conf to get password authentication in some distros, but this is it setup-wise.
As for the parser being finnicky, that is a feature. Hint: an empty string in a timestamp does not mean midnight 1/1/1970. This is one area where mysql did more harm than good: getting developers hooked in sloppy SQL.
What is an hba file? How am I supposed to know what it does?
PostgreSQL has a "createuser" command. Great, but how am I supposed to know this is a PSQL command? (not to mention potential collision with other filenames)
RTFM? This is an acceptable answer, but it disregards completely the user experience.
MySQL is more intuitive. Authentication? User/password. None of this "ident" or "trust" (really?) schemes. Yes, pg_hba is more flexible, but if you have only one DB user... Yes, it's convenient to have a local "trust" connection. Also a little bit more insecure as well.
And there are convenient mysql* commands, you can tell they are mysql commands.
> What is an hba file? How am I supposed to know what it does?
From the section in the docs titled "the pg_hba.conf file", which explain it quite clearly? [1]
> PostgreSQL has a "createuser" command. Great, but how am I supposed to know this is a PSQL command?
Its not a psql command, its a client application. Which you are expected to know by reading either the documentation section on client applications [2] or the documentation on the administrative function for which that particular application is used [3].
Of course, you don't need the program for the function, you can just use SQL. Which, presumably, you are going to be reading the documentation on for any RDBMS, since even the ones that support the standard the best (e.g., between PostgreSQL and MySQL, PostgreSQL by far) still have some implementation specifics you need to be aware of.
> MySQL is more intuitive. Authentication? User/password.
PostgreSQL supports that, and configures it by default in the project-supplied installers (IIRC, for local connections only by default.) You only need to configure authentication if you are using different options, or want to support remote logins.
> Yes, pg_hba is more flexible, but if you have only one DB user...
You don't need a multiuser RDBMS at all?
> And there are convenient mysql* commands, you can tell they are mysql commands.
And how do you know that they exist or what they do without reading the docs?
Beyond the RTFM part of your post, it's fun that people are so defensive and don't understand what I'm saying.
"(createuser) not a psql command, its a client application."
Yeah, so postgresql is the only program that creates users for itself? Createuser is a generic name.
(Some) Postgresql client appplications have no prefix. MySQL apps have.
So that if I see a file called mysqladmin I know it's from mysql. If I see a file called createuser I don't have that information. Who says it's from pgsql and not somebody else?
> And how do you know that they exist or what they do without reading the docs?
An hba file is one of the configuration files. The converse question would be: What is a my.cnf file? How am I supposed to know what it does? The answer is the same for both: RTFM
The CREATE USER issue is just bad faith on your part. It is a de-facto standard, used by Oracle, Transact-SQL and ironically MySQL.
In the end, your complain boils down to:
1. Having to edit pg_hba replacing "ident" with "password"
2. Unfamiliarity with pgsql opposed to familiarity with mysql
A developer that chooses tools primarily based on familiarity is on the slippery slope to dinosaur land. Not that it's bad -- Cobol development still pays well to this day.
I'm not talking about "CREATE USER", I'm talking about /usr/sbin/createuser (or whatever it is installed in your system)
"A developer that chooses tools primarily based on familiarity "
Yes, let me waste time to learn how to use the latest fad on HN, sure (not talking about PostgreSQL here)
And PostgreSQL is made of unicorn blood and my projects will be 200% better if I use it instead of MySQL, sure.
It's a DB. There's a narrow range where it would make a difference to use it instead of scaling to more servers, for example. And you can "always" convert later. (Unless you're FB apparently)
Honestly, this is not my experience at all, and I'm not a DBA. I install postgres, I fiddle with setting up a user & pg_hba.conf to allow connections, and after that it's basically game on.
I've installed it on Linux and Windows and, no, its not. Its pretty much the same work as MySQL to setup, except the install usually wants information to create or use a system user for the DB server in the install, which is a trivial amount of extra work.
I understand this used to be an issue many years ago (IIRC, most of the streamlining of the setup happened in the first couple of major 8.x releases, in 2005-2006, about the same time as official native Windows support was added.)
For example, MySQL accepts both ' and " as string delimiters.
It's very fun trying to get around PGSQL not understanding what you want to do while MySQL understands.
It doesn't require ';' to mark the end of a command
> For example, MySQL accepts both ' and " as string delimiters.
That is, in its default mode, MYSQL does not accept ANSI-standard quoted identifiers (supporting only non-standard use of backquotes for quoted identifiers), and instead treats double quotes as equivalent to single quotes for strings. (Of course, MYSQL has a non-default option to support the standard.)
PostgreSQL, OTOH, supports the standard out of the box, and treates double quoted material as quoted identifiers.
Neither behavior is more finnicky, one is just standards-compliant.
> It's very fun trying to get around PGSQL not understanding what you want to do while MySQL understands
This would appear to only be an issue if your only RDBMS experience is with MySQL and you are adapted to its quirks, since if you have experience with other DBs which adhere to the ANSI standards for basic behavior, MySQL will, by default, be more likely to not understand what you mean than PostgreSQL.
SQLite is a great replacement for MySQL for many of the same reasons and in many of the same circumstances where MySQL is a decent replacement for a non-MySQL RDBMS. Its lighter, simpler, and less overhead for simple tasks.
- Replace MySQL with MariaDB (hopefully your tables are InnoDB and the transition will be "relatively" seamless). You can continue using the rest of your app as-is.
- Yes. There are many by the simple fact that a lot of people are unaware of what they're really doing and what impact certain indexes, queries and even to some extent, compile-time options will introduce.
- Schema is everything for an RDBMS. If people just take the time to calmly and carefully consider what it is they're trying to do and what options they'll truly need vs. just guessing what future "growth" would be like, they will have fewer actual (rather than imagined) growing pains later.
Of course, most of these problems could be mitigated if new projects started off with Postgres or MariaDB, if they're taking the RDBMS route.
I keep hearing people recommend MariaDB, which is fine. I'd add in Percona as well. I have/had a decent run with it, but MySQL limitations with querying make me look towards Postgres. SQL engines are hard to do right, no wonder most of MPPs are built on top of Postgres.
Except that I have coworkers that have no clue whatsoever and ran into deadlocks. Hmmm.. Deadlocks? That's probably a transaction problem. MS SQL is buggy, right? We just don't use transations anymore and pass on the wisdom that transactions are slow and don't work. The _transactions_ cause deadlocks. Evil transactions..
SQL works well, if you invest time understanding the theory and the limitations. If you care about the implementation you're going to use (whether it's MySQL, MS SQL, PostgreSQL or whatnot).
Unfortunately my rather sad experience is that people give a damn about it and will use the trial&error approach, until they cannot reproduce deadlocks/critical errors in their local 'runs on my Thinkpad' VM.
It's easy to manage schema with SQL Server Data Tools and diagnosing a poorly performing query is straightforward with graphical execution plans.
I wouldn't put too much stock in the graphical execution plans. Outside of the problem that it simply fails for complex queries, it lies about costings (costings are always based upon estimates, which means that when you have super-fast storage, as many new installs do, the query analyzer becomes a giant bag of lies), it can't "see into" table-defined functions (they sit there innocuously with negligible cost when often they sit in front of a curse of poor performance).
SQL Server is a great database, but it is very far from perfect. I mentioned above about the graphical execution plans lying about storage costs, and the same goes for the query planner -- it will choose horrific query plans under the assumption that you have one spinning disc of rust, falling apart when you're really on some FusionIO SLC. This can manifest in queries that run excellently one day falling completely to shit the next because it passed a threshold and has decided to switch join patterns. One would think that it would actively actually monitor storage/memory and CPU to make all of these weights, but it doesn't whatsoever.
The other major issue being lock escalation that can destroy SQL Server databases under heavy contention.
I feel that both your remark and the one you were reacting to are strangely correct and do not contradict each other. MySQL does "simply" work for many projects, especially if the number of rows doesn't go much higher than a few million.
Similarily, while the design of PHP must be considered terrible in some respects, it certainly allows you to quickly create webapps and ist does so in a way that is rather "native to the web".
But I think if you start non-trivial new projects on the PHP+MySQL stack today you are just lazy. Python (for example) with modern frameworks and Postgres gives you virtually all the advantages with fewer of the downsides and many nice extras like a more vibrant higher quality ecosystem and a better designed language.
Even trivial projects become non-trivial when they reach the scale of Wikipedia or Facebook.
I am a bit surprised that nobody mentioned MySQL lacking transacion support in this thread, because otherwise a lot of comments sound "I don't really try to use/learn it, but I heard it is a toy db, and PG is real DB". Well, ok.
MySQL lacks transaction support? Unless you use MyISAM tables, that isn't true since like forever. For the record, I prefer Postgres over MySQL any day.
There's a couple of things that you can do with PGs transactions that don't work in MySQL, for example transactions around DDL statements. That can leave you with quite a mess at your hand when you have to rollback half a database migration.
I've heard lots of complaints about mysql, do you know if query deadlocks are more common with mysql than with something like postgres? I've been running into that a lot lately, and I don't understand what the hell is going on. None of the queries I'm executing have particularly long run times (usually a few milliseconds, occasionally 100ms), but I still get random deadlock errors from time to time. I've never run into this on any other database platform (most of my experience is with Oracle and SQL Server), and I have no idea what the hell is going on.
We also have no idea whats going on with your system. I ran into this twice that I can recall. Once was a pathological SELECT FOR UPDATE where it was locking up way too much stuff, and another was a time where a boring looking update was locking a whole bunch of seemingly unrelated indexes.
So for the two examples I ran into all I can say is make your transactions as small as possible and take a close look at ALL the indexes especially if you think they're unrelated.
There's an option for logging the intimate details of deadlocks into the log file (or error file or whatever it was) anyway the main PITA is having to restart the server to enable it. On second thought the real PITA is operational in the "watched pot never boils" tradition as soon as you enable it the intermittent problem seems to go away, just dumb luck.
The query in question is an UPDATE mytable SET flag=1 WHERE ID IN(...) statement. I'm assuming that another system was in the middle of a performing a SELECT statement on the flag column.
In this particular case it's perfectly ok for the select statement to get slightly incomplete data (e.g. returning a record with a flag that might have just been changed), so it would be nice if I could configure the table so that select statements never lock. Is it possible to do something like that?
The first stack exchange when I googled was pretty interesting. For completeness, if you use innodb as your engine
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Seems to be the line you are asking for. There's the usual way to make it a system default in the config file. As for the difference between what you are asking to do, and what you should do, well... be careful. Especially if you're feeding the dirty read back into something that could lead to an update. That could be icky.
There's some way to do myisam stuff.
Generally I've found if you've got two things trying to lock one thing, overall processing goes a lot faster if one greedy dude grabs it for the whole thing. Cooperative multitasking doesn't work if it spends way more time lock switching than actually processing. Of course greedy processing is an excellent way to jam things up if you're not careful.
The funniest discovery I ever made about that related to importing zillions of rows with autocommit off it was dramatically faster rather than demanding an I/O store. This was the kind of data where if someone yanked the power cord it would be auto-re-imported without any loss; if your only copy of the data is in RAM and the INSERT you just ran, then autocommit off might be unwise.
Your IN (...) if ... is a select that might none the less be similar to my fun time a couple years back with dueling indexes. This is the origin of the whole never index more than you have to meme because it doesn't just waste storage and write time, an "innocent" looking pathological index intended to make some report run faster can kill read performance if its touches too many things other indexes touch while "separate" processing is happening. If you don't know A leads to B, B leads to C, C leads to D, then why does messing with A kill D performance when D should be totally separate, ah forgot about that peculiar linkage...
Note that none of this is a mysql problem. You can shoot yourself in the foot with a common 9mm or a weirdo caliber. More people get hurt by the 9mm not because it inherently magically hits feet more often but because theres just more people using it. I'm quite sure you can screw up Oracle or DB2 the same way if enough people try hard enough. I guarantee if you google you're going to find more "mysql did something weird" stories than any other DBMS simply because more people use mysql.
I have run into this from time to time, occasional deadlocks in MySQL. I think it was caused by occasional exceptions while in the middle of writing a transaction, then me leaving the connection open via connection pooling without doing either a commit or rollback.
The obvious way to fix this is funnel all database writes through a common set of code that always avoids a problem.
But there's another way I've fixed this, too. In the past I have also fixed this by moving my highest volume writes into a queue that happens in a few small threads that separate in a way that avoids all contention. This architecture lets me smooth out load spikes and gives better performance anyway.
I note for those who don't get to the bottom, there are comments there that suggest the author of the post you linked is incorrect. I'm not qualified to know who is right.
Are people, in general, aware that MySQL employs a pluggable storage engine? I wonder if much of the confusion about MySQLs abilities stems from arguments from people aware of it against those that have only ever used the stock engines.
Most people use the default storage engine, without really thinking about it, but almost every project I've worked with recommends innodb - not sure why.
Why is it that when it comes to databases no one is unbiased?
I cannot find any reliable articles on the web concerning database engines that I look at and trust the author. It's even more difficult considering I am far from a pro with databases but quite simply I won't use any MS products and so I've used MySQL for smaller projects. I've started using MongoDB a lot more lately and regardless of what people (mostly people who've never used it I would imagine) say about it I love it.
Comparing apples to oranges really, but I suppose it's more accurate if you take into account all the people who are (or could be using) a relational database without much issue and then switching over.
The article is right though, I've heard scaling MongoDB into the hundreds of gigs to terabytes is an absolute nightmare for the operation team, the software is actually very cool when dealing with reasonably sized data that has an elastic structure.
There are other "NoSQL" databases out there that pay more attention to scale, like HBase. My HBase cluster is just over a TB compressed, consistently churning 6k requests a second, without an issue.
> "In my eyes, MySQL has done great harm to SQL because many of the problems people associate with SQL are in fact just MySQL problems"
That's a very strong and subjective statement. If any, resources heavy databases like Oracle or PostgreSQL, had brought more users to NoSQL than MySQL. That's doesn't there is something wrong with those databases, only that they weren't the right tool for the job is some cases.
Also "MySQL problems" are most of the time due to poor usage, not to the database system itself, actually when used properly MySQL/MyISAM is a great tool
That is pure FUD. Postgres is no more resource heavy than MySQL (the stock config will only use 32MB of ram!), and will perform better on many real world workloads.
I think it's not resource heaviness per se so much as high latency (and low user-friendliness) in a dev configuration. Postgres takes noticeable time to start up, both server and client, and the client feels less responsive; its commands are also more arcane (e.g. mysql's "show tables" is something like "\d"). Postgres is quite possibly better for a "production" configuration, but it's much slower to develop with, so developers get to thinking of it as "slow".
I think a lot of the NoSQL hate comes from people who don't understand the use-case, and don't build High Availability/Low Latency systems. A lot of the things I work on require pre-summarized data to provide fast response times at scale (aggregating normalized data would be too slow) and a NoSQL DB (not necessarily Mongo, I prefer Couchbase for most things) with a simple but flexible data model that gives more control to the developer Just Makes Sense.
You would think someone who claims to be an expert on databases would have a clue about his industry.
MongoDB is a document database and is as different from almost every other NoSQL database as it is from every SQL database. It very much stands alone and requires your domain model to be structured in a particular way. To say that it "represents" NoSQL is ridiculous. And to act like a guide on how to scale MongoDB to store 100GB is a problem is also ridiculous. I can create domain models that almost every SQL database would struggle with that MongoDB could breeze through and vice versa.
And seriously anyone claims Cassandra or Riak are misspent adventures are simply delusional. They are solving real world problems in particular around horizontal scaling today that could never be done as cheaply or easily as before. A master-master cluster that costs nothing, scales linearly and can be managed by a developer. Would love to know what product existed years ago that could do that.
Likewise I take exception with the criticism of MySQL. It is an easy to use, manage and install and has the best tooling bar none. It does what it is intended to do perfectly. Some people who deal exclusively with ORM layers will never see the imperfections and just see the ease of use.
> I can create domain models that almost every SQL database would struggle with that MongoDB could breeze through and vice versa.
I am always interested in learning about models that SQL/"traditional relation model" can't easily (or less easily) represent or query. Last time I asked someone pointed me to Datomic's EAVT (entity-attribute-value-time) model as a good example.
One classic example is tree/graph structured data.
I've worked extensively with modeling and querying medical concepts and relationships in RDBMS. I realize there are tools like recursive common table expressions an materialized paths that can aide querying such data, but now that I'm working at a different job using neo4j, I can see how much simpler the medical informatics domain could be modeled and traversed in a graph database.
There's also all the models that requires a lot of partitioning because there will be a lot of data or of simultaneous queries. A relational database is only relational on a single server. Partitioning breaks the relationships (like, no joins between tables that exist on different servers). The new generation of SQL databases like VoltDB and Google F1 target this case, I don't know at which price.
>And to act like a guide on how to scale MongoDB to store 100GB is a problem is also ridiculous
While I agree with a good bit of what you say. I take issue with this. 100GB is not a lot of data by any modern definition. That this guide exists (or existed until it was removed) is a warning sign of hidden gotchas, which is I think what the original author was reacting to. When contemplating a change as big as mongo, those kinds of things make people whose butts are on the line for tech decisions nervous.
Also, that guide didn't say anything about what data structures were being used, the implicit assumption was that you had a properly structured mongo database to begin with.
If the article assertion is true, then what is the Postgresql of NoSQL? In OSS RDBMSs, Postgresql quickly rose to the position of best designed, less quirky contender (albeit waay slower than MySQL in the past).
I'd say RethinkDB (http://rethinkdb.com). It's labelled as not production ready, but most of the things that matter have been solidified (like data integrity).
It takes very good cues from both the NoSQL and SQL worlds...it's document-based but supports server-side joins. It uses MVCC (no global write lock) and while I haven't scaled it yet, the server administration is a lot simpler than MongoDB (`rethinkdb join [any_server_in_cluster]` vs creating 3 config servers and a bunch of repsets and setting up sharding schemas that just don't work). Also, the query language is crazy good. It's the only database where I've felt like I'm programming against my data (map/reduce doesn't come close).
There are some gotchas, like it doesn't have a binary storage field type, and to upgrade versions you have to dump/restore your data. I'm sure there are more. Like I said, not "production ready" yet. Give it a year though.
That isn't even remotely close to true. Single threaded "insert 10,000 rows" benchmarks had postgresql slightly slower, not "waay slower". Concurrent access benchmarks have always had postgresql faster than mysql.
I'm old. I know that there was no postgresql 4. The first release of postgresql was 6, and what I said was accurate as of the release of version 6, in 1997 or 1998.
I never expected the very geek joke to be understood. Very cool!
On a more serious note, I didn't mean to support the ever old myth that postgresql is slower than mysql. It hasn't been the case for over ten years, and all the while being a better database in every aspect.
However, go back enough and it was slower (than mysql) for real-world loads. You could get near enough if you disabled fsync, but then were negating many of the advantages of pgsql. Even then, in the old days, table based locking would kill a real database. Table based locking was solved early enough ('98? '99? around that time). Fsync was solved with WAL, in the performance effort of the early 7.x series. It was around late '2000 that I came back to postgresql for good. Until then, it was either mysql for a quick hack or oracle for anything serious. Since then, it's postgresql for everything (I don't deal with the scenarios where Oracle still leads).
My exposure, as far as SQL goes, has been MsSQL, PostgreSQL and MySQL. With absolutely loads of MsSQL, and pretty hairy data structures at that (they have an aversion to the polyglot approach where I work) - SQL is one of the most enjoyable things that you can do once you get past the elementary "SELECT WHERE" (and stop using graphical development tools).
I have been saying what Markus said for a long time, MySQL is why RDBMS has a bad name and why the "not using SQL" movement exists. I really hope the MariaDB team spend their time where it is needed most (a join engine that doesn't suck), if they haven't already.
I mean, at the end of the day you have MySQL that can't even do hash joins, and then you have PostgreSQL with GEQO: http://www.postgresql.org/docs/9.0/static/geqo-pg-intro.html