Ask HN: Archive a MySQL database
31 points by wenbert on Jan 19, 2009 | 18 comments
How do you archive a database? I have this database and it has been receiving data for the last 3 years, and I would like to archive the old data WITHOUT affecting the current data (current transactions, etc.). The database is heavily used and is close to a million records, and in some cases it takes a little longer for the complicated queries to complete...

What is the best way to do this? I have no experience with this. In fact, this is the first time I have handled a database with this volume. Perhaps you can direct me to a good resource, etc.

I am using MySQL5 on a LAMP environment. If it requires downtime, I can sacrifice a day for this.

EDIT: If I backup the entire database, I would still have "old" records and the database would still be at the same size.





Thanks for this. This is what I am looking for.


This is the correct answer.


Here is what I do with mysql 5.1 and my innodb tables on Linux...

mysqldump --single-transaction --skip-lock-tables --all-databases | gzip > "/some/directory/backup-`hostname`-hourly-`date +%H`.sql.gz"

These two options are important: --single-transaction and --skip-lock-tables. At least in mysql 5.1, mysqldump defaults to locking each table before dumping it (not good if you have a busy app and large db).

I run the above in a cron once per hour. It gives me 24 hours worth of hourly backups. I then use rsnapshot (http://www.rsnapshot.org/) to store daily, weekly, and monthly backups offsite.

I also run my backups with a low priority (nice -n 19) so that my application gets higher priority than the backup script.
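A minimal sketch of how the crontab entry for that might look, assuming the dump command above is wrapped in a script at /usr/local/bin/mysql-hourly-backup.sh (the script path and cron file are hypothetical):

  # /etc/cron.d/mysql-backup -- run the dump at the top of every hour,
  # at the lowest CPU priority so the application stays responsive
  0 * * * *  root  nice -n 19 /usr/local/bin/mysql-hourly-backup.sh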


One last thing... mysqldump has the following option:

-w, --where=name Dump only selected records; QUOTES mandatory!

You can probably use this to back up just the rows you are interested in and then delete them once you are sure they are properly archived.
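For example (the table, column, and cutoff below are made up for illustration), dumping everything older than a cutoff date might look roughly like:

  # Dump only the old rows of one table; note the quotes around the --where value.
  # "orders" and "created_at" are hypothetical names -- substitute your own.
  mysqldump --single-transaction --where="created_at < '2006-01-01'" mydb orders \
    | gzip > /some/directory/orders-pre-2006.sql.gz

Once that archive has been verified, the same predicate can drive the DELETE on the live table.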


One of the ways we archive data is by setting an archive cutoff point. So say we only need the last 6 months of data... Every month we run a script that takes anything older than 6 months and moves it to a separate table (or database). Obviously this solution isn't perfect for every situation. But depending on how much data you have in each row, a million rows usually isn't all that much. So for example we may have a table called "data" and a second table called "data_archive". This keeps the data table and indexes small and manageable.

You should of course still backup your data since the archive is really just a reference.
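A rough SQL sketch of that monthly move, reusing the data / data_archive naming from above (the created_at column and the 6-month cutoff are assumptions):

  -- Fix the cutoff once so both statements use the same boundary.
  SET @cutoff = DATE_SUB(NOW(), INTERVAL 6 MONTH);

  -- Copy the old rows into the archive table, then remove them from the live one.
  INSERT INTO data_archive SELECT * FROM data WHERE created_at < @cutoff;
  DELETE FROM data WHERE created_at < @cutoff;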


This makes sense. But I imagine that it will be complicated when one table has a lot of other dependent tables associated with it. Am I right to assume this?

Perhaps if I had foreseen this, I would have made my database schema easier to archive. :-(


If you can't afford downtime, check out InnoDB Hot Backup (a commercial product sold by Oracle).

If you already happen to be using LVM, check out this post: http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for...

Otherwise, you're going to have some downtime, which could range from a few seconds to a lot longer depending (mostly) on the size of your data. The simplest way is using mysqldump, but that may be too slow (probably not though, unless you have really low-end hardware, 1m records isn't that many).

There are faster (and less straightforward) ways, which basically involve: stop the server, copy the data directory somewhere else, restart the server.
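Roughly, assuming the usual Linux paths (your init script and datadir may differ):

  /etc/init.d/mysql stop
  # copy the whole data directory while the server is down, preserving permissions
  cp -a /var/lib/mysql /backups/mysql-$(date +%Y%m%d)
  /etc/init.d/mysql start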

Google is your friend here. If you have a spare server, you might also consider setting up replication so that the next time you want to back up your data, you can take it from a slave and it won't require any downtime.


This technique using LVM snapshots always struck me as the optimal way to achieve negligible-downtime backups of even largish databases:

http://blog.dbadojo.com/2007/09/mysql-backups-using-lvm-snap...

(Haven't done it myself, though.)
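The general recipe for LVM-snapshot backups (whatever the exact details in that post) is: take a brief global read lock, snapshot the volume holding the datadir, release the lock, then copy from the snapshot at leisure. A sketch, with made-up volume group and LV names/sizes:

  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> system lvcreate --snapshot --size 5G --name mysql_snap /dev/vg0/mysql
  mysql> UNLOCK TABLES;

  # back in the shell: mount the snapshot, copy it offsite, then drop it
  mount /dev/vg0/mysql_snap /mnt/mysql_snap
  rsync -a /mnt/mysql_snap/ backuphost:/backups/mysql/
  umount /mnt/mysql_snap
  lvremove -f /dev/vg0/mysql_snap

The snapshot is taken from inside the mysql session (via the client's system command) because the read lock is released as soon as that connection closes.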


We've been using this at work for a while now, and it's a good way of getting consistent backups. The only drawbacks are that I think it only works with MyISAM tables, and writes block for a few seconds while the snapshot is created.


Another thing I am curious about: when archiving/purging/whatever it is called, what happens to the dependent tables?

For example, the Header-to-detail relationship between tables.

Anyway, I will be trying out the tool found at this link: http://www.maatkit.org/doc/mk-archiver.html
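Going by the mk-archiver documentation, an invocation looks roughly like this (database, table, and column names are placeholders; check the docs for the exact options before relying on it):

  mk-archiver --source h=localhost,D=mydb,t=orders \
              --dest   h=localhost,D=mydb_archive,t=orders \
              --where  "created_at < NOW() - INTERVAL 6 MONTH" \
              --limit 500 --commit-each

It works one table at a time, so detail tables would need their own runs with a --where that follows the parent relationship.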


If you're using MyISAM, you can use MERGE tables to split data over multiple tables, or you can use partitioning. Then when you decide one of the 'sub' tables is 'full' you can take it out of the MERGE table, do a read lock, rsync it to a backup server, then drop it off the main server.
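A rough sketch of the MERGE arrangement (MyISAM only; table names and columns are made up, and all the underlying tables must have identical structure):

  CREATE TABLE data_2008 (id INT NOT NULL, created DATETIME, payload TEXT) ENGINE=MyISAM;
  CREATE TABLE data_2009 LIKE data_2008;

  -- The MERGE table presents the union of the underlying tables;
  -- inserts go into the last table listed.
  CREATE TABLE data_all (id INT NOT NULL, created DATETIME, payload TEXT)
    ENGINE=MERGE UNION=(data_2008, data_2009) INSERT_METHOD=LAST;

  -- When data_2008 is retired: take it out of the union, back it up, drop it.
  ALTER TABLE data_all UNION=(data_2009);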


  # mysqldump -u (user) -p (database name) > (destination filename)

  mysqldump -u root -p some_project > some_project.sql

  # to import:
  # mysql -u (user) -p (database name) < (backup file)

  mysql -u root -p some_project < some_project.sql


Read up on WALs and incremental backups. I know postgres has support for these and MySQL should too.
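For MySQL the closest equivalent is the binary log: a periodic full dump plus the archived binlogs gives you incremental/point-in-time restore. A minimal sketch (paths and file names are examples):

  # my.cnf -- enable the binary log
  [mysqld]
  log-bin          = /var/log/mysql/mysql-bin
  expire_logs_days = 14

  # incremental backup: rotate the log, then copy the closed binlog files offsite
  mysql -u root -p -e "FLUSH LOGS"
  rsync -a /var/log/mysql/mysql-bin.0* backuphost:/backups/binlogs/

  # restore: load the last full dump, then replay the binlogs on top of it
  mysqlbinlog /backups/binlogs/mysql-bin.000042 | mysql -u root -p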


You could use PHPMyAdmin:

Go to the "export" tab and use the "Save as file" option


Exporting the database? But then I would still have a lot of rows in the database. What I was trying to say is that I want to archive the "old" records and still have the currently used ones available. How do you go about this?


What are your criteria for a record being "old"? Once you've established your criteria, just archive the database as it is now and then drop the rows that match your oldness criteria.


I imagine that I also have to query down to the dependent tables?



