Ask HN: Archive a MySQL database
31 points by wenbert on Jan 19, 2009 | 18 comments
How do you archive a database? I have this database and it has been receiving data for the last 3 years, and I would like to archive the old data WITHOUT affecting the current data (current transactions, etc.). The database is heavily used and is close to a million records, and in some cases it takes a little longer for the complicated queries to complete...

What is the best way to do this? I have no experience with this. In fact, this is the first time I have handled a database with this volume. Perhaps you can direct me to a good resource, etc.

I am using MySQL5 on a LAMP environment. If it requires downtime, I can sacrifice a day for this.

EDIT: If I backup the entire database, I would still have "old" records and the database would still be at the same size.





Thanks for this. This is what I am looking for.


This is the correct answer.


Here is what I do with mysql 5.1 and my innodb tables on Linux...

mysqldump --single-transaction --skip-lock-tables --all-databases | gzip > "/some/directory/backup-`hostname`-hourly-`date +%H`.sql.gz"

These two options are important: --single-transaction and --skip-lock-tables. At least in mysql 5.1, mysqldump defaults to locking each table before dumping it (not good if you have a busy app and large db).

I run the above in a cron once per hour. It gives me 24 hours worth of hourly backups. I then use rsnapshot (http://www.rsnapshot.org/) to store daily, weekly, and monthly backups offsite.

I also run my backups with a low priority (nice -n 19) so that my application gets higher priority than the backup script.
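A minimal sketch of how the crontab entry for that might look, assuming the dump command above is wrapped in a script at /usr/local/bin/mysql-hourly-backup.sh (the script path and cron file are hypothetical):

  # /etc/cron.d/mysql-backup -- run the dump at the top of every hour,
  # at the lowest CPU priority so the application stays responsive
  0 * * * *  root  nice -n 19 /usr/local/bin/mysql-hourly-backup.sh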


One last thing... mysqldump has the following option:

-w, --where=name Dump only selected records; QUOTES mandatory!

You can probably use this to back up just the rows you are interested in and then delete them once you are sure they are properly archived.
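For example (the table, column, and cutoff below are made up for illustration), dumping everything older than a cutoff date might look roughly like:

  # Dump only the old rows of one table; note the quotes around the --where value.
  # "orders" and "created_at" are hypothetical names -- substitute your own.
  mysqldump --single-transaction --where="created_at < '2006-01-01'" mydb orders \
    | gzip > /some/directory/orders-pre-2006.sql.gz

Once that archive has been verified, the same predicate can drive the DELETE on the live table.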


One of the ways we archive data is by setting an archive cutoff point. So say we only need the last 6 months of data... Every month we run a script that takes anything older than 6 months and moves it to a separate table (or database). Obviously this solution isn't perfect for every situation. But depending on how much data you have in each row, a million rows usually isn't all that much. So for example we may have a table called "data" and a second table called "data_archive". This keeps the data table and indexes small and manageable.

You should of course still backup your data since the archive is really just a reference.
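A rough SQL sketch of that monthly move, reusing the data / data_archive naming from above (the created_at column and the 6-month cutoff are assumptions):

  -- Fix the cutoff once so both statements use the same boundary.
  SET @cutoff = DATE_SUB(NOW(), INTERVAL 6 MONTH);

  -- Copy the old rows into the archive table, then remove them from the live one.
  INSERT INTO data_archive SELECT * FROM data WHERE created_at < @cutoff;
  DELETE FROM data WHERE created_at < @cutoff;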


This makes sense. But I imagine that it will be complicated when one table has a lot of other dependent tables associated with it. Am I right to assume this?

Perhaps if I had foreseen this, I would have made my database schema easier to archive. :-(


If you can't afford downtime, check out InnoDB Hot Backup (a commercial product sold by Oracle).

If you already happen to be using LVM, check out this post: http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for...

Otherwise, you're going to have some downtime, which could range from a few seconds to a lot longer depending (mostly) on the size of your data. The simplest way is using mysqldump, but that may be too slow (probably not though, unless you have really low-end hardware, 1m records isn't that many).

There are faster (and less straightforward) ways, which basically involve: stop the server, copy the data directory somewhere else, restart the server.
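Roughly, assuming the usual Linux paths (your init script and datadir may differ):

  /etc/init.d/mysql stop
  # copy the whole data directory while the server is down, preserving permissions
  cp -a /var/lib/mysql /backups/mysql-$(date +%Y%m%d)
  /etc/init.d/mysql start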

Google is your friend here. If you have a spare server, you might also consider setting up replication so that the next time you want to back up your data, you can take it from a slave and it won't require any downtime.


This technique using LVM snapshots always struck me as the optimal way to achieve negligible-downtime backups of even largish databases:

http://blog.dbadojo.com/2007/09/mysql-backups-using-lvm-snap...

(Haven't done it myself, though.)
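The general recipe for LVM-snapshot backups (whatever the exact details in that post) is: take a brief global read lock, snapshot the volume holding the datadir, release the lock, then copy from the snapshot at leisure. A sketch, with made-up volume group and LV names/sizes:

  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> system lvcreate --snapshot --size 5G --name mysql_snap /dev/vg0/mysql
  mysql> UNLOCK TABLES;

  # back in the shell: mount the snapshot, copy it offsite, then drop it
  mount /dev/vg0/mysql_snap /mnt/mysql_snap
  rsync -a /mnt/mysql_snap/ backuphost:/backups/mysql/
  umount /mnt/mysql_snap
  lvremove -f /dev/vg0/mysql_snap

The snapshot is taken from inside the mysql session (via the client's system command) because the read lock is released as soon as that connection closes.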


We've been using this at work for a while now, and it's a good way of getting consistent backups. The only drawbacks are that I think it only works with MyISAM tables, and writes block for a few seconds while the snapshot is created.


Another thing I am curious about: when archiving/purging/whatever it is called, what happens to the dependent tables?

For example, the Header-to-detail relationship between tables.

Anyway, I will be trying out the tool found at this link: http://www.maatkit.org/doc/mk-archiver.html
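Going by the mk-archiver documentation, an invocation looks roughly like this (database, table, and column names are placeholders; check the docs for the exact options before relying on it):

  mk-archiver --source h=localhost,D=mydb,t=orders \
              --dest   h=localhost,D=mydb_archive,t=orders \
              --where  "created_at < NOW() - INTERVAL 6 MONTH" \
              --limit 500 --commit-each

It works one table at a time, so detail tables would need their own runs with a --where that follows the parent relationship.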


If you're using MyISAM, you can use MERGE tables to split data over multiple tables, or you can use partitioning. Then when you decide one of the 'sub' tables is 'full' you can take it out of the MERGE table, do a read lock, rsync it to a backup server, then drop it off the main server.
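A rough sketch of the MERGE arrangement (MyISAM only; table names and columns are made up, and all the underlying tables must have identical structure):

  CREATE TABLE data_2008 (id INT NOT NULL, created DATETIME, payload TEXT) ENGINE=MyISAM;
  CREATE TABLE data_2009 LIKE data_2008;

  -- The MERGE table presents the union of the underlying tables;
  -- inserts go into the last table listed.
  CREATE TABLE data_all (id INT NOT NULL, created DATETIME, payload TEXT)
    ENGINE=MERGE UNION=(data_2008, data_2009) INSERT_METHOD=LAST;

  -- When data_2008 is retired: take it out of the union, back it up, drop it.
  ALTER TABLE data_all UNION=(data_2009);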


  # mysqldump -u (user) -p (database name) > (destination filename)

  mysqldump -u root -p some_project > some_project.sql

  # to import:
  # mysql -u (user) -p (database name) < (backup file)

  mysql -u root -p some_project < some_project.sql


Read up on WALs and incremental backups. I know postgres has support for these and MySQL should too.
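For MySQL the closest equivalent is the binary log: a periodic full dump plus the archived binlogs gives you incremental/point-in-time restore. A minimal sketch (paths and file names are examples):

  # my.cnf -- enable the binary log
  [mysqld]
  log-bin          = /var/log/mysql/mysql-bin
  expire_logs_days = 14

  # incremental backup: rotate the log, then copy the closed binlog files offsite
  mysql -u root -p -e "FLUSH LOGS"
  rsync -a /var/log/mysql/mysql-bin.0* backuphost:/backups/binlogs/

  # restore: load the last full dump, then replay the binlogs on top of it
  mysqlbinlog /backups/binlogs/mysql-bin.000042 | mysql -u root -p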


You could use PHPMyAdmin:

Go to the "export" tab and use the "Save as file" option


Exporting the database? But then I would still have a lot of rows in the database. What I was trying to say is that I want to archive the "old" records and still have the currently used ones available. How do you go about this?


What are your criteria for a record being "old"? Once you've established your criteria, just archive the database as it is now and then drop the rows that match your oldness criteria.


I imagine that I also have to query down to the dependent tables?



