I'm curious about your previous setup. You said mod_perl2, MySQL and Apache2 weren't cutting it for you, but I've scaled to that level fine with mod_perl2, PostgreSQL and Apache2. The key was database connection pooling via Apache::DBI, all the Perl modules preloaded in a startup.pl via mod_perl2, keep-alive and host lookups (and a few other things) turned completely off in Apache2, and every query using an index, with the indexes themselves kept entirely in memory by Postgres.
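For reference, the startup.pl half of that looks roughly like this - a minimal sketch, with the DSN, credentials and application module as placeholders (Apache::DBI and connect_on_init are the real interfaces):

    # startup.pl - preloaded once by mod_perl2 (e.g. via PerlRequire)
    use strict;
    use warnings;

    # Load Apache::DBI before DBI so it can wrap DBI->connect with a
    # persistent per-child connection cache - that's the "pooling".
    use Apache::DBI ();
    use DBI ();

    # Open the handle as each Apache child starts, so no request ever
    # pays the connection cost.  DSN and credentials are placeholders.
    Apache::DBI->connect_on_init(
        "dbi:Pg:dbname=myapp;host=127.0.0.1",
        "myapp", "secret",
        { AutoCommit => 1, RaiseError => 1 },
    );

    # Preload the application modules so the compiled code is shared
    # copy-on-write across all the children.
    use MyApp::Handler ();   # placeholder for the real handler

    1;

The keep-alive and host lookup settings are just KeepAlive Off and HostnameLookups Off in httpd.conf.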

Early on, like in your situation, I went to the file system because I couldn't "make it work," but eventually I moved back to Postgres once I figured out its scalability details. If you have a lot of little files and you hit them a lot, that will probably become your bottleneck eventually. Did you figure out the bottleneck in your db setup, or has there just not been enough time yet? Just curious.




Ah, a fellow mod_perl hacker. :)

I'm using Apache::DBI and caching everything via startup.pl on load; keep-alive is on but with a very small timeout, and host lookups are off.

Also, I'm using the worker MPM in Apache2, just FYI. I've found it to be really memory-efficient.
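In httpd.conf terms that's roughly the following (the numbers are illustrative, not what I actually run):

    # Keep-alive on, but drop idle connections fast so they don't pin threads
    KeepAlive On
    KeepAliveTimeout 2
    HostnameLookups Off

    # worker MPM: a few processes, many threads each, so memory stays low
    <IfModule mpm_worker_module>
        StartServers          2
        MaxClients          150
        MinSpareThreads      25
        MaxSpareThreads      75
        ThreadsPerChild      25
        MaxRequestsPerChild   0
    </IfModule>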

I've been using MySQL with Apache::DBI for years and it's usually brilliant - I ran WorkZoo.com, a high-traffic job search engine, on a combination of MySQL and a full-text API.

With Feedjit I'm basically storing web logs. I either have to dump them into a single table and query that - which is what I was doing, and the high query rate with mixed reads and writes was the problem - or have lots of individual tables, which isn't feasible after about 500 with MySQL. So small files work best for me.
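The write path for the small-files approach is roughly this - just a sketch, with the key, hashing scheme and paths all made up, not the actual Feedjit code:

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);
    use Fcntl qw(:flock);

    # Append one hit record to the log file for a given key.  Hashing the
    # key into subdirectories keeps any single directory from growing huge.
    sub log_hit {
        my ($key, $line) = @_;
        my $name = md5_hex($key);
        my $dir  = "/var/feedjit/logs/" . substr($name, 0, 2);  # placeholder path
        mkdir $dir unless -d $dir;
        open my $fh, '>>', "$dir/$name" or die "open $dir/$name: $!";
        flock $fh, LOCK_EX;        # appends from other children wait here
        print {$fh} $line, "\n";
        close $fh;                 # releases the lock
    }

Reads then come down to slurping one small file, with no shared index to fight over.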


We're talking one file per unique domain, or per unique URL?

Also, I take it from your comment that the bottleneck was in MySQL doing the writes. I assume the read side is indexed appropriately, so MySQL finds the right part of the disk almost instantly. Do you think it is a locking issue then, e.g. table lock vs. row lock? (Forgive me, I haven't used MySQL in a while.)


One file per URL. The problem with MySQL is pretty much what you've described. I have (had) an index on a table that gets read a lot by the application. It's amazingly fast - MyISAM tables really rock for fast reads on indexes. But therein lies the problem, because it also gets written to a lot. Every time it gets written to, MySQL needs to lock the table and rebuild the index.

You can improve things a bit by using INSERT DELAYED. With that, MySQL doesn't guarantee it'll insert the row immediately, but the query returns immediately when you do the insert (it doesn't block), and MySQL queues up the inserts and writes them in bulk when it feels like it. The non-blocking behaviour and bulk inserts that INSERT DELAYED gives you speed things up, but only to a point, because you're still constantly rebuilding an index on a table that's getting a lot of reads.
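Concretely, from DBI it's just the extra keyword in the SQL - the table, columns and connection details below are made up for the example:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:dbname=feedjit", "user", "pass",
                           { RaiseError => 1 });

    # INSERT DELAYED returns as soon as MySQL has queued the row; the actual
    # write into the (MyISAM) table and its index happens later, in batches.
    my $sth = $dbh->prepare(
        "INSERT DELAYED INTO page_hits (url, visitor_ip, hit_time)
         VALUES (?, ?, NOW())"
    );
    $sth->execute('http://example.com/post/1', '10.0.0.1');

Note it only works on MyISAM (and MEMORY/ARCHIVE) tables, and you lose the guarantee that the row is there if you read it back straight away.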

Mark.



