Moving from PHP to C saved my web startup.
61 points by BuckToBid on Oct 22, 2010 | 83 comments
A few months ago I took the journey into startup land. I quit my job and decided to finish the project I had been thinking about finishing for a long time: a penny auction app where users are linked to Facebook identities.

One major worry right from the start with this project was that it required a real-time, accurately synchronized countdown across multiple browsers. I was using AJAX to call back to the server for an update every second to accomplish this. I knew it was going to get interesting if a lot of users joined up, because they would all be calling the server once every second.

Initially I thought we would run into trouble with bandwidth, but I was wrong. After 4 days we had about 150 users up and bidding when the server decided to crash (right in the middle of an iPad auction, just because my luck is that good). We were on Slicehost and they had us upgraded within 20 minutes, but we were still working the server hard. The timer was skipping, we were getting all kinds of strange errors in the logs, and things were looking pretty bad. It wasn't bandwidth but memory and processor usage that were the problem.

I knew from reading articles on HN regularly that PHP is terrible, but I built the app in PHP anyways, along with the 1 second callback. It was the fastest way for me to get things up and running.

That night I had an idea: if I built the 1 second callback portion of the app in C, the server wouldn't have to load Apache with all its extensions every second. So I built a very simple fork server to send back the real-time updates.

Result: Processor 99% idle even during heavy use and memory usage basically stays exactly the same whether or not more people are watching the auctions.

We now have a few thousand users and things have not changed at all, still running great. Of course this system will have its limits but so far we have not even dented it.

If you want to see it in action you can go to http://apps.facebook.com/bucktobid

Edit: For those who want a little more detail, I have a lighttpd server listening on port 80 that redirects to Apache for PHP calls. If the call comes in for .btb (a made-up extension), lighttpd redirects to the C app, which listens on another port locally and serves the needed info to the browser. The updater is 100% C/C++, not an Apache module.




So it really wasn't PHP but Apache, right? In an hour you could have switched out your front end server to Nginx and had it serve responses from Memcached, then kept the Apache/PHP backend and changed it to update Memcached on bid changes.

Here are some links:

http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-b...

http://lserinol.blogspot.com/2009/03/speeding-up-your-nginx-...


So instead of understanding the problem and implementing the simplest possible solution, you're recommending throwing more middleware caching crap at it? Brilliant engineering. I think I understand "web scale" now.


How is placing the blame on the wrong tool understanding the problem?


You're right, I should have read the OP more carefully.


On the frontend I was already using lighttpd. I could have used memcached on the backend; that was going to be the next step if the C app didn't work. It just worked so well that I didn't have to worry about it.


I didn't mean to poop on your parade. I love writing servers too, but if you have a public-facing HTTP server there are a ton of obscure edge cases to worry about, like some script kiddie DoSing your server by either opening a connection and keeping it open without sending anything, or dribbling out a character a minute to circumvent your SIGALRM handler.

I'd think about switching back to a public interface that has been through these battles before. You could always keep your server up and use Nginx or HAProxy to front it for you and just pass the requests on.


Yeah, I have run into some of those problems already. Thanks for pointing that out. You definitely need a SIGALRM handler if you are thinking of doing something like this. But I don't really see how switching to Memcached is going to prevent a DoS attack. Not trying to be defensive, just really wondering if there is something I'm not getting about that.


I'm not trying to be offensive either :) I was just warning that putting up a public facing server is eventually going to make you a target. Using a battle tested server will let you concentrate on getting more people to use your app instead of fighting bored 13 year olds trying to bring your server down.

Good luck with your app!


Thanks! I understand your point, but a fork server is so simple that I don't think there is much that can go wrong there. If it was serving more data I would be worried, but it's meant to do a quick and short reply.


it's not called a forkbomb for nothing


http://en.wikipedia.org/wiki/Fork_bomb

Fork bombs are easier to pull off from the command line, not so much with a DoS. I would say that loading a large webserver for each request would use up more resources than a tiny C program. That's the whole point of the post.


You're right about the webserver being overload. Threads would be less expensive than processes.

I would use Go for this service myself. Defending against abuse is hard whichever way one goes.


Writing your own server in C seems a little crazy anymore. Just use a C++ framework like POCO or ASIO. POCO gives you a fast httpserver component with all the tricky stuff figured out.


I couldn't agree more. Nginx and HAProxy will seem like miracle workers after seeing Apache's performance.


Also because of the need for consistency and latency I'm not sure memcached would be a benefit because it could never be scaled anything but vertically anyways.


Memcached? Only vertically? Horizontal scaling is built-in in all memcached clients.


Yes, I was just saying that we can't wait for replication: when someone places a bid, all the other users must have that information immediately.


memcached doesn't do replication as far as I know. It will just have to do some requests a second time. It still beats recalculating on each request.


Replication is the wrong word for it. Key distribution is what you want. If a given memcache server goes down then the keys it was storing should get reloaded to a different one in your pool by your cold cache loading code.
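
A minimal sketch of the key distribution idea above, in Python. The server addresses, the `pick_server` and `get_with_fallback` names, and the dict-based caches are all hypothetical stand-ins, not a real memcached client API. Real clients often use consistent hashing rather than the plain modulo shown here, because modulo remaps most keys whenever the pool size changes.

```python
import zlib

# Hypothetical pool; these addresses are placeholders, not from the thread.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key, servers):
    """Map a key to exactly one server by hashing it (no replication)."""
    if not servers:
        raise ValueError("empty server pool")
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def get_with_fallback(key, servers, caches, load_from_db):
    """Read a key from the server it hashes to, reloading it on a miss.

    `caches` maps server name -> dict and stands in for real connections;
    `load_from_db` plays the role of the cold cache loading code.
    """
    server = pick_server(key, servers)
    cache = caches.setdefault(server, {})
    if key not in cache:
        cache[key] = load_from_db(key)  # miss: rebuild the value once
    return cache[key]
```

If a server drops out, calling `pick_server` with the surviving list sends its keys to a live server, where the first read misses and triggers the loader.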


Good for you! I read all of the other guys asking, "but why didn't you do this? ..." I myself am a throwback from the C/C++ days and I frequently re-write stuff in C. That's how it's supposed to be done. Write as much as you can as fast as you can up-front, then optimize your bottlenecks using more efficient methods. I'm happy that C improved your situation that drastically. I agree that the overhead of lighttpd, then apache, then php might have been the real killer in your situation, and using memcache might have helped also, but making a simple C server using fork() to handle processes and not opening it up to the world is a very good solution in my book. People think that if enough people start writing things in C, that they'll have to start doing it too - I think that's the reason for all of the backlash. Remember, there are a bunch of C and C++ programmers out there doing things that perform well and scale, but your audience on HN is mostly php/python/java programmers and web startup people who go the route of optimizing using more trendy technologies instead of down-shifting into a language like C. More power to you, fellow C programmer! :)


The more generalized takeaway from this is that you shouldn't use a heavyweight listener to handle polling (or websockets in the near future) if you can avoid it.

PHP was the culprit here, but I can't help but think you'd have had the same problem if you were trying to do the same thing with RoR, any Java app server, or any of the Python frameworks.

Likewise, node.js or Twisted probably would have been an equally effective replacement.


I'd probably go with node.js for this task - more manageable. Twisted has a huge learning curve.


If you want to stick with Python, Tornado is a really good async server without all the learning curve of Twisted.


I wrote my own async servers and having done that (and used Tornado), I'm back on threads :) Async is kind of cool in some cases (long polling for one), but for most cases it's a pain.

Right now mostly working on CherryPy's WSGI server.


Why do you use AJAX to update the countdown?

Using AJAX for this would give you MUCH less accuracy than plain-old JavaScript with time delta of user time to server time:

If you just need to time sync - you could receive server time once (remembering user time, when you sent request for server time), then just compensate user time with that value.

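In JavaScript (client-side), assuming the endpoint returns the time as a plain integer:

  // do this once:
  var user_time = (new Date()).getTime();
  var delta_time = 0;
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/server-time-in-sec-since-epoch');
  xhr.onload = function () {
    // the server answers in seconds; getTime() is in milliseconds
    var recvd_server_time = parseInt(xhr.responseText, 10) * 1000;
    delta_time = recvd_server_time - user_time;
  };
  xhr.send();

  // then at any given moment the real server time is:
  var current_server_time = (new Date()).getTime() + delta_time;
  // no need for further ajax calls

For more accuracy, subtract half the round-trip time from delta_time (the server read its clock roughly halfway through the round trip).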
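
A worked version of the round-trip correction, sketched in Python. The usual refinement is to assume the server read its clock halfway between the moment the request was sent and the moment the reply arrived; that symmetry is an assumption and real networks only approximate it. The function names are made up for illustration.

```python
def estimate_offset_ms(sent_at_ms, server_time_ms, received_at_ms):
    """Estimate (server clock - client clock) in milliseconds.

    Assumes the server stamped its time halfway through the round trip.
    """
    midpoint = (sent_at_ms + received_at_ms) / 2
    return server_time_ms - midpoint


def server_now_ms(client_now_ms, offset_ms):
    """Convert a local timestamp into estimated server time."""
    return client_now_ms + offset_ms
```

With a request sent at local time 1000 ms, a reply received at 1200 ms, and a server stamp of 5100 ms, the midpoint is 1100 ms and the estimated offset is 4000 ms, which is the naive delta (4100 ms) minus half the 200 ms round trip.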


The reason it has to continually sync is that any user could place a bid at any moment. This makes the timer increase and the top bidder change, so every user must be notified of the change. It's asking the server how much time is left in each auction, not what time it is in the real world. Sorry for the confusion.
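
To illustrate why a one-time clock sync is not enough, here is a small Python sketch of server-side auction state. The `Auction` class, the 10 second extension, and the 1 cent step are assumptions made for the example; the thread does not say how BuckToBid stores or extends its timers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Auction:
    ends_at: float                    # server clock, in seconds
    price_cents: int = 0
    top_bidder: Optional[str] = None

    def place_bid(self, bidder, now, extend_by=10.0, step_cents=1):
        """Record a bid; returns False if the auction has already ended."""
        if now >= self.ends_at:
            return False
        self.price_cents += step_cents
        self.top_bidder = bidder
        self.ends_at += extend_by     # every bid pushes the deadline out
        return True

    def snapshot(self, now):
        """What a one-second poll would return for this auction."""
        return {
            "seconds_left": max(0, int(self.ends_at - now)),
            "price_cents": self.price_cents,
            "top_bidder": self.top_bidder,
        }
```

Because `ends_at` moves whenever anyone bids, a client cannot compute the remaining time from its own clock alone; it has to hear about each change.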


Why don't you use long polling rather than sending a request a second?

You could even incorporate the timer into a single request.

    do {
        sleep(1);
        $seconds_remaining = fetch_auction_time_from_memcache();
        echo "<script>updateActionTime($seconds_remaining);</script>";
        flush();
    } while ($seconds_remaining > 0);


This is interesting. Would this work if I was on the site for say an hour? Or is there some limit to the amount of time you can send data like this? I have never tried anything like this. What is the overhead like?


I've only used it on smaller projects. Facebook use it on their frontpage for progressive loading (check the source for `<script>big_pipe.onPageletArrive(..);</script>`).

Longer running connections are a little more problematic, but some client side code should be able to handle the connection being closed. You just need to make sure your web server can handle the number of connections you are expecting. Apache is particularly bad for this. Something like nginx should perform better.


Not hard. The client could notice if the connection dies (jQuery has AJAX failure callbacks, for example) and then attempt to reconnect; that's the important thing.


I was going to suggest long polling as well. Seems to make way more sense than normal polling for this application.


Personally I'd go with a simple Python WSGI app or a node.js app. Actually, even PHP can handle 150 reqs/sec (your load at 150 users with 1 req/s) if used as nginx+php-fpm+eaccelerator. By my measurements it can actually do about 1000 reqs/sec.


Yes, it would handle it and we could have kept scaling the server up to handle more and more of it. With the C app all those problems went away. It has almost no footprint and uses almost no processor power.


So... what you built was a Facebook version of https://www.wavee.com/ (or any one of the other dozens of sites), which is basically a way of taking foolish people's money.

Replacing PHP/Apache fork/threads with a C daemon is a good migration, though most any language with an async sockets library worth a damn should be able to handle thousands of simple requests every second.


Yes, there are many penny auction sites out there. The whole reason to link it to Facebook is so that you know it's a real person you are bidding against. How do you know wavee has all real people bidding? You don't; some of their users could just as easily be bots bidding items up automatically.


The thing is, Wavee is not an auction site. An auction site is one in which bids are free, and the winner pays what their bid says they pay. What Wavee had, and what you have built are, effectively, a method of taking money from people for the opportunity to pay money for something.

Don't get me wrong, it's an amazing racket; Wavee makes 75 cents for every bid coming in to increment the value by 1 cent. Saw a $150 iPad. That iPad has already earned Wavee's owners $11,250 without even being sold! And the use of Facebook to gain the trust of people is a good marketing tactic, but it doesn't skirt the fact that creating fake Facebook users with a bunch of friends is easy (get some pictures, feed some content in from any one of the millions of open twitter accounts, etc.), or that you have a real incentive to perpetrate fraud.
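
The $11,250 figure follows from simple arithmetic, sketched here in Python using integer cents to avoid rounding. It assumes the price starts at zero, every bid raises it by exactly one increment, and every bid costs the full 75 cents; discounted bid packages would lower the total.

```python
def bids_placed(final_price_cents, increment_cents=1):
    """Number of bids needed to move the price from 0 to the final price."""
    return final_price_cents // increment_cents


def bid_revenue_cents(final_price_cents, cost_per_bid_cents, increment_cents=1):
    """Total paid for bids alone, before the winner pays the final price."""
    return bids_placed(final_price_cents, increment_cents) * cost_per_bid_cents
```

A $150.00 price is 15,000 one-cent bids, and 15,000 bids at 75 cents each is 1,125,000 cents, or $11,250.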

Really though, being the most honest crook among crooks still leaves you being a crook. That's why some countries have outlawed this particular kind of scam.


So now you have just jumped straight into calling me a crook. I'm sorry that Wavee or whatever site you went to took your money.

I honestly didn't know (and still don't) if you can make any money doing an honest penny auction site. We are just breaking even with this one so far.

The fact is that every time an auction goes up we have more risk than anyone else involved. If I put up a $500 iPad it could go for extremely cheap. The lowest one has ever gone for is 64 cents. That means we made less than $64 and still had to buy and ship that $500 iPad. And we have never sold anything for more than $26 which still isn't the $2600 you think it is because some packages give you more bids per dollar.

The legitimate complaint is when sites cheat, which I would guess a lot of them do. Using Facebook is not some trick to get people to trust us. It's a way for people to verify for themselves that they are in fact bidding against other real people.

How can you say that it's a scam? Do you think we are just out there creating all these fake accounts with fake profiles, personal photos, relationships, etc.? I can see it maybe if we had 5 or 6 or even 20 users. But we have 100's of winners. That's a stretch even for the most paranoid and cynical of people.

Not everybody wins every time. But the data we have so far says that the majority of users that buy more than just a few bids are actually the ones getting all the deals. It's the people who come in and spend $24 expecting to win a $1500 item, then leave when they don't, who are losing. The users who are logical enough to see how the system works are the ones who get far more than they put in.

And some countries outlaw all kinds of crazy things. Some countries are considering passing legislation to ban homosexuality, so I guess we should all agree that it's bad now too, according to your theory?


I'm not so foolish to have spent money at any of those sites. But I do stand by my statement of calling you a crook. I don't believe that I will be able to convince you, but I do hope that I'll be able to convince others.

Let's give you the benefit of the doubt for a moment that you are actually honest. We'll say that you aren't running bots, fake identities, etc. That's fine. My basis in calling you (and those that run businesses like yours) a crook is not founded on that (though I have no doubt that other companies are doing as much, if only to boost the value and bidding on an item, but I digress).

Try to remember that the fundamental operating principle of your business being profitable is your selling the vast majority of your customers absolutely nothing. They aren't getting a good or service for any bid that doesn't win (which still costs them money). They get nothing.

The money to purchase those items must come from somewhere. If you are breaking even (as you say you are), it's not coming from the people who are "winning", it's coming from the people who are losing. If/when you are making a profit, it's not because the "winners" are necessarily paying that much more for an item (they won't have bid enough plus paying for the total value to pay for the item itself), it's because you've got more losers who are putting money into something without getting anything in return.

Your business is breaking even, and may eventually be profitable because of all of those who come in buying $24 worth of bids, failing, and leaving. Any business that requires its customers be ignorant in order to make money is fundamentally a scam.

Also, your conflating countries making homosexuality illegal with countries making illegal a business that bases its operations on exploiting the ignorant is a fundamentally flawed argument. One is based on basic human rights to be who they are and behave in ways that cause no harm to others. The other is one that profits from people who don't know any better. One is a human rights travesty, the other is the outlawing of an enterprise with margins that organized crime wishes they could have (in the case of a "successful" site like Wavee and others). Trying to claim their equivalence, or that based on "my reasoning" they are equivalent, is dishonest, and really, troll-like behavior. That may fly in some forums, but it doesn't fly here. Try again.


> The users who are logical enough to see how the system works are the ones who get far more than they put in.

Your business model requires minimizing the number of those people and maximizing the number of suckers. There's a reason they're seen as scams and exploitative.


Polling in real time always needs to be done in some compiled language. The good folks at 37signals ran into this when they launched their Campfire app.

For instance, consider what David Heinemeier Hansson says about Campfire, the chat software he helped develop. First written in Ruby on Rails, it soon became clear that the code that polls to see who is in the chat room needed to be as fast as possible:

"We rewrote the 100 lines of Ruby that handled the poll action in 300 lines of C. Jamis Buck did that in a couple of hours. Now each poll just does two super cheap db calls and polling is no longer a bottleneck. Campfire and a shared todo list is different because they’re not working on a shared resource. There’s no concept of locking. Or two people dragging the same item. So a 3 second delay between posting and showing up doesn’t matter. It does when you’re working on a shared resource."

http://www.ruby-forum.com/topic/62907

Later they tore out the C code and re-wrote it in Erlang.


...which [Erlang] is not compiled.


False. Erlang is compiled natively via HiPE (the "High Performance Erlang" compiler, nice acronym!) on many platforms, and compiled to BEAM bytecode on the rest.

Running Erlang has some overhead, sure, but that's because it's designed for distributed systems where you can pull a plug out of the wall without interrupting service. I wouldn't use Erlang for number crunching, but using it as a glue language for a networked system hits all its strong points.


I didn't know that; I thought everything ran on BEAM. I guess the point is the canonical Ruby implementation is really slow when it doesn't need to be. It's not like it's a hard problem, or even that it hasn't already been solved (look at GemStone's Maglev: http://ruby.gemstone.com/)


I've stopped commenting about MRI entirely, it just makes people mad, and it's not even fun anymore. It's too easy. Still, I have to give Matz credit for making a language a lot of people sincerely love.

I know it's splitting hairs whether bytecode + a VM counts as compiled or interpreted (it's both, really), but compiling to bytecode rather than a pure interpreter usually makes enough of a difference performance-wise that it's worth giving some credit.


If people would just stop talking about Ruby 1.8, it would allow for more meaningful discussion. 1.9 is a lot faster and has better support for concurrency.


Indeed, it's too bad that the transition has been taking so long.


FYI, compilers are written completely differently from interpreters. Bytecode languages are compiled as well.


I know, but often when people make that distinction, it's concerned with what performance ballpark the language has rather than strictly implementation issues.


keep-alive

it doesn't make sense to make a new connection every 1 second, esp. to apache. that is where your prob was, not php


Not to be the unpopular one but couldn't you also have used Flash to open a socket to a server written in [name your favorite language] and had the server intermittently (every 1 second, every 500ms) send out the current tick?


Every time I read a story like this it reminds me of a very important lesson:

The world is built from baling wire and duct-tape. There are probably a million better, smarter, less technology illiterate ways to solve this problem, but that really doesn't matter. What matters is getting out there and doing it. Being able to do this sort of app 'correctly' would be an edge, but only to the person who can do it 'correctly', and is out there doing it.

I wish you the best of luck with your duct-tape!


> I knew from reading articles on HN regularly that PHP is terrible, but I built the app in PHP anyways, along with the 1 second callback

You always have to use the right tool for the job. It requires a deep understanding of what is actually going on inside the server when you write a line of code. PHP doesn't magically absolve you of that.

It really has nothing to do with PHP, C, Ruby or [insert your most reviled/loved technology here]. Calling a complex runtime for hundreds of near-contentless requests per second on a single machine is a really bad idea, no matter what environment you use.

Also, I'm sorry if I snipe from the cheap seat here, but 1 request per second per user doesn't seem like a great solution to your problem either. It might be more appropriate to just leave the HTTP connection open and push new data out through that when it becomes available, e.g. when something about the bidding process changes.
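
One way to sketch the "hold the request open until something changes" idea is a condition variable, shown here in Python. The `AuctionFeed` class and its method names are hypothetical; a real deployment would sit behind a server that can keep many connections open cheaply, which is a separate problem from this in-process signalling.

```python
import threading


class AuctionFeed:
    """Holds the latest state and wakes waiting requests when it changes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._state = {}

    def publish(self, state):
        """Called when a bid changes something; wakes every waiter."""
        with self._cond:
            self._version += 1
            self._state = dict(state)
            self._cond.notify_all()

    def wait_for_change(self, last_seen_version, timeout=25.0):
        """Block until the version passes last_seen_version, or time out.

        Returns (version, state); the version is unchanged on timeout.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: self._version > last_seen_version, timeout=timeout
            )
            return self._version, dict(self._state)
```

Each client sends the last version it saw; the handler calls `wait_for_change` and replies only when a bid arrives or the timeout expires, after which the client immediately asks again.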


Instead of loading Apache for each PHP call, why not have a few PHP FastCGI instances running? They're lighter weight than Apache+mod_php, and you don't have to wait for them to load for each call.


Mostly just because I've never used FastCGI before, and I have played around with making C/C++ servers for fun. It seemed like the fastest solution, and so far it's worked out better than expected.


This is a bit of an oversimplification, but where CGI spawns a new process for each connection, FastCGI* starts the process once, then runs a loop to handle each connection, so the process startup, database connection, etc. costs amortize to essentially nothing -- many of the constant factors for working in a higher level language are eliminated.

FastCGI is worth looking into - a lot of popular webservers support it, and it's less of a complete model change than switching to an event-based system (e.g. node.js) or an MVC framework.

* Or SCGI, which is a newer, simpler design with similar goals.
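
The amortization argument above can be shown with a toy Python comparison. This is not the FastCGI protocol; `expensive_setup` and `handle` are made-up placeholders for process startup and request handling, and the counter only exists to make the difference visible.

```python
def expensive_setup(counter):
    """Stand-in for interpreter startup, config parsing, DB connect, etc."""
    counter["setups"] += 1
    return {"db": "connected"}


def handle(ctx, request):
    """Stand-in for the actual per-request work."""
    return "ok:" + request


def cgi_style(requests, counter):
    """One fresh process per request: the setup cost is paid every time."""
    return [handle(expensive_setup(counter), r) for r in requests]


def fastcgi_style(requests, counter):
    """One long-lived worker: set up once, then loop over requests."""
    ctx = expensive_setup(counter)
    return [handle(ctx, r) for r in requests]
```

Both functions return the same responses; only the number of setups differs, which is the constant factor the comment says is amortized away.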


Node.js worked great for me for similar application. Built the web application in PHP and timers with Node.js


Was it an auction site as well? If so which one if you don't mind me asking?


Yes, it was a penny auction site. I can't give the URL as it is not my website. I assisted them with development.


Why not use a js ntp library?

http://jehiah.cz/a/ntp-for-javascript

Then have a dedicated ntpd daemon running. That will be way more extensible, maintainable, and scalable than a custom C program.


The timers are all set differently: we can have 9 separate auctions going at once, all with different end times. Also, when someone bids, the timers increase. So they are not only all different but all changing constantly. It's not just a single countdown that we could sync.


can't you use a comet connection with something like orbited instead of polling? i wouldn't trust polling via HTTP GET with "real-time accurately synchronized countdown" -- a couple of small delays can skew your entire countdown, and delays are easy to come by across the internet. especially with creating multiple connections.

as much as i dislike it, i doubt the problem has much of anything to do with php. a simple fastcgi server hooked into lighttpd would have probably had the same outcome of better performance, or even apache with mod_php.


So you saved this app by using fork(), not (just) by using C. You can use fork() from a lot of languages, including shell scripts and PHP.


php isn't terrible in general, but it can be the wrong tool for certain use cases, this being one of them. In the same way, apache is also not the right tool for some use cases, this being one of them. your solution works well for you, so stick with it. if you're thinking of scaling up further, also consider an event-driven server architecture (nginx, node.js, etc).


Agreed that PHP isn't terrible. Typically, it's the programmer's fault before it's the language's. Also, I agree that PHP isn't the right solution for every problem, but if you're set on using PHP for the wrong problems and need to bitch about it, see the previous sentence.

A nice discussion on why PHP DOESN'T suck: http://stackoverflow.com/questions/309300/defend-php-convinc...


It's not that PHP is terrible. You can definitely write code in it that gets shit done.

It's that it exists in the same universe with Python, Ruby, Java, Clojure, Erlang, etc.

If you have the freedom to choose your implementation language, and those are your alternatives (to name just a few), that's when PHP is harder to justify. Not that it's terrible in isolation. In relative terms.


Lots of really good comments here (except mine ;-)). I'm glad I read this. Is there a "Best of Hacker News" out there?



Any idea on how to subscribe to Best of HN?


could you give more info about how you did it? I need to implement something like this for a frequent Ajax refresh.


You could google how to do a simple fork server in c/c++ and after you have that working you just need to do something like this:

  stringstream response;
  response << "HTTP/1.1 200 OK\r\n"
           << "Server: BTB Auction Updater 1.0\r\n"
           << "X-Powered-By:BTB Update Engine\r\n"
           << "Content-Length: " << msg.length() << "\r\n"
           << "Content-Type: text/html\r\n\r\n"
           << msg;

  int n = write(s, response.str().c_str(), response.str().length());
  if(n < 0) error("Error writing to socket");

to write back a valid header that a browser will understand.
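
For readers who want to try the same response format without a C++ toolchain, here is a rough Python equivalent. The `build_response` name is made up, and the `Server` header value is copied from the snippet above. The one detail worth noting is that `Content-Length` must count bytes, not characters.

```python
def build_response(msg):
    """Return a complete HTTP/1.1 response as bytes for a short text body."""
    body = msg.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Server: BTB Auction Updater 1.0\r\n"
        f"Content-Length: {len(body)}\r\n"   # length in bytes after encoding
        "Content-Type: text/html\r\n"
        "\r\n"                               # blank line ends the header
    ).encode("ascii")
    return head + body
```

The returned bytes can be handed to `socket.sendall` on an accepted connection.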


You could store the static (non-changing) text in a couple of C strings and then call writev() to communicate everything. That would save you the step of continuously reconstructing the header.
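
A sketch of the writev() suggestion, using Python's `os.writev` wrapper (Unix only) instead of the C call. The constant and function names are hypothetical. The static header bytes are built once at import time, and each reply only formats the length.

```python
import os

# Built once; only the length and the body change per request.
STATIC_HEAD = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: BTB Auction Updater 1.0\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: "
)
HEAD_END = b"\r\n\r\n"


def send_response(fd, body):
    """Write header and body with one writev() call; returns bytes written.

    A production server should also handle short writes; this sketch
    assumes the small reply fits in the socket buffer.
    """
    length = str(len(body)).encode("ascii")
    return os.writev(fd, [STATIC_HEAD, length, HEAD_END, body])
```

`fd` is a raw file descriptor, so with a Python socket you would pass `conn.fileno()`.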


Thanks for the suggestion!


Sure thing, what part are you stuck on?


Thanks for the reply. I haven't started to code yet, but maybe in a few weeks. It's more of a chatroom with long polling that I will be building. Is your whole app on one dedicated server?


Yes, right now it's all on one server. The parts of the app that serve the pages you see are separate from the 1 second updater and could be on separate servers.

Right now the C app listens on a different port that lighttpd will redirect to if a pertinent request comes in.

There is a parent process that forks upon connection and passes the socket to the child, which pulls the requested data and quickly returns it.

The one thing you will also need if you go this route is a SIGALRM handler. You basically call signal() and set a flag before each socket read and write. If the socket call takes too long to return, the signal handler gets called and kills the child process. Then you clear the flag on the other side of the socket read/write call so that the signal doesn't kill the process if the call returns on time.
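
The fork-per-connection design with a SIGALRM guard can be sketched in Python as follows (Unix only). This is not the author's C code: it uses `signal.alarm()` with SIGALRM's default action (terminate the process) rather than a custom handler and flag, and the function names and the 2 second limit are assumptions.

```python
import os
import signal
import socket
from contextlib import contextmanager


@contextmanager
def deadline(seconds):
    """Arm SIGALRM before a blocking call and disarm it afterwards."""
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # the call returned in time, cancel the alarm


def handle_client(conn, make_reply, seconds=2):
    """Runs in the child: read one short request, send one short reply."""
    # The default SIGALRM action kills the process, which is the intent.
    signal.signal(signal.SIGALRM, signal.SIG_DFL)
    with deadline(seconds):
        request = conn.recv(1024)
    with deadline(seconds):
        conn.sendall(make_reply(request))
    conn.close()


def make_listener(port=0):
    """Bind a local TCP listener; port 0 lets the OS pick a free port."""
    return socket.create_server(("127.0.0.1", port))


def serve(listener, make_reply):
    """Parent loop: accept a connection, fork, let the child answer it."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # kernel reaps children
    while True:
        conn, _addr = listener.accept()
        if os.fork() == 0:  # child process
            try:
                listener.close()
                handle_client(conn, make_reply)
            finally:
                os._exit(0)  # never fall back into the accept loop
        conn.close()  # parent: the child owns this connection now
```

A client that connects and sends nothing only ties up one child for `seconds` before the kernel kills it. That bounds the damage per connection but does not limit how many children can exist at once, which is the fork bomb concern raised earlier in the thread.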



you can embed your own C code without the hassle to handle HTTP/S (or forkbombs) using KLone web server; it can easily handle thousands of requests per second.

p.s. I'm part of the company that made it.


Did you create an apache module? If not, what libs did you use?


I bypassed Apache altogether. Apache is not used for the 1 second AJAX call at all. The only libs I used were jansson for easy JSON manipulation and mysql++ for database access.


Couldn't it just be that you suck at writing PHP?


have you considered using rabbitmq/0mq? It sounds like they are perfect candidates for what you are trying to achieve.


I had never heard of those before. Are they capable of working inside any web browser? Looking it up right now.


Yes, you can reach a queue via a web browser using the proper api access you provide.

If you have 100 users, each one waiting for a message in their queue, then you can "broadcast" a single message to all of the queues (or a group) with one command. All the users waiting for a message in their queue will get a copy.
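
The fan-out pattern described above, sketched with in-process Python queues. This is only a stand-in to show the shape of the idea; it is not the RabbitMQ or 0MQ API, and the `Broadcaster` class is hypothetical.

```python
import queue


class Broadcaster:
    """One queue per subscriber; a broadcast puts a copy in each."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, user_id):
        """Create (or replace) the queue a user waits on."""
        q = queue.Queue()
        self._queues[user_id] = q
        return q

    def broadcast(self, message):
        """Deliver one message to every subscriber; returns the count."""
        for q in self._queues.values():
            q.put(message)
        return len(self._queues)
```

Each waiting user blocks on `q.get()` and wakes when a bid is broadcast, so one publish call reaches all of them.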

Another approach which is hack-ish imo, is to use something like jabberd to broadcast messages.



