Eek... that's a rather unfortunate name choice. The "Gor" saga is an infamous series of books spanning the past 50 years, basically the sci-fi/fantasy equivalent of "Fifty Shades of Grey".
I have always wanted to use such a tool and this one looks good from a cursory glance.
But what I have always wanted to know is how people actually use these properly. There are two problems that I see in replicating production traffic to staging/dev:
1. Changes in application structure (URLs/parameters). Given the changes we make per release, replayed requests will hit a lot of errors. How do you gracefully handle that?
2. Our application is write-heavy, so there is a lot of new content. The majority of replayed requests will reference content that doesn't exist in the staging/dev environment. We can't use live DB replication either, since we usually have a lot of DB changes as well.
One solution I can think of is to record today's traffic, take a DB snapshot at the end of the day, restore it into staging, run migrations, and then replay. That still has to deal with app changes. Am I missing something, or is this really challenging? I can see how this would work perfectly for infrastructure changes like server software or configuration.
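For what it's worth, the record-then-replay half of that workflow maps onto Gor's file input/output. A rough sketch (hostnames, ports, and file names here are made up; this assumes the gor binary is on both hosts):

```shell
# On a production host: capture today's HTTP traffic to a file.
gor --input-raw :80 --output-file requests.gor

# Later, after restoring the end-of-day DB snapshot into staging
# and running migrations: replay the capture against staging.
gor --input-file requests.gor --output-http "http://staging.example.com"
```

This doesn't solve the URL/parameter drift between releases, though; that part would still need rewriting or filtering of the captured requests.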
On a side note about blue/green deployment: theoretically, reversible DB migrations with parallel deployments of the old and new versions sound good. But how many people can actually make this work? Do most apps really have so few DB changes from release to release? I'd hazard a guess that if we tried to implement such a thing, the cost of the complexity would easily overshadow the actual changes to the app, and would most likely add severe risk of bugs. But people seem to be doing it; again, not sure what I'm missing.
We use it to replicate production traffic to our staging and testing environments at https://reverb.com. It's an incredibly useful tool. We've used it to shake out exceptions from our blue/green deploys as well. For instance, when we migrated from Rails 3 -> 4, Gor was instrumental in shaking out some pretty nasty bugs by replicating prod traffic from the Rails 3 cluster to the new Rails 4 cluster.
This seems like a super useful tool, especially if you want to do blue/green deploys where staging/production end up on the same physical machines and you want to run with artificial load as part of capacity planning/testing.
This is super cool. You know what I'd really really like even more? Store this info somewhere, and let me replay it at a later time or date. Maybe I want to replay a whole day's worth of data at an accelerated rate to see if my servers can handle it. It also opens up the ability to do data forensics to try and reproduce a bug, perhaps.
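Gor does support this to a degree: captures can be written to a file and replayed later, and the documented speed modifier on file input lets you replay faster than real time. A hedged sketch (file names and the staging URL are made up; "200%" doubles the original request rate):

```shell
# Capture now, to a file rather than a live target.
gor --input-raw :80 --output-file requests.gor

# Replay the whole capture later at double speed, e.g. for
# capacity testing against staging.
gor --input-file "requests.gor|200%" --output-http "http://staging.example.com"
```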
As a side note, however, I would recommend that nobody blindly apply the "Tuning" section (at the bottom of the readme) to their system. More specifically, net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse are notorious for causing problems if misused.
net.ipv4.tcp_tw_recycle causes problems with NAT-ed clients.
    tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
        Enable fast recycling of TIME_WAIT sockets. Enabling this option is
        not recommended since this causes problems when working with NAT
        (Network Address Translation).
net.ipv4.tcp_tw_reuse seems fine to use, but literature about its real effects is sparse.
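If you're curious what your system is currently doing, you can inspect both sysctls before changing anything (Linux-only; tcp_tw_recycle may be absent on newer kernels):

```shell
sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle
```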
At the higher end of the performance scale there are commercial options using specialized hardware and commercial software. Building block cost is ~$300k/20Gbps (~400K rps) of replay traffic, but scales out with parallel deployments.
https://en.wikipedia.org/wiki/Gor
https://www.google.com/search?q=gor&biw=1920&bih=947&source=...