Comment: Re: Profiling your website while live on ... (Score 1) on 2010.07.22 2:54
Attached to: Profiling your website while live on a production cluster
This is great work. I'm eager to try it on our systems. One of the hardest things for me is predicting system behaviour when a change is rolled out to the entire cluster instead of just, say, one box.
On one of our largest systems, when there's a potentially critical change, we roll it out on one box first, then on 2 or 3, and if we don't see any dramatic changes, we deploy to the full cluster. As I said, though, sometimes this is not enough.
The only strategies I can think of are:
1) randomly enabling the new feature/change for a sample of the users (either A/B testing or rand(x)>y)
2) setting up an independent parallel staging cluster and replicating a near-production load on it. Not easy, and it requires lots of resources.
Do you have any war stories about that?
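
For what it's worth, here is roughly what I have in mind for option 1, as a minimal Python sketch. Instead of a plain rand(x)>y, it hashes the user id so that the same user always lands on the same side of the switch. The function name in_rollout, the feature name "new-search" and the 10000-bucket granularity are all made up for illustration; nothing here comes from the article.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user is inside the rollout sample.

    percent is the share of users (0 to 100) who should get the feature.
    Hypothetical helper: names and bucket count are assumptions.
    """
    # Hash feature + user id so every feature gets its own independent sample.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a stable bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10000
    # Buckets below the threshold are in; raising percent only adds users.
    return bucket < percent * 100


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10000)]
    enabled = sum(in_rollout(u, "new-search", 10) for u in users)
    print(f"{enabled} of {len(users)} users would get the new feature")
```

The upside over a per-request rand() is that a user doesn't flip between old and new behaviour on every page load, and bumping the percentage keeps everyone who was already in the sample.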