
Journal of autarch (914)

Saturday October 11, 2008
11:01 AM

You don't need to scale

[ #37642 ]

Programmers like to talk about scaling and performance. They talk about how they made things faster, how some app somewhere is hosted on some large number of machines, how they can parallelize some task, and so on. They particularly like to talk about techniques used by monster sites like Yahoo, Twitter, Flickr, etc. Things like federation, sharding, and so on come up regularly, along with talk of MogileFS, memcached, and job queues.

This is a lot like gun collectors talking about the relative penetration and stopping power of their guns. It's fun for them, and there's some dick-wagging involved, but it doesn't come into practice all that much.

Most programmers are working on projects where scaling and speed just aren't all that important. It's probably a webapp with a database backend, and they're never going to hit the point where any "standard" component becomes an insoluble bottleneck. As long as the app responds "fast enough", it's fine. You'll never need to handle thousands of requests per minute.

The thing that developers usually like to finger as the scaling problem is the database, but fixing this is simple.

If the database is too slow, you throw some more hardware at it. Do some profiling and pick a combination of more CPU cores, more memory, and faster disks. Until you have to have more than 8 CPUs, 16GB RAM, and a RAID5 (6? 10?) array of 15,000 RPM disks, your only database scaling decision will be "what new system should I move my DBMS to". If you have enough money, you can just buy that thing up front.

Even before you get to the hardware limit, you can do intelligent things like profiling and caching the results of just a few queries and often get a massive win.
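To make the caching point concrete, here is a minimal sketch in Python of caching the result of one expensive query for a short time. Everything named here is an assumption for illustration: `cached`, `top_entries`, and the 60-second lifetime are hypothetical, and the function body stands in for a real SQL query. A real app would more likely put this in memcached than in a per-process dict.

```python
import time
from functools import wraps


def cached(ttl_seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for ttl_seconds."""
    def decorator(func):
        store = {}  # maps args -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh, skip the expensive call
            value = func(*args)
            store[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator


calls = {"count": 0}  # counts how often the "database" is really hit


@cached(ttl_seconds=60)
def top_entries(category):
    # Hypothetical stand-in for a slow database query.
    calls["count"] += 1
    return [f"{category}-entry-{i}" for i in range(3)]


top_entries("vegan")
top_entries("vegan")  # served from the cache
print(calls["count"])  # the underlying query ran once
```

The trade-off is staleness: results can be up to `ttl_seconds` old, which is usually fine for the handful of queries that profiling shows are hot.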

If your app is using too much CPU on one machine, you just throw some more app servers at it and use some sort of simple load balancing system. Only the most short-sighted or clueless developers build apps that can't scale beyond a single app server (I'm looking at you, you know who).
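"Simple load balancing" really can be simple. As a rough sketch in Python, round-robin just hands each request to the next app server in a fixed rotation. The class name and the backend addresses below are made up for illustration, and in practice you would use an existing proxy or balancer rather than writing this yourself.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(backends)

    def next_backend(self):
        return next(self._rotation)


# Hypothetical app server addresses.
balancer = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
picks = [balancer.next_backend() for _ in range(4)]
print(picks)
```

This only works if any app server can handle any request, which is why keeping per-user state inside a single app process is what stops an app from scaling past one machine.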

All three of these strategies are well-known and quite simple, and thus are no fun, because they earn no bragging rights. However, most apps will never need more than this. A simple combination of hardware upgrades, simple horizontal app server scaling, and profiling and caching is enough.

This comes back to people fretting about the cost of using things like DateTime or Moose.

I'll be the first to admit that DateTime is the slowest date module on CPAN. It's also the most useful and correct. Unless you're making thousands of objects with it in a single request, please stop telling me it's slow. If you are making thousands of objects, patches are welcome!

But really, outside your delusions of application grandeur, does it really matter? Are you really going to be getting millions of requests per day? Or is it more like a few thousand?
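It helps to turn those daily numbers into requests per second. The following is back-of-the-envelope arithmetic in Python; the peak factor of 5 is purely an assumption about how bursty traffic is, not a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day):
    """Average requests per second if traffic were spread evenly."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day, peak_factor=5):
    """Rough peak estimate; peak_factor is an assumed burstiness multiplier."""
    return average_rps(requests_per_day) * peak_factor


for per_day in (5_000, 1_000_000):
    print(per_day, round(average_rps(per_day), 2), round(peak_rps(per_day), 2))
```

A few thousand requests per day works out to well under one request per second on average. Even a million per day averages only about a dozen per second.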

There are a whole lot of sites and webapps that only need to support a couple hundred or thousand users. You're probably working on one of them ;)

Cross-posted from House Absolute(ly Pointless) - permalink .

  • ...I hear often when I am telling people that is not the most important thing, as if those two sentences contradicted each other.

    IMHO these people usually come from some C/C++ background or have been working on code where someone (maybe they themselves) wrote an algorithm that was O(2^n) instead of O(n^2) or even O(n).

    In the distant past I too once made the mistake of writing an O(2^n) algorithm in LISP that blew up somewhere around n=3 or so. I had to go back and fix it. Since then I have been thinking more about su

    --
    • The disease is called 'Premature optimization' and is the root of all evil according to Donald Knuth.

      Regarding indexes, you can also have too many. I once had a task to find out why a page in RT took 6 seconds to load.

      It turned out there were two almost identical indexes, one with two columns, the other with three, an extra column added at the end.

      Removing one of the indexes cut the load time of the page to approximately ½ a second. Now I don't know exactly why that was a problem for the DBS, but