
All the Perl that's Practical to Extract and Report


Matts (1087)


I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high performance network software, string matching algorithms, and other cool stuff mostly in Perl and C.

Journal of Matts (1087)

Wednesday May 12, 2004
02:45 PM

Load

[ #18718 ]

Some day I need to write up a "here's something cool we did with AxKit/Perl" about MessageLabs' spam quarantine system (called "Spam Manager"). But suffice it to say for now that we did it with AxKit and Perl, and it is very cool.

However, we're experiencing some very "interesting" load problems with it. We're seeing the load go up on some servers while CPU usage sits at no more than 5%.

Of course, load average isn't tied to CPU usage. But most people see the load go above 1.0 when their CPU is fully utilised. Load is a measure of runnable processes, although that's terribly poorly explained practically everywhere (I did find a good explanation of it but lost the link, so if you don't know what load average really means I can't help you :-).
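One way to see what is behind a high load figure with an idle CPU is to count processes by scheduler state. The sketch below assumes a Linux box with /proc mounted (the post does not say which OS the servers run). On Linux the load average counts tasks in uninterruptible sleep (state D, typically waiting on disk or network I/O) as well as runnable ones (state R), so a pile of D-state processes can push the load up while the CPU sits idle. The function names are made up for illustration.

```python
import glob
from collections import Counter


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    # Format: "<1min> <5min> <15min> <running>/<total> <last_pid>"
    one, five, fifteen, entities, _last_pid = text.split()
    running, total = entities.split("/")
    return {
        "1min": float(one),
        "5min": float(five),
        "15min": float(fifteen),
        "running": int(running),
        "total": int(total),
    }


def state_of(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def count_states(stat_lines):
    """Count processes per state letter (R, S, D, Z, ...)."""
    return Counter(state_of(line) for line in stat_lines)


def snapshot():
    """Read the current load average and per-state process counts."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited between glob() and open()
    return load, count_states(lines)
```

Calling `snapshot()` every few seconds while the load is high and comparing the `"R"` and `"D"` counts gives a rough idea of whether processes are queuing for the CPU or stuck waiting on I/O.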

So this is something to do with the kernel not being able to context switch in processes fast enough. Usually we've managed to pin this down to bad duplex settings on the network interface (half duplex instead of full duplex). Recently, however, we've seen the problem again with the network interface being just fine.
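The duplex check can be scripted so it is quick to rule out on each box. This is a minimal sketch, assuming a Linux kernel that exposes link settings under /sys/class/net/<interface>/duplex and /sys/class/net/<interface>/speed. The function names are hypothetical, and the `root` parameter exists only so the code can be pointed at a test directory.

```python
import os


def link_settings(iface, root="/sys/class/net"):
    """Return {'duplex': ..., 'speed': ...} for iface; None where unreadable."""
    settings = {}
    for attr in ("duplex", "speed"):
        try:
            with open(os.path.join(root, iface, attr)) as f:
                settings[attr] = f.read().strip()
        except OSError:
            # Missing or unreadable attribute, e.g. link down or a
            # virtual interface that has no duplex setting.
            settings[attr] = None
    return settings


def half_duplex_interfaces(root="/sys/class/net"):
    """List the interfaces whose reported duplex is 'half'."""
    return sorted(
        iface
        for iface in os.listdir(root)
        if link_settings(iface, root)["duplex"] == "half"
    )
```

An empty list from `half_duplex_interfaces()` only says that no interface reports half duplex; it does not rule out a mismatch at the switch end of the link.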

Debugging this is practically impossible: it's not repeatable and we can't isolate it. I welcome any tips from anyone here who has experience with this. My next port of call is to look at the SQL Server that the box is connected to and see if that has any relevance, but I don't hold out much hope of finding anything. I'm kinda stuck.

  • Maybe it's worth installing OProfile [sourceforge.net] to try and find out what's happening?

    Never used it myself, but it looks like the right kind of tool...

    -Dom

  • But the only good definition is the technical definition. When the scheduler says, "I have a timeslice to hand out", how many processes are lined up, ready to take that slice (on average)?

    It means nothing more, and nothing less.

    What matters after understanding that is that there is no simple intuitive understanding of what that means. If you have a single CPU-bound process, it always wants a timeslice, and will contribute 1 to your load. That process may have priority 20 and lose to everything else, so