
All the Perl that's Practical to Extract and Report

use Perl


Journal of ziggy (25)

Wednesday August 10, 2005
09:10 AM

std-dev

[ #26221 ]

Yesterday, I was hacking on a script to extract a series of numeric values from a data set. I wanted to understand the data better than just looking at (min | max | average).

So, after I had a list of values, one per line, I followed my first instinct and loaded that file into Excel. (I should mention that I view using spreadsheets as a symptom of a larger problem, not a part of a working solution.) After a while, I realized that I had a lot of data to analyze and a lot of tests to run, and that this was the fast track to weeks of needless agony and make-work.

Excel did have one benefit -- it helped me understand standard deviation a little better. It's a little annoying, though, that Excel's =STDEV() function, the most obvious function to use for calculating standard deviation, actually computes the standard deviation of a sample (dividing by N-1), not the standard deviation of an entire population (dividing by N). Once I refreshed my memory of the concepts involved, it took a while to figure out why the standard formula wasn't agreeing with Excel's results. Sure enough, the =STDEVP() function did match with better than random precision.
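For reference, the two definitions can be written side by side. The symbols are my own shorthand, not Excel's: x_i are the values, N is how many there are, and \mu and \bar{x} both denote their arithmetic mean (the name differs only by whether the data is treated as the whole population or as a sample).

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\text{(population, as in =STDEVP())}

s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
\qquad\text{(sample, as in =STDEV())}
```

The only difference is the divisor, which is why the two functions disagree slightly on the same column of numbers.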

I took a quick look on CPAN, but didn't find anything that does standard deviation. I know it's there, but I didn't want to download a huge Math library to calculate a simple function. So I wrote a quick and dirty std-dev instead:

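#!/usr/bin/env perl

use strict;
use warnings;
use List::Util qw(sum min max);

# Read one numeric value per line from the named files, or from STDIN.
chomp(my @values = <>);
my $n = @values;
die "no input values\n" unless $n;

my $avg = sum(@values)/$n;
# Population standard deviation: divide by N, like Excel's =STDEVP().
my $std_dev = sqrt(sum(map {($_ - $avg) ** 2} @values) / $n);

print "total   = $n\n";
print "std_dev = $std_dev\n";
print "avg     = $avg\n";
print "min     = ", min(@values), "\n";
print "max     = ", max(@values), "\n";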

The hard part was the single line of code to calculate standard deviation. That was translated verbatim from the definition on the Wikipedia page.
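To make the N versus N-1 distinction concrete, here is a minimal sketch that factors that one line into two subs. The sub names (mean, std_dev_population, std_dev_sample) and the sample data are made up for illustration; only List::Util from the core distribution is used.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean of a non-empty list.
sub mean {
    my @v = @_;
    return sum(@v) / @v;
}

# Population standard deviation: summed squared deviations divided by N.
sub std_dev_population {
    my @v   = @_;
    my $avg = mean(@v);
    return sqrt(sum(map { ($_ - $avg) ** 2 } @v) / @v);
}

# Sample standard deviation: divide by N - 1 (needs at least two values).
sub std_dev_sample {
    my @v   = @_;
    my $avg = mean(@v);
    return sqrt(sum(map { ($_ - $avg) ** 2 } @v) / (@v - 1));
}

# Illustrative data: mean is 5, squared deviations sum to 32.
my @values = (2, 4, 4, 4, 5, 5, 7, 9);
printf "population = %.4f\n", std_dev_population(@values);   # sqrt(32/8)
printf "sample     = %.4f\n", std_dev_sample(@values);       # sqrt(32/7)
```

The first number should agree with =STDEVP() and the second with =STDEV() on the same eight values.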

This little script, along with nth, reduced a bunch of time-consuming Excel drudgery into a nearly autonomic piece of analysis. ;-)

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Statistics::Descriptive [cpan.org] calculates the standard deviation, among other things. It is the sample standard deviation, however (same as Excel and my pocket calculator). Most people prefer the sample formula because the "entire population" is usually considered to be "infinite" (also, if you have a reasonably large sample, it doesn't really matter whether you divide by N or N-1).
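As a rough check on that last point, assume both figures are computed from the same N values, with s the sample result (divide by N-1) and \sigma the population result (divide by N). Their ratio then depends only on N:

```latex
\frac{s}{\sigma} = \sqrt{\frac{N}{N-1}},
\qquad
N = 10:\ \sqrt{10/9} \approx 1.054,
\qquad
N = 100:\ \sqrt{100/99} \approx 1.005
```

So the two differ by about 5% at ten values and about 0.5% at a hundred.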