Yesterday, I was hacking on a script to extract a series of numeric values from a data set. I wanted to understand the data better than just looking at (min | max | average).
So, after I had a list of values, one per line, I followed my first instinct and loaded that file into Excel. (I should mention that I view using spreadsheets as a symptom of a larger problem, not a part of a working solution.) After a while, I realized that I had a lot of data to analyze and a lot of tests to run, and that this was the fast track to weeks of needless agony and make-work.
Excel did have one benefit -- it helped me understand standard deviation a little better. It's a little annoying, though, that Excel's =STDEV() function, the most obvious function to use for calculating standard deviation, actually computes the standard deviation of a sample (dividing by n - 1), not the standard deviation of an entire population (dividing by n). Even after I had refreshed my memory of the concepts involved, it took a while to figure out why the standard formula wasn't agreeing with Excel's results. Sure enough, the =STDEVP() function did match with better than random precision.
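For reference, the two definitions can be written side by side. The symbols here (x_i for the values, N for their count, \mu for the population mean, \bar{x} for the sample mean) are notation chosen for this sketch, not anything Excel exposes:

```latex
% Population standard deviation (what =STDEVP() computes)
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}

% Sample standard deviation (what =STDEV() computes)
s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2}
```

The only difference is the divisor, so the two results drift apart for small N and converge as N grows.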
I took a quick look on CPAN, but didn't find anything that does standard deviation. I know it's there, but I didn't want to download a huge Math library to calculate a simple function. So I wrote a quick and dirty std-dev instead:
#!/usr/bin/env perl
use strict;
use warnings;   # replaces -w, which many systems do not pass through env
use List::Util qw(sum min max);
chomp(my @values = <>);
my $n = @values;
my $avg = sum(@values)/$n;
# Population standard deviation: root of the mean squared deviation from the average.
my $std_dev = sqrt(sum(map {($_ - $avg) ** 2} @values) / $n);
print "total = $n\n";
print "std_dev = $std_dev\n";
print "avg = $avg\n";
print "min = ", min(@values), "\n";
print "max = ", max(@values), "\n";
The hard part was the single line of code to calculate standard deviation. That was translated verbatim from the definition on the Wikipedia page.
This little script, along with nth, reduced a bunch of time-consuming Excel drudgery into a nearly autonomic piece of analysis.
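If the goal is to reproduce both Excel results from the same pass over the data, a minimal sketch might look like the following. The helper name std_devs and the sample values are made up for illustration; only the divisor differs between the two results.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: returns (population, sample) standard deviation.
sub std_devs {
    my @values = @_;
    my $n = @values;
    die "need at least two values\n" if $n < 2;

    my $avg    = sum(@values) / $n;
    my $sum_sq = sum(map { ($_ - $avg) ** 2 } @values);

    # Divide by n for a whole population (like =STDEVP()),
    # by n - 1 for a sample (like =STDEV()).
    return (sqrt($sum_sq / $n), sqrt($sum_sq / ($n - 1)));
}

# Illustrative data: mean is 5, sum of squared deviations is 32.
my ($pop, $sample) = std_devs(2, 4, 4, 4, 5, 5, 7, 9);
printf "population = %.4f\n", $pop;      # prints "population = 2.0000"
printf "sample     = %.4f\n", $sample;   # prints "sample     = 2.1381"
```

With only eight values the two figures already differ in the first decimal place, which is the kind of mismatch that shows up when comparing a hand-rolled formula against =STDEV().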