How to calculate standard deviation
When I started this blog last month, I thought “Standard Deviation” was a snappy title. Of course, I also knew about standard deviation as a statistical tool, but I didn’t expect that this overlap would cause Google search to drive 50+ visitors a month here looking for implementations of the standard deviation formula.
So as a “public service”, here is some code to calculate the standard deviation in Java and Ruby.
(disclaimer: no warranties as to correctness, particularly to the nth decimal place; don’t use this to run your homemade nuclear reactor or air traffic control system, blah blah, etc, etc :-))
The algorithm I’ll be using is “borrowed” from Wikipedia’s entry on Algorithms for calculating variance. Specifically, I’ll be using a variant of algorithm II, which is sourced from Knuth, except that we’ll calculate the standard deviation of the population rather than of a sample.
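Written out, the per-value update performed by both the Java and the Ruby code in this post is the following (M is the running mean, S the running sum of squared deviations, and k counts the values seen so far; these symbol names are my own, not the article’s):

```latex
M_0 = 0, \qquad S_0 = 0
\delta_k = x_k - M_{k-1}
M_k = M_{k-1} + \frac{\delta_k}{k}
S_k = S_{k-1} + \delta_k \,(x_k - M_k)
\sigma^2_{\text{population}} = \frac{S_n}{n}, \qquad s^2_{\text{sample}} = \frac{S_n}{n-1}
```

Note that δ_k uses the previous mean while the second factor in the S update uses the new one, which is why the `delta * (x - mean)` line comes after `mean` has been updated.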
As you should probably know, standard deviation is defined as the square root of the variance. If you didn’t know this, maybe you should go read about standard deviation first.
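For reference, here is the textbook definition for a population of n values with mean μ, followed by the example array used in this post worked through by hand:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}

% example population {1, 3, 24, 17, 12, 6, 14}, n = 7
\mu = \frac{77}{7} = 11
\sum_{i=1}^{7} (x_i - \mu)^2 = 100 + 64 + 169 + 36 + 1 + 25 + 9 = 404
\sigma^2 = \frac{404}{7} \approx 57.714, \qquad \sigma \approx 7.597
```

For a sample, divide by n − 1 instead of n.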
Here it is in Java:

/**
 * Calculates the variance of a population in a single pass.
 *
 * @param population an array, the population
 * @return the variance
 */
public static double variance(double[] population) {
    long n = 0;
    double mean = 0.0;
    double s = 0.0; // running sum of squared deviations from the mean

    for (double x : population) {
        n++;
        double delta = x - mean;   // distance from the old mean
        mean += delta / n;         // update the running mean
        s += delta * (x - mean);   // uses the updated mean
    }
    // if you want the variance (and hence the standard deviation)
    // of a sample, change this to (s / (n - 1))
    return (s / n);
}

/**
 * @param population an array, the population
 * @return the standard deviation
 */
public static double standard_deviation(double[] population) {
    return Math.sqrt(variance(population));
}
example usage:

double[] arr = { 1, 3, 24, 17, 12, 6, 14 };
System.out.printf("%f%n", standard_deviation(arr));
// prints 7.596992
And the same thing in Ruby:

# calculate the variance of a population
# accepts: an array, the population
# returns: the variance
def variance(population)
  n = 0
  mean = 0.0
  s = 0.0
  population.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    s += delta * (x - mean)
  end
  # if you want the variance (and hence the standard deviation)
  # of a sample, change this to "s / (n - 1)"
  s / n
end

# calculate the standard deviation of a population
# accepts: an array, the population
# returns: the standard deviation
def standard_deviation(population)
  Math.sqrt(variance(population))
end
example usage:

puts standard_deviation([1, 3, 24, 17, 12, 6, 14])
# prints 7.59699188589047 (the trailing digits may differ between Ruby versions)
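The comment in `variance` mentions switching to s / (n - 1) for a sample. Here is a minimal sketch of that switch as a keyword argument. `std_dev` and its `sample:` parameter are hypothetical names of my own, not part of the code above:

```ruby
# Hypothetical helper (not from the post): the same single-pass update,
# with a keyword argument choosing population (n) or sample (n - 1) divisor.
def std_dev(values, sample: false)
  n = 0
  mean = 0.0
  s = 0.0 # running sum of squared deviations from the mean
  values.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    s += delta * (x - mean)
  end
  divisor = sample ? n - 1 : n
  raise ArgumentError, "not enough values" if divisor <= 0
  Math.sqrt(s / divisor)
end

data = [1, 3, 24, 17, 12, 6, 14]
puts std_dev(data)               # population: about 7.597
puts std_dev(data, sample: true) # sample: about 8.206
```

Raising an ArgumentError when there are too few values is a design choice of this sketch; the functions in the post do no such check.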
If you found this at all useful, (or have spotted a bug), please leave a comment to that effect…
January 28th, 2007 at 5:33 am
I have copied your Java code for use in my performance testing client. I hope this works for long ‘population’ also.
February 22nd, 2007 at 3:04 am
Thanks for the code. I could’ve figured it out but you saved me some time!
March 5th, 2007 at 3:03 am
Thanks … this was very helpful. I tried it in Ruby and it worked for my simple tests.
It looks like your code is a variant of Algorithm III rather than Algorithm II (at least in the current post http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance)
I also implemented the other Algorithm and glad to send it along if interested.
April 11th, 2007 at 7:41 am
Thanks, this was exactly what I was looking for when I googled “calculate std”, and in Ruby as well, so I didn’t even have to rewrite it.
June 1st, 2007 at 1:03 pm
my search was “ruby calculate standard deviation” and boom! this page popped up. Thanks a bunch for putting this up. You’ve saved me a bunch of work!
July 31st, 2007 at 4:20 pm
As one of those 50+ visitors per month, thank you Warren.
November 22nd, 2007 at 6:15 pm
You’re the man! Thank you very much.
February 28th, 2008 at 12:45 am
i found it very useful, warren u are a life saver, thank u
March 12th, 2008 at 4:23 pm
Thank you, I found it useful to understand. I translated it into PL/SQL.
June 14th, 2008 at 2:33 pm
Perfect google result.
Thanks
July 3rd, 2008 at 10:38 pm
Thanx, clean and clear!!
July 31st, 2008 at 10:47 am
Thanks for the code. I added:
values = []
$<.each do |l|
  values << l.to_f
end
puts ["count", "mean", "stddev"].join("\t")
puts [values.size(), mean(values), standard_deviation(values)].join("\t")
So I can pipe in a list of values from the command line.
$ du -s * | cut -f1 | stddev
count mean stddev
9 2048.44444444444 5625.49061783575
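The snippet above calls a `mean` helper that the post itself never defines. Here is a minimal self-contained sketch of the pieces it needs. `mean` (built on `Array#sum`, Ruby 2.4+) and `summary_row` are hypothetical helpers of my own, and `standard_deviation` repeats the post’s single-pass update:

```ruby
# Hypothetical helper: arithmetic mean of an array of numbers.
def mean(values)
  values.sum(0.0) / values.size
end

# Population standard deviation, same single-pass update as the post.
def standard_deviation(values)
  n = 0
  m = 0.0
  s = 0.0
  values.each do |x|
    n += 1
    delta = x - m
    m += delta / n
    s += delta * (x - m)
  end
  Math.sqrt(s / n)
end

# Hypothetical helper: one tab-separated line of count, mean and stddev.
def summary_row(values)
  [values.size, mean(values), standard_deviation(values)].join("\t")
end

puts ["count", "mean", "stddev"].join("\t")
puts summary_row([1, 3, 24, 17, 12, 6, 14])
```

To use it as a filter, collect `values` with the `$<` loop from the comment and print `summary_row(values)`.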
August 31st, 2008 at 8:46 pm
Are you sure about this implementation?
It looks to me like the calculation of the mean would be sensitive to the order of the data
try
{1,2,3} vs {3,2,1}
x m=0,s=0
1 d=1,m=1,s=1
2 d=1,m=1.5,s=1.5
3 d=1.5,m=2,s=3
sd=1
x m=0,s=0
3 d=3,m=3,s=0
2 d=-1,m=2.5,s=1
1 d=-2,m=1.5,s=2
sd=0.666
September 24th, 2008 at 6:30 pm
Cheers, mate! This saved me some time.
@Roger: code works fine here.
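Roger’s concern can be checked by recording the running values for both orders. Here is a minimal sketch using the same update as the code in the post. `welford_trace` is a hypothetical helper written just for this check:

```ruby
# Hypothetical helper: runs the post's single-pass update and records
# [x, delta, mean, s] after each value; returns the rows and the
# population standard deviation.
def welford_trace(values)
  n = 0
  mean = 0.0
  s = 0.0
  rows = []
  values.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    s += delta * (x - mean)
    rows << [x, delta, mean, s]
  end
  [rows, Math.sqrt(s / n)]
end

[[1, 2, 3], [3, 2, 1]].each do |order|
  rows, sd = welford_trace(order)
  puts "order #{order.inspect}"
  rows.each { |x, d, m, s| puts "  x=#{x} d=#{d} m=#{m} s=#{s}" }
  puts "  sd=#{sd}"
end
```

Worked by hand, both orders give s = 0, 0.5, 2 after each step and end with a mean of 2 and a standard deviation of sqrt(2/3) ≈ 0.816. The differences in the comment therefore come from slips in the hand trace, not from the algorithm, though for other data the last few digits can differ with order because of floating-point rounding.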