Statistics for Software (paypal-engineering.com)
199 points by mhashemi on April 12, 2016 | 14 comments



A student of mine just finished building a benchmarking tool for applications [0]. Among other things, it warns if your sample size is too small. Here is an example in which he compares GHC performance over the past few years [1].

[0] https://github.com/parttimenerd/temci [1] https://uqudy.serpens.uberspace.de/blog/2016/02/08/ghc-perfo...
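
For the curious, here is a minimal sketch of one way such a sample-size warning can work: flag a benchmark when the ~95% confidence interval on the mean run time is wide relative to the mean. This is only an illustration of the idea; the function name and threshold are made up, and temci's actual check may well differ.

    # Illustrative only -- not temci's actual implementation.
    # Warn when the ~95% CI on the mean run time is wider than
    # rel_width * mean, i.e. the sample is too noisy or too small.
    import math
    import statistics

    def sample_size_warning(times, rel_width=0.05, z=1.96):
        n = len(times)
        if n < 2:
            return "need at least 2 measurements"
        mean = statistics.mean(times)
        sem = statistics.stdev(times) / math.sqrt(n)  # std. error of the mean
        half_width = z * sem                          # ~95% CI half-width
        if half_width > rel_width * mean:
            return ("only %d runs: CI is +/-%.4f around mean %.4f, "
                    "collect more runs" % (n, half_width, mean))
        return None

    print(sample_size_warning([1.02, 0.98, 1.10, 0.95]))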


"When the software industry gets to a point where it leverages this analysis as much as the hardware industry, the technology world will undoubtedly have become a cleaner place."

Hear, hear! Quality control is usually one of the _primary_ drivers in new hardware development. With software, I find it's often tacked on at the end.


I am not sure that is coming anytime soon. Generally, you can update software products far more cheaply than hardware products. Also, people get quite a bit of utility out of buggy products; they will jump on a beta if they think it is useful.

Given that, there is an incentive to release half-finished, buggy software with less quality control. This will of course vary by use case; no one is getting on a plane running software with low quality control.

I would guess that we can look at the attention paid to quality control as a function of updatability and risk.


Exactly this. Reminds me of this quote:

"If you are not embarrassed by the first version of your product, you’ve launched too late."

- Reid Hoffman


Yes, the ability to quickly update software covers a multitude of sins.


Slightly off-topic, but it's surprising to me that this blog is kept up to date with their latest brand, yet large portions of Paypal.com still haven't been updated to match.


Well, this blog post will be seen by a relatively small number of people, and it's just that, a blog. Their main site is used by a vastly larger number of people, including our grandmas and possibly people with different accessibility requirements, making it trickier to update. This isn't an excuse (I consider a half-updated site worse than a slightly bad full update), but it does make it less surprising.


OP writes really well. Found myself reading the README of his Python web framework (which I'll never use) just because of the clarity, style and pedagogical approach.

Hope there's more on the way.


Whoa! With praise like this, how couldn't there be more? Thank you!


Oh God. I need to read this. Great post.


I started by ditching spreadsheets and forcing myself to use R. I also signed up for datacamp.com.

I never liked spreadsheets or people that like spreadsheets.


I don't really like the idea of throwing data away, because it ultimately gives an incomplete view of the system. But it's an easy solution to a hard problem!


Okay. Not really sure why I got downvoted so much. Why am I wrong?


Sampling is fundamental to so much of practical statistics. It's more or less proven and accepted. In real studies, we "throw data away" by just not collecting it in the first place. As long as you do it right, you still get a reliable answer.

But if you've already got it all and it all fits in memory, by all means, hold on to it!
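
To make that concrete: reservoir sampling is a textbook way to keep a fixed-size uniform random sample of a stream without ever storing the whole thing. A minimal sketch (this is standard Algorithm R, not anything specific from the article):

    # Reservoir sampling (Algorithm R): a uniform random sample of
    # k items from a stream of unknown length, using O(k) memory.
    import random

    def reservoir_sample(stream, k):
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)
            else:
                # Keep item i with probability k / (i + 1); this leaves
                # every item seen so far equally likely to be sampled.
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = item
        return reservoir

    print(reservoir_sample(range(1000000), 5))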



