Hacker News
Wizard for Mac - a tool for statistics and data analysis (evanmiller.org)
92 points by harper on Sept 26, 2012 | 34 comments



Sorry for the poor audio quality in the video. Some random thoughts that may be of interest:

1. I've been in grad school for economics the last four years. The program basically embodies all the things I've learned about econometrics, i.e. applied statistics. Wizard does about 1% of the things that R does, but it does them very, very well.

2. Wizard is fast. All the tight loops are written in C. Wizard will use all the cores on your Mac as well as the CPU's vector capabilities. This was achieved with a combination of Grand Central Dispatch and OpenCL. I highly recommend both technologies. For my research I am using Wizard on data sets with ~3 million rows, and it works like a champ.

3. The main weaknesses of the program are that it does not deal with time-series data particularly well, and you can't manually enter data. These shortcomings will be addressed in future releases.

4. The program is closed-source. I might make some of the libraries open-source. Or I might not. I haven't decided.

5. The non-Pro version will be released sometime next week.

6. I have absolutely no regrets about not writing Wizard using JavaScript and HTML5.

7. [Edit] You can read a review of the pre-release version of the program here: http://www.macstats.org/reviews/wizard.html


This is very cool. The tutorial (the Wizard Wizard, if you will) was very well done and I feel like the program makes a lot of basic stats analysis very easy and visual. Nice work.

In your opinion how relevant would this be to survey analysis for psych and social science research? Does it cover most of the analysis required or are there shortcomings that, say, SPSS would better fill? I'm not a stats guy, just a programmer, but it would be great to work it into our recommendations for tools for researchers on Mac (shameless plug - http://www.socialsci.com).


Coming from a social science perspective, this tool seems to cover a lot of bases. I haven’t used it for actual research (obviously), so it’s easy for me to overlook shortcomings, but a lot of the important stuff seems to be there. What I didn’t really see (but maybe I missed it) were good tools to edit and clean up data (which often is a lot of tedious work and at least as important as actually calculating the statistical tests) as well as factor and cluster analysis (especially factor analysis is something that is used quite frequently when looking at survey data) – but I think you can do most of the stuff social scientists do with this tool.

I do not think it is a replacement for other tools (SPSS, R) but I do think it’s excellent at what it does. I’m quite impressed, actually. Firing up SPSS is just no fun (also crazy expensive), so I will definitely look into getting this. (R is fun, but for some reason I’m very slow with it.) It’s definitely very accessible.


I took a Stats course in university. Basically, this one: http://www.stat.sfu.ca/content/dam/sfu/stat/courses/outlines...

Can you recommend a way for me and others in the same position to brush up on our stats?


I have always wondered whether there are grad students or scientists frustrated enough by the statistical tools they have and knowledgeable enough about programming to make something like this. (Only the first part applies to myself.) Now I know.

Great job! This looks really great. I would love to see factor analysis added. That’s one bread-and-butter thing I often need which I couldn’t find. Maybe also post-hoc tests if ANOVA is significant, but maybe that’s outside the scope of this app. (Oh, and another thing I’m missing in the summary view is the number of cases. Maybe I’m just blind, but when I, for example, show a t-test, there doesn’t seem to be a way to display how many cases are included in each category.)


Love it.

Questions:

- how many rows of data can it handle (limited to Excel's 65k?)

- can I manually (and permanently) correct any detected data anomalies?

- can I manually (and permanently) add columns to the data?

- can I export numerical values of graphs? (formats?)

- can I export the graphs? (formats?)

- can I export reports (data sub set, models, e.g. graphics for strong republicans in all regions)

- can I create values based on functions of data elements (all fields containing '@gmail')

Thanks.

PS agree with the > $50 issue, I'd buy something for $50 just like that, but if it's over, I start thinking about it.


>>> For my research I am using Wizard on data sets with ~3 million rows, and it works like a champ.


Hmmmm, suddenly the $79 sounds a lot more reasonable... (providing data is easily exported).

I have no need for SPSS etc. (beyond my pay grade and comprehension) but I've been frustrated with the limitations of Excel (love pivot tables) and I have no $$$ for http://www.tableausoftware.com/

Note: my main interest is actually CLEANSING DATABASES (finding outliers/anomalies and editing the data set, hence my questions re: changing data)

Evan has created another lovely app by the way (http://magicmaps.evanmiller.org/) which makes me more confident that it will all work properly.


Regarding point 2: do you compile to C, or does it run as an interpreter? I have written something similar too, but I compile to C++, and yes, the speed is amazing and memory consumption is minimal, because you only keep one record in memory (this will not hold if you use more cores).

If you are interested in a solution for entering data, let me know. My solution is open source.
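
To illustrate the "only keep one record in memory" idea, here is a minimal sketch in C of a one-pass mean and variance using Welford's update. It is not taken from either program discussed here; the `running_stats` type and function names are hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* Running summary of one numeric column.
 * Holds constant-size state; the rows themselves are never stored.
 * (Hypothetical names, for illustration only.) */
typedef struct {
    size_t n;     /* number of values seen so far */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
} running_stats;

void running_stats_init(running_stats *s) {
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one value into the summary (Welford's online update). */
void running_stats_push(running_stats *s, double x) {
    s->n += 1;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance (n - 1 denominator); 0.0 when fewer than two values. */
double running_stats_variance(const running_stats *s) {
    return s->n > 1 ? s->m2 / (double)(s->n - 1) : 0.0;
}
```

A reader loop would call `running_stats_push` once per record and discard the record afterwards. With several cores, each worker would presumably need its own `running_stats` plus a merge step at the end, which is one reason the single-record property stops holding.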


The program uses the Objective-C run-time, but the core routines are written in C.


Very impressive video. The trial download is broken ATM.


For the love of God... please put your video on Dropbox or S3. I don't think your site can handle it. At least I'm having issues with it.


There are nicer ways to make recommendations/requests than "for the love of god...do X."


You are correct, I just got a bit excited about the product and really wanted to watch the video. I'm sorry if I came off a bit too aggressive.


Sorry about that. The HN traffic was unexpected. I will upload to Dropbox ASAP.


For future reference, uploading a video to Dropbox when you are on the HN front page is a bad idea. My account was suspended almost immediately. Switching to YouTube...


Or upload it to YouTube and split it up, so it's scrollable (I wasn't able to scroll in the 30+ minute video just now).


502 Bad Gateway


I just watched the video, nice work! I regularly deal with files that have 10,000,000+ records. How well does Wizard scale with large data sets?


Thanks. I've been testing on 3-4M records on a quad-core laptop, and it works great. I would guess that 10M records will be doable on an 8-12 core Mac Pro, but I don't have access to one to test this particular hypothesis.


Seeing pricing >$50 for software products always makes me happy as a bootstrapper.

It shows you trust the value your customers will put in your product.


$80 for something that is "Best for casual (...) users" is not really the price I'd pay for a program I'd use from time to time to casually crunch numbers in.

And not like there's a free alternative out there that just blows Wizard out of the water, right?


Thanks. I have updated the copy to better reflect the distinction between the two versions.


There is? A link would make me happy.



Looks good. Thanks! I already knew about https://gephi.org, but it's always nice to have more tools in the toolbox. :)


502 Bad Gateway. Links down.


Sorry. It's back up.


Same for me.


Me too


Just tried it out. Looks really good. Would be a perfect plug-in for Sequel Pro for example. What other products are there in the same space (easy-to-use statistics tools for Mac)?


Getting a 502 error when I visit


Why pie charts dude?


video!!! ;)



