
I guess R is a bit more DIY than these frameworks, but it has a very large collection of tools. I've found libraries for everything from CART (classification and regression trees), to SVM, to HMM learning, to clustering, to EM. R with libraries from CRAN is my go-to tool for statistical learning.



I am amazed R didn't make it to the list.


R is great with hundreds of thousands of records but fails miserably with millions. This seems especially true for clustering and regression. With the explosion of data collection, tools have to be able to take in and clean millions of records quickly and easily.

Does anybody have experience with Orange or RapidMiner?


Something like SAS or DAP would be better suited for large data sets, as R tends to load everything into RAM.

From http://www.gnu.org/software/dap/ : "Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have very many lines and/or very many variables."
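
To make the line-at-a-time idea concrete, here is a minimal Python sketch. It is not how Dap itself is implemented; it only illustrates the general technique of computing a statistic in a single pass without holding the file in memory. The function name `streaming_mean_std`, the column name `value`, and the tiny in-memory data set are all made up for illustration. The running variance update is Welford's algorithm.

```python
import csv
import io
import math

def streaming_mean_std(lines, column):
    """Return (count, mean, sample std) of one CSV column in a single pass.

    `lines` is any iterable of text lines (an open file works), so only
    one row is held in memory at a time.
    """
    reader = csv.DictReader(lines)
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for row in reader:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, std

# Hypothetical data standing in for a large file on disk.
data = io.StringIO("id,value\n1,2.0\n2,4.0\n3,6.0\n")
n, mean, std = streaming_mean_std(data, "value")
print(n, mean, std)
```

With a real file you would pass `open("big.csv", newline="")` instead of the `StringIO` object; memory use stays flat no matter how many rows there are.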


Regarding R's processing power, I haven't found it to be an issue. When building a model and testing, I use a sample of the data which is usually less than 100,000 observations. I use samples even when using a tool like SAS Enterprise Miner.
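
For anyone wondering how to draw such a sample when the full data set does not fit in memory, one common approach is reservoir sampling. The commenter does not say how the sample is drawn, so this is only a sketch under that assumption; `reservoir_sample` and its parameters are hypothetical names, and `range(1_000_000)` stands in for a stream of records.

```python
import random

def reservoir_sample(records, k, seed=None):
    """Draw a uniform random sample of up to k items from an iterable
    of unknown length, keeping at most k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(rec)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample

# Stand-in for a large stream of observations.
picked = reservoir_sample(range(1_000_000), 1000, seed=42)
print(len(picked))
```

The same function works on an open file or a database cursor, since it only needs to iterate over the records once.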

As far as scoring, I usually export using PMML and run it natively on the database. Makes for fast execution. PMML is available in R, RapidMiner, and other packages.


This is definitely a problem with R, although the biggest problem IMO is that a lot of libraries aren't multicore capable. Fixing the memory problem was just a matter of adding lots of RAM to our workstation; we can't fix the "can't use more than one core at a time" problem as easily.


Yes you can: just run it in many single-core VMs. This is what a vendor of legacy single-threaded software recommended to me. They were the best in their field, so they never really tried to port it to multicore ;(



