
Something like SAS or DAP would be better suited for large data sets, as R tends to load everything into RAM.

From http://www.gnu.org/software/dap/ : "Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have very many lines and/or very many variables."
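For comparison, R can be coaxed into a similar streaming style by reading from an open connection in chunks instead of loading the whole file. A minimal sketch, assuming a hypothetical big_file.csv with a numeric column named value:

    con <- file("big_file.csv", open = "r")
    header <- strsplit(readLines(con, n = 1), ",")[[1]]   # column names from the first line
    total <- 0
    repeat {
      # read the next 10,000 rows; read.csv errors once the connection is exhausted
      chunk <- tryCatch(
        read.csv(con, header = FALSE, col.names = header, nrows = 10000),
        error = function(e) NULL
      )
      if (is.null(chunk) || nrow(chunk) == 0) break
      total <- total + sum(chunk$value)   # update a running aggregate per chunk
    }
    close(con)

Only one chunk is ever held in memory, at the cost of restricting yourself to aggregations that can be updated incrementally.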




Regarding R's processing power, I haven't found it to be an issue. When building and testing a model, I use a sample of the data, usually fewer than 100,000 observations. I use samples even when working in a tool like SAS Enterprise Miner.
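A minimal sketch of that sampling step, assuming a data frame called full_data and the 100,000-row cap mentioned above:

    set.seed(42)                                      # make the sample reproducible
    n <- min(100000, nrow(full_data))                 # cap at 100,000 observations
    dev_sample <- full_data[sample(nrow(full_data), n), ]
    # build and test the model on dev_sample instead of the full data set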

As for scoring, I usually export the model as PMML and run it natively on the database, which makes for fast execution. PMML export is available in R, RapidMiner, and other packages.
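A rough sketch of the R side of that workflow, assuming the pmml and XML packages are installed (the logistic regression and file name are just for illustration):

    library(pmml)                                     # PMML export for many R model types
    fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)
    XML::saveXML(pmml(fit), file = "scorecard.pmml")
    # scorecard.pmml can then be loaded into a database-side PMML scoring engine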


This is definitely a problem with R, although the biggest problem IMO is that a lot of libraries aren't multicore capable. Fixing the memory problem was just a matter of adding lots of RAM to our workstation; we can't fix the "can't use more than one core at a time" problem as easily.
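One partial workaround is to parallelize at the task level rather than inside the library, using the base parallel package. A sketch, where the four-core count and the per-fold model fit are assumptions (and mclapply's fork-based workers are Unix-only):

    library(parallel)
    fit_fold <- function(rows) {
      # the single-threaded library code runs unchanged inside each forked worker
      glm(am ~ mpg + wt, data = mtcars[rows, ], family = binomial)
    }
    folds <- split(seq_len(nrow(mtcars)), rep(1:4, length.out = nrow(mtcars)))
    fits <- mclapply(folds, fit_fold, mc.cores = 4)   # one forked R process per core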


Yes you can: just run it in many single-core VMs. This is what the vendor of some legacy single-threaded software recommended to me. They were the best in their field, so they never really tried to port it to multicore ;(
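The same idea without full VMs, sketched in R: launch several independent single-core worker processes and farm the work out to them (the worker count and dummy task are illustrative):

    library(parallel)
    cl <- makeCluster(4)           # four separate single-threaded R processes
    results <- parLapply(cl, 1:4, function(i) sum(rnorm(1e6)))   # each worker handles its own slice
    stopCluster(cl)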



