
This is missing an important component of the process. If you don't want to have to reinvent the wheel every time you are asked to do a certain type of analysis, you also have to set up some infrastructure to support your analytic pipeline. That involves understanding databases, writing scripts to automatically harvest data, possibly creating APIs for your data to support flexible analytic views, etc. The more time I spend in data science, the more time I find myself spending on these types of infrastructural tasks. It's great to work for a company that provides engineers that will do all of this for you, but those companies aren't super common.
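To make the "harvest script plus a reusable view" idea concrete, here is a minimal sketch using only Python's standard library. The `metrics` table, the `harvest` and `daily_view` helpers, and `fake_source` are all hypothetical names chosen for illustration; a real setup would pull from an actual API or warehouse and run on a scheduler.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple


def harvest(conn: sqlite3.Connection,
            fetch: Callable[[], Iterable[Tuple[str, float]]],
            day: str) -> int:
    """Pull (name, value) pairs from a source and store them under `day`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "day TEXT, name TEXT, value REAL, PRIMARY KEY (day, name))"
    )
    rows = [(day, name, value) for name, value in fetch()]
    # INSERT OR REPLACE keeps a re-run for the same day from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO metrics VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def daily_view(conn: sqlite3.Connection, name: str) -> List[Tuple[str, float]]:
    """One reusable analytic view: the history of a single metric."""
    cur = conn.execute(
        "SELECT day, value FROM metrics WHERE name = ? ORDER BY day", (name,)
    )
    return cur.fetchall()


def fake_source() -> List[Tuple[str, float]]:
    # Hypothetical stand-in for a real API call or warehouse query.
    return [("signups", 12.0), ("churn", 3.0)]


conn = sqlite3.connect(":memory:")
harvest(conn, fake_source, "2024-01-01")
harvest(conn, fake_source, "2024-01-01")  # safe to re-run for the same day
signups = daily_view(conn, "signups")
```

Once the harvest step is idempotent and the view is a plain function, the next request for the same type of analysis becomes a query rather than a new project.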



BTW, this is what we're hacking on at NStack. We're building an analytics platform with a high-level language that provides an abstraction over infrastructure. The aim is that data teams can productionize code while thinking only about business logic, without requiring an engineering team. So you can write things like:

  nstack start "Schedule { interval : 'Daily' } | DataWarehouse { sql : './request.sql' } | YourPythonClassifier | Postgres { insert_table : 'Results' }"
...which then gets distributed on your cloud provider. You can think of it as a type-safe, distributed cloud bash!
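For anyone wondering what "type-safe" buys you in a pipe like that, here is a rough Python sketch of the general idea. It is not NStack's language or implementation: the `Stage` class and the three stages are hypothetical, and everything runs in one process rather than being distributed. The point is only that mismatched stages are rejected when the pipeline is composed, before any data flows.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    in_type: type
    out_type: type
    fn: Callable[[Any], Any]

    def __or__(self, other: "Stage") -> "Stage":
        # Check that the stages fit together at composition time.
        if self.out_type is not other.in_type:
            raise TypeError(
                f"{self.name} outputs {self.out_type.__name__}, "
                f"but {other.name} expects {other.in_type.__name__}"
            )
        return Stage(
            f"{self.name} | {other.name}",
            self.in_type,
            other.out_type,
            lambda value: other.fn(self.fn(value)),
        )

    def run(self, value: Any) -> Any:
        return self.fn(value)


# Hypothetical stages standing in for a warehouse query, a classifier, and a sink.
load_rows = Stage("LoadRows", str, list, lambda sql: [0.2, 0.9, 0.6])
classify = Stage(
    "Classify", list, list,
    lambda scores: ["pos" if s > 0.5 else "neg" for s in scores],
)
count_positives = Stage(
    "CountPositives", list, int, lambda labels: labels.count("pos")
)

pipeline = load_rows | classify | count_positives
result = pipeline.run("SELECT score FROM requests")
```

Composing `count_positives | classify` raises a `TypeError` immediately, because an `int` output cannot feed a stage that expects a `list`.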

I'm clocking off for bed, but would love to give you a demo if you're interested: leo@nstack.com.



