
While Spark is not intended for ETL per se, when I need to copy data from S3 to HDFS I just use sc.textFile and sc.saveAsTextFile; in most of my use cases it's pretty fast.
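The copy described above can be sketched roughly like this, assuming a job submitted via spark-submit with S3 credentials already in the Hadoop configuration; the bucket name and paths are placeholders, not real ones:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object S3ToHdfsCopy {
  def main(args: Array[String]): Unit = {
    // App name is a placeholder; master/credentials come from spark-submit
    // and the cluster's Hadoop configuration.
    val sc = new SparkContext(new SparkConf().setAppName("s3-to-hdfs-copy"))

    // Read every object under the S3 prefix as an RDD of text lines.
    val lines = sc.textFile("s3a://my-bucket/input/")

    // Write the lines back out to HDFS; Spark parallelizes the copy
    // across executors, one output part-file per partition.
    lines.saveAsTextFile("hdfs:///data/input-copy/")

    sc.stop()
  }
}
```

Note this re-encodes the data as lines of text rather than copying bytes, so for a byte-exact bulk copy something like hadoop distcp would be the more usual tool.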

But Spark is mostly a computation engine that replaces MapReduce (plus a standalone cluster-management option), not an ETL tool.

I would look into other tools, such as Sqoop (https://projects.apache.org/projects/sqoop.html), but I'm sure you already know it.




Sqoop does the extraction but not the transformation part of ETL, and it's only used for bulk moves, not iterative loads.



