While Spark is not intended for ETL per se, when I need to copy data from S3 to HDFS I just use sc.textFile and sc.saveAsTextFile; in most of my use cases it does the job pretty fast.
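
For example, a minimal sketch of that copy from the Spark shell (the bucket, prefix, and HDFS path below are placeholders, and it assumes the cluster already has S3 credentials configured):

    // Spark shell (Scala): read the S3 objects as lines of text
    // and write them back out to HDFS unchanged.
    val lines = sc.textFile("s3a://my-bucket/input/*.csv")   // hypothetical source path
    lines.saveAsTextFile("hdfs:///data/input-copy")          // hypothetical target path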

But Spark is mostly a computation engine replacing MapReduce (plus a standalone cluster-management option), not an ETL tool.

I would look into other tools, such as Sqoop (https://projects.apache.org/projects/sqoop.html), but I'm sure you already know it.

Sqoop does the extraction but not the transformation part of ETL, and it is only used for bulk moves, not iterative ones.
