We use Spark essentially as a distributed programming framework for data processing: anything you can do with a small dataset on a single server, you can do with a huge dataset on 20 or 2,000 servers with minimal extra development.
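As a rough illustration of that claim, here is a minimal PySpark sketch (PySpark is an assumption for the example; nothing above names a language). The aggregation code stays the same and only the master URL changes. The `SPARK_MASTER_URL` environment variable, the helper names, and the sample data are all hypothetical.

```python
import os


def resolve_master(default="local[*]"):
    # Hypothetical helper: read the cluster URL from an assumed environment
    # variable, falling back to "local[*]" (all cores of the current machine).
    return os.environ.get("SPARK_MASTER_URL", default)


def total_by_key(df):
    # The transformation itself does not mention cluster size at all.
    # pyspark is imported lazily so this module loads without it installed.
    from pyspark.sql import functions as F

    return df.groupBy("key").agg(F.sum("value").alias("total"))


def run_job():
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("scale-sketch")      # illustrative application name
        .master(resolve_master())     # the only part that depends on the cluster
        .getOrCreate()
    )
    # Tiny in-memory sample; a real job would read from distributed storage.
    df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
    total_by_key(df).show()
    spark.stop()
```

Calling `run_job()` with no environment variable set runs on all local cores; setting `SPARK_MASTER_URL` to something like `spark://<host>:7077` would point the same code at a standalone cluster. In practice the master is often passed with `spark-submit --master` instead of being set in code.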