I've been working on a data frame implementation for Python. I think API-wise we can do a lot better than Pandas. Having used dplyr with R almost daily, I miss the convenience of its clear, consistent API and chained operations whenever I have to use something else. I don't know yet if this project makes sense in terms of speed and corner-case handling. I haven't done any real-world work with it yet, but at least it's been a good learning project.
One interesting use case for a Pandas replacement is AWS Lambda functions. If a skinnier package can deliver 80% of the data-processing niceness while using up a smaller share of the Lambda function's size limit, it could come in very handy for many people.
Nice project! Not a quarantine project, but we've been building data frame abstractions in Python for genetics [1] [2]. We spent a lot of time studying the existing abstractions (pandas, R/dplyr, pyspark, etc.). Designing a data frame in Python is an interesting and challenging problem. Our design is far from perfect, but I think we've found an interesting design point. Here's your example in Hail:
Hail's tables are functional. Operations like `filter` and `order_by` return new tables. That means it would be an error to use `vehicles.year` in the `order_by`, since the input and the sort expression refer to different tables. Unfortunately, this means you can't use `.` chaining.
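To make the point about column expressions being tied to their table concrete, here is a minimal pure-Python sketch. It is not Hail's API or implementation: the `Table` and `Column` classes, the `collect` method, and the sample rows are all made up for illustration. The only idea it borrows from the comment above is that each operation returns a new table, and that an expression built from one table is rejected by another.

```python
class Column:
    """A column reference that remembers which table it came from."""

    def __init__(self, table, name):
        self.table = table
        self.name = name


class Table:
    """Hypothetical immutable table; every operation returns a new Table."""

    def __init__(self, rows):
        self._rows = tuple(dict(r) for r in rows)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so `vehicles.year`
        # produces a Column bound to this particular table.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._rows and name not in self._rows[0]:
            raise AttributeError(name)
        return Column(self, name)

    def _check(self, column):
        if column.table is not self:
            raise ValueError(f"column {column.name!r} belongs to a different table")

    def filter(self, column, predicate):
        self._check(column)
        return Table(r for r in self._rows if predicate(r[column.name]))

    def order_by(self, column):
        self._check(column)
        return Table(sorted(self._rows, key=lambda r: r[column.name]))

    def collect(self):
        return [dict(r) for r in self._rows]


# Made-up sample data, purely for illustration.
vehicles = Table([
    {"make": "a", "year": 2010},
    {"make": "b", "year": 2005},
    {"make": "c", "year": 2008},
])

# Each step needs a name so the next step can refer to *its* columns.
recent = vehicles.filter(vehicles.year, lambda y: y >= 2008)
result = recent.order_by(recent.year).collect()

# Reusing a column from the original table is rejected.
try:
    recent.order_by(vehicles.year)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

The intermediate name `recent` is what breaks `.` chaining: inside a single chained expression there is no name for the filtered table from which to take `year`.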
A little more background on the project: Hail's raison d'être is a 3-dimensional generalization of data frames that we use for genetic data, called a MatrixTable [3]. Conceptually, it is a matrix of dicts rather than a list of dicts.
Genetic data is massive, so all of this is lazy and works on out-of-core data. The Python front end constructs an IR representing the query, which is fed through a query optimizer (written in Scala) and executed by a backend. We're working on multiple backends, but our primary backend right now is Spark.
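For readers unfamiliar with the front end / optimizer / backend split, the following toy sketch shows the general shape of such a pipeline. It is an assumption-laden illustration, not Hail's IR: the node types `Read`, `Filter`, and `OrderBy`, the single rewrite rule in `optimize`, and the in-memory `execute` interpreter are all hypothetical, and the real optimizer is written in Scala and runs on Spark.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Read:
    rows: tuple


@dataclass(frozen=True)
class Filter:
    child: object
    pred: Callable[[dict], bool]


@dataclass(frozen=True)
class OrderBy:
    child: object
    key: str


def optimize(node):
    """Toy rewrite rule: push a Filter below an OrderBy so fewer rows get sorted."""
    if isinstance(node, Filter):
        child = optimize(node.child)
        if isinstance(child, OrderBy):
            return OrderBy(optimize(Filter(child.child, node.pred)), child.key)
        return Filter(child, node.pred)
    if isinstance(node, OrderBy):
        return OrderBy(optimize(node.child), node.key)
    return node


def execute(node):
    """Toy in-memory backend that interprets the IR tree."""
    if isinstance(node, Read):
        return [dict(r) for r in node.rows]
    if isinstance(node, Filter):
        return [r for r in execute(node.child) if node.pred(r)]
    if isinstance(node, OrderBy):
        return sorted(execute(node.child), key=lambda r: r[node.key])
    raise TypeError(f"unknown IR node: {node!r}")


# Made-up sample data. Building the plan does no work; nothing runs until execute().
rows = (
    {"make": "a", "year": 2010},
    {"make": "b", "year": 2005},
    {"make": "c", "year": 2008},
)
plan = Filter(OrderBy(Read(rows), "year"), lambda r: r["year"] >= 2008)
optimized = optimize(plan)
output = execute(optimized)
```

Because the query exists as data before anything runs, the optimizer is free to reorder it (here, filtering before sorting) and the same plan could in principle be handed to different backends.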
https://github.com/otsaloma/dataiter
https://github.com/otsaloma/dataiter/blob/master/dataiter/da...