Example using Google App Engine's new Remote API

DenisM · on Feb 14, 2009

This looks to be of limited utility. Why would I want to process data away from where it is stored?

DocSavage · on Feb 14, 2009

Right now, you can't run long-lived processes on App Engine, so schema changes, backups, and a whole raft of data processing routines must be handled through the HTTP interface. The remote API abstracts that away so you can write scripts as if you were operating locally without those limitations.

Also, the remote API could potentially be used with any other App Engine API, not just the datastore API.

There have been a number of App Engine enhancements over the last few months: a hook system, sorting on keys, this remote API, and most importantly, the relaxation of CPU limits. When taken together, these enhancements will allow some pretty cool management apps purely in user space.

DenisM · on Feb 14, 2009

>> The remote API abstracts that away so you can write scripts as if you were operating locally without those limitations.

But you still have to write the code assuming it can be interrupted by the network going down, etc. Once you do that, you might as well run it next to the data, no?

I can see the CPU-intensive argument, but this is less important now given relaxation of CPU rules.

Also, consider this paper by Jim Gray: http://dslab.epfl.ch/courses/pods/fall06/readings/gray-econo... where he concludes, from an economics point of view: Put the computation near the data. The recurrent theme of this analysis is that "On Demand" computing is only economical for very CPU-intensive (100,000 instructions per byte or a CPU-day per gigabyte of network traffic) applications.

DocSavage · on Feb 15, 2009

>> this is less important now given relaxation of CPU rules

If you run on the server side, you are still severely limited by request time limits, so you'll have to move the outer loops off App Engine. I already wrote a REST API for my app, and it will still be used for AJAX clients, routines, and authorization-specific scenarios. But for schema migrations, backup/restore systems, and quick and dirty prototype apps, the remote API simplifies the programming.
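A minimal sketch of what "moving the outer loop off App Engine" might look like, and of how the loop can survive an interruption: page through entities in key order and save the last key after each batch, so a rerun resumes where it stopped. Everything here is an assumption for illustration: `FAKE_DATASTORE`, `fetch_batch`, and `migrate` are hypothetical stand-ins, not App Engine or remote API calls, and the in-memory dict takes the place of the real datastore.

```python
import json
import os

# Hypothetical stand-in for the remote datastore: key -> entity dict.
FAKE_DATASTORE = {"k%03d" % i: {"name": "item %d" % i} for i in range(10)}


def fetch_batch(after_key, limit):
    """Return up to `limit` (key, entity) pairs with key > after_key, in key order."""
    keys = sorted(k for k in FAKE_DATASTORE if after_key is None or k > after_key)
    return [(k, FAKE_DATASTORE[k]) for k in keys[:limit]]


def load_checkpoint(path):
    """Return the last processed key, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["last_key"]


def save_checkpoint(path, last_key):
    """Write the checkpoint to a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_key": last_key}, f)
    os.replace(tmp, path)


def migrate(entity):
    """Hypothetical schema change: stamp a version field if it is missing."""
    entity.setdefault("schema_version", 2)


def run_migration(checkpoint_path, batch_size=3, max_batches=None):
    """Outer loop that lives on the client; returns entities processed this run."""
    last_key = load_checkpoint(checkpoint_path)
    batches = 0
    processed = 0
    while max_batches is None or batches < max_batches:
        batch = fetch_batch(last_key, batch_size)
        if not batch:
            break
        for key, entity in batch:
            migrate(entity)
            processed += 1
        last_key = batch[-1][0]
        save_checkpoint(checkpoint_path, last_key)  # resume point after a crash
        batches += 1
    return processed
```

Because `migrate` is idempotent, a batch that was partly processed before a failure can simply be redone on the next run.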

>> Also, consider this paper by Jim Gray

I don't dispute the bottom line of putting computation near the data... if your app is computation-driven or each computation requires mondo data. Ease of programming against the datastore shouldn't be casually dismissed in favor of architectural ideals, particularly in the context of free app quotas and programmers who want to quickly hack up a web app or prototypes.

shadytrees · on Feb 14, 2009

I can think of a few situations where this is useful. For example, it would be ridiculously easy to write a "publish to my GAE weblog" command for any text editor that can talk to Python.

DenisM · on Feb 14, 2009

But would that be the right architecture? I think it would be better to POST some more abstract XML data to GAE-side code, which is then closely tied to the data structures. That way I can test and evolve the two parts of the big app independently.
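A rough sketch of that split, assuming a hypothetical `<post>` document with `title` and `body` elements (the element names and both function names are made up for illustration). The client only knows the XML shape; the server-side parser is the one place that maps it onto whatever the stored data structures are, so each half can be tested without the other.

```python
import xml.etree.ElementTree as ET


def build_post_xml(title, body):
    """Client side: serialize a weblog post into an abstract XML payload (bytes)."""
    root = ET.Element("post")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "body").text = body
    return ET.tostring(root, encoding="utf-8")


def parse_post_xml(payload):
    """Server side: validate the payload and return plain fields for storage."""
    root = ET.fromstring(payload)
    if root.tag != "post":
        raise ValueError("unexpected root element: %r" % root.tag)
    return {
        "title": root.findtext("title", default=""),
        "body": root.findtext("body", default=""),
    }
```

The bytes returned by `build_post_xml` would be the body of the HTTP POST; the actual request and handler wiring are left out here.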

jaxn · on Feb 14, 2009

I will try to use this for some data warehouse-ish stuff. Pull down a large dataset, run some computations, put some new data back.

Also, if you want to use sockets in your GAE app then you could have the socket code live somewhere else (since you can't use sockets on GAE).

Third, backups. Bulkload is great for pushing data to Google, but now we can pull it down for archival.
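For the archival case, a pull-down could be sketched roughly like this: page through entities in key order and write one JSON object per line to a local file. The `fetch_page` callable is an assumption standing in for whatever remote query you would actually issue, and `SAMPLE` / `fetch_sample_page` are hypothetical in-memory substitutes so the sketch runs on its own.

```python
import io
import json


def iter_entities(fetch_page, page_size=100):
    """Yield (key, entity) pairs by paging through results in key order."""
    last_key = None
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            return
        for key, entity in page:
            yield key, entity
        last_key = page[-1][0]


def dump_archive(fetch_page, out, page_size=100):
    """Write one JSON object per line to `out`; return the number written."""
    count = 0
    for key, entity in iter_entities(fetch_page, page_size):
        out.write(json.dumps({"key": key, "data": entity}, sort_keys=True) + "\n")
        count += 1
    return count


def load_archive(lines):
    """Rebuild a key -> entity mapping from archived JSON lines."""
    return {rec["key"]: rec["data"] for rec in map(json.loads, lines)}


# Hypothetical in-memory data standing in for the remote datastore.
SAMPLE = {"a": {"n": 1}, "b": {"n": 2}, "c": {"n": 3}}


def fetch_sample_page(after_key, limit):
    """Return up to `limit` (key, entity) pairs with key > after_key."""
    keys = sorted(k for k in SAMPLE if after_key is None or k > after_key)
    return [(k, SAMPLE[k]) for k in keys[:limit]]
```

Usage would be along the lines of `dump_archive(fetch_sample_page, open("backup.jsonl", "w"))`, with `load_archive` reading the lines back for a restore or a local analysis pass.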