I wonder if a kind of pony-express messaging app could be done with something like this. Like pick a friend in another state, write a message, carry it around in memory hoping that you'll be within a quarter mile of someone else running the app, and see how long it would take for the message to get delivered.
I love it when geo stuff shows up here. It's always fun to see what people are doing.
I see that it is in memory, but are there any practical limits? How much memory is recommended for what size datasets?
And it doesn't look like the documentation has any examples of how to load data in? Do I need to convert to Web Mercator, or can I give it WGS84 and have it convert? Also, any plans to allow for pure WGS84? I never actually use Web Mercator, so I would rather have a system that uses WGS84 natively.
> I see that it is in Memory, but are there any practical limits?
Yes. The in-memory database is basically a big btree + a big rtree. These trees contain many pointers to geo objects, and depending on the complexity of the objects, they may take up a lot of memory. A 16GB machine can typically store 100 million+ points.
> How much memory is recommended for what size datasets?
Depends on the type of object and the length of the keys, but I would conservatively use 128 bytes per point as a general rule. So a 16GB machine should support ~134 million points.
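As a back-of-the-envelope sketch of that rule of thumb (my own arithmetic, not an exact accounting of Tile38's internals):

```python
# Rough capacity estimate from the 128-bytes-per-point rule of thumb above.
bytes_per_point = 128          # conservative per-point overhead (key + tree nodes)
ram_bytes = 16 * 1024**3       # 16 GiB of RAM
capacity = ram_bytes // bytes_per_point
print(capacity)                # 134217728, i.e. ~134 million points
```

In practice some headroom should be left for the OS, connection buffers, and query working sets, so the real ceiling will be a bit lower.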
> And it doesn't look like the documentation has an examples as to how to load data in?
Thanks for the response. I see that what I was actually looking for was on http://tile38.com/topics/object-types/ Right there on the front page, and not in the docs tab where I was looking.
For the GeoJSON, does it support properties? Meaning, can I store a whole document in there? Or would it be best to just store an ID and look the ID up in a DB or something?
Yes, Tile38 has full GeoJSON support including properties for Feature objects. Though there is one exception, the CRS key is ignored because WGS84 is assumed.
> Meaning, can I store a whole document in there?
Yes, as long as the document conforms to http://geojson.org/geojson-spec.html. Though JSON keys that are not defined in the spec will be stripped out of the document prior to storing in memory. For example, a "properties" key on an object whose type is not "Feature" will be stripped, and requesting that object at a later time will return a JSON document that does not include the "properties".
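A rough illustration of that stripping behavior (this is my own simplified sketch, not Tile38's actual code; the allowed-key sets are abbreviated):

```python
# Hypothetical sketch: non-spec keys survive only where the GeoJSON spec
# allows them, e.g. "properties" is only valid on a Feature object.
GEOMETRY_KEYS = {"type", "coordinates", "geometries"}
FEATURE_KEYS = {"type", "geometry", "properties", "id"}

def strip_nonspec(obj):
    allowed = FEATURE_KEYS if obj.get("type") == "Feature" else GEOMETRY_KEYS
    return {k: v for k, v in obj.items() if k in allowed}

point = {"type": "Point", "coordinates": [10.0, 20.0], "properties": {"name": "x"}}
feature = {"type": "Feature",
           "geometry": {"type": "Point", "coordinates": [10.0, 20.0]},
           "properties": {"name": "x"}}

strip_nonspec(point)    # "properties" dropped: not valid on a bare geometry
strip_nonspec(feature)  # "properties" kept: Features may carry them
```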
I then tried to make a module, but Redis modules do not support hooking into the pub/sub tooling, so Tile38's live geofencing and webhooks were not possible.
Lol, looks like this database needs its own database. Yeah, seems like it should be a plugin or library; not sure why it's standalone if it's just backed by Redis.
It's not backed by Redis. It's fully standalone. I wish I could have made a Redis module back when I started the project, but I'm plenty happy with how Tile38 has matured.
Thanks for the suggestion, I'll need to investigate further. Right now, under the hood Tile38 uses a 3D rtree. Each node clusters 8-16 objects together. I'm always looking for a better way.
Looks pretty neat! Since OP is the author, can I ask: how many simultaneous concurrent (client) connections does this support, how many simultaneous moving objects and at what update frequency? Do these limits change if the shapes get complicated?
> how many simultaneous concurrent (client) connections does this support?
The client connections are very lightweight. The protocol library is based on https://github.com/tidwall/redcon. I don't have exact numbers but this link may be helpful (which compares Redcon to Redis): https://simongui.github.io/2016/10/24/benchmarking-go-redis-.... I've seen it handle 16K+ client connections without issue, but typically a client library that supports pooling should be used (like Redigo).
> how many simultaneous moving objects and at what update frequency?
Depends on the complexity of the objects, the number of nearby objects, and the server hardware. Each collection of objects consists of one btree and one rtree, for key/id lookups and spatial queries respectively. So updating a single point is roughly O(log(n) + log(m)). An in-memory database like Tile38 can achieve 100K+ updates per second. Utilizing network pipelining can help too. FYI, I use a 4-core server with 8GB RAM for testing.
In the case of a roaming geofence (http://tile38.com/topics/roaming-geofences), following each update there is an additional O(log(n)) query of the rtree to retrieve nearby neighbors. If there are a lot of nearby objects, this can result in a lot of data sent over the network. But in general, a collection of millions of points spread over a continent with 100K+ updates per second should work while staying close to real time.
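For intuition, the per-update neighbor query is conceptually like the brute-force sketch below, except Tile38 answers it from the rtree in O(log(n)) rather than scanning every object (the haversine math is standard; the function names and fleet data are my own):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(a, b):
    # Great-circle distance in meters between two (lat, lon) pairs.
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))

def nearby(objects, pos, meters):
    # Brute-force O(n) stand-in for the rtree query a roaming geofence runs
    # after each position update.
    return [oid for oid, p in objects.items() if haversine_m(p, pos) <= meters]

fleet = {"truck1": (33.5123, -112.2693), "truck2": (34.5000, -112.2000)}
nearby(fleet, (33.5124, -112.2693), 500)  # only truck1 is within 500 m
```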
> Do these limits change if the shapes get complicated?
Yes. Complicated objects such as MultiPolygons may require additional calculations like raycasting.
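The even-odd raycasting test itself is short; here's a minimal, unoptimized version (Tile38's modified variant will differ in its details):

```python
def point_in_ring(x, y, ring):
    # Classic even-odd raycasting: cast a ray to the right of (x, y)
    # and count edge crossings; an odd count means the point is inside.
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        crosses = (yi > y) != (yj > y)
        if crosses and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
point_in_ring(5, 5, square)   # True
point_in_ring(15, 5, square)  # False
```

A MultiPolygon repeats this per ring (plus hole handling), which is where the extra cost over a simple point lookup comes from.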
Neat. Seems like a good weekend is coming up. I was wondering, how were the gifs/images done? I would really like to see the code if it was done programmatically.
I programmatically simulated the movements as paths at 30fps into a series of data points, then imported into Animate CC and tweaked the tweening by hand, then exported to a series of PNGs, then imported PNGs into Photoshop as a movie sequence, finally exported as gif.
The superior tweening capabilities of Animate CC and the excellent gif compression of Photoshop make tuning the animation super easy.
Perhaps next time I'll programmatically draw gif frames, for the fun of it.
Neat! Is anyone already using it in production? I'm wondering if it could be useful as a replacement for a medium-sized Postgres database. I'm trying to replace some clustering currently done with Elasticsearch with PostGIS, but it's really slow (6M points * 700 boxes; it's taking 20 minutes to scan the table).
General purpose spatial queries with Tile38 are very fast. Likely much quicker than PostGIS, but PostGIS will be able to store much more data. It's a tradeoff for sure.
If your dataset isn't terribly big and performance is what you're after then I recommend giving Tile38 a try.
Thank you for the details! After all, if I can manage to fit around 6 million points into a single r4.large and get a decent response time when clustering them, I can't see why I shouldn't try it as a secondary data store.
Depends on which method you use, I assume. For geohashing, WITHIN should be best case O(1) and worst case O(n) depending on the specificity you care about, where n is the number of geohash cells you need to generate to cover the bounding box you care about (which is typically pretty small unless you want, like, a 1km box with 1cm granularity).
Frontloading computation into generating smart data structures gives you better choices here.
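For reference, geohash encoding itself is just interleaved binary subdivision of longitude and latitude, 5 bits per character; a minimal encoder as a sketch (this is the standard public algorithm, not code from Tile38):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat, lon, precision=11):
    # Alternately bisect the longitude and latitude ranges, emitting one
    # bit per step, then pack each run of 5 bits into a base32 character.
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, even = [], True  # start with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    chunks = (bits[i:i + 5] for i in range(0, len(bits), 5))
    return "".join(BASE32[int("".join("1" if b else "0" for b in c), 2)] for c in chunks)

geohash(57.64911, 10.40744)  # 'u4pruydqqvj' (the classic example coordinates)
```

Each extra character narrows the cell by a factor of 32, which is why the cell count n above depends so strongly on how fine a granularity you ask for.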
Sorry, my previous answer was not very descriptive.
The WITHIN call will be on average O(log(N)) because all objects are stored in an rtree. It's usually very fast to locate objects based on their outermost bounding area, but the complexity of the object adds to the duration. Simple points are super fast, sub-microsecond in many cases. A MultiPolygon can be much slower.
Tile38 uses a modified raycasting algorithm, not raytracing, for its polygon detection.