Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: TG – Fast geometry library for C (github.com/tidwall)
201 points by tidwall on Sept 22, 2023 | hide | past | favorite | 32 comments



Howdy, I work on S2 [1] so I have questions! How do you deal with polygons that cross the antimeridian?

The indexing structure you've come up with seems very interesting. In spherical coordinates line sweep algorithms like that are a little less intuitive because there's not really a min and max y value to work with. Does your index support multiple polygons indexed together?

The lack of exact predicates worries me a little bit. It's tricky because it will work right until it doesn't for mysterious reasons, and it's very hard to test for if you haven't built it on a foundation of exact predicates. You'll periodically fall into the fractal foam around edges when testing if you cross them or not ([2] has some good pictures). We do this in S2 by relying on a set of predicates [3] that fall all the way back to symbolic perturbation if they have to. We simply don't have to think about colinear points in S2 for that reason.

[1] https://s2geometry.io/ [2] https://github.com/mourner/robust-predicates [3] https://github.com/google/s2geometry/blob/master/src/s2/s2pr...


TG isn't spherical and wasn't designed to be. It's 2D and projection agnostic. Crossing the antimeridian is a user problem to solve. I recommend following the GeoJSON rule of antimeridian cutting [1].

As I said in the README. My goals are fast point-in-polygon, geometry intersections, and low memory footprint. Those are my bread-and-butter.

From what I know about S2, it has different goals. Such as being spherical and using discrete cells for distributed spatial indexes. Those are great things of course, but not really what I need.

This is the second comment that recommends using Shewchuk’s methods. And while yes I agree that that may be a superior way to go, and always on the table for the future, I'm still able to achieve my goals without those methods. At least for now.

[1] https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.9


Yeah we're spherical mostly because working in spherical coordinates let's you not have to worry about the projection. S2Cells are really just nodes in a quadtree, the difference being our quad tree is implicit, you can (and we do) store them in a linear array in memory.


Nice library. +1 for the excellent visualizations [1]. Will you keep it focused on intersections or might you dabble with triangulation?

[1] https://github.com/tidwall/tg/blob/main/docs/POLYGON_INDEXIN...


Rather than creating new geometric primitives on the heap, wouldn't it be more flexible if the caller can provide the memory to initialize the structure in?

Then instead of

  struct tg_geom *geom = tg_geom_new_point(-112, 33);
  if (!geom) {
      // System is out of memory.
  }
you could do

  struct tg_geom geom;
  tg_geom_init_point(&geom, -112, 33);
without need for error checking or cleanup. You can still allocate on the heap if you want but you wouldn't have to.


I have strict safety rules for this project, one of which requires that all geometries are thread-safe pointers. So in the example you show, it would not be safe to share the tg_geom pointer with other threads because it belongs on the stack instead of the heap.

Yet I do see the need for avoiding allocations when working with lots of simple geometries like points and rects, and for my use I made functions like tg_geom_intersects_xy and tg_geom_intersects_rect to perform relation operations directly on point and rectangle values.


I think that this would be incompatible with the fast cloning/reference counting feature.


Hey, this looks really great, but I couldn't really see anything in the docs about robustness (how you handle floating point inaccuracy, etc) - what approach did you use?


I can't stop precision loss in all cases, but I do my darnedest to avoid loss when it causes false positives, especially for stuff like intersect detection code. For example the collinear [1] function looks really big for a seemingly simple operation, but there are extra checks built in for precision loss and in the cases of compiler associate math issues (like a user borking a build with -ffast-math).

I'm sure it's not all perfect but I feel pretty good about it overall. It certainly helps that much of the logic derived from the Tile38 [2] project, which has 8 years of use in production. I ported many of the tests too, which make me warm and fuzzy every time they pass.

[1] https://github.com/tidwall/tg/blob/v0.1.0/tg.c#L389

[2] https://github.com/tidwall/tile38


You might consider Shewchuk’s methods for robust geometric computations: https://www.cs.cmu.edu/~quake/robust.html

There are more modern implementations available on GitHub, but I’ve found it to be fairly easy to integrate into C-family projects without much modification.


This is a really great README - clearly explained the benefits of the library and anticipated all of my questions about it.


This looks really interesting. It implements pretty much the exact subset of geospatial stuff that I care about, in a single C file (an amalgamation that includes its dependencies).

This could make for a really neat SQLite extension, as a much lighter alternative to SpatiaLite.


... and here's Alex Garcia's first version of that SQLite extension: https://github.com/asg017/sqlite-tg


It looks like the coordinates are 64-bit doubles. How accurate and repeatable are the implementations of geometry operations here?


This is awesome! I wonder how feasible is it to include TG in Apache Sedona (https://github.com/apache/sedona)

Although Sedona runs as a distributed system, but TG may speed local in-memory geometrical computation for each worker node. Let me know your thoughts!


Very nice! I wonder if I can compile this to wasm. I’ve been badly needing spatial predicates in a web environment that are faster and less clunky than JSTS.

Does it assume a flat Cartesian world or does it handle ellipsoids or even map projections? (Or does it avoid the complexity altogether by not doing any work that cares about distances?)


Are there any other libraries in the browser space that implement polygon Boolean operations other than JSTS?

I've tried martinez-polygon-clipping, polygon-clipping, and polyclip-ts, all of which are based on the same algorithm, and all of which seem to throw exceptions frequently on geometry with overlapping edges.

I'm thinking I'll need to dig into JSTS in the hope it'll prove more stable... or learn how to write these kinds of algorithms myself.


I'm working on bringing GeoRust algorithms [0] (which include boolean operations) to the web via WebAssembly. See this blog post [1] for an intro. There's also separate work on binding GEOS to Wasm [2], but I'm more excited about GeoArrow in the long term because you can interpret Wasm geometries in JS without any copies across the boundary [3].

[0]: https://docs.rs/geo/latest/geo/#algorithms

[1]: https://observablehq.com/@kylebarron/prototyping-georust-geo...

[2]: https://github.com/chrispahm/geos-wasm

[3]: https://observablehq.com/@kylebarron/zero-copy-apache-arrow-...


Oh I've read [3] before, that's very exciting! I'll be following your RSS feed :)


I would assume it's flat

A) if you're handling projections or doing geodesic geometry then there's a bunch of other stuff to do and you'd probably have to mention it.

B) it would be obvious in the codebase if that was a part of it there'd be mentions of projections or some sort of constants for spherical maths.

C) Projections etc are often very application specific afaik it's not really possible to generalise it safely.


Ellipsoids don't have a well-defined answer to what is "inside" or "outside" the asylum.


Yeah a WASM build of this would be fantastic. It looks like it's a very clean C codebase so I expect getting it working in WASM would be relatively straightforward.


Yes, I used wasm as one of the targets during development.


Awesome, you should add that to the readme!

I blogged about it here: https://simonwillison.net/2023/Sep/23/tg-polygon-indexing/


I am curious, what exactly "feq" is calculating?

> return !((x < y) | (x > y));

Is it supposed to avoid x == y. If yes, why?


Also, "length" may have better be implemented using: https://en.cppreference.com/w/c/numeric/math/hypot

It will be helpful, if you add comments discussing why certain functions are implemented in that way, and what algorithms (references) is used. For example, there are some magic numbers without explanation: if (cap < 1000) {

Have you looked at: https://github.com/davideberly/GeometricTools Each algorithm is well documented and explained. There is even a short pdf papers for some of the algorithms explainimg why some decisions are taken and what numerical issues may be expected.


It implements x==y, but it's also true when one of them is a NaN


I understand. You want to work with NaN-s for some reason.


Is the "Natural structure" described in some paper?


It’s a new structure I came up with as described in the docs/POLYGON_INDEXING.md document. https://github.com/tidwall/tg/blob/main/docs/POLYGON_INDEXIN...


I think this might handle NaN differently than == does, I believe the comparisons would always be false with NaN. I’m not sure if that is the intent though.


The author should consider some assembly versions of parts of it (ofc, without abusing the assembler pre-processor).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: