PointCNN – A simple and general framework for feature learning from point cloud (yangyanli.github.io)
74 points by lainon on Jan 29, 2018 | 4 comments



Link to arXiv: https://arxiv.org/abs/1801.07791

Pretty exciting: the problem of learning from point cloud data is still very open, and this beats the prior best work on it (PointNet++). It seems they treat point clouds as quasi-graphs based on point closeness. Promising, but it requires nearest-neighbor search, which seems like it'd be expensive.
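
For a sense of what that search involves, here's a minimal sketch (mine, not from the paper) of the per-point neighbor gathering a method like this needs at every layer; the KD-tree build and query is the potentially expensive part:

    # Toy neighbor gathering for a point cloud (illustrative only; the
    # paper's actual neighborhood scheme may differ). Building and
    # querying the KD-tree is the cost the comment above worries about.
    import numpy as np
    from scipy.spatial import cKDTree

    N, K = 8192, 8                      # points per cloud, neighbors per point
    points = np.random.rand(N, 3).astype(np.float32)

    tree = cKDTree(points)              # O(N log N) build
    _, idx = tree.query(points, k=K)    # K nearest neighbors of every point

    neighborhoods = points[idx]         # (N, K, 3): a local cluster per point
    print(neighborhoods.shape)

On CPU this is fast for a single cloud, but doing it per layer, per batch, inside a training loop is where it gets costly.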


I’ve been working to get orb_slam to run on the Raspberry Pi in real time. One of the most interesting parts, to me, is the bag-of-words approach to feature matching. It’s shockingly reliable for such a simple method.

It’s used for initialization when no world model is available, as a backup when the motion model fails to track a new frame, and for relocalization and loop closing.

Orb_slam wouldn’t work without it, and all it is is a k-means tree for classifying nearest neighbors.
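
For anyone curious, here's a toy sketch of that idea (my own simplification, not DBoW2 or orb_slam's actual code): hierarchical k-means turns a descriptor-to-word lookup over a vocabulary of size V into a walk of depth log V. Real ORB descriptors are 256-bit binary strings matched with Hamming distance; plain float k-means is used here just to keep it short.

    # Toy vocabulary tree: hierarchical k-means over descriptors.
    # Classifying a descriptor costs BRANCH * DEPTH distance checks
    # instead of one check per visual word.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    descriptors = rng.random((5000, 32)).astype(np.float32)  # stand-in for ORB

    BRANCH, DEPTH = 10, 2        # vocabulary of up to BRANCH**DEPTH words

    def build_tree(data, depth):
        """Recursively split the data with k-means; leaves are visual words."""
        if depth == 0 or len(data) < BRANCH:
            return None
        km = KMeans(n_clusters=BRANCH, n_init=3).fit(data)
        children = [build_tree(data[km.labels_ == i], depth - 1)
                    for i in range(BRANCH)]
        return (km, children)

    def quantize(tree, d, word=0):
        """Descend the tree; the path taken is the visual-word id."""
        if tree is None:
            return word
        km, children = tree
        i = int(km.predict(d.reshape(1, -1))[0])
        return quantize(children[i], d, word * BRANCH + i)

    tree = build_tree(descriptors, DEPTH)
    print(quantize(tree, descriptors[0]))  # visual word for one descriptor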

When I was starting out, I was looking for a SLAM system, not just visual odometry, that could realistically be sped up for the Pi. The winning factor for orb_slam was that it used sparse, simple feature points that could be cheaply classified.

After spending almost a year working on this, I have the system running at 10 fps on the KITTI benchmark, whose sequences consist of larger images also recorded at 10 fps. VGA images can run at up to 20 fps, depending on various trade-offs.

My present opinion is that ORB features are great for mapping, but not robust enough for reliable visual odometry.

It wouldn’t be so much of an issue to generate a local point cloud on the Pi, but processing the cloud in real time is out of the question. If someone works out how to reliably and cheaply classify point cloud features, it would be a game changer.


Why can this model handle permutation, whereas PointNet (the paper PointCNN is based on) used a permutation-invariant function?


If I'm understanding correctly (I haven't read it in depth yet):

Points are processed as local clusters. Each cluster of points is ordered according to a transformation matrix X. This produces a "canonical" ordering, so the processing of points in these local clusters does not need to be invariant to the order, since the order now has consistency and meaning.

It's kind of like placing each point into a regular grid, like an image, before running the convolution. The trick is deciding which points to put in which grid cell, which is determined by the transformation matrix X. This takes advantage of locality, which PointNet does not, if I recall correctly. By acting locally and stacking many layers of these, you can produce a hierarchy of more and more abstract clusters of points, each with an inherent relationship to nearby clusters of points. In addition, the transformation matrix also appears to act as an attention over the points in the cluster.
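
To make that concrete, here's a rough numerical sketch of the scheme as I read it; the layer sizes and the dense layer standing in for the actual convolution are made up for illustration, this isn't the authors' code:

    # One local cluster: learn a K x K matrix X from the neighbors'
    # coordinates, use it to reorder/re-weight the features, then apply
    # an ordinary (here: dense) layer to the transformed cluster.
    import numpy as np

    K, C = 8, 16                              # neighbors per cluster, channels
    rng = np.random.default_rng(0)

    local_xyz = rng.standard_normal((K, 3))   # coords relative to cluster center
    features  = rng.standard_normal((K, C))   # per-point input features

    # Tiny "MLP" (random weights) mapping the local geometry to X.
    W1 = rng.standard_normal((K * 3, 64))
    W2 = rng.standard_normal((64, K * K))
    X = (np.tanh(local_xyz.reshape(-1) @ W1) @ W2).reshape(K, K)

    transformed = X @ features                # (K, C): "canonically ordered"
    Wc = rng.standard_normal((K * C, 32))     # convolution stand-in
    cluster_feature = np.tanh(transformed.reshape(-1) @ Wc)   # (32,)
    print(cluster_feature.shape)

Since X is computed from the input coordinates, it can both permute and weight the points, which is where the attention-like reading comes from.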



