The main person behind Microsoft's award-winning ResNet, Kaiming He, left for Facebook AI Research last year. This work looks like it builds on the earlier Residual Networks (ResNet) work.
Fun fact: if you inspect the properties of images hosted on Facebook, you can see an HTML attribute listing tags that Facebook has determined are in the photo (like "two people smiling", "pool", "outdoor party", "dog", etc.).
As far as I know, this isn't used for any user-facing features (yet), but it's there.
Most image datasets are exclusively photo-based (there's definitely no anime in ImageNet, MSCOCO, SVHN, CIFAR, etc.), and I've noticed that many computer vision tools seem to do worse on drawings or cartoons. The other day I was putting together a dataset of anime headshots to try generating them with a WGAN, and I needed to crop the images down to just the faces. The usual OpenCV tools failed utterly, and I had to use an anime-specific face detection library.
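For the cropping step itself, a minimal sketch, assuming the detector (OpenCV's `CascadeClassifier.detectMultiScale` or an anime-specific one) hands back face boxes as `(x, y, w, h)`. The helper name `expand_and_clamp` and the 20% margin are hypothetical choices for illustration: the margin keeps some hair and chin in frame, and clamping stops the box from running off the image edge.

```python
def expand_and_clamp(box, img_w, img_h, margin=0.2):
    """Grow a face box (x, y, w, h) by `margin` of its size on each side,
    then clamp it to the image bounds.

    Returns (x0, y0, x1, y1), usable as image[y0:y1, x0:x1] on an (H, W, C) array.
    """
    x, y, w, h = box
    # Padding in pixels, proportional to the detected face size (hypothetical 20% default).
    dx, dy = int(w * margin), int(h * margin)
    # Clamp so the expanded box never leaves the image.
    x0 = max(x - dx, 0)
    y0 = max(y - dy, 0)
    x1 = min(x + w + dx, img_w)
    y1 = min(y + h + dy, img_h)
    return x0, y0, x1, y1


if __name__ == "__main__":
    # A face near the bottom-right corner: the expanded box is clipped at 100.
    print(expand_and_clamp((80, 80, 30, 30), img_w=100, img_h=100))
```

With an image loaded as a NumPy array, the crop is then `image[y0:y1, x0:x1]`; you'd typically resize every crop to one fixed square size before feeding it to a GAN.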
Cartoons are a special snowflake: Jeremy Howard said in the fast.ai course that you probably need to retrain multiple layers of a convolutional network to handle them. In other words, you effectively end up with a different model for cartoons than for photos.
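To make "retrain multiple layers" concrete, here is a minimal PyTorch sketch, assuming PyTorch is installed. The helper `unfreeze_last_blocks` and the toy model are hypothetical; in practice you'd start from a photo-pretrained network (e.g. a torchvision ResNet) rather than this randomly initialized stand-in. The helper freezes every child module that has parameters, then unfreezes the last `n` of them, so only the later layers adapt to the new domain.

```python
import torch
from torch import nn


def build_toy_model() -> nn.Sequential:
    # Hypothetical stand-in for a photo-pretrained CNN (randomly initialized here).
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 10),
    )


def unfreeze_last_blocks(model: nn.Module, n: int) -> nn.Module:
    """Freeze all parameterized child modules, then unfreeze the last n of them."""
    # Only children that own parameters count as layers (ReLU, pooling, Flatten are skipped).
    blocks = [c for c in model.children() if list(c.parameters())]
    for block in blocks:
        for p in block.parameters():
            p.requires_grad = False
    start = max(len(blocks) - n, 0)
    for block in blocks[start:]:
        for p in block.parameters():
            p.requires_grad = True
    return model


model = unfreeze_last_blocks(build_toy_model(), n=2)
# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
```

Setting `n=1` is the usual "retrain only the classifier head" approach; the suggestion for cartoons is that a larger `n`, reaching back into the convolutional layers, is likely needed.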
Does anyone know the current state of 3D object recognition, e.g. on ShapeNet? I'd imagine the AR industry has made progress on this.