My overall feeling, as someone who wants to start getting into visual recognition, is that there are a bunch of great libraries/ecosystems to choose from and all of them have pros and cons, but I honestly don't want to make the wrong decision and end up stuck later on. Does anyone here have any advice on what I should use to have a camera (RPi) recognize the most common objects, and then add a layer where we can teach it specific objects (e.g. putting a name on a person or a pet)? Thank you!
You should be set with OpenCV or JavaCV. In my experience JavaCV has more ported image/video processing libraries than OpenCV and is a bit more customizable. JavaCV literally uses OpenCV natively, just with a different wrapper. I would only suggest OpenCV if you are working with Python. It provides enough tools for basic and then moderately complex objects, and if you want to go deeper you can open up the hood. As for porting that pet project to an Android device, I would suggest going straight to JavaCV, even though OpenCV has wrappers for Android and Java. The reason is that at some point you'll want to ditch their rigid methods of obtaining video/pictures. JavaCV has served me well when integrating with Android. Also, if you eventually want to scale the processing up into a web service, it's easier.
Not sure if you're looking for CV or DL libraries. If you're looking for DL libraries, you can't go wrong with TensorFlow or PyTorch for research/development, and TensorFlow or Caffe2 for deployment. TensorFlow's a bit difficult to learn but has a lot of great tooling around it; PyTorch is the opposite: easier to learn, with less tooling. The other frameworks are fine, but don't have the same amount of documentation & beginner resources.
The first lesson is image classification ("is this a picture of a cat or a dog?"). Given that OP is commenting on an object detection library release, though, I assume they're interested in object recognition/detection/segmentation rather than just image classification. So, more like: "what things are in this image and where are they?" or even just "where are the dogs in this image?"
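To make the difference concrete, here's a rough sketch in plain Python of what each task hands back. The `Detection` class, the `classify` and `find` helpers, and all the numbers are made up for illustration; they aren't any particular library's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    """One detected object: what it is, how confident, and where (hypothetical type)."""
    label: str
    score: float
    box: Box


def classify(scores: Dict[str, float]) -> str:
    """Image classification: a single label for the whole image."""
    return max(scores, key=scores.get)


def find(detections: List[Detection], label: str, min_score: float = 0.5) -> List[Box]:
    """Object detection: boxes of every sufficiently confident instance of `label`."""
    return [d.box for d in detections if d.label == label and d.score >= min_score]


# Made-up model outputs for the same photo.
class_scores = {"cat": 0.2, "dog": 0.8}
detections = [
    Detection("dog", 0.92, (40, 60, 210, 300)),
    Detection("dog", 0.31, (300, 80, 380, 200)),    # low confidence, filtered out by default
    Detection("person", 0.88, (220, 10, 400, 310)),
]

print(classify(class_scores))    # "is this a cat or a dog?"
print(find(detections, "dog"))   # "where are the dogs in this image?"
```

A classifier only ever gives you the first kind of answer; a detector gives you the list, and it's up to you to pick a confidence threshold.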
That's also covered eventually in fast.ai, but not until the second course if memory serves.