YOLO will be slow on a Raspberry Pi. Try SSD MobileNetV2 instead, trained with quantization-aware training, and run inference with TensorFlow Lite; you should get at least a few frames per second.
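To check the frame rate on your own Pi, something like the sketch below may help. It assumes the `tflite-runtime` package is installed and that you already have a converted `.tflite` detection model; the helper names (`filter_detections`, `measure_fps`, `make_runner`) are made up for illustration, and the order of the output tensors (boxes, classes, scores) depends on how the model was exported, so check `get_output_details()` for yours.

```python
import time


def filter_detections(boxes, classes, scores, min_score=0.5):
    """Keep detections whose confidence is at least min_score."""
    return [
        (box, int(cls), float(score))
        for box, cls, score in zip(boxes, classes, scores)
        if score >= min_score
    ]


def measure_fps(run_once, n_runs=20):
    """Average frames per second of run_once() over n_runs calls."""
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start
    return n_runs / max(elapsed, 1e-9)  # guard against a zero-length interval


def make_runner(model_path):
    """Build a zero-argument function that runs one inference on a dummy frame."""
    # Imported here so the helpers above work without TFLite installed.
    import numpy as np
    from tflite_runtime.interpreter import Interpreter  # pip install tflite-runtime

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    # Dummy input with the shape and dtype the model expects (e.g. uint8 if quantized).
    dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

    def run_once():
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
        # Return all output tensors; their order is model-specific.
        return [
            interpreter.get_tensor(out["index"])
            for out in interpreter.get_output_details()
        ]

    return run_once
```

On the Pi you would call `measure_fps(make_runner("detect.tflite"))`, where `detect.tflite` is a placeholder for your own model file. This times only the model itself on a blank frame, so camera capture and preprocessing will bring the real number down somewhat.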
I recommend starting with a model pre-trained on a dataset such as COCO and fine-tuning it on your own images. Even with fine-tuning, though, you will probably need more than 10k images to get good results.