These are excellent questions, and answering them is exactly the goal of this course. I hope you try it out and let us know whether you find the answers you need.
For this particular question, a model that first localizes the object and then classifies it, in a single integrated network, is called an "attentional model". It's an area of active research. If your images aren't too big, or the thing you're looking for isn't too small in the image, you probably won't need to worry about it.
And if you do need to worry about it, it's easy to handle: lesson 7 shows two methods for localization, and you can then manually run a second pass on the cropped images. For a great step-by-step explanation, see the write-up by the winners of the Kaggle Right Whale competition: http://blog.kaggle.com/2016/01/29/noaa-right-whale-recogniti...
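The two-pass approach can be sketched roughly as below. This is a minimal sketch, not the course's code: it assumes the localization step returns a single bounding box as `(x, y, w, h)` in pixels, and `localize` and `classify` are hypothetical placeholders for whatever models you actually train. The crop is padded by a small margin so that an imperfect box doesn't cut off the edges of the object.

```python
import numpy as np


def crop_to_bbox(image, bbox, margin=0.1):
    """Crop an (H, W, C) image to a bounding box (x, y, w, h) in pixels.

    The box is expanded on every side by `margin` (a fraction of the box's
    width/height) and then clamped so the crop stays inside the image.
    """
    height, width = image.shape[:2]
    x, y, w, h = bbox
    pad_x, pad_y = w * margin, h * margin
    x0 = max(int(round(x - pad_x)), 0)
    y0 = max(int(round(y - pad_y)), 0)
    x1 = min(int(round(x + w + pad_x)), width)
    y1 = min(int(round(y + h + pad_y)), height)
    return image[y0:y1, x0:x1]


def two_stage_predict(image, localize, classify, margin=0.1):
    """First pass: find the object. Second pass: classify only the crop.

    `localize` and `classify` are hypothetical callables standing in for
    your own localization and classification models.
    """
    bbox = localize(image)
    crop = crop_to_bbox(image, bbox, margin)
    return classify(crop)


if __name__ == "__main__":
    # Dummy stand-ins: a fixed box and a "classifier" that reports crop size.
    image = np.zeros((100, 200, 3))
    result = two_stage_predict(
        image,
        localize=lambda img: (50, 20, 40, 30),
        classify=lambda crop: crop.shape[:2],
    )
    print(result)
```

Because the second-pass classifier only ever sees the cropped region, a small object fills much more of its input than it would in the full image, which is the main reason this simple approach often works well enough.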
(There are more sophisticated integrated techniques, such as the one Google used for reading house numbers in Street View images. But you should only consider those if you've tried the simple approaches and found they don't work.)