This is image based, not text based. It’s very useful for a number of applications!
I think your usecase is extremely promising assuming it results in better quality output than just running a modern object detector. Another usecase I don’t have bandwidth for, but would likely be very marketable, is similar to what you’re saying but to allow the use of traditional algos like sift or surf across modalities.
I think your usecase is extremely promising assuming it results in better quality output than just running a modern object detector. Another usecase I don’t have bandwidth for, but would likely be very marketable, is similar to what you’re saying but to allow the use of traditional algos like sift or surf across modalities.