I built a system for TaskRabbit that scraped all the IKEA products from a variety of sources and ran algorithms to determine their category and predict how long they would take to be assembled. Then there was a Mechanical Turk sort of system for human input. When combined with real-world feedback from the Taskers, it was pretty good.
For better or worse, I've personally been through the entire catalog multiple times.
Maybe as Turtle or JSON-LD, but you need a format that encapsulates triples and various data types. Otherwise you're throwing away most of the utility of a semantic web knowledge base.
I built a system for TaskRabbit that scraped all the IKEA products from a variety of sources and ran algorithms to determine their category and predict how long they would take to be assembled. Then there was a Mechanical Turk sort of system for human input. When combined with real-world feedback from the Taskers, it was pretty good.
For better or worse, I've personally been through the entire catalog multiple times.