What domain of knowledge does my problem fall under?

jdale27 · on May 31, 2008

The domain of knowledge you're interested in is "information retrieval" (IR).

Here's a textbook on the subject from some of the top minds in the field: http://www-csli.stanford.edu/~hinrich/information-retrieval-...

I am not an IR expert, but one particular algorithm that you might find useful is latent semantic analysis, a.k.a. latent semantic indexing: http://en.wikipedia.org/wiki/Latent_semantic_analysis

Someone · on May 31, 2008

> We search for longest common substring.

That idea can be improved upon: "length of longest common substring" is not the measure to use. At the least, you want to use Levenshtein distance <http://en.wikipedia.org/wiki/Levenshtein_distance>, but even that can be tweaked to weight common typing 'errors' lower (for example, colour is closer to color than to clour).
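
A minimal Python sketch of that idea, assuming per-character cost callbacks are an acceptable way to express the weighting. The name `edit_distance` and its parameters are illustrative, not a standard library API. With the default costs it is plain Levenshtein distance. The 0.5 cost for deleting a 'u' is an assumed weight, chosen only to reproduce the colour/color/clour example.

```python
from typing import Callable


def edit_distance(
    a: str,
    b: str,
    del_cost: Callable[[str], float] = lambda c: 1.0,
    ins_cost: Callable[[str], float] = lambda c: 1.0,
    sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 1.0,
) -> float:
    """Levenshtein distance with pluggable per-character costs.

    With the default (unit) costs this is plain Levenshtein distance.
    """
    # prev[j]: cheapest way to turn the part of `a` consumed so far into b[:j]
    prev = [0.0]
    for cb in b:
        prev.append(prev[-1] + ins_cost(cb))
    for ca in a:
        curr = [prev[0] + del_cost(ca)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost(ca),          # delete ca
                curr[j - 1] + ins_cost(cb),      # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # match (free) or substitute
            ))
        prev = curr
    return prev[-1]


def cheap_u_deletion(c: str) -> float:
    """Assumed, illustrative weight: dropping a 'u' (colour -> color) is cheap."""
    return 0.5 if c == "u" else 1.0
```

With unit costs, both "color" and "clour" are exactly one edit from "colour". With `del_cost=cheap_u_deletion`, "colour" to "color" costs 0.5 while "colour" to "clour" still costs 1.0, which breaks the tie in the direction described. In practice the weights could be estimated from observed misspellings rather than hand-picked.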

> Then when a person searches for a term, we use a proximity measure and then search for all the terms within a certain proximity

This is a form of query expansion: <http://en.wikipedia.org/wiki/Query_expansion>.
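
One way to read the quoted proximity idea as query expansion, assuming the proximity measure is plain edit distance: expand the query term into every indexed term within a threshold, then match documents against the expanded set. `expand_query`, `fuzzy_search`, the threshold of 1 and the sample documents are all hypothetical.

```python
from typing import List, Set


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def expand_query(term: str, vocabulary: Set[str], max_distance: int = 1) -> Set[str]:
    """Hypothetical helper: every (lowercase) vocabulary term within max_distance edits."""
    term = term.lower()
    return {w for w in vocabulary if levenshtein(term, w) <= max_distance}


def fuzzy_search(query: str, documents: List[str], max_distance: int = 1) -> List[str]:
    """Hypothetical helper: documents containing any term of the expanded query."""
    vocabulary = {w for doc in documents for w in doc.lower().split()}
    expanded = expand_query(query, vocabulary, max_distance)
    return [doc for doc in documents if expanded & set(doc.lower().split())]


# Hypothetical sample data
sample_docs = [
    "scramble two eggs with butter",
    "one egg yolk",
    "whole milk",
]
```

Here a query for "egg" expands to {"egg", "eggs"} and matches the first two documents. The threshold needs care: at distance 1, "milk" would also pull in "silk" or "mild" if they were indexed, which is one reason to combine this with stemming or a synonym list.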

michael_dorfman · on May 31, 2008

What you are asking about is fuzzy searching, but the fact that you are asking at all indicates that you ought to read up on searching theory. There's a large body of work out there that you should familiarize yourself with.

voidfiles · on May 31, 2008

Do you have specific suggestions on what to read?

michael_dorfman · on June 1, 2008

Recent issues of the ACM Transactions on Information Systems will bring you the latest research; alternatively, if you want to start at the beginning, try Knuth's The Art of Computer Programming, Volume 3 (Sorting and Searching).

jauco · on June 1, 2008

Try reading up on latent semantic analysis. It also works with vectors (via linear algebra, specifically a singular value decomposition, rather than calculus), but works a bit differently IIRC. It can be used to identify words that are used in the same contexts and are therefore likely to mean the same thing. I'm not sure it is directly applicable to recipes, given how terse their language is, but you might find interesting related articles.
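
A toy, stdlib-only sketch of the LSA idea. Standard LSA takes a truncated singular value decomposition of the term-document matrix A; to avoid a NumPy dependency, this sketch instead finds the top-k eigenvectors of A A^T (which are the same left singular vectors) by power iteration with deflation, which is only sensible at this tiny scale. The corpus and function names are hypothetical. In the sample data "egg" and "eggs" never appear in the same recipe, so their raw document vectors are orthogonal, but both co-occur with "flour" and "sugar".

```python
import math


def lsa_term_vectors(docs, k=2, iters=200):
    """Rank-k LSA term vectors for a tiny corpus (toy scale, stdlib only).

    The top-k left singular vectors of the term-document matrix A are the
    top-k eigenvectors of C = A A^T; they are found here by power iteration
    with deflation instead of an SVD routine. Each term's vector is its row
    of U_k * Sigma_k.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # A[i][j] = number of times term i occurs in document j
    A = [[0] * len(docs) for _ in range(n)]
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w]][j] += 1
    # C[i][r] = sum_j A[i][j] * A[r][j]: term-term co-occurrence counts
    C = [[float(sum(x * y for x, y in zip(A[i], A[r]))) for r in range(n)]
         for i in range(n)]

    def matvec(M, v):
        return [sum(M[i][r] * v[r] for r in range(n)) for i in range(n)]

    components = []  # (eigenvalue = sigma^2, unit eigenvector) pairs
    for _ in range(k):
        v = [1.0] * n  # all-positive start vector
        for _ in range(iters):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # Rayleigh quotient
        components.append((lam, v))
        # Deflate so the next pass converges to the next-largest component
        C = [[C[i][r] - lam * v[i] * v[r] for r in range(n)] for i in range(n)]
    return {w: [math.sqrt(max(lam, 0.0)) * v[index[w]] for lam, v in components]
            for w in vocab}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy corpus, one "recipe" per string
recipes = [
    "eggs flour sugar",
    "egg flour sugar",
    "beef onion garlic",
    "beef onion pepper",
    "beef onion salt",
]
vectors = lsa_term_vectors(recipes, k=2)
```

In the reduced space, "egg" and "eggs" get nearly identical vectors (cosine close to 1) even though they never co-occur, while "egg" and "beef" stay close to 0. Real corpora would need far more documents, often a weighting scheme such as tf-idf, and a proper SVD routine.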

mooneater · on May 31, 2008

Matching "eggs" to "egg" is an example of stemming; see http://en.wikipedia.org/wiki/Stemming
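
To show what a stemmer does, here is only step 1a of the Porter stemmer (its plural-handling rules), not the full algorithm; a real system would use a complete stemmer. The `stems_match` helper is hypothetical.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter stemmer: SSES -> SS, IES -> I, SS -> SS, S -> ''.

    Later Porter steps (-ed, -ing, -ational, ...) are deliberately omitted.
    """
    w = word.lower()
    if w.endswith("sses"):
        return w[:-2]   # caresses -> caress
    if w.endswith("ies"):
        return w[:-2]   # ponies -> poni
    if w.endswith("ss"):
        return w        # caress -> caress
    if w.endswith("s"):
        return w[:-1]   # eggs -> egg
    return w


def stems_match(query: str, word: str) -> bool:
    """Hypothetical helper: treat two words as equal if their stems agree."""
    return porter_step_1a(query) == porter_step_1a(word)
```

Stems need not be real words ("ponies" becomes "poni"); they only need to agree, because the same stemmer is applied to both the indexed text and the query.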

The milk example as stated could be solved using fulltext indexing and searching.

I think your approach might work, but you could get results much more quickly using existing database fulltext search.
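
Database full-text search is typically built on an inverted index: a map from each token to the documents that contain it. The sketch below is a toy in-memory version with AND semantics, not any particular database's implementation; the function names and recipe lines are made-up sample data.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercase token to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Ids of documents containing every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy, so the index is untouched
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data
recipes = ["whole milk", "goats milk cheese", "milk chocolate", "goat meat"]
idx = build_index(recipes)
```

Here "milk" returns documents 0, 1 and 2, and "goats milk" narrows that to 1. Note that "goat" only returns document 3, because "goat" and "goats" are different tokens; full-text engines can usually be configured to stem tokens, which is exactly the egg/eggs problem above.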

voidfiles · on May 31, 2008

I am not convinced that fulltext search alone would work. Wouldn't fulltext search fail in the case of goat? You have goat's milk, goat's milk cheese and goat meat, so if someone searched for goat, wouldn't all three types be returned?

robertk · on June 1, 2008

Well, if someone claims to have a goat, they probably have all three of those, or can at least make them. ;-)

akd · on June 1, 2008

not if it's a boy goat