My business partner and I have created http://www.qwertykitchen.com, a recipe site. Our main form of search lets you enter the ingredients you have on hand and returns a list of recipes that contain those ingredients.
We are now looking at doing something more than just full-text and keyword search. The first problem we want to tackle is making words that are essentially the same show up in the search results when either one is searched for.
For example, we have egg and eggs, or milk, 2% milk, whole milk, and skim milk. We want someone to only have to enter milk; if they do, the entire cluster should be searched.
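A minimal sketch of the lookup described above, assuming the clusters already exist: an inverted index maps each stored ingredient to the recipes that use it, and each entered term is expanded to every variant in its cluster before matching. `RECIPES`, `CLUSTERS`, `build_index`, and `search` are hypothetical names with made-up data, and the sketch assumes a recipe must contain every entered ingredient.

```python
from collections import defaultdict

# Hypothetical recipe data: recipe name -> ingredients as stored on the site.
RECIPES = {
    "pancakes": ["eggs", "whole milk", "flour"],
    "omelette": ["egg", "skim milk"],
    "hot chocolate": ["2% milk", "cocoa"],
}

# Hypothetical hand-made clusters: the generic term a user types -> all its variants.
CLUSTERS = {
    "egg": {"egg", "eggs"},
    "milk": {"milk", "2% milk", "whole milk", "skim milk"},
}


def build_index(recipes):
    """Inverted index: ingredient -> set of recipe names that use it."""
    index = defaultdict(set)
    for name, ingredients in recipes.items():
        for ing in ingredients:
            index[ing].add(name)
    return index


def search(index, query_terms):
    """Recipes containing every query term, where a term matches any variant in its cluster."""
    result = None
    for term in query_terms:
        variants = CLUSTERS.get(term, {term})  # unclustered terms match only themselves
        matches = set().union(*(index.get(v, set()) for v in variants))
        result = matches if result is None else result & matches
    return sorted(result or [])


if __name__ == "__main__":
    idx = build_index(RECIPES)
    print(search(idx, ["milk"]))         # ['hot chocolate', 'omelette', 'pancakes']
    print(search(idx, ["egg", "milk"]))  # ['omelette', 'pancakes']
```

The hard part is producing `CLUSTERS` automatically, which is what the rest of the question is about.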
My idea for approaching this problem is as follows. We find the longest common substrings between terms, then build a directed graph representation from them. When a person searches for a term, we use a proximity measure on that graph and search for all the terms within a certain proximity.
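A rough sketch of that idea in Python. The exact proximity measure is not specified above, so this one is an assumption: the directed distance from term a to term b is the fraction of a not covered by their longest common substring. Because that measure is asymmetric, the graph is naturally directed. `VOCAB`, `directed_distance`, `build_graph`, and `expand` are hypothetical names, and the 0.25 threshold is an arbitrary example value.

```python
# Hypothetical ingredient vocabulary; in practice this would come from the recipe database.
VOCAB = ["egg", "eggs", "milk", "2% milk", "whole milk", "skim milk", "sugar"]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b (O(len(a)*len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common run that ended at a[i-2], b[j-2]
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def directed_distance(a: str, b: str) -> float:
    """Assumed proximity measure: fraction of `a` not covered by its longest common substring with `b`.

    0.0 means `a` occurs verbatim inside `b` ("milk" -> "skim milk"); 1.0 means nothing is shared.
    """
    if not a:
        return 0.0
    return 1.0 - longest_common_substring(a, b) / len(a)


def build_graph(vocab, max_dist):
    """Directed graph as an adjacency dict: edge a -> b kept only if directed_distance(a, b) <= max_dist."""
    graph = {}
    for a in vocab:
        graph[a] = {}
        for b in vocab:
            d = directed_distance(a, b)
            if d <= max_dist:
                graph[a][b] = d
    return graph


def expand(graph, term):
    """All terms within the proximity threshold of `term` (the term itself included), sorted."""
    return sorted(graph.get(term, {term: 0.0}))


if __name__ == "__main__":
    g = build_graph(VOCAB, max_dist=0.25)
    print(expand(g, "milk"))       # ['2% milk', 'milk', 'skim milk', 'whole milk']
    print(expand(g, "eggs"))       # ['egg', 'eggs']
    print(expand(g, "skim milk"))  # ['skim milk']
```

Two caveats with this particular measure: the threshold is sensitive ("2% milk" -> "whole milk" scores 1 - 5/7, about 0.29, so a threshold of 0.3 would merge them), and substring overlap alone would also link "egg" to "eggplant".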
So I have two questions: first, does anyone know if there is a name for what I am describing, and second, does anyone have a better idea of how to do something like this?
Here's a textbook on the subject from some of the top minds in the field: http://www-csli.stanford.edu/~hinrich/information-retrieval-...
I am not an IR expert, but one particular algorithm that you might find useful is latent semantic analysis, a.k.a. latent semantic indexing: http://en.wikipedia.org/wiki/Latent_semantic_analysis
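A toy sketch of latent semantic analysis in pure Python, assuming each recipe is treated as a document made of ingredient words. It builds a binary term-document matrix A, takes the top k eigenvectors of the term co-occurrence matrix A·Aᵀ by power iteration with deflation (these are A's left singular vectors, and the eigenvalues are the squared singular values), and compares terms by cosine similarity in the reduced space. A real implementation would call a proper SVD routine instead; the data and all function names here are hypothetical.

```python
import math
import random

# Hypothetical toy data: each recipe reduced to a set of ingredient words.
RECIPE_WORDS = [
    {"milk", "skim"},
    {"milk", "whole"},
    {"egg", "bacon"},
    {"egg", "toast"},
]


def term_document_matrix(docs):
    """Binary matrix A: one row per term, one column per document."""
    terms = sorted({t for doc in docs for t in doc})
    matrix = [[1.0 if t in doc else 0.0 for doc in docs] for t in terms]
    return terms, matrix


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpairs(m, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]  # deflation modifies the matrix, so work on a copy
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        # A fresh random start per component keeps tied eigenvalues from being skipped.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
        pairs.append((max(lam, 0.0), v))
        # Deflate: subtract lam * v v^T so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_term_vectors(docs, k):
    """Rows of U_k * Sigma_k: each term as a k-dimensional vector."""
    terms, a = term_document_matrix(docs)
    # A A^T: its eigenvectors are A's left singular vectors, its eigenvalues are sigma^2.
    cooc = [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]
    pairs = top_eigenpairs(cooc, k)
    return {
        term: [math.sqrt(lam) * vec[i] for lam, vec in pairs]
        for i, term in enumerate(terms)
    }


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


if __name__ == "__main__":
    terms, a = term_document_matrix(RECIPE_WORDS)
    raw = dict(zip(terms, a))
    vecs = lsa_term_vectors(RECIPE_WORDS, k=2)
    print("raw skim/whole:", cosine(raw["skim"], raw["whole"]))
    print("LSA skim/whole:", cosine(vecs["skim"], vecs["whole"]))
    print("LSA skim/egg:  ", cosine(vecs["skim"], vecs["egg"]))
```

In the toy data, "skim" and "whole" never appear in the same recipe, so their raw term vectors have cosine similarity 0. After reduction to k = 2 dimensions, both line up with "milk" because each co-occurs with it, while staying unrelated to the egg cluster.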