Help me understand why you say this is incorrect. By definition, two sets with a Jaccard coefficient of 0.95 have a 95% probability of intersection for any given element. Therefore, we can restate my assertion as "given that set A (bag 1) contains element x, what is the probability that set B (bag 2) also contains element x?", which can simply be restated as "what is the probability of intersection between set A and set B?", which can in turn be restated as "what is the Jaccard coefficient between set A and set B?"
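To make the two quantities in this exchange concrete, here is a minimal sketch (class and method names are hypothetical, not from the original post) that computes both the Jaccard coefficient |A intersect B| / |A union B| and the conditional fraction |A intersect B| / |A|, which is P(x in B | x in A) for an element x drawn uniformly from A. The example sets are chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class JaccardDemo {
    // Jaccard coefficient |A intersect B| / |A union B|: the fraction of the
    // union that lies in the intersection.
    static <T> double jaccard(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0; // convention chosen here: two empty sets count as identical
        }
        return (double) intersectionSize(a, b) / union.size();
    }

    // |A intersect B| / |A|: the chance that an element drawn uniformly
    // from A is also in B, i.e. P(x in B | x in A).
    static <T> double containment(Set<T> a, Set<T> b) {
        if (a.isEmpty()) {
            throw new IllegalArgumentException("A must be non-empty");
        }
        return (double) intersectionSize(a, b) / a.size();
    }

    static <T> int intersectionSize(Set<T> a, Set<T> b) {
        Set<T> inter = new HashSet<>(a);
        inter.retainAll(b); // keep only elements also present in b
        return inter.size();
    }

    // Integers from `from` (inclusive) to `to` (exclusive).
    static Set<Integer> range(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i < to; i++) {
            s.add(i);
        }
        return s;
    }

    public static void main(String[] args) {
        Set<Integer> a = range(0, 20); // {0, ..., 19}, 20 elements
        Set<Integer> b = range(1, 21); // {1, ..., 20}, 20 elements
        // The intersection has 19 elements and the union has 21.
        System.out.println(String.format(Locale.ROOT, "Jaccard  = %.4f", jaccard(a, b)));     // 0.9048
        System.out.println(String.format(Locale.ROOT, "P(B | A) = %.4f", containment(a, b))); // 0.9500
    }
}
```

The two numbers coincide only when |A union B| = |A|, that is, when B is a subset of A; for the overlapping sets above they differ.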

Geoffrey Hendrey

This is incorrect: "What is the probability that the element 'M' is also in Bag 2? It's 95%."

Anonymous

2-grams of "automatic": AU UT TO OM MA AT TI IC
So you can see that the n-grams come from the characters of individual words. Hope that helps.
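A minimal sketch of what a getNGrams(String part, int nGramLength) along these lines might look like, assuming character n-grams taken from a single word and returned as a set. This is not the original implementation: the class name, the empty result for words shorter than nGramLength, and the whitespace-splitting helper getNGramsOfText are all hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NGramDemo {
    // Hypothetical sketch: slide a window of nGramLength characters across
    // one word and collect each substring. A LinkedHashSet drops duplicates
    // (the n-grams are set elements) while keeping first-seen order.
    static Set<String> getNGrams(String part, int nGramLength) {
        if (nGramLength <= 0) {
            throw new IllegalArgumentException("nGramLength must be positive");
        }
        Set<String> grams = new LinkedHashSet<>();
        // Assumption: a word shorter than nGramLength yields no n-grams.
        for (int i = 0; i + nGramLength <= part.length(); i++) {
            grams.add(part.substring(i, i + nGramLength));
        }
        return grams;
    }

    // Hypothetical helper: the n-gram set of a whole text is the union of the
    // n-grams of its words, so no n-gram spans a word boundary.
    static Set<String> getNGramsOfText(String text, int nGramLength) {
        Set<String> grams = new LinkedHashSet<>();
        for (String word : text.trim().split("\\s+")) {
            grams.addAll(getNGrams(word, nGramLength));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(getNGrams("AUTOMATIC", 2)); // [AU, UT, TO, OM, MA, AT, TI, IC]
    }
}
```

Because each word is processed separately, getNGramsOfText("to be", 2) gives {to, be} and never the cross-word gram "ob".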

Geoffrey Hendrey

Hi Geoffrey,

I wonder what the implementation of getNGrams(String part, int nGramLength) looks like, since n-grams are the set elements and the parameter part is just a word here...

Cheers, jp

Jean-Pierre