Word Similarity/Distance
- measures the similarity or difference between two words based on their definitions/senses
- useful in Information Retrieval (IR), Question Answering, Machine Translation, etc.
Word Similarity/Distance - Measures
| Measure | Description |
|---|---|
| Path Similarity | -𝑙𝑜𝑔(𝑝𝑎𝑡ℎ𝑙𝑒𝑛(𝑐1,𝑐2)), where 𝑝𝑎𝑡ℎ𝑙𝑒𝑛(𝑐1,𝑐2) is the number of edges in the shortest path in the thesaurus graph between synsets 𝑐1 and 𝑐2 |
| Resnik Similarity | -𝑙𝑜𝑔𝐏(𝐿𝐶𝑆(𝑐1,𝑐2)), where 𝐏(𝑐) is the probability that a randomly selected word in a corpus is an instance of concept 𝑐 |
| Lin Similarity | [2·𝑙𝑜𝑔𝐏(𝐿𝐶𝑆(𝑐1,𝑐2))] / [𝑙𝑜𝑔𝐏(𝑐1) + 𝑙𝑜𝑔𝐏(𝑐2)] |
| Jiang-Conrath Similarity | 1 / [2·𝑙𝑜𝑔𝐏(𝐿𝐶𝑆(𝑐1,𝑐2)) - (𝑙𝑜𝑔𝐏(𝑐1) + 𝑙𝑜𝑔𝐏(𝑐2))] |
| Lesk Similarity | 𝛴𝑟,𝑞∊𝑅𝐸𝐿𝑆 [𝑜𝑣𝑒𝑟𝑙𝑎𝑝(𝑔𝑙𝑜𝑠𝑠(𝑟(𝑐1)), 𝑔𝑙𝑜𝑠𝑠(𝑞(𝑐2)))] |
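
As a rough illustration of the path-based measure, the sketch below computes 𝑝𝑎𝑡ℎ𝑙𝑒𝑛 with a breadth-first search over a small hand-made hierarchy and then applies -𝑙𝑜𝑔. The `PARENT` map and the concept names are hypothetical and are not taken from any real thesaurus or from the example tree in these notes. Because 𝑝𝑎𝑡ℎ𝑙𝑒𝑛 is counted in edges here, identical concepts would give -𝑙𝑜𝑔(0); the sketch returns infinity in that case, which is an assumption of this example (some formulations instead add 1 to the edge count).

```python
import math
from collections import deque

# Hypothetical toy hierarchy (child -> parent), for illustration only.
PARENT = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "money",
    "money": "medium_of_exchange",
    "credit_card": "medium_of_exchange",
}


def pathlen(c1, c2):
    """Number of edges on the shortest path between c1 and c2."""
    # Treat the hierarchy as an undirected graph.
    graph = {}
    for child, parent in PARENT.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)

    # Breadth-first search from c1 until c2 is reached.
    queue = deque([(c1, 0)])
    seen = {c1}
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path between {c1!r} and {c2!r}")


def path_similarity(c1, c2):
    """-log(pathlen(c1, c2)); infinite for identical concepts (assumption)."""
    n = pathlen(c1, c2)
    if n == 0:
        return math.inf
    return -math.log(n)


print(pathlen("nickel", "dime"))         # 2 edges: nickel - coin - dime
print(pathlen("nickel", "credit_card"))  # 4 edges via medium_of_exchange
print(path_similarity("nickel", "dime") > path_similarity("nickel", "credit_card"))
```

Shorter paths give larger (less negative) scores, so in this toy hierarchy "nickel" comes out closer to "dime" than to "credit_card".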
𝐿𝐶𝑆
- lowest common subsumer of 2 concepts
- 𝐿𝐶𝑆(𝑐1,𝑐2) = lowest node in the hierarchy that subsumes both 𝑐1 and 𝑐2
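
A minimal sketch of how 𝐿𝐶𝑆 might be found in a tree-shaped hierarchy, and how the Resnik, Lin, and Jiang-Conrath formulas use it. The `PARENT` map and the probabilities in `P` are made-up values for illustration only (chosen so that each parent's probability is at least that of its children and the root has probability 1). The functions assume two distinct, non-root concepts; returning infinity from `jiang_conrath` when the distance is zero is a convention of this sketch, not part of the formula.

```python
import math

# Hypothetical toy hierarchy (child -> parent), for illustration only.
PARENT = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "money",
    "money": "medium_of_exchange",
    "credit_card": "medium_of_exchange",
}

# Hypothetical concept probabilities P(c); not estimated from a real corpus.
P = {
    "medium_of_exchange": 1.0,
    "money": 0.6,
    "credit_card": 0.4,
    "coin": 0.3,
    "dime": 0.2,
    "nickel": 0.1,
}


def ancestors(c):
    """Return c followed by its ancestors, ordered from c up to the root."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(c1, c2):
    """Lowest node in the hierarchy that subsumes both c1 and c2."""
    above_c1 = set(ancestors(c1))
    # Walk upward from c2; the first node also above c1 is the lowest one.
    for node in ancestors(c2):
        if node in above_c1:
            return node
    raise ValueError(f"{c1!r} and {c2!r} share no ancestor")


def resnik(c1, c2):
    return -math.log(P[lcs(c1, c2)])


def lin(c1, c2):
    return 2 * math.log(P[lcs(c1, c2)]) / (math.log(P[c1]) + math.log(P[c2]))


def jiang_conrath(c1, c2):
    distance = 2 * math.log(P[lcs(c1, c2)]) - (math.log(P[c1]) + math.log(P[c2]))
    # Identical concepts give distance 0; treat that as infinite similarity.
    return math.inf if distance == 0 else 1 / distance


print(lcs("nickel", "dime"))         # coin
print(lcs("nickel", "credit_card"))  # medium_of_exchange
print(resnik("nickel", "dime"), lin("nickel", "dime"), jiang_conrath("nickel", "dime"))
```

With these made-up probabilities, all three measures rate "nickel" and "dime" as more similar than "nickel" and "credit_card", because the first pair shares a more specific (lower-probability) subsumer.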
Example Thesaurus Tree
[Figure: example thesaurus tree (word-similarity-thesaurus-tree-example.png)]