Word Similarity/Distance
  • measuring the similarity/difference between 2 words based on their definitions/senses
  • useful in Information Retrieval (IR), Question Answering, Machine Translation, etc.

Word Similarity/Distance - Measures

Measure                     Description
Path Similarity             -𝑙𝑜𝑔(𝑝𝑎𝑡ℎ𝑙𝑒𝑛(𝑐1,𝑐2))
Resnik Similarity           -𝑙𝑜𝑔𝐏(𝐿𝐶𝑆(𝑐1,𝑐2))
Lin Similarity              [2·𝑙𝑜𝑔𝐏(𝐿𝐶𝑆(𝑐1,𝑐2))] / [𝑙𝑜𝑔𝐏(𝑐1) + 𝑙𝑜𝑔𝐏(𝑐2)]
Jiang-Conrath Similarity    1 / [2·𝑙𝑜𝑔𝐏(𝐿𝐶𝑆(𝑐1,𝑐2)) - (𝑙𝑜𝑔𝐏(𝑐1) + 𝑙𝑜𝑔𝐏(𝑐2))]
Lesk Similarity             𝛴𝑟,𝑞∊𝑅𝐸𝐿𝑆 [𝑜𝑣𝑒𝑟𝑙𝑎𝑝(𝑔𝑙𝑜𝑠𝑠(𝑟(𝑐1)), 𝑔𝑙𝑜𝑠𝑠(𝑞(𝑐2)))]

  • 𝑝𝑎𝑡ℎ𝑙𝑒𝑛(𝑐1,𝑐2) is the number of edges in the shortest path in the thesaurus graph between synsets 𝑐1 and 𝑐2
  • 𝐏(𝑐) is the probability that a randomly chosen word in a corpus is an instance of concept 𝑐
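
The path-based and probability-based formulas above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function names and the probability values are hypothetical, and the probabilities would normally be estimated from corpus counts.

```python
import math

def path_similarity(pathlen):
    """-log(pathlen). Assumes pathlen >= 1; log(0) is undefined."""
    return -math.log(pathlen)

def resnik_similarity(p_lcs):
    """-log P(LCS(c1, c2)): the information content of the LCS."""
    return -math.log(p_lcs)

def lin_similarity(p_lcs, p_c1, p_c2):
    """2 * log P(LCS) / (log P(c1) + log P(c2))."""
    return 2 * math.log(p_lcs) / (math.log(p_c1) + math.log(p_c2))

def jiang_conrath_similarity(p_lcs, p_c1, p_c2):
    """1 / distance, where distance = 2 * log P(LCS) - (log P(c1) + log P(c2)).

    The distance is 0 when c1, c2 and their LCS all have the same
    probability, so that case has to be handled separately by the caller.
    """
    distance = 2 * math.log(p_lcs) - (math.log(p_c1) + math.log(p_c2))
    return 1 / distance

# Hypothetical probabilities, chosen only for illustration.
# The LCS subsumes both concepts, so P(LCS) >= P(c1) and P(LCS) >= P(c2).
P_C1, P_C2, P_LCS = 0.01, 0.02, 0.1

print("Resnik:       ", resnik_similarity(P_LCS))
print("Lin:          ", lin_similarity(P_LCS, P_C1, P_C2))
print("Jiang-Conrath:", jiang_conrath_similarity(P_LCS, P_C1, P_C2))
```

A rarer (more specific) LCS gives a higher Resnik score, and the Lin score reaches 1 when both concepts have the same probability as their LCS.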

𝐿𝐶𝑆
  • lowest common subsumer of 2 concepts
  • 𝐿𝐶𝑆(𝑐1,𝑐2) = lowest node in the hierarchy that subsumes both 𝑐1 and 𝑐2
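
As a rough sketch of the definition above, the LCS can be found by walking up from one concept until a node that also subsumes the other concept is reached. The hierarchy below is a made-up toy tree, and the helper names are hypothetical. It assumes each concept has a single parent and that a concept subsumes itself; a real thesaurus can give a concept more than one parent.

```python
# Hypothetical toy hierarchy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}

def ancestors(node):
    """Return the node and every node that subsumes it, bottom-up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def lcs(c1, c2):
    """Lowest node in the hierarchy that subsumes both c1 and c2."""
    subsumers_of_c2 = set(ancestors(c2))
    for node in ancestors(c1):  # walks upward, so the first hit is the lowest
        if node in subsumers_of_c2:
            return node
    return None  # only reachable if the two nodes share no root

def pathlen(c1, c2):
    """Edges on the shortest path between c1 and c2 (in a tree it runs via the LCS)."""
    common = lcs(c1, c2)
    return ancestors(c1).index(common) + ancestors(c2).index(common)

print(lcs("dog", "cat"), pathlen("dog", "cat"))
print(lcs("dog", "oak"), pathlen("dog", "oak"))
```

In this toy tree, "dog" and "cat" meet at "animal" after 2 edges, while "dog" and "oak" only meet at the root "entity" after 4 edges.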
Example Thesaurus Tree