Lesk Algorithm
- used for Word Sense Disambiguation (WSD)
- uses a dictionary or thesaurus as an indirect kind of supervision: choose the sense whose gloss (definition) shares the most words with the target word's neighborhood (the words around it)
Lesk Algorithm - Pseudocode
```python
# returns best sense of word
# pseudocode: the helper functions are left abstract
def simplified_lesk(word, sentence):
    best_sense = most_frequent_sense_for(word)  # fallback if no sense overlaps
    max_overlap = 0
    context = set_of_words_in(sentence)
    for sense in senses_of(word):
        signature = set_of_words_in_gloss_and_example_sentences_of(sense)
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = sense
    return best_sense
```

Lesk Algorithm - Example
Given sentence:
The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.
Find the sense of bank used in this sentence. Below are the gloss and example sentences for each sense of the word bank:
bank1
gloss
- a financial institution that accepts deposits and channels the money into lending activities
examples
- he cashed a check at the bank
- that bank holds the mortgage on my home
bank2
gloss
- sloping land (especially the slope beside a body of water)
examples
- they pulled the canoe up on the bank
- he sat on the bank of the river and watched the currents

Based on the Lesk Algorithm, we have:
- bank1: 2 content words overlap (deposits, mortgage)
- bank2: 0 content words overlap
Stopwords and the target word bank itself are not counted. Thus, we choose bank1 as the most probable sense of bank in the sentence.
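The pseudocode and the worked example can be checked with a small runnable sketch. It is not a reference implementation: it assumes a hypothetical toy sense inventory (`BANK_SENSES`) built from the glosses and examples listed above, a small illustrative stopword list (`STOPWORDS`), and a simple regex tokenizer (`content_words`, also a name introduced here). Unlike the pseudocode, the sense inventory is passed in explicitly; a real system would take senses from a dictionary such as WordNet.

```python
import re

# Small illustrative stopword list (an assumption; real systems use a fuller list).
STOPWORDS = {
    "a", "an", "and", "at", "because", "can", "he", "in", "into", "it",
    "my", "of", "on", "that", "the", "they", "up", "will",
}

# Hypothetical toy sense inventory: sense name -> [gloss, example sentences...],
# copied from the bank example. Listed most frequent sense first.
BANK_SENSES = {
    "bank1": [
        "a financial institution that accepts deposits and channels the money into lending activities",
        "he cashed a check at the bank",
        "that bank holds the mortgage on my home",
    ],
    "bank2": [
        "sloping land (especially the slope beside a body of water)",
        "they pulled the canoe up on the bank",
        "he sat on the bank of the river and watched the currents",
    ],
}


def content_words(text, target):
    """Lowercased word tokens of `text`, minus stopwords and the target word itself."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return {t for t in tokens if t not in STOPWORDS and t != target}


def simplified_lesk(word, sentence, senses):
    """Return (best_sense, {sense: overlap}) for `word` in `sentence`."""
    best_sense = next(iter(senses))  # fallback: first (most frequent) sense
    max_overlap = 0
    context = content_words(sentence, word)
    overlaps = {}
    for sense, texts in senses.items():
        # signature = content words of the gloss plus its example sentences
        signature = set().union(*(content_words(t, word) for t in texts))
        overlaps[sense] = len(signature & context)
        if overlaps[sense] > max_overlap:  # strict >, so ties keep the earlier sense
            max_overlap = overlaps[sense]
            best_sense = sense
    return best_sense, overlaps


sentence = ("The bank can guarantee deposits will eventually cover future tuition "
            "costs because it invests in adjustable-rate mortgage securities")
print(simplified_lesk("bank", sentence, BANK_SENSES))
# ('bank1', {'bank1': 2, 'bank2': 0})
```

Because the comparison is a strict `>`, a tie never replaces the fallback, which mirrors the most-frequent-sense default in the pseudocode.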
Lesk Algorithm - Improvements
- expand each sense's signature with related words (e.g. hyponyms)
- apply a weight to each overlapping word instead of just counting it
  - e.g. its inverse document frequency (idf) value
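The idf weighting can be sketched as follows. This is a minimal sketch under stated assumptions: each gloss and each example sentence is treated as one "document" when computing idf(w) = log(N / df(w)), the stopword list and tokenizer are illustrative, and `idf_table` and `weighted_overlap` are hypothetical helper names. In practice idf would usually be estimated from a much larger collection.

```python
import math
import re

# Illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {
    "a", "an", "and", "at", "because", "can", "he", "in", "into", "it",
    "my", "of", "on", "that", "the", "they", "up", "will",
}


def content_words(text, target=""):
    """Lowercased word tokens of `text`, minus stopwords and the target word."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return {t for t in tokens if t not in STOPWORDS and t != target}


def idf_table(documents):
    """idf(w) = log(N / df(w)), where df(w) = number of documents containing w."""
    n = len(documents)
    df = {}
    for doc in documents:
        for w in content_words(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / count) for w, count in df.items()}


def weighted_overlap(signature, context, idf):
    """Sum the idf of each overlapping word instead of counting it; unknown words weigh 0."""
    return sum(idf.get(w, 0.0) for w in signature & context)


# Assumption: the six bank glosses/examples serve as the idf "documents".
docs = [
    "a financial institution that accepts deposits and channels the money into lending activities",
    "he cashed a check at the bank",
    "that bank holds the mortgage on my home",
    "sloping land (especially the slope beside a body of water)",
    "they pulled the canoe up on the bank",
    "he sat on the bank of the river and watched the currents",
]
idf = idf_table(docs)
sentence = ("The bank can guarantee deposits will eventually cover future tuition "
            "costs because it invests in adjustable-rate mortgage securities")
context = content_words(sentence, "bank")
bank1_signature = content_words(" ".join(docs[:3]), "bank")  # bank1 gloss + examples
print(weighted_overlap(bank1_signature, context, idf))  # 2 * log(6), about 3.58
print(idf["bank"] < idf["deposits"])  # True: a common word gets a lower weight
```

The effect is that a rare shared word such as deposits contributes more to a sense's score than a word that appears in many glosses.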