Turney Algorithm
- learns phrase polarity rather than just word polarity
- learns domain-specific information
- process:
  - extract a phrasal lexicon from reviews
  - learn the polarity of each phrase
  - rate a review by the average polarity of its phrases
- intuition:
  - positive phrases co-occur more with “excellent”
  - negative phrases co-occur more with “poor”
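The extraction step can be sketched as follows. This is a minimal illustration, not the full procedure: it assumes the review is already tokenized and PoS-tagged, and it keeps only the adjective-noun pattern (JJ followed by NN or NNS) that tags most rows of the example table in these notes. The function name `extract_phrases` and the sample sentence are hypothetical.

```python
from typing import List, Tuple

# Assumption: input is a list of (token, Penn Treebank tag) pairs
# produced by some external tagger; no tagger is invoked here.
TaggedToken = Tuple[str, str]


def extract_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Return two-word phrases whose tags are JJ followed by NN or NNS."""
    phrases = []
    # Walk over adjacent token pairs.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "JJ" and t2 in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}".lower())
    return phrases


# Hypothetical tagged sentence, for illustration only.
sample = [("The", "DT"), ("online", "JJ"), ("service", "NN"),
          ("has", "VBZ"), ("low", "JJ"), ("fees", "NNS")]
print(extract_phrases(sample))
```

A fuller version would likely accept more tag patterns; the single pattern here is only the one most visible in the notes.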
Pointwise mutual information
- how much more do 2 events co-occur than if they were independent?
- 𝐼(𝑋=𝑥,𝑌=𝑦) = 𝑙𝑜𝑔 [ 𝐏(𝑋=𝑥,𝑌=𝑦) / [𝐏(𝑋=𝑥)𝐏(𝑌=𝑦)] ]
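As a quick numeric check of the definition, the sketch below computes the log ratio directly from probabilities. The function name `pmi` and the choice of base-2 logarithm are assumptions (the notes do not fix a base; a different base only rescales the values).

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Log of the joint probability over the product of the marginals."""
    # Assumption: base-2 log; the notes write only "log".
    return math.log2(p_xy / (p_x * p_y))


# Independent events: joint equals the product of marginals, so PMI is 0.
print(pmi(p_xy=0.125, p_x=0.5, p_y=0.25))

# Events that co-occur twice as often as independence predicts: PMI is 1 bit.
print(pmi(p_xy=0.25, p_x=0.5, p_y=0.25))
```

A positive value means the two events co-occur more often than chance, a negative value less often.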
pointwise mutual information between 2 words:
- how much more do 2 words co-occur than if they were independent?
- 𝐼(𝑤𝑜𝑟𝑑1,𝑤𝑜𝑟𝑑2) = 𝑙𝑜𝑔 [ 𝐏(𝑤𝑜𝑟𝑑1,𝑤𝑜𝑟𝑑2) / [𝐏(𝑤𝑜𝑟𝑑1)𝐏(𝑤𝑜𝑟𝑑2)] ]
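How to estimate pointwise mutual information
- ℎ𝑖𝑡𝑠(𝑤𝑜𝑟𝑑) = number of search-engine hits for a query on the word; 𝑤𝑜𝑟𝑑1 𝑁𝐸𝐴𝑅 𝑤𝑜𝑟𝑑2 matches documents where the two occur close together; 𝑁 is a normalizing constant that cancels in the PMI ratio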
- 𝐏(𝑤𝑜𝑟𝑑) = ℎ𝑖𝑡𝑠(𝑤𝑜𝑟𝑑) / 𝑁
- 𝐏(𝑤𝑜𝑟𝑑1,𝑤𝑜𝑟𝑑2) = ℎ𝑖𝑡𝑠(𝑤𝑜𝑟𝑑1 𝑁𝐸𝐴𝑅 𝑤𝑜𝑟𝑑2) / 𝑁²
- 𝐼(𝑤𝑜𝑟𝑑1,𝑤𝑜𝑟𝑑2) = 𝑙𝑜𝑔 [ ℎ𝑖𝑡𝑠(𝑤𝑜𝑟𝑑1 𝑁𝐸𝐴𝑅 𝑤𝑜𝑟𝑑2) / [ℎ𝑖𝑡𝑠(𝑤𝑜𝑟𝑑1) ℎ𝑖𝑡𝑠(𝑤𝑜𝑟𝑑2)] ]
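The sketch below illustrates why 𝑁 drops out of the final estimate: computing PMI through the two probability estimates and computing it directly from hit counts give the same number. All counts are made-up values, the function names are hypothetical, and base-2 log is an assumption.

```python
import math


def pmi_via_probabilities(hits_near: int, hits_w1: int, hits_w2: int, n: int) -> float:
    """PMI using the probability estimates as written in the notes."""
    p_joint = hits_near / n**2   # P(word1, word2) estimate
    p_w1 = hits_w1 / n           # P(word1) estimate
    p_w2 = hits_w2 / n           # P(word2) estimate
    return math.log2(p_joint / (p_w1 * p_w2))


def pmi_from_hits(hits_near: int, hits_w1: int, hits_w2: int) -> float:
    """PMI from raw hit counts, after the N terms cancel."""
    return math.log2(hits_near / (hits_w1 * hits_w2))


# Made-up hit counts, for illustration only.
a = pmi_via_probabilities(hits_near=50, hits_w1=1_000, hits_w2=2_000, n=1_000_000)
b = pmi_from_hits(hits_near=50, hits_w1=1_000, hits_w2=2_000)
print(math.isclose(a, b))
```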
Does the phrase appear more with “poor” or “excellent”?
- 𝑃𝑜𝑙𝑎𝑟𝑖𝑡𝑦(𝑝ℎ𝑟𝑎𝑠𝑒) = 𝐼(𝑝ℎ𝑟𝑎𝑠𝑒,“excellent”) - 𝐼(𝑝ℎ𝑟𝑎𝑠𝑒,“poor”)
- 𝑃𝑜𝑙𝑎𝑟𝑖𝑡𝑦(𝑝ℎ𝑟𝑎𝑠𝑒) = 𝑙𝑜𝑔 [ ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒 𝑁𝐸𝐴𝑅 “excellent”) / [ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒) ℎ𝑖𝑡𝑠(“excellent”)] ] - 𝑙𝑜𝑔 [ ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒 𝑁𝐸𝐴𝑅 “poor”) / [ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒) ℎ𝑖𝑡𝑠(“poor”)] ]
- 𝑃𝑜𝑙𝑎𝑟𝑖𝑡𝑦(𝑝ℎ𝑟𝑎𝑠𝑒) = 𝑙𝑜𝑔 [ [ ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒 𝑁𝐸𝐴𝑅 “excellent”) ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒) ℎ𝑖𝑡𝑠(“poor”) ] / [ ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒) ℎ𝑖𝑡𝑠(“excellent”) ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒 𝑁𝐸𝐴𝑅 “poor”) ] ]
- 𝑃𝑜𝑙𝑎𝑟𝑖𝑡𝑦(𝑝ℎ𝑟𝑎𝑠𝑒) = 𝑙𝑜𝑔 [ [ ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒 𝑁𝐸𝐴𝑅 “excellent”) ℎ𝑖𝑡𝑠(“poor”) ] / [ ℎ𝑖𝑡𝑠(𝑝ℎ𝑟𝑎𝑠𝑒 𝑁𝐸𝐴𝑅 “poor”) ℎ𝑖𝑡𝑠(“excellent”) ] ]
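The simplified form on the last line translates into a few lines of code. This is a sketch under stated assumptions: the hit counts would come from some search index that is not modeled here, the base-2 log is an assumption, and the small constant `eps` added to the NEAR counts is an addition of this sketch so that a zero count does not cause a division by zero or a log of zero.

```python
import math


def polarity(near_excellent: float, near_poor: float,
             hits_excellent: float, hits_poor: float,
             eps: float = 0.01) -> float:
    """Phrase polarity from hit counts, using the simplified ratio.

    hits(phrase) does not appear because it cancels between the two PMI terms.
    eps is a smoothing constant (an assumption of this sketch).
    """
    numerator = (near_excellent + eps) * hits_poor
    denominator = (near_poor + eps) * hits_excellent
    return math.log2(numerator / denominator)


# Made-up counts: the phrase appears near "excellent" four times as often.
print(polarity(near_excellent=40, near_poor=10, hits_excellent=1000, hits_poor=1000) > 0)

# Made-up counts: the phrase appears near "poor" four times as often.
print(polarity(near_excellent=10, near_poor=40, hits_excellent=1000, hits_poor=1000) < 0)
```

The ratio hits(“poor”) / hits(“excellent”) acts as a correction for one seed word simply being more frequent than the other.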
Example Phrases From a Thumbs-Up Review
| Phrase | PoS Tags | Polarity |
|---|---|---|
| online service | JJ NN | 2.8 |
| online experience | JJ NN | 2.3 |
| direct deposit | JJ NN | 1.3 |
| local branch | JJ NN | 0.42 |
| … | … | … |
| low fees | JJ NNS | 0.33 |
| true service | JJ NN | -0.73 |
| other bank | JJ NN | -0.85 |
| inconveniently located | RB VBN | -1.5 |
| AVERAGE | | 0.32 |
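The final rating step, averaging the phrase polarities, can be sketched as below. The sketch uses only the eight rows listed in the table; the elided rows are not available, so its average differs from the 0.32 reported for the full review. The function name `rate_review` and the decision threshold of 0 are assumptions.

```python
from statistics import mean
from typing import List, Tuple

# Polarity values copied from the rows shown in the table (elided rows omitted).
shown_polarities = {
    "online service": 2.8,
    "online experience": 2.3,
    "direct deposit": 1.3,
    "local branch": 0.42,
    "low fees": 0.33,
    "true service": -0.73,
    "other bank": -0.85,
    "inconveniently located": -1.5,
}


def rate_review(polarities: List[float]) -> Tuple[float, str]:
    """Average the phrase polarities and map the sign to a label."""
    avg = mean(polarities)
    # Assumption: a positive average means thumbs up, otherwise thumbs down.
    label = "thumbs up" if avg > 0 else "thumbs down"
    return avg, label


avg, label = rate_review(list(shown_polarities.values()))
print(f"average polarity over shown rows: {avg:.2f} ({label})")
```

Note how a few strongly negative phrases do not flip the label as long as the positive phrases outweigh them on average.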