Rule-Based PoS Tagging
- a type of Part-of-Speech (PoS) Tagging that uses hand-written rules to pick the correct tag for each word
- the rules are not learned automatically from training data (they must be created manually)
Process
given:
- dictionary of words with their corresponding PoS Tags (a single word may have multiple tags)
- set of rules that selectively remove PoS Tags
- input sentence to be PoS Tagged
steps:
- according to the dictionary, assign each word of the input sentence all possible PoS Tags
- according to the set of rules, remove tags from the words of the sentence, until each word has exactly one PoS Tag
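a minimal Python sketch of the two steps above. the names (`assign_candidates`, `apply_rules`, `no_verb_after_determiner`), the toy dictionary, and the rule itself are assumptions made for illustration, not part of any particular tagger:

```python
from typing import Callable, Dict, List, Set

# A rule looks at the candidate tags of the whole sentence and may remove
# tags from the word at position i. It returns True if it removed a tag.
Rule = Callable[[List[Set[str]], int], bool]


def assign_candidates(words: List[str], lexicon: Dict[str, Set[str]]) -> List[Set[str]]:
    """Step 1: give each word every tag the dictionary lists for it."""
    return [set(lexicon[w.lower()]) for w in words]


def apply_rules(candidates: List[Set[str]], rules: List[Rule]) -> List[Set[str]]:
    """Step 2: apply the rules repeatedly until no rule removes a tag."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i in range(len(candidates)):
                if rule(candidates, i):
                    changed = True
    return candidates


def no_verb_after_determiner(candidates: List[Set[str]], i: int) -> bool:
    """Hypothetical rule: drop VB when the previous word can only be DT."""
    if (i > 0 and candidates[i - 1] == {"DT"}
            and "VB" in candidates[i] and len(candidates[i]) > 1):
        candidates[i].discard("VB")
        return True
    return False


lexicon = {"the": {"DT"}, "bill": {"NN", "VB"}}  # toy dictionary (assumed)
tags = apply_rules(assign_candidates(["the", "bill"], lexicon),
                   [no_verb_after_determiner])
print(tags)  # [{'DT'}, {'NN'}]
```

the loop stops once no rule fires, so with an incomplete rule set some words may still carry more than one tag. each rule here refuses to remove a word's last remaining tag, which keeps every word tagged.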
Example
given input sentence:
She promised to back the bill
according to the dictionary, we assign each word its possible PoS Tags:
| She | promised | to | back | the | bill |
|---|---|---|---|---|---|
| PRP | VBN, VBD | TO | NN | DT | VB |
according to the set of rules, we remove tags
say we come across a rule like below:
Eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP”
we then remove VBN from the tags of the word "promised", leaving VBD, like so:
| She | promised | to | back | the | bill |
|---|---|---|---|---|---|
| PRP | VBD | TO | NN | DT | VB |
we continue until each word has exactly one PoS Tag
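
the rule from this example can be written as a small Python function. this is only a sketch: the function name is made up, "<start> PRP" is read as "the first word of the sentence can only be PRP", and the candidate list models only the VBN/VBD ambiguity of "promised":

```python
from typing import List, Set


def eliminate_vbn_after_initial_prp(candidates: List[Set[str]]) -> List[Set[str]]:
    """Eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP"."""
    result = [set(tags) for tags in candidates]  # work on a copy
    # "<start> PRP": the sentence begins with an unambiguous pronoun.
    if len(result) >= 2 and result[0] == {"PRP"}:
        # The second word must offer both VBN and VBD for the rule to fire.
        if {"VBN", "VBD"} <= result[1]:
            result[1].discard("VBN")
    return result


words = ["She", "promised", "to", "back", "the", "bill"]
# Candidate tags per word (assumed; only "promised" is ambiguous here).
candidates = [{"PRP"}, {"VBN", "VBD"}, {"TO"}, {"NN"}, {"DT"}, {"VB"}]

after = eliminate_vbn_after_initial_prp(candidates)
for word, tags in zip(words, after):
    print(word, sorted(tags))
```

after the call, "promised" keeps only VBD and every other word is unchanged. in a full tagger this function would be one entry in the rule set, applied alongside the others until no ambiguity remains.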