Transformation-Based PoS Tagging

a type of Part-of-Speech (PoS) Tagging
the pure rule-based tagging approach is too expensive, slow, tedious
basic idea is to do a poor job first, and then use learned rules to improve things
rules can be auto-learned by feeding it training data (unlike the rule-based tagging)

Process

given:

dictionary of words with their corresponding PoS Tag (unlike rule-based tagging, each word has exactly one tag, the mostly likely tag of the word)
set of rules that selectively transforms PoS Tags (unlike rule-based tagging, the rules transforms PoS tags instead of removing them)
input sentence to be PoS Tagged

steps:

according to the dictionary, assign each word of the input sentence the most likely PoS Tag
according to the set of rules, transform tags of sentence until no more transformations are made (after each transformation we return to the first rule again and continue “transforming” until a single pass through the entire set of rules have been made without any transforms)

Example

based on the probabilities below, the most probable PoS Tag for the word race is NN

𝐏(NN|race) = 0.98
𝐏(VB|race) = 0.02

thus in the dictionary we have:

race → NN

we do this for each word

now let’s consider the 2 input sentences:

is expected to race tomorrow
the race for outer space

we tag the sentences based on the dictionary:

is/VBZ expected/VBN to/TO race/NN tomorrow/NN
the/DT race/NN for/IN outer/JJ space/NN

according to the set of rules, we transform tags

say we come across a rule like below:

Change NN to VB when the previous tag is TO

for the first sentence we transform race/NN into race/VB
for the second sentence we do nothing for this rule

is/VBZ expected/VBN to/TO race/VB tomorrow/NN
the/DT race/NN for/IN outer/JJ space/NN

then return to the first rule and continue “transforming” until a single pass through the entire set of rules have been made without any transforms

Learning (How to Learn the Set of Rules) - Brill’s Transformation Based Learning (TLB)

steps:

tag the corpus with the most likely tag for each word (unigram model)
choose a transformation that deterministically replaces an existing tag with a new tag such that the resulting tagged training corpus has the lowest error rate out of all transformations
apply that transformation to the training set
iterate
return as your tagger one that:
1. first tags using unigrams and then
2. applies the learned transformations in order

Brill’s Transformation Based Learning (TLB) - Pseudocode

function TBL(corpus) returns transforms-queue {
	INITIALIZE-WITH-MOST-LIKELY-TAGS(corpus)
	until end condition is met do
		templates ← GENERATE-POTENTIAL-RELEVANT-TEMPLATES
		best-transform ← GET-BEST-TRANSFORM(corpus, templates)
		APPLY-TRANSFORM(best-transform, corpus)
		ENQUEUE(best-transform-rule, transforms-queue)
	end
	return transforms-queue
}

function GET-BEST-TRANDFORM(corpus, templates) returns transform {
	foreach template in templates
		(instance, score) ← GET-BEST-INSTANCE(corpus, template)
		if (score > best-transform.score) {
			best-transform ← (instance, score)
		}
	return best-transform
}

function GET-BEST-INSTANCE(corpus, template) returns transform {
	for from-tag ← from tag1 to tagn do
		for to-tag ← from tag1 to tagn do
			for pos ← from 1 to corpus-size do
				if (correct-tag(pos) == to-tag && current-tag(pos) == from-tag)
					num-good-transforms(current-tag(pos-1))++
				else if (correct-tag(pos) == from-tag && current-tag(pos) == from-tag)
					num-bad-transforms(current-tag(pos-1))++
				end
				best-Z ← ARGMAXᵢ(num-good-transforms(i) - num-bad-transforms(i))
				tmp = num-good-transforms(best-Z) - num-bad-transforms(best-Z)
				if (temp > best-intance.score)
					best.rule ← "Change tag from " + from-tag + " to " + to-tag + " if prev tag is " + best-Z
					best.score ← num-good-transforms(best-Z) - num-bad-transforms(best-Z)
	return best
}

function APPLY-TRANSFORM(transform, corpus) {
	for pos ← from 1 to corpus-size do
		if (current-tag(pos) == best-rule-from && current-tag(pos-1) == best-rule-prev)
			current-tag(pos) ← best-rule-to
}

／var／log marcus chiu

Explorer

Transformation-Based PoS Tagging

Transformation-Based PoS Tagging

Process

Example

Learning (How to Learn the Set of Rules) - Brill’s Transformation Based Learning (TLB)

Brill’s Transformation Based Learning (TLB) - Pseudocode

／var／logmarcus chiu

Explorer

Transformation-Based PoS Tagging

Transformation-Based PoS Tagging

Process

Example

Learning (How to Learn the Set of Rules) - Brill’s Transformation Based Learning (TLB)

Brill’s Transformation Based Learning (TLB) - Pseudocode

／var／log marcus chiu