Weakly Supervised RE (Bootstrapping)

The idea here is to either:

start out with a set of hand-crafted rules and automatically find new ones from the unlabeled text data through an iterative process (bootstrapping), or
start out with a set of seed tuples, i.e. entity pairs known to stand in a specific relation (e.g. seed={(ORG:IBM, LOC:Armonk), (ORG:Microsoft, LOC:Redmond)} lists entities in the relation “based in”), and use them to find new patterns and tuples in the unlabeled text
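
To make the seed idea concrete, here is a minimal Python sketch of how the seed set from the example could be represented and matched against raw sentences. The names `Entity`, `find_occurrences` and the three-sentence corpus are assumptions made up for illustration, and plain substring matching stands in for proper entity matching.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    label: str  # entity type, e.g. "ORG" or "LOC"
    text: str   # surface form, e.g. "IBM"


# Seed tuples for the "based in" relation, taken from the example above.
seeds = {
    (Entity("ORG", "IBM"), Entity("LOC", "Armonk")),
    (Entity("ORG", "Microsoft"), Entity("LOC", "Redmond")),
}


def find_occurrences(sentences, seeds):
    """Return (sentence, pair) for each sentence mentioning both entities of a seed pair."""
    hits = []
    for sentence in sentences:
        for org, loc in sorted(seeds):
            # Naive substring test; a real system would match tagged entity spans.
            if org.text in sentence and loc.text in sentence:
                hits.append((sentence, (org, loc)))
    return hits


# Hypothetical toy corpus.
corpus = [
    "IBM is based in Armonk.",
    "Microsoft opened a new office in Dublin.",
    "Microsoft, headquartered in Redmond, reported earnings.",
]
hits = find_occurrences(corpus, seeds)
```

Only the first and third sentences are returned, because the second mentions Microsoft without its seed location. Sentences like these are the raw material from which patterns are later derived.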

Snowball: Extracting relations from large plain-text collections

Snowball (Agichtein and Gravano, 2000) is an early example of an algorithm that does this:

1. Start with a set of seed tuples (or extract a seed set from the unlabeled text with a few hand-crafted rules).
2. Find occurrences in the unlabeled text that match the tuples and tag their entities with a named entity recognizer (NER).
3. Create patterns from these occurrences, e.g. “ORG is based in LOC”.
4. Apply the patterns to the text to generate new tuples, e.g. (ORG:Intel, LOC:Santa Clara), and add them to the seed set.
5. Go back to step 2, or terminate and use the patterns that were created for further extraction.
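
The loop above can be sketched in a few lines of Python. This is a heavily simplified illustration, not Snowball itself: a hypothetical gazetteer (`GAZETTEER`) stands in for the NER, a pattern is just the literal text between an ORG and a following LOC, and the confidence scoring that Snowball uses to filter unreliable patterns and tuples is left out entirely. All function names and the toy corpus are assumptions.

```python
import re

# Toy stand-in for a named entity recognizer: hypothetical gazetteers.
GAZETTEER = {
    "ORG": ["IBM", "Microsoft", "Intel", "Apple"],
    "LOC": ["Armonk", "Redmond", "Santa Clara", "Cupertino"],
}


def tag_entities(sentence):
    """Return (start, end, label, text) for every gazetteer match, sorted by position."""
    spans = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), sentence):
                spans.append((m.start(), m.end(), label, name))
    return sorted(spans)


def org_loc_pairs(sentence):
    """Yield (org, middle_context, loc) for each ORG that is followed by a LOC."""
    spans = tag_entities(sentence)
    for _, end1, label1, text1 in spans:
        if label1 != "ORG":
            continue
        for start2, _, label2, text2 in spans:
            if label2 == "LOC" and start2 >= end1:
                yield text1, sentence[end1:start2].strip(), text2


def bootstrap(corpus, seeds, max_iterations=5):
    tuples = set(seeds)
    patterns = set()
    for _ in range(max_iterations):
        # Steps 2 and 3: find occurrences of known tuples and keep their context as patterns.
        for sentence in corpus:
            for org, middle, loc in org_loc_pairs(sentence):
                if (org, loc) in tuples:
                    patterns.add(middle)
        # Step 4: apply the patterns to the text to propose new tuples.
        new_tuples = set()
        for sentence in corpus:
            for org, middle, loc in org_loc_pairs(sentence):
                if middle in patterns:
                    new_tuples.add((org, loc))
        # Step 5: stop once an iteration adds nothing new.
        if new_tuples <= tuples:
            break
        tuples |= new_tuples
    return tuples, patterns


# Hypothetical toy corpus.
corpus = [
    "IBM is based in Armonk.",
    "Microsoft, headquartered in Redmond, reported earnings.",
    "Intel is based in Santa Clara.",
    "Apple, headquartered in Cupertino, launched a phone.",
    "Apple opened a store in Armonk.",
]
seeds = {("IBM", "Armonk"), ("Microsoft", "Redmond")}
tuples, patterns = bootstrap(corpus, seeds)
```

On this corpus the two seed sentences yield the contexts "is based in" and ", headquartered in", which in turn pick up (Intel, Santa Clara) and (Apple, Cupertino). The sentence about a store in Armonk matches neither pattern, so (Apple, Armonk) is not added. Exact string matching on the context is brittle, and without some form of pattern scoring a single bad pattern can pull in wrong tuples that then generate more bad patterns.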
