I just added support for the lemmatisation of adjectives in the Dutch lemmatizer.
Adjectives are words that modify nouns. A Dutch adjective generally occurs in two forms: an undeclined one and a declined one ending in -e. I found a good description of the rules at Wikibooks.
Adjectives are also modified to form comparatives and superlatives. For example:
goed - beter - best.
Such special cases may be added as rules like:
beter=>goed
best=>goed
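As a minimal sketch of how such rules could be applied, the Python snippet below parses lines in the `form=>lemma` notation into a lookup table. The function names `parse_rules` and `lemmatize_exception` are made up for illustration and are not taken from the lemmatizer itself.

```python
def parse_rules(lines):
    """Parse 'form=>lemma' lines into a dict mapping each form to its lemma."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or "=>" not in line:
            continue  # skip blank or malformed lines
        form, lemma = line.split("=>", 1)
        rules[form.strip()] = lemma.strip()
    return rules


# The two special cases from the post; a real rule file would hold more.
EXCEPTIONS = parse_rules(["beter=>goed", "best=>goed"])


def lemmatize_exception(word):
    """Return the lemma for an irregular form, or None if no rule matches."""
    return EXCEPTIONS.get(word.lower())
```

A word that has no rule, such as "groot", yields None, so the caller can fall back to the regular adjective rules.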
Ordinal numbers may also be considered as adjectives, so the lemmatizer should propose the cardinal number of any ordinal number. For example:
drie - derde.
In most cases, the cardinal number is found by deleting the ending -de or -ste of the ordinal number.
For example:
twee - tweede
twintig - twintigste
Since this rule may only be applied to ordinal numbers, the lemmatizer can maintain a list of cardinal numbers to check against. This list need not be long, since it only has to cover the cardinal numbers in which the lemma might end.
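The suffix rule and the cardinal list could be combined roughly as in the Python sketch below. The names `CARDINAL_ENDINGS`, `IRREGULAR_ORDINALS` and `ordinal_to_cardinal` are hypothetical, the list of endings is only an illustrative sample, and "eerste" is added here as a second irregular form next to "derde".

```python
# Illustrative sample of cardinals that a cardinal number may end in.
# Compound numbers such as "drieëntwintig" are covered by "twintig".
CARDINAL_ENDINGS = (
    "twee", "vier", "vijf", "zes", "zeven", "acht", "negen", "tien",
    "elf", "twaalf", "twintig", "dertig", "honderd", "duizend",
)

# Ordinals that cannot be derived by stripping a suffix.
IRREGULAR_ORDINALS = {"eerste": "een", "derde": "drie"}


def ordinal_to_cardinal(word):
    """Return the cardinal for an ordinal number, or None if it is not one."""
    word = word.lower()
    if word in IRREGULAR_ORDINALS:
        return IRREGULAR_ORDINALS[word]
    for suffix in ("ste", "de"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Accept the stem only if it ends in a known cardinal number.
            if stem.endswith(CARDINAL_ENDINGS):
                return stem
    return None
```

The check against the list is what keeps ordinary adjectives out: "goede" and "beste" also end in -de and -ste, but their stems do not end in a cardinal number, so the function returns None for them.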
Next, I will add support for Dutch adverbs and nouns in the lemmatizer.
Saturday, 8 August 2009