To capture all the necessary verb information and lemmatization rules, I am developing a verb conjugator first.
Once I can conjugate all Dutch verbs, I should also have the information needed to work out the lemma of a given verb form.
It seems I only need to compute the simple present and the simple past, since all other tenses combine the infinitive or the past participle with auxiliary verbs.
For the conjugation rules, I found good reference material at www.dutchgrammar.com.
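
To illustrate why a conjugator should also give me the lemma, here is a minimal sketch in Python. It is not the actual code: build_lemma_index and TOY_FORMS are names made up for this example, and the hand-written table only stands in for the real conjugator. The idea is to generate every form of every verb and store the reverse mapping from form to infinitive.

```python
def build_lemma_index(infinitives, conjugate):
    """Map every generated verb form back to the infinitive(s) that produce it."""
    index = {}
    for infinitive in infinitives:
        for form in conjugate(infinitive):
            # A form can belong to more than one verb, so keep a set of lemmas.
            index.setdefault(form, set()).add(infinitive)
    return index


# Stand-in for the real conjugator (assumption): a tiny hand-written table
# with the simple present and simple past forms of two regular verbs.
TOY_FORMS = {
    "werken": ["werk", "werkt", "werken", "werkte", "werkten"],
    "zetten": ["zet", "zetten", "zette"],
}

index = build_lemma_index(TOY_FORMS, TOY_FORMS.__getitem__)
print(index["werkte"])  # {'werken'}
```

Because the index stores a set per form, ambiguous forms simply return several candidate lemmas instead of silently picking one.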

So far, everything works fine. I'm checking the output of the Conjugator and, where necessary, adding code to handle exceptions. For example, when the stem already ends in t, no extra t should be added.
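
The t exception can be sketched in a few lines of Python. This is only an illustration, not the Conjugator's real code: present_singular is a hypothetical helper, and it assumes the stem has already been derived from the infinitive.

```python
def present_singular(stem):
    """Simple-present singular forms for a regular verb, given its stem."""
    # Second and third person normally add -t, but a stem that already
    # ends in t stays as it is (hij zet, not hij zett).
    t_form = stem if stem.endswith("t") else stem + "t"
    return {"ik": stem, "jij": t_form, "hij": t_form}


print(present_singular("werk"))  # {'ik': 'werk', 'jij': 'werkt', 'hij': 'werkt'}
print(present_singular("zet"))   # {'ik': 'zet', 'jij': 'zet', 'hij': 'zet'}
```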
To be continued...