Friday, 18 September 2009

Gold Standard for Dutch Lemmatizer

I already use the Dutch Text Interpretation Aid to expand the Dutch lexicon and to improve the Dutch lemmatizer, so I have built a gold standard into the tool. Each time a lemma is added, a gold standard entry is recorded that contains the lexical category, the word, and the lemma.

Afterwards, this gold standard may be used to verify the lemmatizer.
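
To make the verification step concrete, here is a minimal sketch in Python. It is not the tool's actual code: the `GoldEntry` record, the `verify` function, and the toy lemmatizer are hypothetical names invented for this illustration. The only things taken from the post are the three fields of a gold standard entry (lexical category, word, lemma).

```python
from typing import Callable, NamedTuple


class GoldEntry(NamedTuple):
    """One gold standard record (hypothetical layout)."""
    category: str  # lexical category, e.g. "noun"
    word: str      # word form as it appears in a text
    lemma: str     # lemma that was confirmed when the word was added


def verify(lemmatize: Callable[[str, str], str],
           gold: list[GoldEntry]) -> tuple[float, list[GoldEntry]]:
    """Return the accuracy and the entries the lemmatizer got wrong."""
    failures = [e for e in gold if lemmatize(e.category, e.word) != e.lemma]
    accuracy = 1.0 - len(failures) / len(gold) if gold else 1.0
    return accuracy, failures


def toy_lemmatize(category: str, word: str) -> str:
    """Deliberately naive stand-in: strip the plural ending -en from nouns."""
    if category == "noun" and word.endswith("en"):
        return word[:-2]
    return word


gold = [
    GoldEntry("noun", "boeken", "boek"),
    GoldEntry("noun", "huizen", "huis"),      # the toy rule yields "huiz"
    GoldEntry("adjective", "groot", "groot"),
]

accuracy, failures = verify(toy_lemmatize, gold)
for entry in failures:
    print(f"wrong lemma for {entry.word} ({entry.category}), expected {entry.lemma}")
```

Because every confirmed lemma ends up in the gold standard, a check like this could be rerun after each change to the lemmatizer rules to see which previously correct words have regressed.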

Sunday, 13 September 2009

Dutch Text Interpretation Aid 5

It is now possible to add adjectives, adverbs, nouns, proper nouns, verbs, and other word types in the Dutch Text Interpretation Aid. The procedure is very simple:
1. Paste a text into the text pane.
Words that are not yet recognized are underlined in red. If an unknown word starts with a capital letter, it is underlined in orange instead; such words are usually proper nouns.
2. Select an underlined word and open the popup menu (left mouse button released).
Choose the appropriate lexical category (adjective, adverb, noun, proper noun, verb, or other) from the popup menu.



3. Check or edit the word properties (e.g. lemma or verb conjugation) in the dialog window for the chosen category.
3.a. For proper nouns, the following window is displayed.



3.b. For verbs the following window is displayed.



3.c. For adjectives, adverbs, nouns, and other word types the following window is displayed.
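
Returning to step 1 of the procedure, the underlining rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the tiny lexicon are hypothetical, and the case-insensitive lookup (so that a known word at the start of a sentence is not flagged) is my assumption, not something the tool is documented to do.

```python
import re
from typing import Optional


def underline_color(word: str, lexicon: set[str]) -> Optional[str]:
    """Return None for a known word, else the underline color."""
    # Assumption: the lexicon holds lowercase forms and lookup ignores case.
    if word.lower() in lexicon:
        return None
    # Unknown and capitalized: probably a proper noun.
    return "orange" if word[0].isupper() else "red"


def mark_unknown(text: str, lexicon: set[str]) -> list[tuple[str, str]]:
    """List every unknown word in the text together with its color."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    marks = []
    for word in words:
        color = underline_color(word, lexicon)
        if color is not None:
            marks.append((word, color))
    return marks


lexicon = {"het", "huis", "staat", "in", "de"}  # hypothetical mini lexicon
marks = mark_unknown("Het huis staat in Amsterdam aan de gracht.", lexicon)
for word, color in marks:
    print(f"{word}: {color}")
```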



The following tasks are now on my to-do list:
1. Store the information in a database instead of in the current text files.
At present, some lemmas and their dictionary information are repeated across several text files. For example, the lemma groot appears both as an adverb and as an adjective.
2. Add functionality to update the dictionary information in the Dutch Text Interpretation Aid.
3. Add help information.
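
For the first task, one possible layout is sketched below using Python's built-in `sqlite3` module. The table names, column names, and the `add_entry` helper are assumptions made for this sketch, not a design I have settled on. The idea is to store each lemma once and link it to as many lexical categories as needed, so that groot is no longer duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
conn.executescript("""
CREATE TABLE lemma (
    id   INTEGER PRIMARY KEY,
    form TEXT NOT NULL UNIQUE          -- each lemma is stored exactly once
);
CREATE TABLE lemma_category (
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    category TEXT NOT NULL,            -- e.g. 'adjective', 'adverb'
    PRIMARY KEY (lemma_id, category)
);
""")


def add_entry(conn: sqlite3.Connection, lemma: str, category: str) -> None:
    """Add a lemma/category pair without duplicating the lemma row."""
    conn.execute("INSERT OR IGNORE INTO lemma (form) VALUES (?)", (lemma,))
    (lemma_id,) = conn.execute(
        "SELECT id FROM lemma WHERE form = ?", (lemma,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO lemma_category (lemma_id, category) VALUES (?, ?)",
        (lemma_id, category),
    )


add_entry(conn, "groot", "adjective")
add_entry(conn, "groot", "adverb")

lemma_count = conn.execute("SELECT COUNT(*) FROM lemma").fetchone()[0]
categories = [
    row[0]
    for row in conn.execute(
        "SELECT category FROM lemma_category "
        "JOIN lemma ON lemma.id = lemma_category.lemma_id "
        "WHERE lemma.form = ? ORDER BY category",
        ("groot",),
    )
]
print(lemma_count, categories)
```

Dictionary information that belongs to the lemma itself would then live in the `lemma` table, and anything specific to one lexical category in `lemma_category`.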