Tuesday, 11 August 2009

Dutch Text Interpretation Aid

As mentioned in my previous post, I wanted to develop a Dutch tokenizer to identify the words in a Dutch text. A word could then be fed to a Dutch lemmatizer to find its lemma, and the lemma could in turn be used to look up dictionary information about the selected word.

However, while investigating the options for such a Dutch tokenizer, I found that Java already offers two methods, Utilities.getWordStart() and Utilities.getWordEnd(), that can be used to identify a word in a text. I therefore decided to use these utilities instead of developing a tokenizer of my own.
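To make this concrete, here is a minimal sketch of how the two methods might be combined to pull out the word around a given position. It assumes they are the static methods of javax.swing.text.Utilities, which take a text component and a document offset. The class name WordAtOffset, the helper wordAt and the sample sentence are made up for illustration and are not part of the tool.

```java
import javax.swing.JTextArea;
import javax.swing.text.BadLocationException;
import javax.swing.text.JTextComponent;
import javax.swing.text.Utilities;

class WordAtOffset {

    // Hypothetical helper: returns the text between the word boundaries
    // that surround the given document offset.
    static String wordAt(JTextComponent c, int offset) throws BadLocationException {
        int start = Utilities.getWordStart(c, offset);
        int end = Utilities.getWordEnd(c, offset);
        return c.getDocument().getText(start, end - start);
    }

    public static void main(String[] args) throws BadLocationException {
        // Sample Dutch sentence, chosen only for this illustration.
        JTextArea area = new JTextArea("De kat zit op de mat.");
        // Offset 4 points at the 'a' in "kat".
        System.out.println(wordAt(area, 4));
    }
}
```

Note that the boundaries come from a general word break iterator, so an offset that falls on a space or a punctuation mark yields that character rather than a neighbouring word. The caller may want to filter such results before passing them to a lemmatizer.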

The following screenshot displays a prototype of the Dutch text interpretation aid I want to develop. Any text can be pasted into the text pane. While hovering the mouse over the text, the text field at the bottom should display information about the word under the mouse pointer.
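The hover behaviour described above could be wired up roughly as follows. This is only a sketch under assumptions, not the code of the prototype: it assumes a JTextArea as the text pane, a JTextField as the information field, and JTextComponent.viewToModel() to translate the mouse position into a document offset (that method is deprecated since Java 9 in favour of viewToModel2D()). The names HoverWordDemo, wordAt and install are hypothetical.

```java
import java.awt.BorderLayout;
import java.awt.GraphicsEnvironment;
import java.awt.event.MouseEvent;
import java.awt.event.MouseMotionAdapter;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;
import javax.swing.text.BadLocationException;
import javax.swing.text.JTextComponent;
import javax.swing.text.Utilities;

class HoverWordDemo {

    // Hypothetical helper: extracts the word around a document offset.
    static String wordAt(JTextComponent c, int offset) throws BadLocationException {
        int start = Utilities.getWordStart(c, offset);
        int end = Utilities.getWordEnd(c, offset);
        return c.getDocument().getText(start, end - start);
    }

    // Hypothetical helper: shows the word under the mouse pointer in the info field.
    @SuppressWarnings("deprecation") // viewToModel is deprecated since Java 9
    static void install(final JTextComponent pane, final JTextField info) {
        pane.addMouseMotionListener(new MouseMotionAdapter() {
            @Override
            public void mouseMoved(MouseEvent e) {
                // Map the mouse position to the nearest offset in the document.
                int offset = pane.viewToModel(e.getPoint());
                try {
                    info.setText(offset < 0 ? "" : wordAt(pane, offset).trim());
                } catch (BadLocationException ex) {
                    info.setText("");
                }
            }
        });
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available.");
            return;
        }
        SwingUtilities.invokeLater(() -> {
            JTextArea pane = new JTextArea(10, 40);
            JTextField info = new JTextField();
            info.setEditable(false);
            install(pane, info);
            JFrame frame = new JFrame("Dutch Text Interpretation Aid");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(pane), BorderLayout.CENTER);
            frame.add(info, BorderLayout.SOUTH);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

In this sketch the field only shows the word itself. The dictionary information would replace the plain word once a lookup is available.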



Next, I will link an electronic dictionary to the tool to provide the necessary dictionary information.
