Sunday 6 December 2009

JavaDB for Dutch Text Interpretation Aid

Finally, I found some time to develop a JavaDB database to store the contents of the Dutch Text Interpretation Aid.

I believe this is important, because a database scales better than plain text files for storing and retrieving information. The database also supports collaboration, since each entry can carry a time stamp and user information. I plan to distribute the software tool to interested users, who can both contribute to and make use of the shared database.
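To make the idea of time stamps and user information concrete, here is a minimal sketch of what such a table could look like. It is not the actual schema of the tool: the table name `word_form`, its columns, and the class `WordFormStore` are assumptions made up for illustration. Only standard JDBC and SQL that JavaDB (Apache Derby) accepts are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; names are illustrative, not taken from the tool itself.
class WordFormStore {

    // One row per inflected form, with the contributor and the time of the change.
    static final String CREATE_TABLE =
        "CREATE TABLE word_form ("
        + " id INT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,"
        + " form VARCHAR(64) NOT NULL,"
        + " lemma VARCHAR(64) NOT NULL,"
        + " contributor VARCHAR(32) NOT NULL,"
        + " modified TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)";

    static final String INSERT_FORM =
        "INSERT INTO word_form (form, lemma, contributor) VALUES (?, ?, ?)";

    // Creates the table on an already opened connection.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_TABLE);
        }
    }

    // Stores one word form; the database fills in the time stamp itself.
    static void addWordForm(Connection conn, String form, String lemma,
                            String contributor) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_FORM)) {
            ps.setString(1, form);
            ps.setString(2, lemma);
            ps.setString(3, contributor);
            ps.executeUpdate();
        }
    }
}
```

With the Derby jar on the classpath, an embedded connection can be opened with `DriverManager.getConnection("jdbc:derby:dtia;create=true")`, where the database name `dtia` is again only an example.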

Including all word forms for Dutch takes a huge amount of time and energy. Dutch is, after all, a highly inflected language. By dividing the work among many users, I hope that the benefits will outweigh the costs.

Currently, the tool recognises about 90% of the words in a news article. About 2% of the missing words are proper nouns (see the figure below).
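A recognition rate like this can be measured by counting how many tokens of a text occur in the word list. The sketch below shows one simple way to do that; it is not how the tool itself computes the figure. The class `Coverage`, the tokenisation rule (split on anything that is not a letter), and the lowercase lexicon are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for estimating lexicon coverage of a text.
class Coverage {

    // Returns the fraction of tokens in the text that occur in the lexicon.
    // The lexicon is expected to contain lowercase word forms.
    static double coverage(String text, Set<String> lexicon) {
        // Split on every run of non-letter characters (spaces, punctuation, digits).
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        int total = 0;
        int known = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            total++;
            if (lexicon.contains(token)) {
                known++;
            }
        }
        return total == 0 ? 0.0 : (double) known / total;
    }
}
```

For example, with a lexicon containing "de", "kat", "zit", "op" and "mat", the sentence "De kat zit op de mat in Utrecht." has eight tokens of which six are known, so `coverage` returns 0.75. The proper noun "Utrecht" is one of the two misses.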