The new software tool I am working on is a Dutch text editor that supports users in writing Dutch texts. While the user writes, the tool will not only highlight spelling and grammar mistakes but also flag ambiguous and/or difficult words. Where possible, the tool will offer a list of synonyms to replace difficult words. The user may also specify the meaning of an ambiguous word by choosing the proper definition. This semantic information will be saved together with the text, so that readers (humans or machines) can interpret the semantically enriched text more easily.
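The post does not say how the chosen definitions will be stored. As a minimal sketch, assuming a standoff format where each annotation records the character offsets of a word and the definition the user picked, the text and its annotations could be kept together in one JSON document. `SenseAnnotation`, `save_enriched` and `load_enriched` are hypothetical names for illustration, not part of the tool.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical standoff annotation: one user-chosen sense for one word.
@dataclass
class SenseAnnotation:
    start: int        # character offset where the word starts
    end: int          # exclusive end offset
    word: str         # the annotated word, used to validate the offsets
    definition: str   # the definition the user selected


def save_enriched(text: str, annotations: list[SenseAnnotation]) -> str:
    """Serialize text plus annotations into a single JSON string (assumed format)."""
    for a in annotations:
        # Reject annotations whose offsets do not point at the stated word.
        if text[a.start:a.end] != a.word:
            raise ValueError(f"offsets {a.start}-{a.end} do not match {a.word!r}")
    payload = {"text": text, "senses": [asdict(a) for a in annotations]}
    # ensure_ascii=False keeps Dutch characters such as "é" readable in the file.
    return json.dumps(payload, ensure_ascii=False)


def load_enriched(data: str) -> tuple[str, list[SenseAnnotation]]:
    """Inverse of save_enriched: rebuild the text and its annotations."""
    obj = json.loads(data)
    return obj["text"], [SenseAnnotation(**s) for s in obj["senses"]]
```

For instance, in "Hij zat op de bank." the ambiguous word "bank" spans offsets 14 to 18, and the user could attach the definition "zitmeubel voor meerdere personen" to it. Storing offsets rather than editing the text itself keeps the original text unchanged for readers that ignore the annotations.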
So far, I have developed a tokenizer and a sentence splitter for Dutch.
As you can see in the figure, the sentence splitter distinguishes between a period used as a digit group separator (as in "1.500") and a period used as a full stop. The tokenizer also replaces contracted forms such as "zo'n" with their full form, "zo een". This should make the work of the parser, which still needs to be developed, easier.
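The figure is not reproduced here, so what follows is only a sketch of one way to get both behaviours, not the author's implementation. It assumes a regex-based tokenizer: a number such as "1.500" is matched as a single token, so its period never becomes a separate "." token, and the sentence splitter then only has to break after standalone ".", "!" and "?" tokens. `TOKEN_RE`, `CONTRACTIONS`, `tokenize` and `split_sentences` are hypothetical names, and the contraction table holds only the "zo'n" example from the post.

```python
import re

# Hypothetical token pattern; alternatives are tried left to right:
#  1. numbers with '.' as digit group separator and an optional decimal comma ("1.500", "2.350,75")
#  2. plain numbers with an optional decimal comma ("2009", "3,5")
#  3. words with an optional internal apostrophe ("zo'n")
#  4. any other single non-space character (punctuation)
TOKEN_RE = re.compile(
    r"\d{1,3}(?:\.\d{3})+(?:,\d+)?"
    r"|\d+(?:,\d+)?"
    r"|\w+(?:'\w+)?"
    r"|[^\w\s]"
)

# Contracted form -> full form. Only "zo'n" comes from the post; extend as needed.
CONTRACTIONS = {"zo'n": ["zo", "een"]}

SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> list[str]:
    """Split text into tokens and expand known contractions."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        expansion = CONTRACTIONS.get(tok.lower())
        if expansion is None:
            tokens.append(tok)
            continue
        expanded = list(expansion)
        # Preserve sentence-initial capitals: "Zo'n" -> "Zo", "een".
        if tok[0].isupper():
            expanded[0] = expanded[0].capitalize()
        tokens.extend(expanded)
    return tokens


def split_sentences(text: str) -> list[list[str]]:
    """Group tokens into sentences, breaking after standalone end punctuation."""
    sentences, current = [], []
    for tok in tokenize(text):
        current.append(tok)
        # A digit group separator is inside a number token, so it never ends a sentence.
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences
```

For example, `split_sentences("Het huis kost 1.500 euro. Zo'n prijs is laag!")` returns two sentences, with "1.500" kept as one token and "Zo'n" expanded to "Zo", "een". A real splitter would also need a list of abbreviations such as "bijv.", which this sketch would wrongly treat as sentence ends.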
Monday, 18 January 2010