Task 1: Punctuation restoration from read text
Speech transcripts generated by Automatic Speech Recognition (ASR) systems typically contain no punctuation or capitalization. In longer stretches of automatically recognized speech, the lack of punctuation reduces the clarity of the output text. The primary purpose of punctuation restoration (PR) and capitalization restoration (CR) as a distinct natural language processing (NLP) task is to improve the legibility of ASR-generated text, and possibly of other types of text that lack punctuation. Aside from their intrinsic value, PR and CR may improve the performance of other NLP tasks such as named entity recognition (NER), part-of-speech (POS) tagging, semantic parsing, or spoken dialog segmentation.
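To make the input and target of the task concrete, the sketch below derives an ASR-like string from a punctuated reference sentence by lowercasing it and removing punctuation. This is an illustrative assumption only: the function name and the sample sentence are hypothetical, only ASCII punctuation is handled, and real ASR output also contains recognition errors that this does not simulate.

```python
import string


def strip_punctuation_and_case(text):
    """Lowercase text and remove ASCII punctuation (hypothetical helper).

    Assumption: this only mimics the missing punctuation and capitalization
    of ASR output; it does not model recognition errors.
    """
    # Build a translation table that deletes every ASCII punctuation mark.
    table = str.maketrans("", "", string.punctuation)
    no_punct = text.translate(table)
    # Lowercase and collapse any repeated whitespace into single spaces.
    return " ".join(no_punct.lower().split())


# Illustrative sentence, not taken from the task data.
reference = "Hello, world! This is a test."
asr_like = strip_punctuation_and_case(reference)
```

A restoration system would take a string like `asr_like` as input and aim to reproduce `reference`.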
Task 2: Evaluation of translation quality assessment metrics
The task is to investigate metrics for the automatic evaluation of machine translation output or similar data. We have prepared translations from English to Polish along with reference translations produced by a human, a native speaker of Polish. We are looking for automatic evaluation metrics. The task will be evaluated at the level of segments, which often correspond to sentences. The result will be the calculated correlation of the submitted scores with manually performed human evaluations.
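The segment-level comparison described above can be sketched as follows. The description does not say which correlation coefficient is used, so Pearson's r is assumed here purely for illustration; the function name and the score lists are hypothetical and are not task data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores.

    Assumption: Pearson's r is used only as an example; the task description
    does not name the coefficient.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-segment scores: one metric score and one human score each.
metric_scores = [0.2, 0.4, 0.6, 0.8]
human_scores = [1.0, 2.0, 3.0, 4.0]
r = pearson(metric_scores, human_scores)
```

A metric whose scores rank and space the segments the same way the human judgments do yields a value of `r` close to 1, while a metric unrelated to the human judgments yields a value close to 0.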
Task 3: Post-correction of OCR results
The proposed task concerns the post-correction of OCR results for Polish-language books published between 1791 and 1998.