School of Computing Research Colloquia

Natural Language Processing using Compression-based Language Models: Past, Present and Future

William Teahan, Department of Computer Science, Bangor University

Abstract: This talk will describe the application of compression-based language models to Natural Language Processing (NLP). It will introduce the Prediction by Partial Matching (PPM) compression algorithm and show how it has been applied to NLP problems such as text compression, text classification, text mining, and machine translation, across languages including English, Welsh, Chinese and Arabic. The talk will discuss past results for these applications, summarise present research and highlight some possible future directions.