Shudong Hao—Understanding Crosslingual Topics by Probabilistic Models
OPEN TO THE PUBLIC | Probabilistic topic modeling has been used as an efficient tool for extracting high-level abstractions from large text datasets, and is also commonly used as a feature extraction technique for many natural language processing tasks. As a natural extension, multilingual topic models extract language-consistent features from corpora in multiple languages, enabling knowledge transfer across languages. Most models, however, require very specific crosslingual supervision data, which limits generalization to languages without rich linguistic resources. In this talk, we will start by designing an efficient multilingual topic model evaluation as the foundation of subsequent work. We then formulate model training as a knowledge transfer process from one language to another. Based on this formulation, we are able to identify the factors that actually affect the performance of crosslingual learning in topic models, and thus introduce a new model that achieves competitive performance while using significantly fewer linguistic resources.
Please note that this event will take place from 6:00-7:00 pm, not 7:00-8:00 pm as incorrectly listed in the faculty and staff Bulletin.