Senja Pollak is a researcher at the Department of Knowledge Technologies, Jozef Stefan Institute.

After a BSc in Sociology of Culture and French Linguistics (Univ. of Ljubljana and Sorbonne Nouvelle, Paris), she turned her research toward computational linguistics. She earned an MSc degree in Computational Linguistics at the University of Antwerp and a PhD on the topic of definition extraction at the Dept. of Translation, Univ. of Ljubljana, where she teaches French grammar. She also holds an ECQA advanced terminology manager certificate.

Her main interests are language technologies, corpus linguistics and computational creativity. She is involved in several EU and national projects, where she is currently researching collocations in non-standard Slovene (project Janes), conceptual blending (EU project ConCreTe), and machine learning for fictional ideation (EU project WHIM). She is also involved in the EU project MUSE on interactive storytelling and in the Prosecco Network for promoting Computational Creativity.

Automatic Terminology and Definition Extraction

Domain terms and their definitions are usually collected in terminological dictionaries and domain glossaries. However, constructing and updating glossaries manually is costly and time-consuming. Therefore, automated methods have been developed.

For automated term extraction, which is the first step of terminology management, several commercial and open-source tools are available. A more challenging question is whether we can also automatically extract definitions of these terms.

After a brief introduction to terminology, terminology management and definition types, different approaches to term and definition extraction will be presented. Methods can rely on syntactic rules, on frequency information, or on machine-learning techniques. In more detail, I will present the approach to automatic extraction of terminology and definitional sentences from domain text for Slovene and English that I have been developing. I will also discuss possible applications in translation, lexicography and text mining.
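
As a rough illustration of the first two method families mentioned above (and not the approach presented in the talk), the following minimal Python sketch scores words by how much more frequent they are in a domain text than in a reference text, and flags sentences that match a simple "X is a Y" pattern as definition candidates. The function names, the pattern and the smoothing are all assumptions chosen for illustration; real systems typically work on lemmatised, part-of-speech-tagged text and use much richer patterns.

```python
import re
from collections import Counter

# Hypothetical lexico-syntactic pattern for English: "<term> is/are a/an/the <genus ...>".
# The term is limited to 1-4 tokens; this is an illustrative assumption.
DEFINITION_PATTERN = re.compile(
    r"^(?:an?\s+|the\s+)?(?P<term>[\w-]+(?:\s+[\w-]+){0,3}?)"
    r"\s+(?:is|are)\s+(?:an?|the)\s+(?P<genus>.+)$",
    re.IGNORECASE,
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def termhood_scores(domain_text, reference_text):
    """Frequency-based scoring: relative frequency in the domain text
    divided by the (add-one smoothed) relative frequency in a reference text."""
    dom = Counter(tokenize(domain_text))
    ref = Counter(tokenize(reference_text))
    dom_total = sum(dom.values())
    ref_total = sum(ref.values())
    return {
        word: (count / dom_total) / ((ref[word] + 1) / (ref_total + 1))
        for word, count in dom.items()
    }


def definition_candidates(sentences):
    """Rule-based filtering: return (term, sentence) pairs for sentences
    that match the definitional pattern."""
    candidates = []
    for sentence in sentences:
        match = DEFINITION_PATTERN.match(sentence.strip().rstrip("."))
        if match:
            candidates.append((match.group("term"), sentence))
    return candidates
```

A machine-learning variant would replace the fixed pattern with a classifier trained on sentences labelled as definitional or non-definitional, using features such as the ones above.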