Thomas Demeester

I am an associate professor at the Internet Technology and Data Science Lab (IDLab), Ghent University - imec, Belgium. Until a few years ago, I was mainly working on natural language processing (NLP), but I've always been more interested in the underlying machine learning methods and models than the natural language applications themselves. As soon as I was appointed assistant professor in 2019, I've initiated an AI research track that goes beyond NLP, with health as a key application area. I've been very fortunate to gather a group of highly talented PhD student working in that direction.

The research tracks we recently engaged in, include energy-based models (with a recent paper on Hopfield Networks and Deep Equilibrium Models), neuro-symbolic AI (with recent work on clinical reasoning as well as causal inference on combined tabular and textual clinical data). In collaboration with Luca Ambrogioni from Radboud University, we've looked into negative guidance strategies for diffusion models (with a paper at ICLR 2025), and with Stijn Vansteelandt and the Syndara team at the UZ Ghent hospital, we've worked on the inferential utility of synthetic medical data (with papers at UAI and NeurIPS 2024).

We've spent considerable time and effort over the last year on exploring the state of the art and beyond in AI for drug design, especially from the perspective of powerful protein language models and diffusion models. We've set up several collaborations in this area, with imec, and a number of Flemish biotech companies. Given the available budget and our ambitions in this area, I will be focusing most of my time on this newest research track. The goal is that our recent work on diffusion models, energy-based models, and neuro-symbolic approaches in the end come together in the application area of protein design.

Over the last decade, I've been building up and co-leading the Text-to-Knowledge research cluster with prof. Chris Develder. We worked on natural language processing (NLP) in general, for applications in several domains (the media, health, economics and law). Our current area of focus in NLP is conversational AI, with a strong emphasis on the use and development of neural language models. Some examples of recent work include the state-of-the-art biomedical sentence encoder BioLORD 2023, the transtokenization method for translating monolingual language models, the work on Anchored Preference Optimization and Contrastive Revisions for the alignment of generative language models, the ideas on highly efficient unsupervised dialogue structure induction, and the work on extreme multi-label classification in the HR domain.

Here's where I come from. In 2005, I received my M.Sc. degree in electrical engineering at Ghent University, after finishing my final year and master thesis at ETH Zurich, Switzerland. In 2009, funded by a grant from the Research Foundation - Flanders (FWO), I obtained my Ph.D. at the Ghent University Department of Information Technology, under the supervision of prof. Daniel De Zutter, in the area of computational electromagnetics. Shortly afterwards, I got involved in research on Information Retrieval, in collaboration with the Database Group at the University of Twente in The Netherlands. As a post-doc, I was heavily involved in getting project funding and managing projects mainly in the media sector. Gradually, my interests moved to Natural Language Processing, and machine learning and AI in general. It was my research stay in 2016 at University College London, in prof. Sebastian Riedel's Machine Reading lab, that made me decide to continue my career as an academic researcher. In October 2019, I was appointed assistant professor at IDLab, Ghent University, co-affiliated with imec. Although the job description didn't change a lot (apart from some additional teaching - which I love), it turns out the freedom in research directions gives lots of satisfaction. That, and the great students that I've been lucky enough to attract.

Jens-Joris Decorte

Topic: text encoders for skill extraction and labor market analysis

Supervisors: Thomas Demeester, Chris Develder, Jeroen Van Hautte

Sofie Labat

Topic: emotion analysis in conversational agents

Supervisors: Veronique Hoste, Thomas Demeester

Ruben Janssens

Topic: multi-modal human-robot conversation systems

Supervisors: Tony Belpaeme, Thomas Demeester

Paloma Rabaey

Topic: joint reasoning on tabular and textual clinical data

Supervisors: Thomas Demeester, Stefan Heytens

Henri Arno

Topic: predictive modeling (bankruptcy prediction) and causal inference from textual and tabular data

Supervisors: Thomas Demeester, Joke Baeck, Klaas Mulier

Cédric Goemaere

Topic: alternative learning paradigms: Hopfield networks, deep equilibrium models, predictive coding

Supervisors: Thomas Demeester

Alexander Decruyenaere

Topic: synthetic tabular data: inferential utility in clinical applications

Supervisors: Sylvie Rottey, Thomas Demeester, Stijn Vansteelandt

Felix Koulischer

Topic: dynamic guidance in diffusion models

Supervisors: Thomas Demeester

Joachim De Baer

Topic: conversational agents for collaborative learning

Supervisors: Chris Develder, Thomas Demeester

Aruna Audenaert

Topic: legal aspects to the use of AI for supporting the Chamber for Companies in Difficulty

Supervisors: Joke Baeck, Thomas Demeester, Klaas Mulier

Stijn Van Ruymbeke

Topic: machine learning based decision support to the Chambers for Companies in Difficulty

Supervisors: Joke Baeck, Klaas Mulier, Thomas Demeester

Valentin Noske

Topic: machine learning models for antibody design and characterization

Supervisors: Thomas Demeester, Jan Fostier, Kathleen Marchal

Adrick Tench

Topic: neuro-symbolic information extraction from clinical data

Supervisors: Thomas Demeester, Chris Develder

Karel D'Oosterlinck

Supervisors: Chris Develder, Thomas Demeester

Title: Interpreting and aligning large language models (February 2025)

Maarten De Raedt

Supervisors: Chris Develder, Thomas Demeester, Fréderic Godin

Title: Resource-efficient data annotation and augmentation methods for conversational agents (July 2024)

François Remy

Supervisors: Kris Demuynck, Thomas Demeester

Title: Deciphering the Semantics of Biomedical Texts (May 30, 2024)

Semere Kiros Bitew

Supervisors: Chris Develder, Thomas Demeester

Title: Language model adaptation with applications in artificial intelligence for education (March 2024)

Yiwei Jiang

Supervisors: Thomas Demeester, Chris Develder

Title: Towards grounded and robust task-oriented dialogue systems (February 2024)

Amir Hadifar

Supervisors: Chris Develder, Thomas Demeester, Veronique Hoste

Title: Towards efficient NLP models: strategies for improving data selection, training, and inference (June 2023)

Klim Zaporojets

Supervisors: Chris Develder, Thomas Demeester

Title: Neural approaches to entity-centric information extraction (December 2022)

Giannis Bekoulis

Supervisors: Chris Develder, Thomas Demeester

Title: Neural approaches to sequence labeling for information extraction (October 2019)

Lucas Sterckx

Supervisors: Chris Develder, Thomas Demeester

Title: Methods for efficient supervision in natural language processing (June 2018)

Cedric De Boom

Supervisors: Bart Dhoedt, Thomas Demeester

Title: Representation learning for symbolic sequences (May 2018)

Steven Van Canneyt

Supervisors: Bart Dhoedt, Steven Schockaert, Thomas Demeester

Title: Knowledge Extraction and Popularity Modeling Using Social Media (October 2016)