AI Meets Genetics: Joeri van der Velde on Unlocking the Potential of DNA Data

As an Assistant Professor in the Systems Genetics section at the University Medical Center Groningen (UMCG), Joeri van der Velde operates at the intersection of genetics, data infrastructure, and artificial intelligence. His work tackles one of the biggest challenges in modern medicine: how to make use of the vast amounts of genetic data that are available but remain largely underutilized.

“You can roughly divide my work into two parts,” van der Velde explains. “On one hand, we build infrastructure to make genetic data well-structured and easy to find, access, and reuse, following what we call the FAIR principles: Findable, Accessible, Interoperable, and Reusable. On the other hand, we develop AI methods to improve diagnoses, especially for rare genetic disorders.”

In 2024-2025, he participated in exquAIro’s Biomedical AI Bootcamp (Class 2). “My background is in bioinformatics and genetics,” he says. “I already worked extensively with data infrastructures, but my knowledge of statistics and machine learning was limited. The bootcamp provided exactly the in-depth training I was missing.” He now applies that knowledge directly in his research on rare genetic disorders and in making large-scale genetic data more useful.

Untapped treasures

“Every year, enormous amounts of DNA sequencing data are generated in the Netherlands,” he says. “Yet only a fraction is actually used. For a diagnosis, you often look at only a few variants; the rest remain untouched. Meanwhile, that data holds immense potential for new insights.”

This underutilization is caused by fragmentation, different standards, and complex regulations around data use. “We try to remove these barriers and structure and link data better so it can be shared and reused effectively, both nationally and across Europe.”

Diagnostics: from 35% to more

The second pillar of his work focuses on improving genetic diagnostics. Currently, only about 35% of patients with a suspected genetic disorder receive a molecular diagnosis. “That means the majority still go home without a clear answer,” says van der Velde. “With AI, we aim to identify causal variants more effectively.”

A concrete example, partly inspired by the bootcamp, is a method that links DNA variants to protein structures. A machine learning model then uses the resulting changes in protein properties to predict whether a variant is pathogenic.

“For this, we used a random forest, for example. It’s relatively simple, but it works very well in this case. The most important lesson I learned from trainer Ilya Petoukhov during the bootcamp was: choose the right model for the problem. More complex is not automatically better.”
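To make the idea of a random forest concrete, here is a minimal from-scratch sketch in Python. It is not the team's method or data: the feature names (delta_hydrophobicity, delta_charge, delta_volume), the synthetic variants, and all parameter values are assumptions chosen purely for illustration. It shows the three ingredients that define the technique: bootstrap samples of the training data, a random subset of features at each split, and a majority vote over the trees.

```python
import random
from collections import Counter

# Hypothetical features: changes in protein properties caused by a variant.
FEATURES = ["delta_hydrophobicity", "delta_charge", "delta_volume"]

def make_toy_variants(n, rng):
    """Synthetic data only: label 1 = 'pathogenic', 0 = 'benign'."""
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        centre = 3.0 if label else 0.0
        rows.append([rng.gauss(centre, 0.5) for _ in FEATURES])
        labels.append(label)
    return rows, labels

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth, rng, max_depth=3, n_feats=2):
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    # Each split only considers a random subset of the features.
    split = best_split(rows, labels, rng.sample(range(len(FEATURES)), n_feats))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, rng, max_depth, n_feats),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, rng, max_depth, n_feats),
    )

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, rng, n_trees=25):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], 0, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
train_rows, train_labels = make_toy_variants(200, rng)
test_rows, test_labels = make_toy_variants(100, rng)
forest = fit_forest(train_rows, train_labels, rng)
correct = sum(predict_forest(forest, r) == y for r, y in zip(test_rows, test_labels))
accuracy = correct / len(test_labels)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The toy classes are deliberately easy to separate, so the accuracy printed here says nothing about real variants. In practice one would more likely reach for an established library implementation than hand-written trees; the sketch only illustrates why the method counts as "relatively simple".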

From linear models to deep learning

That broad foundation, spanning classical statistics to modern AI, was one of the bootcamp’s greatest benefits for him.

“You start with the fundamentals: linear models, Bayesian statistics, classical inference. From there, you move on to decision trees, ensemble methods, and eventually deep learning and generative models,” he says. “This way, you understand not only how models work, but also when to use them. You have a toolbox with everything in it; the skill is knowing which tool you need for a specific problem.”

Explainability as a prerequisite

Explainability is crucial in genetic diagnostics. “Many tools we used in genetics were long considered black boxes,” van der Velde notes. “You’d get a prediction, but you wouldn’t know why. That’s problematic in a clinical context.”

He therefore uses techniques such as SHAP (SHapley Additive exPlanations), which break down individual predictions into contributions from different features. “You can show exactly why a variant is classified as likely pathogenic. This increases trust and allows for validation.”
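The decomposition that SHAP builds on can be illustrated with exact Shapley values for a very small model. The sketch below is written from scratch and does not use the SHAP library, which relies on faster approximations and model-specific algorithms. The scoring function, the feature names, and the all-zero baseline are hypothetical and stand in for a real pathogenicity predictor.

```python
from itertools import combinations
from math import factorial

# Hypothetical features describing how a variant changes a protein.
FEATURES = ["delta_hydrophobicity", "delta_charge", "delta_volume"]

def toy_score(x):
    """Hypothetical pathogenicity score with one interaction term."""
    return (
        0.2 * x["delta_hydrophobicity"]
        + 0.5 * x["delta_charge"]
        + 0.1 * x["delta_charge"] * x["delta_volume"]
    )

def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Weight of a coalition of size k in the Shapley formula.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                without_f = set(subset)
                total += weight * (value(without_f | {f}) - value(without_f))
        phi[f] = total
    return phi

instance = {"delta_hydrophobicity": 1.0, "delta_charge": 2.0, "delta_volume": 3.0}
baseline = {f: 0.0 for f in FEATURES}  # assumed reference point

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
print(f"sum of contributions: {sum(phi.values()):.2f}")
print(f"score minus baseline score: {toy_score(instance) - toy_score(baseline):.2f}")
```

The contributions add up to the difference between the prediction for this variant and the prediction for the baseline, which is the property that lets a prediction be read as a sum of per-feature effects. The interaction between charge and volume is shared equally between those two features.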

Language models as a connecting layer

In addition to classical machine learning, he sees an important role for generative AI. “Language models are extremely powerful for processing unstructured data.” Applications range from automatically extracting phenotypic features from clinical texts to harmonizing datasets. “You can use them as a sort of ‘glue layer’ between different data sources. They help structure and link data, something that’s practically impossible to do manually.”

In research, language models can also add value by combining literature, databases, and patient data into new hypotheses. He cautions, however: “They can hallucinate, meaning they invent ‘facts.’ That’s why you must always use them in a controlled way, for example with a human-in-the-loop.”

From model to implementation

A key distinction emphasized in the bootcamp is between model development and implementation. “Building a model is only a small part of the work,” van der Velde says. “You need to consider software engineering, infrastructure, security, and how to maintain a model in production. Many models fail because that step is missing.”

The bootcamp also covered writing successful grant applications in depth, under the guidance of Marnix Bügel, one of exquAIro’s founders. “That helps enormously in actually bringing your ideas to life.”

AI as an organizational catalyst

The bootcamp’s impact shows not only in the content of his research but also in his position within the organization. Van der Velde now plays a central role within his department. “I lead an AI team and am involved in strategic AI decisions,” he says. “You become a connecting factor between research, the clinic, and policy.”

This role is increasingly important as AI permeates all layers of the field. “From patient care to fundamental research, AI plays a role everywhere. The question is not whether you work with it, but how.”

Who benefits from the bootcamp?

Van der Velde is clear: “AI is so widely applicable that almost everyone can benefit. But a certain technical affinity helps if you want to dive deep.” His key insight is simple: “It’s not just about the model. It’s about the whole: data, methods, implementation, and collaboration. If you master that, you can truly make an impact.”

Text: exquAIro (author Marlies Schipperheijn)
Photo: Joeri van der Velde during Biomedical AI Bootcamp, photographer Jan Buwalda