Applied Machine Learning in Genomic Data Science

Dr.-Ing. Jan Voges

Organization

Prerequisites

Hands-on programming experience (preferably in Python) is required. We will program in Python but will not have the capacity to teach the language from scratch. Some familiarity with statistics and machine learning basics is a plus.

Goals

Machine learning, genomics, and data science have converged rapidly in recent years, changing biomedical research and healthcare, deepening our understanding of disease mechanisms and drug development, and paving the way for precision medicine. In this course, students will enhance their understanding of how machine learning techniques can be applied to analyze and interpret biological data, specifically in the context of genomics.

The key goals that students can expect to achieve are:

1) Students will gain a solid foundation in the basic concepts and techniques used in genomic data science.
2) Students will learn about various machine learning algorithms, gaining an understanding of how these algorithms work and when to apply them to different types of data.
3) Students will learn how to preprocess and prepare genomic data for machine learning tasks, choose appropriate features, train and evaluate models, and interpret the results.

By the end of the course, students will have a solid understanding of how machine learning can be applied to genomics and related areas, enabling them to explore further research and career opportunities in this rapidly evolving field.

The course consists of a standard lecture, exercise sessions, and project work. The lecture introduces the important concepts. In the exercise sessions, students are guided through practical programming exercises. In the project work, students work in small groups on programming projects during the semester.

Contents

Part I — Foundations: Introduction, Molecular Biology & DNA Sequencing, Information Theory, Machine Learning I, Machine Learning II

Part II — Applications: Processing of DNA Sequencing Data, Compression of DNA Sequencing Data, Variant Discovery, Bacteriome Analysis, 3D Genome Structure Reconstruction

Specialties

Participation limit: 30 (due to room size). The project work must be completed during the semester. The lecture, exercise sessions, and project work are offered only in the winter semester.

Literature