Coding of Genomic Sequencing Data

TNT members involved in this project:
Yeremia G. Adhisantoso, M.Sc.
Dr.-Ing. Marco Munderloh
Prof. Dr.-Ing. Jörn Ostermann
Dipl.-Ing. Jan Voges

Over the past years, technological advances in genomic sequencing (the process of reading out genomic information from biological samples) have made it faster and more cost-efficient to sequence individual genomes and other genomic samples. Because of the enormous amount of sequencing data generated, the processing, storage, and analysis of these data entail novel challenges for the scientific community. New processes and tools have to be developed to overcome current limitations in storage space and processing speed, among others. Our goal is to develop novel algorithms to enhance data processing "from the tissue to the hard drive".

Within the scope of this project, we actively contribute to the MPEG-G series of standards (ISO/IEC 23092). More information is available on the MPEG-G website.

If you are interested in writing your thesis and thereby contributing to this project, please contact Jan Voges.

  • J. Voges and J. Ostermann: Streaming für die Genomforschung, Binaire, vol. 2019, no. 2, 2019

  • Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S. Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez
    GABAC: an arithmetic coding solution for genomic data
    Bioinformatics, Vol. 36, No. 7, pp. 2275-2277, 2020
  • Idoia Ochoa, Hongyi Li, Florian Baumgarte, Charles Hergenrother, Jan Voges, Mikel Hernaez
    AliCo: A New Efficient Representation for SAM Files
    2019 Data Compression Conference (DCC), pp. 93-102, 2019
  • Tom Paridaens, Jan Voges, Mikel Hernaez, Jan Fostier, Jörn Ostermann
    GABAC: an arithmetic coding solution for genomic data [version 1; not peer-reviewed]
    F1000Research (International Society for Computational Biology Community Journal), Vol. 8, p. 1463 (poster), 2019