Medical Information Processing

Staff: Jörn Ostermann, Bodo Rosenhahn, Jan Voges, Yeremia G. Adhisantoso
Background

At the Institut für Informationsverarbeitung (TNT), we develop methods for processing and analyzing DNA sequencing data. High-throughput sequencing technologies could make the use of genomic sequencing data routine practice in many fields. However, the IT costs of storing, transferring and processing large amounts of genomic sequencing data now significantly exceed the cost of the sequencing itself. Our work aims to democratize access to genomic sequencing data, for example to enable its broad use in personalized medicine.

Goal

Compression of Sequencing Data

In DNA sequencing, the nucleotide sequence to be read out is first fragmented. The fragments are then amplified and read out by a sequencing machine. All known sequencing technologies are error-prone, so a quality value is assigned to each nucleotide that is read. The read-out fragments are called reads and are stored together with their quality values in FASTQ files. Subsequent processing steps include aligning the reads in order to reconstruct the underlying DNA sequence and identifying structural variants in the sequenced material.
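To make the structure of a FASTQ record concrete, the following minimal Python sketch parses four-line FASTQ records and converts each quality character into a Phred quality score. It assumes the common Phred+33 ASCII offset and exactly four lines per record; the names FastqRecord, parse_fastq and error_probability are hypothetical and not part of any of our tools. A Phred score Q corresponds to an estimated error probability p = 10^(-Q/10).

```python
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical helper types and functions, for illustration only.

@dataclass
class FastqRecord:
    name: str             # read identifier (header line without the leading '@')
    sequence: str         # nucleotide sequence, e.g. "ACGT"
    qualities: List[int]  # one Phred quality score per nucleotide


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Parse four-line FASTQ records; assumes Phred+33 quality encoding."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at row {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality string differ in length")
        # Each quality character encodes Q = ASCII code - offset.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


records = list(parse_fastq(["@read1", "ACGT", "+", "II#5"]))
print(records[0].qualities)  # [40, 40, 2, 20]
```

For example, the quality string II#5 decodes to the scores 40, 40, 2 and 20, i.e. estimated error probabilities of 0.0001, 0.0001, about 0.63 and 0.01.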

In our work, we focus in particular on compression methods for aligned reads and on transparent lossy compression of quality values, i.e. lossy compression that does not noticeably affect the results of downstream analyses.
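Compression of aligned reads can exploit the fact that a read mapped to a reference sequence largely repeats that reference. The sketch below is a simplified illustration, not the MPEG-G syntax or one of our published codecs: it stores a read as its mapping position, its length and the list of bases that differ from the reference. It assumes an alignment with substitutions only (no insertions, deletions or clipping); the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (offset within the read, base in the read).
Substitution = Tuple[int, str]


def encode_aligned_read(reference: str, position: int,
                        read: str) -> Tuple[int, int, List[Substitution]]:
    """Describe a read by mapping position, length and mismatching bases.

    Simplifying assumption: the alignment contains substitutions only,
    i.e. no insertions, deletions or clipped bases.
    """
    window = reference[position:position + len(read)]
    if position < 0 or len(window) != len(read):
        raise ValueError("read does not fit inside the reference")
    subs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
            if ref_base != base]
    return position, len(read), subs


def decode_aligned_read(reference: str, position: int, length: int,
                        subs: List[Substitution]) -> str:
    """Rebuild the read: copy the reference window, then apply the mismatches."""
    bases = list(reference[position:position + length])
    for offset, base in subs:
        bases[offset] = base
    return "".join(bases)


ref = "ACGTACGTTAGC"
encoded = encode_aligned_read(ref, 4, "ACGATAG")
print(encoded)                             # (4, 7, [(3, 'A')])
print(decode_aligned_read(ref, *encoded))  # ACGATAG
```

For a read that differs from the reference in a single base, only the position, the length and one (offset, base) pair have to be stored; a practical codec would additionally entropy code these values.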

MPEG-G

The MPEG-G standard series is the first ISO/IEC project for the storage and transmission of sequencing data. Large parts of our work have been incorporated into MPEG-G.

Approach

Sequence alignment, lossy compression, machine learning, entropy coding methods
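The interplay of lossy compression and entropy coding can be illustrated with a generic uniform scalar quantizer for quality values. This is a textbook example, not CALQ or any other of our methods: coarser quantization lowers the zeroth-order empirical entropy, which approximates the bits per symbol a memoryless entropy coder (for instance an arithmetic coder with a static model) would need, at the cost of a reconstruction error of at most half the step size. The function names and the step size are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def quantize(qualities: List[int], step: int) -> List[int]:
    """Uniform scalar quantizer (illustrative): bins of width `step`,
    each score replaced by a representative value near its bin centre."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [(q // step) * step + step // 2 for q in qualities]


def empirical_entropy(symbols: List[int]) -> float:
    """Zeroth-order empirical entropy in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


scores = [40, 38, 37, 35, 30, 25, 12, 2]
coarse = quantize(scores, 10)
print(coarse)                     # [45, 35, 35, 35, 35, 25, 15, 5]
print(empirical_entropy(scores))  # 3.0
print(empirical_entropy(coarse))  # 2.0
print(max(abs(a - b) for a, b in zip(scores, coarse)))  # 5
```

For these eight example scores, a step size of 10 lowers the empirical entropy from 3 to 2 bits per symbol, while no score changes by more than 5; a step size of 1 leaves the scores unchanged, i.e. the quantizer becomes lossless.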


  • Conference Contributions
    • Idoia Ochoa, Hongyi Li, Florian Baumgarte, Charles Hergenrother, Jan Voges, Mikel Hernaez
      AliCo: A New Efficient Representation for SAM Files
      2019 Data Compression Conference (DCC), pp. 93-102, 2019
    • Tom Paridaens, Jan Voges, Mikel Hernaez, Jan Fostier, Jörn Ostermann
      GABAC: an arithmetic coding solution for genomic data [version 1; not peer-reviewed]
      F1000Research (International Society for Computational Biology Community Journal), Vol. 8, p. 1463 (poster), 2019
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-Level Scheme for Quality Score Compression
      10th International Conference on Bioinformatics and Computational Biology (BICOB 2018), pp. 93-102, 2018
    • Ana A. Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Lossy Compression of Quality Scores in Differential Gene Expression: A First Assessment and Impact Analysis
      2018 Data Compression Conference (DCC), pp. 167-176, 2018
    • Ana A. Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Differential Gene Expression with Lossy Compression of Quality Scores in RNA-Seq Data
      2017 Data Compression Conference (DCC), p. 444 (poster), 2017
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data [version 1; not peer-reviewed]
      F1000Research (International Society for Computational Biology Community Journal), Vol. 6, p. 1382 (poster), 2017
    • Jan Voges, Jörn Ostermann
      MPEG-G: The Emerging Standard for Genomic Data [not peer-reviewed]
      Poster abstracts of the 25th German Conference on Bioinformatics, p. 2 (poster), 2017
    • Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, Rachel L. Goldfeder, Ana A. Hernandez-Lopez, Marco Mattavelli, Bonnie Berger
      An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values
      2016 Data Compression Conference (DCC), pp. 221-230, 2016
    • Jan Voges, Marco Munderloh, Jörn Ostermann
      Predictive Coding of Aligned Next-Generation Sequencing Data
      2016 Data Compression Conference (DCC), pp. 241-250, 2016
    • Erik Soltow, Bodo Rosenhahn
      Automatic Pose Estimation Using Contour Information from X-Ray Images
      Image and Video Technology – PSIVT 2015 Workshops, Springer International Publishing, March 2016, edited by Fay Huang and Akihiro Sugimoto
    • Erik Soltow, Christof Hurschler, Bodo Rosenhahn
      Geometric bone models for marker-less RSA in total knee arthroplasty: a proof of concept
      4th International RSA Meeting, May 2015
    • Oliver Müller, Sabine Donner, Tobias Klinder, Ivonne Bartsch, Alexander Krüger, Alexander Heisterkamp, Bodo Rosenhahn
      Compensating motion artifacts of 3D in vivo SD-OCT scans
      Medical Image Computing and Computer Assisted Intervention (MICCAI), Vol. 7510, pp. 198-205, October 2012, edited by Nicholas Ayache, Hervé Delingette, Polina Golland, Kensaku Mori
    • Stojan Maleschlijski, Laura Leal-Taixé, Sebastian Weiße, Alessio Di Fino, Nicholas Aldred, A. S. Clare, G. Hernán Sendra, Bodo Rosenhahn, Axel Rosenhahn
      A stereoscopic approach for three dimensional tracking of marine biofouling microorganisms
      Microscopic Image Analysis with Applications in Biology (MIAAB), Heidelberg, Germany, September 2011
    • Oliver Müller, Sabine Donner, Tobias Klinder, Ralf Dragon, Ivonne Bartsch, Frank Witte, Alexander Krüger, Alexander Heisterkamp, Bodo Rosenhahn
      Model Based 3D Segmentation and OCT Image Undistortion of Percutaneous Implants
      Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011 14th International Conference, Lecture Notes in Computer Science (LNCS), Springer Berlin / Heidelberg, Vol. 6893, pp. 454-462, September 2011, edited by Gabor Fichtinger, Anne Martel, Terry Peters
    • Arne Ehlers, Florian Baumann, Ralf Spindler, Birgit Glasmacher, Bodo Rosenhahn
      PCA Enhanced Training Data for Adaboost
      Computer Analysis of Images and Patterns - 14th International Conference, CAIP 2011, Seville, Spain, August 29-31, 2011, Proceedings, Part I, Springer, Vol. 6854, pp. 410-419, August 2011
    • Laura Leal-Taixé, Matthias Heydt, Sebastian Weiße, Axel Rosenhahn, Bodo Rosenhahn
      Classification of swimming microorganisms motion patterns in 4D digital in-line holography data
      32nd Annual Symposium of the German Association for Pattern Recognition (DAGM), Springer, Vol. 6376, pp. 283-292, 2010
    • Tobias Klinder, Cristian Lorenz, Jörn Ostermann
      Free-Breathing Intra- and Intersubject Respiratory Motion Capturing, Modeling, and Prediction
      SPIE 2009, SPIE Medical Imaging, Orlando, Florida, USA, February 2009
    • Matthias Ehm, Tobias Klinder, Reinhard Kneser, Cristian Lorenz
      Automated Vertebra Identification in CT images
      SPIE 2009, SPIE Medical Imaging, Orlando, Florida, USA, February 2009
    • Laura Leal-Taixé, Ahmet U. Coskun, Bodo Rosenhahn, Dana H. Brooks
      Automatic segmentation of arteries in multi-stain histology images
      World Congress on Medical Physics and Biomedical Engineering, Munich, Germany, September 7-12, 2009
    • Laura Leal-Taixé, Matthias Heydt, Axel Rosenhahn, Bodo Rosenhahn
      Automatic tracking of swimming microorganisms in 4D digital in-line holography data
      IEEE Workshop on Motion and Video Computing (WMVC), Snowbird, Utah, USA, December 2009
    • Tobias Klinder, Cristian Lorenz, Jens von Berg, Steffen Renisch, Thomas Blaffert, Jörn Ostermann
      4DCT Image-Based Lung Motion Field Extraction and Analysis
      SPIE 2008, SPIE Medical Imaging, San Diego, California, USA, February 2008
    • Thomas Blaffert, Hans Barschdorf, Jens von Berg, Sebastian Dries, Astrid Franz, Tobias Klinder, Cristian Lorenz, Steffen Renisch, Rafael Wiemker
      Lung Lobe Modeling and Segmentation with Individualized Surface Meshes
      SPIE 2008, SPIE Medical Imaging, San Diego, California, USA, February 2008
    • Torbjörn Vik, Sven Kabus, Jens von Berg, Konstantin Ens, Sebastian Dries, Tobias Klinder, Cristian Lorenz
      Validation and Comparison of Registration Methods for Free-Breathing 4D Lung-CT
      SPIE 2008, SPIE Medical Imaging, San Diego, California, USA, February 2008
    • Tobias Klinder, Cristian Lorenz, Jörn Ostermann
      Respiratory Motion Modeling and Estimation
      The First Annual Workshop on Pulmonary Image Analysis, MICCAI 2008, Medical Image Computing and Computer Assisted Intervention, New York, USA, September 2008
    • Jalda Dworzak, Hans Lamecker, Jens von Berg, Tobias Klinder, Cristian Lorenz, Dagmar Kainmüller, Heiko Seim, Hans-Christian Hege, Stefan Zachow
      Towards Model-based 3-D Reconstruction of the Human Rib Cage from Radiographs
      CURAC 2008, Computer- und Roboterassistierte Chirurgie e.V., September 2008
    • Udo van Stevendaal, Tobias Klinder, Cristian Lorenz, Thomas Köhler
      Breathing-Motion Correction for Helical CT
      IEEE NSS/MIC, IEEE Nuclear Science Symposium and Medical Imaging Conference, Dresden, Germany, October 2008
    • Astrid Franz, Robin Wolz, Tobias Klinder, Cristian Lorenz, Hans Barschdorf, Thomas Blaffert, Sebastian Dries, Steffen Renisch
      Simultaneous Model-Based Segmentation of Multiple Objects
      BVM 2008, Bildverarbeitung für die Medizin, Berlin, Germany, April 2008
    • Tobias Klinder, Robin Wolz, Cristian Lorenz, Astrid Franz, Jörn Ostermann
      Spine Segmentation Using Articulated Shape Models
      MICCAI 2008, Medical Image Computing and Computer Assisted Intervention, Springer, New York, USA, September 2008
    • Tobias Klinder, Cristian Lorenz, Jens von Berg, Sebastian Dries, Thomas Bülow, Jörn Ostermann
      Automated Model-Based Rib Cage Segmentation and Labeling in CT images
      MICCAI 2007, Medical Image Computing and Computer Assisted Intervention, Springer, Brisbane, Australia, October 2007
    • Tobias Klinder, Cristian Lorenz, Jens von Berg
      Geometrical Rib-Cage Modeling, Detection, and Segmentation
      CARS 2007, Computer Assisted Radiology and Surgery, Berlin, Germany, June 2007
  • Journals
    • Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S. Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez
      GABAC: an arithmetic coding solution for genomic data
      Bioinformatics, Vol. 36, No. 7, pp. 2275-2277, 2020
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      Bioinformatics, Vol. 34, No. 10, pp. 1650-1658, 2018
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-level Scheme for Quality Score Compression
      Journal of Computational Biology, Vol. 25, No. 10, pp. 1141-1151, 2018
    • Claudio Alberti, Tom Paridaens, Jan Voges, Daniel Naro, Junaid J. Ahmad, Massimo Ravasi, Daniele Renzi, Giorgio Zoia, Paolo Ribeca, Idoia Ochoa, Marco Mattavelli, Jaime Delgado, Mikel Hernaez
      An introduction to MPEG-G, the new ISO standard for genomic information representation [not peer-reviewed]
      bioRxiv preprint 426353, 2018
    • Ibrahim Numanagic, James K. Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S. Cenk Sahinalp
      Comparison of high-throughput sequencing data compression tools
      Nature Methods, Vol. 13, No. 12, pp. 1005-1008, 2016
    • Sabine Donner, Oliver Müller, Frank Witte, Ivonne Bartsch, Elmar Willbold, Tammo Ripken, Alexander Heisterkamp, Bodo Rosenhahn, Alexander Krüger
      In situ optical coherence tomography of percutaneous implant-tissue interfaces in a murine model
      Biomedical Engineering/Biomedizinische Technik, De Gruyter, pp. 1-9, Karlsruhe, May 2013, edited by Olaf Dössel
    • S. Maleschlijski, G. H. Sendra, A. Di Fino, L. Leal-Taixé, I. Thome, A. Terfort, N. Aldred, M. Grunze, A. S. Clare, B. Rosenhahn, A. Rosenhahn
      Three dimensional tracking of exploratory behavior of barnacle cyprids using stereoscopy
      Biointerphases: Journal for the Quantitative Biological Interface Data, Springer, August 2012
    • Ralf Spindler, Bodo Rosenhahn, Nicola Hofmann, Birgit Glasmacher
      Video analysis of osmotic cell response during cryopreservation
      Cryobiology, Elsevier, February 2012
    • Tobias Klinder, Jörn Ostermann, Matthias Ehm, Astrid Franz, Reinhard Kneser, Cristian Lorenz
      Automated Model-Based Vertebra Detection, Identification, and Segmentation in CT Images
      Medical Image Analysis, Elsevier, Vol. 13, pp. 471-482, 2009
  • Book Chapters
    • Laura Leal-Taixé, Matthias Heydt, Axel Rosenhahn, Bodo Rosenhahn
      Understanding what we cannot see: automatic analysis of 4D digital in-line holographic microscopy data
      Video Processing and Computational Video, Springer, July 2011, edited by D. Cremers, M.A. Magnor, M.R. Oswald, L. Zelnik-Manor