Algorithmic Bioinformatics

Research

Algorithmic Bioinformatics is a purely computational research group developing new algorithms to solve bioinformatics problems on sequence data. We address challenges such as read alignment, variant detection and genotyping, genome assembly, and whole-genome alignment, among others. We implement our algorithms in new software tools and use them in the analysis of sequence data to gain new insights into human genetics and the immune system. Continue reading for a selection of the Group’s research projects.

Examples of structural variants visualized in the Integrative Genomics Viewer (IGV).

01 | Complex structural variants

A complex structural variant (cxSV) is a rearrangement of multiple segments of DNA that we assume was originally caused by a single mutational event. We have developed a new method for genotyping cxSVs in the human genome. It uses breakpoint-resolved descriptions of known cxSVs and alignments of short-read sequencing data from a person’s genome to build allele models and calculate expected read-pair probability distributions. Based on these distributions, we predict the person’s cxSV genotypes together with certainty scores. Our method enables accurate genotyping of cxSVs from widely available short-read data sets.

02 | Linked-read and long-read data analysis

Our software tools, bcctools, bcmap, and bccall, can be chained into a workflow for structural variant detection and genotyping from linked-read sequencing data. Linked-read data provide long-range information through barcode labels on accurate short reads: all reads labeled with the same barcode originate from a small set of long DNA molecules. Bcmap and bccall can also be applied to regular long-read sequencing data.

Bcctools is a toolbox for pre-processing linked-read data.
It can trim barcodes from the reads and infer a whitelist of barcodes, and it implements an efficient index data structure for retrieving corrected barcode sequences in constant time. Pre-processing linked-read data with bcctools is several times faster than with LongRanger.

Bcmap efficiently determines the genomic intervals of long DNA molecules. We refer to this as ‘barcode mapping’. The output from bcmap enables efficient retrieval of reads from genomic regions of interest without the need to compute a full read alignment, which makes this approach significantly faster than read alignment. Bcmap uses an open-addressing k-mer index and minimizers to efficiently determine genomic intervals in sets of reads labeled with the same barcode.

Lastly, bccall detects and genotypes structural variants. Given reads from a region of interest, it traverses a local assembly graph to detect structural variants and uses a statistical model to predict genotype likelihoods.

03 | Interactive exploratory workflows

In collaboration with the Weidlich lab at the Humboldt University in Berlin, we are developing interactive workflows to support the exploration of genomic data. The analysis of scientific data is usually a dynamic process in which various software alternatives and setups are tested over a period of time. We are developing the means to simplify and systematically document this process. By applying these methods, we transform existing program chains into exploratory workflows that allow and track user intervention during execution.
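To make the idea of an exploratory workflow more concrete, here is a minimal sketch of a program chain that allows a user to intervene before each step and records what happened. It is only an illustration under stated assumptions: the class `ExploratoryRun`, the `intervene` callback, and the toy steps `filter_reads` and `count_reads` are hypothetical names invented for this example and do not describe the group's actual software.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical types for illustration only (not taken from any real tool).
Params = Dict[str, Any]
Step = Callable[[Any, Params], Any]
# An intervention sees the step name, the current data, and the planned
# parameters, and may return parameter overrides (or None to accept them).
Intervention = Callable[[str, Any, Params], Optional[Params]]


@dataclass
class ExploratoryRun:
    """Runs a chain of steps and logs every step and every intervention."""
    log: List[Dict[str, Any]] = field(default_factory=list)

    def run(
        self,
        steps: Sequence[Tuple[str, Step]],
        data: Any,
        params: Dict[str, Params],
        intervene: Optional[Intervention] = None,
    ) -> Any:
        for name, step in steps:
            step_params = dict(params.get(name, {}))
            if intervene is not None:
                override = intervene(name, data, step_params)
                if override:
                    # Record the user's change before applying the step.
                    step_params.update(override)
                    self.log.append(
                        {"event": "intervention", "step": name,
                         "override": dict(override)}
                    )
            data = step(data, step_params)
            self.log.append(
                {"event": "step", "step": name, "params": step_params}
            )
        return data


# Toy steps standing in for real analysis programs.
def filter_reads(reads: List[str], p: Params) -> List[str]:
    return [r for r in reads if len(r) >= p.get("min_len", 0)]


def count_reads(reads: List[str], p: Params) -> int:
    return len(reads)


def demo() -> Tuple[Any, List[Dict[str, Any]]]:
    steps = [("filter", filter_reads), ("count", count_reads)]
    params = {"filter": {"min_len": 3}}

    # Simulated user decision: tighten the length filter at run time.
    def user(name: str, data: Any, p: Params) -> Optional[Params]:
        return {"min_len": 5} if name == "filter" else None

    run = ExploratoryRun()
    result = run.run(steps, ["ACGT", "ACGTAC", "AC"], params, user)
    return result, run.log
```

Calling `demo()` runs the two toy steps with one simulated intervention; the returned log lists the intervention followed by the two executed steps with the parameters that were actually used, which is the kind of record that lets an exploration be retraced later.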
04 | Structural variant detection in tens of thousands of genomes

Our software tool, PopDel, can simultaneously analyze tens of thousands of short-read sequenced genomes to reliably detect and accurately genotype structural variants, that is, differences between genomes that affect at least 50 bp of DNA sequence. The current focus of the software is on deletions, but we have started to extend PopDel to other types of structural variants, including inversions, duplications, and translocations.

We have already used PopDel to identify a rare deletion in the LDLR gene which causes extremely low levels of LDL cholesterol in the blood (Björnsson E. et al., Circulation: Genomic and Precision Medicine, 2021). The tool’s superior scalability, high accuracy, fast run time, and ease of use make PopDel an attractive alternative to previous approaches. At the core of PopDel are a space-efficient binary read-pair-profile format and a structural variant detection algorithm based on a likelihood-ratio test. More details in Niehus S. et al., Nature Communications, 2021.

05 | Non-reference sequence variants

Our software tool, PopIns2, the successor of PopIns, identifies a type of genomic structural variant that involves non-repetitive sequence not found in the reference genome. We call these variants ‘non-reference sequence variants’, or NRS variants for short. We previously showed that the majority of human non-reference sequence is ancestral rather than newly inserted, and described an association between an NRS variant in the SREBF1 gene and myocardial infarction (Kehr et al., Nature Genetics, 2017).

The detection of NRS variants from short-read data is particularly challenging, as it inevitably involves a de novo assembly of the non-reference sequence. We combine data from many individuals simultaneously to ensure reliable NRS assembly. PopIns2 realizes this by representing non-reference sequence data in colored de Bruijn graphs. More details in Krannich T.
et al., Bioinformatics, 2021.

Publications

Visit the complete list of our Research Group’s publications: https://kehrlab.github.io/publications.html

Here is a selection of the most important publications from the last few years:

Mirus T, Lohmayer R, Döhring C, Halldórsson BV, Kehr B. GGTyper: genotyping complex structural variants using short-read sequencing data. Bioinformatics 2024.

Lüpken R, Krannich T, Kehr B. Bcmap: fast alignment-free barcode mapping for linked-read sequencing data. Preprint on bioRxiv 2022. doi: 10.1101/2022.06.20.496811

Pinkert J, Boehm H, Trautwein M, Doecke W, Wessel F, Ge Y, Gutierrez EM, Carretero R, Freiberg C, Gritzan U, Luetke-Eversloh M, Golfier S, Von Ahsen O, Volpin V, Sorrentino A, Rathinasamy A, Xydia M, Lohmayer R, Sax J, Nur-Menevse A, Hussein A, Stamova S, Beckmann G, Glueck JM, Schoenfeld D, Weiske J, Zopf D, Offringa R, Kreft B, Beckhove P, Willuda J. T cell-mediated elimination of cancer cells by blocking CEACAM6-CEACAM1 interaction. Oncoimmunology 2022;11(1):2008110. doi: 10.1080/2162402X.2021.2008110. PMID: 35141051

Krannich T, White WTJ, Niehus S, Holley G, Halldórsson BV, Kehr B. Population-scale detection of non-reference sequence variants using colored de Bruijn graphs. Bioinformatics 2022;38(3):604-611. doi: 10.1093/bioinformatics/btab749. PMID: 34726732

Niehus S, Jónsson H, Schönberger J, Björnsson E, Beyter D, Eggertsson HP, Sulem P, Stefánsson K, Halldórsson BV, Kehr B. PopDel identifies medium-size deletions simultaneously in tens of thousands of genomes. Nature Communications 2021;12(1):730. doi: 10.1038/s41467-020-20850-5. PMID: 33526789

Schwarz JM, Lüpken R, Seelow D, Kehr B. Novel sequencing technologies and bioinformatic tools for deciphering the non-coding genome. Medizinische Genetik 2021;33(2):133-145. doi: 10.1515/medgen-2021-2072. PMID: 38836034

Markowski J, Kempfer R, Kukalev A, Irastorza-Azcarate I, Loof G, Kehr B, Pombo A, Rahmann S, Schwarz RF.
GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data. Bioinformatics 2021;37(19):3128-3135. doi: 10.1093/bioinformatics/btab238. PMID: 33830196

Bjornsson E, Gunnarsdottir K, Halldorsson GH, Sigurdsson A, Arnadottir GA, Jonsson H, Olafsdottir EF, Niehus S, Kehr B, Sveinbjörnsson G, Gudmundsdottir S, Helgadottir A, Andersen K, Thorleifsson G, Eyjolfsson GI, Olafsson I, Sigurdardottir O, Saemundsdottir J, Jonsdottir I, Magnusson OT, Masson G, Stefansson H, Gudbjartsson DF, Thorgeirsson G, Holm H, Halldorsson BV, Melsted P, Norddahl GL, Sulem P, Thorsteinsdottir U, Stefansson K. Lifelong Reduction in LDL (Low-Density Lipoprotein) Cholesterol due to a Gain-of-Function Mutation in LDLR. Circulation: Genomic and Precision Medicine 2021;14(1):e003029. doi: 10.1161/CIRCGEN.120.003029. PMID: 33315477

Jónsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, Hjartarson E, Hardarson MT, Hjorleifsson KE, Eggertsson HP, Gudjonsson SA, Ward LD, Arnadottir GA, Helgason EA, Helgason H, Gylfason A, Jonasdottir A, Jonasdottir A, Rafnar T, Frigge M, Stacey SN, Th Magnusson O, Thorsteinsdottir U, Masson G, Kong A, Halldorsson BV, Helgason A, Gudbjartsson DF, Stefansson K. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 2017;549(7673):519-522. doi: 10.1038/nature24018. PMID: 28959963

Kehr B, Helgadottir A, Melsted P, Jonsson H, Helgason H, Jonasdottir A, Jonasdottir A, Sigurdsson A, Gylfason A, Halldorsson GH, Kristmundsdottir S, Thorgeirsson G, Olafsson I, Holm H, Thorsteinsdottir U, Sulem P, Helgason A, Gudbjartsson DF, Halldorsson BV, Stefansson K. Diversity in non-repetitive human sequences not found in the reference genome. Nature Genetics 2017;49(4):588-593. doi: 10.1038/ng.3801. PMID: 28250455

Kehr B, Melsted P, Halldórsson BV. PopIns: population-scale detection of novel sequence insertions. Bioinformatics 2016;32(7):961-967. doi: 10.1093/bioinformatics/btv273.
PMID: 25926346

Funding

We would like to thank the funding agencies who support our work:

FOR 2841 Beyond the Exome, Project P3

The Research Unit FOR 2841 aims to identify, analyze, and predict the disease potential of non-coding DNA variants in patients with rare genetic diseases. The aim of Project P3 is to comprehensively identify genomic structural variants (SVs) in linked- and long-read sequencing data of rare disease patients. To this end, we developed a new genome-wide local assembly tool for SV detection during the first funding period. In the second funding period, we are extending the tool to multi-sample variant calling.
https://www.beyond-the-exome.org/P03.html

CRC Transregio 221, GvH/GvL INF project

The Transregional Collaborative Research Center CRC/TRR 221 is investigating innovative immune-modulation strategies to separate graft-versus-host disease from graft-versus-leukemia effects, with the goal of enhancing the safety and efficacy of allogeneic hematopoietic stem cell transplantation (HSCT) in the future. Within this center, the INF project is dedicated to data infrastructure. It focuses mainly on data management, and it also supports the individual projects with adequate software and expert knowledge during the entire data analysis process.
https://www.gvhgvl.de/en/projects-publications/projects/project-section-b

CRC 1404 FONDA, Project A6

The Collaborative Research Center CRC 1404 explores methods for increasing productivity in the development, execution, and maintenance of data-analysis workflows for large scientific data sets. Our long-term goal is to develop methods and tools to substantially reduce both development time and development cost. Project A6 investigates methods and systems to support the explorative process of workflow specification. It focuses on workflows for genome analysis, which are often long and complex and whose development involves numerous design choices and time-consuming trial-and-error phases.
https://fonda.hu-berlin.de/

GRK 2424 CompCancer

The Research Training Group GRK 2424 focuses on computational aspects of cancer research. Neuroblastoma is a pediatric tumor affecting the sympathetic nervous system and a model cancer that, in high-risk cases, is predominantly driven by copy number variation. In collaboration with a pediatric oncology group, we are studying rearrangements in neuroblastoma genomes.
https://www.comp-cancer.de/

Team & Lab Life

Prof. Birte Kehr
Head of Research Group | Algorithmic Bioinformatics
Tel: +49 941 944-18161
Email: birte.kehr@ukr.de

Katrin Zehenter
Team Assistant
Tel: +49 941 944-38132
Email: katrin.zehenter@ukr.de

Research team

Prof. Birte Kehr
Head of Research Group | Algorithmic Bioinformatics

Dr. Robert Lohmayer
Postdoctoral Scientist | Algorithmic Bioinformatics
Tel: +49 941 944-18162
Email: Robert.Lohmayer@physik.uni-regensburg.de

Kedi Cao
PhD Student | Algorithmic Bioinformatics
Tel: +49 941 944-18162
Email: Kedi.Cao@ukr.de

Tim Mirus
PhD Student | Algorithmic Bioinformatics
Tel: +49 941 944-18162
Email: Tim.Mirus@ukr.de

Richard Lüpken
PhD Student | Algorithmic Bioinformatics
Tel: +49 941 944-18162
Email: Richard.Luepken@ukr.de

Laura Grepmair
PhD Student | Algorithmic Bioinformatics
Tel: +49 941 944-18162
Email: Laura.Grepmair@ukr.de

Lab Life

There is life outside the laboratory: The Leibniz Institute places great value on our scientists developing team spirit both at work and outside of it. Here are the photos to prove it!