Sepp Hochreiter: Difference between revisions

Revision as of 09:32, 22 October 2014

Sepp Hochreiter (born 1967 in Mühldorf am Inn) is a computer scientist working in the fields of bioinformatics and machine learning. Since 2006 he has been head of the Institute of Bioinformatics at the Johannes Kepler University of Linz. Before that, he was at the Technical University of Berlin, the University of Colorado at Boulder, and the Technical University of Munich. At the Johannes Kepler University of Linz, he founded the Bachelor's Program in Bioinformatics, a cross-border, double-degree study program run together with the University of South Bohemia in České Budějovice (Budweis), Czech Republic. He also established the Master's Program in Bioinformatics at the Johannes Kepler University of Linz, where he is still the acting dean of both programs. Hochreiter launched the Bioinformatics Working Group at the Austrian Computer Society and is a founding board member of several bioinformatics start-up companies. He was program chair of the conference Bioinformatics Research and Development, is conference chair of the conference Critical Assessment of Massive Data Analysis (CAMDA), and serves as editor, program committee member, and reviewer for international journals and conferences.

Scientific Contributions

Genetics

Sepp Hochreiter developed "HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data"^[1] for detecting short segments of identity by descent. A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common ancestor, that is, the segment has the same ancestral origin in these individuals. HapFABIA identifies IBD segments 100 times smaller than those found by current state-of-the-art methods: 10 kbp for HapFABIA vs. 1 Mbp for state-of-the-art methods. HapFABIA is tailored to next-generation sequencing data and utilizes rare variants for IBD detection, but it also works for microarray genotyping data. HapFABIA can benefit evolutionary biology, population genetics, and association studies because it decomposes the genome into short IBD segments, which describe the genome at very high resolution. HapFABIA was used to analyze the IBD sharing between humans, Neanderthals, and Denisovans.^[2]
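The IBS definition above can be illustrated with a minimal sketch. It only finds maximal runs of identical positions in two aligned sequences; it does not decide whether such a run is also IBD, and it is not the HapFABIA method. The function name `ibs_segments`, the `min_length` parameter, and the toy sequences are hypothetical and chosen for illustration.

```python
def ibs_segments(seq_a, seq_b, min_length=1):
    """Return (start, end) pairs of maximal runs where both sequences agree.

    `end` is exclusive. Runs shorter than `min_length` are dropped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    segments = []
    start = None  # start index of the run currently being extended
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that reaches the end of the sequences.
    if start is not None and len(seq_a) - start >= min_length:
        segments.append((start, len(seq_a)))
    return segments


# Hypothetical toy haplotypes that differ at positions 3 and 8.
hap_1 = "ACGTACGTAC"
hap_2 = "ACGAACGTTC"
print(ibs_segments(hap_1, hap_2, min_length=3))  # [(0, 3), (4, 8)]
```

An IBS run found this way is only a candidate: by the definition above, it counts as IBD only if the individuals inherited it from a common ancestor.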

Next-Generation Sequencing

Sepp Hochreiter's research group is a member of the SEQC/MAQC-III consortium, coordinated by the US Food and Drug Administration. This consortium examined Illumina HiSeq, Life Technologies SOLiD, and Roche 454 platforms at multiple laboratory sites regarding RNA sequencing (RNA-seq) performance.^[3] Within this project, standard approaches to assess, report, and compare the technical performance of genome-scale differential gene expression experiments were defined.^[4] For analyzing the structural variation of the DNA, Sepp Hochreiter's research group proposed "cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation data with a low false discovery rate"^[5] for detecting copy number variations in next-generation sequencing data. cn.MOPS estimates the local DNA copy number, is suited for both whole-genome sequencing and exome sequencing, and can be applied to diploid and haploid genomes as well as to polyploid genomes. For identifying differentially expressed transcripts in RNA-seq data, Sepp Hochreiter's group suggested "DEXUS: Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions".^[6] In contrast to other RNA-seq methods, DEXUS can detect differential expression in RNA-seq data for which the sample conditions are unknown and for which biological replicates are not available. In the group of Sepp Hochreiter, sequencing data was also analyzed to gain insights into chromatin remodeling. The reorganization of the cell's chromatin structure was determined via next-generation sequencing of resting and activated T cells. The analyses of these T cell chromatin sequencing data identified GC-rich long nucleosome-free regions that are hot spots of chromatin remodeling.^[7]
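The idea of a mixture of Poissons over copy numbers can be sketched as follows. This is a simplified single-sample illustration, not the cn.MOPS model. It assumes that the expected read count in a genomic window scales linearly with the copy number, that `diploid_mean` (the expected count at copy number 2) is known, and that copy number 0 still yields a small background count controlled by a hypothetical `eps` parameter. All names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing k events, computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def copy_number_posterior(read_count, diploid_mean, max_copy_number=4,
                          prior=None, eps=0.05):
    """Posterior over copy numbers 0..max_copy_number for one read count.

    Assumption: the Poisson mean for copy number cn is diploid_mean * cn / 2,
    with cn = 0 replaced by a small eps so the mean stays positive.
    """
    copy_numbers = list(range(max_copy_number + 1))
    if prior is None:
        # Uniform prior over copy numbers (an assumption of this sketch).
        prior = [1.0 / len(copy_numbers)] * len(copy_numbers)
    weighted = []
    for cn, p in zip(copy_numbers, prior):
        mean = diploid_mean * max(cn, eps) / 2.0
        weighted.append(p * poisson_pmf(read_count, mean))
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical window: 150 reads observed where 100 are expected for 2 copies.
posterior = copy_number_posterior(read_count=150, diploid_mean=100.0)
most_likely = max(range(len(posterior)), key=lambda cn: posterior[cn])
print(most_likely)
```

Each mixture component corresponds to one integer copy number, so the posterior directly expresses how strongly the observed count supports a gain or a loss.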

Microarray Preprocessing and Summarization

Sepp Hochreiter developed "Factor Analysis for Robust Microarray Summarization" (FARMS).^[8] FARMS has been designed for preprocessing and summarizing high-density oligonucleotide DNA microarrays at probe level to analyze RNA gene expression. FARMS is based on a factor analysis model which is optimized in a Bayesian framework by maximizing the posterior probability. On Affymetrix spiked-in and other benchmark data, FARMS outperformed all other methods. A highly relevant feature of FARMS is its informative/non-informative (I/NI) calls.^[9] The I/NI call is a Bayesian filtering technique which separates signal variance from noise variance. The I/NI call offers a solution to the main problem of high dimensionality when analyzing microarray data by selecting genes which are measured with high quality.^[10]^[11] FARMS has been extended to cn.FARMS^[12] for detecting DNA structural variants like copy number variations with a low false discovery rate.

Biclustering

Sepp Hochreiter developed "Factor Analysis for Bicluster Acquisition" (FABIA)^[13] for biclustering, that is, simultaneously clustering the rows and columns of a matrix. A bicluster in transcriptomic data is a pair of a gene set and a sample set for which the genes are similar to each other on the samples and vice versa. In drug design, for example, the effects of compounds may be similar only on a subgroup of genes. FABIA is a multiplicative model that assumes realistic non-Gaussian signal distributions with heavy tails and utilizes well-understood model selection techniques like a variational approach in the Bayesian framework. FABIA supplies the information content of each bicluster to separate spurious biclusters from true biclusters.
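A small sketch can make the notion of a multiplicative bicluster model concrete. It assumes one common reading of "multiplicative": a bicluster is the outer product of a sparse vector over genes and a sparse vector over samples, so that only genes and samples with nonzero entries belong to the bicluster. The sketch does not fit a model to data, and the names `gene_loadings` and `sample_factors` as well as the toy values are hypothetical.

```python
def outer(column_vector, row_vector):
    """Outer product of two vectors as a list of rows."""
    return [[c * r for r in row_vector] for c in column_vector]


# Hypothetical toy data: 5 genes and 4 samples.
# A nonzero entry means the gene (or sample) takes part in the bicluster.
gene_loadings = [0.0, 2.0, 1.5, 0.0, -1.0]
sample_factors = [1.0, 0.0, 0.0, 2.0]

bicluster = outer(gene_loadings, sample_factors)

member_genes = [i for i, v in enumerate(gene_loadings) if v != 0.0]
member_samples = [j for j, v in enumerate(sample_factors) if v != 0.0]

for row in bicluster:
    print(row)
print("genes:", member_genes, "samples:", member_samples)
```

Within the bicluster, each member gene's expression profile across the member samples is a scaled copy of the same pattern, which matches the description of genes being similar to each other on the samples and vice versa.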

Support Vector Machines

Support vector machines (SVMs) are supervised learning methods used for classification and regression analysis by recognizing patterns and regularities in the data. Standard SVMs require a positive definite kernel to generate a square kernel matrix from the data. Sepp Hochreiter proposed the "Potential Support Vector Machine" (PSVM),^[14] which can be applied to non-square kernel matrices and can be used with kernels that are not positive definite. For PSVM model selection he developed an efficient sequential minimal optimization algorithm.^[15] The PSVM minimizes a new objective which ensures theoretical bounds on the generalization error and automatically selects features which are used for classification or regression.
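The difference between a square and a non-square kernel matrix can be shown with a short sketch. It only contrasts the shapes of the two matrices and does not implement the PSVM objective or its optimization. The choice of an RBF kernel, the `gamma` value, and the names `samples` and `reference_points` are assumptions made for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel between two points."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(rows, cols, kernel=rbf_kernel):
    """Matrix of kernel values between every row object and every column object."""
    return [[kernel(r, c) for c in cols] for r in rows]


# Hypothetical toy data.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
reference_points = [(0.0, 0.0), (1.0, 1.0)]

# Standard setting: data against itself gives a square, symmetric matrix.
square = kernel_matrix(samples, samples)

# Data against a different set of objects gives a non-square matrix.
rectangular = kernel_matrix(samples, reference_points)

print(len(square), "x", len(square[0]))            # 4 x 4
print(len(rectangular), "x", len(rectangular[0]))  # 4 x 2
```

A method restricted to square, positive definite kernel matrices can use only the first kind of matrix; according to the text above, the PSVM also accepts the second kind.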

Feature Selection

Sepp Hochreiter applied the PSVM to feature selection, especially to gene selection for microarray data.^[16]^[17]^[18] The PSVM and standard support vector machines were applied to extract features that are indicative of coiled coil oligomerization.^[19]

Learning Representations and Low Complexity Neural Networks

Neural networks are simplified mathematical models of biological neural networks like those in human brains. When data mining is based on neural networks, overfitting reduces the network's ability to correctly process future data. To avoid overfitting, Sepp Hochreiter developed algorithms for finding low-complexity neural networks, such as "Flat Minimum Search" (FMS),^[20] which searches for a "flat" minimum, that is, a large connected region in the parameter space where the network function is constant. The network parameters can therefore be specified with low precision, which corresponds to a low-complexity network that avoids overfitting. Low-complexity neural networks are well suited for deep learning because they control the complexity in each network layer and, therefore, learn hierarchical representations of the input.
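The link between flatness and parameter precision can be illustrated with a one-dimensional sketch. It is not the FMS algorithm. It merely compares two hypothetical loss functions with the same minimum and measures how far the parameter can be moved before the loss rises by more than a chosen `tolerance`. All function names and constants are assumptions made for illustration.

```python
def sharp_loss(w):
    """Hypothetical loss with a narrow valley around w = 0."""
    return 50.0 * w * w


def flat_loss(w):
    """Hypothetical loss with a wide valley around w = 0."""
    return 0.5 * w * w


def tolerated_perturbation(loss, w_min, tolerance=0.01, step=1e-3,
                           max_radius=10.0):
    """Largest radius (in multiples of `step`) within which the loss stays
    no more than `tolerance` above its value at w_min, in both directions."""
    base = loss(w_min)
    radius = 0.0
    while radius + step <= max_radius:
        candidate = radius + step
        worst = max(loss(w_min + candidate), loss(w_min - candidate))
        if worst - base > tolerance:
            break
        radius = candidate
    return radius


sharp_radius = tolerated_perturbation(sharp_loss, 0.0)
flat_radius = tolerated_perturbation(flat_loss, 0.0)
print(sharp_radius, flat_radius)
```

In this toy setting the flat minimum tolerates a perturbation roughly ten times larger than the sharp one, so its parameter could be stored with correspondingly lower precision.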

Deep Neural Networks and Long Short-Term Memory (LSTM)

Recurrent neural networks scan and process sequences and supply their results to the environment. Sepp Hochreiter developed the long short-term memory (LSTM),^[21] which overcomes the tendency of previous recurrent and deep networks to forget information over time or, equivalently, through layers. LSTM learns from training sequences to solve numerous tasks like automatic music composition, speech recognition, reinforcement learning, and robotics. LSTM with an optimized architecture was successfully applied to very fast protein homology detection without requiring a sequence alignment.^[22] LSTM has also been used to learn a learning algorithm, that is, LSTM serves as a Turing machine or computer on which the code of a learning algorithm is executed. Since this learning code is itself a neural network, it can be improved by training, and the resulting learning algorithms were reported to be superior to known human-designed methods.^[23] See the conference paper "Learning to Learn Using Gradient Descent" (http://www.bioinf.jku.at/publications/older/1504.pdf), in which such learning algorithms are learned by LSTM.
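How a gated memory cell avoids forgetting can be sketched with a single scalar cell. This is a simplified illustration under stated assumptions, not a trained LSTM network. The gate activations are set by hand instead of being computed from learned weights, the cell state is carried over with a fixed self-connection of 1.0, and no forget gate is used. The function `memory_cell_step` and all numeric values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(cell_state, x, input_gate_logit, output_gate_logit):
    """One time step of a scalar memory cell with input and output gates.

    The previous cell state is copied unchanged; the input gate decides how
    much of the squashed input is added, the output gate how much is shown.
    """
    input_gate = sigmoid(input_gate_logit)
    output_gate = sigmoid(output_gate_logit)
    new_state = cell_state + input_gate * math.tanh(x)
    output = output_gate * math.tanh(new_state)
    return new_state, output


state = 0.0

# Step 1: input gate open, output gate closed: store the input.
state, _ = memory_cell_step(state, x=2.0,
                            input_gate_logit=20.0, output_gate_logit=-20.0)
stored = state

# 1000 steps of distracting input with the input gate closed.
for _ in range(1000):
    state, _ = memory_cell_step(state, x=-1.0,
                                input_gate_logit=-40.0, output_gate_logit=-20.0)

# Final step: output gate open: read the stored value back.
state, output = memory_cell_step(state, x=0.0,
                                 input_gate_logit=-40.0, output_gate_logit=20.0)
print(stored, state, output)
```

Because the state is carried forward without decay while the input gate is closed, the value stored at the first step is still available after 1000 steps. In an actual LSTM network the gate activations are learned from training sequences.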

References

^ PMID 24174545.
^ doi:10.1101/003988.
^ PMID 25150838.
^ PMID 25254650.
^ PMID 22302147.
^ PMID 24049071.
^ PMID 23144837.
^ PMID 16473874.
^ PMID 17921172.
^ PMID 21059952.
^ doi:10.2202/1544-6115.1460.
^ PMID 21486749.
^ PMID 20418340.
^ doi:10.1162/neco.2006.18.6.1472.
^ doi:10.1162/neco.2008.20.1.271.
^ doi:10.1007/978-3-540-35488-8_20.
^ Hochreiter, S.; Obermayer, K. (2003). "Classification and Feature Selection on Matrix Data with Application to Gene-Expression Analysis". 54th Session of the International Statistical Institute.
^ Hochreiter, S.; Obermayer, K. (2004). "Gene Selection for Microarray Data". Kernel Methods in Computational Biology. MIT Press: 319–355.
^ doi:10.1074/mcp.M110.004994.
^ doi:10.1162/neco.1997.9.1.1.
^ doi:10.1162/neco.1997.9.8.1735.
^ doi:10.1093/bioinformatics/btm247.
^ doi:10.1007/3-540-44668-0_13.

Software Downloads

Genetics:
- hapFabia: Software for identification of very short segments of identity by descent (IBD) characterized by rare variants in large sequencing data (R package)
Next Generation Sequencing (RNA-seq, copy numbers):
- DEXUS: Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions (R package)
- cn.MOPS: Mixture Of PoissonS for discovering Copy Number variations in next generation sequencing data (R package)
Chemoinformatics:
- Rchemcpp: An R package for computing the similarity of molecules (R package)
Microarray Probe Level Analysis (mRNA, copy numbers):
- FARMS and I/NI calls: Factor Analysis for Robust Microarray Summarization (R package)
- cn.FARMS: a latent variable model to detect copy number variations in microarray data (R package)
Biclustering / Clustering / Segmentation:
Protein Structure:
- PrOCoil: Predicting the Oligomerization of Coiled Coil Proteins (R package)
- LSTM(protein): Long Short-Term Memory for Protein classification (Java package)
Support Vector Machines:
- PSVM: Potential Support Vector Machine for classification, regression and feature extraction also with non-positive definite kernels (C++ package)
Neural Networks:
- LSTM: Long Short-Term Memory software for the state of the art recurrent neural network (C package)
- FMS: Flat Minimum Search software for regularizing neural networks (C package)
