Topological data analysis

In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extracting information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework for analyzing such data in a manner that is insensitive to the particular metric chosen, and it provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.

The initial motivation is to study the shape of data. TDA combines algebraic topology and other tools from pure mathematics to allow a mathematically rigorous and quantitative study of "shape". The main tool is persistent homology, an adaptation of homology to point cloud data. Persistent homology has been applied to many types of data across many fields. Moreover, its mathematical foundation is also of theoretical importance. The unique features of TDA make it a promising bridge between topology and geometry.

Basic theory

Intuition

The premise underlying TDA is that shape matters. Real data in high dimensions is nearly always sparse, and tends to have relevant low-dimensional features. One task of TDA is to provide a precise characterization of this fact. An illustrative example is a predator-prey system governed by the Lotka-Volterra equations.[1] One can easily observe that the trajectory of the system forms a closed circle in state space. TDA provides tools to detect and quantify such recurrent motion.[2]

Without prior domain knowledge, the correct threshold[clarification needed] for a data set is difficult to choose. The main insight of persistent homology is that one can use the information obtained from all values of a parameter. This insight alone is easy to state; the hard part is encoding this huge amount of information in a form that is understandable and easy to represent. With TDA, there is a mathematical interpretation when the information is a homology group. In general, the assumption is that features that persist over a wide range of parameter values are "true" features. Features persisting over only a narrow range are presumed to be noise, although the theoretical justification for this is unclear.[3]

Early history

Precursors to the full concept of persistent homology appeared gradually over time.[4] In 1990, Patrizio Frosini introduced the size function, which is equivalent to the 0th persistent homology.[5] Nearly a decade later, Vanessa Robins studied the images of homomorphisms induced by inclusion.[6] Shortly thereafter, Edelsbrunner et al. introduced the concept of persistent homology together with an efficient algorithm and its visualization as a persistence diagram.[7] Carlsson et al. reformulated the initial definition and gave an equivalent visualization method called persistence barcodes,[8] interpreting persistence in the language of commutative algebra.[9]

Concepts

Some widely used concepts are introduced below. Note that some definitions may vary from author to author.

A point cloud is often defined as a finite set of points in some Euclidean space, but may be taken to be any finite metric space.

The Čech complex of a point cloud is the nerve of the cover of balls of a fixed radius around each point in the cloud.

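A persistence module $\mathbb{U}$ indexed by $\mathbb{Z}$ is a vector space $U_t$ for each $t \in \mathbb{Z}$, and a linear map $u_t^s : U_s \to U_t$ whenever $s \le t$, such that $u_t^t = 1$ for all $t$ and $u_t^s u_s^r = u_t^r$ whenever $r \le s \le t$.[10] An equivalent definition is a functor from $\mathbb{Z}$ considered as a partially ordered set to the category of vector spaces.

The persistent homology group $PH$ of a point cloud $X$ is the persistence module defined as $PH_k(X) = \prod_r H_k(X_r)$, where $X_r$ is the Čech complex of radius $r$ of the point cloud $X$ and $H_k$ is the $k$-th homology group.

A persistence barcode is a multiset of intervals in $\mathbb{R}$, and a persistence diagram is a multiset of points in $\Delta$ ($:= \{(u, v) \in \mathbb{R}^2 \mid u, v \ge 0,\ u \le v\}$).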

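The Wasserstein distance between two persistence diagrams $X$ and $Y$ is defined as

$$W_p[L_q](X, Y) := \inf_{\varphi : X \to Y} \left[ \sum_{x \in X} \left( \lVert x - \varphi(x) \rVert_q \right)^p \right]^{1/p}$$

where $1 \le p, q \le \infty$ and $\varphi$ ranges over bijections between $X$ and $Y$. Please refer to figure 3.1 in Munch[11] for illustration.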

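The bottleneck distance between $X$ and $Y$ is

$$W_\infty[L_q](X, Y) := \inf_{\varphi : X \to Y} \sup_{x \in X} \lVert x - \varphi(x) \rVert_q$$

This is a special case of the Wasserstein distance, letting $p = \infty$.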
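As a rough illustration of the two distances, the following sketch evaluates them by brute force over all bijections. It is only a toy under stated assumptions: the function names (`lq_norm`, `wasserstein`, `bottleneck`) are hypothetical, the two diagrams must have the same number of points, and the usual practice of padding diagrams with points on the diagonal is not handled. The factorial running time makes it unsuitable for anything but tiny diagrams.

```python
import math
from itertools import permutations


def lq_norm(a, b, q=math.inf):
    """L_q distance between two points of the plane (q = inf is the max norm)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if math.isinf(q):
        return max(dx, dy)
    return (dx ** q + dy ** q) ** (1.0 / q)


def wasserstein(X, Y, p=2, q=math.inf):
    """Brute-force W_p[L_q] over all bijections between equal-size diagrams."""
    if len(X) != len(Y):
        raise ValueError("this sketch assumes diagrams of equal size")
    best = math.inf
    for perm in permutations(range(len(Y))):
        costs = [lq_norm(x, Y[j], q) for x, j in zip(X, perm)]
        if math.isinf(p):
            total = max(costs, default=0.0)        # p = inf: largest single move
        else:
            total = sum(c ** p for c in costs) ** (1.0 / p)
        best = min(best, total)
    return best


def bottleneck(X, Y, q=math.inf):
    """Bottleneck distance as the p = inf case of the Wasserstein distance."""
    return wasserstein(X, Y, p=math.inf, q=q)


# Two small diagrams given as (birth, death) pairs.
X = [(0.0, 2.0), (1.0, 3.0)]
Y = [(0.0, 2.5), (1.0, 3.0)]
print(bottleneck(X, Y), wasserstein(X, Y, p=1))
```

Production libraries replace the enumeration of bijections with matching algorithms, but the quantity being minimized is the same.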

Basic property

Structure theorem

The first classification theorem for persistent homology appeared in 2005:[9] for a finitely generated persistence module $C$ with coefficients in a field $F$,

$$H(C; F) \simeq \bigoplus_i x^{t_i} \cdot F[x] \oplus \left( \bigoplus_j x^{r_j} \cdot \left( F[x] / (x^{s_j} \cdot F[x]) \right) \right)$$

Intuitively, the free parts correspond to the homology generators that appear at filtration level $t_i$ and never disappear, while the torsion parts correspond to those that appear at filtration level $r_j$ and last for $s_j$ steps of the filtration (or equivalently, disappear at filtration level $s_j + r_j$).
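The reading of the decomposition as a barcode can be sketched in a few lines. The function names below are hypothetical, and the half-open convention [birth, death) for bars is an assumption made for this example: a free generator born at some level gives an infinite bar, and a torsion generator born at level r that lasts s steps gives the bar from r to r + s.

```python
import math


def barcode_from_decomposition(free_births, torsion):
    """Turn a decomposition into a barcode.

    free_births: filtration levels of generators that never disappear.
    torsion:     (birth_level, duration) pairs for generators that disappear.
    """
    bars = [(t, math.inf) for t in free_births]
    bars += [(r, r + s) for r, s in torsion]
    return sorted(bars)


def betti_number(bars, level):
    """Number of bars alive at `level`, with half-open bars [birth, death)."""
    return sum(1 for birth, death in bars if birth <= level < death)


bars = barcode_from_decomposition(free_births=[0], torsion=[(1, 2), (0, 1)])
print(bars)
print([betti_number(bars, level) for level in range(4)])
```

Counting the bars alive at each level recovers a parameterized version of the Betti number.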

Persistent homology is visualized through a barcode or persistence diagram. The barcode has its root in abstract mathematics, though not at first sight; essentially, the derived category of chain complexes over a field is equivalent to the graded category of vector spaces.[12]

Stability

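Stability is desirable because it provides robustness against noise. If $X$ is any space which is homeomorphic to a simplicial complex, and $f, g : X \to \mathbb{R}$ are continuous tame functions, then the persistence vector spaces $\{H_k(f^{-1}([0, r]))\}$ and $\{H_k(g^{-1}([0, r]))\}$ are finitely presented, and $W_\infty(D(f), D(g)) \le \lVert f - g \rVert_\infty$, where $W_\infty$ refers to the bottleneck distance and $D$ is the map taking a continuous tame function to the persistence diagram of its $k$-th homology.[13]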

Workflow

The basic workflow in TDA is:[14]

point cloud → nested complexes → persistence module → barcode or diagram
  1. If $X$ is a point cloud, replace $X$ with a nested family of simplicial complexes $X_r$ (such as the Čech or Vietoris-Rips complex). This process converts the point cloud into a filtration of simplicial complexes. Taking the homology of each complex in this filtration gives a persistence module $H_i(X_{r_0}) \to H_i(X_{r_1}) \to H_i(X_{r_2}) \to \cdots$
  2. Apply the structure theorem to obtain a parameterized version of the Betti number, a persistence diagram, or equivalently, a barcode.

[Figure: a usual use of persistence in TDA.[15]]
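For homology in degree 0 the whole workflow collapses to a familiar computation, which the following sketch illustrates. It is a minimal example under stated assumptions, not a general implementation: the function name `h0_barcode` is hypothetical, an edge of the Vietoris-Rips filtration is assumed to appear at a scale equal to the distance between its endpoints, and only connected components are tracked, using a union-find structure.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Degree-0 Vietoris-Rips barcode of a point cloud via union-find.

    Every point is born at scale 0. A component dies at the length of the
    edge that merges it into another component. One bar never dies.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges of the filtration, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # the edge merges two components
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))      # the component that survives forever
    return bars


cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(h0_barcode(cloud))
```

The two nearby points merge at scale 1, the distant point joins at scale 4, and one bar persists indefinitely. This matches the intuition that long bars reflect prominent structure.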

Computation

The first algorithm for persistent homology over $F_2$ was given by Edelsbrunner et al.[7] Carlsson et al. gave the first practical algorithm to compute persistent homology over all fields.[9] Edelsbrunner and Harer's book gives general guidance on computational topology.[16]
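To give a flavor of such computations, here is a sketch of the standard column reduction of a boundary matrix over the two-element field, as it is commonly presented in textbooks. It is not claimed to be the exact formulation of either cited paper. The function name `reduce_boundary` and the input encoding (one set of row indices per column, with simplices indexed in filtration order) are assumptions made for this example.

```python
def reduce_boundary(columns):
    """Reduce a boundary matrix over the two-element field.

    columns[j] is the set of row indices of the nonzero entries of column j.
    Returns (pairs, essential): pairs lists (birth_index, death_index) and
    essential lists indices of simplices creating classes that never die.
    """
    cols = [set(c) for c in columns]
    low_to_col = {}                 # lowest nonzero row -> column owning it
    pairs = []
    for j, col in enumerate(cols):
        # Add earlier columns (mod 2) until the lowest entry is unclaimed.
        while col and max(col) in low_to_col:
            col ^= cols[low_to_col[max(col)]]
        if col:
            low_to_col[max(col)] = j
            pairs.append((max(col), j))
    births = {b for b, _ in pairs}
    essential = [j for j, col in enumerate(cols) if not col and j not in births]
    return pairs, essential


# Filled triangle: vertices 0, 1, 2; edges 3 = {0,1}, 4 = {1,2}, 5 = {0,2};
# triangle 6 with boundary {3, 4, 5}.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs, essential = reduce_boundary(triangle)
print(pairs, essential)
```

Each pair matches the simplex that creates a homology class with the simplex that destroys it. Here the loop created by edge 5 is filled in by triangle 6, and only the component of vertex 0 survives.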

One issue that arises in computation is the choice of complex. The Čech complex and the Vietoris-Rips complex are the most natural at first glance; however, their size grows rapidly with the number of data points. The Vietoris-Rips complex is preferred over the Čech complex because its definition is simpler and the Čech complex requires extra effort to define in a general finite metric space. Efficient ways to lower the computational cost of homology have been studied. For example, the α-complex and the witness complex are used to reduce the dimension and size of complexes.[17]
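The simplicity of the Vietoris-Rips definition, and the rapid growth in size, can both be seen in a small sketch. The function name `vietoris_rips` and the convention that a simplex is included when all pairwise distances are at most `r` are assumptions made for this example; some authors use `2r` instead.

```python
import math
from itertools import combinations


def vietoris_rips(points, r, max_dim=2):
    """Simplices (as index tuples) whose pairwise distances are all <= r."""
    n = len(points)
    close = [
        [math.dist(points[i], points[j]) <= r for j in range(n)]
        for i in range(n)
    ]
    simplices = []
    for k in range(1, max_dim + 2):          # k vertices span dimension k - 1
        for verts in combinations(range(n), k):
            if all(close[i][j] for i, j in combinations(verts, 2)):
                simplices.append(verts)
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(len(vietoris_rips(square, r=1.0)))   # vertices and sides only
print(len(vietoris_rips(square, r=1.5)))   # diagonals and triangles appear
```

Only pairwise distances are needed, which is why the construction works in any finite metric space. The enumeration over all vertex subsets also shows why the complex becomes large quickly.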

Recently, discrete Morse theory has shown promise for computational homology because it can reduce a given simplicial complex to a much smaller cellular complex which is homotopy equivalent to the original one.[18] This reduction can in fact be performed as the complex is constructed by using matroid theory, leading to further performance increases.[19] Another recent algorithm saves time by ignoring the homology classes with low persistence.[20]

Various software packages are available, such as javaPlex, Dionysus, Perseus, PHAT, DIPHA, and Gudhi. A comparison of these tools is given by Otter et al.[21] In addition, the R package TDA is capable of calculating recently introduced concepts such as the landscape and the kernel distance estimator.[22]

Visualization

High-dimensional data is impossible to visualize directly. Many methods have been invented to extract a low-dimensional structure from the data set, such as principal component analysis and multidimensional scaling.[23] However, it is important to note that the problem itself is ill-posed, since many different topological features can be found in the same data set. Thus, the study of visualization of high-dimensional spaces is of central importance to TDA, although it does not necessarily involve the use of persistent homology. However, recent attempts have been made to use persistent homology in data visualization.[24]

Carlsson et al. have proposed a general method called MAPPER.[25] It inherits the idea of Serre that a covering preserves homotopy.[26] A generalized formulation of MAPPER is as follows:

Let $X$ and $Z$ be topological spaces and let $f : X \to Z$ be a continuous map. Let $\mathbb{U} = \{U_\alpha\}_{\alpha \in A}$ be a finite open covering of $Z$. The output of MAPPER is the nerve of the pullback cover $M(\mathbb{U}, f) := N(f^{-1}(\mathbb{U}))$, where each preimage is split into its connected components.[24] This is a very general concept, of which the Reeb graph and merge trees are special cases.

This is not quite the original definition.[25] Carlsson et al. choose $Z$ to be $\mathbb{R}$ or $\mathbb{R}^2$, and cover it with open sets such that at most two intersect.[3] This restriction means that the output is in the form of a complex network. Because the topology of a finite point cloud is trivial, clustering methods (such as single linkage) are used to produce the analogue of connected sets in the preimage when MAPPER is applied to actual data.
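The construction for actual data can be sketched as follows, for a real-valued filter function. This is a simplified illustration, not the reference implementation: the names `single_linkage` and `mapper_graph`, the parameters `n_intervals`, `overlap` and `eps`, the use of closed rather than open intervals, and the fixed-threshold form of single-linkage clustering are all assumptions made for this example.

```python
import math
from itertools import combinations


def single_linkage(indices, points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filt, n_intervals=4, overlap=0.25, eps=1.0):
    """Nerve of the clustered pullback of an interval cover of the filter range."""
    values = [filt(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Enlarge each interval so that only consecutive intervals overlap.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        preimage = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(preimage, points, eps))
    # Two clusters are joined by an edge when they share a data point.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Sixteen points on the unit circle, filtered by their x-coordinate.
circle = [(math.cos(k * math.pi / 8), math.sin(k * math.pi / 8))
          for k in range(16)]
nodes, edges = mapper_graph(circle, filt=lambda p: p[0],
                            n_intervals=3, overlap=0.25, eps=0.5)
print(len(nodes), len(edges))
```

For the circle, the middle interval splits into an upper and a lower cluster, so the resulting graph has four nodes joined in a cycle, reflecting the loop in the data.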

Mathematically speaking, MAPPER is a variation of the Reeb graph. If $M(\mathbb{U}, f)$ is at most one-dimensional, then for each $i \ge 0$,

$$H_i(X) \simeq H_0(N(\mathbb{U}); \hat{F}_i) \oplus H_1(N(\mathbb{U}); \hat{F}_{i-1}).$$[27]

The added flexibility also has disadvantages. One problem is instability, in that a small change in the choice of the cover can lead to a major change in the output of the algorithm.[28] Work has been done to overcome this problem.[24]

Three successful applications of MAPPER can be found in Carlsson et al.[29] A comment on the applications in this paper by J. Curry is that "a common feature of interest in applications is the presence of flares or tendrils."[30]

A free implementation of MAPPER, written by Daniel Müllner and Aravindakshan Babu, is available online. MAPPER also forms the basis of Ayasdi's data visualization platform.

Mathematical foundation

Although persistent homology is a "21st century child of algebraic homology", its mathematical foundation has been developed extensively. An incomplete list of mathematicians active in this field includes leading figures such as Gunnar Carlsson, Herbert Edelsbrunner, Vin de Silva, Peter Bubenik, Frédéric Chazal and Robert Ghrist, and rising scholars such as Michael Lesnick, Justin Curry and Jonathan Scott.

Multidimensional persistence

Multidimensional persistence is important to TDA. The concept arises in both theory and practice. The first investigation of multidimensional persistence was early in the development of TDA,[31] and is one of the founding papers of TDA.[9] The first application to appear in the literature is a method for shape comparison, similar to the invention of TDA.[32]

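The definition of an n-dimensional persistence module in $\mathbb{R}^n$ is[30]

  • a vector space $V_s$ is assigned to each point $s = (s_1, \ldots, s_n)$
  • a map $\rho_s^t : V_s \to V_t$ is assigned if $s \le t$ ($s_i \le t_i$ for $i = 1, \ldots, n$)
  • the maps satisfy $\rho_r^t = \rho_s^t \circ \rho_r^s$ for all $r \le s \le t$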

It might be worth noting that there are controversies on the definition of multidimensional persistence.[30]

One of the advantages of one-dimensional persistence is its representability by a diagram or barcode. However, discrete complete invariants of multidimensional persistence modules do not exist.[33] The main reason for this is that the structure of the collection of indecomposables is extremely complicated by Gabriel's theorem in the theory of quiver representations,[34] although a finitely n-dim persistence module can be uniquely decomposed into a direct sum of indecomposables due to the Krull-Schmidt theorem.[35]

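Nonetheless, many results have been established. Carlsson et al. introduced the rank invariant $\rho_M(u, v)$, defined as $\rho_M(u, v) = \mathrm{rank}(x^{v - u} : M_u \to M_v)$ for $u \le v$, in which $M$ is a finitely generated n-graded module. In one dimension, it is equivalent to the barcode. In the literature, the rank invariant is often referred to as the persistent Betti numbers (PBNs).[16] In many theoretical works, authors have used a more restricted definition, an analogue from sublevel set persistence. Specifically, the persistent Betti numbers of a function $f : X \to \mathbb{R}^k$ are given by the function $\beta_f : \Delta^+ \to \mathbb{N}$, taking each $(u, v) \in \Delta^+$ to $\beta_f(u, v) := \mathrm{rank}(H(X(f \le u)) \to H(X(f \le v)))$, where $\Delta^+ := \{(u, v) \in \mathbb{R}^k \times \mathbb{R}^k : u \le v\}$ and $X(f \le u) := \{x \in X : f(x) \le u\}$.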
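In one dimension the equivalence with the barcode can be made concrete: the rank of the map from index u to index v counts the bars that are alive throughout the whole range from u to v. The sketch below assumes half-open bars [birth, death), and the function name `rank_invariant` is hypothetical.

```python
import math


def rank_invariant(barcode, u, v):
    """Rank of the map from index u to index v for a one-dimensional barcode.

    Counts bars [birth, death) that contain both u and v.
    """
    if u > v:
        raise ValueError("the rank invariant is defined for u <= v")
    return sum(1 for birth, death in barcode if birth <= u and v < death)


barcode = [(0, math.inf), (0, 1), (2, 5)]
print(rank_invariant(barcode, 0, 0.5))   # two bars span [0, 0.5]
print(rank_invariant(barcode, 0, 3))     # only the infinite bar spans [0, 3]
print(rank_invariant(barcode, 2, 4))     # the infinite bar and (2, 5)
```

Conversely, the barcode can be recovered from these ranks by inclusion-exclusion, which is the sense in which the two are equivalent in the one-dimensional case.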

Some basic properties include monotonicity and diagonal jump.[36] Persistent Betti numbers will be finite if $X$ is a compact and locally contractible subspace of $\mathbb{R}^n$.[37]

Using a foliation method, the k-dim PBNs can be decomposed into a family of 1-dim PBNs by dimensionality reduction.[38] This method has also led to a proof that multi-dim PBNs are stable.[39] The discontinuities of PBNs only occur at points $(u, v)$ $(u \le v)$ where either $u$ is a discontinuous point of $\rho_M(\star, v)$ or $v$ is a discontinuous point of $\rho_M(u, \star)$, under the assumption that $f \in C^0(X, \mathbb{R}^k)$ and $X$ is a compact, triangulable topological space.[40]

Persistent space, a generalization of the persistence diagram, is defined as the multiset of all points with multiplicity larger than 0 together with the diagonal.[41] It provides a stable and complete representation of PBNs. Ongoing work by Carlsson et al. aims to give a geometric interpretation of persistent homology, which might provide insights on how to combine machine learning theory with topological data analysis.[42]

The first practical algorithm to compute multidimensional persistence was invented very early.[43] Since then, many other algorithms have been proposed, based on such concepts as discrete Morse theory[44] and finite sample estimating.[45]

Other persistences

The standard paradigm in TDA is often referred to as sublevel persistence. Apart from multidimensional persistence, much work has been done to extend this special case.

Zigzag persistence

The nonzero maps in a persistence module are restricted by the preorder relation in the indexing category. However, mathematicians have found that a uniform direction of the maps is not essential to many results. "The philosophical point is that the decomposition theory of graph representations is somewhat independent of the orientation of the graph edges".[46] Zigzag persistence is important to the theoretical side. The examples given in Carlsson's review paper to illustrate the importance of functoriality all share some of its features.[3]

Extended persistence and levelset persistence

Some attempts have been made to relax the stricter restrictions on the function.[47] Please refer to the Categorization and cosheaves and Impact on mathematics sections for more information.

It is natural to extend persistent homology to other basic concepts in algebraic topology, such as cohomology and relative homology/cohomology.[48] An interesting application is the computation of circular coordinates for a data set via the first persistent cohomology group.[49]

Circular persistence

Ordinary persistent homology studies real-valued functions. Circle-valued maps might also be useful: "persistence theory for circle-valued maps promises to play the role for some vector fields as does the standard persistence theory for scalar fields", as commented by D. Burghelea et al.[50] The main difference is that Jordan cells (very similar in format to the ones in linear algebra) are nontrivial for circle-valued functions, whereas they would be zero in the real-valued case; combined with barcodes, they give the invariants of a tame map under moderate conditions.[50]

Two techniques they use are Morse-Novikov theory[51] and graph representation theory.[52] More recent results can be found in D. Burghelea et al.[53] For example, the tameness requirement can be replaced by the much weaker condition of continuity.

Persistence with torsion

The proof of the structure theorem relies on the base domain being a field, so not many attempts have been made on persistent homology with torsion. Frosini defined a pseudometric on this specific module and proved its stability.[54] One of its novelties is that it does not depend on a classification theory to define the metric.[55]

Categorization and cosheaves

One advantage of category theory is its ability to lift concrete results to a higher level, showing relationships between seemingly unconnected objects. Bubenik et al.[56] offer a short introduction to category theory tailored to TDA.

Category theory is the language of modern algebra, and has been widely used in the study of algebraic geometry and topology. It has been noted that "the key observation of [9] is that the persistence diagram produced by [7] depends only on the algebraic structure carried by this diagram."[57] The use of category theory in TDA has proved to be fruitful.[56][57]

Following the notation of Bubenik et al.,[57] the indexing category $P$ is any preordered set (not necessarily $\mathbb{N}$ or $\mathbb{R}$), the target category $D$ is any category (instead of the commonly used $\mathrm{Vect}_F$), and functors $P \to D$ are called generalized persistence modules in $D$, over $P$.

One advantage of using category theory in TDA is a clearer understanding of concepts and the discovery of new relationships between proofs. Take two examples for illustration. The understanding of the correspondence between interleaving and matching is of huge importance, since matching was the method used in the beginning (modified from Morse theory). A summary of works can be found in Vin de Silva et al.[58] Many theorems can be proved much more easily in a more intuitive setting.[55] Another example is the relationship between the constructions of different complexes from point clouds. It has long been noticed that Čech and Vietoris-Rips complexes are related. Specifically, $V_r(X) \subset C_{\sqrt{2} r}(X) \subset V_{2r}(X)$.[59] The essential relationship between Čech and Rips complexes can be seen much more clearly in categorical language.[58]

The language of category theory also helps cast results in terms recognizable to the broader mathematical community. Bottleneck distance is widely used in TDA because of the results on stability with respect to the bottleneck distance.[10][13] In fact, the interleaving distance is the terminal object in a poset category of stable metrics on multidimensional persistence modules in a prime field.[55][60]

Sheaves, a central concept in modern algebraic geometry, are intrinsically related to category theory. Roughly speaking, sheaves are the mathematical tool for understanding how local information determines global information. Justin Curry regards level set persistence as the study of fibers of continuous functions. The objects that he studies are very similar to those by MAPPER, but with sheaf theory as the theoretical foundation.[30] Although no breakthrough in the theory of TDA has yet used sheaf theory, it is promising since there are many beautiful theorems in algebraic geometry relating to sheaf theory. For example, a natural theoretical question is whether different filtration methods result in the same output.[61]

Stability

Stability is of central importance to data analysis, since real data carry noise. Using category theory, Bubenik et al. have distinguished between soft and hard stability theorems, and proved that the soft cases are formal.[57] Specifically, the general workflow of TDA is

data $\xrightarrow{F}$ generalized persistence module $\xrightarrow{H}$ generalized persistence module $\xrightarrow{J}$ discrete invariant

The soft stability theorem asserts that $HF$ is Lipschitz, and the hard stability theorem asserts that $J$ is Lipschitz.

Bottleneck distance is widely used in TDA. The isometry theorem asserts that the interleaving distance is equal to the bottleneck distance.[55] Bubenik et al. have abstracted the definition to that between functors $F, G : P \to D$ when $P$ is equipped with a sublinear projection or superlinear family, in which case the interleaving distance still remains a pseudometric.[57] Considering the remarkable properties of the interleaving distance,[62] we introduce here the general definition of interleaving distance (instead of the one first introduced):[10] Let $\Gamma, K \in \mathrm{Trans}_P$ (a function from $P$ to $P$ which is monotone and satisfies $x \le \Gamma(x)$ for all $x \in P$). A $(\Gamma, K)$-interleaving between $F$ and $G$ consists of natural transformations $\varphi : F \Rightarrow G\Gamma$ and $\psi : G \Rightarrow FK$, such that $(\psi\Gamma)\varphi = F\eta_{K\Gamma}$ and $(\varphi K)\psi = G\eta_{\Gamma K}$.

The two main results are[57]

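  • Let $P$ be a preordered set with a sublinear projection or superlinear family. Let $H : D \to E$ be a functor between arbitrary categories $D, E$. Then for any two functors $F, G : P \to D$, we have $d(HF, HG) \le d(F, G)$.
  • Let $P$ be a poset of a metric space $Y$, and let $X$ be a topological space. Let $f, g : X \to Y$ (not necessarily continuous) be functions, and let $F, G$ be the corresponding persistence diagrams. Then $d(F, G) \le d_\infty(f, g) := \sup_{x \in X} d_Y(f(x), g(x))$.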

These two results summarize many results on stability of different models of persistence.

For the stability theorem of multidimensional persistence, please refer to the subsection on multidimensional persistence.

Structure theorem

The structure theorem is of central importance to TDA; as commented by G. Carlsson, "what makes homology useful as a discriminator between topological spaces is the fact that there is a classification theorem for finitely generated abelian groups."[3]

The main argument used in the proof of the original structure theorem is the standard classification theorem for finitely generated modules over a principal ideal domain.[9] However, this argument fails when the indexing set is $(\mathbb{R}, \le)$, since the corresponding ring is not a PID. Carlsson gave a detailed discussion in the most influential review paper in TDA.[3]

In general, not every persistence module can be decomposed into intervals.[63] Many attempts have been made at loosening the assumptions.[clarification needed] The case of pointwise finite-dimensional persistence modules indexed by a locally finite subset of $\mathbb{R}$ is solved based on the work of Webb.[64] The most notable result is due to Crawley-Boevey, who solved the case of $\mathbb{R}$. Crawley-Boevey's theorem states that any pointwise finite-dimensional persistence module is a direct sum of interval modules.[65]

To understand the statement of his theorem, some concepts need introducing. An interval in $\mathbb{R}$ is defined as a subset $I \subset \mathbb{R}$ having the property that if $r, t \in I$ and if there is an $s \in \mathbb{R}$ such that $r \le s \le t$, then $s \in I$ as well. An interval module $k_I$ assigns to each element $s \in I$ the vector space $k$ and assigns the zero vector space to elements in $\mathbb{R} \setminus I$. All maps $\rho_s^t$ are the zero map, unless $s, t \in I$ and $s \le t$, in which case $\rho_s^t$ is the identity map.[30] Interval modules are indecomposable.[66]

Although this is a very powerful theorem, it still does not extend to the q-tame case.[63] A persistence module is q-tame if the rank of $\rho_s^t$ is finite for all $s < t$. There are examples of q-tame persistence modules that fail to be pointwise finite.[67] However, it turns out that a similar structure theorem still holds if the features that exist only at one index value are removed.[66] Intuitively, the infinite-dimensional parts do not persist.[68] Specifically, the observable category $\mathrm{Ob}$ is defined as $\mathrm{Pers} / \mathrm{Eph}$, in which $\mathrm{Eph}$ denotes the full subcategory of $\mathrm{Pers}$ whose objects are the ephemeral modules ($\rho_s^t = 0$ whenever $s < t$).[66]

Note that none of the extended results listed here apply to zigzag persistence. There is some work on the stability of zigzag persistence.[69]

Statistics

Real data is always finite, and so its study is stochastic in essence. Distinguishing true features from artifacts is the strength of statistics. Note that persistent homology has no mechanism to distinguish between low-probability features and high-probability features.

One approach is to study the statistical properties of topological summaries of point clouds. A reference for work done on "the study of random abstract simplicial complexes generated from stochastic processes and non-asymptotic bounds on the convergence or consistency of topological summaries as the number of points increase" can be found in K. Turner et al.[70]

Another approach, and also the more important one, is to study probability distributions on the persistence space. The persistence space $B_\infty$ is $\coprod_n B_n / \sim$, where $B_n$ is the set of all barcodes containing exactly $n$ intervals and the equivalences are $\{[x_1, y_1], [x_2, y_2], \ldots, [x_n, y_n]\} \sim \{[x_1, y_1], [x_2, y_2], \ldots, [x_{n-1}, y_{n-1}]\}$ if $x_n = y_n$.[71] This space is fairly complicated; for example, it is not complete under the bottleneck metric. The first attempt to study it was made by Y. Mileyko et al.[72] The space of persistence diagrams $D_p$ in their paper is defined as

$$D_p := \left\{ d \;\middle|\; \sum_{x \in d} \left( 2 \inf_{y \in \Delta} \lVert x - y \rVert \right)^p < \infty \right\}$$

where $\Delta$ is the diagonal line in $\mathbb{R}^2$. A very nice property is that $D_p$ is complete and separable in the Wasserstein metric. Expectation, variance and conditional probability, three basic concepts in probability theory, can be defined in the Fréchet sense. Since then, many statistical tools have been adapted to TDA. Works on null hypothesis significance tests,[73] confidence intervals,[74] and robust estimates[75] are notable steps.

An interesting concept, the persistence landscape, introduced by Peter Bubenik, has led in another direction, namely a different representation of the barcode.[76] The persistence landscape of a persistence module $M$ is defined as a function $\lambda : \mathbb{N} \times \mathbb{R} \to \bar{\mathbb{R}}$, $\lambda(k, t) := \sup(m \ge 0 \mid \beta^{t - m, t + m} \ge k)$, where $\bar{\mathbb{R}}$ denotes the extended real line and $\beta^{a, b} = \dim(\mathrm{im}(M(a \le b)))$. While it inherits all the good properties of the barcode representation (stability, easy representation, etc.), its space is very nice: not only can statistical inference be defined, but some problems in Y. Mileyko et al.'s work, such as the expectation not necessarily being unique,[72] can be overcome. Efficient algorithms are available.[77] Another approach is to use revised persistence, namely image, kernel and cokernel persistence.[78]
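For a module given by a finite barcode, the landscape has a simple closed form that the following sketch evaluates: each bar (b, d) contributes the tent-shaped value max(0, min(t - b, d - t)), and the k-th landscape value at t is the k-th largest contribution. The function name `landscape` is hypothetical, and the input is assumed to be a list of (birth, death) pairs.

```python
def landscape(barcode, k, t):
    """Value of the k-th persistence landscape function at t (k = 1, 2, ...)."""
    # Height of each bar's tent function at t: how far t lies inside the bar.
    heights = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in barcode),
        reverse=True,
    )
    return heights[k - 1] if k <= len(heights) else 0.0


barcode = [(0.0, 4.0), (1.0, 3.0)]
print(landscape(barcode, 1, 2.0), landscape(barcode, 2, 2.0))
print(landscape(barcode, 1, 0.5), landscape(barcode, 2, 0.5))
```

Because the result is an ordinary real-valued function of (k, t), landscapes can be averaged pointwise, which is what makes notions such as a unique mean available.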

Applications

Classification of applications

There is more than one way to classify the applications of TDA. Perhaps the most natural way is by field. A very incomplete list of successful applications includes[79] data skeletonization,[80] shape study,[81] graph reconstruction,[82][83][84][85] image analysis,[86][87] materials,[88] progression analysis of disease,[89][90] sensor networks,[59] signal analysis,[91] the cosmic web,[92] complex networks,[93][94][95][96] fractal geometry,[97] viral evolution,[98] and the propagation of contagions on networks.[99]

Another way is to distinguish the techniques, following G. Carlsson,[71]

one being the study of homological invariants of data on individual data sets, and the other is the use of homological invariants in the study of databases where the data points themselves have geometric structure.

Characteristics of TDA in applications

Ayasdi is a data analysis company relying heavily on TDA, cofounded by a number of leading researchers in the field. There are several notable features of recent applications of TDA:

  1. Combining tools from all main branches of mathematics. Besides the obvious need for algebra and topology, partial differential equations,[100] algebraic geometry,[33] representation theory,[46] statistics, combinatorics, and Riemannian geometry[68] have all found use in TDA.
  2. Quantitative analysis. Topology is considered to be very soft since many concepts are invariant under homotopy. However, persistent topology is able to record the birth (appearance) and death (disappearance) of topological features, so extra geometric information is embedded in it. One piece of theoretical evidence is a partially positive result on the uniqueness of reconstruction of curves;[101] two in applications are the quantitative analysis of fullerene stability and the quantitative analysis of self-similarity, respectively.[97][102]
  3. The role of short persistence. Short persistence has also been found to be useful, despite the common belief that noise is the cause of the phenomenon.[103] This is interesting for the mathematical theory.

One of the main fields of data analysis today is machine learning. Some examples of machine learning in TDA can be found in Adcock et al.[104] A conference is dedicated to the link between TDA and machine learning. In order to apply tools from machine learning, the information obtained from TDA should be represented in vector form. An ongoing and promising attempt is the persistence landscape discussed above. Another attempt uses the concept of persistence images.[105] However, one problem with this method is the loss of stability, since the hard stability theorem depends on the barcode representation.

Impact on mathematics

Topological data analysis and persistent homology have had impacts on Morse theory. Morse theory has played a very important role in the theory of TDA, including in computation. Some work in persistent homology has extended results about Morse functions to tame functions or even to continuous functions. A forgotten result of R. Deheuvels, obtained long before the invention of persistent homology, extends Morse theory to all continuous functions.[106]

One recent result is that the category of Reeb graphs is equivalent to a particular class of cosheaf.[107] This is motivated by theoretical work in TDA, since the Reeb graph is related to Morse theory and MAPPER is derived from it. The proof of this theorem relies on the interleaving distance.

It is evident to mathematicians that persistent homology is closely related to spectral sequences.[108] Zigzag persistence may turn out to be of theoretical importance to spectral sequences.

References

  1. ^ Epstein, Charles; Carlsson, Gunnar; Edelsbrunner, Herbert (2011-12-01). "Topological data analysis". Inverse Problems. 27 (12): 120201. doi:10.1088/0266-5611/27/12/120201. 
  2. ^ "http://www.diva-portal.org/smash/record.jsf?pid=diva2%253A575329&dswid=4297". www.diva-portal.org. Archived from the original on November 19, 2015. Retrieved 2015-11-05.
  3. ^ a b c d e Carlsson, Gunnar (2009-01-01). "Topology and data". Bulletin of the American Mathematical Society. 46 (2): 255–308. doi:10.1090/S0273-0979-09-01249-X. ISSN 0273-0979. 
  4. ^ Edelsbrunner H. Persistent homology: theory and practice[J]. 2014.
  5. ^ Frosini, Patrizio (1990-12-01). "A distance for similarity classes of submanifolds of a Euclidean space". Bulletin of the Australian Mathematical Society. 42 (03): 407–415. doi:10.1017/S0004972700028574. ISSN 1755-1633. 
  6. ^ Robins V. Towards computing homology from finite approximations[C]//Topology proceedings. 1999, 24(1): 503-532.
  7. ^ a b c "Topological Persistence and Simplification". Discrete & Computational Geometry. 28 (4): 511–533. 2002-11-01. doi:10.1007/s00454-002-2885-2. ISSN 0179-5376. 
  8. ^ Carlsson, Gunnar; Zomorodian, Afra; Collins, Anne; Guibas, Leonidas J. (2005-12-01). "Persistence barcodes for shapes". International Journal of Shape Modeling. 11 (02): 149–187. doi:10.1142/S0218654305000761. ISSN 0218-6543. 
  9. ^ a b c d e f Zomorodian, Afra; Carlsson, Gunnar (2004-11-19). "Computing Persistent Homology". Discrete & Computational Geometry. 33 (2): 249–274. doi:10.1007/s00454-004-1146-y. ISSN 0179-5376. 
  10. ^ a b c Chazal, Frédéric; Cohen-Steiner, David; Glisse, Marc; Guibas, Leonidas J.; Oudot, Steve Y. (2009-01-01). "Proximity of Persistence Modules and Their Diagrams". Proceedings of the Twenty-fifth Annual Symposium on Computational Geometry. SCG '09. New York, NY, USA: ACM: 237–246. doi:10.1145/1542362.1542407. ISBN 978-1-60558-501-7. 
  11. ^ Munch E. Applications of persistent homology to time varying systems[D]. Duke University, 2013.
  12. ^ Weibel, Charles A. (1995-10-27). An Introduction to Homological Algebra. Cambridge University Press. ISBN 9780521559874. 
  13. ^ a b Cohen-Steiner, David; Edelsbrunner, Herbert; Harer, John (2006-12-12). "Stability of Persistence Diagrams". Discrete & Computational Geometry. 37 (1): 103–120. doi:10.1007/s00454-006-1276-5. ISSN 0179-5376. 
  14. ^ Ghrist, Robert (2008-01-01). "Barcodes: The persistent topology of data". Bulletin of the American Mathematical Society. 45 (1): 61–75. doi:10.1090/S0273-0979-07-01191-3. ISSN 0273-0979. 
  15. ^ Chazal, Frédéric; Glisse, Marc; Labruère, Catherine; Michel, Bertrand (2013-05-27). "Optimal rates of convergence for persistence diagrams in Topological Data Analysis". arXiv:1305.6239. 
  16. ^ a b Edelsbrunner, Herbert; Harer, John (2010-01-01). Computational Topology: An Introduction. American Mathematical Soc. ISBN 9780821849255. 
  17. ^ De Silva, Vin; Carlsson, Gunnar (2004-01-01). "Topological Estimation Using Witness Complexes". Proceedings of the First Eurographics Conference on Point-Based Graphics. SPBG'04. Aire-la-Ville, Switzerland: Eurographics Association: 157–166. doi:10.2312/SPBG/SPBG04/157-166. ISBN 3-905673-09-6. 
  18. ^ Mischaikow, Konstantin; Nanda, Vidit (2013-07-27). "Morse Theory for Filtrations and Efficient Computation of Persistent Homology". Discrete & Computational Geometry. 50 (2): 330–353. doi:10.1007/s00454-013-9529-6. ISSN 0179-5376. 
  19. ^ Henselman, Gregory; Ghrist, Robert (1 Jun 2016). "Matroid Filtrations and Computational Persistent Homology". arXiv:1606.00199.
  20. ^ Chen, Chao; Kerber, Michael (2013-05-01). "An output-sensitive algorithm for persistent homology". Computational Geometry. 27th Annual Symposium on Computational Geometry (SoCG 2011). 46 (4): 435–447. doi:10.1016/j.comgeo.2012.02.010. 
  21. ^ Otter, Nina; Porter, Mason A.; Tillmann, Ulrike; Grindrod, Peter; Harrington, Heather A. (2015-06-29). "A roadmap for the computation of persistent homology". arXiv:1506.08903.
  22. ^ Fasy, Brittany Terese; Kim, Jisu; Lecci, Fabrizio; Maria, Clément (2014-11-07). "Introduction to the R package TDA". arXiv:1411.1830.
  23. ^ Liu, S.; Maljovec, D.; Wang, B.; et al. "Visualizing High-Dimensional Data: Advances in the Past Decade".
  24. ^ a b c Dey, Tamal K.; Memoli, Facundo; Wang, Yusu (2015-04-14). "Mutiscale Mapper: A Framework for Topological Summarization of Data and Maps". arXiv:1504.03763.
  25. ^ a b "Download Limit Exceeded". citeseerx.ist.psu.edu. Retrieved 2015-11-02. 
  26. ^ Bott, Raoul; Tu, Loring W. (2013-04-17). Differential Forms in Algebraic Topology. Springer Science & Business Media. ISBN 9781475739510. 
  27. ^ Curry, Justin (2013-03-13). "Sheaves, Cosheaves and Applications". arXiv:1303.3255.
  28. ^ Liu, Xu; Xie, Zheng; Yi, Dongyun (2012-01-01). "A fast algorithm for constructing topological structure in large data". Homology, Homotopy and Applications. 14 (1): 221–238. doi:10.4310/hha.2012.v14.n1.a11. ISSN 1532-0073. 
  29. ^ Lum, P. Y.; Singh, G.; Lehman, A.; Ishkanov, T.; Vejdemo-Johansson, M.; Alagappan, M.; Carlsson, J.; Carlsson, G. (2013-02-07). "Extracting insights from the shape of complex data using topology". Scientific Reports. 3. doi:10.1038/srep01236. PMC 3566620. PMID 23393618.
  30. ^ a b c d e Curry, Justin (2014-11-03). "Topological Data Analysis and Cosheaves". arXiv:1411.0613.
  31. ^ Frosini, P.; Mulazzani, M. (1999). "Size homotopy groups for computation of natural size distances". Bulletin of the Belgian Mathematical Society Simon Stevin. 6 (3): 455–464.
  32. ^ Biasotti, S.; Cerri, A.; Frosini, P.; Giorgi, D.; Landi, C. (2008-05-17). "Multidimensional Size Functions for Shape Comparison". Journal of Mathematical Imaging and Vision. 32 (2): 161–179. doi:10.1007/s10851-008-0096-z. ISSN 0924-9907. 
  33. ^ a b Carlsson, Gunnar; Zomorodian, Afra (2009-04-24). "The Theory of Multidimensional Persistence". Discrete & Computational Geometry. 42 (1): 71–93. doi:10.1007/s00454-009-9176-0. ISSN 0179-5376. 
  34. ^ Derksen, H.; Weyman, J. (2005). "Quiver representations". Notices of the AMS. 52 (2): 200–206.
  35. ^ Atiyah, M. F. (1956). "On the Krull-Schmidt theorem with application to sheaves". Bulletin de la Société Mathématique de France. 84: 307–317.
  36. ^ Cerri, A.; Di Fabio, B.; Ferri, M.; et al. (2009). "Multidimensional persistent homology is stable". arXiv:0908.0064.
  37. ^ Cagliari, Francesca; Landi, Claudia (2011-04-01). "Finiteness of rank invariants of multidimensional persistent homology groups". Applied Mathematics Letters. 24 (4): 516–518. doi:10.1016/j.aml.2010.11.004. 
  38. ^ Cagliari, Francesca; Di Fabio, Barbara; Ferri, Massimo (2010-01-01). "One-dimensional reduction of multidimensional persistent homology". Proceedings of the American Mathematical Society. 138 (8): 3003–3017. doi:10.1090/S0002-9939-10-10312-8. ISSN 0002-9939. 
  39. ^ Cerri, Andrea; Di Fabio, Barbara; Ferri, Massimo; Frosini, Patrizio; Landi, Claudia (2013-08-01). "Betti numbers in multidimensional persistent homology are stable functions". Mathematical Methods in the Applied Sciences. 36 (12): 1543–1557. doi:10.1002/mma.2704. ISSN 1099-1476.
  40. ^ Cerri, Andrea; Frosini, Patrizio (2015-03-15). "Necessary conditions for discontinuities of multidimensional persistent Betti numbers". Mathematical Methods in the Applied Sciences. 38 (4): 617–629. doi:10.1002/mma.3093. ISSN 1099-1476. 
  41. ^ Cerri, Andrea; Landi, Claudia (2013-03-20). Gonzalez-Diaz, Rocio; Jimenez, Maria-Jose; Medrano, Belen, eds. The Persistence Space in Multidimensional Persistent Homology. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 180–191. doi:10.1007/978-3-642-37067-0_16. ISBN 978-3-642-37066-3. 
  42. ^ Skryzalin, Jacek; Carlsson, Gunnar (2014-11-14). "Numeric Invariants from Multidimensional Persistence". arXiv:1411.4022.
  43. ^ Carlsson, Gunnar; Singh, Gurjeet; Zomorodian, Afra (2009-12-16). Dong, Yingfei; Du, Ding-Zhu; Ibarra, Oscar, eds. Computing Multidimensional Persistence. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 730–739. doi:10.1007/978-3-642-10631-6_74. ISBN 978-3-642-10630-9. 
  44. ^ Allili, Madjid; Kaczynski, Tomasz; Landi, Claudia (2013-10-30). "Reducing complexes in multidimensional persistent homology theory". arXiv:1310.8089.
  45. ^ Cavazza, N.; Ferri, M.; Landi, C. (2010). "Estimating multidimensional persistent homology through a finite sampling".
  46. ^ a b Carlsson, Gunnar; Silva, Vin de (2010-04-21). "Zigzag Persistence". Foundations of Computational Mathematics. 10 (4): 367–405. doi:10.1007/s10208-010-9066-0. ISSN 1615-3375. 
  47. ^ Cohen-Steiner, David; Edelsbrunner, Herbert; Harer, John (2008-04-04). "Extending Persistence Using Poincaré and Lefschetz Duality". Foundations of Computational Mathematics. 9 (1): 79–103. doi:10.1007/s10208-008-9027-z. ISSN 1615-3375. 
  48. ^ de Silva, Vin; Morozov, Dmitriy; Vejdemo-Johansson, Mikael. "Dualities in persistent (co)homology". Inverse Problems. 27 (12): 124003. doi:10.1088/0266-5611/27/12/124003. 
  49. ^ Silva, Vin de; Morozov, Dmitriy; Vejdemo-Johansson, Mikael (2011-03-30). "Persistent Cohomology and Circular Coordinates". Discrete & Computational Geometry. 45 (4): 737–759. doi:10.1007/s00454-011-9344-x. ISSN 0179-5376. 
  50. ^ a b Burghelea, Dan; Dey, Tamal K. (2013-04-09). "Topological Persistence for Circle-Valued Maps". Discrete & Computational Geometry. 50 (1): 69–98. doi:10.1007/s00454-013-9497-x. ISSN 0179-5376. 
  51. ^ Novikov, S. P. (1991). "Quasiperiodic structures in topology". Topological methods in modern mathematics, Proceedings of the symposium in honor of John Milnor's sixtieth birthday held at the State University of New York, Stony Brook, New York. pp. 223–233.
  52. ^ Gross, Jonathan L.; Yellen, Jay (2004-06-02). Handbook of Graph Theory. CRC Press. ISBN 9780203490204. 
  53. ^ Burghelea, Dan; Haller, Stefan (2015-06-04). "Topology of angle valued maps, bar codes and Jordan blocks". arXiv:1303.4328.
  54. ^ Frosini, Patrizio (2012-06-23). "Stable Comparison of Multidimensional Persistent Homology Groups with Torsion". Acta Applicandae Mathematicae. 124 (1): 43–54. doi:10.1007/s10440-012-9769-0. ISSN 0167-8019. 
  55. ^ a b c d Lesnick, Michael (2015-03-24). "The Theory of the Interleaving Distance on Multidimensional Persistence Modules". Foundations of Computational Mathematics. 15 (3): 613–650. doi:10.1007/s10208-015-9255-y. ISSN 1615-3375. 
  56. ^ a b Bubenik, Peter; Scott, Jonathan A. (2014-01-28). "Categorification of Persistent Homology". Discrete & Computational Geometry. 51 (3): 600–627. doi:10.1007/s00454-014-9573-x. ISSN 0179-5376. 
  57. ^ a b c d e f Bubenik, Peter; Silva, Vin de; Scott, Jonathan (2014-10-09). "Metrics for Generalized Persistence Modules". Foundations of Computational Mathematics. 15 (6): 1501–1531. doi:10.1007/s10208-014-9229-5. ISSN 1615-3375. 
  58. ^ a b de Silva, Vin; Nanda, Vidit (2013-01-01). "Geometry in the Space of Persistence Modules". Proceedings of the Twenty-ninth Annual Symposium on Computational Geometry. SoCG '13. New York, NY, USA: ACM: 397–404. doi:10.1145/2462356.2462402. ISBN 978-1-4503-2031-3. 
  59. ^ a b De Silva, V.; Ghrist, R. (2007). "Coverage in sensor networks via persistent homology". Algebraic & Geometric Topology. 7 (1): 339–358.
  60. ^ d’Amico, Michele; Frosini, Patrizio; Landi, Claudia (2008-10-14). "Natural Pseudo-Distance and Optimal Matching between Reduced Size Functions". Acta Applicandae Mathematicae. 109 (2): 527–554. doi:10.1007/s10440-008-9332-1. ISSN 0167-8019. 
  61. ^ Di Fabio, B.; Frosini, P. (2013-08-01). "Filtrations induced by continuous functions". Topology and its Applications. 160 (12): 1413–1422. doi:10.1016/j.topol.2013.05.013. 
  62. ^ Lesnick, Michael (2012-06-06). "Multidimensional Interleavings and Applications to Topological Inference". arXiv:1206.1365.
  63. ^ a b Chazal, Frederic; de Silva, Vin; Glisse, Marc; Oudot, Steve (2012-07-16). "The structure and stability of persistence modules". arXiv:1207.3674.
  64. ^ Webb, Cary (1985-01-01). "Decomposition of graded modules". Proceedings of the American Mathematical Society. 94 (4): 565–571. doi:10.1090/S0002-9939-1985-0792261-6. ISSN 0002-9939. 
  65. ^ Crawley-Boevey, William. "Decomposition of pointwise finite-dimensional persistence modules". Journal of Algebra and Its Applications. 14 (05): 1550066. doi:10.1142/s0219498815500668. 
  66. ^ a b c Chazal, Frederic; Crawley-Boevey, William; de Silva, Vin (2014-05-22). "The observable structure of persistence modules". arXiv:1405.5644.
  67. ^ Droz, Jean-Marie (2012-10-15). "A subset of Euclidean space with large Vietoris-Rips homology". arXiv:1210.4097.
  68. ^ a b Weinberger, S. (2011). "What is... persistent homology?". Notices of the AMS. 58 (1): 36–39.
  69. ^ https://meetings.webex.com/collabs/files/viewRecording
  70. ^ Turner, Katharine; Mileyko, Yuriy; Mukherjee, Sayan; Harer, John (2014-07-12). "Fréchet Means for Distributions of Persistence Diagrams". Discrete & Computational Geometry. 52 (1): 44–70. doi:10.1007/s00454-014-9604-7. ISSN 0179-5376. 
  71. ^ a b Carlsson, Gunnar (2014-05-01). "Topological pattern recognition for point cloud data". Acta Numerica. 23: 289–368. doi:10.1017/S0962492914000051. ISSN 1474-0508. 
  72. ^ a b Mileyko, Yuriy; Mukherjee, Sayan; Harer, John (2011-11-10). "Probability measures on the space of persistence diagrams". Inverse Problems. 27 (12): 124007. doi:10.1088/0266-5611/27/12/124007. ISSN 0266-5611. 
  73. ^ Robinson, Andrew; Turner, Katharine (2013-10-28). "Hypothesis Testing for Topological Data Analysis". arXiv:1310.7467.
  74. ^ Fasy, Brittany Terese; Lecci, Fabrizio; Rinaldo, Alessandro; Wasserman, Larry; Balakrishnan, Sivaraman; Singh, Aarti (2014-12-01). "Confidence sets for persistence diagrams". The Annals of Statistics. 42 (6): 2301–2339. doi:10.1214/14-AOS1252. ISSN 0090-5364. 
  75. ^ Blumberg, Andrew J.; Gal, Itamar; Mandell, Michael A.; Pancia, Matthew (2014-05-15). "Robust Statistics, Hypothesis Testing, and Confidence Intervals for Persistent Homology on Metric Measure Spaces". Foundations of Computational Mathematics. 14 (4): 745–789. doi:10.1007/s10208-014-9201-4. ISSN 1615-3375. 
  76. ^ Bubenik, Peter (2012-07-26). "Statistical topological data analysis using persistence landscapes". arXiv:1207.6437.
  77. ^ Bubenik, Peter; Dlotko, Pawel (2014-12-31). "A persistence landscapes toolbox for topological statistics". arXiv:1501.00179.
  78. ^ Cohen-Steiner, David; Edelsbrunner, Herbert; Harer, John; Morozov, Dmitriy. Persistent Homology for Kernels, Images, and Cokernels. pp. 1011–1020. doi:10.1137/1.9781611973068.110. 
  79. ^ Kurlin, V. (2015). "A one-dimensional Homologically Persistent Skeleton of an unstructured point cloud in any metric space" (PDF). Computer Graphics Forum (CGF). 34 (5): 253–262. doi:10.1111/cgf.12713. 
  80. ^ Kurlin, V. (2014). "A fast and robust algorithm to count topologically persistent holes in noisy clouds" (PDF). IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2014.189. 
  81. ^ Kurlin, V. (2015). "A Homologically Persistent Skeleton is a fast and robust descriptor of interest points in 2D images" (PDF). Lecture Notes in Computer Science (Proceedings of CAIP: Computer Analysis of Images and Patterns). 9256: 606–617. doi:10.1007/978-3-319-23192-1_51. 
  82. ^ Cerri, A.; Ferri, M.; Giorgi, D. (2006-09-01). "Retrieval of trademark images by means of size functions". Graphical Models. Special Issue on the Vision, Video and Graphics Conference 2005. 68 (5–6): 451–471. doi:10.1016/j.gmod.2006.07.001. 
  83. ^ Chazal, Frédéric; Cohen-Steiner, David; Guibas, Leonidas J.; Mémoli, Facundo; Oudot, Steve Y. (2009-07-01). "Gromov-Hausdorff Stable Signatures for Shapes using Persistence". Computer Graphics Forum. 28 (5): 1393–1403. doi:10.1111/j.1467-8659.2009.01516.x. ISSN 1467-8659. 
  84. ^ Biasotti, S.; Giorgi, D.; Spagnuolo, M.; Falcidieno, B. (2008-09-01). "Size functions for comparing 3D models". Pattern Recognition. 41 (9): 2855–2873. doi:10.1016/j.patcog.2008.02.003. 
  85. ^ Li, C.; Ovsjanikov, M.; Chazal, F. (2014). "Persistence-based Structural Recognition" (PDF). IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 
  86. ^ Bendich, P.; Edelsbrunner, H.; Kerber, M. (2010-11-01). "Computing Robustness and Persistence for Images". IEEE Transactions on Visualization and Computer Graphics. 16 (6): 1251–1260. doi:10.1109/TVCG.2010.139. ISSN 1077-2626. 
  87. ^ Carlsson, Gunnar; Ishkhanov, Tigran; Silva, Vin de; Zomorodian, Afra (2007-06-30). "On the Local Behavior of Spaces of Natural Images". International Journal of Computer Vision. 76 (1): 1–12. doi:10.1007/s11263-007-0056-x. ISSN 0920-5691. 
  88. ^ Nakamura, Takenobu; Hiraoka, Yasuaki; Hirata, Akihiko; Escolar, Emerson G.; Nishiura, Yasumasa (2015-02-26). "Persistent Homology and Many-Body Atomic Structure for Medium-Range Order in the Glass". arXiv:1502.07445.
  89. ^ Nicolau, Monica; Levine, Arnold J.; Carlsson, Gunnar (2011-04-26). "Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival". Proceedings of the National Academy of Sciences. 108 (17): 7265–7270. doi:10.1073/pnas.1102826108. ISSN 0027-8424. PMC 3084136. PMID 21482760.
  90. ^ Schmidt, Stephan; Post, Teun M.; Boroujerdi, Massoud A.; Kesteren, Charlotte van; Ploeger, Bart A.; Pasqua, Oscar E. Della; Danhof, Meindert (2011-01-01). Kimko, Holly H. C.; Peck, Carl C., eds. Disease Progression Analysis: Towards Mechanism-Based Models. AAPS Advances in the Pharmaceutical Sciences Series. Springer New York. pp. 433–455. ISBN 978-1-4419-7414-3. 
  91. ^ Perea, Jose A.; Harer, John (2014-05-29). "Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis". Foundations of Computational Mathematics. 15 (3): 799–838. doi:10.1007/s10208-014-9206-z. ISSN 1615-3375. 
  92. ^ van de Weygaert, Rien; Vegter, Gert; Edelsbrunner, Herbert; Jones, Bernard J. T.; Pranav, Pratyush; Park, Changbom; Hellwing, Wojciech A.; Eldering, Bob; Kruithof, Nico (2011-01-01). Gavrilova, Marina L.; Tan, C. Kenneth; Mostafavi, Mir Abolfazl, eds. Transactions on Computational Science XIV. Berlin, Heidelberg: Springer-Verlag. pp. 60–101. ISBN 978-3-642-25248-8. 
  93. ^ Horak, Danijela; Maletić, Slobodan; Rajković, Milan (2009-03-01). "Persistent homology of complex networks". Journal of Statistical Mechanics: Theory and Experiment. 2009: P03034. doi:10.1088/1742-5468/2009/03/p03034.
  94. ^ Carstens, C. J.; Horadam, K. J. (2013-06-04). "Persistent Homology of Collaboration Networks". Mathematical Problems in Engineering. 2013: 1–7. doi:10.1155/2013/815035. 
  95. ^ Lee, Hyekyoung; Kang, Hyejin; Chung, M.K.; Kim, Bung-Nyun; Lee, Dong Soo (2012-12-01). "Persistent Brain Network Homology From the Perspective of Dendrogram". IEEE Transactions on Medical Imaging. 31 (12): 2267–2277. doi:10.1109/TMI.2012.2219590. ISSN 0278-0062. 
  96. ^ Petri, G.; Expert, P.; Turkheimer, F.; Carhart-Harris, R.; Nutt, D.; Hellyer, P. J.; Vaccarino, F. (2014-12-06). "Homological scaffolds of brain functional networks". Journal of the Royal Society Interface. 11 (101): 20140873. doi:10.1098/rsif.2014.0873. ISSN 1742-5689. PMC 4223908. PMID 25401177.
  97. ^ a b MacPherson, Robert; Schweinhart, Benjamin (2012-07-01). "Measuring shape with topology". Journal of Mathematical Physics. 53 (7): 073516. doi:10.1063/1.4737391. ISSN 0022-2488. 
  98. ^ Chan, Joseph Minhow; Carlsson, Gunnar; Rabadan, Raul (2013-11-12). "Topology of viral evolution". Proceedings of the National Academy of Sciences. 110 (46): 18566–18571. doi:10.1073/pnas.1313480110. ISSN 0027-8424. PMC 3831954. PMID 24170857.
  99. ^ Taylor, D.; et al. (2015-08-21). "Topological data analysis of contagion maps for examining spreading processes on networks". Nature Communications. 6 (6): 7723. doi:10.1038/ncomms8723. ISSN 2041-1723.
  100. ^ Wang, Bao; Wei, Guo-Wei (2014-12-07). "Objective-oriented Persistent Homology". arXiv:1412.2368.
  101. ^ Frosini, Patrizio; Landi, Claudia. "Uniqueness of models in persistent homology: the case of curves". Inverse Problems. 27: 124005. doi:10.1088/0266-5611/27/12/124005. 
  102. ^ Xia, Kelin; Feng, Xin; Tong, Yiying; Wei, Guo Wei (2015-03-05). "Persistent homology for the quantitative prediction of fullerene stability". Journal of Computational Chemistry. 36 (6): 408–422. doi:10.1002/jcc.23816. ISSN 1096-987X. PMC 4324100. PMID 25523342.
  103. ^ Xia, Kelin; Wei, Guo-Wei (2014-08-01). "Persistent homology analysis of protein structure, flexibility, and folding". International Journal for Numerical Methods in Biomedical Engineering. 30 (8): 814–844. doi:10.1002/cnm.2655. ISSN 2040-7947. PMC 4131872. PMID 24902720.
  104. ^ Adcock, Aaron; Carlsson, Erik; Carlsson, Gunnar (2016-05-31). "The ring of algebraic functions on persistence bar codes" (PDF). Homology, Homotopy and Applications. 18 (1): 381–402. doi:10.4310/HHA.2016.v18.a21.
  105. ^ Chepushtanova, Sofya; Emerson, Tegan; Hanson, Eric; Kirby, Michael; Motta, Francis; Neville, Rachel; Peterson, Chris; Shipman, Patrick; Ziegelmeier, Lori (2015-07-22). "Persistence Images: An Alternative Persistent Homology Representation". arXiv:1507.06217.
  106. ^ Deheuvels, Rene (1955-01-01). "Topologie D'Une Fonctionnelle". Annals of Mathematics. Second Series. 61 (1): 13–72. doi:10.2307/1969619. JSTOR 1969619. 
  107. ^ de Silva, V.; Munch, E.; Patel, A. (2015). "Categorified Reeb graphs". arXiv:1501.04147.
  108. ^ Goodman, Jacob E. (2008-01-01). Surveys on Discrete and Computational Geometry: Twenty Years Later : AMS-IMS-SIAM Joint Summer Research Conference, June 18-22, 2006, Snowbird, Utah. American Mathematical Soc. ISBN 9780821842393. 

Further reading

Brief Introduction

Monograph

Video Lecture

Textbook on Topology

Other Resources of TDA