Faculty

Gunnar Carlsson, Ann and Bill Swindells Professor of Mathematics, Stanford

Persi Diaconis, Mary V. Sunseri Professor of Mathematics and Statistics, Stanford

Leo Guibas, Professor of Computer Science, Stanford

Susan Holmes, Professor of Statistics, Stanford

Research Staff

Michael Mahoney, Department of Mathematics, Stanford

Monica Nicolau, Department of Mathematics, Stanford

Postdocs

Daniel Müllner, Mathematics

David Romano, Psychiatry & Behavioral Science

Hau-tieng Wu, Mathematics

Grad Students

Aaron Adcock, Electrical Engineering

Sara Kalisnik, Mathematics

Ryan Lewis, ICME

Pawin Vongmasa, ICME

Reza Zadeh, ICME

Alumni

Henry Adams, Mathematics, Duke University

Aravindakshan Babu, ICME

Anthony Bak, Ayasdi

Erik Carlsson

Huang-Wei Chang, ICME

Anne Collins

Debashis Paul, Statistics, UC Davis

Vin de Silva, Mathematics, Pomona College

Yi Ding, ICME

Anton Dochtermann, Mathematics, University of Miami

Tom Griffiths, Cognitive and Linguistic Science, Brown University

Yichi Gu, Deseam Research Lab, Beijing

Michael He

Michele Intermont, Mathematics, Kalamazoo College

Tigran Ishkanov, Ayasdi

Matthew Kahle, Mathematics, Ohio State University

Peter Lee, Mathematics, MIT

Queenie Lee, ICME

Michael Lesnick, IAS

Lek-Heng Lim, Statistics, University of Chicago

Arian Maleki, Rice University

Facundo Memoli, Computer Science, University of Adelaide

Dmitriy Morozov, Lawrence Berkeley Labs

Jennifer Novak Kloke, Ayasdi

Steve Oudot, INRIA-Saclay, France

Jose A. Perea, Mathematics, Duke

Patrick Perry, Statistics, NYU Stern

Harlan Sexton, Ayasdi

Gurjeet Singh, Ayasdi

Andrew Tausz

Guillaume Troianowski, ICME

Mikael Vejdemo-Johansson, Computer Science, University of St. Andrews

Dawn B. Woodard, Operations Research and Information Engineering, Cornell

Cathy Wu

Yuan Yao, Mathematical Sciences, Peking University

Afra Zomorodian, Computer Science, Dartmouth

Margit Zwemer, Kaggle

Overview

The overall goal of this project is to develop flexible topological methods that allow the analysis of data which is difficult to treat using classical linear methods. Data obtained by sampling from highly curved manifolds or singular algebraic varieties in Euclidean space are typical examples where our methods will be useful. We intend to develop and refine two pieces of software written by members of our research group, ISOMAP (Tenenbaum) and PLEX (de Silva-Carlsson). ISOMAP is a tool for dimension reduction and parameterization of high dimensional data sets, and PLEX is a homology-computing tool that we will use to locate and analyze singular points in data sets, as well as to estimate dimension in situations where standard methods do not work well. We plan to extend the range of applicability of both tools: in the case of ISOMAP by studying embeddings into spaces with non-Euclidean metrics, and in the case of PLEX by building in the Mayer-Vietoris spectral sequence as a tool. Both ISOMAP and PLEX will be adapted for parallel computing. We will also begin the theoretical study of statistical questions relating to topology. For instance, we will initiate the study of higher dimensional homology of subsets sampled from Euclidean space under various sampling hypotheses. The key object of study will be the family of Čech complexes constructed from the Euclidean distance function together with a randomly chosen finite set of points.
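
As a concrete illustration of the dimension-reduction side of the project, the sketch below recovers a flat two-dimensional parameterization of points sampled from a curved surface. It uses scikit-learn's independently developed Isomap implementation as a stand-in for our ISOMAP software; the test data set and parameter choices are illustrative assumptions, not part of the project itself.

```python
# A minimal sketch of geodesic dimension reduction, assuming
# scikit-learn's Isomap as a stand-in for the ISOMAP software above.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 1000 points in R^3 sampled from a highly curved 2-manifold.
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# An embedding based on approximate geodesic distances "unrolls" the
# surface; a linear method such as PCA cannot, because the roll is
# strongly curved in the ambient space.
Y = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
print(Y.shape)  # (1000, 2)
```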

The goal of this project is to develop tools for understanding data sets that are not easy to understand using standard methods. This kind of data might include singular points, or might be strongly curved. The data is also high dimensional, in the sense that each data point has many coordinates. For instance, we might have a data set each of whose points is an image, with one coordinate for each pixel. Many standard tools rely on linear approximations, which do not work well in strongly curved or singular problems. The tools we have in mind are in part topological, in the sense that they measure more qualitative properties of the spaces involved, such as connectedness, the number of holes in a space, and so on. This group of methods can recognize the number of parameters required to describe a space without actually parameterizing it, and can recognize singular points (such as points where two non-parallel planes or lines intersect) without actually having to construct coordinates on the space. We will also further develop and refine methods we have already constructed that can find good parameterizations for many high dimensional data sets. Both projects will involve adapting to the computer many methods that have heretofore been used only in by-hand calculations for solving theoretical problems. We will also initiate the theoretical development of topological tools in a setting that includes errors and sampling.
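
To make the phrase "number of holes" concrete, the sketch below computes persistent homology for a noisy sample of a circle: one long-lived connected component and one long-lived one-dimensional hole emerge, without any coordinates being placed on the circle. The ripser package is used here purely as a readily available substitute for PLEX, and the sample is an invented example.

```python
# Counting holes without parameterizing: a noisy circle sample,
# analyzed with the ripser package (a stand-in for PLEX here).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(300, 2))

# Persistence diagrams in dimensions 0 and 1.
dgms = ripser(X, maxdim=1)['dgms']

# The most persistent 1-dimensional class is the circle's single hole;
# it survives across a wide range of scales while noise classes die fast.
h1 = dgms[1]
print("most persistent 1-cycle:", h1[np.argmax(h1[:, 1] - h1[:, 0])])
```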

The wide availability of scanning technologies for 3-d shape acquisition has led to the development of algorithms and software, in both academia and industry, for building surface meshes from the raw data returned by the scanner. For most optical and contact sensors this raw data is an unorganized point cloud, or point cloud data (PCD) set; current scanners are able to sample object surfaces at submillimeter accuracy. At such high resolutions, these dense PCD sets become adequate representations of object geometry in themselves, without any need to fit a mesh or surface. The work undertaken under this grant will exploit tools from algebraic topology and computational geometry to perform local and global analysis directly and efficiently on PCD sets. Specifically, algorithms for PCD feature detection, segmentation, matching, fitting, and compression will be developed and analyzed. It is expected that PCD sets will become a lightweight and convenient representation of object geometry not only in reconstruction and visualization applications, but even more so in the largely unexplored area of directly interrogating object geometry in the PCD format for applications such as robotic exploration or product customization. Over the next five years we can expect to see wide commercial deployment of 3-d scanning and the appearance of compact, portable scanning devices. By exploiting the acquired PCD data it will become possible to customize and tailor manufactured products to their users to a degree that has been impossible up to now. Technology developed under this proposal should facilitate this transition in the apparel, footwear, hearing-aid, orthodontic, prosthetics, and automotive industries.
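
One of the simplest local analyses of this kind, sketched below, estimates a surface normal and a curvature-like "surface variation" score at every point of a raw cloud by eigendecomposition of local covariance matrices; points with high variation are candidates for edges and corners. The construction is standard, but the specific data, neighborhood size k, and threshold are illustrative assumptions rather than parameters from the project.

```python
# Per-point feature detection directly on point cloud data (PCD),
# with no mesh fitted: a sketch assuming an illustrative neighborhood
# size k and edge threshold.
import numpy as np
from scipy.spatial import cKDTree

def pcd_features(points, k=16):
    """Normal estimate and surface-variation score for each point."""
    tree = cKDTree(points)
    normals = np.empty_like(points)
    variation = np.empty(len(points))
    for i, p in enumerate(points):
        _, idx = tree.query(p, k=k)               # k nearest neighbors
        nbhd = points[idx] - points[idx].mean(axis=0)
        evals, evecs = np.linalg.eigh(nbhd.T @ nbhd / k)
        normals[i] = evecs[:, 0]                  # least-variance direction
        variation[i] = evals[0] / evals.sum()     # ~0 on flat patches
    return normals, variation

# Two faces of a box meeting at a crease: variation spikes at the edge.
rng = np.random.default_rng(0)
face_a = np.c_[rng.uniform(0, 1, 500), rng.uniform(0, 1, 500), np.zeros(500)]
face_b = np.c_[np.zeros(500), rng.uniform(0, 1, 500), rng.uniform(0, 1, 500)]
cloud = np.vstack([face_a, face_b])
normals, variation = pcd_features(cloud)
edge_candidates = cloud[variation > 0.05]
```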

This project will develop topological tools for understanding qualitative properties of data sets. We will use homology, applied both to data sets directly and to derived complexes, to define invariants or signatures that distinguish between the underlying geometric objects. Important goals will include the identification, location, and classification of qualitative features of the data set, such as the presence of corners, edges, and cone points, and the use of homology applied to canonically defined blowups and tangent complexes to distinguish between two dimensional shapes in three dimensional Euclidean space. We will use the recently developed techniques of persistence and landmarking to make homology a stable and readily computable invariant. We will also develop the theory of multidimensional persistence, in which one studies spaces equipped with several parameters, in order to better understand data sets in which several different parameters describe different geometric properties of the space. The overall goal is to continue to develop and improve the available tools for studying qualitative information about geometric objects. The project takes the point of view that it is better to understand qualitative properties before attempting more precise quantitative analysis, and better to distinguish shapes by understanding them qualitatively than by database comparisons. Thus, methods will be developed to compute, in a timely, robust, and trustworthy manner, the fundamental geometric properties that any realistic mathematical model associated to a given data set must contain. Statistical and analytic techniques may then be applied to the geometrically correct models in order to extract the detailed information desired by practitioners.
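
The landmarking step mentioned above admits a very short description in code. The sketch below implements the standard maxmin (farthest-point) selection: each new landmark is the point farthest from those already chosen, so a small landmark set spreads evenly through the data, and homology can then be computed on a complex built over the landmarks alone. This is a generic rendering of the technique, not the project's own implementation; the witness-complex construction that would follow is omitted.

```python
# Maxmin landmark selection: a generic sketch of the landmarking
# technique named above, not the project's own code.
import numpy as np

def maxmin_landmarks(points, n_landmarks, seed=0):
    """Greedily pick landmarks, each farthest from those chosen so far."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(points)))]     # arbitrary first landmark
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    while len(chosen) < n_landmarks:
        chosen.append(int(np.argmax(dist)))       # farthest remaining point
        dist = np.minimum(
            dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
    return np.asarray(chosen)

# Usage: 50 well-spread landmark indices from a large cloud.
rng = np.random.default_rng(2)
cloud = rng.normal(size=(5000, 3))
landmark_indices = maxmin_landmarks(cloud, 50)
```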