
Comparison-Based Random Forests
Authors:
Siavash Haghiri,
Damien Garreau,
Ulrike von Luxburg
Abstract:
Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. In the theory part of this paper, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is efficient both for classification and regression. In particular, it is even competitive with other methods that have direct access to the metric representation of the data.
Submitted 18 June, 2018; originally announced June 2018.
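The triplet-comparison setting described in this abstract is easy to emulate: hide a Euclidean representation behind an oracle that only answers "is item A closer to B or to C?". The sketch below (illustrative only, not the authors' code; all names are hypothetical) shows one randomized tree split that uses nothing but such queries:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # hidden representation: the learner never sees it

def oracle(a, b, c):
    """Triplet query: is item a closer to item b than to item c?"""
    return np.linalg.norm(X[a] - X[b]) < np.linalg.norm(X[a] - X[c])

def comparison_split(items):
    """One randomized split: draw two pivot items, then send every other
    item to the side of the pivot it is closer to (oracle answers only)."""
    p1, p2 = rng.choice(items, size=2, replace=False)
    left = [i for i in items if i != p1 and i != p2 and oracle(i, p1, p2)]
    right = [i for i in items if i != p1 and i != p2 and not oracle(i, p1, p2)]
    return [p1] + left, [p2] + right

left, right = comparison_split(list(range(200)))
print(len(left), len(right))  # the two sides partition all 200 items
```

A comparison-based forest repeats such splits recursively in many randomized trees and aggregates the leaf labels; no coordinates or distance values ever enter the learner.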

Design and Analysis of the NIPS 2016 Review Process
Authors:
Nihar B. Shah,
Behzad Tabibian,
Krikamol Muandet,
Isabelle Guyon,
Ulrike von Luxburg
Abstract:
Neural Information Processing Systems (NIPS) is a top-tier annual conference in machine learning. The 2016 edition of the conference comprised more than 2,400 paper submissions, 3,000 reviewers, and 8,000 attendees. This represents a growth of nearly 40% in terms of submissions, 96% in terms of reviewers, and over 100% in terms of attendees as compared to the previous year. The massive scale as well as rapid growth of the conference calls for a thorough quality assessment of the peer-review process and novel means of improvement. In this paper, we analyze several aspects of the data collected during the review process, including an experiment investigating the efficacy of collecting ordinal rankings from reviewers. Our goal is to check the soundness of the review process, and provide insights that may be useful in the design of the review process of subsequent conferences.
Submitted 23 April, 2018; v1 submitted 31 August, 2017; originally announced August 2017.

Two-sample Hypothesis Testing for Inhomogeneous Random Graphs
Authors:
Debarghya Ghoshdastidar,
Maurilio Gutzeit,
Alexandra Carpentier,
Ulrike von Luxburg
Abstract:
The study of networks leads to a wide range of high-dimensional inference problems. In most practical scenarios, one needs to draw inference from a small population of large networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime. We consider the problem of testing between two populations of inhomogeneous random graphs defined on the same set of vertices. We propose tests based on estimates of the Frobenius and operator norms of the difference between the population adjacency matrices. We show that the tests are uniformly consistent in both the "large graph, small sample" and "small graph, large sample" regimes. We further derive lower bounds on the minimax separation rate for the associated testing problems, and show that the constructed tests are near optimal.
Submitted 1 August, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

Two-Sample Tests for Large Random Graphs Using Network Statistics
Authors:
Debarghya Ghoshdastidar,
Maurilio Gutzeit,
Alexandra Carpentier,
Ulrike von Luxburg
Abstract:
We consider a two-sample hypothesis testing problem, where the distributions are defined on the space of undirected graphs, and one has access to only one observation from each model. A motivating example for this problem is comparing the friendship networks on Facebook and LinkedIn. The practical approach to such problems is to compare the networks based on certain network statistics. In this paper, we present a general principle for two-sample hypothesis testing in such scenarios without making any assumption about the network generation process. The main contribution of the paper is a general formulation of the problem based on concentration of network statistics, and consequently, a consistent two-sample test that arises as the natural solution for this problem. We also show that the proposed test is minimax optimal for certain network statistics.
Submitted 26 May, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

Comparison Based Nearest Neighbor Search
Authors:
Siavash Haghiri,
Debarghya Ghoshdastidar,
Ulrike von Luxburg
Abstract:
We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points $i$ and $j$ is smaller than the distance between the points $i$ and $k$. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm that recursively splits the space by first selecting two random pivot points and then assigning all other points to the closer of the two (comparison tree). We prove that if the metric space satisfies certain expansion conditions, then with high probability the height of the comparison tree is logarithmic in the number of points, leading to efficient search performance. We also provide an upper bound for the failure probability to return the true nearest neighbor. Experiments show that the comparison tree is competitive with algorithms that have access to the actual distance values, and needs fewer triplet comparisons than other competitors.
Submitted 5 April, 2017; originally announced April 2017.
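The recursive pivoting scheme sketched in the abstract fits in a few lines. Below, Euclidean data stands in for an arbitrary metric space that is accessed only through comparison queries (a minimal sketch; function names and the leaf-size parameter are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(512, 3))  # metric space, accessed only via comparisons

def closer(i, j, k):
    # Oracle answer: is d(i, j) smaller than d(i, k)?
    return np.linalg.norm(X[i] - X[j]) < np.linalg.norm(X[i] - X[k])

def build(items, leaf_size=8):
    """Comparison tree: pick two random pivots, assign every remaining
    item to the closer pivot, recurse on both sides."""
    if len(items) <= leaf_size:
        return items                      # leaf: a plain list of items
    p1, p2 = rng.choice(items, size=2, replace=False)
    rest = [i for i in items if i != p1 and i != p2]
    left = [i for i in rest if closer(i, p1, p2)]
    right = [i for i in rest if not closer(i, p1, p2)]
    return (p1, p2, build(left), build(right))

def size(node):
    return len(node) if isinstance(node, list) else 2 + size(node[2]) + size(node[3])

def depth(node):
    return 0 if isinstance(node, list) else 1 + max(depth(node[2]), depth(node[3]))

tree = build(list(range(512)))
print(size(tree), depth(tree))  # all 512 items stored; shallow tree
```

Search then walks from the root to a leaf, asking at each node one comparison between the query and the two pivots, so the search cost is proportional to the tree height, which the paper shows is logarithmic with high probability under expansion conditions.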

Kernel functions based on triplet comparisons
Authors:
Matthäus Kleindessner,
Ulrike von Luxburg
Abstract:
Given only information in the form of similarity triplets "Object A is more similar to object B than to object C" about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set.
Submitted 29 October, 2017; v1 submitted 28 July, 2016; originally announced July 2016.

Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis
Authors:
Matthäus Kleindessner,
Ulrike von Luxburg
Abstract:
In recent years it has become popular to study machine learning problems in a setting of ordinal distance information rather than numerical distance measurements. By ordinal distance information we refer to binary answers to distance comparisons such as $d(A,B)<d(C,D)$. For many problems in machine learning and statistics it is unclear how to solve them in such a scenario. Up to now, the main approach is to explicitly construct an ordinal embedding of the data points in the Euclidean space, an approach that has a number of drawbacks. In this paper, we propose algorithms for the problems of medoid estimation, outlier identification, classification, and clustering when given only ordinal data. They are based on estimating the lens depth function and the $k$-relative neighborhood graph on a data set. Our algorithms are simple, are much faster than an ordinal embedding approach and avoid some of its drawbacks, and can easily be parallelized.
Submitted 24 July, 2017; v1 submitted 23 February, 2016; originally announced February 2016.
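The lens depth function mentioned above can be estimated from ordinal information alone: a point lies in the lens of a pair $(A,B)$ iff $\max(d(x,A), d(x,B)) < d(A,B)$, a condition decidable from binary distance comparisons. A small sketch of medoid estimation via lens depth (illustrative only, not the authors' implementation):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 2))
D = np.linalg.norm(X[:, None] - X[None], axis=-1)  # hidden; only comparisons used

def lens_depth(i):
    """Fraction of pairs (a, b) whose lens contains point i, i.e.
    max(d(i,a), d(i,b)) < d(a,b) -- decidable from ordinal comparisons."""
    inside = total = 0
    for a, b in combinations(range(len(X)), 2):
        if i in (a, b):
            continue
        total += 1
        inside += max(D[i, a], D[i, b]) < D[a, b]
    return inside / total

depths = [lens_depth(i) for i in range(len(X))]
medoid = int(np.argmax(depths))
print(medoid, depths[medoid])  # the deepest point lies near the sample center
```

The point contained in the most lenses is the most central one, which is why maximizing the estimated lens depth serves as a comparison-based medoid estimate.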

Peer Grading in a Course on Algorithms and Data Structures: Machine Learning Algorithms do not Improve over Simple Baselines
Authors:
Mehdi S. M. Sajjadi,
Morteza Alamgir,
Ulrike von Luxburg
Abstract:
Peer grading is the process of students reviewing each other's work, such as homework submissions, and has lately become a popular mechanism used in massive open online courses (MOOCs). Intrigued by this idea, we used it in a course on algorithms and data structures at the University of Hamburg. Throughout the whole semester, students repeatedly handed in submissions to exercises, which were then evaluated both by teaching assistants and by a peer grading mechanism, yielding a large dataset of teacher and peer grades. We applied different statistical and machine learning methods to aggregate the peer grades in order to come up with accurate final grades for the submissions (supervised and unsupervised, methods based on numeric scores and ordinal rankings). Surprisingly, none of them improves over the baseline of using the mean peer grade as the final grade. We discuss a number of possible explanations for these results and present a thorough analysis of the generated dataset.
Submitted 10 February, 2016; v1 submitted 2 June, 2015; originally announced June 2015.

Construction of Tight Frames on Graphs and Application to Denoising
Authors:
Franziska Göbel,
Gilles Blanchard,
Ulrike von Luxburg
Abstract:
Given a neighborhood graph representation of a finite set of points $x_i\in\mathbb{R}^d,i=1,\ldots,n,$ we construct a frame (redundant dictionary) for the space of real-valued functions defined on the graph. This frame is adapted to the underlying geometrical structure of the $x_i$, has finitely many elements, and these elements are localized in frequency as well as in space. This construction follows the ideas of Hammond et al. (2011), with the key point that we construct a tight (or Parseval) frame. This means we have a very simple, explicit reconstruction formula for every function $f$ defined on the graph from the coefficients given by its scalar product with the frame elements. We use this representation in the setting of denoising where we are given noisy observations of a function $f$ defined on the graph. By applying a thresholding method to the coefficients in the reconstruction formula, we define an estimate of $f$ whose risk satisfies a tight oracle inequality.
Submitted 21 June, 2017; v1 submitted 18 August, 2014; originally announced August 2014.

Unconscious lie detection as an example of a widespread fallacy in the Neurosciences
Authors:
Volker H. Franz,
Ulrike von Luxburg
Abstract:
Neuroscientists frequently use a certain statistical reasoning to establish the existence of distinct neuronal processes in the brain. We show that this reasoning is flawed and that the large corresponding literature needs reconsideration. We illustrate the fallacy with a recent study that received enormous press coverage because it concluded that humans detect deceit better if they use unconscious processes instead of conscious deliberations. The study was published under a new open-data policy that enabled us to reanalyze the data with more appropriate methods. We found that unconscious performance was close to chance, just like conscious performance. This illustrates the flaws of this widely used statistical reasoning, the benefits of open-data practices, and the need for careful reconsideration of studies using the same rationale.
Submitted 16 July, 2014; originally announced July 2014.

Consistent procedures for cluster tree estimation and pruning
Authors:
Kamalika Chaudhuri,
Sanjoy Dasgupta,
Samory Kpotufe,
Ulrike von Luxburg
Abstract:
For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$. The set of all high-density clusters forms a hierarchy called the {\it cluster tree} of $f$. We present two procedures for estimating the cluster tree given samples from $f$. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the $k$-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.
Submitted 5 June, 2014; originally announced June 2014.

Shortest path distance in random k-nearest neighbor graphs
Authors:
Morteza Alamgir,
Ulrike von Luxburg
Abstract:
Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are detrimental to machine learning. We also study the behavior of the shortest path distance in weighted kNN graphs.
Submitted 9 July, 2012; v1 submitted 27 June, 2012; originally announced June 2012.

Pruning nearest neighbor cluster trees
Authors:
Samory Kpotufe,
Ulrike von Luxburg
Abstract:
Nearest neighbor (kNN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is it possible to identify spurious structures that might arise due to sampling variability?
Our first contribution is a statistical analysis that reveals how certain subgraphs of a kNN graph form a consistent estimator of the cluster tree of the underlying distribution of points. Our second and perhaps most important contribution is the following finite sample guarantee. We carefully work out the tradeoff between aggressive and conservative pruning and are able to guarantee the removal of all spurious cluster structures at all levels of the tree while at the same time guaranteeing the recovery of salient clusters. This is the first such finite sample result in the context of clustering.
Submitted 5 May, 2011; v1 submitted 3 May, 2011; originally announced May 2011.

How the result of graph clustering methods depends on the construction of the graph
Authors:
Markus Maier,
Ulrike von Luxburg,
Matthias Hein
Abstract:
We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is whether and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.
Submitted 10 February, 2011; originally announced February 2011.

Clustering Stability: An Overview
Authors:
Ulrike von Luxburg
Abstract:
A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we give a high-level overview of the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications.
Submitted 7 July, 2010; originally announced July 2010.

Hitting and commute times in large graphs are often misleading
Authors:
Ulrike von Luxburg,
Agnes Radl,
Matthias Hein
Abstract:
Next to the shortest path distance, the second most popular distance function between vertices in a graph is the commute distance (resistance distance). For two vertices u and v, the hitting time H_{uv} is the expected time it takes a random walk to travel from u to v. The commute time is its symmetrized version C_{uv} = H_{uv} + H_{vu}. In our paper we study the behavior of hitting times and commute distances when the number n of vertices in the graph is very large. We prove that as n converges to infinity, hitting times and commute distances converge to expressions that do not take into account the global structure of the graph at all. Namely, the hitting time H_{uv} converges to 1/d_v and the commute time to 1/d_u + 1/d_v, where d_u and d_v denote the degrees of vertices u and v. In these cases, the hitting and commute times are misleading in the sense that they do not provide information about the structure of the graph. We focus on two major classes of random graphs: random geometric graphs (k-nearest neighbor graphs, epsilon-graphs, Gaussian similarity graphs) and random graphs with given expected degrees (in particular, Erdos-Renyi graphs with and without planted partitions).
Submitted 26 May, 2011; v1 submitted 5 March, 2010; originally announced March 2010.
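This degree-only limit is easy to observe numerically. The sketch below computes exact hitting times on a moderately dense Erdos-Renyi graph by solving the standard linear system and compares the rescaled value with 1/d_v (the limits quoted in the abstract are, as I read the paper, stated after rescaling the hitting time by the graph volume vol = sum of degrees):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 0.1
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T                      # symmetric Erdos-Renyi adjacency matrix
d = A.sum(axis=1)
P = A / d[:, None]               # random-walk transition matrix

def hitting_time(u, v):
    """E[steps] from u to v: solve h(v)=0, h(w) = 1 + sum_x P[w,x] h(x)."""
    idx = [w for w in range(n) if w != v]
    M = np.eye(n - 1) - P[np.ix_(idx, idx)]
    h = np.linalg.solve(M, np.ones(n - 1))
    return h[idx.index(u)]

vol = d.sum()
rescaled = hitting_time(0, 1) / vol
print(rescaled, 1 / d[1])        # already close for n = 400
```

Note how the rescaled hitting time depends on vertex 1 essentially only through its degree, illustrating why such distances carry little information about global graph structure.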

Optimal construction of k-nearest neighbor graphs for identifying noisy clusters
Authors:
Markus Maier,
Matthias Hein,
Ulrike von Luxburg
Abstract:
We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (rather of the order n than of the order log n) to maximize the probability of cluster identification. Secondly, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect the most significant cluster only.
Submitted 17 December, 2009; originally announced December 2009.

How the initialization affects the stability of the k-means algorithm
Authors:
Sebastien Bubeck,
Marina Meila,
Ulrike von Luxburg
Abstract:
We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
Submitted 31 July, 2009; originally announced July 2009.
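The phenomenon studied here is easy to probe empirically: run plain Lloyd's algorithm from many random initializations and check whether they land in the same local optimum. A minimal sketch (not the paper's experimental setup; an easy, well-separated instance is chosen so that most runs should agree):

```python
import numpy as np

rng = np.random.default_rng(3)
# An easy instance: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-5, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

def kmeans(X, k, seed, iters=100):
    """Plain Lloyd's algorithm started from k random data points."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

runs = [kmeans(X, 2, seed) for seed in range(10)]
# Two runs found the same local optimum if their partitions agree up to
# swapping the two cluster labels.
same = sum(min((r != runs[0]).mean(), (r == runs[0]).mean()) == 0 for r in runs)
print(same, "of 10 initializations reached the same local optimum")
```

On harder instances (overlapping or unbalanced clusters) the runs split across several distinct local optima, which is exactly the instability that stability-based model selection exploits.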

Statistical Learning Theory: Models, Concepts, and Results
Authors:
Ulrike von Luxburg,
Bernhard Schoelkopf
Abstract:
Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We target a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want to get an overview of the field before diving into technical details.
Submitted 27 October, 2008; originally announced October 2008.

Consistency of spectral clustering
Authors:
Ulrike von Luxburg,
Mikhail Belkin,
Olivier Bousquet
Abstract:
Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
Submitted 4 April, 2008; originally announced April 2008.

A Geometric Approach to Confidence Sets for Ratios: Fieller's Theorem, Generalizations, and Bootstrap
Authors:
Ulrike von Luxburg,
Volker H. Franz
Abstract:
We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally distributed random variables, the confidence sets constructed in this way coincide with the standard Fieller confidence sets. Generalizations of our construction lead to definitions of exact and conservative confidence sets for very general classes of distributions, provided the joint expectation of (X,Y) exists and the linear combinations of the form aX + bY are well-behaved. Finally, our geometric method allows us to derive a very simple bootstrap approach for constructing conservative confidence sets for ratios, which performs favorably in certain situations, in particular in the asymmetric heavy-tailed regime.
Submitted 1 November, 2007; originally announced November 2007.
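The geometric reduction admits a very short implementation: a candidate ratio rho belongs to the confidence set iff the confidence interval for the mean of the one-dimensional variable Y - rho*X contains zero. A sketch using a normal-approximation interval over a grid of candidates (the paper's exact construction uses Fieller/t-based sets; the grid, sample sizes, and z constant here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(10, 1, 500)   # E[X] = 10
Y = rng.normal(20, 1, 500)   # E[Y] = 20, so the true ratio E[Y]/E[X] is 2

def ratio_confidence_set(X, Y, grid, z=1.96):
    """Geometric construction: accept rho iff the ~95% interval for the
    mean of the one-dimensional variable Y - rho*X contains zero."""
    accepted = []
    n = len(X)
    for rho in grid:
        D = Y - rho * X
        half = z * D.std(ddof=1) / np.sqrt(n)
        if abs(D.mean()) <= half:
            accepted.append(float(rho))
    return accepted

grid = np.linspace(1.5, 2.5, 2001)
cs = ratio_confidence_set(X, Y, grid)
print(min(cs), max(cs))  # a short interval around the true ratio 2
```

Because the acceptance region comes from a one-dimensional mean test, the same loop works unchanged with a bootstrap or t-based interval in place of the normal approximation.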

A Tutorial on Spectral Clustering
Authors:
Ulrike von Luxburg
Abstract:
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
Submitted 1 November, 2007; originally announced November 2007.

Graph Laplacians and their convergence on random neighborhood graphs
Authors:
Matthias Hein,
Jean-Yves Audibert,
Ulrike von Luxburg
Abstract:
Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semi-supervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator.
Submitted 27 June, 2007; v1 submitted 21 August, 2006; originally announced August 2006.