Showing 1–23 of 23 results for author: von Luxburg, U

  1. arXiv:1806.06616  [pdf, other stat.ML

    Comparison-Based Random Forests

    Authors: Siavash Haghiri, Damien Garreau, Ulrike von Luxburg

    Abstract: Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification th… ▽ More

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: Accepted at ICML 2018, camera-ready version (32 pages, 14 figures)

  2. arXiv:1708.09794  [pdf, other cs.DL

    Design and Analysis of the NIPS 2016 Review Process

    Authors: Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, Ulrike von Luxburg

    Abstract: Neural Information Processing Systems (NIPS) is a top-tier annual conference in machine learning. The 2016 edition of the conference comprised more than 2,400 paper submissions, 3,000 reviewers, and 8,000 attendees. This represents a growth of nearly 40% in terms of submissions, 96% in terms of reviewers, and over 100% in terms of attendees as compared to the previous year. The massive scale as we… ▽ More

    Submitted 23 April, 2018; v1 submitted 31 August, 2017; originally announced August 2017.

  3. arXiv:1707.00833  [pdf, ps, other stat.ME

    Two-sample Hypothesis Testing for Inhomogeneous Random Graphs

    Authors: Debarghya Ghoshdastidar, Maurilio Gutzeit, Alexandra Carpentier, Ulrike von Luxburg

    Abstract: The study of networks leads to a wide range of high dimensional inference problems. In most practical scenarios, one needs to draw inference from a small population of large networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime. We consider the problem of testing between two populations of inhomogeneous random graphs defined on the same set of vertices. W… ▽ More

    Submitted 1 August, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

    Comments: 46 pages. Title changed from v1, and contains minor textual changes and typo corrections. Note: This paper studies a different problem than arXiv:1705.06168

    MSC Class: 62H15; 62C20; 05C80; 60B20

  4. arXiv:1705.06168  [pdf, ps, other stat.ME

    Two-Sample Tests for Large Random Graphs Using Network Statistics

    Authors: Debarghya Ghoshdastidar, Maurilio Gutzeit, Alexandra Carpentier, Ulrike von Luxburg

    Abstract: We consider a two-sample hypothesis testing problem, where the distributions are defined on the space of undirected graphs, and one has access to only one observation from each model. A motivating example for this problem is comparing the friendship networks on Facebook and LinkedIn. The practical approach to such problems is to compare the networks based on certain network statistics. In this pap… ▽ More

    Submitted 26 May, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

    Comments: To be presented in COLT 2017 (author sequence, funding details and minor typos updated in version 2)

  5. arXiv:1704.01460  [pdf, other stat.ML

    Comparison Based Nearest Neighbor Search

    Authors: Siavash Haghiri, Debarghya Ghoshdastidar, Ulrike von Luxburg

    Abstract: We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points $i$ and $j$ is smaller than the distance between the points $i$ and $k$. We are concerned with data structures and algorithms to find nearest neig… ▽ More

    Submitted 5 April, 2017; originally announced April 2017.

    Comments: 16 Pages, 3 Figures

  6. arXiv:1607.08456  [pdf, other stat.ML

    Kernel functions based on triplet comparisons

    Authors: Matthäus Kleindessner, Ulrike von Luxburg

    Abstract: Given only information in the form of similarity triplets "Object A is more similar to object B than to object C" about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimens… ▽ More

    Submitted 29 October, 2017; v1 submitted 28 July, 2016; originally announced July 2016.

  7. arXiv:1602.07194  [pdf, other stat.ML

    Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis

    Authors: Matthäus Kleindessner, Ulrike von Luxburg

    Abstract: In recent years it has become popular to study machine learning problems in a setting of ordinal distance information rather than numerical distance measurements. By ordinal distance information we refer to binary answers to distance comparisons such as $d(A,B)… ▽ More

    Submitted 24 July, 2017; v1 submitted 23 February, 2016; originally announced February 2016.

    Journal ref: Journal of Machine Learning Research 18(58):1-52, 2017

  8. arXiv:1506.00852  [pdf, other cs.LG

    Peer Grading in a Course on Algorithms and Data Structures: Machine Learning Algorithms do not Improve over Simple Baselines

    Authors: Mehdi S. M. Sajjadi, Morteza Alamgir, Ulrike von Luxburg

    Abstract: Peer grading is the process of students reviewing each others' work, such as homework submissions, and has lately become a popular mechanism used in massive open online courses (MOOCs). Intrigued by this idea, we used it in a course on algorithms and data structures at the University of Hamburg. Throughout the whole semester, students repeatedly handed in submissions to exercises, which were then… ▽ More

    Submitted 10 February, 2016; v1 submitted 2 June, 2015; originally announced June 2015.

    Comments: Published at the Third Annual ACM Conference on Learning at Scale L@S

  9. arXiv:1408.4012  [pdf, other stat.ME

    Construction of Tight Frames on Graphs and Application to Denoising

    Authors: Franziska Göbel, Gilles Blanchard, Ulrike von Luxburg

    Abstract: Given a neighborhood graph representation of a finite set of points $x_i\in\mathbb{R}^d,i=1,\ldots,n,$ we construct a frame (redundant dictionary) for the space of real-valued functions defined on the graph. This frame is adapted to the underlying geometrical structure of the $x_i$, has finitely many elements, and these elements are localized in frequency as well as in space. This construction fol… ▽ More

    Submitted 21 June, 2017; v1 submitted 18 August, 2014; originally announced August 2014.

  10. arXiv:1407.4240  [pdf, other stat.AP

    Unconscious lie detection as an example of a widespread fallacy in the Neurosciences

    Authors: Volker H. Franz, Ulrike von Luxburg

    Abstract: Neuroscientists frequently use a certain statistical reasoning to establish the existence of distinct neuronal processes in the brain. We show that this reasoning is flawed and that the large corresponding literature needs reconsideration. We illustrate the fallacy with a recent study that received an enormous press coverage because it concluded that humans detect deceit better if they use unconsc… ▽ More

    Submitted 16 July, 2014; originally announced July 2014.

  11. arXiv:1406.1546  [pdf, ps, other stat.ML

    Consistent procedures for cluster tree estimation and pruning

    Authors: Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike von Luxburg

    Abstract: For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq λ\}$, for some $λ> 0$. The set of all high-density clusters forms a hierarchy called the {\it cluster tree} of $f$. We present two procedures for estimating the cluster tree given samples from $f$. The first is a robust variant of the single linkage algorithm for hierarchical clustering… ▽ More

    Submitted 5 June, 2014; originally announced June 2014.

  12. arXiv:1206.6381  [pdf, other cs.LG

    Shortest path distance in random k-nearest neighbor graphs

    Authors: Morteza Alamgir, Ulrike von Luxburg

    Abstract: Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are… ▽ More

    Submitted 9 July, 2012; v1 submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  13. arXiv:1105.0540  [pdf, ps, other stat.ML

    Pruning nearest neighbor cluster trees

    Authors: Samory Kpotufe, Ulrike von Luxburg

    Abstract: Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is it possible to identify spurious structures that might arise due to sampling variability? Our first contribution is a statistical analysis that reveals how c… ▽ More

    Submitted 5 May, 2011; v1 submitted 3 May, 2011; originally announced May 2011.

  14. arXiv:1102.2075  [pdf, other stat.ML

    How the result of graph clustering methods depends on the construction of the graph

    Authors: Markus Maier, Ulrike von Luxburg, Matthias Hein

    Abstract: We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the out… ▽ More

    Submitted 10 February, 2011; originally announced February 2011.

  15. Clustering Stability: An Overview

    Authors: Ulrike von Luxburg

    Abstract: A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we… ▽ More

    Submitted 7 July, 2010; originally announced July 2010.

    Journal ref: Foundations and Trends in Machine Learning, Vol. 2, No. 3, p. 235-274, 2010

  16. arXiv:1003.1266  [pdf, other cs.DS

    Hitting and commute times in large graphs are often misleading

    Authors: Ulrike von Luxburg, Agnes Radl, Matthias Hein

    Abstract: Next to the shortest path distance, the second most popular distance function between vertices in a graph is the commute distance (resistance distance). For two vertices u and v, the hitting time H_{uv} is the expected time it takes a random walk to travel from u to v. The commute time is its symmetrized version C_{uv} = H_{uv} + H_{vu}. In our paper we study the behavior of hitting times and comm… ▽ More

    Submitted 26 May, 2011; v1 submitted 5 March, 2010; originally announced March 2010.

  17. Optimal construction of k-nearest neighbor graphs for identifying noisy clusters

    Authors: Markus Maier, Matthias Hein, Ulrike von Luxburg

    Abstract: We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected… ▽ More

    Submitted 17 December, 2009; originally announced December 2009.

    Comments: 31 pages, 2 figures

    Journal ref: Theoretical Computer Science, 410(19), 1749-1764, April 2009

  18. arXiv:0907.5494  [pdf, other stat.ML

    How the initialization affects the stability of the k-means algorithm

    Authors: Sebastien Bubeck, Marina Meila, Ulrike von Luxburg

    Abstract: We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and wh… ▽ More

    Submitted 31 July, 2009; originally announced July 2009.

  19. arXiv:0810.4752  [pdf, other stat.ML

    Statistical Learning Theory: Models, Concepts, and Results

    Authors: Ulrike von Luxburg, Bernhard Schoelkopf

    Abstract: Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview over the key ideas and insights of statistical learning theory. We target at a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want to get an overview o… ▽ More

    Submitted 27 October, 2008; originally announced October 2008.

  20. Consistency of spectral clustering

    Authors: Ulrike von Luxburg, Mikhail Belkin, Olivier Bousquet

    Abstract: Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new metho… ▽ More

    Submitted 4 April, 2008; originally announced April 2008.

    Comments: Published in at the Annals of Statistics ( by the Institute of Mathematical Statistics (

    Report number: IMS-AOS-AOS0287 MSC Class: 62G20 (Primary) 05C50 (Secondary)

    Journal ref: Annals of Statistics 2008, Vol. 36, No. 2, 555-586

  21. arXiv:0711.0198  [pdf, other stat.ME

    A Geometric Approach to Confidence Sets for Ratios: Fieller's Theorem, Generalizations, and Bootstrap

    Authors: Ulrike von Luxburg, Volker H. Franz

    Abstract: We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally… ▽ More

    Submitted 1 November, 2007; originally announced November 2007.

    Journal ref: Statistica Sinica, 19(3), 1095-1117. (2009)

  22. arXiv:0711.0189  [pdf, other cs.DS

    A Tutorial on Spectral Clustering

    Authors: Ulrike von Luxburg

    Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. On the first glance spectral clustering appears slightly mysterious, and it is not obvious to see why it works at… ▽ More

    Submitted 1 November, 2007; originally announced November 2007.

    Journal ref: Statistics and Computing 17(4), 2007

  23. arXiv:math/0608522  [pdf, ps, other math.ST

    Graph Laplacians and their convergence on random neighborhood graphs

    Authors: Matthias Hein, Jean-Yves Audibert, Ulrike von Luxburg

    Abstract: Given a sample from a probability measure with support on a submanifold in Euclidean space one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semi-supervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three di… ▽ More

    Submitted 27 June, 2007; v1 submitted 21 August, 2006; originally announced August 2006.

    Comments: Improved presentation, typos corrected, to appear in JMLR

    MSC Class: 62H12 (Primary) 62H30; 62G99 (Secondary)