Statistics
New submissions
New submissions for Mon, 19 Feb 18
 [1] arXiv:1802.05753 [pdf, ps, other]

Title: Bayesian variable selection in linear dynamical systems
Comments: 19 pages
Subjects: Methodology (stat.ME); Optimization and Control (math.OC); Quantitative Methods (q-bio.QM)
We develop a method for reconstructing regulatory interconnection networks between variables evolving according to a linear dynamical system. The work is motivated by the problem of gene regulatory network inference, that is, finding causal effects between genes from gene expression time series data. In biological applications, the typical problem is that the sampling frequency is low, and consequently the system identification problem is ill-posed. The low sampling frequency also makes it impossible to estimate derivatives directly from the data. We take a Bayesian approach to the problem, as it offers a natural way to incorporate prior information to deal with the ill-posedness, through the introduction of a sparsity-promoting prior for the underlying dynamics matrix. It also provides a framework for modelling both the process and measurement noises. We develop Markov chain Monte Carlo samplers for the discrete-valued zero-structure of the dynamics matrix, and for the continuous-time trajectory of the system.
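As a rough illustration of the model class this abstract works with (all matrices and noise levels here are hypothetical, not taken from the paper), the forward model is a discrete-time linear system x_{k+1} = A x_k + w_k with noisy observations, where the zero-structure of A encodes which variables regulate which:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse dynamics matrix: a zero entry means "no regulatory link".
A = np.array([[0.9, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.0, 0.0, 0.7]])

def simulate(A, x0, n_steps, process_sd=0.05, meas_sd=0.1, rng=rng):
    """Simulate x_{k+1} = A x_k + w_k and return noisy observations y_k."""
    x = np.array(x0, dtype=float)
    ys = []
    for _ in range(n_steps):
        x = A @ x + process_sd * rng.standard_normal(x.shape)
        ys.append(x + meas_sd * rng.standard_normal(x.shape))
    return np.array(ys)

y = simulate(A, x0=[1.0, 0.0, 0.5], n_steps=50)

# The discrete object the paper's MCMC sampler explores is this boolean mask.
zero_structure = (A != 0)
```

The paper's contribution is sampling over `zero_structure` and the continuous-time trajectory jointly; the sketch above only fixes notation for the generative model.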
 [2] arXiv:1802.05761 [pdf, other]

Title: Prediction of spatial functional random processes: Comparing functional and spatio-temporal kriging approaches
Comments: 33 pages, 11 figures
Subjects: Methodology (stat.ME)
In this paper, we present and compare functional and spatio-temporal (Sp.T.) kriging approaches to predict spatial functional random processes (which can also be viewed as Sp.T. random processes). Comparisons with respect to computational time and prediction performance via functional cross-validation are made, mainly through a simulation study but also on two real data sets. We restrict comparisons to Sp.T. kriging versus ordinary kriging for functional data (OKFD), since the more flexible functional kriging approaches, pointwise functional kriging (PWFK) and the functional kriging total model, coincide with OKFD in several situations. We contribute new knowledge by proving that OKFD and PWFK coincide under certain conditions. From the simulation study, it is concluded that the prediction performance of the two kriging approaches is in general comparable for stationary Sp.T. processes, with a tendency for functional kriging to work better for small sample sizes and Sp.T. kriging to work better for large sample sizes. For non-stationary Sp.T. processes, with a common deterministic time trend and/or time-varying variances and dependence structure, OKFD performs better than Sp.T. kriging irrespective of sample size. For all simulated cases, the computational time for OKFD was considerably lower than those for the Sp.T. kriging methods.
 [3] arXiv:1802.05778 [pdf, other]

Title: A comparison of machine learning techniques for taxonomic classification of teeth from the Family Bovidae
Authors: Gregory J Matthews, Juliet K. Brophy, Maxwell P. Luetkemeier, Hongie Gu, George K. Thiruvathukal
Subjects: Applications (stat.AP)
This study explores the performance of modern, accurate machine learning algorithms on the classification of fossil teeth in the Family Bovidae. Isolated bovid teeth are typically the most common fossils found in southern Africa and they often constitute the basis for paleoenvironmental reconstructions. Taxonomic identification of fossil bovid teeth, however, is often imprecise and subjective. Using modern teeth with known taxa, machine learning algorithms can be trained to classify fossils. Previous work by Brophy et al. (2014) uses elliptical Fourier analysis of the form (size and shape) of the outline of the occlusal surface of each tooth as features in a linear discriminant analysis framework. This manuscript expands on that previous work by exploring how different machine learning approaches classify the teeth and testing which technique is best for classification. Five different machine learning techniques including linear discriminant analysis, neural networks, nuclear penalized multinomial regression, random forests, and support vector machines were used to estimate these models. Support vector machines and random forests perform the best in terms of both log-loss and misclassification rate; both of these methods are improvements over linear discriminant analysis. With the identification and application of these superior methods, bovid teeth can be classified with higher accuracy.
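The two evaluation metrics the abstract compares classifiers on can be computed directly from predicted class probabilities. A minimal sketch (the toy labels and probabilities are invented for illustration):

```python
import numpy as np

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the true class."""
    p = np.clip(probs[np.arange(len(y_true)), y_true], eps, 1 - eps)
    return -np.mean(np.log(p))

def misclassification_rate(y_true, probs):
    """Fraction of examples where the argmax class is wrong."""
    return np.mean(np.argmax(probs, axis=1) != y_true)

y_true = np.array([0, 1, 2, 1])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])   # last row is misclassified
```

Log-loss penalizes confident wrong probabilities even when the argmax decision is correct, which is why the paper reports both metrics.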
 [4] arXiv:1802.05801 [pdf, ps, other]

Title: A Model Free Perspective for Linear Regression: Uniform-in-model Bounds for Post Selection Inference
Subjects: Statistics Theory (math.ST)
For the last two decades, high-dimensional data and methods have proliferated throughout the literature. The classical technique of linear regression, however, has not lost its touch in applications. Most high-dimensional estimation techniques can be seen as variable selection tools which lead to a smaller set of variables where the classical linear regression technique applies. In this paper, we prove estimation error and linear representation bounds for the linear regression estimator uniformly over (many) subsets of variables. Based on deterministic inequalities, our results provide "good" rates when applied to both independent and dependent data. These results are useful in correctly interpreting the linear regression estimator obtained after exploring the data and also in post model-selection inference. All the results are derived under no model assumptions and are non-asymptotic in nature.
 [5] arXiv:1802.05811 [pdf, other]

Title: Distributed Stochastic Optimization via Adaptive Stochastic Gradient Descent
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial in many applications, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial algorithm that is surprisingly hard to parallelize. In this paper, we propose an efficient distributed stochastic optimization method based on adaptive step sizes and variance reduction techniques. We achieve a linear speedup in the number of machines, a small memory footprint, and only a small number of synchronization rounds, logarithmic in dataset size, in which the computation nodes communicate with each other. Critically, our approach is a general reduction that parallelizes any serial SGD algorithm, allowing us to leverage the significant progress that has been made in designing adaptive SGD algorithms. We conclude by implementing our algorithm in the Spark distributed framework and exhibit dramatic performance gains on large-scale logistic regression problems.
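To fix intuition for the kind of reduction described here, the simplest baseline is one synchronization round of local serial SGD followed by model averaging. This sketch is not the paper's algorithm (which uses adaptive step sizes and variance reduction over logarithmically many rounds); all problem sizes and step sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_machines = 2000, 5, 4
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.01 * rng.standard_normal(n)

def local_sgd(Xs, ys, w, steps, lr):
    """Plain serial SGD on a least-squares objective over one data shard."""
    for _ in range(steps):
        i = rng.integers(len(ys))
        grad = (Xs[i] @ w - ys[i]) * Xs[i]
        w = w - lr * grad
    return w

# One synchronization round: each "machine" runs serial SGD on its shard,
# then the local models are averaged.
shards = np.array_split(np.arange(n), n_machines)
w0 = np.zeros(d)
w_avg = np.mean([local_sgd(X[s], y[s], w0, steps=2000, lr=0.01)
                 for s in shards], axis=0)
```

The paper's contribution is doing better than this one-shot averaging while keeping communication logarithmic in dataset size.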
 [6] arXiv:1802.05814 [pdf, other]

Title: Variational Autoencoders for Collaborative Filtering
Comments: 10 pages, 3 figures. WWW 2018
Subjects: Machine Learning (stat.ML); Information Retrieval (cs.IR); Learning (cs.LG)
We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This nonlinear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm has information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
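The multinomial likelihood highlighted here treats a user's click vector as counts drawn from a softmax over items. A minimal sketch of that term (the logits and click vector are hypothetical; the decoder producing the logits is omitted):

```python
import numpy as np

def multinomial_log_likelihood(logits, x):
    """log p(x | z) under a multinomial over items; x holds click counts."""
    log_softmax = logits - np.log(np.sum(np.exp(logits)))
    return float(np.sum(x * log_softmax))

logits = np.array([2.0, 0.5, -1.0, 0.0])  # hypothetical decoder output, 4 items
x = np.array([1, 1, 0, 0])                # user interacted with items 0 and 1
ll = multinomial_log_likelihood(logits, x)

# The paper's annealed objective down-weights the KL term of the ELBO by a
# factor beta < 1; schematically: elbo = ll - beta * kl.
```

Unlike an independent-Bernoulli likelihood, the softmax makes items compete for probability mass, which suits ranking.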
 [7] arXiv:1802.05821 [pdf, other]

Title: Learning Latent Features with Pairwise Penalties in Matrix Completion
Comments: 31 pages, 8 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Low-rank matrix completion (MC) has achieved great success in many real-world data applications. A latent feature model formulation is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning, e.g., the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use a squared L2 norm to measure the pairwise difference, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is further developed to uniformly solve the optimization problem, with a theoretical convergence guarantee. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully characterized. In particular, we theoretically characterize the complexity-regularized maximum likelihood estimator, as a special case of our framework. It has a better error bound when compared to the standard trace-norm regularized matrix completion. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.
 [8] arXiv:1802.05841 [pdf]

Title: Rapid Bayesian optimisation for synthesis of short polymer fiber materials
Authors: Cheng Li, David Rubin de Celis Leal, Santu Rana, Sunil Gupta, Alessandra Sutti, Stewart Greenhill, Teo Slezak, Murray Height, Svetha Venkatesh
Comments: Scientific Reports 2017
Subjects: Machine Learning (stat.ML); Computational Physics (physics.comp-ph)
The discovery of processes for the synthesis of new materials involves many decisions about process design, operation, and material properties. Experimentation is crucial, but as complexity increases, exploration of variables can become impractical using traditional combinatorial approaches. We describe an iterative method which uses machine learning to optimise process development, incorporating multiple qualitative and quantitative objectives. We demonstrate the method with a novel fluid processing platform for synthesis of short polymer fibers, and show how the synthesis process can be efficiently directed to achieve material and process objectives.
 [9] arXiv:1802.05842 [pdf, other]

Title: Neural Granger Causality for Nonlinear Time Series
Subjects: Machine Learning (stat.ML)
While most classical approaches to Granger causality detection assume linear dynamics, many interactions in applied domains, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured multilayer perceptrons (MLPs) or recurrent neural networks (RNNs) combined with sparsity-inducing penalties on the weights. By encouraging specific sets of weights to be zero, in particular through the use of convex group lasso penalties, we can extract the Granger causal structure. To further contrast with traditional approaches, our framework naturally enables us to efficiently capture long-range dependencies between series either via our RNNs or through an automatic lag selection in the MLP. We show that our neural Granger causality methods outperform state-of-the-art nonlinear Granger causality methods on the DREAM3 challenge data. This data consists of nonlinear gene expression and regulation time courses with only a limited number of time points. The successes we show in this challenging dataset provide a powerful example of how deep learning can be useful in cases that go beyond prediction on large datasets. We likewise demonstrate our methods in detecting nonlinear interactions in a human motion capture dataset.
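The extraction step described here reads the Granger structure off the first-layer weights: all weights connecting one input series (across its lags) form a group, and a zeroed group means "no Granger cause". A minimal sketch of that readout, with an invented weight matrix standing in for a trained MLP's first layer:

```python
import numpy as np

def granger_structure(W_in, n_series, lag, threshold=1e-3):
    """W_in: (hidden, n_series * lag) first-layer weights of an MLP that
    predicts one target series. Group = all weights from one input series."""
    groups = W_in.reshape(W_in.shape[0], n_series, lag)
    norms = np.sqrt((groups ** 2).sum(axis=(0, 2)))  # one norm per series
    return norms > threshold  # True = candidate Granger cause of the target

# Hypothetical trained weights: only series 0 (its 2 lags) has nonzero weights.
W = np.zeros((4, 6))
W[:, 0:2] = 0.5
causes = granger_structure(W, n_series=3, lag=2)
```

Group lasso penalizes exactly these per-series norms during training, which is what drives whole groups to exact zero rather than merely shrinking individual weights.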
 [10] arXiv:1802.05846 [pdf, other]

Title: Train on Validation: Squeezing the Data Lemon
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Model selection on validation data is an essential step in machine learning. While the mixing of data between training and validation is considered taboo, practitioners often violate it to increase performance. Here, we offer a simple, practical method for using the validation set for training, which allows for a continuous, controlled trade-off between performance and overfitting of model selection. We define the notion of on-average-validation-stable algorithms, in which using small portions of validation data for training does not overfit the model selection process. We then prove that stable algorithms are also validation stable. Finally, we demonstrate our method on the MNIST and CIFAR-10 datasets using stable algorithms as well as state-of-the-art neural networks. Our results show a significant increase in test performance with a minor trade-off in bias admitted to the model selection process.
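The mechanical part of the idea, moving a controlled fraction of validation indices into the training split, can be sketched as follows (the split sizes and fraction are hypothetical; the paper's contribution is the stability analysis of how large that fraction may safely be, which this sketch does not implement):

```python
import numpy as np

rng = np.random.default_rng(2)

def squeeze_validation(train_idx, val_idx, fraction, rng=rng):
    """Move a random fraction of validation indices into the training set,
    trading a little model-selection bias for extra training data."""
    val_idx = np.asarray(val_idx)
    k = int(fraction * len(val_idx))
    moved = rng.choice(val_idx, size=k, replace=False)
    new_train = np.concatenate([np.asarray(train_idx), moved])
    new_val = np.setdiff1d(val_idx, moved)
    return new_train, new_val

train, val = squeeze_validation(np.arange(800), np.arange(800, 1000),
                                fraction=0.25)
```

The `fraction` knob is the continuous trade-off the abstract refers to: at 0 this is the standard taboo-respecting split, at 1 there is no validation set left.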
 [11] arXiv:1802.05917 [pdf, ps, other]

Title: Robust estimation in controlled branching processes: Bayesian estimators via disparities
Comments: Paper and supplementary material
Subjects: Methodology (stat.ME)
This paper is concerned with Bayesian inferential methods for data from controlled branching processes that account for model robustness through the use of disparities. Under regularity conditions, we establish that estimators built on the disparity-based posterior, such as expectation and maximum a posteriori estimates, are consistent and efficient under the posited model. Additionally, we show that the estimates are robust to model misspecification and the presence of aberrant outliers. To this end, we develop several fundamental ideas relating minimum disparity estimators to Bayesian estimators built on the disparity-based posterior, for dependent tree-structured data. We illustrate the methodology through a simulated example and apply our methods to a real data set from cell kinetics.
 [12] arXiv:1802.05936 [pdf, other]

Title: Bayesian cross-validation of geostatistical models
Subjects: Computation (stat.CO)
The problem of validating or criticising models for georeferenced data is challenging, since the conclusions can vary significantly depending on the locations of the validation set. This work proposes the use of cross-validation techniques to assess the goodness of fit of spatial models in different regions of the spatial domain to account for uncertainty in the choice of the validation sets. An obvious problem with the basic cross-validation scheme is that it is based on selecting only a few out-of-sample locations to validate the model, possibly making the conclusions sensitive to which partition of the data into training and validation cases is utilized. A possible solution to this issue would be to consider all possible configurations of data divided into training and validation observations. From a Bayesian point of view, this could be computationally demanding, as estimation of parameters usually requires Markov chain Monte Carlo methods. To deal with this problem, we propose the use of estimated discrepancy functions considering all configurations of data partition in a computationally efficient manner based on sampling importance resampling. In particular, we consider uncertainty in the locations by assigning a prior distribution to them. Furthermore, we propose a stratified cross-validation scheme to take into account spatial heterogeneity, reducing the total variance of estimated predictive discrepancy measures considered for model assessment. We illustrate the advantages of our proposal with simulated examples of homogeneous and inhomogeneous spatial processes to investigate the effects of our proposal in scenarios of preferential sampling designs. The methods are illustrated with an application to a rainfall dataset.
 [13] arXiv:1802.05975 [pdf, other]

Title: Nonparametric Bayesian estimation of multivariate Hawkes processes
Subjects: Statistics Theory (math.ST)
This paper studies nonparametric estimation of parameters of multivariate Hawkes processes. We consider the Bayesian setting and derive posterior concentration rates. First, rates are derived for L1-metrics for stochastic intensities of the Hawkes process. We then deduce rates for the L1-norm of interaction functions of the process. Our results are exemplified by using priors based on piecewise constant functions, with regular or random partitions, and priors based on mixtures of Beta distributions. Numerical illustrations are then proposed, with applications to inferring functional connectivity graphs of neurons in mind.
 [14] arXiv:1802.05983 [pdf, other]

Title: Disentangling by Factorising
Comments: Shorter version appeared in Learning Disentangled Representations: From Perception to Control workshop at NIPS, 2017: this https URL
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon $\beta$-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.
 [15] arXiv:1802.06009 [pdf, ps, other]

Title: Dropout Model Evaluation in MOOCs
Journal-ref: Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), 2018
Subjects: Applications (stat.AP); Computers and Society (cs.CY); Methodology (stat.ME); Machine Learning (stat.ML)
The field of learning analytics needs to adopt a more rigorous approach for predictive model evaluation that matches the complex practice of model building. In this work, we present a procedure to statistically test hypotheses about model performance which goes beyond the state of the practice in the community to analyze both algorithms and feature extraction methods from raw data. We apply this method to a series of algorithms and feature sets derived from a large sample of Massive Open Online Courses (MOOCs). While a complete comparison of all potential modeling approaches is beyond the scope of this paper, we show that this approach reveals a large gap in dropout prediction performance between forum-, assignment-, and clickstream-based feature extraction methods, where the latter is significantly better than the former two, which are in turn indistinguishable from one another. This work has methodological implications for evaluating predictive or AI-based models of student success, and practical implications for the design and targeting of at-risk student models and interventions.
 [16] arXiv:1802.06018 [pdf]

Title: Automated Quality Assessment of (Citizen) Weather Stations
Subjects: Applications (stat.AP)
Today we have access to a vast amount of weather, air quality, noise, or radioactivity data collected by individuals around the globe. This volunteered geographic information (VGI) often contains data of uncertain and heterogeneous quality, in particular when compared to official in-situ measurements. This limits its application, as rigorous, work-intensive data cleaning has to be performed, which reduces the amount of data and cannot be performed in real time. In this paper, we propose dynamically learning the quality of individual sensors by optimizing a weighted Gaussian process regression using a genetic algorithm. We chose weather stations as our use case as these are the most common VGI measurements. The evaluation is done for the southwest of Germany in August 2016 with temperature data from the Wunderground network and the Deutsche Wetter Dienst (DWD), in total 1561 stations. Using a 10-fold cross-validation scheme based on the DWD ground truth, we can show significant improvements in the predicted sensor readings. In our experiment we obtained a 12.5% improvement in the mean absolute error.
 [17] arXiv:1802.06037 [pdf, other]

Title: Policy Evaluation and Optimization with Continuous Treatments
Comments: appearing at AISTATS 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. Previous work for discrete treatment/action spaces focuses on inverse probability weighting (IPW) and doubly robust (DR) methods that use a rejection sampling approach for evaluation and the equivalent weighted classification problem for learning. In the continuous setting, this reduction fails as we would almost surely reject all observations. To tackle the case of continuous treatments, we extend the IPW and DR approaches to the continuous setting using a kernel function that leverages treatment proximity to attenuate discrete rejection. Our policy estimator is consistent and we characterize the optimal bandwidth. The resulting continuous policy optimizer (CPO) approach using our estimator achieves convergent regret and approaches the best-in-class policy for learnable policy classes. We demonstrate that the estimator performs well and, in particular, outperforms a discretization-based benchmark. We further study the performance of our policy optimizer in a case study on personalized dosing based on a dataset of Warfarin patients, their covariates, and final therapeutic doses. Our learned policy outperforms benchmarks and nears the oracle-best linear policy.
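The kernelized IPW idea can be sketched as replacing the indicator "logged treatment equals the policy's dose" with a kernel of the treatment gap. A minimal numeric sketch, with an invented logging policy, reward function, and bandwidth (the paper additionally characterizes the optimal bandwidth and a doubly robust variant):

```python
import numpy as np

def kernel_ipw(rewards, treatments, propensity_densities, target_doses,
               bandwidth):
    """Kernelized IPW value estimate of a deterministic dosing policy:
    the discrete indicator 1{T_i = pi(X_i)} becomes a Gaussian kernel."""
    u = (treatments - target_doses) / bandwidth
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    weights = k / (bandwidth * propensity_densities)
    return float(np.mean(weights * rewards))

rng = np.random.default_rng(3)
n = 5000
t = rng.normal(0.0, 1.0, n)                      # logged doses ~ N(0, 1)
dens = np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)  # known logging density
r = 1.0 - (t - 0.5) ** 2                         # reward peaks at dose 0.5

# Evaluate the policy "always dose 0.5"; the true value here is 1.0,
# and the kernel introduces a small smoothing bias of order bandwidth^2.
v_hat = kernel_ipw(r, t, dens, target_doses=0.5, bandwidth=0.3)
```

Shrinking the bandwidth reduces this smoothing bias but inflates variance, which is exactly the trade-off behind the paper's optimal-bandwidth analysis.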
 [18] arXiv:1802.06048 [pdf, other]

Title: High-dimensional covariance matrix estimation using a low-rank and diagonal decomposition
Subjects: Methodology (stat.ME)
We study high-dimensional covariance/precision matrix estimation under the assumption that the covariance/precision matrix can be decomposed into a low-rank component L and a diagonal component D. The rank of L can either be chosen to be small or controlled by a penalty function. Under moderate conditions on the population covariance/precision matrix itself and on the penalty function, we prove some consistency results for our estimators. A blockwise coordinate descent algorithm, which iteratively updates L and D, is then proposed to obtain the estimator in practice. Finally, various numerical experiments are presented: using simulated data, we show that our estimator performs quite well in terms of the Kullback-Leibler loss; using stock return data, we show that our method can be applied to obtain enhanced solutions to the Markowitz portfolio selection problem.
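The structural assumption is easy to picture with a small synthetic population covariance (dimensions and ranks here are illustrative only; the estimation algorithm itself is not sketched):

```python
import numpy as np

rng = np.random.default_rng(4)
p, r = 8, 2

# Population covariance = low-rank factor part plus a positive diagonal,
# the classic factor-model structure the paper's decomposition targets.
B = rng.standard_normal((p, r))
L = B @ B.T                            # low-rank component, rank r
D = np.diag(rng.uniform(0.5, 1.5, p))  # diagonal (idiosyncratic) component
Sigma = L + D

eigvals = np.linalg.eigvalsh(Sigma)
rank_L = np.linalg.matrix_rank(L)
```

Because D has strictly positive diagonal, Sigma is positive definite even though L alone is singular, which is what makes the decomposition usable for precision matrices as well.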
 [19] arXiv:1802.06052 [pdf, other]

Title: Online Continuous Submodular Maximization
Comments: Accepted by AISTATS 2018
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
In this paper, we consider an online optimization process, where the objective functions are not convex (nor concave) but instead belong to a broad class of continuous submodular functions. We first propose a variant of the Frank-Wolfe algorithm that has access to the full gradient of the objective functions. We show that it achieves a regret bound of $O(\sqrt{T})$ (where $T$ is the horizon of the online optimization problem) against a $(1-1/e)$-approximation to the best feasible solution in hindsight. However, in many scenarios, only an unbiased estimate of the gradients is available. For such settings, we then propose an online stochastic gradient ascent algorithm that also achieves a regret bound of $O(\sqrt{T})$, albeit against a weaker $1/2$-approximation to the best feasible solution in hindsight. We also generalize our results to $\gamma$-weakly submodular functions and prove the same sublinear regret bounds. Finally, we demonstrate the efficiency of our algorithms on a few problem instances, including non-convex/non-concave quadratic programs, multilinear extensions of submodular set functions, and D-optimal design.
 [20] arXiv:1802.06054 [pdf, other]

Title: Learning Patterns for Detection with Multiscale Scan Statistics
Authors: James Sharpnack
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME)
This paper addresses detecting anomalous patterns in images, time series, and tensor data when the location and scale of the pattern are unknown a priori. The multiscale scan statistic convolves the proposed pattern with the image at various scales and returns the maximum of the resulting tensor. Scale-corrected multiscale scan statistics apply different standardizations at each scale, and the limiting distribution under the null hypothesis, that the data is only noise, is known for smooth patterns. We consider the problem of simultaneously learning and detecting the anomalous pattern from a dictionary of smooth patterns and a database of many tensors. To this end, we show that the multiscale scan statistic is a sub-exponential random variable, and prove a chaining lemma for standardized suprema, which may be of independent interest. Then, by averaging the statistics over the database of tensors, we can learn the pattern and obtain Bernstein-type error bounds. We also provide a construction of an $\epsilon$-net of the location and scale parameters, providing a computationally tractable approximation with similar error bounds.
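In one dimension, the "convolve at several scales, take the maximum" construction can be sketched directly. The pattern, scales, bump amplitude, and standardization below are all illustrative simplifications (the paper's scale-corrected statistic uses per-scale standardizations with known null limits):

```python
import numpy as np

def multiscale_scan(signal, base_pattern, scales):
    """Correlate a unit-norm stretched pattern with the signal at several
    widths and return the maximum response over location and scale."""
    best = -np.inf
    for s in scales:
        pat = np.repeat(base_pattern, s).astype(float)
        pat /= np.linalg.norm(pat)                 # unit norm at every scale
        resp = np.correlate(signal, pat, mode="valid")
        best = max(best, float(resp.max()))
    return best

rng = np.random.default_rng(5)
noise = rng.standard_normal(200)
signal = noise.copy()
signal[80:100] += 3.0                              # planted bump of width 20

stat_null = multiscale_scan(noise, np.ones(4), scales=[1, 2, 5])
stat_alt = multiscale_scan(signal, np.ones(4), scales=[1, 2, 5])
```

Because the pattern is renormalized at each scale, a wide weak bump and a narrow strong bump can produce comparable responses, which is the point of scanning over scale.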
Cross-lists for Mon, 19 Feb 18
 [21] arXiv:1802.05283 (cross-list from cs.LG) [pdf, other]

Title: Designing Random Graph Models Using Variational Autoencoders With Applications to Chemical Design
Subjects: Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with graphs due to their unique characteristics: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they come with varying numbers of nodes and edges. In this paper, we propose a variational autoencoder for graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. Moreover, the decoder is able to guarantee a set of local structural and functional properties in the generated graphs. Experiments reveal that our model is able to learn and mimic the generative process of several well-known random graph models and can be used to create new molecules more effectively than several state-of-the-art methods.
 [22] arXiv:1802.05733 (cross-list from cs.LG) [pdf, other]

Title: Fair Clustering Through Fairlets
Journal-ref: NIPS 2017: 5036-5044
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the $k$-center and the $k$-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions; for instance, a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically quantify the value of fair clustering on real-world datasets with sensitive attributes.
 [23] arXiv:1802.05756 (cross-list from cs.LG) [pdf, other]

Title: Inferring relevant features: from QFT to PCA
Authors: Cédric Bény
Subjects: Learning (cs.LG); Quantum Physics (quant-ph); Machine Learning (stat.ML)
In many-body physics, renormalization techniques are used to extract aspects of a statistical or quantum state that are relevant at large scale, or for low-energy experiments. Recent works have proposed that these features can be formally identified as those perturbations of the states whose distinguishability most resists coarse-graining. Here, we examine whether this same strategy can be used to identify important features of an unlabeled dataset. This approach indeed results in a technique very similar to kernel PCA (principal component analysis), but with a kernel function that is automatically adapted to the data, or "learned". We test this approach on handwritten digits, and find that the most relevant features are significantly better for classification than those obtained from a simple Gaussian kernel.
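For reference, the kernel PCA baseline the abstract compares against (with a fixed Gaussian kernel, which the paper replaces by a learned one) can be sketched in a few lines; the data here is random and purely illustrative:

```python
import numpy as np

def kernel_pca(X, kernel_fn, n_components=2):
    """Kernel PCA: double-center the Gram matrix (centering in feature
    space), then project onto the top eigenvectors."""
    K = kernel_fn(X, X)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)            # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    # Projections of the training points onto the principal components.
    return Kc @ (vecs[:, idx] / np.sqrt(vals[idx]))

def rbf(A, B, gamma=0.5):
    """Fixed Gaussian (RBF) kernel, the 'simple' baseline."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(6)
X = rng.standard_normal((30, 3))
Z = kernel_pca(X, rbf, n_components=2)
```

Swapping `rbf` for a data-adapted kernel is the step the paper motivates from renormalization.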
 [24] arXiv:1802.05757 (cross-list from cs.LG) [pdf, other]

Title: Stochastic Wasserstein Barycenters
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We present a stochastic algorithm to compute the barycenter of a set of probability distributions under the Wasserstein metric from optimal transport. Unlike previous approaches, our method extends to continuous input distributions and allows the support of the barycenter to be adjusted in each iteration. We tackle the problem without regularization, allowing us to recover a sharp output whose support is contained within the support of the true barycenter. We give examples where our algorithm recovers a more meaningful barycenter than previous work. Our method is versatile and can be extended to applications such as generating super samples from a given distribution and recovering blue noise approximations.
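For intuition about what a Wasserstein barycenter is, the one-dimensional case has a closed form: the W2 barycenter averages the quantile functions of the inputs. This sketch is only that special case, estimated from samples, not the paper's general stochastic algorithm (the distributions and grid are invented):

```python
import numpy as np

def wasserstein_barycenter_1d(samples_list, n_quantiles=100):
    """In 1D, the W2 barycenter is obtained by averaging the quantile
    functions of the input distributions, here estimated from samples."""
    qs = np.linspace(0.01, 0.99, n_quantiles)
    quantiles = [np.quantile(s, qs) for s in samples_list]
    return np.mean(quantiles, axis=0)   # quantiles of the barycenter

rng = np.random.default_rng(7)
a = rng.normal(-2.0, 1.0, 10000)
b = rng.normal(2.0, 1.0, 10000)
bary_q = wasserstein_barycenter_1d([a, b])
# For N(-2,1) and N(2,1) the barycenter is N(0,1): quantiles centered at 0.
```

Note the barycenter interpolates shape rather than mixing: the result is a single unimodal distribution at 0, not a bimodal average of densities. The paper's contribution is doing this in general dimension, stochastically, without regularization.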
 [25] arXiv:1802.05779 (cross-list from quant-ph) [pdf, other]

Title: Quantum Variational Autoencoder
Comments: 12 pages, 3 figures, 2 tables
Subjects: Quantum Physics (quant-ph); Learning (cs.LG); Machine Learning (stat.ML)
Variational autoencoders (VAEs) are powerful generative models with the salient ability to perform inference. Here, we introduce a \emph{quantum variational autoencoder} (QVAE): a VAE whose latent generative process is implemented as a quantum Boltzmann machine (QBM). We show that our model can be trained end-to-end by maximizing a well-defined loss function: a "quantum" lower bound to a variational approximation of the log-likelihood. We use quantum Monte Carlo (QMC) simulations to train and evaluate the performance of QVAEs. To achieve the best performance, we first create a VAE platform with discrete latent space generated by a restricted Boltzmann machine (RBM). Our model achieves state-of-the-art performance on the MNIST dataset when compared against similar approaches that only involve discrete variables in the generative process. We consider QVAEs with a smaller number of latent units to be able to perform QMC simulations, which are computationally expensive. We show that QVAEs can be trained effectively in regimes where quantum effects are relevant despite training via the quantum bound. Our findings open the way to the use of quantum computers to train QVAEs to achieve competitive performance for generative models. Placing a QBM in the latent space of a VAE leverages the full potential of current and next-generation quantum computers as sampling devices.
 [26] arXiv:1802.05786 (cross-list from cs.AI) [pdf]

Title: Truth Validation with Evidence
Comments: 40 pages (including Appendix), 3 tables, 3 figures
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In the modern era, abundant information is easily accessible from various sources; however, only a few of these sources are reliable, as most contain unverified content. We develop a system to validate the truthfulness of a given statement together with the underlying evidence. The proposed system provides supporting evidence when the statement is tagged as false. Our work relies on an inference method over a knowledge graph (KG) to identify the truthfulness of statements. To extract the evidence of falseness, the proposed algorithm combines knowledge from the KG and ontologies. The system shows very good results, as it provides valid and concise evidence. The quality of the KG plays a role in the performance of the inference method, which in turn affects the performance of our evidence-extracting algorithm.
 [27] arXiv:1802.05792 (cross-list from cs.LG) [pdf]

Title: Masked Conditional Neural Networks for Automatic Sound Events Recognition
Comments: Restricted Boltzmann Machine, RBM, Conditional RBM, CRBM, Deep Belief Net, DBN, Conditional Neural Network, CLNN, Masked Conditional Neural Network, MCLNN, Environmental Sound Recognition, ESR
Journal-ref: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017, pages 389-394
Subjects: Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Deep neural network architectures designed for application domains other than sound, especially image recognition, may not optimally harness the time-frequency representation when adapted to the sound recognition problem. In this work, we explore the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) for multidimensional temporal signal recognition. The CLNN considers the inter-frame relationship, and the MCLNN enforces a systematic sparseness over the network's links that enables learning in frequency bands rather than bins, allowing the network to be frequency-shift invariant, mimicking a filterbank. The mask also allows considering several combinations of features concurrently, which is usually handcrafted through exhaustive manual search. We applied the MCLNN to the environmental sound recognition problem using the ESC-10 and ESC-50 datasets. The MCLNN achieved competitive performance, using 12% of the parameters and without augmentation, compared to state-of-the-art Convolutional Neural Networks.
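The masking idea above can be made concrete with a toy construction: a binary mask that restricts each hidden unit to a contiguous band of frequency bins. This is a hypothetical simplification (the MCLNN mask is parameterized by a bandwidth and an overlap; `band_mask` and its placement rule are assumptions of this sketch, not the paper's exact formulation).

```python
import numpy as np

def band_mask(n_in, n_out, bandwidth):
    """Binary mask restricting each output unit to a contiguous band of
    input (frequency) bins, so links outside the band are zeroed out."""
    mask = np.zeros((n_out, n_in))
    for j in range(n_out):
        # Spread the bands evenly across the input bins.
        start = int(round(j * (n_in - bandwidth) / max(n_out - 1, 1)))
        mask[j, start:start + bandwidth] = 1.0
    return mask

# Element-wise multiplying a weight matrix by this mask enforces the
# band-limited connectivity during both forward and backward passes.
m = band_mask(n_in=8, n_out=4, bandwidth=3)
```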
 [28] arXiv:1802.05799 (cross-list from cs.LG) [pdf, other]

Title: Horovod: fast and easy distributed deep learning in TensorFlow
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library's API, the modification required may be either significant or minimal.
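The communication pattern at the heart of efficient multi-GPU gradient averaging is the ring all-reduce, which the sketch below simulates serially for n workers: n-1 scatter-reduce steps in which each worker accumulates one chunk from its ring neighbour, then n-1 all-gather steps that circulate the fully reduced chunks. This is a minimal illustration of the general technique, not Horovod's implementation.

```python
import numpy as np

def ring_allreduce(tensors):
    """Serially simulate a ring all-reduce over `tensors`, one 1-D array
    per worker. Each tensor is split into n chunks; at every step each
    worker exchanges exactly one chunk with its ring neighbour."""
    n = len(tensors)
    chunks = [list(np.array_split(np.asarray(t, dtype=float), n)) for t in tensors]
    for step in range(n - 1):              # scatter-reduce
        for i in range(n):
            c = (i - step) % n
            chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + chunks[i][c]
    for step in range(n - 1):              # all-gather
        for i in range(n):
            c = (i + 1 - step) % n
            chunks[(i + 1) % n][c] = chunks[i][c]
    return [np.concatenate(c) for c in chunks]

out = ring_allreduce([np.arange(6.0), np.ones(6), np.full(6, 2.0)])
```

Each worker sends and receives only 2(n-1)/n of the tensor size in total, independent of the number of workers, which is why ring reduction keeps communication overhead low.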
Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://github.com/uber/horovod.
 [29] arXiv:1802.05800 (cross-list from cs.CV) [pdf, other]

Title: Tree-CNN: A Deep Convolutional Neural Network for Lifelong Learning
Comments: 10 pages, 8 figures, 6 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as "catastrophic forgetting" and hyperparameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning. The network grows in a tree-like manner to accommodate the new classes of data without losing the ability to identify the previously trained classes. The proposed network was tested on the CIFAR-10 and CIFAR-100 datasets, and compared against the method of fine-tuning specific layers of a conventional CNN. We obtained comparable accuracies and achieved 40% and 20% reduction in training effort on CIFAR-10 and CIFAR-100 respectively. The network was able to organize the incoming classes of data into feature-driven superclasses. Our model improves upon existing hierarchical CNN models by adding the capability of self-growth and also yields important observations on feature selective classification.
 [30] arXiv:1802.05822 (cross-list from cs.LG) [pdf, other]

Title: Auto-Encoding Total Correlation Explanation
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Advances in unsupervised learning enable reconstruction and generation of samples from complex distributions, but this success is marred by the inscrutability of the representations learned. We propose an information-theoretic approach to characterizing disentanglement and dependence in representation learning using multivariate mutual information, also called total correlation. The principle of Total Correlation Explanation (CorEx) has motivated successful unsupervised learning applications across a variety of domains, but under some restrictive assumptions. Here we relax those restrictions by introducing a flexible variational lower bound to CorEx. Surprisingly, we find that this lower bound is equivalent to the one in variational autoencoders (VAE) under certain conditions. This information-theoretic view of VAE deepens our understanding of hierarchical VAE and motivates a new algorithm, AnchorVAE, that makes latent codes more interpretable through information maximization and enables generation of richer and more realistic samples.
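Total correlation is defined as TC(X) = sum_i H(X_i) - H(X), i.e. how far the joint is from the product of its marginals; it vanishes exactly when the components are independent. For a multivariate Gaussian this has a closed form, sketched below as a minimal illustration of the quantity (not of the paper's variational bound).

```python
import numpy as np

def total_correlation_gaussian(cov):
    """Total correlation of a multivariate Gaussian with covariance `cov`:
    TC = sum_i H(X_i) - H(X) = 0.5 * (sum_i log var_i - log det(cov)),
    in nats. Zero iff the covariance is diagonal (independent components)."""
    cov = np.asarray(cov, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - np.linalg.slogdet(cov)[1])
```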
 [31] arXiv:1802.05828 (cross-list from cs.SY) [pdf]

Title: Improving Power Grid Resilience Through Predictive Outage Estimation
Journal-ref: North American Power Symposium (NAPS), 2017
Subjects: Systems and Control (cs.SY); Applications (stat.AP)
In this paper, in an attempt to improve power grid resilience, a machine learning model is proposed to predictively estimate the component states in response to extreme events. The proposed model is based on a multidimensional Support Vector Machine (SVM) considering the associated resilience index, i.e., the infrastructure quality level and the time duration that each component can withstand the event, as well as the predicted path and intensity of the upcoming extreme event. The outcome of the proposed model is the component state data classified into two categories, outage and operational, which can be further used to schedule system resources in a predictive manner with the objective of maximizing resilience. The proposed model is validated using k-fold cross-validation and model benchmarking techniques. The performance of the model is tested through numerical simulations and based on a well-defined and commonly used performance measure.
 [32] arXiv:1802.05844 (cross-list from cs.AI) [pdf, ps, other]

Title: A Unified View of Causal and Non-Causal Feature Selection
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we unify causal and non-causal feature selection methods based on the Bayesian network framework. We first show that the objectives of causal and non-causal feature selection methods are the same: to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We demonstrate that causal and non-causal feature selection make different assumptions about dependencies among features to find the Markov blanket, and that their algorithms represent different levels of approximation to it. Within this framework, we are able to analyze the sample and error bounds of causal and non-causal methods. We conducted extensive experiments to show the correctness of our theoretical analysis.
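The Markov blanket of a node in a Bayesian network is its parents, its children, and its children's other parents (spouses); given the blanket, the node is conditionally independent of everything else. A minimal sketch of that set construction, assuming the DAG is given as a parent map:

```python
def markov_blanket(dag, target):
    """Markov blanket of `target` in a DAG given as {node: set(parents)}:
    parents, children, and the children's other parents (spouses)."""
    parents = set(dag[target])
    children = {n for n, ps in dag.items() if target in ps}
    spouses = {p for c in children for p in dag[c]} - {target}
    return parents | children | spouses

# T has parent A, child C, and B is C's other parent (a spouse of T).
dag = {"A": set(), "B": set(), "T": {"A"}, "C": {"T", "B"}, "D": {"B"}}
mb = markov_blanket(dag, "T")
```

Node D is excluded even though it shares a parent with C: sharing a parent does not place it in T's blanket.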
 [33] arXiv:1802.05872 (cross-list from cs.DC) [pdf, other]

Title: Online Machine Learning in Big Data Streams
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)
The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes non-trivial theoretical restrictions on the modeling methods: in the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as fresh data arrives.
In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems.
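The streaming constraint described above can be sketched with the simplest case, online least-squares regression: one stochastic-gradient update per incoming example, with no buffer of past data. This is a generic illustration of online learning under the stream model, not any particular system from the survey.

```python
import numpy as np

def stream_sgd(stream, dim, lr=0.05):
    """Online least-squares: one SGD step per (x, y) pair from the
    stream, never storing past data."""
    w = np.zeros(dim)
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        w += lr * (y - w @ x) * x   # gradient step on the squared error
    return w

# Learn y = 2*x0 - x1 from a finite simulated stream of 2000 examples.
rng = np.random.default_rng(0)
xs = rng.normal(size=(2000, 2))
w = stream_sgd(((x, 2.0 * x[0] - 1.0 * x[1]) for x in xs), dim=2)
```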
This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related subfields (online algorithms, online learning, and distributed data processing) are hugely dominant in current research and development, with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.
 [34] arXiv:1802.05889 (cross-list from cs.LG) [pdf, other]

Title: Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear non-Gaussian acyclic model to understand the data generating process, and have shown that their model is identifiable when the sample size is sufficiently large. However, situations in which continuous and discrete variables coexist in the same problem are common in practice. Most existing causal discovery methods either ignore the discrete data and apply a continuous-valued algorithm, or discretize all the continuous data and then apply a discrete Bayesian network approach. These methods may lose important information when discrete data are ignored, or introduce approximation error due to discretization. In this paper, we define a novel hybrid causal model which consists of both continuous and discrete variables. The model assumes: (1) the value of a continuous variable is a linear function of its parent variables plus non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. In addition, we derive the BIC scoring function for model selection. The new discovery algorithm can learn causal structures from mixed continuous and discrete data without discretization. We empirically demonstrate the power of our method through thorough simulations.
 [35] arXiv:1802.05910 (cross-list from cs.LG) [pdf, other]

Title: Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space
Comments: IEEE ICASSP 2018
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we study the problem of locating a predefined sequence of patterns in a time series. In particular, the studied scenario assumes a theoretical model is available that contains the expected locations of the patterns. This problem is found in several contexts, and it is commonly solved by first synthesizing a time series from the model, and then aligning it to the true time series through dynamic time warping. We propose a technique that increases the similarity of both time series before aligning them, by mapping them into a latent correlation space. The mapping is learned from the data through a machine-learning setup. Experiments on data from non-destructive testing demonstrate that the proposed approach shows significant improvements over the state of the art.
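The dynamic time warping step used as the baseline alignment is the classic dynamic program: a cumulative cost table where each cell extends the cheapest of the three admissible predecessor alignments. A minimal sketch with an absolute-difference cost:

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic-time-warping distance between sequences a and b,
    with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # match, insertion, or deletion: take the cheapest predecessor
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may repeat elements, a series aligned with a time-stretched copy of itself still gets distance zero.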
 [36] arXiv:1802.05957 (cross-list from cs.LG) [pdf, other]

Title: Spectral Normalization for Generative Adversarial Networks
Comments: Published as a conference paper at ICLR 2018
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR-10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to previous training stabilization techniques.
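Spectral normalization divides each weight matrix by its largest singular value, estimated cheaply by power iteration, so the layer's Lipschitz constant as a linear map is about 1. A minimal numpy sketch of that idea (in practice a single iteration per training step is reused across steps):

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Estimate the spectral norm (largest singular value) of W by power
    iteration and return W divided by it."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # Rayleigh-style estimate of the top singular value
    return W / sigma, sigma

W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn, sigma = spectral_normalize(W)
```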
 [37] arXiv:1802.05968 (cross-list from cs.IT) [pdf, other]

Title: Information Theory: A Tutorial Introduction
Authors: James V Stone
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)
Shannon's mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any man-made or biological system. This paper is an informal but rigorous introduction to the main ideas implicit in Shannon's theory. An annotated reading list is provided for further reading.
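The central quantity in Shannon's theory is entropy, H(X) = -sum_i p_i log2 p_i, the average information per symbol and the limit on lossless compression. A one-function sketch:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(X) = -sum p log2 p of a probability vector, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # 0 * log 0 is taken as 0 by convention
    return float(-np.sum(p * np.log2(p)))
```

A fair coin carries exactly one bit per toss; a certain outcome carries none.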
 [38] arXiv:1802.05980 (cross-list from q-bio.QM) [pdf, other]

Title: WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Machine Learning (stat.ML)
Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular for estimating sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound to efficiently identify working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art.
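The "quadratic explosion" is easy to see by materializing the candidate features naively: d main effects plus d(d-1)/2 products x_i * x_j (a logical AND for a binary design matrix). The sketch below does exactly what WHInter's working-set screening is designed to avoid; `with_interactions` is a name chosen here for illustration.

```python
import numpy as np
from itertools import combinations

def with_interactions(X):
    """Append all two-way interaction columns x_i * x_j to X
    (elementwise product = logical AND for 0/1 features)."""
    X = np.asarray(X)
    inter = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + inter)

X = np.array([[1, 0, 1], [1, 1, 0]])
Z = with_interactions(X)   # 3 main effects + 3 interaction columns
```

With d = 100,000 features this explicit expansion would need ~5 billion columns, which is why screening rules that never materialize the full matrix are essential.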
 [39] arXiv:1802.05981 (cross-list from cs.LG) [pdf, other]

Title: Tensor-based Nonlinear Classifier for High-Order Data Analysis
Authors: Konstantinos Makantasis, Anastasios Doulamis, Nikolaos Doulamis, Antonis Nikitakis, Athanasios Voulodimos
Comments: To appear in IEEE ICASSP 2018. arXiv admin note: text overlap with arXiv:1709.08164
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this paper we propose a tensor-based nonlinear model for high-order data classification. The advantages of the proposed scheme are that (i) it significantly reduces the number of weight parameters, and hence of required training samples, and (ii) it retains the spatial structure of the input samples. The proposed model, called \textit{Rank}-1 FNN, is based on a modification of a feedforward neural network (FNN), such that its weights satisfy the {\it rank}-1 canonical decomposition. We also introduce a new learning algorithm to train the model, and we evaluate the \textit{Rank}-1 FNN on third-order hyperspectral data. Experimental results and comparisons indicate that the proposed model outperforms state-of-the-art classification methods, including deep learning based ones, especially in cases with small numbers of available training samples.
 [40] arXiv:1802.05992 (cross-list from cs.LG) [pdf, other]

Title: Improved GQ-CNN: Deep Learning Model for Planning Robust Grasps
Authors: Maciej Jaśkowski (1), Jakub Świątkowski (1), Michał Zając (1), Maciej Klimek (1), Jarek Potiuk (1), Piotr Rybicki (1), Piotr Polatowski (1), Przemysław Walczyk (1), Kacper Nowicki (1), Marek Cygan (1 and 2) ((1) NoMagic.AI, (2) Institute of Informatics, University of Warsaw)
Comments: 6 pages, 3 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
Recent developments in the field of robot grasping have shown great improvements in grasp success rates when dealing with unknown objects. In this work we improve on one of the most promising approaches, the Grasp Quality Convolutional Neural Network (GQ-CNN) trained on the Dex-Net 2.0 dataset. We propose a new architecture for the GQ-CNN and describe practical improvements that increase the model validation accuracy from 92.2% to 95.8% on the image-wise split and from 85.9% to 88.0% on the object-wise training and validation split.
 [41] arXiv:1802.06014 (cross-list from cs.LG) [pdf, other]

Title: Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Distance metric learning (DML), which learns a distance metric from labeled "similar" and "dissimilar" data pairs, is widely utilized. Recently, several works have investigated orthogonality-promoting regularization (OPR), which encourages the projection vectors in DML to be close to orthogonal, to achieve three effects: (1) high balancedness, i.e., comparable performance on both frequent and infrequent classes; (2) high compactness, i.e., using a small number of projection vectors to achieve a "good" metric; (3) good generalizability, i.e., alleviating overfitting to training data. While showing promising results, these approaches suffer from three problems. First, they involve solving non-convex optimization problems, for which reaching the global optimum is NP-hard. Second, there is no theoretical understanding of why OPR can lead to balancedness. Third, the current generalization error analysis of OPR is not directly on the regularizer. In this paper, we address these three issues by (1) seeking convex relaxations of the original non-convex problems so that the global optimum is guaranteed to be achievable; (2) providing a formal analysis of OPR's capability to promote balancedness; (3) providing a theoretical analysis that directly reveals the relationship between OPR and generalization performance. Experiments on various datasets demonstrate that our convex methods are more effective in promoting balancedness, compactness, and generalization, and are computationally more efficient, compared with the non-convex methods.
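One common form of orthogonality-promoting regularizer, shown here as an illustration and not necessarily the paper's exact choice, penalizes the Gram matrix's deviation from the identity: ||W W^T - I||_F^2 over projection vectors stored as the rows of W. It is zero exactly when the rows are orthonormal.

```python
import numpy as np

def opr_penalty(W):
    """Orthogonality-promoting penalty ||W W^T - I||_F^2 on the rows of W;
    zero iff the projection vectors (rows) are orthonormal."""
    G = W @ W.T
    return float(np.sum((G - np.eye(W.shape[0])) ** 2))
```

Adding this penalty to a DML objective pushes the learned projections apart, which is the mechanism behind the balancedness and compactness effects discussed above.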
Replacements for Mon, 19 Feb 18
 [42] arXiv:1606.00451 (replaced) [pdf, other]

Title: Graph-Guided Banding of the Covariance Matrix
Authors: Jacob Bien
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO); Machine Learning (stat.ML)
 [43] arXiv:1610.06462 (replaced) [pdf, other]

Title: Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria
Comments: 25 pages, 11 figures
Subjects: Machine Learning (stat.ML); Applications (stat.AP); Methodology (stat.ME)
 [44] arXiv:1611.02762 (replaced) [pdf, other]

Title: Generalized Cluster Trees and Singular Measures
Authors: Yen-Chi Chen
Comments: 51 pages, 6 figures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
 [45] arXiv:1701.07506 (replaced) [pdf, ps, other]

Title: Bayesian Hierarchical Models with Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family
Subjects: Methodology (stat.ME)
 [46] arXiv:1703.05840 (replaced) [pdf, other]

Title: Conditional Accelerated Lazy Stochastic Gradient Descent
Comments: 37 pages, 9 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [47] arXiv:1704.01113 (replaced) [pdf, ps, other]

Title: Damped Posterior Linearization Filter
Subjects: Optimization and Control (math.OC); Computation (stat.CO)
 [48] arXiv:1704.09011 (replaced) [pdf, other]

Title: Mostly Exploration-Free Algorithms for Contextual Bandits
Comments: 6 Figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [49] arXiv:1708.01383 (replaced) [pdf, other]

Title: Variance-Reduced Stochastic Learning under Random Reshuffling
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [50] arXiv:1708.07164 (replaced) [pdf, ps, other]

Title: Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
Comments: fix some constants in lemmas and proofs in appendix; cleaned up a bit
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Learning (cs.LG); Machine Learning (stat.ML)
 [51] arXiv:1708.07827 (replaced) [pdf, other]

Title: Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Comments: 21 pages, 11 figures. Restructure the paper and add experiments
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [52] arXiv:1709.05262 (replaced) [pdf, other]

Title: Supervising Unsupervised Learning
Comments: 11 two-column pages. arXiv admin note: substantial text overlap with arXiv:1612.09030
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
 [53] arXiv:1710.03740 (replaced) [pdf, other]

Title: Mixed Precision Training
Authors: Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
Comments: Published as a conference paper at ICLR 2018
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
 [54] arXiv:1710.05209 (replaced) [pdf, ps, other]

Title: Settling the Sample Complexity for Learning Mixtures of Gaussians
Authors: Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
Comments: 38 pages
Subjects: Learning (cs.LG); Statistics Theory (math.ST)
 [55] arXiv:1710.08864 (replaced) [pdf, other]

Title: One pixel attack for fooling deep neural networks
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [56] arXiv:1711.06598 (replaced) [pdf, other]

Title: How Wrong Am I? - Studying Adversarial Examples and their Impact on Uncertainty in Gaussian Process Machine Learning Models
Comments: 8 pages, 7 pages appendix, 8 figures and 13 tables; improved writing and figures
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
 [57] arXiv:1712.04248 (replaced) [pdf, other]

Title: Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
Comments: Published as a conference paper at the Sixth International Conference on Learning Representations (ICLR 2018) this https URL
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
 [58] arXiv:1802.00047 (replaced) [pdf, other]

Title: Matrix completion with deterministic pattern: a geometric perspective
Subjects: Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [59] arXiv:1802.03001 (replaced) [pdf, ps, other]

Title: Statistical Learnability of Generalized Additive Models based on Total Variation Regularization
Authors: Shin Matsushima
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [60] arXiv:1802.04161 (replaced) [pdf]

Title: Risk Factors Associated with Mortality in Game of Thrones: A Longitudinal Cohort Study
Authors: Suveen Angraal, Ambika Bhatnagar, Suraj Verma, Sukhman Shergill, Aakriti Gupta, Rohan Khera
Comments: 6 Pages, 2 Tables and 1 Figure
Subjects: Other Statistics (stat.OT)
 [61] arXiv:1802.04956 (replaced) [pdf, ps, other]

Title: D2KE: From Distance to Kernel and Embedding
Comments: 18 pages, 4 tables
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [62] arXiv:1802.04987 (replaced) [pdf, other]

Title: PlayeRank: Multidimensional and role-aware rating of soccer player performance
Authors: Luca Pappalardo, Paolo Cintia, Paolo Ferragina, Emanuele Massucco, Dino Pedreschi, Fosca Giannotti
Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI)
 [63] arXiv:1802.05074 (replaced) [pdf, other]

Title: L4: Practical loss-based stepsize adaptation for deep learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [64] arXiv:1802.05688 (replaced) [pdf, other]

Title: Simulation assisted machine learning
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)