

New submissions


New submissions for Fri, 22 Jun 18

[1]  arXiv:1806.07921 [pdf, ps, other]
Title: Beta seasonal autoregressive moving average models
Comments: 26 pages, 5 figures, 4 tables
Journal-ref: Journal of Statistical Computation and Simulation, 2018
Subjects: Methodology (stat.ME)

In this paper we introduce the class of beta seasonal autoregressive moving average ($\beta$SARMA) models for modeling and forecasting time series that assume values in the standard unit interval. The class generalizes beta autoregressive moving average models [Rocha and Cribari-Neto, Test, 2009] by incorporating seasonal dynamics into the model's dynamic structure. Besides introducing the new class of models, we develop tools for parameter estimation, hypothesis testing, and diagnostic analysis, and we discuss out-of-sample forecasting. In particular, we provide closed-form expressions for the conditional score vector and the conditional Fisher information matrix. We also evaluate the finite-sample performance of the conditional maximum likelihood estimators and white noise tests using Monte Carlo simulations. An empirical application is presented and discussed.
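A minimal simulation sketch of the non-seasonal beta ARMA recursion underlying this model class (illustrative parameter values; the paper's $\beta$SARMA adds seasonal AR and MA terms on top of this structure):

```python
import numpy as np

rng = np.random.default_rng(0)

def logit(u):
    return np.log(u / (1 - u))

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

# hypothetical intercept, AR coefficient, MA coefficient, and precision
alpha, phi, theta, prec = 0.1, 0.5, 0.2, 50.0
T = 200
y = np.empty(T)
eta = np.empty(T)
y[0], eta[0] = 0.5, 0.0
for t in range(1, T):
    # linear predictor with one AR term and one MA term on the logit scale
    eta[t] = alpha + phi * logit(y[t - 1]) + theta * (logit(y[t - 1]) - eta[t - 1])
    mu = sigmoid(eta[t])
    # beta draw parameterised by conditional mean mu and precision prec
    y[t] = rng.beta(mu * prec, (1 - mu) * prec)

assert np.all((y > 0) & (y < 1))  # the series stays in the unit interval
```

The logit link keeps the conditional mean, and hence the simulated series, inside (0, 1), which is the point of modeling unit-interval data this way.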

[2]  arXiv:1806.07934 [pdf, other]
Title: A Function Emulation Approach for Intractable Distributions
Comments: 32 pages, 1 figure
Subjects: Computation (stat.CO)

Doubly intractable distributions arise in many settings, for example in Markov models for point processes and exponential random graph models for networks. Bayesian inference for these models is challenging because they involve intractable normalising "constants" that are actually functions of the parameters of interest. Although several clever computational methods have been developed for these models, each suffers from issues that make it computationally burdensome or even infeasible for many problems. We propose a novel algorithm that provides computational gains over existing methods by replacing Monte Carlo approximations to the normalising function with a Gaussian process-based approximation, and we provide theoretical justification for this method. We also develop a closely related algorithm that applies more broadly to any likelihood function that is expensive to evaluate. We illustrate the application of our methods on a variety of challenging simulated and real data examples, including an exponential random graph model, a Markov point process, and a model for infectious disease dynamics. The algorithm shows significant gains in computational efficiency over existing methods, with the potential for greater gains on more challenging problems. For a random graph model example, we show how this gain in efficiency allows us to carry out accurate Bayesian inference when other algorithms are computationally impractical.
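The core emulation idea can be sketched as follows: fit a Gaussian process to a small design of expensive evaluations of the log-normalising function, then query the surrogate cheaply inside MCMC. Here `log_Z` is an analytically known stand-in (not the paper's model) so the fit can be checked:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def log_Z(theta):
    # placeholder for an expensive Monte Carlo estimate of log Z(theta)
    return theta ** 2 + np.sin(3 * theta)

# small design of "full" runs over the parameter range
theta_design = np.linspace(-2, 2, 15)[:, None]
z_design = log_Z(theta_design).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(theta_design, z_design)

# cheap surrogate evaluations at new parameter values
theta_new = np.array([[0.3], [1.1]])
z_hat, z_sd = gp.predict(theta_new, return_std=True)
assert np.allclose(z_hat, log_Z(theta_new).ravel(), atol=0.1)
```

The predictive standard deviation `z_sd` is what makes the approximation usable in a principled way: it flags parameter regions where more full-model runs are needed.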

[3]  arXiv:1806.08010 [pdf, other]
Title: Fairness Without Demographics in Repeated Loss Minimization
Comments: To appear ICML 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity---minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can even make initially fair models unfair. To mitigate this, we develop an approach based on distributionally robust optimization (DRO), which minimizes the worst case risk over all distributions close to the empirical distribution. We prove that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice, while remaining oblivious to the identity of the groups. We demonstrate that DRO prevents disparity amplification on examples where ERM fails, and show improvements in minority group user satisfaction in a real-world text autocomplete task.
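A toy numerical contrast between the average (ERM) risk and a worst-case subpopulation risk in the spirit of DRO. This is not the paper's algorithm; it only illustrates that for one common choice of uncertainty set, the worst-case risk equals the mean loss over the hardest $\alpha$-fraction of examples (a CVaR), which exposes a high-loss minority group that the average hides:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical per-example losses: a majority group with low loss and a
# minority group (10% of the data) with high loss
losses_majority = rng.normal(0.2, 0.05, size=900)
losses_minority = rng.normal(0.8, 0.05, size=100)
losses = np.concatenate([losses_majority, losses_minority])

erm_risk = losses.mean()

alpha = 0.1                             # protect the worst 10% of examples
k = int(np.ceil(alpha * losses.size))
dro_risk = np.sort(losses)[-k:].mean()  # CVaR_alpha of the loss distribution

assert dro_risk > erm_risk  # the worst-case risk exposes the minority group
```

Note that computing `dro_risk` never uses group labels: the hardest $\alpha$-fraction is identified from losses alone, which mirrors the paper's point that the approach is oblivious to group identity.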

[4]  arXiv:1806.08031 [pdf, ps, other]
Title: A Constructive Algebraic Proof of Student's Theorem
Authors: Yiping Cheng
Comments: 4 pages, no figure
Subjects: Other Statistics (stat.OT)

Student's theorem is an important result in statistics which states that, for a normal population, the sample variance is independent of the sample mean and, after suitable scaling, follows a chi-square distribution. Existing proofs of this theorem either rely overly on advanced tools such as moment generating functions, or fail to explicitly construct the orthogonal matrix used in the proof. This paper provides an elegant explicit construction of that matrix, making the algebraic proof complete. The constructive algebraic proof proposed here is thus very suitable for inclusion in textbooks.
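One classical explicit choice of such an orthogonal matrix is the Helmert matrix (whether this matches the paper's construction is not stated in the abstract): its first row is proportional to the vector of ones, so the rotated sample separates the mean from the residual sum of squares:

```python
import numpy as np

def helmert(n):
    # first row: 1/sqrt(n) * (1, ..., 1); row i zeroes out all but the
    # first i+1 entries while staying orthogonal to the rows above
    H = np.zeros((n, n))
    H[0] = 1.0 / np.sqrt(n)
    for i in range(1, n):
        H[i, :i] = 1.0 / np.sqrt(i * (i + 1))
        H[i, i] = -i / np.sqrt(i * (i + 1))
    return H

n = 6
H = helmert(n)
assert np.allclose(H @ H.T, np.eye(n))             # orthogonality

x = np.random.default_rng(2).normal(size=n)
z = H @ x
assert np.isclose(z[0], np.sqrt(n) * x.mean())     # z_1 carries the mean
# the remaining coordinates carry (n-1) * s^2
assert np.isclose((z[1:] ** 2).sum(), ((x - x.mean()) ** 2).sum())
```

For i.i.d. normal data the rotation preserves independence of the coordinates, so `z[0]` (the mean) is independent of `z[1:]`, whose squared norm is $(n-1)S^2$; this is exactly the separation the theorem asserts.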

[5]  arXiv:1806.08059 [pdf, other]
Title: Avoiding Bias Due to Nonrandom Scheduling When Modeling Trends in Home-Field Advantage
Authors: Andrew T. Karl
Subjects: Applications (stat.AP)

Existing approaches for estimating home-field advantage (HFA) include modeling the difference between home and away scores as a function of the difference between home and away team ratings that are treated either as fixed or random effects. We uncover an upward bias in the mixed model HFA estimates that is due to the nonrandom structure of the schedule -- and thus the random effect design matrix -- and explore why the fixed effects model is not subject to the same bias. Intraconference HFAs and standard errors are calculated for each of 3 college sports and 3 professional sports over 18 seasons and then fitted with conference-specific slopes and intercepts to measure the potential linear population trend in HFA.

[6]  arXiv:1806.08069 [pdf, other]
Title: Deep Gaussian Process-Based Bayesian Inference for Contaminant Source Localization
Comments: 28 pages, 14 figures, submitted to IEEE Access
Subjects: Applications (stat.AP)

This paper proposes a Bayesian framework for localization of multiple sources in the event of an accidental hazardous contaminant release. The framework assimilates sensor measurements of the contaminant concentration with an integrated multizone computational fluid dynamics (multizone-CFD) based contaminant fate and transport model. To ensure online tractability, the framework uses a deep Gaussian process (DGP) based emulator of the multizone-CFD model. To effectively represent the transient response of the multizone-CFD model, the DGP emulator is reformulated using a matrix-variate Gaussian process prior. The resulting deep matrix-variate Gaussian process emulator (DMGPE) is used to define the likelihood of the Bayesian framework, while a Markov chain Monte Carlo approach is used to sample from the posterior distribution. The proposed method is evaluated on single- and multiple-source localization tasks modeled by the CONTAM simulator in a single-story building of 30 zones, demonstrating that the proposed approach accurately infers the locations of contaminant sources. Moreover, the DMGP emulator outperforms both GP and DGP emulators while requiring fewer hyperparameters.
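The emulator-in-the-likelihood pattern can be sketched with a generic random-walk Metropolis sampler where a cheap surrogate log-likelihood stands in for the expensive forward model (the quadratic surrogate and target location below are purely illustrative, not the DMGP emulator):

```python
import numpy as np

rng = np.random.default_rng(8)

def surrogate_loglik(loc):
    # stand-in for an emulator prediction: Gaussian log-likelihood peaked
    # at a hypothetical source location (3, 1) with sd 0.5 per coordinate
    return -0.5 * np.sum((loc - np.array([3.0, 1.0])) ** 2) / 0.25

chain = [np.zeros(2)]
for _ in range(5000):
    prop = chain[-1] + rng.normal(scale=0.3, size=2)   # random-walk proposal
    log_alpha = surrogate_loglik(prop) - surrogate_loglik(chain[-1])
    chain.append(prop if np.log(rng.uniform()) < log_alpha else chain[-1])

samples = np.array(chain[1000:])   # discard burn-in
assert np.allclose(samples.mean(axis=0), [3.0, 1.0], atol=0.2)
```

The point of the emulator is precisely that each `surrogate_loglik` call is cheap, so the thousands of evaluations an MCMC chain needs become affordable online.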

[7]  arXiv:1806.08117 [pdf, other]
Title: A data-driven model order reduction approach for Stokes flow through random porous media
Comments: 2 pages, 2 figures
Subjects: Machine Learning (stat.ML); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

Direct numerical simulation of Stokes flow through an impermeable, rigid body matrix by finite elements requires meshes fine enough to resolve the pore-size scale and is thus a computationally expensive task. The cost is significantly amplified when randomness in the pore microstructure is present and therefore multiple simulations need to be carried out. It is well known that in the limit of scale-separation, Stokes flow can be accurately approximated by Darcy's law with an effective diffusivity field depending on viscosity and the pore-matrix topology. We propose a fully probabilistic, Darcy-type, reduced-order model which, based on only a few tens of full-order Stokes model runs, is capable of learning a map from the fine-scale topology to the effective diffusivity and is maximally predictive of the fine-scale response. The reduced-order model learned can significantly accelerate uncertainty quantification tasks as well as provide quantitative confidence metrics of the predictive estimates produced.

[8]  arXiv:1806.08141 [pdf, other]
Title: Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
Comments: 27 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Building on the recent theory that established the connection between implicit generative modeling and optimal transport, in this study we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is as close to the data distribution as possible while remaining expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem, and the resulting algorithm resembles recent dynamics-based Markov chain Monte Carlo algorithms. We provide formal theoretical analysis and prove finite-time error guarantees for the proposed algorithm. Our experimental results support the theory and show that the algorithm is able to capture the structure of challenging distributions.
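The building block behind the title, the sliced Wasserstein distance, can be computed directly: average the one-dimensional Wasserstein-2 distance of the two samples projected onto random directions, where the 1D distance reduces to matching sorted samples. This sketch computes the distance only; it is not the flow algorithm itself:

```python
import numpy as np

def sliced_w2(X, Y, n_proj=200, rng=None):
    # Monte Carlo estimate of the sliced Wasserstein-2 distance between
    # two equal-size samples X and Y (rows are points)
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random unit direction
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)        # 1D W2^2 via sorted samples
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y = rng.normal(2.0, 1.0, size=(500, 2))
assert sliced_w2(X, X) == 0.0                   # identical samples
assert sliced_w2(X, Y) > sliced_w2(X, X + 0.1)  # larger shift, larger distance
```

Sorting is what makes this tractable: in one dimension the optimal transport plan between equal-size empirical measures is simply the monotone matching of order statistics.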

[9]  arXiv:1806.08144 [pdf, ps, other]
Title: Maximal skewness projections for scale mixtures of skew-normal vectors
Subjects: Methodology (stat.ME)

Multivariate scale mixtures of skew-normal (SMSN) variables are flexible models that account for non-normality in multivariate data through tail-weight assessment and a shape vector representing the asymmetry of the model in a directional fashion. Their stochastic representation involves a skew-normal (SN) vector and a nonnegative mixing scalar variable, independent of the SN vector, that injects kurtosis into the SMSN model. We address the problem of finding the maximal skewness projection for vectors that follow an SMSN distribution; when simple conditions on the moments of the mixing variable are fulfilled, it can be shown that the direction yielding maximal skewness is proportional to the shape vector. This finding stresses the directional nature of the asymmetry in this class of distributions; it also provides the theoretical foundations for skewness-based projection pursuit for SMSN vectors. Examples illustrate the validity of our theoretical findings for the best-known distributions within the SMSN family. For completeness, we carry out a simulation experiment with artificial data, which sheds light on the usefulness and implications of our result in statistical practice.

[10]  arXiv:1806.08151 [pdf, ps, other]
Title: Robust and Efficient Boosting Method using the Conditional Risk
Comments: 14 Pages, 2 figures and 5 tables
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Well-known for its simplicity and effectiveness in classification, AdaBoost nevertheless suffers from overfitting when class-conditional distributions have significant overlap, and it is very sensitive to label noise. This article tackles both limitations simultaneously by optimizing a modified loss function (i.e., the conditional risk). The proposed approach has two advantages. (1) It directly takes label uncertainty into account through an associated label confidence. (2) It introduces a "trustworthiness" measure on training samples via the Bayesian risk rule, so the resulting classifier tends to have finite-sample performance superior to that of the original AdaBoost when there is large overlap between class-conditional distributions. Theoretical properties of the proposed method are investigated. Extensive experimental results using synthetic data and real-world data sets from the UCI machine learning repository are provided. The empirical study shows the high competitiveness of the proposed method in prediction accuracy and robustness compared with the original AdaBoost and several existing robust AdaBoost algorithms.
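The failure mode the method targets can be reproduced with off-the-shelf AdaBoost: flipping a fraction of training labels typically degrades test accuracy, because the boosting weights concentrate on the mislabeled points. A small illustration with scikit-learn (data and noise rate are illustrative, not from the paper):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # clean linear concept

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clean_score = AdaBoostClassifier(n_estimators=50, random_state=0) \
    .fit(X_tr, y_tr).score(X_te, y_te)

# flip 25% of the training labels to simulate label noise
y_noisy = y_tr.copy()
flip = rng.random(y_tr.size) < 0.25
y_noisy[flip] = 1 - y_noisy[flip]
noisy_score = AdaBoostClassifier(n_estimators=50, random_state=0) \
    .fit(X_tr, y_noisy).score(X_te, y_te)

assert clean_score > 0.8   # the clean task is easy for boosted stumps
```

Comparing `clean_score` with `noisy_score` on such runs shows the sensitivity that the conditional-risk modification is designed to mitigate.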

[11]  arXiv:1806.08156 [pdf, ps, other]
Title: Identifiability of Gaussian Structural Equation Models with Dependent Errors Having Equal Variances
Authors: Jose M. Peña
Journal-ref: 7th Causal Inference Workshop at UAI 2018
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

In this paper, we prove that some Gaussian structural equation models with dependent errors having equal variances are identifiable from their corresponding Gaussian distributions. Specifically, we prove identifiability for the Gaussian structural equation models that can be represented as Andersson-Madigan-Perlman chain graphs (Andersson et al., 2001). These chain graphs were originally developed to represent independence models. However, they are also suitable for representing causal models with additive noise (Pe\~{n}a, 2016). Our result then implies that these causal models can be identified from observational data alone. Our result generalizes the result by Peters and B\"{u}hlmann (2014), who considered independent errors having equal variances. The suitability of the equal error variances assumption should be assessed on a per-domain basis.

[12]  arXiv:1806.08195 [pdf, other]
Title: Probabilistic PARAFAC2
Comments: 16 pages (incl. 4 pages of supplemental material), 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

PARAFAC2 is a multimodal factor analysis model suitable for analyzing multi-way data when one of the modes has incomparable observation units, for example because of differences in signal sampling or batch sizes. A fully probabilistic treatment of PARAFAC2 is desirable in order to improve robustness to noise and provide a well-founded principle for determining the number of factors, but is challenging because the factor loadings are constrained to be orthogonal. We develop two probabilistic formulations of PARAFAC2 along with variational procedures for inference: in one approach, the mean values of the factor loadings are orthogonal, leading to closed-form variational updates; in the other, the factor loadings themselves are orthogonal via a matrix von Mises-Fisher distribution. We contrast our probabilistic formulations with the conventional direct fitting algorithm based on maximum likelihood. On simulated data and real fluorescence spectroscopy and gas chromatography-mass spectrometry data, we compare our approach to conventional PARAFAC2 model estimation and find that the probabilistic formulation is more robust to noise and model-order misspecification. Probabilistic PARAFAC2 thus forms a promising framework for modeling multi-way data while accounting for uncertainty.

[13]  arXiv:1806.08200 [pdf, other]
Title: Mixtures of Experts Models
Comments: A chapter prepared for the forthcoming Handbook of Mixture Analysis
Subjects: Methodology (stat.ME)

Mixtures of experts models provide a framework in which covariates may be included in mixture models. This is achieved by modelling the parameters of the mixture model as functions of the concomitant covariates. Given their mixture model foundation, mixtures of experts models possess a diverse range of analytic uses, from clustering observations to capturing parameter heterogeneity in cross-sectional data. This chapter focuses on delineating the mixture of experts modelling framework and demonstrates the utility and flexibility of mixtures of experts models as an analytic tool.

[14]  arXiv:1806.08212 [pdf, other]
Title: A Review of Network Inference Techniques for Neural Activation Time Series
Comments: 8 pages, 2 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Studying neural connectivity is considered one of the most promising and challenging areas of modern neuroscience. The underpinnings of cognition are hidden in the way neurons interact with each other. However, our experimental methods of studying real neural connections at a microscopic level are still arduous and costly. An efficient alternative is to infer connectivity based on the neuronal activations using computational methods. A reliable method for network inference, would not only facilitate research of neural circuits without the need of laborious experiments but also reveal insights on the underlying mechanisms of the brain. In this work, we perform a review of methods for neural circuit inference given the activation time series of the neural population. Approaching it from machine learning perspective, we divide the methodologies into unsupervised and supervised learning. The methods are based on correlation metrics, probabilistic point processes, and neural networks. Furthermore, we add a data mining methodology inspired by influence estimation in social networks as a new supervised learning approach. For comparison, we use the small version of the Chalearn Connectomics competition, that is accompanied with ground truth connections between neurons. The experiments indicate that unsupervised learning methods perform better, however, supervised methods could surpass them given enough data and resources.
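The simplest unsupervised baseline in such reviews scores a putative edge by the correlation of the two neurons' activation time series and keeps the strongest pairs. A minimal sketch on synthetic activity (the coupling structure below is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 2000, 5
act = rng.normal(size=(T, n))        # background activity of n neurons
act[:, 1] += 0.8 * act[:, 0]         # neuron 1 is driven by neuron 0
act[:, 3] += 0.8 * act[:, 2]         # neuron 3 is driven by neuron 2

# absolute pairwise correlations, ignoring self-correlation
C = np.abs(np.corrcoef(act.T))
np.fill_diagonal(C, 0.0)

threshold = 0.5
edges = {tuple(sorted((i, j)))
         for i in range(n) for j in range(n) if C[i, j] > threshold}
assert edges == {(0, 1), (2, 3)}     # only the true couplings survive
```

Correlation is symmetric and cannot resolve edge direction or distinguish direct from indirect coupling, which is precisely why the review also covers point-process and neural-network approaches.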

[15]  arXiv:1806.08258 [pdf, other]
Title: Subgroup Identification using Covariate Adjusted Interaction Trees
Subjects: Methodology (stat.ME)

We consider the problem of identifying subgroups of participants in a clinical trial that have an enhanced treatment effect. Recursive partitioning methods, which recursively partition the covariate space based on some measure of between-group treatment-effect differences, are popular for such subgroup identification. The most commonly used recursive partitioning method, the classification and regression tree algorithm, first creates a large tree by recursively partitioning the covariate space using some splitting criterion and then selects the final tree from all subtrees of the large tree. In the context of subgroup identification, calculation of the splitting criterion and the evaluation measure used for final tree selection rely on comparing differences in means between the treatment and control arms. When covariates are prognostic for the outcome, covariate-adjusted estimators can improve efficiency compared to using differences in means between the treatment and control groups. This manuscript develops two covariate-adjusted estimators that can be used both to make splitting decisions and for final tree selection. The performance of the resulting covariate-adjusted recursive partitioning algorithm is evaluated using simulations and by analyzing a clinical trial that evaluates whether motivational interviews improve treatment engagement for substance abusers.

[16]  arXiv:1806.08301 [pdf, ps, other]
Title: Online Saddle Point Problem with Applications to Constrained Online Convex Optimization
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

We study an online saddle point problem where at each iteration a pair of actions need to be chosen without knowledge of the future (convex-concave) payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called "SP-regret". The problem generalizes the online convex optimization framework and can be interpreted as finding the Nash equilibrium for the aggregate of a sequence of two-player zero-sum games. We propose an algorithm that achieves $\tilde{O}(\sqrt{T})$ SP-regret in the general case, and $O(\log T)$ SP-regret for the strongly convex-concave case. We then consider a constrained online convex optimization problem motivated by a variety of applications in dynamic pricing, auctions, and crowdsourcing. We relate this problem to an online saddle point problem and establish $O(\sqrt{T})$ regret using a primal-dual algorithm.
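The simultaneous primal-dual dynamics at the heart of such algorithms can be illustrated with plain gradient descent-ascent on a strongly convex-concave toy payoff $f(x, y) = x^2 + xy - y^2$, whose saddle point is at the origin (the paper's online setting and regret analysis are much more general than this fixed-payoff sketch):

```python
# gradient descent in x (the minimizing player), gradient ascent in y
# (the maximizing player), with simultaneous updates
x, y, lr = 1.0, 1.0, 0.1
for _ in range(300):
    gx = 2 * x + y          # df/dx
    gy = x - 2 * y          # df/dy
    x, y = x - lr * gx, y + lr * gy

# the iterates converge to the saddle point (0, 0)
assert abs(x) < 1e-6 and abs(y) < 1e-6
```

For this strongly convex-concave payoff the update map is a contraction, so the iterates spiral into the saddle point; for merely convex-concave payoffs (e.g. the bilinear $f = xy$) simultaneous updates can cycle, which is one reason the online analysis is delicate.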

[17]  arXiv:1806.08307 [pdf, other]
Title: WIKS: A general Bayesian nonparametric index for quantifying differences between two populations
Subjects: Statistics Theory (math.ST)

The problem of deciding whether two samples arise from the same distribution is often the question of interest in many research investigations. Numerous statistical methods have been devoted to this issue, but few of them take a Bayesian nonparametric approach. We propose a nonparametric Bayesian index (WIKS) that quantifies the difference between two populations $P_1$ and $P_2$ based on samples from each. The WIKS index is defined as a weighted posterior expectation of the Kolmogorov-Smirnov distance between $P_1$ and $P_2$ and, unlike most existing approaches, can be easily computed using any prior distribution over $(P_1,P_2)$. Moreover, WIKS is fast to compute and can be justified within a Bayesian decision-theoretic framework. We present a simulation study indicating that the WIKS method is more powerful than competing approaches in several settings, including multivariate ones. We also prove that WIKS is a consistent procedure and controls the significance level uniformly over the null hypothesis. Finally, we apply WIKS to a dataset of scale measurements of three groups of patients who completed a questionnaire for Alzheimer's diagnosis.
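The building block of the index, the Kolmogorov-Smirnov distance between two samples, is readily available; WIKS averages such distances over the posterior of $(P_1, P_2)$, whereas this sketch only evaluates the plug-in statistic on two synthetic samples:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
sample1 = rng.normal(0.0, 1.0, 1000)
sample2 = rng.normal(0.5, 1.0, 1000)   # shifted population

# KS statistic: sup-distance between the two empirical CDFs
stat_same = ks_2samp(sample1, sample1).statistic
stat_diff = ks_2samp(sample1, sample2).statistic

assert stat_same < 1e-12   # identical samples have identical ECDFs
assert stat_diff > 0.1     # a 0.5-sigma mean shift is clearly detected
```

For a mean shift of 0.5 standard deviations, the population KS distance is $2\Phi(0.25) - 1 \approx 0.2$, so the empirical statistic comfortably exceeds 0.1 at this sample size.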

[18]  arXiv:1806.08317 [pdf, other]
Title: Fashion-Gen: The Generative Fashion Dataset and Challenge
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We introduce a new dataset of 293,008 high definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles. We provide baseline results on 1) high-resolution image generation, and 2) image generation conditioned on the given text descriptions. We invite the community to improve upon these baselines. In this paper, we also outline the details of a challenge that we are launching based upon this dataset.

[19]  arXiv:1806.08320 [pdf, other]
Title: A Guide to General-Purpose Approximate Bayesian Computation Software
Subjects: Computation (stat.CO)

This chapter, "A Guide to General-Purpose ABC Software", is to appear in the forthcoming Handbook of Approximate Bayesian Computation (2018). We present general-purpose software for performing Approximate Bayesian Computation (ABC) as implemented in the R packages abc and EasyABC and the C++ program ABCtoolbox. With simple toy models we demonstrate how to perform parameter inference, model selection, validation, and optimal choice of summary statistics. We also demonstrate how to combine ABC with Markov chain Monte Carlo and describe a realistic population genetics application.
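The chapter covers R and C++ tools; as a language-agnostic sketch of what they automate, here is bare-bones rejection ABC for the mean of a normal model: draw a parameter from the prior, simulate data, and keep the draw when a summary statistic falls within a tolerance of the observed one (all settings below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
observed = rng.normal(2.0, 1.0, size=100)   # "observed" data, true mean 2
s_obs = observed.mean()                     # summary statistic

accepted = []
for _ in range(20000):
    theta = rng.uniform(-5, 5)              # draw from a flat prior
    sim = rng.normal(theta, 1.0, size=100)  # simulate under the model
    if abs(sim.mean() - s_obs) < 0.1:       # tolerance epsilon
        accepted.append(theta)

posterior = np.array(accepted)
assert posterior.size > 0
assert abs(posterior.mean() - 2.0) < 0.4    # concentrates near the truth
```

The accepted draws approximate the posterior given the summary; shrinking the tolerance improves the approximation at the cost of a lower acceptance rate, which is the trade-off the packaged samplers (MCMC-ABC, sequential ABC) are designed to manage.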

Cross-lists for Fri, 22 Jun 18

[20]  arXiv:1806.07908 (cross-list from cs.LG) [pdf, other]
Title: Como funciona o Deep Learning (How Deep Learning Works)
Comments: Book chapter, in Portuguese, 31 pages
Journal-ref: In: T\'opicos em Gerenciamento de Dados e Informa\c{c}\~oes, SBC, Cap.3, ISBN 978-85-7669-400-7, pp.63-93, 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Deep learning methods are currently the state of the art in many problems that can be tackled via machine learning, in particular classification problems. However, there is still a lack of understanding of how these methods work, why they work, and what limitations are involved in using them. In this chapter we describe in detail the transition from shallow to deep networks, include code examples showing how to implement them, and cover the main issues one faces when training a deep network. Afterwards, we introduce some theoretical background behind the use of deep models and discuss their limitations.
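The chapter's own code examples are in the book; as an illustrative stand-in for its shallow starting point, here is a one-hidden-layer network in plain NumPy trained by backpropagation on XOR (architecture and hyperparameters are my own choices, not the chapter's):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])        # XOR targets

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                      # hidden tanh layer
    return h, 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid output

losses = []
lr = 0.5
for _ in range(2000):
    h, y = forward(X)
    losses.append(float(np.mean((y - t) ** 2)))
    # backpropagation of the squared-error loss
    dy = (y - t) * y * (1 - y) / len(X)
    dW2, db2 = h.T @ dy, dy.sum(0)
    dh = dy @ W2.T * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

assert losses[-1] < losses[0]                     # training reduces the loss
```

XOR is the classic example of a problem a network with no hidden layer cannot solve, which is why it is a natural waypoint on the shallow-to-deep transition the chapter describes.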

[21]  arXiv:1806.07937 (cross-list from cs.LG) [pdf, other]
Title: A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning
Comments: 18 pages, 14 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The risks and perils of overfitting in machine learning are well known. However, most of the treatment of this topic, including diagnostic tools and remedies, was developed for the supervised learning case. In this work, we aim to offer new perspectives on the characterization and prevention of overfitting in deep reinforcement learning (RL) methods, with a particular focus on continuous domains. We examine several aspects, such as how to define and diagnose overfitting in MDPs, and how to reduce risks by injecting sufficient training diversity. This work complements recent findings on the brittleness of deep RL methods and offers practical observations for RL researchers and practitioners.

[22]  arXiv:1806.07944 (cross-list from cs.SI) [pdf, ps, other]
Title: Searching for a Single Community in a Graph
Comments: ACM Journal on Modeling and Performance Evaluation of Computing Systems (TOMPECS) [to appear]
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

In standard graph clustering/community detection, one is interested in partitioning the graph into more densely connected subsets of nodes. In contrast, the "search" problem of this paper aims to find only the nodes in a "single" such community, the target, out of the many communities that may exist. To do so, we are given suitable side information about the target; for example, a very small number of nodes from the target are labeled as such.
We consider a general yet simple notion of side information: all nodes are assumed to have random weights, with nodes in the target having higher weights on average. Given these weights and the graph, we develop a variant of the method of moments that identifies nodes in the target more reliably, and with lower computation, than generic community detection methods that do not use side information and partition the entire graph. Our empirical results show significant gains in runtime, and also gains in accuracy over other graph clustering algorithms.

[23]  arXiv:1806.07956 (cross-list from cs.SI) [pdf, other]
Title: Reconstructing networks with unknown and heterogeneous errors
Authors: Tiago P. Peixoto
Comments: 27 pages, 17 figures
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

The vast majority of network datasets contain errors and omissions, although these are rarely incorporated into traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimation of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by allowing not only for heterogeneous errors, but also for individual edge measurements without direct error estimates. Our approach works by coupling the inference with structured generative network models, which enable the correlations between edges to be used as reliable error estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed, yielding a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.

[24]  arXiv:1806.07963 (cross-list from cs.SI) [pdf, other]
Title: Latent heterogeneous multilayer community detection
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

We propose a method for simultaneously detecting shared and unshared communities in heterogeneous multilayer weighted and undirected networks. The multilayer network is assumed to follow a generative probabilistic model that takes into account the similarities and dissimilarities between the communities. We use a variational Bayes approach to jointly infer the shared and unshared hidden communities from multilayer network observations. We show the robustness of our approach compared to state-of-the-art algorithms in detecting disparate (shared and private) communities on synthetic data as well as on a real genome-wide fibroblast proliferation dataset.

[25]  arXiv:1806.07978 (cross-list from cs.LG) [pdf, other]
Title: The Corpus Replication Task
Authors: Tobias Eichinger
Comments: the references might not render appropriately. contact the author for details
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

In the field of Natural Language Processing (NLP), we revisit the well-known word embedding algorithm word2vec. Word embeddings represent words as vectors that capture the words' distributional similarity. Unexpectedly, word embeddings generated by word2vec have been shown to capture not only semantic but also relational similarity, which raises two questions: first, which kinds of relations are representable in continuous space, and second, how are relations built? To tackle these questions we propose a bottom-up point of view: we call the task of generating input text for which word2vec outputs target relations the Corpus Replication Task. Since this approach can plausibly be generalized to any set of relations, we expect solving the Corpus Replication Task to provide partial answers to both questions.

[26]  arXiv:1806.08028 (cross-list from cs.LG) [pdf, other]
Title: Gradient Adversarial Training of Neural Networks
Comments: 13 pages, 4 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose gradient adversarial training, an auxiliary deep learning framework applicable to different machine learning problems. In gradient adversarial training, we leverage a prior belief that in many contexts, simultaneous gradient updates should be statistically indistinguishable from each other. We enforce this consistency using an auxiliary network that classifies the origin of the gradient tensor, and the main network serves as an adversary to the auxiliary network in addition to performing standard task-based training. We demonstrate gradient adversarial training for three different scenarios: (1) as a defense against adversarial examples, we classify gradient tensors and tune them to be agnostic to the class of their corresponding example; (2) for knowledge distillation, we do binary classification of gradient tensors derived from the student or teacher network and tune the student gradient tensor to mimic the teacher's gradient tensor; and (3) for multi-task learning, we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable. For each of the three scenarios we show the potential of the gradient adversarial training procedure. Specifically, gradient adversarial training increases the robustness of a network to adversarial attacks, distills knowledge from a teacher network to a student network better than soft targets, and boosts multi-task learning by aligning the gradient tensors derived from the task-specific loss functions. Overall, our experiments demonstrate that gradient tensors contain latent information about the tasks being trained and can support diverse machine learning problems when intelligently guided through adversarialization using an auxiliary network.

[27]  arXiv:1806.08049 (cross-list from cs.LG) [pdf, other]
Title: On the Robustness of Interpretability Methods
Comments: presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We argue that robustness of explanations---i.e., that similar inputs should give rise to similar explanations---is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
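One way to make this desideratum concrete is a local Lipschitz-style estimate of how much an explanation can change under small input perturbations. The sketch below is illustrative, not the paper's code: the explanation functions, perturbation radius, and sampling scheme are all assumptions.

```python
# Local Lipschitz-style robustness estimate for an explanation function:
# sample small perturbations of x and record the worst-case ratio of
# explanation change to input change.
import random

def local_lipschitz(explain, x, radius=0.1, n_samples=100, seed=0):
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        x2 = [xi + rng.uniform(-radius, radius) for xi in x]
        dx = sum((a - b) ** 2 for a, b in zip(x, x2)) ** 0.5
        de = sum((a - b) ** 2 for a, b in zip(explain(x), explain(x2))) ** 0.5
        if dx > 0:
            worst = max(worst, de / dx)
    return worst

# A linear model's gradient explanation is constant, hence perfectly robust.
stable = local_lipschitz(lambda x: [2.0, -1.0], [0.5, 0.5])
# A thresholded explanation can jump, giving a large local Lipschitz value.
jumpy = local_lipschitz(lambda x: [1.0 if x[0] > 0.5 else 0.0], [0.5, 0.5])
```

A small value indicates that similar inputs receive similar explanations; a large value flags a brittle explanation method.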

[28]  arXiv:1806.08065 (cross-list from cs.LG) [pdf, other]
Title: Learning Cognitive Models using Neural Networks
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

A cognitive model of human learning provides information about the skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest, but are also valuable in adaptive online tutoring systems: a more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on well-structured domains, relied on student performance data, or involved substantial human knowledge engineering. In this paper, we propose the Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in ill-structured domains with no data and little to no human knowledge engineering. Our contribution is twofold. First, we show that representations learned using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several ill-structured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate human-authored cognitive model is unavailable or authoring a cognitive model is difficult. Second, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to obtain accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to correlate highly with estimates derived from student performance data on an Article Selection dataset.

[29]  arXiv:1806.08079 (cross-list from cs.LG) [pdf, other]
Title: GrCAN: Gradient Boost Convolutional Autoencoder with Neural Decision Forest
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Random forests and deep neural networks are two families of effective classification methods in machine learning. While random forests are robust irrespective of the data domain, deep neural networks have advantages in handling high-dimensional data. Because a differentiable neural decision forest can be added to a neural network to fully exploit the benefits of both models, in our work we further combine a convolutional autoencoder with a neural decision forest, where the autoencoder has advantages in finding hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve performance. The idea of gradient boosting is to learn and use the residual of the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and the gradient boost module in consecutive steps. Extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods.
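The residual-fitting idea behind the gradient boost module can be illustrated on a toy 1-D regression problem. This sketch is not the paper's model: the data, decision-stump learner, learning rate, and round count are all illustrative.

```python
# Gradient boosting in miniature: each round fits a decision stump to the
# residual of the current prediction, then adds a shrunken copy of it.
def fit_stump(xs, residuals):
    """Find the threshold split minimizing squared error on the residuals."""
    best = None
    for thr in xs:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= thr else rv)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x, thr=thr, lv=lv, rv=rv: lv if x <= thr else rv

def boost(xs, ys, rounds=10, lr=0.5):
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # fit the residual
        stump = fit_stump(xs, residuals)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
        stumps.append(stump)
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.0, 0.2, 1.9, 2.1, 2.0]        # a step-like target
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each round shrinks the residual, so the training error decreases toward the within-group noise of the data.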

[30]  arXiv:1806.08160 (cross-list from math.PR) [pdf, ps, other]
Title: Sharp large deviations for the drift parameter of the explosive Cox-Ingersoll-Ross process
Subjects: Probability (math.PR); Statistics Theory (math.ST)

We consider a non-stationary Cox-Ingersoll-Ross process. We establish a sharp large deviation principle for the maximum likelihood estimator of its drift parameter.

[31]  arXiv:1806.08235 (cross-list from cs.CV) [pdf, other]
Title: Semi-supervised Seizure Prediction with Generative Adversarial Networks
Comments: 6 pages, 5 figures, 3 tables. arXiv admin note: text overlap with arXiv:1707.01976
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

In this article, we propose an approach that can make use not only of labeled EEG signals but also of unlabeled ones, which are more accessible. We also suggest the use of data fusion to further improve seizure prediction accuracy; in our vision, data fusion includes EEG signals, cardiogram signals, body temperature, and time. We use the short-time Fourier transform on 28-s EEG windows as a pre-processing step. A generative adversarial network (GAN) is trained in an unsupervised manner, disregarding information about seizure onset. The trained discriminator of the GAN is then used as a feature extractor. Features generated by the feature extractor are classified by two fully connected layers (which can be replaced by any classifier) for the labeled EEG signals. This semi-supervised seizure prediction method achieves an area under the receiver operating characteristic curve (AUC) of 77.68% and 75.47% on the CHB-MIT scalp EEG dataset and the Freiburg Hospital intracranial EEG dataset, respectively. Unsupervised training without the need for labeling is important not only because it can be performed in real time during EEG signal recording, but also because it does not require feature engineering effort for each patient.
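The pre-processing step can be sketched with a plain windowed-FFT magnitude STFT. This is only an illustration: the sampling rate, window length, hop size, and the synthetic 10 Hz signal below are assumptions, not the paper's exact settings.

```python
# Magnitude short-time Fourier transform over a 28-second window,
# built from Hann-windowed FFT frames.
import numpy as np

def stft_magnitude(signal, win_len=256, hop=128):
    """Return the magnitude STFT with shape (n_frames, n_bins)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + win_len] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 256                                  # assumed sampling rate (Hz)
t = np.arange(28 * fs) / fs               # one 28-s window
eeg = np.sin(2 * np.pi * 10 * t)          # toy 10 Hz oscillation
spec = stft_magnitude(eeg)                # time-frequency image for the GAN
```

With a 1 Hz bin spacing (fs / win_len), the toy 10 Hz oscillation concentrates its energy in frequency bin 10 of every frame.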

[32]  arXiv:1806.08240 (cross-list from cs.LG) [pdf, other]
Title: InfoCatVAE: Representation Learning with Categorical Variational Autoencoders
Comments: 9 pages, 3 appendix, 5 figures. arXiv admin note: text overlap with arXiv:1606.03657 by other authors
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

This paper describes InfoCatVAE, an extension of the variational autoencoder that enables unsupervised disentangled representation learning. InfoCatVAE uses multimodal distributions for the prior and the inference network and then maximizes the evidence lower bound objective (ELBO). We connect the ELBO derived for our model with a natural soft-clustering objective, which explains the robustness of our approach. We then adapt the InfoGAN method to our setting in order to maximize the mutual information between the categorical code and the generated inputs, obtaining an improved model.

[33]  arXiv:1806.08267 (cross-list from cs.LG) [pdf, other]
Title: Gated Complex Recurrent Neural Networks
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Complex numbers have long been favoured for digital signal processing, yet complex representations rarely appear in deep learning architectures. RNNs, widely used to process time series and sequence information, could greatly benefit from complex representations. We present a novel complex gated recurrent cell. When used together with norm-preserving state transition matrices, our complex gated RNN exhibits excellent stability and convergence properties. We demonstrate competitive performance of our complex gated RNN on the synthetic memory and adding tasks, as well as on the real-world task of human motion prediction.

[34]  arXiv:1806.08295 (cross-list from cs.LG) [pdf, other]
Title: How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. We also discuss the influence of deviations from the assumptions usually made by statistical tests, provide guidelines to counter their negative effects, and supply some code to perform the tests.
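The kind of comparison the tutorial addresses can be sketched as follows. This minimal example computes Welch's t statistic, one test commonly used for such comparisons; the per-seed scores below are made up for illustration and are not from the paper.

```python
# Welch's t-test statistic for comparing two algorithms' final performance
# across independent random seeds (unequal variances allowed).
import math

def welch_t(xs, ys):
    """Return Welch's t statistic and effective degrees of freedom."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)   # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se2 = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se2)
    dof = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, dof

# Hypothetical final returns for algorithms A and B over 5 seeds each.
scores_a = [210.0, 195.0, 230.0, 205.0, 220.0]
scores_b = [180.0, 175.0, 200.0, 170.0, 190.0]
t, dof = welch_t(scores_a, scores_b)
```

The t statistic is then compared against the Student-t quantile at the effective degrees of freedom; more seeds shrink the standard error and raise the power of the test.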

[35]  arXiv:1806.08297 (cross-list from cs.FL) [pdf, other]
Title: Learning Graph Weighted Models on Pictures
Subjects: Formal Languages and Automata Theory (cs.FL); Learning (cs.LG); Machine Learning (stat.ML)

Graph Weighted Models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings and trees to arbitrary families of labeled graphs (and hypergraphs). A GWM generically associates a labeled graph with a tensor network and computes a value by successive contractions directed by its edges. In this paper, we consider the problem of learning GWMs defined over the graph family of pictures (or 2-dimensional words). As a proof of concept, we consider regression and classification tasks over the simple Bars & Stripes and Shifting Bits picture languages and provide an experimental study investigating whether these languages can be learned in the form of a GWM from positive and negative examples using gradient-based methods. Our results suggest that this is indeed possible and that investigating the use of gradient-based methods to learn picture series and functions computed by GWMs over other families of graphs could be a fruitful direction.

[36]  arXiv:1806.08324 (cross-list from cs.LG) [pdf, other]
Title: Countdown Regression: Sharp and Calibrated Survival Predictions
Subjects: Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

Personalized probabilistic forecasts of time to event (such as mortality) can be crucial in decision making, especially in the clinical setting. Inspired by ideas from the meteorology literature, we approach this problem through the paradigm of maximizing the sharpness of prediction distributions, subject to calibration. In regression problems, it has been shown that optimizing the continuous ranked probability score (CRPS) instead of maximum likelihood leads to sharper prediction distributions while maintaining calibration. We introduce the Survival-CRPS, a generalization of the CRPS to the time-to-event setting, and present right-censored and interval-censored variants. To holistically evaluate the quality of predicted distributions over time to event, we present the Survival-AUPRC evaluation metric, an analog of the area under the precision-recall curve. We apply these ideas by building a recurrent neural network for mortality prediction, using an electronic health record dataset covering millions of patients. We demonstrate significant benefits in models trained with the Survival-CRPS objective instead of maximum likelihood.
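For intuition, the uncensored CRPS compares a predictive CDF F with the step function at the observed time y: CRPS(F, y) = integral of (F(x) - 1[x >= y])^2 dx, so a sharp, well-placed distribution scores near zero. The numeric sketch below is a toy discretization of this integral (the paper's Survival-CRPS extends it to right- and interval-censored event times); the grid, CDFs, and event time are illustrative only.

```python
# Left-Riemann approximation of CRPS(F, y) on a time grid.
def crps_on_grid(grid, cdf, y):
    """Approximate the integral of (F(x) - 1[x >= y])^2 over the grid."""
    total = 0.0
    for i in range(len(grid) - 1):
        dx = grid[i + 1] - grid[i]
        step = 1.0 if grid[i] >= y else 0.0   # empirical CDF of the event
        total += (cdf[i] - step) ** 2 * dx
    return total

grid = [i * 0.1 for i in range(101)]                  # times 0..10
loose_cdf = [min(1.0, t / 5.0) for t in grid]         # diffuse prediction
sharp_cdf = [0.0 if t < 4.0 else 1.0 for t in grid]   # point mass at t = 4
y = 4.0                                               # observed event time
loose = crps_on_grid(grid, loose_cdf, y)
sharp = crps_on_grid(grid, sharp_cdf, y)
```

The sharp prediction concentrated exactly at the event time scores (numerically) zero, while the diffuse prediction pays a positive penalty, which is the sharpness-subject-to-calibration trade-off the objective encodes.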

[37]  arXiv:1806.08340 (cross-list from cs.LG) [pdf, other]
Title: Interpretable Discovery in Large Image Data Sets
Comments: Presented at the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Automated detection of new, interesting, unusual, or anomalous images within large data sets has great value for applications from surveillance (e.g., airport security) to science (observations that don't fit a given theory can lead to new discoveries). Many image data analysis systems are turning to convolutional neural networks (CNNs) to represent image content due to their success in achieving high classification accuracy rates. However, CNN representations are notoriously difficult for humans to interpret. We describe a new strategy that combines novelty detection with CNN image features to achieve rapid discovery with interpretable explanations of novel image content. We applied this technique to familiar images from ImageNet as well as to a scientific image collection from planetary science.

[38]  arXiv:1806.08342 (cross-list from cs.LG) [pdf, other]
Title: Quantizing deep convolutional networks for efficient inference: A whitepaper
Comments: 37 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracies within 2% of floating-point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8 bits, even when 8-bit arithmetic is not supported. This can be achieved with simple post-training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed-point SIMD capabilities, like the Qualcomm QDSPs with HVX.
Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows reducing the precision of weights to four bits, with accuracy losses ranging from 2% to 10% and a higher accuracy drop for smaller networks. We introduce tools in TensorFlow and TensorFlow Lite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend per-channel quantization of weights and per-layer quantization of activations as the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8, and 16 bits.
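A minimal sketch of symmetric per-channel post-training weight quantization, using a toy NumPy tensor rather than the whitepaper's TensorFlow tooling (the tensor shape and scale convention are illustrative assumptions):

```python
# Symmetric per-channel quantization: one scale per output channel,
# weights rounded to signed 8-bit integers.
import numpy as np

def quantize_per_channel(w, num_bits=8):
    """Quantize w (output channels on axis 0); return int8 tensor and scales."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    flat = w.reshape(w.shape[0], -1)
    scale = np.abs(flat).max(axis=1) / qmax   # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero channels
    q = np.round(flat / scale[:, None]).clip(-qmax, qmax).astype(np.int8)
    return q.reshape(w.shape), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale.reshape(-1, *([1] * (q.ndim - 1)))

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3)).astype(np.float32)  # toy conv kernel
q, scale = quantize_per_channel(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()   # rounding error, at most scale / 2 per channel
```

Per-channel scales adapt to each filter's dynamic range, which is why the paper recommends them over a single per-layer scale for weights.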

[39]  arXiv:1806.08354 (cross-list from cs.CV) [pdf, other]
Title: Learning Instance Segmentation by Interaction
Comments: Website at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)

We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions generalizes to novel objects and backgrounds. To deal with the noisy training signal for segmenting objects obtained by self-supervised interactions, we propose a robust set loss. A dataset of the robot's interactions, along with a few human-labeled examples, is provided as a benchmark for future research. We test the utility of the learned segmentation model by providing results on a downstream vision-based control task of rearranging multiple objects into target configurations from visual inputs alone. Videos, code, and the robotic interaction dataset are available at https://pathak22.github.io/seg-by-interaction/

Replacements for Fri, 22 Jun 18

[40]  arXiv:1105.2454 (replaced) [pdf, other]
Title: High-dimensional instrumental variables regression and confidence sets
Authors: Eric Gautier (TSE), Alexandre Tsybakov (CREST, ENSAE ParisTech), Christiern Rose
Subjects: Statistics Theory (math.ST)
[41]  arXiv:1606.03275 (replaced) [pdf, other]
Title: Analysis of the maximal posterior partition in the Dirichlet Process Gaussian Mixture Model
Comments: 50 pages, 7 figures
Subjects: Statistics Theory (math.ST)
[42]  arXiv:1611.08618 (replaced) [pdf, other]
Title: A Benchmark and Comparison of Active Learning for Logistic Regression
Comments: accepted by Pattern Recognition
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[43]  arXiv:1703.02111 (replaced) [pdf, other]
Title: Classification and clustering for observations of event time data using non-homogeneous Poisson process models
Comments: cleaned up figures and text
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[44]  arXiv:1706.04546 (replaced) [pdf, other]
Title: Reinforcement Learning with Budget-Constrained Nonparametric Function Approximation for Opportunistic Spectrum Access
Comments: 6 pages, submitted
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
[45]  arXiv:1707.05745 (replaced) [pdf, other]
Title: Modeling temporal treatment effects with zero inflated semi-parametric regression models: the case of local development policies in France
Subjects: Applications (stat.AP)
[46]  arXiv:1708.02883 (replaced) [pdf, other]
Title: Maximum Volume Inscribed Ellipsoid: A New Simplex-Structured Matrix Factorization Framework via Facet Enumeration and Convex Optimization
Subjects: Machine Learning (stat.ML)
[47]  arXiv:1710.08269 (replaced) [pdf, other]
Title: A Potts-Mixture Spatiotemporal Joint Model for Combined MEG and EEG Data
Subjects: Applications (stat.AP)
[48]  arXiv:1712.06695 (replaced) [pdf, other]
Title: Accurate Inference for Adaptive Linear Models
Comments: 20 pages; Updated after acceptance to ICML 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[49]  arXiv:1801.01973 (replaced) [pdf, other]
Title: A Note on the Inception Score
Comments: Proc. ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[50]  arXiv:1802.06054 (replaced) [pdf, other]
Title: Learning Patterns for Detection with Multiscale Scan Statistics
Authors: James Sharpnack
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME)
[51]  arXiv:1803.05112 (replaced) [pdf, other]
Title: Uplift Modeling from Separate Labels
Comments: 21 pages, 7 figures
Subjects: Machine Learning (stat.ML)
[52]  arXiv:1805.01532 (replaced) [pdf, other]
Title: Lifted Neural Networks
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[53]  arXiv:1805.01907 (replaced) [pdf, other]
Title: Exploration by Distributional Reinforcement Learning
Comments: IJCAI 2018
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[54]  arXiv:1805.03963 (replaced) [pdf, ps, other]
Title: Monotone Learning with Rectified Wire Networks
Comments: 37 pages, 19 figures, improved section 3
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)
[55]  arXiv:1806.00730 (replaced) [pdf, other]
Title: Minnorm training: an algorithm for training over-parameterized deep neural networks
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[56]  arXiv:1806.01811 (replaced) [pdf, other]
Title: AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
Comments: 17 pages, 3 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[57]  arXiv:1806.01845 (replaced) [pdf, other]
Title: Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex
Comments: 26 pages, 6 figures, 3 tables; v2 fixes some typos
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[58]  arXiv:1806.02199 (replaced) [pdf, other]
Title: Deep Self-Organization: Interpretable Discrete Representation Learning on Time Series
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[59]  arXiv:1806.05769 (replaced) [pdf, other]
Title: Bayesian Uncertainty Quantification and Information Fusion in CALPHAD-based Thermodynamic Modeling
Comments: 22 pages, 8 Figures
Subjects: Materials Science (cond-mat.mtrl-sci); Applications (stat.AP)
[60]  arXiv:1806.06784 (replaced) [pdf, other]
Title: Flexible Collaborative Estimation of the Average Causal Effect of a Treatment using the Outcome-Highly-Adaptive Lasso
Comments: The first two authors contributed equally to this work
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
[61]  arXiv:1806.07172 (replaced) [pdf, ps, other]
Title: Surrogate Outcomes and Transportability
Comments: Submitted to International Journal of Approximate Reasoning
Subjects: Artificial Intelligence (cs.AI); Methodology (stat.ME)