Statistics
New submissions
New submissions for Thu, 23 Nov 17
 [1] arXiv:1711.08018 [pdf, other]

Title: Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We design new algorithms for the combinatorial pure exploration problem in the multi-armed bandit framework. In this problem, we are given K distributions and a collection of subsets $\mathcal{V} \subset 2^K$ of these distributions, and we would like to find the subset $v \in \mathcal{V}$ that has the largest cumulative mean, while collecting, in a sequential fashion, as few samples from the distributions as possible. We study both the fixed budget and fixed confidence settings, and our algorithms essentially achieve state-of-the-art performance in all settings, improving on previous guarantees for structures like matchings and submatrices that have large augmenting sets. Moreover, our algorithms can be implemented efficiently whenever the decision set $\mathcal{V}$ admits linear optimization. Our analysis involves precise concentration-of-measure arguments and a new algorithm for linear programming with exponentially many constraints.
 [2] arXiv:1711.08030 [pdf, other]

Title: Variance-based sensitivity analysis for time-dependent processes
Comments: 23 pages
Subjects: Computation (stat.CO)
The global sensitivity analysis of time-dependent processes requires history-aware approaches. We develop for that purpose a variance-based method that leverages the correlation structure of the problems under study and employs surrogate models to accelerate the computations. The errors resulting from fixing unimportant uncertain parameters to their nominal values are analyzed through a priori estimates. We illustrate our approach on a harmonic oscillator example and on a nonlinear dynamic cholera model.
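As background for the variance-based approach described above, here is a minimal sketch (not the authors' history-aware method) of first-order Sobol' index estimation via the classic pick-and-freeze Monte Carlo scheme; the test function, dimension, and sample size are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative sketch only: first-order Sobol' sensitivity indices for
# a toy function, estimated with the pick-and-freeze Monte Carlo scheme.
def sobol_first_order(f, d, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n, d))
    B = rng.uniform(size=(n, d))
    yA, yB = f(A), f(B)
    var = np.concatenate([yA, yB]).var()
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]              # replace only input i
        yABi = f(ABi)
        # f(B) and f(AB_i) share only input i, so (Jansen's estimator)
        # S_i = 1 - E[(f(B) - f(AB_i))^2] / (2 Var[Y])
        S[i] = 1.0 - 0.5 * np.mean((yB - yABi) ** 2) / var
    return S

# Additive toy model y = x1 + 0.5*x2: exact indices are 0.8 and 0.2
S = sobol_first_order(lambda X: X[:, 0] + 0.5 * X[:, 1], d=2)
```

For time-dependent outputs, such indices are typically computed at each time point or on a functional summary of the trajectory.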
 [3] arXiv:1711.08037 [pdf, ps, other]

Title: The Doctor Just Won't Accept That!
Authors: Zachary C. Lipton
Comments: Presented at NIPS 2017 Interpretable ML Symposium
Subjects: Machine Learning (stat.ML)
Calls to arms to build interpretable models express a well-founded discomfort with machine learning. Should a software agent that does not even know what a loan is decide who qualifies for one? Indeed, we ought to be cautious about injecting machine learning (or anything else, for that matter) into applications where there may be a significant risk of causing social harm. However, claims that stakeholders "just won't accept that!" do not provide a sufficient foundation for a proposed field of study. For the field of interpretable machine learning to advance, we must ask the following questions: What precisely won't various stakeholders accept? What do they want? Are these desiderata reasonable? Are they feasible? In order to answer these questions, we'll have to give real-world problems and their respective stakeholders greater consideration.
 [4] arXiv:1711.08042 [pdf, ps, other]

Title: "I know it when I see it". Visualization and Intuitive Interpretability
Authors: Fabian Offert
Comments: Interpretable ML Symposium, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Machine Learning (stat.ML)
Most research on the interpretability of machine learning systems focuses on the development of a more rigorous notion of interpretability. I suggest that a better understanding of the deficiencies of the intuitive notion of interpretability is needed as well. I show that visualization enables but also impedes intuitive interpretability, as it presupposes two levels of technical pre-interpretation: dimensionality reduction and regularization. Furthermore, I argue that the use of positive concepts to emulate the distributed semantic structure of machine learning models introduces a significant human bias into the model. As a consequence, I suggest that, if intuitive interpretability is needed, singular representations of internal model states should be avoided.
 [5] arXiv:1711.08063 [pdf]

Title: Clonal analysis of newborn hippocampal dentate granule cell proliferation and development in temporal lobe epilepsy
Comments: 44 pages, 6 figures
Journal-ref: eNeuro. 2015;2(6):ENEURO.0087-15.2015. doi:10.1523/ENEURO.0087-15.2015
Subjects: Machine Learning (stat.ML); Neurons and Cognition (q-bio.NC)
Hippocampal dentate granule cells are among the few neuronal cell types generated throughout adult life in mammals. In the normal brain, new granule cells are generated from progenitors in the subgranular zone and integrate in a typical fashion. During the development of epilepsy, granule cell integration is profoundly altered. The new cells migrate to ectopic locations and develop misoriented basal dendrites. Although it has been established that these abnormal cells are newly generated, it is not known whether they arise ubiquitously throughout the progenitor cell pool or are derived from a smaller number of bad actor progenitors. To explore this question, we conducted a clonal analysis study in mice expressing the Brainbow fluorescent protein reporter construct in dentate granule cell progenitors. Mice were examined 2 months after pilocarpine-induced status epilepticus, a treatment that leads to the development of epilepsy. Brain sections were rendered translucent so that entire hippocampi could be reconstructed and all fluorescently labeled cells identified. Our findings reveal that a small number of progenitors produce the majority of ectopic cells following status epilepticus, indicating that either the affected progenitors or their local microenvironments have become pathological. By contrast, granule cells with basal dendrites were equally distributed among clonal groups. This indicates that these progenitors can produce normal cells and suggests that global factors sporadically disrupt the dendritic development of some new cells. Together, these findings strongly predict that distinct mechanisms regulate different aspects
 [6] arXiv:1711.08072 [pdf, other]

Title: Constrained empirical Bayes priors on regression coefficients
Subjects: Statistics Theory (math.ST)
In the context of model uncertainty and selection, empirical Bayes procedures can have undesirable properties such as extreme estimates of inclusion probabilities (Scott and Berger, 2010) or inconsistency under the null model (Liang et al., 2008). To avoid these issues, we define empirical Bayes priors with constraints that ensure that the estimates of the hyperparameters are at least as "vague" as those of proper default priors. In our examples, we observe that constrained EB procedures are better behaved than their unconstrained counterparts and that the Bayesian Information Criterion (BIC) is similar to an intuitively appealing constrained EB procedure.
 [7] arXiv:1711.08077 [pdf, other]

Title: Modeling and emulation of nonstationary Gaussian fields
Comments: 32 pages total, 10 figures
Subjects: Methodology (stat.ME)
Geophysical and other natural processes often exhibit nonstationary covariances, and this feature is important to take into account for statistical models that attempt to emulate the physical process. A convolution-based model is used to represent nonstationary Gaussian processes that allows for variation in the correlation range and variance of the process across space. Application of this model has two steps: windowed estimates of the covariance function under the assumption of local stationarity, and encoding the local estimates into a single spatial process model that allows for efficient simulation. Specifically, we give evidence to show that nonstationary covariance functions based on the Matérn family can be reproduced by the LatticeKrig model, a flexible, multiresolution representation of Gaussian processes. We propose to fit locally stationary models based on the Matérn covariance and then assemble these estimates into a single, global LatticeKrig model. One advantage of the LatticeKrig model is that it is efficient for simulating nonstationary fields even at $10^5$ locations. This work is motivated by the interest in emulating spatial fields derived from numerical model simulations such as Earth system models. We successfully apply these ideas to emulate fields that describe the uncertainty in the pattern scaling of mean summer (JJA) surface temperature from a series of climate model experiments. This example is significant because it emulates tens of thousands of locations, typical in geophysical model fields, and leverages embarrassingly parallel computation to speed up the local covariance fitting.
 [8] arXiv:1711.08082 [pdf, other]

Title: Parameter Estimation in Gaussian Mixture Models with Malicious Noise, without Balanced Mixing Coefficients
Subjects: Statistics Theory (math.ST); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
We consider the problem of estimating the means of two Gaussians in a two-Gaussian mixture, which is not balanced and is corrupted by noise from an arbitrary distribution. We present a robust algorithm to estimate the parameters, together with upper bounds on the number of samples required for the estimate to be correct, where the bounds are parametrised by the dimension, the ratio of the mixing coefficients, a measure of the separation of the two Gaussians related to the Mahalanobis distance, and a condition number of the covariance matrix. In theory, this is the first sample-complexity result for imbalanced mixtures corrupted by adversarial noise. In practice, our algorithm outperforms the vanilla Expectation-Maximisation (EM) algorithm in terms of estimation error.
 [9] arXiv:1711.08093 [pdf, ps, other]

Title: A note on recent criticisms to Birnbaum's theorem
Subjects: Statistics Theory (math.ST)
In this note, we provide critical commentary on two articles that cast doubt on the validity and implications of Birnbaum's theorem: Evans (2013) and Mayo (2014). In our view, the proof is correct and the consequences of the theorem are alive and well.
 [10] arXiv:1711.08129 [pdf, other]

Title: PULasso: High-dimensional variable selection with presence-only data
Subjects: Methodology (stat.ME)
In various real-world problems, we are presented with positive and unlabelled data, referred to as presence-only responses, where the number of covariates $p$ is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this paper, we develop the PUlasso algorithm for variable selection and classification with positive and unlabelled responses. Our algorithm uses the majorization-minimization (MM) framework, a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee: we first show that our algorithm is guaranteed to converge to a stationary point, and then prove that any stationary point achieves the minimax optimal mean-squared error of $s\log p/n$, where $s$ is the sparsity of the true parameter. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in moderate-$p$ settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example.
 [11] arXiv:1711.08147 [pdf, other]

Title: Familywise Error Rate Controlling Procedures for Discrete Data
Comments: 29 pages, 4 figures
Subjects: Methodology (stat.ME)
In applications such as clinical safety analysis, the data usually consist of frequency counts. In the analysis of such data, researchers often face the problem of multiple testing based on discrete test statistics, aimed at controlling the familywise error rate (FWER). Most existing FWER controlling procedures are developed for continuous data and are often conservative when analyzing discrete data. By using minimal attainable p-values, several FWER controlling procedures have been developed for discrete data in the literature. In this paper, by utilizing known marginal distributions of true null p-values, three more powerful stepwise procedures are developed, which are modified versions of the conventional Bonferroni, Holm and Hochberg procedures, respectively. It is proved that the first two procedures strongly control the FWER under arbitrary dependence and are more powerful than the existing Tarone-type procedures, while the last one only ensures control of the FWER in special scenarios. Through extensive simulation studies, we provide numerical evidence of the superior performance of the proposed procedures in terms of FWER control and minimal power. A real clinical safety dataset is used to demonstrate applications of our proposed procedures. An R package "MHTdiscrete" and a web application are developed for implementing the proposed procedures.
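For context, here is a small sketch of the existing Tarone-type Bonferroni baseline that the proposed procedures improve on (the paper's modified procedures are not reproduced here; the data in the example are hypothetical).

```python
# Sketch of the Tarone-type Bonferroni baseline: for discrete tests, a
# hypothesis whose minimal attainable p-value exceeds the corrected
# level can never be rejected, so it should not inflate the Bonferroni
# denominator (Tarone, 1990).
def tarone_bonferroni(pvals, min_attainable, alpha=0.05):
    m = len(pvals)
    # Smallest K such that at most K tests can attain level alpha / K
    for K in range(1, m + 1):
        if sum(a <= alpha / K for a in min_attainable) <= K:
            break
    testable = [a <= alpha / K for a in min_attainable]
    return [t and p <= alpha / K for t, p in zip(testable, pvals)]

# Hypothetical example: three discrete tests
rejected = tarone_bonferroni(
    pvals=[0.01, 0.2, 0.03],
    min_attainable=[0.001, 0.5, 0.001],
)
```

Excluding untestable hypotheses from the denominator strictly increases power for discrete data relative to plain Bonferroni.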
 [12] arXiv:1711.08160 [pdf, other]

Title: An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery
Comments: Accepted to the NIPS Time Series Workshop 2017
Subjects: Machine Learning (stat.ML)
While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the output is the future value of a single series. A sufficient condition for Granger non-causality in this setting is that all of the outgoing weights of the input data, the past lags of a series, to the first hidden layer are zero. For estimation, we utilize a group lasso penalty to shrink groups of input weights to zero. We also propose a hierarchical penalty for simultaneous Granger causality and lag estimation. We validate our approach on simulated data from both a sparse linear autoregressive model and the sparse and nonlinear Lorenz-96 model.
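A minimal sketch of the selection mechanism this abstract describes, assuming a standard group-lasso proximal step (this is not the authors' training code): the first-layer weights attached to all lags of one input series form a group, and block soft-thresholding can drive a whole group exactly to zero, encoding Granger non-causality of that series.

```python
import numpy as np

# Block soft-threshold of the first-layer weight groups. W has shape
# (hidden, n_series * n_lags), columns ordered series-by-series; if a
# group is driven to zero, that series is estimated Granger non-causal.
def prox_group_lasso(W, lam, n_series, n_lags):
    W = W.copy()
    for j in range(n_series):
        cols = slice(j * n_lags, (j + 1) * n_lags)
        norm = np.linalg.norm(W[:, cols])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        W[:, cols] *= scale                  # block soft-threshold
    return W
```

In proximal gradient training, this step would be applied after each gradient update of the input layer.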
 [13] arXiv:1711.08171 [pdf, other]

Title: Hypergraph $p$-Laplacian: A Differential Geometry View
Comments: Extended version of our AAAI-18 paper
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The graph Laplacian plays key roles in information processing of relational data, and has analogies with the Laplacian in differential geometry. In this paper, we generalize the analogy between the graph Laplacian and differential geometry to the hypergraph setting, and propose a novel hypergraph $p$-Laplacian. Unlike existing two-node graph Laplacians, this generalization makes it possible to analyze hypergraphs, where the edges are allowed to connect any number of nodes. Moreover, we propose a semi-supervised learning method based on the proposed hypergraph $p$-Laplacian, and formalize it as an analogue of the Dirichlet problem, which often appears in physics. We further explore theoretical connections to the normalized hypergraph cut, and propose a normalized cut corresponding to the hypergraph $p$-Laplacian. The proposed $p$-Laplacian is shown to outperform standard hypergraph Laplacians in experiments on hypergraph semi-supervised learning and normalized cut settings.
 [14] arXiv:1711.08181 [pdf, ps, other]

Title: Estimation of the multifractional function and the stability index of linear multifractional stable processes
Authors: Thi To Nhu Dang
Comments: 22 pages
Subjects: Statistics Theory (math.ST)
In this paper we are interested in multifractional stable processes where the self-similarity index $H$ is a function of time, in other words, $H$ varies over time, and the stability index $\alpha$ is a constant. Using $\beta$-negative power variations ($-1/2<\beta<0$), we propose estimators for the value of the multifractional function $H$ at a fixed time $t_0$ and for $\alpha$, for two cases: multifractional Brownian motion ($\alpha=2$) and linear multifractional stable motion ($0<\alpha<2$). We establish the consistency of our estimators for the underlying processes, together with rates of convergence.
 [15] arXiv:1711.08240 [pdf, other]

Title: Sparsity-based Cholesky Factorization and its Application to Hyperspectral Anomaly Detection
Comments: Accepted to the 7th IEEE international workshop on computational advances in multi-sensor adaptive processing (CAMSAP 2017), Curaçao, Dutch Antilles, December 10-13, 2017
Subjects: Applications (stat.AP)
Estimating large covariance matrices has been a long-standing important problem in many applications and has attracted increased attention over several decades. This paper deals with two methods, based on pre-existing works, to impose sparsity on the covariance matrix via its unit lower triangular matrix (also known as the Cholesky factor) $\mathbf{T}$. The first method estimates the entries of $\mathbf{T}$ using Ordinary Least Squares (OLS), then imposes sparsity by exploiting generalized thresholding techniques such as Soft and Smoothly Clipped Absolute Deviation (SCAD) thresholding. The second method directly estimates a sparse version of $\mathbf{T}$ by penalizing the negative normal log-likelihood with $L_1$ and SCAD penalty functions. The resulting covariance estimators are always guaranteed to be positive definite. Monte Carlo simulations as well as experimental data demonstrate the effectiveness of our estimators for hyperspectral anomaly detection using the Kelly anomaly detector.
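A hedged sketch of the first method described above, simplified and with soft thresholding only (SCAD would replace the soft rule): estimate the unit lower-triangular factor of the modified Cholesky decomposition $T\Sigma T' = D$ by sequential OLS regressions, threshold its off-diagonal entries, and rebuild a covariance estimate that is positive definite by construction.

```python
import numpy as np

# Regression-based modified Cholesky factorization with soft
# thresholding of the off-diagonal entries of T. The threshold lam is
# a tuning parameter; details (variable ordering, threshold choice)
# are simplified for illustration.
def sparse_cholesky_cov(X, lam):
    X = X - X.mean(axis=0)               # center columns
    n, p = X.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = X[:, 0] @ X[:, 0] / n
    for j in range(1, p):
        # OLS of variable j on its predecessors
        phi, *_ = np.linalg.lstsq(X[:, :j], X[:, j], rcond=None)
        resid = X[:, j] - X[:, :j] @ phi
        d[j] = resid @ resid / n
        # soft-threshold the regression coefficients
        phi = np.sign(phi) * np.maximum(np.abs(phi) - lam, 0.0)
        T[j, :j] = -phi
    # T Sigma T' = D  =>  Sigma = T^{-1} D T^{-T}, PD whenever d > 0
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d) @ Tinv.T
```

With `lam=0` this reconstructs the sample covariance exactly; increasing `lam` sparsifies the factor while keeping the estimate positive definite.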
 [16] arXiv:1711.08244 [pdf, other]

Title: Adversarial Phenomenon in the Eyes of Bayesian Deep Learning
Comments: 13 pages, 7 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Deep Learning models are vulnerable to adversarial examples, i.e., images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence. However, class confidence by itself is an incomplete picture of uncertainty. We therefore use principled Bayesian methods to capture model uncertainty in predictions when observing adversarial misclassification. We provide an extensive study with different Bayesian neural networks attacked in both white-box and black-box setups. The behaviour of the networks for noise, attacks and clean test data is compared. We observe that Bayesian neural networks are uncertain in their predictions for adversarial perturbations, a behaviour similar to the one observed for random Gaussian perturbations. Thus, we conclude that Bayesian neural networks can be considered for detecting adversarial examples.
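As a toy illustration of the general idea, using Monte Carlo dropout as the approximate Bayesian method (the paper studies several Bayesian neural networks, not necessarily this one): keep dropout active at test time, average the softmax outputs over stochastic forward passes, and use predictive entropy as the uncertainty score. The weights below are random and for shape only.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(10, 32)), rng.normal(size=(32, 3))

# MC-dropout prediction: dropout stays ON at test time, and the
# predictive distribution is the average over T stochastic passes.
def mc_dropout_predict(x, T=200, p_drop=0.5):
    probs = []
    for _ in range(T):
        h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
        h = h * (rng.random(h.shape) > p_drop)   # stochastic mask
        z = h @ W2
        e = np.exp(z - z.max())                  # stable softmax
        probs.append(e / e.sum())
    mean_p = np.mean(probs, axis=0)
    entropy = -np.sum(mean_p * np.log(mean_p + 1e-12))
    return mean_p, entropy
```

High predictive entropy on an input flags it as one the model is uncertain about, which is the behaviour the abstract reports for adversarial perturbations.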
 [17] arXiv:1711.08247 [pdf, other]

Title: Decomposition Strategies for Constructive Preference Elicitation
Comments: Accepted at the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We tackle the problem of constructive preference elicitation, that is, the problem of learning user preferences over very large decision problems involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems that relies exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user's cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.
 [18] arXiv:1711.08265 [pdf, other]

Title: Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses
Subjects: Methodology (stat.ME)
We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest recently due to the growth of biological and medical data sets with complex, non-i.i.d. structures and prolific response variables. The heterogeneity is likely to confound the association between explanatory variables and responses, resulting in a wealth of false discoveries when Lasso or its variants are naïvely applied. Therefore, interest in developing effective confounder-correction methods is growing. However, ordinarily employing recent confounder-correction methods results in undesirable performance, because they ignore the convoluted interdependency among the prolific response variables. To improve on current variable selection methods, we introduce a model that can utilize the dependency information from multiple responses to select the active variables from heterogeneous data. Through extensive experiments on synthetic and real data sets, we show that our proposed model outperforms the existing methods.
 [19] arXiv:1711.08328 [pdf, ps, other]

Title: Robust Bayes-Like Estimation: Rho-Bayes estimation
Comments: 68 pages
Subjects: Statistics Theory (math.ST)
We consider the problem of estimating the joint distribution $P$ of $n$ independent random variables within the Bayes paradigm from a non-asymptotic point of view. Assuming that $P$ admits some density $s$ with respect to a given reference measure, we consider a density model $\overline S$ for $s$ that we endow with a prior distribution $\pi$ (with support $\overline S$) and we build a robust alternative to the classical Bayes posterior distribution which possesses similar concentration properties around $s$ whenever it belongs to the model $\overline S$. Furthermore, in density estimation, the Hellinger distance between the classical and the robust posterior distributions tends to 0, as the number of observations tends to infinity, under suitable assumptions on the model and the prior, provided that the model $\overline S$ contains the true density $s$. However, unlike what happens with the classical Bayes posterior distribution, we show that the concentration properties of this new posterior distribution are still preserved in the case of a misspecification of the model, that is when $s$ does not belong to $\overline S$ but is close enough to it with respect to the Hellinger distance.
 [20] arXiv:1711.08359 [pdf, ps, other]

Title: Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer's disease
Authors: Wolfgang Fruehwirt, Matthias Gerstgrasser, Pengfei Zhang, Leonard Weydemann, Markus Waser, Reinhold Schmidt, Thomas Benke, Peter Dal-Bianco, Gerhard Ransmayr, Dieter Grossegger, Heinrich Garn, Gareth W. Peters, Stephen Roberts, Georg Dorffner
Comments: Presented at NIPS 2017 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)
The diagnosis of Alzheimer's disease (AD) in routine clinical practice is most commonly based on subjective clinical interpretations. Quantitative electroencephalography (QEEG) measures have been shown to reflect neurodegenerative processes in AD and might qualify as affordable and thereby widely available markers to facilitate the objectivization of AD assessment. Here, we present a novel framework combining Riemannian tangent space mapping and elastic net regression for the development of brain atrophy markers. While most AD QEEG studies are based on small sample sizes and psychological test scores as outcome measures, here we train and test our models using data of one of the largest prospective EEG AD trials ever conducted, including MRI biomarkers of brain atrophy.
 [21] arXiv:1711.08360 [pdf, ps, other]

Title: Information sensitivity functions to assess parameter information gain and identifiability of dynamical systems
Authors: Sanjay Pant
Subjects: Methodology (stat.ME)
A new class of functions, called 'Information sensitivity functions' (ISFs), which quantify the information gain about the parameters through the measurements/observables of a dynamical system, is presented. These functions can be easily computed through classical sensitivity functions alone and are based on Bayesian and information-theoretic approaches. While marginal information gain is quantified by the decrease in differential entropy, correlations between arbitrary sets of parameters are assessed through mutual information. For individual parameters these information gains are also presented as marginal posterior variances, and, to assess the effect of correlations, as conditional variances when other parameters are given. The easy-to-interpret ISFs can be used to a) identify time intervals or regions in dynamical system behaviour where information about the parameters is concentrated; b) assess the effect of measurement noise on the information gain for the parameters; c) assess whether sufficient information in an experimental protocol (input, measurements, and their frequency) is available to identify the parameters; d) assess correlation in the posterior distribution of the parameters to identify the sets of parameters that are likely to be indistinguishable; and e) assess identifiability problems for particular sets of parameters.
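A simplified, purely Gaussian illustration of two quantities the abstract refers to: marginal information gain as a decrease in differential entropy, and conditional variances as a correlation diagnostic. The prior and posterior covariances below are stand-ins, not derived from any dynamical system.

```python
import numpy as np

# For a Gaussian posterior with covariance S and prior covariance S0:
# - the marginal information gain (in nats) for parameter i is
#   0.5 * log(prior variance / posterior variance);
# - the variance of parameter i conditional on all other parameters is
#   1 / (S^{-1})_{ii}; a conditional variance much smaller than the
#   marginal one signals strong correlation, hence near-indistinguishable
#   parameter combinations.
def gaussian_parameter_info(S0, S):
    gain = 0.5 * np.log(np.diag(S0) / np.diag(S))
    cond_var = 1.0 / np.diag(np.linalg.inv(S))
    return gain, cond_var
```

In the ISF framework these quantities would be evaluated along time, using posterior covariances built from the classical sensitivity functions.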
 [22] arXiv:1711.08374 [pdf, ps, other]

Title: Variational Bayesian Inference For A Scale Mixture Of Normal Distributions Handling Missing Data
Subjects: Machine Learning (stat.ML)
In this paper, a scale mixture of Normal distributions model is developed for classification and clustering of data with outliers and missing values. The classification method, based on a mixture model, introduces latent variables that allow us to handle the model's sensitivity to outliers and to adopt a less restrictive treatment of missing data. Inference is performed through a variational Bayesian approximation, and a Bayesian treatment is adopted for model learning, supervised classification and clustering.
 [23] arXiv:1711.08392 [pdf, other]

Title: An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series
Comments: Accepted to the NIPS Time Series Workshop 2017
Subjects: Machine Learning (stat.ML)
We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate nonstationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM approach first splits the convex problem into a global quadratic program and a simple group lasso proximal update. We show that the global problem may be parallelized over rows of the time-dependent transition matrices and furthermore that each subproblem may be rewritten in a form identical to the log-likelihood of a Gaussian state space model. Consequently, we develop a Kalman smoothing algorithm to solve the global update in time linear in the length of the series.
 [24] arXiv:1711.08411 [pdf, other]

Title: An Orthogonally Equivariant Estimator of the Covariance Matrix in High Dimensions and Small Sample Size
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
An orthogonally equivariant estimator for the covariance matrix is proposed that is valid when the dimension $p$ is larger than the sample size $n$. Equivariance under orthogonal transformations is a less restrictive assumption than structural assumptions on the true covariance matrix. It reduces the problem of estimation of the covariance matrix to that of estimation of its eigenvalues. In this paper, the eigenvalue estimates are obtained from an adjusted likelihood function derived by approximating the integral over the eigenvectors of the sample covariance matrix, which is a challenging problem in its own right. Comparisons with two well-known orthogonally equivariant estimators are given, which are based on Monte Carlo risk estimates for simulated data and misclassification errors in a real data analysis.
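A generic illustration of orthogonal equivariance (not the paper's adjusted-likelihood estimator): any estimator that keeps the sample eigenvectors and replaces only the eigenvalues, here by simple linear shrinkage toward their mean, is orthogonally equivariant.

```python
import numpy as np

# Keep the sample eigenvectors, shrink only the eigenvalues. Any
# estimator of this "spectral" form is orthogonally equivariant:
# rotating the data rotates the estimate conjugately.
def equivariant_shrinkage(X, rho=0.5):
    S = np.cov(X.T)
    vals, vecs = np.linalg.eigh(S)
    shrunk = (1 - rho) * vals + rho * vals.mean()
    return vecs @ np.diag(shrunk) @ vecs.T
```

Equivariance here means est(XQ) = Q' est(X) Q for every orthogonal matrix Q, which can be checked numerically.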
 [25] arXiv:1711.08426 [pdf, ps, other]

Title: Leverage Score Sampling for Faster Accelerated Regression and ERM
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{n}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{\top}\mathbf{A})$ and $s$ is the maximum number of nonzero entries in a row of $\mathbf{A}$. Our algorithm improves upon the previous best running time of $ \tilde{O} ((n+\sqrt{n \cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1})$.
We achieve our result through a careful combination of leverage score sampling techniques, proximal point methods, and accelerated coordinate descent. Our method not only matches the performance of previous methods, but further improves whenever leverage scores of rows are small (up to polylogarithmic factors). We also provide a nonlinear generalization of these results that improves the running time for solving a broader class of ERM problems.
 [26] arXiv:1711.08451 [pdf, ps, other]

Title: Causal nearest neighbor rules for optimal treatment regimes
Subjects: Machine Learning (stat.ML)
The estimation of optimal treatment regimes is of considerable interest to precision medicine. In this work, we propose a causal $k$-nearest neighbor method to estimate the optimal treatment regime. The method is rooted in the framework of causal inference, and estimates the causal treatment effects within the nearest neighborhood. Although the method is simple, it possesses nice theoretical properties. We show that the causal $k$-nearest neighbor regime is universally consistent: it will eventually learn the optimal treatment regime as the sample size increases. We also establish its convergence rate. However, the causal $k$-nearest neighbor regime may suffer from the curse of dimensionality, i.e., its performance deteriorates as the dimensionality increases. To alleviate this problem, we develop an adaptive causal $k$-nearest neighbor method to perform metric selection and variable selection simultaneously. The performance of the proposed methods is illustrated in simulation studies and in an analysis of a chronic depression clinical trial.
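A simplified sketch of a causal k-nearest-neighbor rule (hedged: the paper's estimator and its adaptive variant have more structure than this): within the k nearest neighbors of a covariate vector, compare the mean outcomes of treated and control subjects and recommend the better arm.

```python
import numpy as np

# X: covariates (m, p); A: binary treatment indicators; Y: outcomes.
# Returns the recommended arm (1 = treat, 0 = control) for query x.
def causal_knn(x, X, A, Y, k=10):
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    a, y = A[idx], Y[idx]
    if a.sum() == 0 or a.sum() == k:     # only one arm observed locally
        return int(a.sum() > 0)          # fall back to the observed arm
    effect = y[a == 1].mean() - y[a == 0].mean()
    return int(effect > 0)
```

The local treated-minus-control contrast estimates the conditional average treatment effect at x, assuming treatment is unconfounded given the covariates.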
Cross-lists for Thu, 23 Nov 17
 [27] arXiv:1711.08014 (cross-list from cs.LG) [pdf, other]

Title: The Riemannian Geometry of Deep Generative Models
Comments: 9 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Deep generative models learn a mapping from a low-dimensional latent space to a high-dimensional data space. Under certain regularity conditions, these models parameterize nonlinear manifolds in the data space. In this paper, we investigate the Riemannian geometry of these generated manifolds. First, we develop efficient algorithms for computing geodesic curves, which provide an intrinsic notion of distance between points on the manifold. Second, we develop an algorithm for parallel translation of a tangent vector along a path on the manifold. We show how parallel translation can be used to generate analogies, i.e., to transport a change in one data point into a semantically similar change of another data point. Our experiments on real image data show that the manifolds learned by deep generative models, while nonlinear, are surprisingly close to zero curvature. The practical implication is that linear paths in the latent space closely approximate geodesics on the generated manifold. However, further investigation into this phenomenon is warranted, to identify if there are other architectures or datasets where curvature plays a more prominent role. We believe that exploring the Riemannian geometry of deep generative models, using the tools developed in this paper, will be an important step in understanding the high-dimensional, nonlinear spaces these models learn.
 [28] arXiv:1711.08054 (cross-list from cs.LG) [pdf, other]

Title: A generative adversarial framework for positive-unlabeled classification
Comments: 8 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this work, we consider the task of classifying binary positive-unlabeled (PU) data. Existing discriminative-learning-based PU models attempt to seek an optimal reweighting strategy for the U data so that a decent decision boundary can be found. In contrast, we provide a totally new paradigm to attack the binary PU task from the perspective of generative learning, by leveraging the powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GPU) learning model is devised to express the P and N data distributions. It comprises three discriminators and two generators with different roles, producing both positive and negative samples that resemble those coming from the real training dataset. Even with rather limited labeled P data, our GPU framework is capable of capturing the underlying P and N data distributions with infinite realistic sample streams. In this way, an optimal classifier can be trained on those generated samples using very deep neural networks (DNNs). Moreover, a useful variant of GPU is also introduced for semi-supervised classification.
 [29] arXiv:1711.08056 (cross-list from q-bio.TO) [pdf]

Title: Assessing Mortality of Blunt Trauma with Comorbidity
Authors: Clive Neal-Sturgess
Comments: 10 pages, 2 figures, 37 references
Subjects: Tissues and Organs (q-bio.TO)
Objectives: To obtain a better estimate of the mortality of individuals suffering from blunt force trauma, including comorbidity. Methodology: The Injury Severity Score (ISS) is the default world standard for assessing the severity of multiple injuries; it is a mathematical fit to empirical field data. It is demonstrated that ISS is proportional to the Gibbs/Shannon entropy. A new entropy measure of morbidity from blunt force trauma including comorbidity, called the Abbreviated Morbidity Scale (AMS), is derived based on the von Neumann entropy. Results: The ISS trauma measure has been applied to a previously published database, and good correlation has been achieved. Here the existing trauma measure is extended to include the comorbidity of disease by calculating an Abbreviated Morbidity Score (AMS), which encapsulates disease comorbidity in a manner analogous to AIS and on a consistent entropy base. Applying entropy measures to multiple injuries highlights the role of comorbidity, and shows that the elderly die at much lower levels of injury than the general population as a consequence of comorbidity. These considerations lead to questions regarding current new car assessment protocols and how well they protect the most vulnerable road users. Keywords: Blunt Force Trauma, Injury Severity Score, Comorbidity, Entropy.
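For reference, the ISS that this abstract builds on is computed from Abbreviated Injury Scale (AIS) severities as the sum of squares of the three highest scores from different body regions. A minimal sketch (the function name and example values are ours; we also include the standard scoring convention, not stated in the abstract, that any AIS 6 injury sets ISS to the maximum of 75):

```python
def injury_severity_score(ais_by_region):
    """Injury Severity Score: sum of squares of the three highest AIS
    severities, taken from different body regions. By convention, any
    AIS 6 injury sets the ISS to its maximum value of 75."""
    scores = sorted(ais_by_region.values(), reverse=True)
    if scores and scores[0] == 6:
        return 75
    return sum(s ** 2 for s in scores[:3])

# example: head AIS 4, chest AIS 3, extremities AIS 2, abdomen AIS 1
iss = injury_severity_score({"head": 4, "chest": 3,
                             "extremities": 2, "abdomen": 1})
# iss == 4**2 + 3**2 + 2**2 == 29
```

The quadratic form of this fit is what the abstract relates to an entropy measure.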
 [30] arXiv:1711.08095 (cross-list from cs.LG) [pdf, ps, other]

Title: SNeCT: Scalable network constrained Tucker decomposition for integrative multi-platform data analysis
Comments: 8 pages
Subjects: Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Motivation: How do we integratively analyze large-scale multi-platform genomic data that are high-dimensional and sparse? Furthermore, how can we systematically incorporate prior knowledge, such as the associations between genes, in the analysis? Method: To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method we call SNeCT. SNeCT adopts a parallel stochastic gradient descent approach on the proposed parallelizable network-constrained optimization function. The SNeCT decomposition is applied to a tensor constructed from the large-scale multi-platform multi-cohort cancer data PanCan12, constrained on a network built from the PathwayCommons database. Results: The decomposed factor matrices are applied to stratify cancers, to search for the top-$k$ similar patients, and to illustrate how the matrices can be used for personalized interpretation. In the stratification test, the combined twelve-cohort data is clustered into thirteen subclasses. The thirteen subclasses show a high correlation to tissue of origin, in addition to other interesting observations, such as a clear separation of OV cancers into two groups and high clinical correlation within subclusters formed in the cohorts BRCA and UCEC. In the top-$k$ search, a new patient's genomic profile is generated and searched against existing patients based on the factor matrices. The similarity of the top-$k$ patients to the query is high for 23 clinical features, including the estrogen/progesterone receptor statuses of BRCA patients, with average precision values ranging from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also provide an illustration of how the factor matrices can be used for interpretable personalized analysis of each patient.
 [31] arXiv:1711.08132 (cross-list from cs.LG) [pdf, other]

Title: Locally Smoothed Neural Networks
Comments: In Proceedings of the 9th Asian Conference on Machine Learning (ACML 2017)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Convolutional neural networks (CNNs) and the locally connected layer are limited in capturing the importance and relations of different local receptive fields, which are often crucial for tasks such as face verification, visual question answering, and word sequence prediction. To tackle this issue, we propose a novel locally smoothed neural network (LSNN) in this paper. The main idea is to represent the weight matrix of the locally connected layer as the product of a kernel and a smoother, where the kernel is shared over different local receptive fields, and the smoother determines the importance and relations of different local receptive fields. Specifically, a multivariate Gaussian function is utilized to generate the smoother, modeling the location relations among different local receptive fields. Furthermore, content information can also be leveraged by setting the mean and precision of the Gaussian function according to the content. Experiments on variants of MNIST clearly show our advantages over CNNs and the locally connected layer.
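The kernel/smoother factorization can be sketched in one dimension (a toy illustration with assumed shapes and values, not the paper's exact parameterization): the effective weights of the locally connected layer are the shared kernel, scaled per receptive field by a Gaussian smoother.

```python
import numpy as np

def gaussian_smoother(centers, mu, precision):
    """Unnormalized Gaussian bump over receptive-field positions,
    encoding the relative importance of each local receptive field."""
    return np.exp(-0.5 * precision * (centers - mu) ** 2)

kernel = np.array([0.2, 0.5, 0.3])       # shared over all receptive fields
centers = np.arange(5, dtype=float)      # 5 receptive-field positions
smoother = gaussian_smoother(centers, mu=2.0, precision=1.0)

# effective locally connected weights: each field reuses the shared
# kernel, scaled by that field's smoother value
weights = np.outer(smoother, kernel)     # shape (5 fields, 3 taps)
```

A content-dependent variant would set `mu` and `precision` from the input instead of fixing them, as the abstract describes.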
 [32] arXiv:1711.08208 (cross-list from cs.LG) [pdf, other]

Title: Post-hoc labeling of arbitrary EEG recordings for data-efficient evaluation of neural decoding methods
Subjects: Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
Many cognitive, sensory, and motor processes have correlates in oscillatory neural sources, which are embedded as a subspace into the recorded brain signals. Decoding such processes from noisy magnetoencephalogram/electroencephalogram (M/EEG) signals usually requires the use of data-driven analysis methods. The objective evaluation of such decoding algorithms on experimental raw signals, however, is a challenge: the amount of available M/EEG data is typically limited, labels can be unreliable, and raw signals are often contaminated with artifacts. The latter is especially problematic if the artifacts stem from behavioral confounds of the oscillatory neural processes of interest.
To overcome some of these problems, simulation frameworks have been introduced for benchmarking decoding methods. When generating artificial brain signals, however, most simulation frameworks make strong and partially unrealistic assumptions about brain activity, which limits the generalization of the obtained results to real-world conditions.
In the present contribution, we strive to remove many shortcomings of current simulation frameworks and propose a versatile alternative that allows for objective evaluation and benchmarking of novel data-driven decoding methods for neural signals. Its central idea is to utilize post-hoc labelings of arbitrary M/EEG recordings. This strategy makes it paradigm-agnostic and allows for generating comparatively large datasets with noiseless labels. Source code and data of the novel simulation approach are made available to facilitate its adoption.
 [33] arXiv:1711.08267 (cross-list from cs.LG) [pdf, other]

Title: GraphGAN: Graph Representation Learning with Generative Adversarial Nets
Authors: Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, Minyi Guo
Comments: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 8 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying the above two classes of methods, in which the generative model and discriminative model play a game-theoretic minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces "fake" samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from the ground truth or generated by the generative model. Through the competition between these two models, both can alternately and iteratively boost their performance. Moreover, for the implementation of the generative model, we propose a novel graph softmax to overcome the limitations of the traditional softmax function, which can be proven to satisfy the desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains over state-of-the-art baselines in a variety of applications, including link prediction, node classification, and recommendation.
 [34] arXiv:1711.08277 (cross-list from cs.CV) [pdf, other]

Title: Unleashing the Potential of CNNs for Interpretable Few-Shot Learning
Comments: Under review as a conference paper at ICLR 2018
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Convolutional neural networks (CNNs) have been generally acknowledged as one of the driving forces behind the advancement of computer vision. Despite their promising performance on many tasks, CNNs still face major obstacles on the road to achieving ideal machine intelligence. One is the difficulty of interpreting them and understanding their inner workings, which is important for diagnosing their failures and correcting them. Another is that standard CNNs require large amounts of annotated data, which are sometimes very hard to obtain; hence, it is desirable to enable them to learn from few examples. In this work, we address these two limitations of CNNs by developing novel and interpretable models for few-shot learning. Our models are based on the idea of encoding objects in terms of visual concepts, which are interpretable visual cues represented within CNNs. We first use qualitative visualizations and quantitative statistics to uncover several key properties of feature encoding using visual concepts. Motivated by these properties, we present two intuitive models for the problem of few-shot learning. Experiments show that our models achieve competitive performance, while being much more flexible and interpretable than previous state-of-the-art few-shot learning methods. We conclude that visual concepts expose the natural capability of CNNs for few-shot learning.
 [35] arXiv:1711.08325 (cross-list from cs.LG) [pdf]

Title: Utilizing artificial neural networks to predict demand for weather-sensitive products at retail stores
Authors: Elham Taghizadeh
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
One key requirement for effective supply chain management is the quality of its inventory management. Various inventory management methods are typically employed for different types of products based on their demand patterns, product attributes, and supply network. In this paper, our goal is to develop robust demand prediction methods for weather-sensitive products at retail stores. We employ historical datasets from Walmart, whose customers and markets are often exposed to extreme weather events that can have a huge impact on sales at the affected stores and for the affected products. We want to accurately predict the sales of 111 potentially weather-sensitive products around the time of major weather events at 45 of Walmart's retail locations in the U.S. Intuitively, we may expect an uptick in the sales of umbrellas before a big thunderstorm, but it is difficult for replenishment managers to predict the level of inventory needed to avoid being out of stock or overstocked during and after that storm. While they rely on a variety of vendor tools to predict sales around extreme weather events, they mostly employ a time-consuming process that lacks a systematic measure of effectiveness. We employ all the methods critical to any analytics project and start with data exploration. Critical features are extracted from the raw historical dataset for demand forecasting accuracy and robustness. In particular, we employ an artificial neural network to forecast demand for each product sold around the time of major weather events. Finally, we evaluate our models to assess their accuracy and robustness.
 [36] arXiv:1711.08330 (cross-list from cs.DB) [pdf, other]

Title: Adaptive Cardinality Estimation
Comments: 12 pages, 11 figures, 1 table
Subjects: Databases (cs.DB); Machine Learning (stat.ML)
In this paper we address the cardinality estimation problem, an important subproblem of query optimization. Query optimization is the part of every relational DBMS responsible for finding the best way to execute a given query; these ways are called plans. The execution times of different plans may differ by several orders of magnitude, so the query optimizer has a great influence on overall DBMS performance. We consider the cost-based query optimization approach, the most popular one. It has been observed that cost-based optimization quality depends heavily on cardinality estimation quality. The cardinality of a plan node is the number of tuples it returns.
In this paper we propose a novel cardinality estimation approach that uses machine learning methods. The main idea is to use execution statistics of previously executed queries to improve cardinality estimates; we call the approach adaptive cardinality estimation to reflect this. The approach is general, flexible, and easy to implement. The experimental evaluation shows that it significantly improves the quality of cardinality estimation, and therefore increases DBMS performance for some queries by several times or even by several dozen times.
 [37] arXiv:1711.08331 (cross-list from cs.LG) [pdf, other]

Title: Learning User Preferences to Incentivize Exploration in the Sharing Economy
Comments: Longer version of AAAI'18 paper. arXiv admin note: text overlap with arXiv:1702.02849
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
We study platforms in the sharing economy and discuss the need to incentivize users to explore options that would otherwise not be chosen. For instance, rental platforms such as Airbnb typically rely on customer reviews to provide users with relevant information about different options, yet often a large fraction of options has no reviews available. Such options are frequently neglected as viable choices, and in turn are unlikely to be evaluated, creating a vicious cycle. Platforms can engage users to deviate from their preferred choice by offering monetary incentives for choosing a different option instead. To efficiently learn the optimal incentives to offer, we consider structural information in user preferences and introduce a novel algorithm, Coordinated Online Learning (CoOL), for learning with structural information modeled as convex constraints. We provide formal guarantees on the performance of our algorithm and test the viability of our approach in a user study with data on apartments on Airbnb. Our findings suggest that our approach is well-suited to learn appropriate incentives and increase exploration on the investigated platform.
 [38] arXiv:1711.08336 (cross-list from cs.CR) [pdf, other]

Title: DeepSign: Deep Learning for Automatic Malware Signature Generation and Classification
Comments: arXiv admin note: text overlap with arXiv:1207.0580 by other authors
Journal-ref: International Joint Conference on Neural Networks (IJCNN), pages 1-8, Killarney, Ireland, July 2015
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper presents a novel deep-learning-based method for automatic malware signature generation and classification. The method uses a deep belief network (DBN), implemented with a deep stack of denoising autoencoders, to generate an invariant compact representation of the malware behavior. While conventional signature- and token-based methods for malware detection do not detect a majority of new variants of existing malware, the results presented in this paper show that signatures generated by the DBN allow for an accurate classification of new malware variants. Using a dataset containing hundreds of variants of several major malware families, our method achieves 98.6% classification accuracy using the signatures generated by the DBN. The presented method is completely agnostic to the type of malware behavior that is logged (e.g., API calls and their parameters, registry entries, websites and ports accessed, etc.), and can use any raw input from a sandbox to successfully train the deep neural network that is used to generate malware signatures.
 [39] arXiv:1711.08337 (cross-list from cs.NE) [pdf, ps, other]

Title: Genetic Algorithms for Evolving Computer Chess Programs
Comments: Winner of Gold Award in the 11th Annual "Humies" Awards for Human-Competitive Results. arXiv admin note: substantial text overlap with arXiv:1711.06840, arXiv:1711.06841, arXiv:1711.06839
Journal-ref: IEEE Transactions on Evolutionary Computation, Vol. 18, No. 5, pp. 779-789, September 2014
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
This paper demonstrates the use of genetic algorithms for evolving: 1) a grandmaster-level evaluation function, and 2) a search mechanism for a chess program, the parameter values of which are initialized randomly. The evaluation function of the program is evolved by learning from databases of (human) grandmaster games. At first, the organisms are evolved to mimic the behavior of human grandmasters, and then these organisms are further improved upon by means of coevolution. The search mechanism is evolved by learning from tactical test suites. Our results show that the evolved program outperforms a two-time world computer chess champion and is on par with the other leading computer chess programs.
 [40] arXiv:1711.08352 (cross-list from cs.LG) [pdf, other]

Title: Likelihood Almost Free Inference Networks
Comments: arXiv admin note: text overlap with arXiv:1711.02255
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Variational inference for latent variable models is prevalent in various machine learning problems, typically solved by maximizing the Evidence Lower Bound (ELBO) of the true data likelihood with respect to a variational distribution. However, freely enriching the family of variational distributions is challenging, since the ELBO requires variational likelihood evaluations of the latent variables. In this paper, we propose a novel framework to enrich the variational family based on an alternative lower bound, by introducing auxiliary random variables to the variational distribution only. While offering a much richer family of complex variational distributions, the resulting inference network is likelihood almost free, in the sense that only the latent variables require evaluations from simple likelihoods, and samples from all the auxiliary variables are sufficient for maximum likelihood inference. We show that the proposed approach is essentially optimizing a probabilistic mixture of ELBOs, thus enriching modeling capacity and enhancing robustness. It outperforms state-of-the-art methods in our experiments on several density estimation tasks.
 [41] arXiv:1711.08364 (cross-list from cs.CV) [pdf, other]

Title: ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Hash codes are efficient data representations for coping with the ever-growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNNs) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random-forest approach to hashing challenging. To address this, we propose to first randomly group the classes arriving at each tree split node into two groups, obtaining a significantly simplified two-class classification problem that can be handled by a lightweight CNN weak learner. Such a random class grouping scheme enables code uniqueness by enforcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency, by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating the codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods on image retrieval tasks on large-scale public datasets, while performing at the level of state-of-the-art image classification techniques and utilizing a more compact, efficient, and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of lightweight CNNs, instead of simply going deeper.
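The simple starting scheme described above (a '1' for the visited leaf of each tree, '0' elsewhere, concatenated across trees) can be sketched directly; the function name and toy leaf assignment are ours:

```python
import numpy as np

def leaf_indicator_code(leaf_ids, n_leaves):
    """Binary hash from a forest: for each tree, set '1' at the visited
    leaf and '0' elsewhere, then concatenate the per-tree blocks."""
    code = np.zeros(len(leaf_ids) * n_leaves, dtype=np.uint8)
    for tree, leaf in enumerate(leaf_ids):
        code[tree * n_leaves + leaf] = 1
    return code

# a sample routed to leaves 2, 0, and 3 of three 4-leaf trees
code = leaf_indicator_code([2, 0, 3], n_leaves=4)
# per-tree blocks: 0010 | 1000 | 0001
```

The paper's contribution is then in *which* leaves similar inputs reach (via random class grouping and CNN weak learners) and how the per-tree codes are aggregated, not in this indicator encoding itself.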
 [42] arXiv:1711.08413 (cross-list from cs.CV) [pdf]

Title: SolarisNet: A Deep Regression Network for Solar Radiation Prediction
Comments: Submitted to I2MTC 2018
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Machine Learning (stat.ML)
Effective utilization of photovoltaic (PV) plants requires global solar radiation (GSR) forecasting models that are robust to weather variability. Random weather turbulence phenomena, coupled with the assumptions of the clear-sky model suggested by Hottel, pose significant challenges to parametric and non-parametric models in GSR conversion rate estimation. In addition, a decent GSR estimate requires a costly high-tech radiometer and expert-dependent instrument handling and measurements, which are subjective. We therefore develop a computer-aided monitoring (CAM) system to evaluate PV plant operation feasibility by employing smart-grid historical data analytics and deep learning. Our algorithm, SolarisNet, is a 6-layer deep neural network trained on data collected at two weather stations located near the Kalyani meteorological site, West Bengal, India. The daily GSR prediction performance of SolarisNet outperforms the existing state of the art, and we discuss its efficacy in inferring insights from past GSR data to comprehend daily and seasonal GSR variability, along with its competence for short-term forecasting.
 [43] arXiv:1711.08421 (cross-list from cs.DS) [pdf, ps, other]

Title: Relief-Based Feature Selection: Introduction and Review
Comments: Submitted for publication, November 2017
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)
Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient yet sensitive to complex patterns of association, e.g., interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that strike an effective balance between these objectives while flexibly adapting to various data characteristics, e.g., classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how the feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research and provide comparative summaries of RBA algorithms, including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
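The original Relief update that the review introduces can be sketched for binary classification with numeric features (an illustrative simplification with assumed normalization details, not the paper's reference implementation): for each sampled instance, a feature is penalized when it differs from the nearest same-class instance (the near hit) and rewarded when it differs from the nearest other-class instance (the near miss).

```python
import numpy as np

def relief(X, y, n_iter=100, rng=None):
    """Original Relief for binary classification (illustrative sketch).
    Because near hits/misses are located in the full feature space,
    interacting features can earn weight without enumerating feature
    combinations."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)   # scale per-feature diffs
    span[span == 0] = 1.0
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                   # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # near hit
        miss = np.argmin(np.where(~same, dist, np.inf))  # near miss
        # penalize disagreement with the hit, reward it with the miss
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / (span * n_iter)
    return w

# toy data: feature 0 determines the class, feature 1 is pure noise
data_rng = np.random.default_rng(1)
y = data_rng.integers(0, 2, 200)
X = np.column_stack([y + 0.05 * data_rng.normal(size=200),
                     data_rng.normal(size=200)])
weights = relief(X, y, n_iter=200, rng=0)
```

On this toy data the informative feature receives a clearly larger weight than the noise feature, which is the interpretability property the review emphasizes.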
 [44] arXiv:1711.08442 (cross-list from cs.LG) [pdf, other]

Title: From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets
Comments: AAAI-2018, 10 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and has a maximum number of Markov chain steps $K$ (referred to as MCLV-$K$). We present an MCLV-$K$ gradient estimator (LVS-$K$) for RBMs and explore the correspondence and differences between LVS-$K$ and Contrastive Divergence (CD-$K$), with LVS-$K$ significantly outperforming CD-$K$ in training RBMs on the MNIST dataset, indicating MCLV to be a promising direction for learning generative models.
Replacements for Thu, 23 Nov 17
 [45] arXiv:1207.0437 (replaced) [pdf, other]

Title: Dendrogram/Regionalization of U. S. Counties Based upon Migration Flows
Authors: Paul B. Slater
Comments: 44 pages, certain passages extracted from arXiv:0809.2768, technical problem addressed in more faithfully including the multi-page dendrogram (by slightly scaling the individual pages), URL link corrected
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI); Applications (stat.AP)
 [46] arXiv:1411.7481 (replaced) [pdf, other]

Title: Nonparametric Bayesian Inference for Mean Residual Life Functions in Survival Analysis
Subjects: Methodology (stat.ME)
 [47] arXiv:1412.3730 (replaced) [pdf, other]

Title: Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It
Comments: 70 pages, 20 figures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
 [48] arXiv:1511.06028 (replaced) [pdf, ps, other]

Title: Optimal inference in a class of regression models
Comments: 39 pages plus supplementary materials
Subjects: Statistics Theory (math.ST); Applications (stat.AP)
 [49] arXiv:1607.02793 (replaced) [pdf, other]

Title: On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization
Comments: Accepted by JMLR
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
 [50] arXiv:1611.09384 (replaced) [pdf, other]

Title: The Emergence of Organizing Structure in Conceptual Representation
Comments: In press at Cognitive Science
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [51] arXiv:1612.05179 (replaced) [pdf, ps, other]

Title: Regression assisted inference for the average treatment effect in paired experiments
Authors: Colin B. Fogarty
Subjects: Methodology (stat.ME)
 [52] arXiv:1701.05976 (replaced) [pdf, other]

Title: How often does the best team win? A unified approach to understanding randomness in North American sport
Comments: 40 pages, 20 figures, 5 tables, code available at this https URL
Subjects: Applications (stat.AP)
 [53] arXiv:1701.06010 (replaced) [pdf, other]

Title: Covariance Functions for Multivariate Gaussian Fields Evolving Temporally over Planet Earth
Subjects: Statistics Theory (math.ST)
 [54] arXiv:1703.09842 (replaced) [pdf, other]

Title: Inverse Risk-Sensitive Reinforcement Learning
Comments: v3 (comments regarding updates): We significantly extended the theory (Theorems 2, 3, and 5 and Proposition 3). We also corrected some minor typos throughout the document; v2 (comments regarding updates): We corrected some notational typos and made clarifications in the proof. We also added clarifying remarks regarding reference points and acceptance levels, which were previously conflated
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [55] arXiv:1704.08742 (replaced) [pdf, other]

Title: Efficient Feature Screening for Lasso-Type Problems via Hybrid Safe-Strong Rules
Comments: 31 pages, 4 figures
Subjects: Machine Learning (stat.ML); Computation (stat.CO)
 [56] arXiv:1706.05940 (replaced) [pdf, other]

Title: Detection of Block-Exchangeable Structure in Large-Scale Correlation Matrices
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
 [57] arXiv:1707.00724 (replaced) [pdf, other]

Title: Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
Comments: Accepted to AAAI-18
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
 [58] arXiv:1709.06716 (replaced) [pdf, other]

Title: Contrastive Principal Component Analysis
Comments: main body is 10 pages, 9 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [59] arXiv:1709.07109 (replaced) [pdf, other]

Title: Deconvolutional Latent-Variable Model for Text Sequence Matching
Comments: Accepted by AAAI-2018
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
 [60] arXiv:1710.07491 (replaced) [pdf, ps, other]

Title: Dynamic classifier chains for multi-label learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [61] arXiv:1711.00342 (replaced) [pdf, ps, other]

Title: Orthogonal Machine Learning: Power and Limitations
Subjects: Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [62] arXiv:1711.01968 (replaced) [pdf, other]

Title: Deformable Deep Convolutional Generative Adversarial Network in Microwave Based Hand Gesture Recognition System
Comments: Accepted by International Conference on Wireless Communications and Signal Processing 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [63] arXiv:1711.03189 (replaced) [pdf, other]

Title: Deep Hyperspherical Learning
Comments: To appear in NIPS 2017 (Spotlight)
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [64] arXiv:1711.04297 (replaced) [pdf, other]

Title: On the ERM Principle with Networked Data
Comments: accepted by AAAI. arXiv admin note: substantial text overlap with arXiv:math/0702683 by other authors
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
 [65] arXiv:1711.06195 (replaced) [pdf, other]

Title: Neurology-as-a-Service for the Developing World
Authors: Tejas Dharamsi, Payel Das, Tejaswini Pedapati, Gregory Bramble, Vinod Muthusamy, Horst Samulowitz, Kush R. Varshney, Yuvaraj Rajamanickam, John Thomas, Justin Dauwels
Comments: Presented at NIPS 2017 Workshop on Machine Learning for the Developing World
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [66] arXiv:1711.06346 (replaced) [pdf, other]

Title: Mosquito detection with low-cost smartphones: data acquisition for malaria research
Authors: Yunpeng Li, Davide Zilli, Henry Chan, Ivan Kiskin, Marianne Sinka, Stephen Roberts, Kathy Willis
Comments: Presented at NIPS 2017 Workshop on Machine Learning for the Developing World
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY)
 [67] arXiv:1711.06373 (replaced) [pdf, other]

Title: Thoracic Disease Identification and Localization with Limited Supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)