Statistics
New submissions
New submissions for Mon, 25 Sep 17
 [1] arXiv:1709.07471 [pdf]

Title: Stability of Spatial Smoothness and Cluster-Size Threshold Estimates in FMRI using AFNI
Comments: 4 figures, 20 pages
Subjects: Applications (stat.AP)
In a recent analysis of FMRI datasets [K Mueller et al, Front Hum Neurosci 11:345], the estimated spatial smoothness parameters and the statistical significance of clusters were found to depend strongly on the resampled voxel size (for the same data, over a range of 1 to 3 mm) in one popular FMRI analysis software package (SPM12). High sensitivity of thresholding results on such an arbitrary parameter as final spatial grid size is an undesirable feature in a processing pipeline. Here, we examine the stability of spatial smoothness and cluster-volume threshold estimates with respect to voxel resampling size in the AFNI software package's pipeline. A publicly available collection of resting-state and task FMRI datasets from 78 subjects was analyzed using standard processing steps in AFNI. We found that the spatial smoothness and cluster-volume thresholds are fairly stable over the voxel resampling size range of 1 to 3 mm, in contradistinction to the reported results from SPM12.
 [2] arXiv:1709.07498 [pdf, other]

Title: Decision making and uncertainty quantification for individualized treatments
Comments: 24 pages, 6 figures
Subjects: Methodology (stat.ME)
Individualized treatment rules (ITR) can improve health outcomes by recognizing that patients may respond differently to treatment and assigning therapy with the most desirable predicted outcome for each individual. Flexible and efficient prediction models are desired as a basis for such ITRs to handle potentially complex interactions between patient factors and treatment. Modern Bayesian semiparametric and nonparametric regression models provide an attractive avenue in this regard, as they allow natural posterior uncertainty quantification of patient-specific treatment decisions as well as of the population-wide value of the prediction-based ITR. In addition, via the use of such models, inference is also available for the value of the optimal ITR. We propose such an approach and implement it using Bayesian Additive Regression Trees (BART), as this model has been shown to perform well in fitting nonparametric regression functions to continuous and binary responses, even with many covariates. It is also computationally efficient for use in practice. With BART we investigate a treatment strategy which utilizes individualized predictions of patient outcomes from BART models. Posterior distributions of patient outcomes under each treatment are used to assign the treatment that maximizes the expected posterior utility. We also describe how to approximate such a treatment policy with a clinically interpretable ITR, and quantify its expected outcome. The proposed method performs very well in extensive simulation studies in comparison with several existing methods. We illustrate the usage of the proposed method to identify an individualized choice of conditioning regimen for patients undergoing hematopoietic cell transplantation and quantify the value of this method of choice in relation to the optimal ITR as well as non-individualized treatment strategies.
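The assignment rule in this abstract, choosing per patient the treatment that maximizes expected posterior utility, can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: simulated draws replace fitted BART posteriors, and the covariate and effect structure are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of each patient's outcome under two treatments.
# In the paper these would come from fitted BART models; here they are
# simulated stand-ins in which treatment 1 is better for patients with x > 0.
n_patients, n_draws = 5, 2000
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])        # a single patient covariate
draws_t0 = rng.normal(loc=-x, scale=1.0, size=(n_draws, n_patients))
draws_t1 = rng.normal(loc=x, scale=1.0, size=(n_draws, n_patients))

# Assign each patient the treatment maximizing expected posterior utility;
# here the utility is the outcome itself (larger is better).
expected_utility = np.stack([draws_t0.mean(axis=0), draws_t1.mean(axis=0)])
assignment = expected_utility.argmax(axis=0)     # 0 or 1 per patient
print(assignment)
```

With these stand-in draws, patients with negative covariate values receive treatment 0 and those with positive values receive treatment 1.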
 [3] arXiv:1709.07524 [pdf, other]

Title: Achieving Parsimony in Bayesian VARs with the Horseshoe Prior
Subjects: Applications (stat.AP)
In the context of a vector autoregression (VAR) model, or any multivariate regression model, the number of relevant predictors may be small relative to the information set available from which to build a prediction equation. It is well known that forecasts based on (unpenalized) least squares estimates can overfit the data and lead to poor predictions. Since the Minnesota prior was proposed (Doan et al. (1984)), many methods have been developed aimed at improving prediction performance. In this paper we propose the horseshoe prior (Carvalho et al. (2010), Carvalho et al. (2009)) in the context of a Bayesian VAR. The horseshoe prior is a unique shrinkage prior scheme in that it shrinks irrelevant signals rigorously to 0 while allowing large signals to remain large and practically unshrunk. In an empirical study, we show that the horseshoe prior competes favorably with shrinkage schemes commonly used in Bayesian VAR models as well as with a prior that imposes true sparsity in the coefficient vector. Additionally, we propose the use of particle Gibbs with backwards simulation (Lindsten et al. (2012), Andrieu et al. (2010)) for the estimation of the time-varying volatility parameters. We provide a detailed description of all MCMC methods used in the supplementary material that is available online.
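The shrinkage behavior the abstract describes, small signals pulled to zero and large signals left alone, can be seen in the horseshoe's shrinkage factor. A minimal sketch of the prior in the normal-means setting (a simplification of the VAR regression; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Horseshoe shrinkage in the normal-means problem y_j = theta_j + noise:
# conditional on the half-Cauchy local scale lambda_j and global scale tau,
# the posterior mean is (1 - kappa_j) * y_j with
# kappa_j = 1 / (1 + tau^2 * lambda_j^2).
tau = 1.0
lam = np.abs(rng.standard_cauchy(100_000))   # half-Cauchy local scales
kappa = 1.0 / (1.0 + tau**2 * lam**2)

# The density of kappa is horseshoe-shaped: mass piles up near kappa = 0
# (no shrinkage, so large signals survive) and near kappa = 1 (total
# shrinkage, so noise is pulled to zero), with little mass in between.
near_zero = np.mean(kappa < 0.1)
near_one = np.mean(kappa > 0.9)
middle = np.mean((kappa > 0.45) & (kappa < 0.55))
print(near_zero, near_one, middle)
```

For `tau = 1`, `kappa` follows a Beta(1/2, 1/2) distribution, which is what gives the prior its "horseshoe" name.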
 [4] arXiv:1709.07542 [pdf, other]

Title: Heteroscedastic BART Using Multiplicative Regression Trees
Subjects: Methodology (stat.ME)
Bayesian additive regression trees (BART) has become increasingly popular as a flexible and scalable nonparametric model useful in many modern applied statistics regression problems. It brings many advantages to the practitioner dealing with large and complex nonlinear response surfaces, such as a matrix-free formulation and the lack of a requirement to specify a regression basis a priori. However, while flexible in fitting the mean, the basic BART model relies on the standard i.i.d. normal model for the errors. This assumption is unrealistic in many applications. Moreover, in many applied problems understanding the relationship between the variance and predictors can be just as important as that of the mean model. We develop a novel heteroscedastic BART model to alleviate these concerns. Our approach is entirely nonparametric and does not rely on an a priori basis for the variance model. In BART, the conditional mean is modeled as a sum of trees, each of which determines a contribution to the overall mean. In this paper, we model the conditional variance with a product of trees, each of which determines a contribution to the overall variance. We implement the approach and demonstrate it on a simple low-dimensional simulated dataset, a higher-dimensional dataset of used car prices, a fisheries dataset and data from an alcohol consumption study.
 [5] arXiv:1709.07556 [pdf, other]

Title: Recent Advances on Estimating Population Size with Link-Tracing Sampling
Authors: Kyle Vincent
Subjects: Methodology (stat.ME)
A new approach to estimating population size based on a stratified link-tracing sampling design is presented. The method extends the Frank and Snijders (1994) approach by allowing for heterogeneity in the initial sample selection procedure. Rao-Blackwell estimators and corresponding resampling approximations similar to those detailed in Vincent and Thompson (2017) are explored. An empirical application is provided for a hard-to-reach networked population. The results demonstrate that the approach has much potential for application to such populations. Supplementary materials for this article are available online.
 [6] arXiv:1709.07557 [pdf, other]

Title: A preconditioning approach for improved estimation of sparse polynomial chaos expansions
Subjects: Computation (stat.CO)
Compressive sampling has been widely used for sparse polynomial chaos (PC) approximation of stochastic functions. The recovery accuracy of compressive sampling depends on the coherence properties of the measurement matrix. In this paper, we consider preconditioning the measurement matrix. Pre-multiplying a linear equation system by a nonsingular matrix results in an equivalent equation system, but it can alter the coherence properties of the preconditioned measurement matrix and lead to a different recovery accuracy. In this work, we propose a preconditioning scheme that significantly improves the coherence properties of the measurement matrix, and, using theoretical motivations and numerical examples, we highlight the promise of the proposed approach in improving the accuracy of estimated polynomial chaos expansions.
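The key observation, that pre-multiplying by a nonsingular matrix leaves the solution set unchanged while altering coherence, is easy to demonstrate. The sketch below uses a simple row-equilibration preconditioner purely for illustration; the paper's actual scheme is more sophisticated, and the matrix sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def mutual_coherence(A):
    # Largest absolute inner product between distinct normalized columns.
    cols = A / np.linalg.norm(A, axis=0)
    G = np.abs(cols.T @ cols)
    np.fill_diagonal(G, 0.0)
    return G.max()

# An underdetermined measurement system A x = b (m < n), as in compressive
# sampling of PC coefficients, with badly scaled rows.
m, n = 20, 50
A = rng.normal(size=(m, n)) * rng.uniform(0.1, 3.0, size=(m, 1))
x_true = np.zeros(n); x_true[[3, 17]] = [1.0, -2.0]
b = A @ x_true

# Pre-multiplying by any nonsingular W gives the equivalent system
# W A x = W b, but changes the coherence of the measurement matrix.
W = np.diag(1.0 / np.linalg.norm(A, axis=1))   # row equilibration (toy choice)
coh_before = mutual_coherence(A)
coh_after = mutual_coherence(W @ A)
print(coh_before, coh_after)
```

The true sparse vector satisfies both the original and the preconditioned system exactly, which is the equivalence the abstract relies on.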
 [7] arXiv:1709.07588 [pdf, ps, other]

Title: Abandon Statistical Significance
Subjects: Methodology (stat.ME)
In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold, and only then is consideration, often scant, given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real-world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.
 [8] arXiv:1709.07593 [pdf]

Title: The Long Term Fréchet distribution: Estimation, Properties and its Application
Comments: 13 pages, 2 figures, 7 tables
Journal-ref: Biom Biostat Int J 6(3): 00170
Subjects: Statistics Theory (math.ST)
In this paper a new long-term survival distribution is proposed. The so-called long-term Fr\'echet distribution allows us to fit data where part of the population is not susceptible to the event of interest. This model may be used, for example, in clinical studies where a portion of the population can be cured during a treatment. We present mathematical properties of the new distribution, such as its moments and survival properties, as well as the maximum likelihood estimators (MLEs) of the parameters. A numerical simulation is carried out to verify the performance of the MLEs. Finally, an important application related to the leukemia-free survival times of transplant patients is discussed to illustrate the proposed distribution.
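A long-term (cure-fraction) survival function mixes a point mass of cured individuals with an ordinary survival curve, here Fréchet. A minimal sketch of this construction; the parameter values are illustrative, and the parametrization assumes the standard two-parameter Fréchet CDF exp(-(sigma/t)^alpha).

```python
import numpy as np

def longterm_frechet_survival(t, p, alpha, sigma):
    """Population survival p + (1 - p) * S_Frechet(t): a fraction p of the
    population is cured and never experiences the event of interest."""
    s_frechet = 1.0 - np.exp(-(sigma / t) ** alpha)   # Frechet survival
    return p + (1.0 - p) * s_frechet

t = np.array([0.5, 1.0, 5.0, 50.0, 5000.0])
S = longterm_frechet_survival(t, p=0.3, alpha=1.5, sigma=2.0)
print(S)  # decreases toward the cure fraction p = 0.3
```

The defining feature of a long-term survival model is visible directly: as t grows, the survival curve flattens out at the cure fraction p rather than dropping to zero.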
 [9] arXiv:1709.07616 [pdf, other]

Title: Generalized Bayesian Updating and the Loss-Likelihood Bootstrap
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
In this paper, we revisit the weighted likelihood bootstrap and show that it is well-motivated for Bayesian inference under misspecified models. We extend the underlying idea to a wider family of inferential problems. This allows us to calibrate an analogue of the likelihood function in situations where little is known about the data-generating mechanism. We demonstrate our method on a number of examples.
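The weighted likelihood bootstrap the abstract revisits is simple to sketch in its most basic form: each posterior draw minimizes a randomly reweighted loss. For squared-error loss on the mean, the minimizer is available in closed form, which keeps the example short (data and sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)

# Weighted likelihood bootstrap for the mean of a sample: each posterior
# draw minimizes the reweighted loss sum_i w_i * (x_i - theta)^2 with
# w ~ Dirichlet(1, ..., 1), whose minimizer is the weighted average.
x = rng.normal(loc=2.0, scale=1.0, size=200)
n_draws = 5000
w = rng.dirichlet(np.ones(x.size), size=n_draws)   # one weight vector per draw
theta_draws = w @ x                                 # weighted means

print(theta_draws.mean(), theta_draws.std())
```

The draws center on the sample mean and their spread approximates posterior uncertainty without ever specifying a likelihood, which is the property the paper generalizes to other loss functions.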
 [10] arXiv:1709.07625 [pdf, ps, other]

Title: Total stability of kernel methods
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Regularized empirical risk minimization using kernels and their corresponding reproducing kernel Hilbert spaces (RKHSs) plays an important role in machine learning. However, the kernel actually used often depends on one or a few hyperparameters, or the kernel may even be data-dependent in a much more complicated manner. Examples are Gaussian RBF kernels, kernel learning, and hierarchical Gaussian kernels, which were recently proposed for deep learning. Therefore, the kernel actually used is often computed by a grid search or in an iterative manner and can often only be considered an approximation to the "ideal" or "optimal" kernel. The paper gives conditions under which classical kernel-based methods, based on a convex Lipschitz loss function and on a bounded and smooth kernel, are stable if the probability measure $P$, the regularization parameter $\lambda$, and the kernel $k$ change slightly and simultaneously. Similar results are also given for pairwise learning. The topic of this paper is therefore somewhat more general than classical robust statistics, where usually only the influence of small perturbations of the probability measure $P$ on the estimated function is considered.
 [11] arXiv:1709.07637 [pdf, other]

Title: Hierarchical Kriging for multi-fidelity aero-servo-elastic simulators - Application to extreme loads on wind turbines
Subjects: Computation (stat.CO); Applications (stat.AP)
In the present work, we consider multi-fidelity surrogate modelling to fuse the output of multiple aero-servo-elastic computer simulators of varying complexity. In many instances, predictions from multiple simulators for the same quantity of interest on a wind turbine are available. In this type of situation, there is strong evidence that fusing the output from multiple aero-servo-elastic simulators yields better predictive ability and lower model uncertainty than using any single simulator. Hierarchical Kriging is a multi-fidelity surrogate modelling method in which the Kriging surrogate model of the cheap (low-fidelity) simulator is used as a trend of the Kriging surrogate model of the higher-fidelity simulator. We propose a parametric approach to Hierarchical Kriging where the best surrogate models are selected based on evaluating all possible combinations of the available Kriging parameter candidates. The parametric Hierarchical Kriging approach is illustrated by fusing the extreme flapwise bending moment at the blade root of a large multi-megawatt wind turbine as a function of wind velocity, turbulence and wind shear exponent in the presence of model uncertainty and heterogeneously noisy output. The extreme responses are obtained by two widely accepted wind turbine specific aero-servo-elastic computer simulators, FAST and Bladed. With limited high-fidelity simulations, Hierarchical Kriging produces more accurate predictions of validation data compared to conventional Kriging. In addition, contrary to conventional Kriging, Hierarchical Kriging is shown to be a robust surrogate modelling technique because it is less sensitive to the choice of the Kriging parameters and the choice of the estimation error.
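The core idea of the hierarchy, the low-fidelity surrogate acting as the trend of the high-fidelity one, can be sketched without any Kriging machinery. The example below substitutes a linear discrepancy for the Kriging correction purely to keep the sketch short; the simulators are synthetic stand-ins, not FAST or Bladed.

```python
import numpy as np

# Multi-fidelity idea sketch: the high-fidelity surrogate is a scaling rho
# of the low-fidelity model plus a simple discrepancy delta(x), fitted from
# a few expensive runs. (Hierarchical Kriging uses a Kriging correction; a
# linear delta is used here only for brevity.)
def lofi(x):                # cheap simulator (can be called anywhere)
    return np.sin(x)

def hifi(x):                # expensive simulator (run only at a few points)
    return np.sin(x) + 0.3 * x

x_hf = np.array([0.0, 1.0, 2.5, 4.0, 5.5])   # sparse high-fidelity runs
y_hf = hifi(x_hf)

# Fit y_hf ~ rho * lofi(x) + c0 + c1 * x by least squares.
design = np.column_stack([lofi(x_hf), np.ones_like(x_hf), x_hf])
coef, *_ = np.linalg.lstsq(design, y_hf, rcond=None)

def surrogate(x):
    return coef[0] * lofi(x) + coef[1] + coef[2] * x

x_test = np.linspace(0.0, 6.0, 50)
err = np.max(np.abs(surrogate(x_test) - hifi(x_test)))
print(err)
```

Because the synthetic high-fidelity model here really is a scaled low-fidelity model plus a linear term, five expensive runs recover it essentially exactly, which is the leverage multi-fidelity fusion aims for.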
 [12] arXiv:1709.07638 [pdf, other]

Title: Approximate Bayesian Inference in Linear State Space Models for Intermittent Demand Forecasting at Scale
Authors: Matthias Seeger, Syama Rangapuram, Yuyang Wang, David Salinas, Jan Gasthaus, Tim Januschowski, Valentin Flunkert
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We present a scalable and robust Bayesian inference method for linear state space models. The method is applied to demand forecasting in the context of a large e-commerce platform, paying special attention to intermittent and bursty target statistics. Inference is approximated by the Newton-Raphson algorithm, reduced to linear-time Kalman smoothing, which allows us to operate on problems several orders of magnitude larger than in previous related work. In a study on large real-world sales datasets, our method outperforms competing approaches on fast and medium moving items.
 [13] arXiv:1709.07662 [pdf, ps, other]

Title: Estimating the maximum possible earthquake magnitude using extreme value methodology: the Groningen case
Subjects: Applications (stat.AP); Geophysics (physics.geo-ph)
The area-characteristic, maximum possible earthquake magnitude $T_M$ is required by the earthquake engineering community, disaster management agencies and the insurance industry. The Gutenberg-Richter law predicts that earthquake magnitudes $M$ follow a truncated exponential distribution. In the geophysical literature several estimation procedures have been proposed, see for instance Kijko and Singh (Acta Geophys., 2011) and the references therein. Estimation of $T_M$ is of course an extreme value problem, to which the classical methods for endpoint estimation could be applied. We argue that recent methods on truncated tails at high levels (Beirlant et al., Extremes, 2016; Electron. J. Stat., 2017) constitute a more appropriate setting for this estimation problem. We present upper confidence bounds to quantify uncertainty of the point estimates. We also compare methods from the extreme value and geophysical literature through simulations. Finally, the different methods are applied to the magnitude data for the earthquakes induced by gas extraction in the Groningen province of the Netherlands.
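The setting is easy to reproduce: magnitudes from a truncated exponential (Gutenberg-Richter) law with unknown endpoint $T_M$. The sketch below pairs the simulation with one of the simplest classical endpoint estimators, the sample maximum plus the top spacing; this is only an illustration, not the truncated-tail method of Beirlant et al., and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate Gutenberg-Richter magnitudes: exponential with rate beta,
# truncated to [m0, T_M], where T_M is the unknown maximum possible
# magnitude (inverse-CDF sampling).
m0, T_M, beta, n = 1.5, 4.0, np.log(10), 5000
u = rng.uniform(size=n)
mags = m0 - np.log(1.0 - u * (1.0 - np.exp(-beta * (T_M - m0)))) / beta

# A simple classical endpoint estimate: sample maximum plus top spacing.
m_sorted = np.sort(mags)
T_hat = 2.0 * m_sorted[-1] - m_sorted[-2]
print(m_sorted[-1], T_hat)
```

The sample maximum always undershoots the true endpoint; adding the top spacing corrects part of that bias, and the more refined truncated-tail estimators in the paper address the rest along with uncertainty quantification.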
 [14] arXiv:1709.07710 [pdf, ps, other]

Title: Barker's algorithm for Bayesian inference with intractable likelihoods
Comments: To appear in the Brazilian Journal of Probability and Statistics
Subjects: Computation (stat.CO)
In this expository paper we abstract and describe a simple MCMC scheme for sampling from intractable target densities. The approach was introduced in Gon\c{c}alves et al. (2017a) in the specific context of jump-diffusions, and is based on Barker's algorithm paired with a simple Bernoulli factory type scheme, the so-called 2-coin algorithm. In many settings it is an alternative to the standard pseudo-marginal Metropolis-Hastings method for simulating from intractable target densities. Although Barker's algorithm is well known to be slightly less efficient than Metropolis-Hastings, the key advantage of our approach is that it allows one to implement the "marginal Barker's" algorithm instead of the extended state space pseudo-marginal Metropolis-Hastings, owing to the special form of the accept/reject probability. We illustrate our methodology in the context of Bayesian inference for the discretely observed Wright-Fisher family of diffusions.
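The "special form" of Barker's accept/reject probability is pi(y) / (pi(x) + pi(y)), a logistic function of the log target ratio, which is what makes the 2-coin Bernoulli factory applicable. A minimal sketch of plain Barker's algorithm on a toy target (the Bernoulli factory itself is omitted; target and tuning are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def barker_mcmc(log_target, x0, n_iter, step=1.0):
    """Barker's algorithm with a symmetric random-walk proposal: accept y
    with probability pi(y) / (pi(x) + pi(y)). Slightly less efficient than
    Metropolis-Hastings, but the acceptance probability has the product
    form exploited by the 2-coin algorithm."""
    x = x0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = x + step * rng.standard_normal()
        # accept prob = 1 / (1 + pi(x)/pi(y)), a logistic of the log ratio
        p_accept = 1.0 / (1.0 + np.exp(log_target(x) - log_target(y)))
        if rng.uniform() < p_accept:
            x = y
        chain[i] = x
    return chain

# Standard normal target, started away from the mode.
chain = barker_mcmc(lambda x: -0.5 * x * x, x0=3.0, n_iter=20000)
print(chain.mean(), chain.std())
```

Replacing `p_accept` with `min(1, exp(log_target(y) - log_target(x)))` recovers Metropolis-Hastings; Barker's version never accepts with probability one, which is the source of its slight efficiency loss.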
 [15] arXiv:1709.07716 [pdf, other]

Title: Testing covariate significance in spatial point process first-order intensity
Comments: 22 pages (15 main doc + 7 appendix); 8 figures; 3 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)
Modelling the first-order intensity function is one of the main aims in point process theory, and it has been approached so far from different perspectives. One appealing model describes the intensity as a function of a spatial covariate. In the recent literature, estimation theory and several applications have been developed under this hypothesis, but without formally checking the goodness-of-fit of the model.
In this paper we address this problem and test whether the model is appropriate. We propose a test statistic based on an $L^2$-distance; we prove the asymptotic normality of the statistic and suggest a bootstrap procedure to calibrate the test. We present two applications with real data and a simulation study to better understand the performance of our proposals.
 [16] arXiv:1709.07731 [pdf, ps, other]

Title: Estimate Exchange over Network is Good for Distributed Hard Thresholding Pursuit
Subjects: Machine Learning (stat.ML); Signal Processing (eess.SP)
We investigate an existing distributed algorithm for learning sparse signals or data over networks. The algorithm is iterative and exchanges intermediate estimates of a sparse signal over a network. This learning strategy using exchange of intermediate estimates over the network requires a limited communication overhead for information transmission. Our objective in this article is to show that the strategy is good for learning in spite of limited communication. In pursuit of this objective, we first provide a restricted isometry property (RIP)-based theoretical analysis on convergence of the iterative algorithm. Then, using simulations, we show that the algorithm provides competitive performance in learning sparse signals vis-à-vis an existing alternative distributed algorithm. The alternative distributed algorithm exchanges more information, including observations and system parameters.
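The nonlinearity at the heart of hard thresholding pursuit is the projection onto k-sparse vectors: keep the k largest-magnitude entries, zero the rest. A minimal sketch of that operator (the surrounding distributed iteration is omitted; the test vector is arbitrary):

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest: the
    projection onto k-sparse vectors used by hard thresholding pursuit."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest magnitudes
    out[idx] = x[idx]
    return out

x = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
print(hard_threshold(x, 2))  # [ 0. -3.  0.  2.  0.]
```

In the distributed setting, each node applies this operator to a combination of its own gradient step and the intermediate estimates received from its neighbors.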
 [17] arXiv:1709.07752 [pdf, ps, other]

Title: Bernstein-von Mises theorems for statistical inverse problems II: Compound Poisson processes
Comments: 47 pages
Subjects: Statistics Theory (math.ST)
We study nonparametric Bayesian statistical inference for the parameters governing a pure jump process of the form $$Y_t = \sum_{k=1}^{N(t)} Z_k,~~~ t \ge 0,$$ where $N(t)$ is a standard Poisson process of intensity $\lambda$, and $Z_k$ are drawn i.i.d. from a jump measure $\mu$. A high-dimensional wavelet series prior for the L\'evy measure $\nu = \lambda \mu$ is devised and the posterior distribution arises from observing discrete samples $Y_\Delta, Y_{2\Delta}, \dots, Y_{n\Delta}$ at fixed observation distance $\Delta$, giving rise to a nonlinear inverse inference problem. We derive contraction rates in uniform norm for the posterior distribution around the true L\'evy density that are optimal up to logarithmic factors over H\"older classes, as sample size $n$ increases. We prove a functional Bernstein-von Mises theorem for the distribution functions of both $\mu$ and $\nu$, as well as for the intensity $\lambda$, establishing the fact that the posterior distribution is approximated by an infinite-dimensional Gaussian measure whose covariance structure is shown to attain the Cram\'er-Rao lower bound for this inverse problem. As a consequence, posterior-based inferences, such as nonparametric credible sets, are asymptotically valid and optimal from a frequentist point of view.
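The observation scheme in this abstract is easy to simulate: a compound Poisson process seen only on the grid $\Delta, 2\Delta, \dots$, where each increment is itself compound Poisson. A sketch with standard normal jumps as an example choice of $\mu$ (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate Y_t = sum_{k=1}^{N(t)} Z_k observed at t = Delta, 2*Delta, ...:
# each increment has a Poisson(lambda * Delta) number of jumps, with jumps
# Z_k drawn i.i.d. from the jump measure mu (here N(0, 1) as an example).
lam, Delta, n = 2.0, 0.5, 10000
n_jumps = rng.poisson(lam * Delta, size=n)            # jumps per increment
increments = np.array([rng.standard_normal(k).sum() for k in n_jumps])
Y = np.cumsum(increments)                             # Y_Delta, Y_2Delta, ...

# Sanity checks against theory: E[increment] = lambda * Delta * E[Z] = 0 and
# Var[increment] = lambda * Delta * E[Z^2] = 1 for this parameter choice.
print(increments.mean(), increments.var())
```

The inverse problem the paper studies is recovering the Lévy density from exactly these discretely observed increments, which only reveal the jump measure through a convolution-type (decompounding) relation.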
 [18] arXiv:1709.07778 [pdf, other]

Title: On predictive density estimation with additional information
Comments: 30 pages, 4 Figures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Based on independently distributed $X_1 \sim N_p(\theta_1, \sigma^2_1 I_p)$ and $X_2 \sim N_p(\theta_2, \sigma^2_2 I_p)$, we consider the efficiency of various predictive density estimators for $Y_1 \sim N_p(\theta_1, \sigma^2_Y I_p)$, with the additional information $\theta_1 - \theta_2 \in A$ and known $\sigma^2_1, \sigma^2_2, \sigma^2_Y$. We provide improvements on benchmark predictive densities such as the plug-in, the maximum likelihood, and the minimum risk equivariant predictive densities. Dominance results are obtained for $\alpha$-divergence losses and include Bayesian improvements for reverse Kullback-Leibler loss, and Kullback-Leibler (KL) loss in the univariate case ($p=1$). An ensemble of techniques is exploited, including variance expansion (for KL loss), point estimation duality, and concave inequalities. Representations for Bayesian predictive densities, and in particular for $\hat{q}_{\pi_{U,A}}$ associated with a uniform prior for $\theta=(\theta_1, \theta_2)$ truncated to $\{\theta \in \mathbb{R}^{2p}: \theta_1 - \theta_2 \in A \}$, are established and are used for the Bayesian dominance findings. Finally and interestingly, these Bayesian predictive densities also relate to skew-normal distributions, as well as new forms of such distributions.
 [19] arXiv:1709.07779 [pdf, other]

Title: The GENIUS Approach to Robust Mendelian Randomization Inference
Comments: 41 pages, 3 figures
Subjects: Methodology (stat.ME)
Mendelian randomization (MR) is a popular instrumental variable (IV) approach, in which one or several genetic markers serve as IVs that can be leveraged to recover, under certain conditions, valid inferences about a given exposure-outcome causal association subject to unmeasured confounding. A key IV identification condition known as the exclusion restriction states that the IV has no direct effect on the outcome that is not mediated by the exposure in view. In MR studies, such an assumption requires an unrealistic level of knowledge and understanding of the mechanism by which the genetic markers causally affect the outcome, particularly when a large number of genetic variants are considered as IVs. As a result, possible violation of the exclusion restriction can seldom be ruled out in such MR studies, and if present, such violation can invalidate IV-based inferences even if, unbeknownst to the analyst, confounding is either negligible or absent. To address this concern, we introduce a new class of IV estimators which are robust to violation of the exclusion restriction under a large collection of data generating mechanisms consistent with parametric models commonly assumed in the MR literature. Our approach, which we have named "MR G-Estimation under No Interaction with Unmeasured Selection" (MR GENIUS), may in fact be viewed as a modification to Robins' G-estimation approach that is robust to both additive unmeasured confounding and violation of the exclusion restriction assumption. We also give fairly weak conditions under which MR GENIUS is also robust to unmeasured confounding of the IV-outcome relation, another possible violation of a key IV identification condition.
 [20] arXiv:1709.07796 [pdf, other]

Title: On overfitting and asymptotic bias in batch reinforcement learning with partial observability
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
This paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. Our analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations. Finally, we also discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.
 [21] arXiv:1709.07798 [pdf]

Title: A multivariate zero-inflated logistic model for microbiome relative abundance data
Authors: Zhigang Li, Katherine Lee, Margaret R. Karagas, Juliette C. Madan, Anne G. Hoen, Hongzhe Li
Comments: Corresponding contact: Zhigang.Li@dartmouth.edu
Subjects: Applications (stat.AP)
The human microbiome plays critical roles in human health and has been linked to many diseases. While advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the complex interplay between human microbiome and disease risk factors due to the complicated nature of microbiome data. An excessive number of zero values, high dimensionality, the hierarchical phylogenetic tree and the compositional structure are compounded and consequently make existing methods inadequate to address these issues appropriately. We propose a multivariate two-part model, the zero-inflated logistic normal (ZILN) model, to analyze the association of disease risk factors with individual microbial taxa and overall microbial community composition. This approach can naturally handle excessive numbers of zeros and the compositional data structure with the zero part and the logistic-normal part of the model. For parameter estimation, an estimating-equations approach is employed, enabling us to address the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree and the compositional data structure. The model can also incorporate standard regularization approaches to deal with high dimensionality. Simulations show that our model outperforms existing methods. The performance of our approach is also demonstrated by applying the model to a real data set.
 [22] arXiv:1709.07842 [pdf, other]

Title: Bayesian Optimization for Parameter Tuning of the XOR Neural Network
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)
When applying Machine Learning techniques to problems, one must select model parameters to ensure that the system converges but also does not become stuck in the objective function's local minima. Tuning these parameters becomes a nontrivial task for large models, and it is not always apparent whether the user has found the optimal parameters. We aim to automate the process of tuning a Neural Network (where only a limited number of parameter search attempts are available) by implementing Bayesian Optimization. In particular, by assigning Gaussian process priors to the parameter space, we utilize Bayesian Optimization to tune an Artificial Neural Network that learns the XOR function, achieving higher prediction accuracy as a result.
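One step of the procedure this abstract describes, a Gaussian-process prior over the parameter space plus an acquisition function to pick the next setting to try, can be sketched compactly. The objective below is a 1-D toy stand-in for validation accuracy (the XOR network, kernel length-scale, and evaluated points are all invented for illustration), and expected improvement is used as the acquisition function.

```python
import numpy as np
from scipy.stats import norm

# One Bayesian-optimization step: condition an RBF-kernel GP on a few
# evaluated hyperparameter settings, then pick the next point to try
# by maximizing expected improvement (EI) over a grid.
def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def objective(x):            # toy "validation accuracy", maximum near x = 0.6
    return -(x - 0.6) ** 2

X = np.array([0.1, 0.5, 0.9])            # settings evaluated so far
y = objective(X)
grid = np.linspace(0.0, 1.0, 101)

K = rbf(X, X) + 1e-8 * np.eye(len(X))    # jitter for numerical stability
Ks = rbf(grid, X)
mu = Ks @ np.linalg.solve(K, y)          # GP posterior mean on the grid
var = 1.0 - np.einsum('ij,ij->i', Ks @ np.linalg.inv(K), Ks)
sd = np.sqrt(np.clip(var, 1e-12, None))

# Expected improvement over the best observed value (maximization).
best = y.max()
z = (mu - best) / sd
ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
x_next = grid[ei.argmax()]
print(x_next)
```

EI trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is what makes the search efficient when each evaluation, training a network, is expensive.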
Cross-lists for Mon, 25 Sep 17
 [23] arXiv:1709.06917 (cross-list from cs.RO) [pdf, other]

Title: Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
Comments: 8 pages, 4 figures, 2 algorithms, 1 table; Video at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high-dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.
 [24] arXiv:1709.06919 (cross-list from cs.RO) [pdf, other]

Title: Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
Comments: 8 pages, 4 figures, 1 algorithm; Video at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damage. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situations and the priors.
 [25] arXiv:1709.07446 (cross-list from q-fin.MF) [pdf, ps, other]

Title: Arbitrage and Geometry
Comments: 22 pages, 9 figures
Subjects: Mathematical Finance (q-fin.MF); Statistics Theory (math.ST)
This article introduces the notion of arbitrage for a situation involving a collection of investments and a payoff matrix describing the return to an investor of each investment under each of a set of possible scenarios. We explain the Arbitrage Theorem, discuss its geometric meaning, and show its equivalence to Farkas' Lemma. We then ask a seemingly innocent question: given a random payoff matrix, what is the probability of an arbitrage opportunity? This question leads to some interesting geometry involving hyperplane arrangements and related topics.
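The existence question at the center of this abstract, is there a portfolio with positive payoff in every scenario, is a feasibility problem that linear programming answers directly; this duality is the content of Farkas' Lemma. A sketch of such a check for one strong notion of arbitrage (the payoff matrices and the box bounds are illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

# Check for strong arbitrage given a payoff matrix A, where A[s, j] is the
# return of investment j under scenario s: we look for a portfolio x with
# A @ x > 0 in every scenario. Maximizing the worst-case payoff t subject
# to A @ x >= t (with box bounds keeping the LP bounded) is a linear
# program; this kind of arbitrage exists iff the optimum t* is positive.
def has_arbitrage(A, tol=1e-9):
    n_scen, n_inv = A.shape
    # Variables z = (x_1, ..., x_n, t); maximize t, i.e. minimize -t.
    c = np.zeros(n_inv + 1); c[-1] = -1.0
    # Scenario constraints -A @ x + t <= 0.
    A_ub = np.hstack([-A, np.ones((n_scen, 1))])
    b_ub = np.zeros(n_scen)
    bounds = [(-1.0, 1.0)] * n_inv + [(0.0, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return bool(res.x[-1] > tol)

print(has_arbitrage(np.array([[1.0, 0.0], [0.0, 1.0]])))    # arbitrage
print(has_arbitrage(np.array([[1.0, -1.0], [-1.0, 1.0]])))  # no arbitrage
```

In the second matrix every portfolio's gain in one scenario is its loss in the other, so no choice of x makes both payoffs positive; geometrically, the no-arbitrage condition says the rows' positive span leaves no strictly positive direction uncovered.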
 [26] arXiv:1709.07534 (cross-list from cs.AI) [pdf, other]

Title: MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Comments: Published in ECML-PKDD 2017 (Applied Data Science Track)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
E-commerce websites such as Amazon, Alibaba, Flipkart, and Walmart sell billions of products. Machine learning (ML) algorithms involving products are often used to improve the customer experience and increase revenue, e.g., product similarity, recommendation, and price estimation. The products are required to be represented as features before training an ML algorithm. In this paper, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense and low-dimensional embedding in which a diverse set of signals related to a product is explicitly injected into its representation. We train a discriminative multi-task bidirectional recurrent neural network (RNN), where the input is a product title fed through a bidirectional RNN and, at the output, product labels corresponding to fifteen different tasks are predicted. The task set covers several intrinsic characteristics of a product, such as price, weight, size, color, popularity, and material. We evaluate the proposed embeddings quantitatively and qualitatively, and demonstrate that they are almost as good as a sparse and extremely high-dimensional TF-IDF representation despite having less than 3% of the TF-IDF dimension. We also use a multimodal autoencoder for comparing products from different language regions and show preliminary yet promising qualitative results.
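The overall shape of the architecture (shared bidirectional encoder, one output head per task) can be sketched as a forward pass; everything here is a placeholder, assuming plain tanh RNN cells and linear heads rather than the paper's exact components:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(x, Wx, Wh, reverse=False):
    # final hidden state of a single-layer tanh RNN over a token sequence
    h = np.zeros(Wh.shape[0])
    for t in (x[::-1] if reverse else x):
        h = np.tanh(Wx @ t + Wh @ h)
    return h

d_in, d_h, seq_len, n_tasks = 16, 8, 5, 15
x = rng.normal(size=(seq_len, d_in))          # placeholder embedded title tokens
Wx_f, Wh_f = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wx_b, Wh_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

# shared product embedding: forward and backward final states, concatenated
emb = np.concatenate([rnn_pass(x, Wx_f, Wh_f),
                      rnn_pass(x, Wx_b, Wh_b, reverse=True)])

# one linear head per task (price, color, material, ...), 3 placeholder classes each
heads = [rng.normal(size=(3, 2 * d_h)) for _ in range(n_tasks)]
logits = [W @ emb for W in heads]
```

Training all fifteen heads jointly against the shared embedding is what injects the diverse product signals into a single low-dimensional representation.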
 [27] arXiv:1709.07601 (cross-list from cs.DS) [pdf, ps, other]

Title: Stochastic Input Models in Online Computing
Authors: Yasushi Kawase
Subjects: Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
In this paper, we study twelve stochastic input models for online problems and reveal the relationships among the competitive ratios for the models. The competitive ratio is defined as the worst ratio between the expected optimal value and the expected profit of the solution obtained by the online algorithm, where the input distribution is restricted according to the model. To handle a broad class of online problems, we use a framework called request-answer games, introduced by Ben-David et al. The stochastic input models are of two types: known distribution and unknown distribution. For each type, we consider six classes of distributions: dependent distributions, deterministic input, independent distributions, identical independent distribution, random order of a deterministic input, and random order of independent distributions. As an application of the models, we consider two basic online problems, variants of the secretary problem and the prophet inequality problem, under the twelve stochastic input models, and illustrate how the competitive ratios differ across the models.
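As a concrete instance of the random-order model mentioned above, the classical secretary rule can be simulated; this toy version is our illustration, not one of the paper's exact variants:

```python
import math
import random

def secretary_trial(n, rng):
    """One run of the classical secretary rule in the random-order model:
    skip the first ~n/e candidates, then accept the first candidate better
    than everything seen so far."""
    vals = list(range(n))
    rng.shuffle(vals)                       # uniformly random arrival order
    cutoff = round(n / math.e)
    best_seen = max(vals[:cutoff])
    for v in vals[cutoff:]:
        if v > best_seen:
            return v == n - 1               # success iff the overall best is picked
    return vals[-1] == n - 1                # rule forces taking the last candidate

rng = random.Random(42)
wins = sum(secretary_trial(50, rng) for _ in range(20000))
rate = wins / 20000                         # concentrates near 1/e ~ 0.368
```

In the random-order model this rule succeeds with probability approaching 1/e, which is exactly the kind of model-dependent guarantee the paper's competitive-ratio comparison is about.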
 [28] arXiv:1709.07808 (cross-list from quant-ph) [pdf, other]

Title: Quantum Memristors in Quantum Photonics
Subjects: Quantum Physics (quant-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We propose a method to build quantum memristors in quantum photonic platforms. We first design an effective beam splitter, tunable in real time, by means of a Mach-Zehnder-type array with two equal 50:50 beam splitters and a tunable retarder, which allows us to control its reflectivity. Then, we show that this tunable beam splitter, when equipped with weak measurements and classical feedback, behaves as a quantum memristor. Indeed, in order to prove its quantumness, we show how to codify quantum information in the coherent beams. Moreover, we estimate the memory capability of the quantum memristor. Finally, we show the feasibility of the proposed setup in integrated quantum photonics.
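The effective tunable beam splitter admits a short numerical sketch under the standard idealization of two 50:50 splitters with a phase retarder in one arm (the paper's phase conventions may differ): the retarder phase phi sweeps the cross-port reflectivity over the full range sin^2(phi/2).

```python
import numpy as np

def mzi_unitary(phi):
    """Effective 2x2 unitary of a Mach-Zehnder-type array:
    50:50 splitter, tunable phase phi in one arm, 50:50 splitter."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # idealized 50:50 splitter
    P = np.diag([1.0, np.exp(1j * phi)])           # tunable retarder
    return H @ P @ H

def reflectivity(phi):
    # cross-port probability; equals sin^2(phi/2), so phi tunes it over [0, 1]
    return abs(mzi_unitary(phi)[0, 1]) ** 2
```

At phi = 0 the device is fully transmissive, at phi = pi fully reflective; the memristive behavior in the paper comes from closing a measurement-and-feedback loop around this phi.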
 [29] arXiv:1709.07848 (cross-list from quant-ph) [pdf, other]

Title: Generalized Quantum Reinforcement Learning with Quantum Technologies
Subjects: Quantum Physics (quant-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose a protocol to perform generalized quantum reinforcement learning with quantum technologies. At variance with recent results on quantum reinforcement learning with superconducting circuits [L. Lamata, Sci. Rep. 7, 1609 (2017)], our current protocol does not require coherent feedback during the learning process, enabling its implementation in a wide variety of quantum systems. We consider diverse possible scenarios for an agent, an environment, and a register that connects them, involving multi-qubit and multi-level systems, as well as open-system dynamics. We finally propose possible implementations of this protocol in trapped ions and superconducting circuits. The field of quantum reinforcement learning with quantum technologies will enable enhanced quantum control, as well as more efficient machine learning calculations.
 [30] arXiv:1709.07871 (cross-list from cs.CV) [pdf, other]

Title: FiLM: Visual Reasoning with a General Conditioning Layer
Comments: Extends arXiv:1707.03017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning (answering image-related questions that require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
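The feature-wise affine transformation itself is one line: each channel c is mapped to gamma_c * x_c + beta_c. A minimal NumPy sketch (in the paper, gamma and beta are produced by a conditioning network, e.g. from the question; here they are plain arrays):

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each channel of a
    feature map by conditioning-derived parameters (one pair per channel)."""
    # features: (batch, channels, height, width); gamma, beta: (batch, channels)
    return gamma[:, :, None, None] * features + beta[:, :, None, None]
```

Because the modulation is per-channel rather than per-element, the conditioning network stays small while still being able to selectively amplify or suppress entire feature maps.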
Replacements for Mon, 25 Sep 17
 [31] arXiv:1605.02408 (replaced) [pdf, ps, other]

Title: Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
 [32] arXiv:1606.05771 (replaced) [pdf, other]

Title: Brief Report on Estimating Regularized Gaussian Networks from Continuous and Ordinal Data
Authors: Sacha Epskamp
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [33] arXiv:1609.07958 (replaced) [pdf, other]

Title: Binary Hypothesis Testing via Measure Transformed Quasi Likelihood Ratio Test
Comments: Important notice: the paper N. Halay and K. Todros, "Plug-in measure-transformed quasi likelihood ratio test for random signal detection," IEEE Signal Processing Letters, vol. 24, no. 6, pp. 838-842, Jun. 2017, refers to the first arXiv version of this article this https URL
Subjects: Methodology (stat.ME)
 [34] arXiv:1610.09572 (replaced) [pdf, ps, other]

Title: Density Tracking by Quadrature for Stochastic Differential Equations
Comments: 38 pages, 4 figures, extensive revisions made to v2, comments welcome
Subjects: Computation (stat.CO); Numerical Analysis (math.NA); Probability (math.PR)
 [35] arXiv:1703.00144 (replaced) [pdf, other]

Title: Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank
Comments: 13 pages, 1 figure
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [36] arXiv:1703.00864 (replaced) [pdf, other]

Title: The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
Subjects: Machine Learning (stat.ML); Computation (stat.CO)
 [37] arXiv:1703.02379 (replaced) [pdf, other]

Title: Global Weisfeiler-Lehman Graph Kernels
Comments: 10 pages, accepted at IEEE ICDM 2017 ("Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs")
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [38] arXiv:1704.00666 (replaced) [pdf, other]
 [39] arXiv:1704.04222 (replaced) [pdf, other]

Title: Learning Latent Representations for Speech Generation and Transformation
Comments: Accepted to Interspeech 2017
Journal-ref: Interspeech 2017, pp. 1273-1277
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
 [40] arXiv:1705.10261 (replaced) [pdf, other]

Title: Sparse Maximum-Entropy Random Graphs with a Given Power-Law Degree Distribution
Subjects: Probability (math.PR); Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Statistics Theory (math.ST); Physics and Society (physics.soc-ph)
 [41] arXiv:1706.03475 (replaced) [pdf, other]

Title: Confident Multiple Choice Learning
Comments: Accepted in ICML 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [42] arXiv:1708.02107 (replaced) [pdf, other]

Title: Adaptive Estimation of Nonparametric Geometric Graphs
Comments: 39 pages, 4 figures; real data experiment (Grévy's zebras in Kenya) added
Subjects: Statistics Theory (math.ST); Probability (math.PR)
 [43] arXiv:1708.04729 (replaced) [pdf, other]

Title: Deconvolutional Paragraph Representation Learning
Comments: Accepted by NIPS 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
 [44] arXiv:1708.09477 (replaced) [pdf, other]

Title: A Compressive Sensing Approach to Community Detection with Applications
Comments: 39 pages, 10 figures; version 2, disabled 'showkeys' package
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
 [45] arXiv:1709.00353 (replaced) [pdf, ps, other]

Title: Gaussian approximation of maxima of Wiener functionals and its application to high-frequency data
Authors: Yuta Koike
Comments: 39 pages. Some typos have been corrected. Some proofs have been rearranged. Some results have been slightly improved
Subjects: Statistics Theory (math.ST); Probability (math.PR)
 [46] arXiv:1709.04702 (replaced) [pdf, other]

Title: Trait evolution with jumps: illusionary normality
Authors: Krzysztof Bartoszek
Comments: this http URL&sprawId=23
Journal-ref: Proceedings of the XXIII National Conference on Applications of Mathematics in Biology and Medicine. 2017, pp. 23-28
Subjects: Populations and Evolution (q-bio.PE); Probability (math.PR); Applications (stat.AP)
 [47] arXiv:1709.07417 (replaced) [pdf, other]

Title: Neural Optimizer Search with Reinforcement Learning
Comments: ICML 2017 conference paper
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)