# Statistics

## New submissions

[ total of 47 entries: 1-47 ]

### New submissions for Mon, 25 Sep 17

[1]
Title: Stability of Spatial Smoothness and Cluster-Size Threshold Estimates in FMRI using AFNI
Subjects: Applications (stat.AP)

In a recent analysis of FMRI datasets [K Mueller et al, Front Hum Neurosci 11:345], the estimated spatial smoothness parameters and the statistical significance of clusters were found to depend strongly on the resampled voxel size (for the same data, over a range of 1 to 3 mm) in one popular FMRI analysis software package (SPM12). High sensitivity of thresholding results to such an arbitrary parameter as the final spatial grid size is an undesirable feature in a processing pipeline. Here, we examine the stability of spatial smoothness and cluster-volume threshold estimates with respect to voxel resampling size in the AFNI software package's pipeline. A publicly available collection of resting-state and task FMRI datasets from 78 subjects was analyzed using standard processing steps in AFNI. We found that the spatial smoothness and cluster-volume thresholds are fairly stable over the voxel resampling size range of 1 to 3 mm, in contradistinction to the reported results from SPM12.

[2]
Title: Decision making and uncertainty quantification for individualized treatments
Subjects: Methodology (stat.ME)

Individualized treatment rules (ITR) can improve health outcomes by recognizing that patients may respond differently to treatment and assigning therapy with the most desirable predicted outcome for each individual. Flexible and efficient prediction models are desired as a basis for such ITRs to handle potentially complex interactions between patient factors and treatment. Modern Bayesian semiparametric and nonparametric regression models provide an attractive avenue in this regard, as these allow natural posterior uncertainty quantification of patient-specific treatment decisions as well as the population-wide value of the prediction-based ITR. In addition, via the use of such models, inference is also available for the value of the optimal ITR. We propose such an approach and implement it using Bayesian Additive Regression Trees (BART), as this model has been shown to perform well in fitting nonparametric regression functions to continuous and binary responses, even with many covariates. It is also computationally efficient for use in practice. With BART we investigate a treatment strategy which utilizes individualized predictions of patient outcomes from BART models. Posterior distributions of patient outcomes under each treatment are used to assign the treatment that maximizes the expected posterior utility. We also describe how to approximate such a treatment policy with a clinically interpretable ITR, and quantify its expected outcome. The proposed method performs very well in extensive simulation studies in comparison with several existing methods. We illustrate the usage of the proposed method to identify an individualized choice of conditioning regimen for patients undergoing hematopoietic cell transplantation and quantify the value of this method of choice in relation to the optimal ITR as well as non-individualized treatment strategies.
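The treatment-assignment step described in the abstract can be sketched with generic posterior draws standing in for BART output. All names and numbers below are illustrative, not the authors' implementation:

```python
import numpy as np

# Stand-in for posterior predictive draws of patient outcomes under each
# treatment: shape (n_draws, n_patients, n_treatments). In the paper these
# draws would come from fitted BART models, not the toy normals used here.
rng = np.random.default_rng(0)
true_mean = np.array([[1.0, 0.2],   # patient 1: treatment 0 clearly better
                      [0.1, 0.8],   # patient 2: treatment 1 clearly better
                      [0.5, 0.5]])  # patient 3: no real difference
draws = rng.normal(true_mean, 0.3, size=(4000, 3, 2))

# Assign each patient the treatment maximizing expected posterior utility
# (here utility is the outcome itself).
expected_utility = draws.mean(axis=0)        # (n_patients, n_treatments)
assignment = expected_utility.argmax(axis=1)

# Posterior uncertainty about the decision:
# P(treatment 1 beats treatment 0) per patient.
p_t1_better = (draws[:, :, 1] > draws[:, :, 0]).mean(axis=0)
print(assignment, p_t1_better)
```

The same draws that drive the assignment also quantify how confident the rule is in each individual decision, which is the uncertainty-quantification point of the abstract.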

[3]
Title: Achieving Parsimony in Bayesian VARs with the Horseshoe Prior
Subjects: Applications (stat.AP)

In the context of a vector autoregression (VAR) model, or any multivariate regression model, the number of relevant predictors may be small relative to the information set available from which to build a prediction equation. It is well known that forecasts based on (un-penalized) least squares estimates can overfit the data and lead to poor predictions. Since the Minnesota prior was proposed (Doan et al. (1984)), many methods have been developed with the aim of improving prediction performance. In this paper we propose the horseshoe prior (Carvalho et al. (2010), Carvalho et al. (2009)) in the context of a Bayesian VAR. The horseshoe prior is a unique shrinkage prior scheme in that it shrinks irrelevant signals rigorously to 0 while allowing large signals to remain large and practically unshrunk. In an empirical study, we show that the horseshoe prior competes favorably with shrinkage schemes commonly used in Bayesian VAR models as well as with a prior that imposes true sparsity in the coefficient vector. Additionally, we propose the use of particle Gibbs with backwards simulation (Lindsten et al. (2012), Andrieu et al. (2010)) for the estimation of the time-varying volatility parameters. A detailed description of all MCMC methods used is provided in the supplementary material, which is available online.
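The horseshoe's "shrink small signals hard, leave large signals alone" behavior comes from a half-Cauchy local scale. A minimal sketch (values and names are illustrative, not the paper's VAR setup):

```python
import numpy as np

rng = np.random.default_rng(0)
p, tau = 1000, 0.5                      # number of coefficients, global scale

# Local scales: half-Cauchy(0, 1), obtained as |Cauchy| draws.
lam = np.abs(rng.standard_cauchy(p))

# Horseshoe prior draws: beta_i ~ N(0, tau^2 * lambda_i^2).
beta = rng.normal(0.0, tau * lam)

# Shrinkage factor kappa_i = 1 / (1 + tau^2 * lambda_i^2) for a unit-variance
# observation: values near 1 mean near-total shrinkage to 0, values near 0
# mean the signal is left practically unshrunk.
kappa = 1.0 / (1.0 + tau**2 * lam**2)
print(kappa.min(), kappa.max())
```

The heavy Cauchy tail of `lam` is what lets a few `kappa` values fall close to 0 (large signals survive) while the peak at zero pushes most of the mass of `kappa` toward 1.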

[4]
Title: Heteroscedastic BART Using Multiplicative Regression Trees
Subjects: Methodology (stat.ME)

Bayesian additive regression trees (BART) has become increasingly popular as a flexible and scalable non-parametric model useful in many modern applied statistics regression problems. It brings many advantages to the practitioner dealing with large and complex non-linear response surfaces, such as a matrix-free formulation and the lack of a requirement to specify a regression basis a priori. However, while flexible in fitting the mean, the basic BART model relies on the standard i.i.d. normal model for the errors. This assumption is unrealistic in many applications. Moreover, in many applied problems understanding the relationship between the variance and predictors can be just as important as that of the mean model. We develop a novel heteroscedastic BART model to alleviate these concerns. Our approach is entirely non-parametric and does not rely on an a priori basis for the variance model. In BART, the conditional mean is modeled as a sum of trees, each of which determines a contribution to the overall mean. In this paper, we model the conditional variance with a product of trees, each of which determines a contribution to the overall variance. We implement the approach and demonstrate it on a simple low-dimensional simulated dataset, a higher-dimensional dataset of used car prices, a fisheries dataset and data from an alcohol consumption study.

[5]
Authors: Kyle Vincent
Subjects: Methodology (stat.ME)

A new approach to estimate population size based on a stratified link-tracing sampling design is presented. The method extends the Frank and Snijders (1994) approach by allowing for heterogeneity in the initial sample selection procedure. Rao-Blackwell estimators and corresponding resampling approximations similar to those detailed in Vincent and Thompson (2017) are explored. An empirical application is provided for a hard-to-reach networked population. The results demonstrate that the approach has much potential for application to such populations. Supplementary materials for this article are available online.

[6]
Title: A preconditioning approach for improved estimation of sparse polynomial chaos expansions
Subjects: Computation (stat.CO)

Compressive sampling has been widely used for sparse polynomial chaos (PC) approximation of stochastic functions. The recovery accuracy of compressive sampling depends on the coherence properties of the measurement matrix. In this paper, we consider preconditioning the measurement matrix. Premultiplying a linear equation system by a non-singular matrix results in an equivalent equation system, but it can change the coherence properties of the preconditioned measurement matrix and lead to a different recovery accuracy. In this work, we propose a preconditioning scheme that significantly improves the coherence properties of the measurement matrix, and, using theoretical motivation and numerical examples, we highlight the promise of the proposed approach in improving the accuracy of estimated polynomial chaos expansions.
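The coherence property in question is directly computable. The sketch below measures mutual coherence before and after an illustrative row-scaling preconditioner (this is a generic example of premultiplication changing coherence, not the scheme proposed in the paper):

```python
import numpy as np

def mutual_coherence(A):
    # mu(A) = max_{i != j} |<a_i, a_j>| / (||a_i|| * ||a_j||) over columns a_i.
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 100))                  # random measurement matrix
P = np.diag(1.0 / np.linalg.norm(A, axis=1))    # illustrative preconditioner
mu_raw, mu_pre = mutual_coherence(A), mutual_coherence(P @ A)
print(mu_raw, mu_pre)
```

Since `P` is non-singular, `P @ A` defines an equivalent linear system, yet its Gram structure, and hence its coherence, differs from that of `A`.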

[7]
Title: Abandon Statistical Significance
Subjects: Methodology (stat.ME)

In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration--often scant--given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.

[8]
Title: The Long Term Fréchet distribution: Estimation, Properties and its Application
Comments: 13 pages, 2 figures, 7 tables
Journal-ref: Biom Biostat Int J 6(3): 00170
Subjects: Statistics Theory (math.ST)

In this paper a new long-term survival distribution is proposed. The so-called long-term Fréchet distribution allows us to fit data where a part of the population is not susceptible to the event of interest. This model may be used, for example, in clinical studies where a portion of the population can be cured during a treatment. We present an account of the mathematical properties of the new distribution, such as its moments and survival properties, as well as the maximum likelihood estimators (MLEs) for the parameters. A numerical simulation is carried out in order to verify the performance of the MLEs. Finally, an important application related to the leukemia-free survival times of transplant patients is discussed to illustrate our proposed distribution.

[9]
Title: Generalized Bayesian Updating and the Loss-Likelihood Bootstrap
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

In this paper, we revisit the weighted likelihood bootstrap and show that it is well-motivated for Bayesian inference under misspecified models. We extend the underlying idea to a wider family of inferential problems. This allows us to calibrate an analogue of the likelihood function in situations where little is known about the data-generating mechanism. We demonstrate our method on a number of examples.

[10]
Title: Total stability of kernel methods
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Regularized empirical risk minimization using kernels and their corresponding reproducing kernel Hilbert spaces (RKHSs) plays an important role in machine learning. However, the kernel actually used often depends on one or a few hyperparameters, or the kernel is even data-dependent in a much more complicated manner. Examples are Gaussian RBF kernels, kernel learning, and hierarchical Gaussian kernels, which were recently proposed for deep learning. Therefore, the kernel actually used is often computed by a grid search or in an iterative manner and can often only be considered as an approximation to the "ideal" or "optimal" kernel. The paper gives conditions under which classical kernel-based methods, based on a convex Lipschitz loss function and on a bounded and smooth kernel, are stable if the probability measure $P$, the regularization parameter $\lambda$, and the kernel $k$ may slightly change in a simultaneous manner. Similar results are also given for pairwise learning. Therefore, the topic of this paper is somewhat more general than in classical robust statistics, where usually only the influence of small perturbations of the probability measure $P$ on the estimated function is considered.
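The kind of simultaneous perturbation the paper studies can be illustrated with plain kernel ridge regression and a Gaussian RBF kernel: jointly nudging the kernel hyperparameter and the regularization parameter should move the estimated function only slightly. A sketch with arbitrary constants:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(Xtr, ytr, Xte, gamma, lam):
    # Regularized ERM in the RKHS with squared loss:
    # alpha = (K + n * lam * I)^{-1} y, f(x) = k(x, Xtr) @ alpha.
    n = len(ytr)
    K = rbf_kernel(Xtr, Xtr, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), ytr)
    return rbf_kernel(Xte, Xtr, gamma) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
grid = np.linspace(-3, 3, 50)[:, None]

# Slightly change gamma (the kernel) and lam (the regularization) together;
# the two fitted functions should stay close, as total stability predicts.
f1 = krr_fit_predict(X, y, grid, gamma=1.0, lam=1e-3)
f2 = krr_fit_predict(X, y, grid, gamma=1.1, lam=1.2e-3)
print(np.max(np.abs(f1 - f2)))
```

Here only $\lambda$ and $k$ move; perturbing $P$ as well would amount to re-drawing the training sample from a slightly different distribution.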

[11]
Title: Hierarchical Kriging for multi-fidelity aero-servo-elastic simulators - Application to extreme loads on wind turbines
Subjects: Computation (stat.CO); Applications (stat.AP)

In the present work, we consider multi-fidelity surrogate modelling to fuse the output of multiple aero-servo-elastic computer simulators of varying complexity. In many instances, predictions from multiple simulators for the same quantity of interest on a wind turbine are available. In this type of situation, there is strong evidence that fusing the output from multiple aero-servo-elastic simulators yields better predictive ability and lower model uncertainty than using any single simulator. Hierarchical Kriging is a multi-fidelity surrogate modelling method in which the Kriging surrogate model of the cheap (low-fidelity) simulator is used as a trend of the Kriging surrogate model of the higher fidelity simulator. We propose a parametric approach to Hierarchical Kriging where the best surrogate models are selected based on evaluating all possible combinations of the available Kriging parameter candidates. The parametric Hierarchical Kriging approach is illustrated by fusing the extreme flapwise bending moment at the blade root of a large multi-megawatt wind turbine as a function of wind velocity, turbulence and wind shear exponent in the presence of model uncertainty and heterogeneously noisy output. The extreme responses are obtained by two widely accepted wind turbine specific aero-servo-elastic computer simulators, FAST and Bladed. With limited high-fidelity simulations, Hierarchical Kriging produces more accurate predictions of validation data compared to conventional Kriging. In addition, contrary to conventional Kriging, Hierarchical Kriging is shown to be a robust surrogate modelling technique because it is less sensitive to the choice of the Kriging parameters and the choice of the estimation error.

[12]
Title: Approximate Bayesian Inference in Linear State Space Models for Intermittent Demand Forecasting at Scale
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We present a scalable and robust Bayesian inference method for linear state space models. The method is applied to demand forecasting in the context of a large e-commerce platform, paying special attention to intermittent and bursty target statistics. Inference is approximated by the Newton-Raphson algorithm, reduced to linear-time Kalman smoothing, which allows us to operate on several orders of magnitude larger problems than previous related work. In a study on large real-world sales datasets, our method outperforms competing approaches on fast and medium moving items.
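The linear-time Kalman recursion at the core of the method can be sketched for the simplest state space model, a local level observed with noise. The parameters below are illustrative, and the paper's model and Newton-Raphson approximation are considerably richer than this:

```python
import numpy as np

def kalman_filter(y, a1=0.0, p1=10.0, sigma_eps2=1.0, sigma_eta2=0.1):
    """Linear-time Kalman filter for the local-level model
    y_t = mu_t + eps_t,  mu_{t+1} = mu_t + eta_t."""
    n = len(y)
    a, p = a1, p1
    filtered = np.empty(n)
    for t in range(n):
        # Measurement update.
        f = p + sigma_eps2          # one-step prediction variance of y_t
        k = p / f                   # Kalman gain
        a = a + k * (y[t] - a)
        p = p * (1 - k)
        filtered[t] = a
        # Time update (random-walk state).
        p = p + sigma_eta2
    return filtered

rng = np.random.default_rng(2)
mu = np.cumsum(rng.normal(0, 0.3, 500))   # latent level (random walk)
y = mu + rng.normal(0, 1.0, 500)          # noisy observations
est = kalman_filter(y, sigma_eps2=1.0, sigma_eta2=0.09)
print(np.mean((est - mu) ** 2), np.mean((y - mu) ** 2))
```

The single forward pass costs O(n), which is the property that lets the approach scale to very large catalogs of time series.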

[13]
Title: Estimating the maximum possible earthquake magnitude using extreme value methodology: the Groningen case
Subjects: Applications (stat.AP); Geophysics (physics.geo-ph)

The area-characteristic, maximum possible earthquake magnitude $T_M$ is required by the earthquake engineering community, disaster management agencies and the insurance industry. The Gutenberg-Richter law predicts that earthquake magnitudes $M$ follow a truncated exponential distribution. In the geophysical literature several estimation procedures were proposed, see for instance Kijko and Singh (Acta Geophys., 2011) and the references therein. Estimation of $T_M$ is of course an extreme value problem to which the classical methods for endpoint estimation could be applied. We argue that recent methods on truncated tails at high levels (Beirlant et al., Extremes, 2016; Electron. J. Stat., 2017) constitute a more appropriate setting for this estimation problem. We present upper confidence bounds to quantify uncertainty of the point estimates. We also compare methods from the extreme value and geophysical literature through simulations. Finally, the different methods are applied to the magnitude data for the earthquakes induced by gas extraction in the Groningen province of the Netherlands.
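A small simulation under the truncated Gutenberg-Richter model shows why dedicated endpoint estimation is needed: the sample maximum systematically falls short of $T_M$. The maximum-plus-spacing correction below is a classical naive endpoint estimator used only for illustration, not one of the methods compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
m0, T, b, n = 1.5, 5.0, np.log(10), 5000   # b corresponds to a log10 b-value of 1

# Inverse-CDF sampling from the exponential distribution truncated to [m0, T]:
# F(m) = (1 - exp(-b (m - m0))) / (1 - exp(-b (T - m0))).
u = rng.uniform(size=n)
c = 1.0 - np.exp(-b * (T - m0))
m = m0 - np.log(1.0 - u * c) / b

m_sorted = np.sort(m)
naive = m_sorted[-1]                                     # sample maximum: < T
spacing = m_sorted[-1] + (m_sorted[-1] - m_sorted[-2])   # simple endpoint correction
print(naive, spacing)
```

Because the truncated exponential has a thin right tail, even 5000 magnitudes typically leave a visible gap between the observed maximum and the true endpoint $T$, which the spacing term partially bridges.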

[14]
Title: Barker's algorithm for Bayesian inference with intractable likelihoods
Comments: To appear in the Brazilian Journal of Probability and Statistics
Subjects: Computation (stat.CO)

In this expository paper we abstract and describe a simple MCMC scheme for sampling from intractable target densities. The approach was introduced in Gonçalves et al. (2017a) in the specific context of jump-diffusions, and is based on Barker's algorithm paired with a simple Bernoulli-factory-type scheme, the so-called 2-coin algorithm. In many settings it is an alternative to the standard pseudo-marginal Metropolis-Hastings method for simulating from intractable target densities. Although Barker's algorithm is well known to be slightly less efficient than Metropolis-Hastings, the key advantage of our approach is that it allows one to implement the "marginal Barker's" algorithm instead of the extended-state-space pseudo-marginal Metropolis-Hastings, owing to the special form of the accept/reject probability. We illustrate our methodology in the context of Bayesian inference for the discretely observed Wright-Fisher family of diffusions.
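For a tractable target, Barker's accept/reject step looks as follows (a sketch with a symmetric random-walk proposal; in the intractable setting of the paper this probability is realized by the 2-coin Bernoulli factory rather than evaluated directly):

```python
import numpy as np

def barker_mcmc(log_pi, x0, n_iter=20000, step=1.0, seed=4):
    """Barker's algorithm with a symmetric random-walk proposal:
    accept y with probability pi(y) / (pi(x) + pi(y))."""
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = x + step * rng.normal()
        # pi(y) / (pi(x) + pi(y)) = logistic(log pi(y) - log pi(x)),
        # written here in a numerically convenient form.
        p_acc = 1.0 / (1.0 + np.exp(log_pi(x) - log_pi(y)))
        if rng.uniform() < p_acc:
            x = y
        chain[i] = x
    return chain

# Toy target: standard normal. The special form above, unlike the
# Metropolis-Hastings min(1, ratio), is what admits a Bernoulli-factory
# implementation when pi can only be evaluated unbiasedly.
chain = barker_mcmc(lambda x: -0.5 * x * x, x0=0.0)
print(chain.mean(), chain.var())
```

Note that the Barker probability never exceeds 1/2 for a symmetric proposal, which is the source of the modest efficiency loss relative to Metropolis-Hastings.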

[15]
Title: Testing covariate significance in spatial point process first-order intensity
Comments: 22 pages (15 main doc + 7 appendix); 8 figures; 3 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)

Modelling the first-order intensity function is one of the main aims in point process theory, and it has been approached so far from different perspectives. One appealing model describes the intensity as a function of a spatial covariate. In the recent literature, estimation theory and several applications have been developed under this hypothesis, but without formally checking the goodness-of-fit of the model.
In this paper we address this problem and test whether the model is appropriate. We propose a test statistic based on a $L^2$-distance; we prove the asymptotic normality of the statistic and suggest a bootstrap procedure to calibrate the test. We present two applications with real data and a simulation study to better understand the performance of our proposals.

[16]
Title: Estimate Exchange over Network is Good for Distributed Hard Thresholding Pursuit
Subjects: Machine Learning (stat.ML); Signal Processing (eess.SP)

We investigate an existing distributed algorithm for learning sparse signals or data over networks. The algorithm is iterative and exchanges intermediate estimates of a sparse signal over a network. This learning strategy using exchange of intermediate estimates over the network requires a limited communication overhead for information transmission. Our objective in this article is to show that the strategy is good for learning in spite of limited communication. In pursuit of this objective, we first provide a restricted isometry property (RIP)-based theoretical analysis on convergence of the iterative algorithm. Then, using simulations, we show that the algorithm provides competitive performance in learning sparse signals vis-a-vis an existing alternate distributed algorithm. The alternate distributed algorithm exchanges more information including observations and system parameters.
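A centralized sketch of the hard-thresholding iteration underlying such algorithms is given below; the distributed version runs updates of this style per node and exchanges the intermediate estimates over the network. Step size, dimensions, and the use of plain iterative hard thresholding (rather than the paper's exact pursuit variant) are illustrative:

```python
import numpy as np

def hard_threshold(x, k):
    # Keep the k largest-magnitude entries, zero out the rest.
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht(A, y, k, n_iter=300, mu=0.5):
    # Iterative hard thresholding: x <- H_k(x + mu * A^T (y - A x)).
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + mu * A.T @ (y - A @ x), k)
    return x

rng = np.random.default_rng(5)
n, p, k = 80, 200, 5
A = rng.normal(size=(n, p)) / np.sqrt(n)   # near-unit-norm columns (RIP-friendly)
support = rng.choice(p, k, replace=False)
x_true = np.zeros(p)
x_true[support] = np.sign(rng.normal(size=k)) * rng.uniform(1.0, 3.0, k)
y = A @ x_true                             # noiseless sparse recovery problem
x_hat = iht(A, y, k)
print(np.linalg.norm(x_hat - x_true))
```

The RIP-style scaling of `A` is what makes the gradient-plus-threshold iteration contract toward the true sparse signal.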

[17]
Title: Bernstein–von Mises theorems for statistical inverse problems II: Compound Poisson processes
Subjects: Statistics Theory (math.ST)

We study nonparametric Bayesian statistical inference for the parameters governing a pure jump process of the form $$Y_t = \sum_{k=1}^{N(t)} Z_k,~~~ t \ge 0,$$ where $N(t)$ is a standard Poisson process of intensity $\lambda$, and $Z_k$ are drawn i.i.d. from jump measure $\mu$. A high-dimensional wavelet series prior for the L\'evy measure $\nu = \lambda \mu$ is devised and the posterior distribution arises from observing discrete samples $Y_\Delta, Y_{2\Delta}, \dots, Y_{n\Delta}$ at fixed observation distance $\Delta$, giving rise to a nonlinear inverse inference problem. We derive contraction rates in uniform norm for the posterior distribution around the true L\'evy density that are optimal up to logarithmic factors over H\"older classes, as sample size $n$ increases. We prove a functional Bernstein-von Mises theorem for the distribution functions of both $\mu$ and $\nu$, as well as for the intensity $\lambda$, establishing the fact that the posterior distribution is approximated by an infinite-dimensional Gaussian measure whose covariance structure is shown to attain the Cram\'er-Rao lower bound for this inverse problem. As a consequence, posterior-based inferences, such as nonparametric credible sets, are asymptotically valid and optimal from a frequentist point of view.
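The discrete observations $Y_\Delta, Y_{2\Delta}, \dots$ are easy to simulate, since each increment is itself compound Poisson with a Poisson$(\lambda\Delta)$ number of jumps. A quick sanity check against the moment identities $\mathbb{E}[\text{increment}] = \lambda\Delta\,\mathbb{E}[Z]$ and $\mathrm{Var} = \lambda\Delta\,\mathbb{E}[Z^2]$, with an illustrative normal jump distribution:

```python
import numpy as np

def compound_poisson_increments(n, delta, lam, jump_sampler, rng):
    """Increments of Y_t = sum_{k <= N(t)} Z_k over windows of length delta:
    each increment sums a Poisson(lam * delta) number of i.i.d. jumps."""
    counts = rng.poisson(lam * delta, size=n)
    return np.array([jump_sampler(c, rng).sum() for c in counts])

rng = np.random.default_rng(6)
lam, delta, n = 2.0, 0.5, 50000
inc = compound_poisson_increments(
    n, delta, lam, lambda c, r: r.normal(1.0, 0.2, c), rng)

# E[Z] = 1, E[Z^2] = 1 + 0.2^2 = 1.04, so mean should be near
# lam * delta * 1 = 1.0 and variance near lam * delta * 1.04 = 1.04.
print(inc.mean(), inc.var())
```

The inverse problem in the paper runs this map the other way: from the law of these increments back to $(\lambda, \mu)$.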

[18]
Title: On predictive density estimation with additional information
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

Based on independently distributed $X_1 \sim N_p(\theta_1, \sigma^2_1 I_p)$ and $X_2 \sim N_p(\theta_2, \sigma^2_2 I_p)$, we consider the efficiency of various predictive density estimators for $Y_1 \sim N_p(\theta_1, \sigma^2_Y I_p)$, with the additional information $\theta_1 - \theta_2 \in A$ and known $\sigma^2_1, \sigma^2_2, \sigma^2_Y$. We provide improvements on benchmark predictive densities such as plug-in, the maximum likelihood, and the minimum risk equivariant predictive densities. Dominance results are obtained for $\alpha-$divergence losses and include Bayesian improvements for reverse Kullback-Leibler loss, and Kullback-Leibler (KL) loss in the univariate case ($p=1$). An ensemble of techniques are exploited, including variance expansion (for KL loss), point estimation duality, and concave inequalities. Representations for Bayesian predictive densities, and in particular for $\hat{q}_{\pi_{U,A}}$ associated with a uniform prior for $\theta=(\theta_1, \theta_2)$ truncated to $\{\theta \in \mathbb{R}^{2p}: \theta_1 - \theta_2 \in A \}$, are established and are used for the Bayesian dominance findings. Finally and interestingly, these Bayesian predictive densities also relate to skew-normal distributions, as well as new forms of such distributions.

[19]
Title: The GENIUS Approach to Robust Mendelian Randomization Inference
Subjects: Methodology (stat.ME)

Mendelian randomization (MR) is a popular instrumental variable (IV) approach, in which one or several genetic markers serve as IVs that can be leveraged to recover, under certain conditions, valid inferences about a given exposure-outcome causal association subject to unmeasured confounding. A key IV identification condition, known as the exclusion restriction, states that the IV has no direct effect on the outcome that is not mediated by the exposure in view. In MR studies, such an assumption requires an unrealistic level of knowledge and understanding of the mechanism by which the genetic markers causally affect the outcome, particularly when a large number of genetic variants are considered as IVs. As a result, possible violation of the exclusion restriction can seldom be ruled out in such MR studies, and if present, such violation can invalidate IV-based inferences even if, unbeknownst to the analyst, confounding is either negligible or absent. To address this concern, we introduce a new class of IV estimators which are robust to violation of the exclusion restriction under a large collection of data generating mechanisms consistent with parametric models commonly assumed in the MR literature. Our approach, which we have named "MR G-Estimation under No Interaction with Unmeasured Selection" (MR GENIUS), may in fact be viewed as a modification of Robins' G-estimation approach that is robust to both additive unmeasured confounding and violation of the exclusion restriction assumption. We also give fairly weak conditions under which MR GENIUS is also robust to unmeasured confounding of the IV-outcome relation, another possible violation of a key IV identification condition.

[20]
Title: On overfitting and asymptotic bias in batch reinforcement learning with partial observability
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

This paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. Our analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations. Finally, we also discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.

[21]
Title: A multivariate zero-inflated logistic model for microbiome relative abundance data
Subjects: Applications (stat.AP)

The human microbiome plays critical roles in human health and has been linked to many diseases. While advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the complex interplay between the human microbiome and disease risk factors due to the complicated nature of microbiome data. An excessive number of zero values, high dimensionality, a hierarchical phylogenetic tree and the compositional structure are compounded, and consequently existing methods are inadequate to address these issues appropriately. We propose a multivariate two-part model, the zero-inflated logistic-normal (ZILN) model, to analyze the association of disease risk factors with individual microbial taxa and overall microbial community composition. This approach naturally handles excessive numbers of zeros and the compositional data structure through the zero part and the logistic-normal part of the model. For parameter estimation, an estimating equations approach is employed, which enables us to address the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree and the compositional data structure. The model is able to incorporate standard regularization approaches to deal with high dimensionality. Simulations show that our model outperforms existing methods. The performance of our approach is also demonstrated through an application to a real data set.
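A generative sketch of zero-inflated compositional data in the spirit of this two-part structure (function name and parameterization are illustrative, not the authors' model or code):

```python
import numpy as np

def sample_ziln(n, d, prob_zero, mu, sigma, rng):
    """Zero-inflated logistic-normal relative abundances (toy generator):
    a Bernoulli 'zero part' decides which taxa are absent, and the remaining
    taxa receive logistic-normal weights renormalized to sum to one."""
    W = np.exp(rng.normal(mu, sigma, size=(n, d)))   # log-normal weights
    present = rng.uniform(size=(n, d)) > prob_zero   # zero part
    W = W * present
    # Guard: keep at least one taxon present per sample.
    empty = ~present.any(axis=1)
    W[empty, 0] = 1.0
    return W / W.sum(axis=1, keepdims=True)          # compositional constraint

rng = np.random.default_rng(7)
X = sample_ziln(500, 20, prob_zero=0.4, mu=0.0, sigma=1.0, rng=rng)
print(X.shape, (X == 0).mean())
```

The renormalization step is what induces the negative inter-taxa correlations that the estimating-equations machinery in the paper has to accommodate.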

[22]
Title: Bayesian Optimization for Parameter Tuning of the XOR Neural Network
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

When applying Machine Learning techniques to problems, one must select model parameters to ensure that the system converges but also does not become stuck in a local minimum of the objective function. Tuning these parameters becomes a non-trivial task for large models, and it is not always apparent whether the user has found the optimal parameters. We aim to automate the process of tuning a Neural Network (where only a limited number of parameter search attempts are available) by implementing Bayesian Optimization. In particular, by assigning Gaussian Process priors to the parameter space, we utilize Bayesian Optimization to tune an Artificial Neural Network used to learn the XOR function, with the result of achieving higher prediction accuracy.
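The Gaussian-process-plus-expected-improvement loop can be sketched on a toy one-dimensional objective standing in for validation loss as a function of a single tuning parameter. Everything below (kernel, lengthscale, the synthetic objective) is illustrative; the paper tunes an actual XOR network:

```python
import numpy as np
from math import erf

def rbf(a, b, ls=0.7):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    # Standard GP regression with zero mean and unit-variance RBF kernel.
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

def expected_improvement(mean, var, best):
    # EI for minimization: E[max(best - f, 0)] under the GP posterior.
    s = np.sqrt(var)
    z = (best - mean) / s
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    return s * (z * Phi + phi)

# Toy stand-in for "validation loss as a function of log10(learning rate)".
f = lambda x: 0.3 * (x + 2.0) ** 2 + 0.05 * np.sin(5 * x)  # minimum near x = -2

x_grid = np.linspace(-4, 0, 200)
x_obs = np.array([-3.5, -0.5])
y_obs = f(x_obs)
for _ in range(8):                    # limited budget of search attempts
    mean, var = gp_posterior(x_obs, y_obs, x_grid)
    x_next = x_grid[np.argmax(expected_improvement(mean, var, y_obs.min()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))
print(x_obs[np.argmin(y_obs)])
```

In the paper's setting, evaluating `f` means training the XOR network once with the proposed parameters, which is exactly why a small evaluation budget matters.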

### Cross-lists for Mon, 25 Sep 17

[23]  arXiv:1709.06917 (cross-list from cs.RO) [pdf, other]
Title: Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
Comments: 8 pages, 4 figures, 2 algorithms, 1 table; Video at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high-dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.

[24]  arXiv:1709.06919 (cross-list from cs.RO) [pdf, other]
Title: Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
Comments: 8 pages, 4 figures, 1 algorithm; Video at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damage. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situation and the priors.

[25]  arXiv:1709.07446 (cross-list from q-fin.MF) [pdf, ps, other]
Title: Arbitrage and Geometry
Subjects: Mathematical Finance (q-fin.MF); Statistics Theory (math.ST)

This article introduces the notion of arbitrage for a situation involving a collection of investments and a payoff matrix describing the return to an investor of each investment under each of a set of possible scenarios. We explain the Arbitrage Theorem, discuss its geometric meaning, and show its equivalence to Farkas' Lemma. We then ask a seemingly innocent question: given a random payoff matrix, what is the probability of an arbitrage opportunity? This question leads to some interesting geometry involving hyperplane arrangements and related topics.

[26]  arXiv:1709.07534 (cross-list from cs.AI) [pdf, other]
Title: MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Comments: Published in ECML-PKDD 2017 (Applied Data Science Track)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

E-commerce websites such as Amazon, Alibaba, Flipkart, and Walmart sell billions of products. Machine learning (ML) algorithms involving products are often used to improve the customer experience and increase revenue, e.g., product similarity, recommendation, and price estimation. The products are required to be represented as features before training an ML algorithm. In this paper, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense and low-dimensional embedding where a diverse set of signals related to a product are explicitly injected into its representation. We train a Discriminative Multi-task Bidirectional Recurrent Neural Network (RNN), where the input is a product title fed through a Bidirectional RNN and at the output, product labels corresponding to fifteen different tasks are predicted. The task set includes several intrinsic characteristics of a product such as price, weight, size, color, popularity, and material. We evaluate the proposed embedding quantitatively and qualitatively. We demonstrate that it is almost as good as the sparse and extremely high-dimensional TF-IDF representation in spite of having less than 3% of the TF-IDF dimension. We also use a multimodal autoencoder for comparing products from different language-regions and show preliminary yet promising qualitative results.

[27]  arXiv:1709.07601 (cross-list from cs.DS) [pdf, ps, other]
Title: Stochastic Input Models in Online Computing
Authors: Yasushi Kawase
Subjects: Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)

In this paper, we study twelve stochastic input models for online problems and reveal the relationships among the competitive ratios for the models. The competitive ratio is defined as the worst ratio between the expected optimal value and the expected profit of the solution obtained by the online algorithm, where the input distribution is restricted according to the model. To handle a broad class of online problems, we use the request-answer-games framework introduced by Ben-David et al. The stochastic input models are of two types: known distribution and unknown distribution. For each type, we consider six classes of distributions: dependent distributions, deterministic input, independent distributions, identical independent distributions, random order of a deterministic input, and random order of independent distributions. As an application of the models, we consider two basic online problems, which are variants of the secretary problem and the prophet inequality problem, under the twelve stochastic input models. These problems illustrate the differences among the competitive ratios.

[28]  arXiv:1709.07808 (cross-list from quant-ph) [pdf, other]
Title: Quantum Memristors in Quantum Photonics
Subjects: Quantum Physics (quant-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We propose a method to build quantum memristors in quantum photonic platforms. We first design an effective beam splitter, tunable in real time, by means of a Mach-Zehnder-type array with two equal 50:50 beam splitters and a tunable retarder, which allows us to control its reflectivity. Then, we show that this tunable beam splitter, when equipped with weak measurements and classical feedback, behaves as a quantum memristor. Indeed, in order to prove its quantumness, we show how to encode quantum information in the coherent beams. Moreover, we estimate the memory capability of the quantum memristor. Finally, we show the feasibility of the proposed setup in integrated quantum photonics.

[29]  arXiv:1709.07848 (cross-list from quant-ph) [pdf, other]
Title: Generalized Quantum Reinforcement Learning with Quantum Technologies
Subjects: Quantum Physics (quant-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose a protocol to perform generalized quantum reinforcement learning with quantum technologies. At variance with recent results on quantum reinforcement learning with superconducting circuits [L. Lamata, Sci. Rep. 7, 1609 (2017)], in our current protocol coherent feedback during the learning process is not required, enabling its implementation in a wide variety of quantum systems. We consider diverse possible scenarios for an agent, an environment, and a register that connects them, involving multiqubit and multilevel systems, as well as open-system dynamics. We finally propose possible implementations of this protocol in trapped ions and superconducting circuits. The field of quantum reinforcement learning with quantum technologies will enable enhanced quantum control, as well as more efficient machine learning calculations.

[30]  arXiv:1709.07871 (cross-list from cs.CV) [pdf, other]
Title: FiLM: Visual Reasoning with a General Conditioning Layer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.

### Replacements for Mon, 25 Sep 17

[31]  arXiv:1605.02408 (replaced) [pdf, ps, other]
Title: Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
[32]  arXiv:1606.05771 (replaced) [pdf, other]
Title: Brief Report on Estimating Regularized Gaussian Networks from Continuous and Ordinal Data
Authors: Sacha Epskamp
Subjects: Methodology (stat.ME); Applications (stat.AP)
[33]  arXiv:1609.07958 (replaced) [pdf, other]
Title: Binary Hypothesis Testing via Measure Transformed Quasi Likelihood Ratio Test
Comments: Important notice - The paper: N. Halay and K. Todros, "Plug-in measure-transformed quasi likelihood ratio test for random signal detection," IEEE Signal Processing Letters, vol. 24, no. 6, pp. 838-842, Jun. 2017, refers to the first arXiv version of this article (this https URL)
Subjects: Methodology (stat.ME)
[34]  arXiv:1610.09572 (replaced) [pdf, ps, other]
Title: Density Tracking by Quadrature for Stochastic Differential Equations
Subjects: Computation (stat.CO); Numerical Analysis (math.NA); Probability (math.PR)
[35]  arXiv:1703.00144 (replaced) [pdf, other]
Title: Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[36]  arXiv:1703.00864 (replaced) [pdf, other]
Title: The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
Subjects: Machine Learning (stat.ML); Computation (stat.CO)
[37]  arXiv:1703.02379 (replaced) [pdf, other]
Title: Global Weisfeiler-Lehman Graph Kernels
Comments: 10 pages, accepted at IEEE ICDM 2017 ("Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs")
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[38]  arXiv:1704.00666 (replaced) [pdf, other]
Title: Asymptotic causal inference with observational studies trimmed by the estimated propensity scores
Authors: Shu Yang, Peng Ding
Comments: 21 pages, 1 figure, and 3 tables
Subjects: Methodology (stat.ME)
[39]  arXiv:1704.04222 (replaced) [pdf, other]
Title: Learning Latent Representations for Speech Generation and Transformation
Journal-ref: Interspeech 2017, pp 1273-1277
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
[40]  arXiv:1705.10261 (replaced) [pdf, other]
Title: Sparse Maximum-Entropy Random Graphs with a Given Power-Law Degree Distribution
Subjects: Probability (math.PR); Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Statistics Theory (math.ST); Physics and Society (physics.soc-ph)
[41]  arXiv:1706.03475 (replaced) [pdf, other]
Title: Confident Multiple Choice Learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[42]  arXiv:1708.02107 (replaced) [pdf, other]
Title: Adaptive Estimation of Nonparametric Geometric Graphs
Comments: 39 pages, 4 figures; real data experiment (Grévy's zebras in Kenya) added
Subjects: Statistics Theory (math.ST); Probability (math.PR)
[43]  arXiv:1708.04729 (replaced) [pdf, other]
Title: Deconvolutional Paragraph Representation Learning
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
[44]  arXiv:1708.09477 (replaced) [pdf, other]
Title: A Compressive Sensing Approach to Community Detection with Applications
Comments: 39 pages, 10 figures. Version 2: disabled 'showkeys' package
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
[45]  arXiv:1709.00353 (replaced) [pdf, ps, other]
Title: Gaussian approximation of maxima of Wiener functionals and its application to high-frequency data
Authors: Yuta Koike
Comments: 39 pages. Some typos have been corrected. Some proofs have been rearranged. Some results have been slightly improved
Subjects: Statistics Theory (math.ST); Probability (math.PR)
[46]  arXiv:1709.04702 (replaced) [pdf, other]
Title: Trait evolution with jumps: illusionary normality
Journal-ref: Proceedings of the XXIII National Conference on Applications of Mathematics in Biology and Medicine. 2017, pp. 23-28
Subjects: Populations and Evolution (q-bio.PE); Probability (math.PR); Applications (stat.AP)
[47]  arXiv:1709.07417 (replaced) [pdf, other]
Title: Neural Optimizer Search with Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)