Machine Learning

New submissions

New submissions for Tue, 17 Jul 18

[1]  arXiv:1807.05292 [pdf, other]
Title: Neural Networks Regularization Through Representation Learning
Comments: 196 pages, 44 figures, 489 references, INSA Rouen Normandie, Normandie Université
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural network models, and deep models in particular, are among the leading state-of-the-art models in machine learning. The most successful deep models are those with many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which is not always available. One of the fundamental issues in neural networks is overfitting, which is the issue tackled in this thesis. Such a problem often occurs when the training of large models is performed using few training samples. Many approaches have been proposed to prevent the network from overfitting and improve its generalization performance, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, and batch normalization.
In this thesis, we tackle the neural network overfitting issue from a representation learning perspective by considering the situation where few training samples are available, which is the case in many real-world applications. We propose three contributions. The first, presented in chapter 2, is dedicated to dealing with structured output problems, performing multivariate regression when the output variable y contains structural dependencies between its components. The second contribution, described in chapter 3, deals with the classification task, where we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. Our last contribution, presented in chapter 4, shows the benefit of transfer learning in applications where only few samples are available. In this contribution, we provide an automatic system based on such a learning scheme, with an application to the medical domain: the task consists in localizing the third lumbar vertebra in a 3D CT scan. This work was done in collaboration with the Henri Becquerel Center clinic in Rouen, which provided us with data.

[2]  arXiv:1807.05306 [pdf, other]
Title: Generative Adversarial Privacy
Comments: A preliminary version of this work was presented at the Privacy in Machine Learning and Artificial Intelligence Workshop, ICML 2018. arXiv admin note: text overlap with arXiv:1710.09549
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Machine Learning (stat.ML)

We present a data-driven framework called generative adversarial privacy (GAP). Inspired by recent advancements in generative adversarial networks (GANs), GAP allows the data holder to learn the privatization mechanism directly from the data. Under GAP, finding the optimal privacy mechanism is formulated as a constrained minimax game between a privatizer and an adversary. We show that for appropriately chosen adversarial loss functions, GAP provides privacy guarantees against strong information-theoretic adversaries. We also evaluate the performance of GAP on multi-dimensional Gaussian mixture models and the GENKI face database.
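As a sketch of the minimax formulation, the loop below alternates between an adversary maximizing inference accuracy of a private attribute and a privatizer minimizing it under a distortion penalty, in PyTorch. The synthetic data, network shapes, penalty weight rho, and loss choices are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Synthetic data: x is the public record, s a binary private attribute leaking into x.
n, d = 512, 8
s = torch.randint(0, 2, (n, 1)).float()
x = torch.randn(n, d) + 2.0 * s

privatizer = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, d))
adversary = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt_p = torch.optim.Adam(privatizer.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
rho = 1.0  # weight on the distortion constraint (an assumption)

for step in range(2000):
    # Adversary step: maximize inference accuracy on the private attribute.
    x_hat = privatizer(x)
    loss_a = bce(adversary(x_hat.detach()), s)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # Privatizer step: fool the adversary while keeping distortion small.
    x_hat = privatizer(x)
    distortion = ((x_hat - x) ** 2).mean()
    loss_p = -bce(adversary(x_hat), s) + rho * distortion
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```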

[3]  arXiv:1807.05307 [pdf, other]
Title: How Do Classifiers Induce Agents To Invest Effort Strategically?
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)

Machine learning is often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to "gaming" the classifier. We show that whenever any "reasonable" mechanism can do so, a simple linear mechanism suffices.

[4]  arXiv:1807.05317 [pdf]
Title: LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks
Comments: To be published in FPGA for Software Programmers (FSP 2018)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent work has shown that Field-Programmable Gate Arrays (FPGAs) play an important role in the acceleration of machine learning applications. Initial specification of machine learning applications is often done using a high-level Python-oriented framework such as Tensorflow, followed by a manual translation to either C or RTL for synthesis using vendor tools. This manual translation step is time-consuming and requires expertise that limits the applicability of FPGAs in this important domain. In this paper, we present an open-source tool-flow that maps numerical computation models written in Tensorflow to synthesizable hardware. Unlike other tools, which are often constrained by a small number of inflexible templates, our flow uses Google's XLA compiler, which emits LLVM code directly from a Tensorflow specification. This LLVM code can then be used with a high-level synthesis tool to automatically generate hardware. We show that our flow allows users to generate deep neural networks with very few lines of Python code.

[5]  arXiv:1807.05328 [pdf, other]
Title: On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

This paper proposes an L-BFGS framework based on (approximate) second-order information with stochastic batches, as a novel approach to finite-sum minimization problems. Unlike classical L-BFGS, where stochastic batches lead to instability, we use a smooth estimate of the gradient differences while achieving acceleration by well-scaling the initial Hessians. We provide theoretical analyses for both convex and nonconvex cases. In addition, we demonstrate that for the popular least-squares and cross-entropy losses, the algorithm admits a simple implementation in a distributed environment. Numerical experiments support the efficiency of our algorithms.
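One plausible reading of the scheme, sketched in NumPy on a toy least-squares problem: the standard two-loop recursion with a well-scaled initial Hessian, fed curvature pairs whose gradient differences are smoothed (here with an exponential moving average, an assumption) to damp minibatch noise.

```python
import numpy as np

def two_loop(g, S, Y):
    """Standard L-BFGS two-loop recursion: returns an approximation of H^{-1} g."""
    q, alphas = g.copy(), []
    for s, y in zip(reversed(S), reversed(Y)):
        a = s.dot(q) / y.dot(s)
        alphas.append(a)
        q -= a * y
    gamma = S[-1].dot(Y[-1]) / Y[-1].dot(Y[-1])  # well-scaled initial Hessian
    r = gamma * q
    for (s, y), a in zip(zip(S, Y), reversed(alphas)):
        r += (a - y.dot(r) / y.dot(s)) * s
    return r

rng = np.random.default_rng(0)
A, b = rng.normal(size=(1000, 20)), rng.normal(size=1000)

def batch_grad(w):
    idx = rng.choice(1000, size=128, replace=False)
    Ab = A[idx]
    return Ab.T @ (Ab @ w - b[idx]) / len(idx)

w = np.zeros(20)
S, Y, y_smooth = [], [], np.zeros(20)
beta, lr, m = 0.9, 0.5, 10
g = batch_grad(w)
for k in range(200):
    d = two_loop(g, S, Y) if S else g
    w_new = w - lr * d
    g_new = batch_grad(w_new)
    # Smoothed estimate of the gradient difference to damp stochastic noise.
    y_smooth = beta * y_smooth + (1 - beta) * (g_new - g)
    s_k = w_new - w
    if s_k.dot(y_smooth) > 1e-10:  # curvature condition before storing the pair
        S.append(s_k); Y.append(y_smooth.copy())
        if len(S) > m:
            S.pop(0); Y.pop(0)
    w, g = w_new, g_new
```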

[6]  arXiv:1807.05343 [pdf, ps, other]
Title: Generalization in quasi-periodic environments
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

By and large, the behavior of stochastic gradient descent is regarded as a challenging problem, and it is often presented in the framework of statistical machine learning. This paper offers a novel view on the analysis of on-line models of learning that arises when dealing with a generalized version of stochastic gradient based on dissipative dynamics. In order to face the complex evolution of these models, a systematic treatment is proposed, based on energy balance equations derived by means of the Caldirola-Kanai (CK) Hamiltonian. According to these equations, learning can be regarded as an ordering process that corresponds to the decrease of the loss function. Finally, the main result established in this paper is that, in the case of quasi-periodic environments, where the pattern novelty is progressively limited as time goes by, the system dynamics yields an asymptotically consistent solution in the weight space; that is, the solution maps similar patterns to the same decision.
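For reference, the Caldirola-Kanai Hamiltonian on which such energy balance equations are built takes the standard time-dependent form, here for a one-dimensional system with dissipation coefficient $\gamma$:

```latex
H_{\mathrm{CK}}(q, p, t) = e^{-\gamma t}\,\frac{p^2}{2m} + e^{\gamma t}\,V(q)
```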

[7]  arXiv:1807.05351 [pdf, other]
Title: ML-Schema: Exposing the Semantics of Machine Learning with Schemas and Ontologies
Comments: Poster, selected for the 2nd Reproducibility in Machine Learning Workshop at ICML 2018, Stockholm, Sweden
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (stat.ML)

ML-Schema, proposed by the W3C Machine Learning Schema Community Group, is a top-level ontology that provides a set of classes, properties, and restrictions for representing and interchanging information on machine learning algorithms, datasets, and experiments. It can be easily extended and specialized, and it is also mapped to other, more domain-specific ontologies developed in the area of machine learning and data mining. In this paper we review existing state-of-the-art machine learning interchange formats and present the first release of ML-Schema, a canonical format resulting from more than seven years of experience among different research institutions. We argue that exposing the semantics of machine learning algorithms, models, and experiments through a canonical format may pave the way to better interpretability and to realistically achieving full interoperability of experiments regardless of platform or adopted workflow solution.

[8]  arXiv:1807.05459 [pdf]
Title: Multi-time-horizon Solar Forecasting Using Recurrent Neural Network
Comments: Accepted at: IEEE Energy Conversion Congress and Exposition (ECCE 2018), 7 pages, 5 figures, code available: sakshi-mishra.github.io
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

The non-stationary characteristic of solar power renders traditional point forecasting methods less useful due to large prediction errors. This results in increased uncertainty in grid operation, thereby reducing reliability and increasing the cost of operation. This paper proposes a unified architecture for multi-time-horizon prediction for short- and long-term solar forecasting using Recurrent Neural Networks (RNN). The paper describes an end-to-end pipeline to implement the architecture, along with methods to test and validate the performance of the prediction model. The results demonstrate that the proposed method based on the unified architecture is effective for multi-horizon solar forecasting and achieves a lower root-mean-squared prediction error than the previous best-performing methods, which use one model for each time horizon. The proposed method enables multi-horizon forecasts with real-time inputs, which has high potential for practical applications in the evolving smart grid.
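A minimal sketch of a unified multi-horizon forecaster in PyTorch: a single recurrent model emits every horizon from one forward pass, rather than one model per horizon. Layer sizes, input features, and horizon count are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiHorizonRNN(nn.Module):
    """One unified model producing all forecast horizons jointly (a sketch)."""
    def __init__(self, n_features=4, hidden=64, horizons=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizons)  # one output per horizon

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # (batch, horizons)

model = MultiHorizonRNN()
x = torch.randn(32, 24, 4)                # e.g. 24 past steps of irradiance/weather inputs
y_hat = model(x)                          # forecasts for all 4 horizons at once
loss = nn.MSELoss()(y_hat, torch.randn(32, 4))  # squared error, matching the RMSE metric
```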

[9]  arXiv:1807.05464 [pdf, other]
Title: Tractable Querying and Learning in Hybrid Domains via Sum-Product Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Probabilistic representations, such as Bayesian and Markov networks, are fundamental to much of statistical machine learning. Thus, learning probabilistic representations directly from data is a deep challenge, the main computational bottleneck being intractable inference. Tractable learning is a powerful new paradigm that attempts to learn distributions that support efficient probabilistic querying. By leveraging local structure, representations such as sum-product networks (SPNs) can capture high tree-width models with many hidden layers, essentially a deep architecture, while still admitting a range of probabilistic queries computable in time polynomial in the network size. The leaf nodes in SPNs, from which more intricate mixtures are formed, are tractable univariate distributions, and so the literature has focused on Bernoulli and Gaussian random variables. This is clearly a restriction for handling mixed discrete-continuous data, especially if the continuous features are generated from non-parametric and non-Gaussian distribution families. In this work, we present a framework that systematically integrates SPN structure learning with weighted model integration, a recently introduced computational abstraction for performing inference in hybrid domains, by means of piecewise polynomial approximations of density functions of arbitrary shape. Our framework is instantiated by exploiting the notion of propositional abstractions, thus minimally interfering with the SPN structure learning module, and supports a powerful query interface for conditioning on interval constraints. Our empirical results show that our approach is effective, and allows a study of the trade-off between the granularity of the learned model and its predictive power.

[10]  arXiv:1807.05490 [pdf]
Title: Semi-Supervised Feature Learning for Off-Line Writer Identifications
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional approaches use supervised learning to estimate off-line writer identifications. In this study, we improve off-line writer identification with a semi-supervised feature learning pipeline, which trains on extra unlabeled data and the original labeled data simultaneously. Specifically, we propose a weighted label smoothing regularization (WLSR) method, which assigns a weighted uniform label distribution to the extra unlabeled data. We regularize the convolutional neural network (CNN) baseline, which allows learning more discriminative features to represent the properties of different writing styles. Based on experiments on the ICDAR2013, CVL and IAM benchmark datasets, our results show that semi-supervised feature learning improves the baseline measurement and achieves better performance than existing writer identification approaches.
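A sketch of a WLSR-style objective in PyTorch: standard cross-entropy on the labeled batch plus a term pushing unlabeled predictions toward a weighted uniform distribution. The exact weighting scheme and the trade-off coefficient lam are assumptions.

```python
import torch
import torch.nn.functional as F

def wlsr_loss(logits_lab, targets, logits_unl, weights, lam=0.5):
    """Labeled cross-entropy plus weighted label smoothing on unlabeled data.
    `weights` has one soft label distribution per unlabeled sample (rows sum to 1)."""
    ce = F.cross_entropy(logits_lab, targets)
    log_p = F.log_softmax(logits_unl, dim=1)
    reg = -(weights * log_p).sum(dim=1).mean()  # cross-entropy against the soft target
    return ce + lam * reg

# Usage sketch: a uniform soft target over all classes for the unlabeled batch.
n_unl, n_cls = 16, 10
weights = torch.full((n_unl, n_cls), 1.0 / n_cls)
loss = wlsr_loss(torch.randn(8, n_cls), torch.randint(0, n_cls, (8,)),
                 torch.randn(n_unl, n_cls), weights)
```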

[11]  arXiv:1807.05515 [pdf, other]
Title: Magnitude Bounded Matrix Factorisation for Recommender Systems
Comments: 11 pages, 6 figures, TNNLS
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Low-rank matrix factorisation is often used in recommender systems as a way of extracting latent features. When dealing with large and sparse datasets, traditional recommendation algorithms face the problem of acquiring large, unrestrained, fluctuating values over predictions, especially for users/items with very few corresponding observations. Although the problem has been somewhat alleviated by imposing bounding constraints over the objectives, and/or over all entries to be within a fixed range, in terms of gaining better recommendations these approaches have two major shortcomings that we aim to mitigate in this work: one is that they can only deal with one pair of fixed bounds for all entries, and the other is that they are very time-consuming when applied to large-scale recommender systems. In this paper, we propose a novel algorithm named Magnitude Bounded Matrix Factorisation (MBMF), which allows different bounds for individual users/items and performs very fast on large-scale datasets. The key idea of our algorithm is to construct a model by constraining the magnitudes of each individual user/item feature vector. We achieve this by converting from the Cartesian to the spherical coordinate system with radii set to the corresponding magnitudes, which turns the above constrained optimisation problem into an unconstrained one. The Stochastic Gradient Descent (SGD) method is then applied to solve the unconstrained task efficiently. Experiments on synthetic and real datasets demonstrate that in most cases the proposed MBMF is superior to all existing algorithms in terms of accuracy and time complexity.
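The coordinate change at the heart of this idea can be sketched as follows: with the radius fixed to the magnitude bound, any setting of the angles yields a feature vector satisfying the constraint, so SGD on the angles is unconstrained. The parametrisation below is the standard hyperspherical one; the surrounding training loop is omitted.

```python
import numpy as np

def spherical_to_cartesian(r, theta):
    """Map k-1 angles and a radius r to a vector in R^k whose Euclidean norm
    is exactly r, so the magnitude bound holds by construction."""
    k = len(theta) + 1
    u, scale = np.empty(k), r
    for i, t in enumerate(theta):
        u[i] = scale * np.cos(t)
        scale *= np.sin(t)
    u[-1] = scale
    return u

rng = np.random.default_rng(0)
theta = rng.uniform(0, np.pi, size=9)     # angles for a 10-dim user feature vector
u = spherical_to_cartesian(2.5, theta)    # feature vector with magnitude bound 2.5
print(np.linalg.norm(u))                  # -> 2.5, regardless of theta
# SGD then updates the user/item angles directly on the rating loss; since the
# parametrisation enforces the bound, the optimisation is unconstrained.
```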

[12]  arXiv:1807.05527 [pdf, other]
Title: Learning Probabilistic Logic Programs in Continuous Domains
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The field of statistical relational learning aims at unifying logic and probability to reason and learn from data. Perhaps the most successful paradigm in the field is probabilistic logic programming: the enabling of stochastic primitives in logic programming, which is now increasingly seen to provide a declarative background to complex machine learning applications. While many systems offer inference capabilities, the more significant challenge is that of learning meaningful and interpretable symbolic representations from data. In that regard, inductive logic programming and related techniques have paved much of the way for the last few decades.
Unfortunately, a major limitation of this exciting landscape is that much of the work is limited to finite-domain discrete probability distributions. Recently, a handful of systems have been extended to represent and perform inference with continuous distributions. The problem, of course, is that classical solutions for inference are either restricted to well-known parametric families (e.g., Gaussians) or resort to sampling strategies that provide correct answers only in the limit. When it comes to learning, moreover, inducing representations remains entirely open, other than "data-fitting" solutions that force-fit points to aforementioned parametric families.
In this paper, we take the first steps towards inducing probabilistic logic programs for continuous and mixed discrete-continuous data, without being pigeon-holed to a fixed set of distribution families. Our key insight is to leverage techniques from piecewise polynomial function approximation theory, yielding a principled way to learn and compositionally construct density functions. We test the framework and discuss the learned representations.

[13]  arXiv:1807.05597 [pdf, other]
Title: Deep Learning for Semantic Segmentation on Minimal Hardware
Comments: 12 pages, 5 figures, RoboCup International Symposium 2018
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)

Deep learning has revolutionised many fields, but it is still challenging to transfer its success to small mobile robots with minimal hardware. Specifically, some work has been done to this effect in the RoboCup humanoid football domain, but results that are performant and efficient and still generally applicable outside of this domain are lacking. We propose an approach conceptually different from those taken previously. It is based on semantic segmentation and achieves these desired properties: it can process full VGA images in real time on a low-power mobile processor, it handles multiple image dimensions without retraining, it does not require specific domain knowledge to achieve a high frame rate, and it is applicable on minimal mobile hardware.

[14]  arXiv:1807.05650 [pdf, other]
Title: Time Series Deinterleaving of DNS Traffic
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Stream deinterleaving is an important problem with various applications in the cybersecurity domain. In this paper, we consider the specific problem of deinterleaving DNS data streams using machine-learning techniques, with the objective of automating the extraction of malware domain sequences. We first develop a generative model for user request generation and DNS stream interleaving. Based on these, we evaluate various inference strategies for deinterleaving, including augmented HMMs and LSTMs, on synthetic datasets. Our results demonstrate that state-of-the-art LSTMs outperform more traditional augmented HMMs in this application domain.

[15]  arXiv:1807.05666 [pdf, other]
Title: Scene Learning: Deep Convolutional Networks For Wind Power Prediction by Embedding Turbines into Grid Space
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Wind power prediction is of vital importance in wind power utilization. There has been much research based on time series of wind power or speed, but in fact these time series cannot express the temporal and spatial changes of wind, which fundamentally hinders the advance of wind power prediction. In this paper, a new kind of feature that can describe the process of temporal and spatial variation is proposed, namely, Spatio-Temporal Features. We first map the data collected at each moment from the wind turbines onto a plane, according to their relative positions, to form a state map, namely, the scene. The scene time series over a period of time is then a multi-channel image, i.e. the Spatio-Temporal Features. Based on the Spatio-Temporal Features, a deep convolutional network is applied to predict the wind power, achieving far better accuracy than existing methods. Compared with the state-of-the-art method, the mean squared error (MSE) of our method is reduced by 49.83%, and the average time cost for training models is shortened by a factor of more than 150.
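A sketch of the scene construction described above: each time step of turbine readings is placed on a plane by relative position, and stacking consecutive scenes yields the multi-channel image. The grid shape and placement rule here are assumptions.

```python
import numpy as np

def build_scene(positions, readings, grid_shape=(16, 16)):
    """Map one time step of turbine readings onto a plane by relative
    position, producing one 'scene' (state map)."""
    scene = np.zeros(grid_shape)
    for (row, col), v in zip(positions, readings):
        scene[row, col] = v
    return scene

# Stacking T consecutive scenes gives a T-channel image -- the
# spatio-temporal feature -- ready for a standard 2-D CNN.
positions = [(2, 3), (5, 8), (10, 12)]            # turbine grid coordinates
frames = [build_scene(positions, np.random.rand(3)) for _ in range(6)]
spatio_temporal = np.stack(frames)                # shape (6, 16, 16)
```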

[16]  arXiv:1807.05726 [pdf, other]
Title: Backward Reduction of CNN Models with Information Flow Analysis
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

This paper proposes backward reduction, an algorithm that explores compact CNN design from the information flow perspective. The algorithm can remove substantial numbers of non-zero weighting parameters (redundant neural channels) by considering the network's dynamic behavior, which traditional model compaction techniques cannot achieve, to reduce the size of a model. With the aid of our proposed algorithm, we achieve significant model reduction on ResNet-34 at ImageNet scale (32.3% reduction), which is 3X better than the state-of-the-art result (10.8%). Even for highly optimized models such as SqueezeNet and MobileNet, we still achieve an additional 10.81% and 37.56% reduction, respectively, with negligible performance degradation.

[17]  arXiv:1807.05800 [pdf, other]
Title: Anomaly Machine Component Detection by Deep Generative Model with Unregularized Score
Comments: The 2018 International Joint Conference on Neural Networks (IJCNN2018)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

One of the most common needs in manufacturing plants is rejecting, as anomalies, products not coincident with the standards. Accurate and automatic anomaly detection improves product reliability and reduces inspection cost. Probabilistic models have been employed to detect test samples with lower likelihoods as anomalies in an unsupervised manner. Recently, a probabilistic model called the deep generative model (DGM) has been proposed for end-to-end modeling of natural images and has already achieved a certain success. However, anomaly detection of machine components with complicated structures is still challenging, because they produce a wide variety of normal image patches with low likelihoods. To overcome this difficulty, we propose the unregularized score for the DGM. As its name implies, the unregularized score is the anomaly score of the DGM without the regularization terms. The unregularized score is robust to the inherent complexity of a sample and has a smaller risk of rejecting a sample that appears less frequently but is coincident with the standards.
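A sketch of the score computation for a Gaussian-decoder VAE, one common DGM: the full negative ELBO is the reconstruction term plus the KL regularizer, and dropping the KL term yields the unregularized score as we read the proposal. The encoder/decoder interfaces and the MSE reconstruction term are assumptions.

```python
import torch
import torch.nn.functional as F

def vae_scores(x, encoder, decoder):
    """Per-sample anomaly scores from a VAE whose encoder returns (mu, logvar)
    and whose decoder maps z back to input space (both are assumed interfaces)."""
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparametrization
    x_hat = decoder(z)
    recon = F.mse_loss(x_hat, x, reduction="none").flatten(1).sum(1)  # ~ -log p(x|z)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1)
    return recon + kl, recon   # (regularized negative ELBO, unregularized score)
```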

[18]  arXiv:1807.05827 [pdf, other]
Title: Remember and Forget for Experience Replay
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Experience replay (ER) is crucial for attaining high data-efficiency in off-policy deep reinforcement learning (RL). ER entails the recall of experiences obtained in past iterations to compute gradient estimates for the current policy. However, the accuracy of such updates may deteriorate when the policy diverges from past behaviors. Remedies that aim to abate policy changes, such as target networks and hyper-parameter tuning, do not prevent the policy from becoming disconnected from past experiences, possibly undermining the effectiveness of ER. We introduce an algorithm that relies on systematic Remembering and Forgetting for ER (ReF-ER). In ReF-ER the RL agents forget experiences that would be too unlikely with the current policy and constrain policy changes within a trust region of past behaviors in the replay memory. We show that ReF-ER improves the reliability and performance of off-policy RL, in both the deterministic and stochastic policy gradient settings. Finally, we complement ReF-ER with a novel off-policy actor-critic algorithm (RACER) for continuous-action control problems. RACER employs a computationally efficient closed-form approximation of on-policy action values and is shown to be highly competitive with state-of-the-art algorithms on benchmark problems, while being robust to large hyper-parameter variations.
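The remember/forget rule can be sketched with importance ratios, as below; the cutoff c and the skip rule are our reading of the trust-region mechanism, not the paper's exact algorithm.

```python
import numpy as np

def far_policy_mask(pi_new, mu_old, c=4.0):
    """Classify replayed experiences by the importance ratio between the current
    policy and the behavior policy that generated them; samples whose ratio
    falls outside the trust region [1/c, c] are 'forgotten' (skipped)."""
    rho = pi_new / mu_old                   # pi(a|s) under current / behavior policy
    return (rho < 1.0 / c) | (rho > c)      # True -> too unlikely now, do not reuse

pi_new = np.array([0.30, 0.02, 0.25, 0.90])
mu_old = np.array([0.28, 0.40, 0.24, 0.10])
print(far_policy_mask(pi_new, mu_old))      # [False  True False  True]
# Gradient estimates are then computed only on the near-policy samples,
# keeping the policy tethered to the contents of the replay memory.
```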

[19]  arXiv:1807.05832 [pdf, other]
Title: Manifold Adversarial Learning
Comments: 10 pages, 4 figures, under review in NIPS 2018
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The recently proposed adversarial training methods show robustness to both adversarial and original examples and achieve state-of-the-art results in supervised and semi-supervised learning. All existing adversarial training methods consider only how the worst perturbed examples (i.e., adversarial examples) could affect the model output. Despite their success, we argue that such a setting may lack generalization, since the output space (or label space) is apparently less informative. In this paper, we propose a novel method, called Manifold Adversarial Training (MAT). MAT manages to build an adversarial framework based on how the worst perturbation could affect the distributional manifold rather than the output space. Particularly, a latent data space with a Gaussian Mixture Model (GMM) is first derived. On one hand, MAT tries to perturb the input samples in the way that would roughen the distributional manifold the most. On the other hand, the deep learning model is trained to promote manifold smoothness in the latent space, measured by the variation of Gaussian mixtures (given the local perturbation around the data point). Importantly, since the latent space is more informative than the output space, the proposed MAT can learn a more robust and compact data representation, leading to further performance improvement. The proposed MAT is important in that it can be considered a superset of one recently proposed discriminative feature learning approach called center loss. We conducted a series of experiments in both supervised and semi-supervised learning on three benchmark data sets, showing that the proposed MAT achieves remarkable performance, much better than the state-of-the-art adversarial approaches.

[20]  arXiv:1807.05887 [pdf, other]
Title: Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
Comments: This paper is accepted by ECML-PKDD 2018
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, to estimate the expected cumulative reward following a state-action pair. The Q function neural network contains a lot of implicit knowledge about the RL problems, but often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel on-line algorithm that is well-suited for an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding the network's learned knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs.
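The tree-of-linear-models structure can be approximated offline as below; the real LMUT is grown by the paper's on-line algorithm during active play, so this sklearn sketch (with a synthetic stand-in for the Q network's outputs) only illustrates the idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

# Batch mimic learning: partition the state space with a shallow tree, then
# fit a linear model of the mimicked Q values inside each leaf.
rng = np.random.default_rng(0)
states = rng.normal(size=(2000, 6))
q_values = states @ rng.normal(size=6) + np.sin(states[:, 0])  # stand-in for Q(s, a)

tree = DecisionTreeRegressor(max_leaf_nodes=16).fit(states, q_values)
leaf_of = tree.apply(states)
leaf_models = {leaf: LinearRegression().fit(states[leaf_of == leaf],
                                            q_values[leaf_of == leaf])
               for leaf in np.unique(leaf_of)}

def lmut_predict(s):
    """Route a state to its leaf, then apply that leaf's linear model."""
    leaf = tree.apply(s.reshape(1, -1))[0]
    return leaf_models[leaf].predict(s.reshape(1, -1))[0]
```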

[21]  arXiv:1807.05935 [pdf, other]
Title: Siamese Survival Analysis with Competing Risks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Survival analysis in the presence of multiple possible adverse events, i.e., competing risks, is a pervasive problem in many industries (healthcare, finance, etc.). Since only one event is typically observed, the incidence of an event of interest is often obscured by other related competing events. This nonidentifiability, or inability to estimate true cause-specific survival curves from empirical data, further complicates competing risk survival analysis. We introduce Siamese Survival Prognosis Network (SSPN), a novel deep learning architecture for estimating personalized risk scores in the presence of competing risks. SSPN circumvents the nonidentifiability problem by avoiding the estimation of cause-specific survival curves and instead determines pairwise concordant time-dependent risks, where longer event times are assigned lower risks. Furthermore, SSPN is able to directly optimize an approximation to the C-discrimination index, rather than relying on well-known metrics which are unable to capture the unique requirements of survival analysis with competing risks.

[22]  arXiv:1807.05936 [pdf, other]
Title: Variational Inference: A Unified Framework of Generative Models and Some Revelations
Authors: Jianlin Su
Comments: 6 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We reinterpret variational inference from a new perspective. In this way, we can easily prove that the EM algorithm, VAE, GAN, AAE, and ALI (BiGAN) are all special cases of variational inference. The proof also reveals that the loss of the standard GAN is incomplete, which explains why GAN training requires caution. From this, we derive a regularization term to improve the stability of GAN training.
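For context, the standard identity at the heart of variational inference, of which the models above appear here as special cases, decomposes the log-evidence into the ELBO plus a posterior-gap term:

```latex
\log p(x) = \underbrace{\mathbb{E}_{q(z|x)}\!\left[\log \frac{p(x,z)}{q(z|x)}\right]}_{\text{ELBO}}
          \;+\; \mathrm{KL}\!\left(q(z|x)\,\Vert\,p(z|x)\right)
```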

[23]  arXiv:1807.05960 [pdf, other]
Title: Meta-Learning with Latent Embedding Optimization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have the practical difficulties of operating in high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a low-dimensional latent generative representation of model parameters and performing gradient-based meta-learning in this space with latent embedding optimization (LEO), effectively decoupling the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive 5-way 1-shot miniImageNet classification task.

Cross-lists for Tue, 17 Jul 18

[24]  arXiv:1807.05154 (cross-list from cs.CL) [pdf, other]
Title: Deep Enhanced Representation for Implicit Discourse Relation Recognition
Comments: 13(10) pages, accepted by COLING 2018
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Implicit discourse relation recognition is a challenging task, as relation prediction without explicit connectives in discourse parsing requires understanding of text spans and cannot be easily derived from surface features of the input sentence pairs. Thus, properly representing the text is crucial to this task. In this paper, we propose a model augmented with text representations at different granularities, including character, subword, word, sentence, and sentence-pair levels. The proposed deeper model is evaluated on the benchmark treebank and, to the best of our knowledge, achieves state-of-the-art accuracy of greater than 48% in 11-way classification and an $F_1$ score greater than 50% in 4-way classification for the first time.

[25]  arXiv:1807.05237 (cross-list from physics.comp-ph) [pdf, other]
Title: irbasis: Open-source database and software for intermediate-representation basis functions of imaginary-time Green's function
Subjects: Computational Physics (physics.comp-ph); Statistical Mechanics (cond-mat.stat-mech); Strongly Correlated Electrons (cond-mat.str-el); Superconductivity (cond-mat.supr-con); Machine Learning (cs.LG)

The open-source library irbasis provides easy-to-use tools for two sets of orthogonal functions named the intermediate representation (IR). The IR basis enables a compact representation of the Matsubara Green's function and efficient calculations of quantum models. The IR basis functions are defined as the solution of an integral equation for which no analytical solution is currently available. The library consists of a database of pre-computed high-precision numerical solutions and computational code for evaluating the functions from the database. This paper describes the technical details and demonstrates how to use the library.

[26]  arXiv:1807.05344 (cross-list from stat.ML) [pdf, other]
Title: Adversarially Learned Mixture Model
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The Adversarially Learned Mixture Model (AMM) is a generative model for unsupervised or semi-supervised data clustering. The AMM is the first adversarially optimized method to model the conditional dependence between inferred continuous and categorical latent variables. Experiments on the MNIST and SVHN datasets show that the AMM allows for semantic separation of complex data when little or no labeled data is available. The AMM achieves a state-of-the-art unsupervised clustering error rate of 2.86% on the MNIST dataset. A semi-supervised extension of the AMM yields competitive results on the SVHN dataset.

[27]  arXiv:1807.05411 (cross-list from stat.ML) [pdf, other]
Title: Sparse Relaxed Regularized Regression: SR3
Comments: 23 pages, 12 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

Regularized regression problems are ubiquitous in statistical modeling, signal processing, and machine learning. Sparse regression in particular has been instrumental in scientific model discovery, including compressed sensing applications, variable selection, and high-dimensional analysis. We propose a new and highly effective approach for regularized regression, called SR3.
The key idea is to solve a relaxation of the regularized problem, which has three advantages over the state-of-the-art: (1) solutions of the relaxed problem are superior with respect to errors, false positives, and conditioning, (2) relaxation allows extremely fast algorithms for both convex and nonconvex formulations, and (3) the methods apply to composite regularizers such as total variation (TV) and its nonconvex variants. We demonstrate the improved performance of SR3 across a range of regularized regression problems with synthetic and real data, including compressed sensing, LASSO, matrix completion and TV regularization. To promote reproducible research, we include a companion Matlab package that implements these popular applications.
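A sketch of the relaxation for the l1 case: introduce an auxiliary variable v, penalize ||w - v||^2, and alternate an exact linear solve in w with soft-thresholding in v. The three-term objective matches the relaxation described above; the iteration count and parameters below are illustrative.

```python
import numpy as np

def sr3_lasso(A, b, lam=0.1, kappa=1.0, iters=200):
    """SR3-style relaxation for l1-regularized least squares:
    min_{w,v} 0.5*||Aw - b||^2 + lam*||v||_1 + (kappa/2)*||w - v||^2,
    solved by alternating minimization over w and v."""
    n, d = A.shape
    H = A.T @ A + kappa * np.eye(d)   # fixed system matrix: can be prefactored once
    Atb = A.T @ b
    w, v = np.zeros(d), np.zeros(d)
    for _ in range(iters):
        w = np.linalg.solve(H, Atb + kappa * v)                     # smooth block
        v = np.sign(w) * np.maximum(np.abs(w) - lam / kappa, 0.0)   # prox of l1
    return w, v                        # v carries the sparse support

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 50))
x_true = np.zeros(50); x_true[:5] = 3.0
b = A @ x_true + 0.01 * rng.normal(size=100)
w, v = sr3_lasso(A, b, lam=0.5)
print(np.nonzero(v)[0])                # recovers (approximately) the true support
```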

[28]  arXiv:1807.05560 (cross-list from cs.SI) [pdf, other]
Title: DeepInf: Social Influence Prediction with Deep Learning
Comments: 10 pages, 5 figures, to appear in KDD 2018 proceedings
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)

Social and information networking activities such as on Facebook, Twitter, WeChat, and Weibo have become an indispensable part of our everyday life, where we can easily access friends' behaviors and are in turn influenced by them. Consequently, an effective social influence prediction for each user is critical for a variety of applications such as online recommendation and advertising.
Conventional social influence prediction approaches typically design various hand-crafted rules to extract user- and network-specific features. However, their effectiveness heavily relies on the knowledge of domain experts. As a result, it is usually difficult to generalize them into different domains. Inspired by the recent success of deep neural networks in a wide range of computing applications, we design an end-to-end framework, DeepInf, to learn users' latent feature representation for predicting social influence. In general, DeepInf takes a user's local network as the input to a graph neural network for learning her latent social representation. We design strategies to incorporate both network structures and user-specific features into convolutional neural and attention networks. Extensive experiments on Open Academic Graph, Twitter, Weibo, and Digg, representing different types of social and information networks, demonstrate that the proposed end-to-end model, DeepInf, significantly outperforms traditional feature engineering-based approaches, suggesting the effectiveness of representation learning for social applications.

[29]  arXiv:1807.05561 (cross-list from stat.ML) [pdf, other]
Title: Spatio-Temporal Structured Sparse Regression with Hierarchical Gaussian Process Priors
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This paper introduces a new sparse spatio-temporal structured Gaussian process regression framework for online and offline Bayesian inference. This is the first framework that gives a time-evolving representation of the interdependencies between the components of the sparse signal of interest. A hierarchical Gaussian process describes such structure and the interdependencies are represented via the covariance matrices of the prior distributions. The inference is based on the expectation propagation method and the theoretical derivation of the posterior distribution is provided in the paper. The inference framework is thoroughly evaluated over synthetic, real video and electroencephalography (EEG) data where the spatio-temporal evolving patterns need to be reconstructed with high accuracy. It is shown that it achieves 15% improvement of the F-measure compared with the alternating direction method of multipliers, spatio-temporal sparse Bayesian learning method and one-level Gaussian process model. Additionally, the required memory for the proposed algorithm is less than in the one-level Gaussian process model. This structured sparse regression framework is of broad applicability to source localisation and object detection problems with sparse signals.

[30]  arXiv:1807.05595 (cross-list from math.OC) [pdf, other]
Title: Separable Dictionary Learning with Global Optimality and Applications to Diffusion MRI
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

Dictionary learning is a popular class of methods for modeling complex data by learning sparse representations directly from the data. For some large-scale applications, exploiting a known structure of the signal is often essential for reducing the complexity of algorithms and representations. One such method is tensor factorization by which a large multi-dimensional dataset can be explicitly factored or separated along each dimension of the data in order to break the representation up into smaller components. Learning dictionaries for tensor structured data is called tensor or separable dictionary learning. While there have been many recent works on separable dictionary learning, typical formulations involve solving a non-convex optimization problem and guaranteeing global optimality remains a challenge. In this work, we propose a framework that uses recent developments in matrix/tensor factorization to provide theoretical and numerical guarantees of the global optimality for the separable dictionary learning problem. We will demonstrate our algorithm on diffusion magnetic resonance imaging (dMRI) data, a medical imaging modality which measures water diffusion along multiple angular directions in every voxel of an MRI volume. For this application, state-of-the-art methods learn dictionaries for the angular domain of the signals without consideration for the spatial domain. In this work, we apply the proposed separable dictionary learning method to learn spatial and angular dMRI dictionaries jointly and show results on denoising phantom and real dMRI brain data.

[31]  arXiv:1807.05620 (cross-list from cs.CR) [pdf, other]
Title: NEUZZ: Efficient Fuzzing with Neural Program Learning
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even the state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs. Coverage-guided evolutionary fuzzers, while fast and scalable, often get stuck at fruitless sequences of random mutations. By contrast, more systematic techniques like symbolic and concolic execution incur significant performance overhead and struggle to scale to larger programs.
We design, implement, and evaluate NEUZZ, an efficient fuzzer that guides the fuzzing input generation process using deep neural networks. NEUZZ efficiently learns a differentiable neural approximation of the target program logic. The differentiability of the surrogate neural program, unlike the original target program, allows us to use efficient optimization techniques like gradient descent to identify promising mutations that are more likely to trigger hard-to-reach code in the target program.
We evaluate NEUZZ on 10 popular real-world programs and demonstrate that NEUZZ consistently outperforms AFL, a state-of-the-art evolutionary fuzzer, both at finding new bugs and achieving higher edge coverage. In total, NEUZZ found 36 previously unknown bugs that AFL failed to find and achieved, on average, 70 more edge coverage than AFL. Our results also demonstrate that NEUZZ can achieve average 9 more edge coverage while taking 16 less training time than other learning-enabled fuzzers.
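The core idea, using the surrogate network's input gradients to pick promising mutation positions, can be sketched as follows; the network shape, training data, target-edge choice, and byte-flip rule are assumptions rather than NEUZZ's actual pipeline.

```python
import torch
import torch.nn as nn

# A surrogate net mapping normalized input bytes to predicted edge-coverage bits.
N_BYTES, N_EDGES = 256, 512
surrogate = nn.Sequential(nn.Linear(N_BYTES, 1024), nn.ReLU(),
                          nn.Linear(1024, N_EDGES), nn.Sigmoid())
# ... train `surrogate` on (input bytes / 255, observed edge bitmap) pairs ...

seed = torch.rand(1, N_BYTES, requires_grad=True)   # a normalized seed input
target_edge = 42                                    # an edge we would like to trigger
surrogate(seed)[0, target_edge].backward()          # gradient of that edge wrt bytes
hot = seed.grad.abs().squeeze().topk(8).indices     # most influential byte positions
mutant = (seed.detach() * 255).byte().squeeze().clone()
mutant[hot] ^= 0xFF                                 # mutate only the critical bytes
```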

[32]  arXiv:1807.05636 (cross-list from cs.CV) [pdf, other]
Title: Cross Pixel Optical Flow Similarity for Self-Supervised Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We propose a novel method for learning convolutional neural image representations without manual supervision. We use motion cues in the form of optical flow, to supervise representations of static images. The obvious approach of training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose a much simpler learning goal: embed pixels such that the similarity between their embeddings matches that between their optical flow vectors. At test time, the learned deep network can be used without access to video or flow information and transferred to tasks such as image classification, detection, and segmentation. Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results for self-supervision in general, and is overall state of the art in self-supervised pretraining for semantic image segmentation, as demonstrated on standard benchmarks.

[33]  arXiv:1807.05720 (cross-list from cs.CY) [pdf]
Title: Governing autonomous vehicles: emerging responses for safety, liability, privacy, cybersecurity, and industry risks
Comments: Transport Reviews, 2018
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The benefits of autonomous vehicles (AVs) are widely acknowledged, but there are concerns about the extent of these benefits and about AV risks and unintended consequences. In this article, we first examine AVs and different categories of the technological risks associated with them. We then explore strategies that can be adopted to address these risks, and examine emerging responses by governments for addressing AV risks. Our analyses reveal that, thus far, governments have in most instances avoided stringent measures in order to promote AV development, and the majority of responses are non-binding and focus on creating councils or working groups to better explore AV implications. The US has been active in introducing legislation to address issues related to privacy and cybersecurity. The UK and Germany, in particular, have enacted laws to address liability issues; other countries mostly acknowledge these issues but have yet to implement specific strategies. To address privacy and cybersecurity risks, strategies ranging from the introduction or amendment of non-AV-specific legislation to the creation of working groups have been adopted. Much less attention has been paid to issues such as environmental and employment risks, although a few governments have begun programmes to retrain workers who might be negatively affected.

[34]  arXiv:1807.05748 (cross-list from stat.ML) [pdf, other]
Title: Learning Stochastic Differential Equations With Gaussian Processes Without Gradient Matching
Comments: The pdf is the accepted version of the paper to be presented in 2018 IEEE International Workshop on Machine Learning for Signal Processing
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We introduce a novel paradigm for learning non-parametric drift and diffusion functions for stochastic differential equations (SDEs), which are learnt to simulate trajectory distributions that match observations at arbitrary spacings. This is in contrast to existing gradient matching or other approximations that do not optimize simulated responses. We demonstrate that our general stochastic distribution optimisation leads to robust and efficient learning of SDE systems.

[35]  arXiv:1807.05836 (cross-list from q-fin.ST) [pdf, other]
Title: Forecasting market states
Comments: 11 pages, 4 figures
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a novel methodology to define, analyse and forecast market states. In our approach market states are identified by a reference sparse precision matrix and a vector of expectation values. In our procedure each multivariate observation is associated with a given market state according to a penalized likelihood maximization. The procedure is made computationally very efficient and can be used with a large number of assets. We demonstrate that this procedure successfully classifies different states of the markets in an unsupervised manner. In particular, we describe an experiment with one hundred log-returns and two states in which the methodology automatically associates one state to periods with average positive returns and the other state to periods with average negative returns, therefore discovering spontaneously the common classification of `bull' and `bear' markets. In another experiment, with again one hundred log-returns and two states, we demonstrate that this procedure can be efficiently used to forecast off-sample future market states with significant prediction accuracy. This methodology opens the way to a range of applications in risk management and trading strategies where the correlation structure plays a central role.
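A sketch of the state-assignment step: fit a sparse precision matrix and mean per state via penalized likelihood (here sklearn's GraphicalLasso stands in for that fit), then assign an observation to the state with the highest Gaussian log-likelihood. The two-regime toy data, the penalty alpha, and the assignment rule are assumptions; the clustering and forecasting machinery around this follows the paper.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def gaussian_loglik(x, mean, precision):
    """Log-density of x under N(mean, precision^{-1}), up to no missing terms."""
    d = x - mean
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * logdet - 0.5 * d @ precision @ d - 0.5 * len(x) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
bull = rng.normal(+0.5, 1.0, size=(300, 10))      # toy log-returns, two regimes
bear = rng.normal(-0.5, 1.0, size=(300, 10))

states = []
for sample in (bull, bear):
    gl = GraphicalLasso(alpha=0.1).fit(sample)    # penalized (sparse) likelihood fit
    states.append((sample.mean(axis=0), gl.precision_))

x_new = rng.normal(+0.5, 1.0, size=10)
scores = [gaussian_loglik(x_new, m, P) for m, P in states]
print(int(np.argmax(scores)))                     # most likely 0, i.e. 'bull'
```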

[36]  arXiv:1807.05838 (cross-list from cs.CV) [pdf, ps, other]
Title: Assessing fish abundance from underwater video using deep neural networks
Comments: IJCNN 2018
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The use of underwater video to assess the diversity and abundance of fish is being rapidly adopted by marine biologists. Manual processing of videos for quantification by human analysts is time- and labour-intensive. Automatic processing of videos can be employed to achieve the objectives in a cost- and time-efficient way. The aim is to build an accurate and reliable fish detection and recognition system, which is important for an autonomous robotic platform. However, there are many challenges involved in this task (e.g. complex background, deformation, low resolution and light propagation). Recent advancements in deep neural networks have led to the development of object detection and recognition in real-time scenarios. We introduce an end-to-end deep learning-based architecture that outperforms state-of-the-art methods and is the first of its kind for the fish assessment task. A Region Proposal Network (RPN), introduced by the object detector termed Faster R-CNN, was combined with three classification networks for detection and recognition of fish species obtained from Remote Underwater Video Stations (RUVS). An accuracy of 82.4% (mAP) obtained from the experiments is much higher than that of previously proposed methods.

[37]  arXiv:1807.05849 (cross-list from cs.CL) [pdf, ps, other]
Title: Neural Chinese Word Segmentation with Dictionary Knowledge
Comments: This paper has been accepted by The Seventh CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural network based methods have been proposed for CWS. However, these methods require a large number of labeled sentences for model training, and usually cannot utilize the useful information in Chinese dictionary. In this paper, we propose two methods to exploit the dictionary information for CWS. The first one is based on pseudo labeled data generation, and the second one is based on multi-task learning. The experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient.

[38]  arXiv:1807.05852 (cross-list from stat.ML) [pdf, ps, other]
Title: Machine Learning with Membership Privacy using Adversarial Regularization
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Machine learning models leak information about the datasets on which they are trained. An adversary can build an algorithm to trace the individual members of a model's training dataset. As a fundamental inference attack, the adversary aims to distinguish between data points that were part of the model's training set and any other data points from the same distribution. This is known as the tracing (and also membership inference) attack. In this paper, we focus on such attacks against black-box models, where the adversary can only observe the output of the model, but not its parameters. This is the current setting of machine learning as a service on the Internet.
We introduce a privacy mechanism to train machine learning models that provably achieve membership privacy: the model's predictions on its training data are indistinguishable from its predictions on other data points from the same distribution. We design a strategic mechanism where the privacy mechanism anticipates the membership inference attacks. The objective is to train a model such that not only does it have the minimum prediction error (high utility), but also it is the most robust model against its corresponding strongest inference attack (high privacy). We formalize this as a min-max game optimization problem, and design an adversarial training algorithm that minimizes the classification loss of the model as well as the maximum gain of the membership inference attack against it. This strategy, which guarantees membership privacy (as prediction indistinguishability), acts also as a strong regularizer and significantly generalizes the model.
We evaluate our privacy mechanism on deep neural networks using different benchmark datasets. We show that our min-max strategy can mitigate the risk of membership inference attacks (close to the random guess) with a negligible cost in terms of the classification error.
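A minimal sketch of the min-max training just described, assuming PyTorch: the attacker sees the model's prediction vector plus the true label and guesses membership, while the model adds a multiple of the attacker's gain on member data to its classification loss. Architectures, the weight lam, and the alternation schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_cls, d = 10, 32
x_mem = torch.randn(256, d); y_mem = torch.randint(0, n_cls, (256,))  # training members
x_ref = torch.randn(256, d); y_ref = torch.randint(0, n_cls, (256,))  # reference non-members
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_cls))
attacker = nn.Sequential(nn.Linear(n_cls + 1, 64), nn.ReLU(), nn.Linear(64, 1))
opt_m = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(attacker.parameters(), lr=1e-3)
ce, bce, lam = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss(), 1.0

def attack_features(x, y):
    # The attacker observes the black-box prediction vector plus the true label.
    return torch.cat([torch.softmax(model(x), 1), y.float().unsqueeze(1)], 1)

for step in range(1000):
    # Attacker step: maximize membership-inference accuracy.
    a_mem = attacker(attack_features(x_mem, y_mem).detach())
    a_ref = attacker(attack_features(x_ref, y_ref).detach())
    loss_a = bce(a_mem, torch.ones_like(a_mem)) + bce(a_ref, torch.zeros_like(a_ref))
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # Model step: fit the data while making members look like non-members.
    a_mem = attacker(attack_features(x_mem, y_mem))
    loss_m = ce(model(x_mem), y_mem) + lam * bce(a_mem, torch.zeros_like(a_mem))
    opt_m.zero_grad(); loss_m.backward(); opt_m.step()
```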

[39]  arXiv:1807.05924 (cross-list from cs.RO) [pdf]
Title: Bipedal Walking Robot using Deep Deterministic Policy Gradient
Comments: 10 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Machine learning algorithms have found several applications in the field of robotics and control systems. The control systems community has started to show interest in several machine learning algorithms from sub-domains such as supervised learning, imitation learning and reinforcement learning, to achieve autonomous control and intelligent decision making. Amongst many complex control problems, stable bipedal walking has been the most challenging. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) using a realistic robotics simulator, Gazebo. The robot demonstrates successful walking behaviour by learning through several of its trials and errors, without any prior knowledge of itself or of the world dynamics. The autonomous walking of the BWR is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG). DDPG is one of the algorithms for learning controls in continuous action spaces. After training the model in simulation, it was observed that, with a properly shaped reward function, the robot achieved faster walking or even rendered a running gait, with an average speed of 0.83 m/s. The gait pattern of the bipedal walker was compared with the actual human walking pattern. The results show that the bipedal walking pattern had characteristics similar to those of a human walking pattern.

[40]  arXiv:1807.05926 (cross-list from stat.ML) [pdf]
Title: Novel Feature-Based Clustering of Micro-Panel Data (CluMP)
Comments: R-code is available upon request to the corresponding author. This working paper was submitted to Journal of Classification in April 2018
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Micro-panel data are collected and analysed in many research and industry areas. Cluster analysis of micro-panel data is an unsupervised learning exploratory method identifying subgroup clusters in a data set which include homogeneous objects in terms of the development dynamics of monitored variables. The supply of clustering methods tailored to micro-panel data is limited. The present paper focuses on a feature-based clustering method, introducing a novel two-step characteristic-based approach designed for this type of data. The proposed CluMP method aims to identify clusters that are at least as internally homogeneous and externally heterogeneous as those obtained by alternative methods already implemented in the statistical system R. We compare the clustering performance of the devised algorithm with two extant methods using simulated micro-panel data sets. Our approach has yielded similar or better outcomes than the other methods, the advantage of the proposed algorithm being time efficiency which makes it applicable for large data sets.

[41]  arXiv:1807.05981 (cross-list from eess.SP) [pdf, other]
Title: A deep learning architecture to detect events in EEG signals during sleep
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Electroencephalography (EEG) during sleep is used by clinicians to evaluate various neurological disorders. In sleep medicine, it is relevant to detect macro-events (> 10s) such as sleep stages, and micro-events (< 2s) such as spindles and K-complexes. Annotation of such events requires a trained sleep expert, a time-consuming and tedious process with large inter-scorer variability. Automatic algorithms have been developed to detect various types of events, but these are event-specific. We propose a deep learning method that jointly predicts locations, durations and types of events in EEG time series. It relies on a convolutional neural network that builds a feature representation from raw EEG signals. Numerical experiments demonstrate the efficiency of this new approach on various event detection tasks compared to current state-of-the-art, event-specific algorithms.

Replacements for Tue, 17 Jul 18

[42]  arXiv:1712.08238 (replaced) [pdf, ps, other]
Title: Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment
Comments: Accepted paper (not camera-ready version) of FATML 2018 conference, Fairness, Accountability and Transparency in Machine Learning, 2018, Proceedings of Machine Learning Research
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Applications (stat.AP)
[43]  arXiv:1802.09902 (replaced) [pdf, other]
Title: Attention-Based Guided Structured Sparsity of Deep Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[44]  arXiv:1803.02453 (replaced) [pdf, other]
Title: A Reductions Approach to Fair Classification
Subjects: Machine Learning (cs.LG)
[45]  arXiv:1804.10885 (replaced) [pdf]
Title: Dense Adaptive Cascade Forest: A Self Adaptive Deep Ensemble for Classification Problems
Authors: Haiyang Wang
Comments: 19 pages, 6 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[46]  arXiv:1805.08882 (replaced) [pdf, other]
Title: Multi-task Maximum Entropy Inverse Reinforcement Learning
Comments: Presented at 1st Workshop on Goal Specifications for Reinforcement Learning (ICML/IJCAI/AAMAS 2018)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[47]  arXiv:1805.11088 (replaced) [pdf, other]
Title: Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation
Comments: This paper has been accepted by IJCAI 2018
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[48]  arXiv:1806.01600 (replaced) [pdf, other]
Title: Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning
Comments: 20 pages, 4 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[49]  arXiv:1806.05009 (replaced) [pdf, ps, other]
Title: Tree Edit Distance Learning via Adaptive Symbol Embeddings
Comments: Paper at the International Conference of Machine Learning (2018), 2018-07-10 to 2018-07-15 in Stockholm, Sweden
Journal-ref: Proceedings of Machine Learning Research 80 (2018) 3973-3982
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[50]  arXiv:1807.04162 (replaced) [pdf, other]
Title: TherML: Thermodynamics of Machine Learning
Comments: Presented at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
[51]  arXiv:1807.04222 (replaced) [pdf, ps, other]
Title: Modified Regularized Dual Averaging Method for Training Sparse Convolutional Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[52]  arXiv:1601.03764 (replaced) [pdf, ps, other]
Title: Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Comments: To appear in the Transactions of the Association for Computational Linguistics
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
[53]  arXiv:1610.06447 (replaced) [pdf, other]
Title: Regularized Optimal Transport and the Rot Mover's Distance
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[54]  arXiv:1704.02906 (replaced) [pdf, other]
Title: Multi-Agent Diverse Generative Adversarial Networks
Comments: This is an updated version of our CVPR'18 paper with the same title. In this version, we also introduce MAD-GAN-Sim in Appendix B
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[55]  arXiv:1705.04770 (replaced) [pdf, other]
Title: Bayesian Decision Making in Groups is Hard
Subjects: Statistics Theory (math.ST); Computational Complexity (cs.CC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
[56]  arXiv:1705.07404 (replaced) [pdf, other]
Title: CrossNets: Cross-Information Flow in Deep Learning Architectures
Comments: 12 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[57]  arXiv:1706.03922 (replaced) [pdf, other]
Title: Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[58]  arXiv:1706.05681 (replaced) [pdf, other]
Title: On the convergence of mirror descent beyond stochastic convex programming
Comments: 30 pages, 5 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[59]  arXiv:1802.03800 (replaced) [pdf, other]
Title: Drug response prediction by ensemble learning and drug-induced gene expression signatures
Comments: Will appear in Genomics Journal
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG); Machine Learning (stat.ML)
[60]  arXiv:1803.04204 (replaced) [pdf, other]
Title: Semiparametric Contextual Bandits
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[61]  arXiv:1805.06576 (replaced) [pdf, other]
Title: A Spline Theory of Deep Networks (Extended Version)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[62]  arXiv:1805.09476 (replaced) [pdf, other]
Title: Hierarchical Clustering with Structural Constraints
Comments: In Proc. 35th International Conference on Machine Learning (ICML 2018)
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[63]  arXiv:1806.01771 (replaced) [pdf, other]
Title: Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference
Comments: Presented at the ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models. Stockholm, Sweden, 2018
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[64]  arXiv:1806.04594 (replaced) [pdf, ps, other]
Title: Exponential Weights on the Hypercube in Polynomial Time
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[65]  arXiv:1807.02710 (replaced) [pdf, other]
Title: Improving DNN-based Music Source Separation using Phase Features
Comments: 7 pages, 9 figures, Joint Workshop on Machine Learning for Music at ICML, IJCAI/ECAI and AAMAS, 2018
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[66]  arXiv:1807.05077 (replaced) [pdf, other]
Title: Maximizing Invariant Data Perturbation with Stochastic Optimization
Comments: 11 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
