# Learning

## New submissions

### New submissions for Fri, 17 Nov 17

[1]
Title: Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Deep generative neural networks have proven effective at both conditional and unconditional modeling of complex data distributions. Conditional generation enables interactive control, but creating new controls often requires expensive retraining. In this paper, we develop a method to condition generation without retraining the model. By post-hoc learning latent constraints, value functions that identify regions in latent space that generate outputs with desired attributes, we can conditionally sample from these regions with gradient-based optimization or amortized actor functions. Combining attribute constraints with a universal "realism" constraint, which enforces similarity to the data distribution, we generate realistic conditional images from an unconditional variational autoencoder. Further, using gradient-based optimization, we demonstrate identity-preserving transformations that make the minimal adjustment in latent space to modify the attributes of an image. Finally, with discrete sequences of musical notes, we demonstrate zero-shot conditional generation, learning latent constraints in the absence of labeled data or a differentiable reward function. Code with dedicated cloud instance has been made publicly available (https://goo.gl/STGMGx).

[2]
Title: A Distance for HMMs based on Aggregated Wasserstein Metric and State Registration
Comments: Our manuscript is based on our conference paper with the same title published in 14th European Conference on Computer Vision (ECCV 2016, spotlight). It has been significantly extended and is now in journal submission
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time position follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of states. The distance is defined meaningfully even for two HMMs that are estimated from data of different dimensionality, a situation that can arise due to missing variables. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and that between the two transition matrices. Our new distance is tested on tasks of retrieval, classification, and t-SNE visualization of time series. Experiments on both synthetic and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.
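
As a rough illustration of the component-level cost mentioned in this abstract: the 2-Wasserstein distance between two Gaussians has the closed form $W_2^2 = \|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$. The sketch below assumes diagonal covariances, for which the trace term reduces to a sum of squared differences of standard deviations. The function name `gaussian_w2_diag` is hypothetical, and this is not the authors' code; it only shows the cost that would be fed into the optimal transport problem between GMM components.

```python
import math


def gaussian_w2_diag(mean_a, var_a, mean_b, var_b):
    """2-Wasserstein distance between two Gaussians with diagonal covariances.

    Assumption: covariances are diagonal, so only per-dimension variances
    are passed in. The general case needs matrix square roots.
    """
    # Squared Euclidean distance between the means.
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    # For diagonal covariances the trace term becomes a sum over dimensions
    # of squared differences between standard deviations.
    cov_term = sum(
        (math.sqrt(va) - math.sqrt(vb)) ** 2 for va, vb in zip(var_a, var_b)
    )
    return math.sqrt(mean_term + cov_term)


# Two unit-variance Gaussians whose means differ by (3, 4).
print(gaussian_w2_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # 5.0
```

No Monte Carlo sampling is involved, which is consistent with the abstract's remark that the distance can be computed without generating samples.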

[3]
Title: ORBIT: Ordering Based Information Transfer Across Space and Time for Global Surface Water Monitoring
Subjects: Learning (cs.LG); Geophysics (physics.geo-ph)

Many earth science applications require data at both high spatial and temporal resolution for effective monitoring of various ecosystem resources. Due to practical limitations in sensor design, there is often a trade-off in different resolutions of spatio-temporal datasets and hence a single sensor alone cannot provide the required information. Various data fusion methods have been proposed in the literature that mainly rely on individual timesteps when both datasets are available to learn a mapping between feature values at different resolutions using local relationships between pixels. Earth observation data is often plagued with spatially and temporally correlated noise, outliers and missing data due to atmospheric disturbances, which poses a challenge in learning the mapping from a local neighborhood at individual timesteps. In this paper, we aim to exploit time-independent global relationships between pixels for robust transfer of information across different scales. Specifically, we propose a new framework, ORBIT (Ordering Based Information Transfer), that uses relative ordering constraints among pixels to transfer information across both time and scales. The effectiveness of the framework is demonstrated for global surface water monitoring using both synthetic and real-world datasets.

[4]
Title: Hierarchical Modeling of Seed Variety Yields and Decision Making for Future Planting Plans
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Eradicating hunger and malnutrition is a key development goal of the 21st century. We address the problem of optimally identifying seed varieties to reliably increase crop yield within a risk-sensitive decision-making framework. Specifically, we introduce a novel hierarchical machine learning mechanism for predicting crop yield (the yield of different seed varieties of the same crop). We integrate this prediction mechanism with a weather forecasting model, and propose three different approaches for decision making under uncertainty to select seed varieties for planting so as to balance yield maximization and risk. We apply our model to the problem of soybean variety selection given in the 2016 Syngenta Crop Challenge. Our prediction model achieves a median absolute error of 3.74 bushels per acre and thus provides good estimates for input into the decision models. Our decision models identify the selection of soybean varieties that appropriately balance yield and risk as a function of the farmer's risk aversion level. More generally, our models support farmers in decision making about which seed varieties to plant.

[5]
Subjects: Learning (cs.LG); Optimization and Control (math.OC)

Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving nonlinear mechanical systems quickly to a target configuration. On these tasks its performance is comparable to that of deep deterministic policy gradient, a recent action-value method.

[6]
Title: Zero-Shot Learning via Class-Conditioned Deep Generative Models
Comments: To appear in AAAI 2018
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen when training. We further extend our model to (1) a semi-supervised/transductive setting, by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) a few-shot learning setting, where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.

[7]
Title: Knowledge transfer for surgical activity prediction
Subjects: Learning (cs.LG)

Lack of training data hinders automatic recognition and prediction of surgical activities necessary for situation-aware operating rooms. We propose using knowledge transfer to compensate for the data deficit and improve prediction. We used two approaches to extract and transfer surgical process knowledge. First, we encoded semantic information about surgical terms using word embeddings, which boosted the learning process. Second, we passed knowledge between different clinical datasets of neurosurgical procedures using transfer learning. Transfer learning was shown to be more effective than a simple combination of data, especially for less similar procedures. The combination of the two methods provided a 22% improvement in activity prediction. We also made several pertinent observations about surgical practices.

[8]
Title: Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems - the models (often deep networks or wide networks or both) are compute and memory intensive. Low-precision numerics and model compression using knowledge distillation are popular techniques to lower both the compute requirements and memory footprint of these deployed models. In this paper, we study the combination of these two techniques and show that the performance of low-precision networks can be significantly improved by using knowledge distillation techniques. Our approach, Apprentice, achieves state-of-the-art accuracies using ternary precision and 4-bit precision for variants of the ResNet architecture on the ImageNet dataset. We present three schemes for applying knowledge distillation techniques to various stages of the train-and-deploy pipeline.
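
For readers unfamiliar with knowledge distillation, the following is a minimal sketch of a commonly used temperature-scaled distillation loss, in which a student network is trained against both the true label and the softened outputs of a teacher. It is an assumption-laden illustration: the function names, the `temperature` and `alpha` parameters, and the exact weighting are not taken from the Apprentice paper, whose loss may differ, and no low-precision quantization is modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-target KL term.

    Assumption: this is the generic formulation, not the paper's exact loss.
    """
    # Cross-entropy of the student against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    # KL divergence from softened teacher outputs to softened student outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

In a low-precision setting, `student_logits` would come from the quantized network and `teacher_logits` from a full-precision one.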

[9]
Title: Pricing Football Players using Neural Networks
Authors: Sourya Dey
Subjects: Learning (cs.LG)

We designed a multilayer perceptron neural network to predict the price of a football (soccer) player using data on more than 15,000 players from the football simulation video game FIFA 2017. The network was optimized by experimenting with different activation functions, number of neurons and layers, learning rate and its decay, Nesterov momentum based stochastic gradient descent, L2 regularization, and early stopping. Simultaneous exploration of various aspects of neural network training is performed and their trade-offs are investigated. Our final model achieves a top-5 accuracy of 87.2% among 119 pricing categories, and places any footballer within 6.32% of his actual price on average.
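
The two figures reported in this abstract, a top-5 accuracy over pricing categories and an average relative price error, can be computed along the following lines. This is only a sketch of the metrics under assumed definitions: the function names and the toy inputs are hypothetical, and the abstract does not specify exactly how the authors computed either number.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true class is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest predicted scores.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / actual over all examples."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Toy example with 3 classes instead of the 119 pricing categories.
scores = [[0.1, 0.9, 0.0], [0.8, 0.05, 0.15]]
labels = [1, 2]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```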

[10]
Title: On Communication Complexity of Classification Problems
Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Information Theory (cs.IT)

This work introduces a model of distributed learning in the spirit of Yao's communication complexity model. We consider a two-party setting, where each of the players gets a list of labelled examples and they communicate in order to jointly perform some learning task. To naturally fit into the framework of learning theory, we allow the players to send each other labelled examples, where each example costs one unit of communication. This model can also be thought of as a distributed version of sample compression schemes.
We study several fundamental questions in this model. For example, we define the analogues of the complexity classes P, NP and coNP, and show that in this model P equals the intersection of NP and coNP. The proof does not seem to follow from the analogous statement in classical communication complexity; in particular, our proof uses different techniques, including boosting and metric properties of VC classes.
This framework allows us to prove, in the context of distributed learning, unconditional separations between various learning contexts, like realizable versus agnostic learning, and proper versus improper learning. The proofs here are based on standard ideas from communication complexity as well as learning theory and geometric constructions in Euclidean space. As a corollary, we also obtain lower bounds that match the performance of algorithms from previous works on distributed classification.

[11]
Title: How Generative Adversarial Nets and its variants Work: An Overview of GAN
Subjects: Learning (cs.LG)

Generative Adversarial Networks (GANs) have received wide attention in the machine learning field because of their massive potential to learn high-dimensional, complex real data. Specifically, they do not require further distributional assumptions and can simply infer real-like samples from a latent space. This powerful property has led GANs to be applied to various applications such as image synthesis, image attribute editing, and semantic decomposition of images. In this review paper, we look into the details of GANs, first showing how they operate and the fundamental meaning of their objective functions, and then pointing to GAN variants applied to a vast number of tasks.

[12]
Title: Budget-Constrained Multi-Armed Bandits with Multiple Plays
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly $K$ out of $N$ possible arms have to be played (with $1\leq K \leq N$). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a-priori defined budget $B$. The game ends when the sum of current costs associated with the played arms exceeds the remaining budget.
Firstly, we analyze this setting for the stochastic case, for which we assume each arm to have an underlying cost and reward distribution with support $[c_{\min}, 1]$ and $[0, 1]$, respectively. We derive an Upper Confidence Bound (UCB) algorithm which achieves $O(NK^4 \log B)$ regret.
Secondly, for the adversarial case in which the entire sequence of rewards and costs is fixed in advance, we derive an upper bound on the regret of order $O(\sqrt{NB\log(N/K)})$ utilizing an extension of the well-known $\texttt{Exp3}$ algorithm. We also provide upper bounds that hold with high probability and a lower bound of order $\Omega((1 - K/N)^2 \sqrt{NB/K})$.
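
The game protocol described in this abstract (play exactly $K$ of $N$ arms per round, observe rewards and costs, stop once the costs of the chosen arms exceed the remaining budget) can be simulated as in the sketch below. The arm-selection rule here is an assumption: a generic optimistic reward-per-cost index, not the UCB algorithm derived in the paper, and the function names, Bernoulli rewards, and uniform cost ranges are all hypothetical choices made for illustration.

```python
import math
import random


def ucb_ratio_index(reward_sum, cost_sum, count, t):
    """Optimistic reward-per-cost estimate; unplayed arms get priority."""
    if count == 0:
        return float("inf")
    bonus = math.sqrt(2.0 * math.log(t) / count)
    mean_reward = reward_sum / count
    mean_cost = cost_sum / count  # positive because costs are at least c_min > 0
    return (mean_reward + bonus) / mean_cost


def run_budgeted_bandit(reward_means, cost_ranges, k, budget, seed=0):
    """Simulate the stochastic setting until the budget is exhausted."""
    rng = random.Random(seed)
    n = len(reward_means)
    counts = [0] * n
    reward_sums = [0.0] * n
    cost_sums = [0.0] * n
    remaining, total_reward, rounds = budget, 0.0, 0
    while True:
        t = rounds + 1
        ranked = sorted(
            range(n),
            key=lambda i: ucb_ratio_index(reward_sums[i], cost_sums[i], counts[i], t),
            reverse=True,
        )
        chosen = ranked[:k]  # exactly K distinct arms per round
        rewards = {i: float(rng.random() < reward_means[i]) for i in chosen}
        costs = {i: rng.uniform(*cost_ranges[i]) for i in chosen}
        # The game ends when the current costs exceed the remaining budget.
        if sum(costs.values()) > remaining:
            break
        for i in chosen:
            counts[i] += 1
            reward_sums[i] += rewards[i]
            cost_sums[i] += costs[i]
        remaining -= sum(costs.values())
        total_reward += sum(rewards.values())
        rounds += 1
    return rounds, total_reward, budget - remaining


rounds, reward, spent = run_budgeted_bandit(
    [0.2, 0.5, 0.8, 0.4], [(0.1, 0.3)] * 4, k=2, budget=20.0
)
print(rounds, reward, spent)
```

Costs are drawn from a range bounded away from zero, mirroring the $[c_{\min}, 1]$ support assumed in the abstract, which also guarantees that the loop terminates.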

[13]
Title: Less-forgetful Learning for Domain Expansion in Deep Neural Networks
Comments: 8 pages, accepted to AAAI 2018
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Expanding the domain that a deep neural network has already learned without accessing old domain data is a challenging task because deep neural networks forget previously learned information when learning new data from a new domain. In this paper, we propose a less-forgetful learning method for the domain expansion scenario. While existing domain adaptation techniques solely focused on adapting to new domains, the proposed technique focuses on working well with both old and new domains without needing to know whether the input is from the old or new domain. First, we present two naive approaches that turn out to be problematic; then we provide a new method using two proposed properties for less-forgetful learning. Finally, we prove the effectiveness of our method through experiments on image classification tasks. All datasets used in the paper will be released on our website for follow-up studies.

[14]
Comments: Accepted to NIPS 2017 Hierarchical Reinforcement Learning Workshop
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Goal-conditional policies allow reinforcement learning agents to pursue specific goals during different episodes. In addition to their potential to generalize desired behavior to unseen goals, such policies may also help in defining options for arbitrary subgoals, enabling higher-level planning. While trying to achieve a specific goal, an agent may also be able to exploit information about the degree to which it has achieved alternative goals. Reinforcement learning agents have only recently been endowed with such capacity for hindsight, which is highly valuable in environments with sparse rewards. In this paper, we show how hindsight can be introduced to likelihood-ratio policy gradient methods, generalizing this capacity to an entire class of highly successful algorithms. Our preliminary experiments suggest that hindsight may increase the sample efficiency of policy gradient methods.

[15]
Title: A unified view of gradient-based attribution methods for Deep Neural Networks
Comments: Accepted at NIPS 2017 - Workshop Interpreting, Explaining and Visualizing Deep Learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Understanding the flow of information in Deep Neural Networks is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts have been made to analyze them from a theoretical perspective. In this work we analyze various state-of-the-art attribution methods and prove unexplored connections between them. We also show how some methods can be reformulated and more conveniently implemented. Finally, we perform an empirical evaluation with six attribution methods on a variety of tasks and architectures and discuss their strengths and limitations.

### Cross-lists for Fri, 17 Nov 17

[16]  arXiv:1711.05734 (cross-list from cs.DC) [pdf, other]
Title: Chipmunk: A Systolically Scalable 0.9 mm${}^2$, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm${}^2$) hardware accelerator for Long-Short Term Memory RNNs in UMC 65 nm technology capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power.

[17]  arXiv:1711.05747 (cross-list from cs.SD) [pdf, other]
Title: Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition
Subjects: Sound (cs.SD); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)

We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however this technique was not justified in the context of ASR. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated by both additive and reverberant noise. Motivated by recent advances in image processing, we propose operating GANs on log-Mel filterbank spectra instead of waveforms, which requires less computation and is more robust to reverberant noise. While GAN enhancement improves the performance of a clean-trained ASR system on noisy speech, it falls short of the performance achieved by conventional multi-style training (MTR). By appending the GAN-enhanced features to the noisy inputs and retraining, we achieve a 7% WER improvement relative to the MTR system.

[18]  arXiv:1711.05762 (cross-list from math.OC) [pdf, other]
Title: Random gradient extrapolation for distributed and stochastic optimization
Authors: Guanghui Lan, Yi Zhou
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we consider a class of finite-sum convex optimization problems defined over a distributed multiagent network with $m$ agents connected to a central server. In particular, the objective function consists of the average of $m$ ($\ge 1$) smooth components associated with each network agent together with a strongly convex term. Our major contribution is to develop a new randomized incremental gradient algorithm, namely random gradient extrapolation method (RGEM), which does not require any exact gradient evaluation even for the initial point, but can achieve the optimal ${\cal O}(\log(1/\epsilon))$ complexity bound in terms of the total number of gradient evaluations of component functions to solve the finite-sum problems. Furthermore, we demonstrate that for stochastic finite-sum optimization problems, RGEM maintains the optimal ${\cal O}(1/\epsilon)$ complexity (up to a certain logarithmic factor) in terms of the number of stochastic gradient computations, but attains an ${\cal O}(\log(1/\epsilon))$ complexity in terms of communication rounds (each round involves only one agent). It is worth noting that the former bound is independent of the number of agents $m$, while the latter one only linearly depends on $m$ or even $\sqrt m$ for ill-conditioned problems. To the best of our knowledge, this is the first time that these complexity bounds have been obtained for distributed and stochastic optimization problems. Moreover, our algorithms were developed based on a novel dual perspective of Nesterov's accelerated gradient method.

[19]  arXiv:1711.05822 (cross-list from cs.DL) [pdf]
Title: Understanding the Changing Roles of Scientific Publications via Citation Embeddings
Comments: CLBib-2017: Second Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics
Subjects: Digital Libraries (cs.DL); Learning (cs.LG)

Researchers may describe different aspects of past scientific publications in their own publications, and these descriptions may keep changing as science evolves. The diverse and changing descriptions (i.e., citation context) of a publication characterize the impact and contributions of that publication. In this article, we aim to provide an approach to understanding the changing and complex roles of a publication characterized by its citation context. We describe a method to represent the publications' dynamic roles in the scientific community in different periods as a sequence of vectors by training temporal embedding models. The temporal representations can be used to quantify how much the roles of publications changed and to interpret how they changed. Our study in the biomedical domain shows that our metric on the changes of publications' roles is stable over time at the population level but significantly distinguishes individuals. We also show the interpretability of our methods by a concrete example.

[20]  arXiv:1711.05828 (cross-list from cs.IR) [pdf, other]
Title: BoostJet: Towards Combining Statistical Aggregates with Neural Embeddings for Recommendations
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

Recommenders have become widely popular in recent years because of their broader applicability in many e-commerce applications. These applications rely on recommenders for generating advertisements for various offers or providing content recommendations. However, the quality of the generated recommendations depends on user features (like demography, temporality), offer features (like popularity, price), and user-offer features (like implicit or explicit feedback). Current state-of-the-art recommenders do not explore such diverse features concurrently while generating the recommendations.
In this paper, we first introduce the notion of Trackers, which enables us to capture the above-mentioned features and thus incorporate users' online behaviour through statistical aggregates of different features (demography, temporality, popularity, price). We also show how to capture offer-to-offer relations, based on their consumption sequence, leveraging neural embeddings for offers in our Offer2Vec algorithm. We then introduce BoostJet, a novel recommender which integrates the Trackers along with the neural embeddings using MatrixNet, an efficient distributed implementation of gradient boosted decision trees, to improve the recommendation quality significantly. We provide an in-depth evaluation of BoostJet on Yandex's dataset, collecting online behaviour from tens of millions of online users, to demonstrate the practicality of BoostJet in terms of recommendation quality as well as scalability.

[21]  arXiv:1711.05859 (cross-list from cs.CV) [pdf, other]
Title: Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Network biology has been successfully used to help reveal complex mechanisms of disease, especially cancer. On the other hand, network biology requires in-depth knowledge to construct disease-specific networks, but our current knowledge is very limited even with the recent advances in human cancer biology. Deep learning has shown great potential to address difficult situations like this. However, deep learning technologies conventionally use grid-like structured data, thus the application of deep learning technologies to the classification of human disease subtypes is yet to be explored. Recently, graph-based deep learning techniques have emerged, which offers an opportunity to leverage analyses in network biology. In this paper, we propose a hybrid model, which integrates two key components: 1) a graph convolution neural network (graph CNN) and 2) a relation network (RN). We utilize the graph CNN as a component to learn expression patterns of cooperative gene communities, and the RN as a component to learn associations between learned patterns. The proposed model is applied to the PAM50 breast cancer subtype classification task, the standard breast cancer subtype classification of clinical utility. In experiments on both subtype classification and patient survival analysis, our proposed method achieved significantly better performance than existing methods. We believe that this work is an important starting point to realize the upcoming personalized medicine.

[22]  arXiv:1711.05869 (cross-list from stat.ML) [pdf, other]
Title: Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)

Testing (conditional) independence of multivariate random variables is a task central to statistical inference and modelling in general - though unfortunately one for which to date there does not exist a practicable workflow. State-of-the-art workflows suffer from the need for heuristic or subjective manual choices, high computational complexity, or strong parametric assumptions.
We address these problems by establishing a theoretical link between multivariate/conditional independence testing, and model comparison in the multivariate predictive modelling aka supervised learning task. This link allows advances in the extensively studied supervised learning workflow to be directly transferred to independence testing workflows - including automated tuning of machine learning type which addresses the need for a heuristic choice, the ability to quantitatively trade off computational demand with accuracy, and the modern black-box philosophy for checking and interfacing.
As a practical implementation of this link between the two workflows, we present a Python package 'pcit', which implements our novel multivariate and conditional independence tests, interfacing the supervised learning API of the scikit-learn package. Theory and package also allow for straightforward independence test based learning of graphical model structure.
We empirically show that our proposed predictive independence tests outperform or are on par with current practice, and the derived graphical model structure learning algorithms asymptotically recover the 'true' graph. This paper, and the 'pcit' package accompanying it, thus provide powerful, scalable, generalizable, and easy-to-use methods for multivariate and conditional independence testing, as well as for graphical model structure learning.

[23]  arXiv:1711.05918 (cross-list from cs.CV) [pdf, other]
Title: Priming Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Visual priming is known to affect the human visual system to allow detection of scene elements, even those that may have been nearly unnoticeable before, such as the presence of camouflaged animals. This process has been shown to be an effect of top-down signaling in the visual system triggered by the priming cue. In this paper, we propose a mechanism to mimic the process of priming in the context of object detection and segmentation. We view priming as having a modulatory, cue-dependent effect on layers of features within a network. Our results show how such a process can be complementary to, and at times more effective than, simple post-processing applied to the output of the network, notably so in cases where the object is hard to detect such as in severe noise. Moreover, we find the effects of priming are sometimes stronger when early visual layers are affected. Overall, our experiments confirm that top-down signals can go a long way in improving object detection and segmentation.

[24]  arXiv:1711.05934 (cross-list from cs.CV) [pdf, other]
Title: Enhanced Attacks on Defensively Distilled Deep Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)

Deep neural networks (DNNs) have achieved tremendous success in many machine learning tasks, such as image classification. Unfortunately, researchers have shown that DNNs are easily attacked by adversarial examples, slightly perturbed images which can mislead DNNs into giving incorrect classification results. Such attacks have seriously hampered the deployment of DNN systems in areas where security or safety requirements are strict, such as autonomous cars, face recognition, and malware detection. Defensive distillation is a mechanism aimed at training a robust DNN which significantly reduces the effectiveness of adversarial example generation. However, the state-of-the-art attack can succeed on distilled networks with 100% probability. That attack is a white-box attack, which needs to know the internal information of the DNN, whereas the black-box scenario is more general. In this paper, we first propose the epsilon-neighborhood attack, which can fool defensively distilled networks with a 100% success rate in the white-box setting and quickly generates adversarial examples with good visual quality. On the basis of this attack, we further propose the region-based attack against defensively distilled DNNs in the black-box setting. We also perform the bypass attack to indirectly break the distillation defense as a complementary method. The experimental results show that our black-box attacks have a considerable success rate on defensively distilled networks.

[25]  arXiv:1711.06064 (cross-list from stat.ML) [pdf, other]
Title: Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception
Comments: 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), Extended version with proofs, 14 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)

This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to choose a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents' local summaries (i.e., based on their current choices of support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanism for a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on real-world datasets shows that our algorithms outperform the state-of-the-art methods.

[26]  arXiv:1711.06068 (cross-list from cs.HC) [pdf]
Title: The signature of robot action success in EEG signals of a human observer: Decoding and visualization using deep convolutional neural networks
Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG); Robotics (cs.RO)

The importance of robotic assistive devices grows in our work and everyday life. Cooperative scenarios involving both robots and humans require safe human-robot interaction. One important aspect here is the management of robot errors, including fast and accurate online robot-error detection and correction. Analysis of brain signals from a human interacting with a robot may help identify robot errors, but the accuracies of such analyses still have substantial room for improvement. In this paper we evaluate whether a novel framework based on deep convolutional neural networks (deep ConvNets) could improve the accuracy of decoding robot errors from the EEG of a human observer, both during an object grasping and a pouring task. We show that deep ConvNets reached significantly higher accuracies than both regularized Linear Discriminant Analysis (rLDA) and filter bank common spatial patterns (FB-CSP) combined with rLDA, both widely used EEG classifiers. Deep ConvNets reached mean accuracies of 75% +/- 9%, rLDA 65% +/- 10% and FB-CSP + rLDA 63% +/- 6% for decoding of erroneous vs. correct trials. Visualization of the time-domain EEG features learned by the ConvNets to decode errors revealed spatiotemporal patterns that reflected differences between the two experimental paradigms. Across subjects, ConvNet decoding accuracies were significantly correlated with those obtained with rLDA, but not CSP, indicating that in the present context ConvNets behaved more 'rLDA-like' (but consistently better), while in a previous decoding study with another task but the same ConvNet architecture, it was found to behave more 'CSP-like'. Our findings thus provide further support for the assumption that deep ConvNets are a versatile addition to the existing toolbox of EEG decoding techniques, and we discuss steps by which ConvNet EEG decoding performance could be further optimized.

[27]  arXiv:1711.06114 (cross-list from stat.ML) [pdf, other]
Title: Robust Unsupervised Domain Adaptation for Neural Networks via Moment Alignment
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

A novel approach for unsupervised domain adaptation for neural networks is proposed that relies on a metric-based regularization of the learning process. The metric-based regularization aims at domain-invariant latent feature representations by means of maximizing the similarity between domain-specific activation distributions. The proposed metric results from modifying an integral probability metric in a way such that it becomes translation-invariant on a polynomial reproducing kernel Hilbert space. The metric has an intuitive interpretation in the dual space as a sum of differences of central moments of the corresponding activation distributions. As demonstrated by an analysis on standard benchmark datasets for sentiment analysis and object recognition, the outlined approach shows more robustness with respect to parameter changes than state-of-the-art approaches while achieving even higher classification accuracies.
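
The "sum of differences of central moments" reading can be illustrated with a small sketch. This is not the paper's metric: the function names, the unweighted sum, the use of the mean difference as the first-order term, and the `max_order` cutoff are all assumptions made for illustration on one-dimensional activations.

```python
def central_moment(xs, k):
    """k-th central moment of a 1-D sample (intended for k >= 2)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def moment_discrepancy(source, target, max_order=3):
    """Illustrative discrepancy between two 1-D activation samples.

    Assumption: the first-order term is the absolute difference of means,
    and higher orders are absolute differences of central moments,
    summed without weights up to max_order.
    """
    mean_s = sum(source) / len(source)
    mean_t = sum(target) / len(target)
    total = abs(mean_s - mean_t)
    for k in range(2, max_order + 1):
        total += abs(central_moment(source, k) - central_moment(target, k))
    return total


source = [0.0, 1.0, 2.0]
target = [0.0, 2.0, 4.0]
score = moment_discrepancy(source, target)

# Shifting both samples by the same constant should leave the score unchanged.
shifted_score = moment_discrepancy(
    [x + 5.0 for x in source], [x + 5.0 for x in target]
)
```

In a training loop, a term of this kind would typically be added to the task loss as a regularizer on hidden activations of the source and target domains; how the paper weights and normalizes the terms is not stated in the abstract.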

[28]  arXiv:1711.06178 (cross-list from stat.ML) [pdf, other]
Title: Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
Comments: To appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power.

[29]  arXiv:1711.06195 (cross-list from stat.ML) [pdf, other]
Title: Neurology-as-a-Service for the Developing World
Comments: Presented at NIPS 2017 Workshop on Machine Learning for the Developing World
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Electroencephalography (EEG) is an extensively used and well-studied technique in the field of medical diagnostics and treatment for brain disorders, including epilepsy, migraines, and tumors. The analysis and interpretation of EEGs require physicians to have specialized training, which is not common even among most doctors in the developed world, let alone the developing world where physician shortages plague society. This problem can be addressed by teleEEG that uses remote EEG analysis by experts or by local computer processing of EEGs. However, both of these options are prohibitively expensive and the second option requires abundant computing resources and infrastructure, which is another concern in developing countries where there are resource constraints on capital and computing infrastructure. In this work, we present a cloud-based deep neural network approach to provide decision support for non-specialist physicians in EEG analysis and interpretation. Named 'neurology-as-a-service,' the approach requires almost no manual intervention in feature engineering and in the selection of an optimal architecture and hyperparameters of the neural network. In this study, we deploy a pipeline that includes moving EEG data to the cloud and getting optimal models for various classification tasks. Our initial prototype has been tested only in developed world environments to date, but our intention is to test it in developing world environments in future work. We demonstrate the performance of our proposed approach using the BCI2000 EEG MMI dataset, on which our service attains 63.4% accuracy for the task of classifying real vs. imaginary activity performed by the subject, which is significantly higher than what is obtained with a shallow approach such as support vector machines.

[30]  arXiv:1711.06221 (cross-list from stat.ML) [pdf, other]
Title: A Forward-Backward Approach for Visualizing Information Flow in Deep Networks
Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We introduce a new, systematic framework for visualizing information flow in deep networks. Specifically, given any trained deep convolutional network model and a given test image, our method produces a compact support in the image domain that corresponds to a (high-resolution) feature that contributes to the given explanation. Our method is both computationally efficient and numerically robust. We present several preliminary numerical results that support the benefits of our framework over existing methods.

[31]  arXiv:1711.06252 (cross-list from stat.ME) [pdf, other]
Title: A New Method for Performance Analysis in Nonlinear Dimensionality Reduction
Comments: 20 pages, 8 figures, 2 tables
Subjects: Methodology (stat.ME); Learning (cs.LG)

In this paper, we develop a local rank correlation measure which quantifies the performance of dimension reduction methods. The local rank correlation is easily interpretable, and robust against the extreme skewness of nearest neighbor distributions in high dimensions. Some benchmark datasets are studied. We find that the local rank correlation closely corresponds to our visual interpretation of the quality of the output. In addition, we demonstrate that the local rank correlation is useful in estimating the intrinsic dimensionality of the original data, and in selecting a suitable value of tuning parameters used in some algorithms.
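
The abstract does not define the local rank correlation, so the following is only one plausible sketch of the idea: for each point, take its `k` nearest neighbors in the original space and compute a Spearman-style rank correlation between the distances to those neighbors before and after the reduction, then average over points. The function names, the choice of Spearman correlation, and the neighborhood rule are assumptions, not the authors' definition.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def local_rank_correlation(high, low, k):
    """Average, over points, of the rank correlation between original-space
    and embedded-space distances to the k nearest original-space neighbors."""
    n = len(high)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        d_high = {j: math.dist(high[i], high[j]) for j in others}
        neighbors = sorted(others, key=lambda j: d_high[j])[:k]
        dist_high = [d_high[j] for j in neighbors]
        dist_low = [math.dist(low[i], low[j]) for j in neighbors]
        scores.append(pearson(ranks(dist_high), ranks(dist_low)))
    return sum(scores) / n


# Points on a line in 3-D, with a faithful and a scrambled 1-D embedding.
high = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (6, 0, 0), (10, 0, 0)]
faithful = [(0,), (1,), (3,), (6,), (10,)]
scrambled = [(0,), (10,), (1,), (6,), (3,)]

faithful_score = local_rank_correlation(high, faithful, k=3)
scrambled_score = local_rank_correlation(high, scrambled, k=3)
```

Under these assumptions, an embedding that preserves local distance orderings scores close to 1, while one that scrambles them scores lower. Comparing scores across values of a tuning parameter is one way such a measure could be used for parameter selection.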

### Replacements for Fri, 17 Nov 17

[32]  arXiv:1612.00563 (replaced) [pdf, other]
Title: Self-critical Sequence Training for Image Captioning
Comments: CVPR 2017 + additional analysis + fixed baseline results, 16 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[33]  arXiv:1702.07956 (replaced) [pdf, ps, other]
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[34]  arXiv:1703.02403 (replaced) [pdf, other]
Title: On Structured Prediction Theory with Calibrated Convex Surrogate Losses
Comments: Appears in: Advances in Neural Information Processing Systems 30 (NIPS 2017). 30 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[35]  arXiv:1703.10675 (replaced) [pdf, other]
Title: Applying Ricci Flow to High Dimensional Manifold Learning
Subjects: Learning (cs.LG)
[36]  arXiv:1708.05929 (replaced) [pdf, other]
Title: Explaining Anomalies in Groups with Characterizing Subspace Rules
Comments: 17 pages, 6 figures, 8 tables
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[37]  arXiv:1709.09268 (replaced) [pdf, other]
Title: FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
[38]  arXiv:1710.04584 (replaced) [pdf, ps, other]
Title: Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[39]  arXiv:1710.08864 (replaced) [pdf, other]
Title: One pixel attack for fooling deep neural networks
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[40]  arXiv:1711.05697 (replaced) [pdf, other]
Title: Motif-based Convolutional Neural Network on Graphs
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)
[41]  arXiv:1508.07964 (replaced) [pdf, other]
Title: Wald-Kernel: Learning to Aggregate Information for Sequential Inference
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[42]  arXiv:1603.03833 (replaced) [pdf, other]
Title: From virtual demonstration to real-world manipulation using LSTM and MDN
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
[43]  arXiv:1609.01885 (replaced) [pdf, other]
Title: DAiSEE: Towards User Engagement Recognition in the Wild
Comments: 10 pages, 6 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
[44]  arXiv:1703.00410 (replaced) [pdf, other]
Title: Detecting Adversarial Samples from Artifacts
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[45]  arXiv:1703.05830 (replaced) [pdf, other]
Title: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
[46]  arXiv:1705.06995 (replaced) [pdf, other]
Title: Nearly second-order asymptotic optimality of sequential change-point detection with one-sample updates
Subjects: Statistics Theory (math.ST); Learning (cs.LG)
[47]  arXiv:1706.07179 (replaced) [pdf, other]
Title: RelNet: End-to-End Modeling of Entities & Relations
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
[48]  arXiv:1710.10468 (replaced) [pdf, other]
Title: Speaker Diarization with LSTM
Subjects: Audio and Speech Processing (eess.AS); Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[49]  arXiv:1711.00066 (replaced) [pdf, other]
Title: Fraternal Dropout
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
[50]  arXiv:1711.01434 (replaced) [pdf]
Title: Transaction Fraud Detection Using GRU-centered Sandwich-structured Model
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
[51]  arXiv:1711.04291 (replaced) [pdf, other]
Title: Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
Comments: 10 pages, 4 figures, 13 tables
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[52]  arXiv:1711.04851 (replaced) [pdf, other]
Title: Learning and Visualizing Localized Geometric Features Using 3D-CNN: An Application to Manufacturability Analysis of Drilled Holes
Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
[53]  arXiv:1711.05376 (replaced) [pdf, other]
Title: Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
[54]  arXiv:1711.05411 (replaced) [pdf, other]
Title: Z-Forcing: Training Stochastic Recurrent Networks