We gratefully acknowledge support from
the Simons Foundation
and member institutions


New submissions

[ total of 26 entries: 1-26 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Fri, 22 Sep 17

[1]  arXiv:1709.06994 [pdf, other]
Title: Structured Probabilistic Pruning for Deep Convolutional Neural Network Acceleration
Comments: 7 pages, CNN model compression and acceleration
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Although deep Convolutional Neural Network (CNN) has shown better performance in various machine learning tasks, its application is accompanied by a significant increase in storage and computation. Among CNN simplification techniques, parameter pruning is a promising approach which aims at reducing the number of weights of various layers without intensively reducing the original accuracy. In this paper, we propose a novel progressive parameter pruning method, named Structured Probabilistic Pruning (SPP), which efficiently prunes weights of convolutional layers in a probabilistic manner. Unlike existing deterministic pruning approaches, in which the pruned weights of a well-trained model are permanently eliminated, SPP utilizes the relative importance of weights during training iterations, which makes the pruning procedure more accurate by leveraging the accumulated weight importance. Specifically, we introduce an effective weight competition mechanism to emphasize the important weights and gradually undermine the unimportant ones. Experiments indicate that our proposed method has obtained superior performance on ConvNet and AlexNet compared with existing pruning methods. Our pruned AlexNet achieves 4.0 $\sim$ 8.9x (averagely 5.8x) layer-wise speedup in convolutional layers with only 1.3\% top-5 error increase on the ImageNet-2012 validation dataset. We also prove the effectiveness of our method on transfer learning scenarios using AlexNet.

[2]  arXiv:1709.07093 [pdf, other]
Title: Near Optimal Sketching of Low-Rank Tensor Regression
Comments: 36 pages, 2 figures, 2 tables, Accepted at NIPS 2017
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

We study the least squares regression problem \begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta-b\|_2, \end{align*} where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $\Theta$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on {\it sparse} random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi A \Theta - \Phi b\|_2$, for which if $\Theta'$ is a near-optimum to the smaller problem, then it is also a near optimum to the original problem. We obtain significantly smaller dimension and sparsity in $\Phi$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.

[3]  arXiv:1709.07116 [pdf, other]
Title: Variational Memory Addressing in Generative Models
Subjects: Learning (cs.LG)

Aiming to augment generative models with external memory, we interpret the output of a memory module with stochastic addressing as a conditional mixture distribution, where a read operation corresponds to sampling a discrete memory address and retrieving the corresponding content from memory. This perspective allows us to apply variational inference to memory addressing, which enables effective training of the memory module by using the target information to guide memory lookups. Stochastic addressing is particularly well-suited for generative models as it naturally encourages multimodality which is a prominent aspect of most high-dimensional datasets. Treating the chosen address as a latent variable also allows us to quantify the amount of information gained with a memory lookup and measure the contribution of the memory module to the generative process. To illustrate the advantages of this approach we incorporate it into a variational autoencoder and apply the resulting model to the task of generative few-shot learning. The intuition behind this architecture is that the memory module can pick a relevant template from memory and the continuous part of the model can concentrate on modeling remaining variations. We demonstrate empirically that our model is able to identify and access the relevant memory contents even with hundreds of unseen Omniglot characters in memory

[4]  arXiv:1709.07149 [pdf, ps, other]
Title: Learning RBM with a DC programming Approach
Comments: Accepted in ACML2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

By exploiting the property that the RBM log-likelihood function is the difference of convex functions, we formulate a stochastic variant of the difference of convex functions (DC) programming to minimize the negative log-likelihood. Interestingly, the traditional contrastive divergence algorithm is a special case of the above formulation and the hyperparameters of the two algorithms can be chosen such that the amount of computation per mini-batch is identical. We show that for a given computational budget the proposed algorithm almost always reaches a higher log-likelihood more rapidly, compared to the standard contrastive divergence algorithm. Further, we modify this algorithm to use the centered gradients and show that it is more efficient and effective compared to the standard centered gradient algorithm on benchmark datasets.

[5]  arXiv:1709.07172 [pdf, ps, other]
Title: SpectralFPL: Online Spectral Learning for Single Topic Models
Comments: 8 pages, 5 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

This paper studies how to efficiently learn an optimal latent variable model online from large streaming data. Latent variable models can explain the observed data in terms of unobserved concepts. They are traditionally studied in the unsupervised learning setting, and learned by iterative methods such as the EM. Very few online learning algorithms for latent variable models have been developed, and the most popular one is online EM. Though online EM is computationally efficient, it typically converges to a local optimum. In this work, we motivate and develop SpectralFPL, a novel algorithm to learn latent variable models online from streaming data. SpectralFPL is computationally efficient, and we prove that it quickly learns the global optimum under a bag-of-words model by deriving an $O(\sqrt n)$ regret bound. Experiment results also demonstrate a consistent performance improvement of SpectralFPL over online EM: in both synthetic and real-world experiments, SpectralFPL's performance is similar with or even better than online EM with optimally tuned parameters.

[6]  arXiv:1709.07314 [pdf, ps, other]
Title: Exact Learning of Lightweight Description Logic Ontologies
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

We study the problem of learning description logic (DL) ontologies in Angluin et al.'s framework of exact learning via queries. We admit membership queries ("is a given subsumption entailed by the target ontology?") and equivalence queries ("is a given ontology equivalent to the target ontology?"). We present three main results: (1) ontologies formulated in (two relevant versions of) the description logic DL-Lite can be learned with polynomially many queries of polynomial size; (2) this is not the case for ontologies formulated in the description logic EL, even when only acyclic ontologies are admitted; and (3) ontologies formulated in a fragment of EL related to the web ontology language OWL 2 RL can be learned in polynomial time. We also show that neither membership nor equivalence queries alone are sufficient in cases (1) and (3).

[7]  arXiv:1709.07377 [pdf, other]
Title: Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE
Comments: 22 pages, 15 figures
Subjects: Learning (cs.LG)

Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In this paper we propose Geometric SMOTE (G-SMOTE) as a generalization of the SMOTE data generation mechanism. G-SMOTE generates synthetic samples in a geometric region of the input space, around each selected minority instance. While in the basic configuration this region is a hyper-sphere, G-SMOTE allows its deformation to a hyper-spheroid and finally to a line segment, emulating, in the last case, the SMOTE mechanism. The performance of G-SMOTE is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when G-SMOTE is used as an oversampling algorithm.

Cross-lists for Fri, 22 Sep 17

[8]  arXiv:1709.07077 (cross-list from cs.CV) [pdf, other]
Title: Estimated Depth Map Helps Image Classification
Authors: Yihui He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We consider image classification with estimated depth. This problem falls into the domain of transfer learning, since we are using a model trained on a set of depth images to generate depth maps (additional features) for use in another classification problem using another disjoint set of images. It's challenging as no direct depth information is provided. Though depth estimation has been well studied, none have attempted to aid image classification with estimated depth. Therefore, we present a way of transferring domain knowledge on depth estimation to a separate image classification task over a disjoint set of train, and test data. We build a RGBD dataset based on RGB dataset and do image classification on it. Then evaluation the performance of neural networks on the RGBD dataset compared to the RGB dataset. From our experiments, the benefit is significant with shallow and deep networks. It improves ResNet-20 by 0.55% and ResNet-56 by 0.53%. Our code and dataset are available publicly.

[9]  arXiv:1709.07089 (cross-list from cs.SY) [pdf, other]
Title: On the Design of LQR Kernels for Efficient Controller Learning
Comments: 8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and Control (CDC 2017)
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)

Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

[10]  arXiv:1709.07109 (cross-list from cs.CL) [pdf, other]
Title: Deconvolutional Latent-Variable Model for Text Sequence Matching
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

A latent-variable model is introduced for text matching, inferring sentence representations by jointly optimizing generative and discriminative objectives. To alleviate typical optimization challenges in latent-variable models for text, we employ deconvolutional networks as the sequence decoder (generator), providing learned latent codes with more semantic information and better generalization. Our model, trained in an unsupervised manner, yields stronger empirical predictive performance than a decoder based on Long Short-Term Memory (LSTM), with less parameters and considerably faster training. Further, we apply it to text sequence-matching problems. The proposed model significantly outperforms several strong sentence-encoding baselines, especially in the semi-supervised setting.

[11]  arXiv:1709.07124 (cross-list from cs.SD) [pdf, other]
Title: Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
Comments: To be presented at WASPAA 2017
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network output only requires evaluating a few layers at each time step. Third, when a limited amount of training data is available, DR-NMF exhibits stronger generalization and separation performance compared to sparse NMF and state-of-the-art long-short term memory (LSTM) networks. When a large amount of training data is available, DR-NMF achieves lower yet competitive separation performance compared to LSTM networks.

[12]  arXiv:1709.07150 (cross-list from cs.AI) [pdf, other]
Title: Feature Engineering for Predictive Modeling using Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.

[13]  arXiv:1709.07200 (cross-list from cs.CV) [pdf, other]
Title: Temporal Multimodal Fusion for Video Emotion Classification in the Wild
Journal-ref: ACM - ICMI 2017, Nov 2017, Glasgow, United Kingdom
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

This paper addresses the question of emotion classification. The task consists in predicting emotion labels (taken among a set of possible labels) best describing the emotions contained in short video clips. Building on a standard framework -- lying in describing videos by audio and visual features used by a supervised classifier to infer the labels -- this paper investigates several novel directions. First of all, improved face descriptors based on 2D and 3D Convo-lutional Neural Networks are proposed. Second, the paper explores several fusion methods, temporal and multimodal, including a novel hierarchical method combining features and scores. In addition, we carefully reviewed the different stages of the pipeline and designed a CNN architecture adapted to the task; this is important as the size of the training set is small compared to the difficulty of the problem, making generalization difficult. The so-obtained model ranked 4th at the 2017 Emotion in the Wild challenge with the accuracy of 58.8 %.

[14]  arXiv:1709.07223 (cross-list from cs.CV) [pdf, other]
Title: Convolutional neural networks that teach microscopes how to image
Comments: 13 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Optics (physics.optics)

Deep learning algorithms offer a powerful means to automatically analyze the content of medical images. However, many biological samples of interest are primarily transparent to visible light and contain features that are difficult to resolve with a standard optical microscope. Here, we use a convolutional neural network (CNN) not only to classify images, but also to optimize the physical layout of the imaging device itself. We increase the classification accuracy of a microscope's recorded images by merging an optical model of image formation into the pipeline of a CNN. The resulting network simultaneously determines an ideal illumination arrangement to highlight important sample features during image acquisition, along with a set of convolutional weights to classify the detected images post-capture. We demonstrate our joint optimization technique with an experimental microscope configuration that automatically identifies malaria-infected cells with 5-10% higher accuracy than standard and alternative microscope lighting designs.

[15]  arXiv:1709.07224 (cross-list from cs.MA) [pdf, other]
Title: Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning
Comments: 8 pages, 6 figures, version 1
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY); Machine Learning (stat.ML)

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.

[16]  arXiv:1709.07308 (cross-list from cs.DS) [pdf, other]
Title: Predicting Positive and Negative Links with Noisy Queries: Theory & Practice
Comments: 17 pages, 12 figures. arXiv admin note: text overlap with arXiv:1609.00750
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Learning (cs.LG); Social and Information Networks (cs.SI); Combinatorics (math.CO)

Social networks and interactions in social media involve both positive and negative relationships. Signed graphs capture both types of relationships: positive edges correspond to pairs of "friends", and negative edges to pairs of "foes". The edge sign prediction problem, that aims to predict whether an interaction between a pair of nodes will be positive or negative, is an important graph mining task for which many heuristics have recently been proposed [Leskovec 2010].
We model the edge sign prediction problem as follows: we are allowed to query any pair of nodes whether they belong to the same cluster or not, but the answer to the query is corrupted with some probability $0<q<\frac{1}{2}$. Let $\delta=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise for any constant gap $\delta$ with $O(\frac{n\log n}{\delta^4})$ queries. Our algorithm uses breadth first search as its main algorithmic primitive. A byproduct of our proposed learning algorithm is the use of $s-t$ paths as an informative feature to predict the sign of the edge $(s,t)$. As a heuristic, we use edge disjoint $s-t$ paths of short length as a feature for predicting edge signs in real-world signed networks. Our findings suggest that the use of paths improves the classification accuracy, especially for pairs of nodes with no common neighbors.

[17]  arXiv:1709.07359 (cross-list from stat.ML) [pdf, other]
Title: Class-Splitting Generative Adversarial Networks
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Generative Adversarial Networks (GANs) produce systematically better quality samples when class label information is provided., i.e. in the conditional GAN setup. This is still observed for the recently proposed Wasserstein GAN formulation which stabilized adversarial training and allows considering high capacity network architectures such as ResNet. In this work we show how to boost conditional GAN by augmenting available class labels. The new classes come from clustering in the representation space learned by the same GAN model. The proposed strategy is also feasible when no class information is available, i.e. in the unsupervised setup. Our generated samples reach state-of-the-art Inception scores for CIFAR-10 and STL-10 datasets in both supervised and unsupervised setup.

[18]  arXiv:1709.07409 (cross-list from quant-ph) [pdf, other]
Title: Quantum Autoencoders via Quantum Adders with Genetic Algorithms
Subjects: Quantum Physics (quant-ph); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The quantum autoencoder is a recent paradigm in the field of quantum machine learning, which may enable an enhanced use of resources in quantum technologies. To this end, quantum neural networks with less nodes in the inner than in the outer layers were considered. Here, we propose a useful connection between approximate quantum adders and quantum autoencoders. Specifically, this link allows us to employ optimized approximate quantum adders, obtained with genetic algorithms, for the implementation of quantum autoencoders for a variety of initial states. Furthermore, we can also directly optimize the quantum autoencoders via genetic algorithms. Our approach opens a different path for the design of quantum autoencoders in controllable quantum platforms.

[19]  arXiv:1709.07417 (cross-list from cs.AI) [pdf, other]
Title: Neural Optimizer Search with Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.

[20]  arXiv:1709.07433 (cross-list from stat.ML) [pdf, other]
Title: Perturbative Black Box Variational Inference
Comments: Neural Information Processing Systems (NIPS 2017), to appear. Joint first authorship of first two mentioned authors
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. These divergences can be tuned to be more mass-covering (preventing overfitting in complex models), but are also often harder to optimize using Monte-Carlo gradients. In this paper, we view BBVI with generalized divergences as a form of biased importance sampling. The choice of divergence determines a bias-variance tradeoff between the tightness of the bound (low bias) and the variance of its gradient estimators. Drawing on variational perturbation theory of statistical physics, we use these insights to construct a new variational bound which is tighter than the KL bound and more mass covering. Compared to alpha-divergences, its reparameterization gradients have a lower variance. We show in several experiments on Gaussian Processes and Variational Autoencoders that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.

Replacements for Fri, 22 Sep 17

[21]  arXiv:1701.09177 (replaced) [pdf, other]
Title: A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering
Journal-ref: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[22]  arXiv:1709.04514 (replaced) [pdf, other]
Title: Differentially Private Mixture of Generative Neural Networks
Comments: This is the full version of the paper with the same title to appear at 17th IEEE International Conference on Data Mining series (ICDM 2017)
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)
[23]  arXiv:1709.06293 (replaced) [pdf, other]
Title: Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Comments: 15 pages, 9 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[24]  arXiv:1706.00136 (replaced) [pdf, other]
Title: Scalable Generalized Linear Bandits: Online Computation and Hashing
Comments: accepted to NIPS'17
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[25]  arXiv:1709.04090 (replaced) [pdf, other]
Title: A Constrained, Weighted-L1 Minimization Approach for Joint Discovery of Heterogeneous Neural Connectivity Graphs
Comments: 8 pages
Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG)
[26]  arXiv:1709.05289 (replaced) [pdf, ps, other]
Title: Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Subjects: Functional Analysis (math.FA); Learning (cs.LG); Machine Learning (stat.ML)
[ total of 26 entries: 1-26 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)