Learning
New submissions
New submissions for Fri, 22 Sep 17
 [1] arXiv:1709.06994 [pdf, other]

Title: Structured Probabilistic Pruning for Deep Convolutional Neural Network Acceleration
Comments: 7 pages, CNN model compression and acceleration
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Although deep Convolutional Neural Networks (CNNs) have shown better performance in various machine learning tasks, their application comes with a significant increase in storage and computation. Among CNN simplification techniques, parameter pruning is a promising approach that aims to reduce the number of weights in various layers without severely reducing the original accuracy. In this paper, we propose a novel progressive parameter pruning method, named Structured Probabilistic Pruning (SPP), which efficiently prunes weights of convolutional layers in a probabilistic manner. Unlike existing deterministic pruning approaches, in which the pruned weights of a well-trained model are permanently eliminated, SPP utilizes the relative importance of weights during training iterations, which makes the pruning procedure more accurate by leveraging the accumulated weight importance. Specifically, we introduce an effective weight competition mechanism to emphasize the important weights and gradually suppress the unimportant ones. Experiments indicate that our proposed method obtains superior performance on ConvNet and AlexNet compared with existing pruning methods. Our pruned AlexNet achieves a 4.0 $\sim$ 8.9x (5.8x on average) layer-wise speedup in convolutional layers with only a 1.3\% top-5 error increase on the ImageNet-2012 validation dataset. We also demonstrate the effectiveness of our method in transfer learning scenarios using AlexNet.
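The competition-and-accumulation mechanism the abstract describes can be sketched as follows (a hypothetical simplification, not the authors' implementation; the ranking rule, step size, and the structured grouping of weights are assumptions):

```python
import numpy as np

def spp_step(weights, prune_probs, target_ratio=0.5, step=0.1, rng=None):
    # Hypothetical sketch of probabilistic pruning: weights compete by
    # magnitude, and pruning probabilities accumulate across iterations
    # instead of weights being eliminated permanently.
    rng = rng or np.random.default_rng(0)
    # Weights whose magnitude falls in the smallest `target_ratio` fraction
    # lose the competition this iteration; the rest win.
    threshold = np.quantile(np.abs(weights), target_ratio)
    losers = np.abs(weights) <= threshold
    # Accumulate evidence: losers gain pruning probability, winners lose it.
    prune_probs = np.clip(prune_probs + step * np.where(losers, 1.0, -1.0),
                          0.0, 1.0)
    # Sample a binary mask from the accumulated probabilities.
    mask = rng.random(weights.shape) >= prune_probs
    return weights * mask, prune_probs
```

Repeated calls drive the probabilities of persistently small weights toward 1, at which point they are effectively pruned.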
 [2] arXiv:1709.07093 [pdf, other]

Title: Near Optimal Sketching of Low-Rank Tensor Regression
Comments: 36 pages, 2 figures, 2 tables, accepted at NIPS 2017
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
We study the least squares regression problem \begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta - b\|_2, \end{align*} where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $\Theta$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on {\it sparse} random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi A \Theta - \Phi b\|_2$, for which if $\Theta'$ is a near-optimum to the smaller problem, then it is also a near-optimum to the original problem. We obtain significantly smaller dimension and sparsity in $\Phi$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.
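The sparse-projection idea can be illustrated with a CountSketch-style $\Phi$ applied to an ordinary (non-tensor) least squares instance; the tensor-specific constructions and the paper's dimension bounds are omitted, and all names below are illustrative:

```python
import numpy as np

def countsketch(M, m, rng):
    # Sparse random projection: each row of M is hashed to one of m buckets
    # and multiplied by a random sign -- Phi has one nonzero per column.
    n = M.shape[0]
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SM = np.zeros((m,) + M.shape[1:])
    for i in range(n):
        SM[buckets[i]] += signs[i] * M[i]
    return SM

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 200
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                        # noiseless, for illustration
# Sketch A and b with the SAME projection, then solve the small problem
# min ||Phi A x - Phi b||_2 instead of the n-row original.
S = countsketch(np.hstack([A, b[:, None]]), m, np.random.default_rng(1))
x_hat, *_ = np.linalg.lstsq(S[:, :d], S[:, d], rcond=None)
```

Because the target here is noiseless, the sketched solution coincides with the original one; in general the theory guarantees a near-optimum.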
 [3] arXiv:1709.07116 [pdf, other]

Title: Variational Memory Addressing in Generative Models
Subjects: Learning (cs.LG)
Aiming to augment generative models with external memory, we interpret the output of a memory module with stochastic addressing as a conditional mixture distribution, where a read operation corresponds to sampling a discrete memory address and retrieving the corresponding content from memory. This perspective allows us to apply variational inference to memory addressing, which enables effective training of the memory module by using the target information to guide memory lookups. Stochastic addressing is particularly well-suited for generative models, as it naturally encourages multimodality, which is a prominent aspect of most high-dimensional datasets. Treating the chosen address as a latent variable also allows us to quantify the amount of information gained with a memory lookup and measure the contribution of the memory module to the generative process. To illustrate the advantages of this approach we incorporate it into a variational autoencoder and apply the resulting model to the task of generative few-shot learning. The intuition behind this architecture is that the memory module can pick a relevant template from memory and the continuous part of the model can concentrate on modeling remaining variations. We demonstrate empirically that our model is able to identify and access the relevant memory contents even with hundreds of unseen Omniglot characters in memory.
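A minimal sketch of the stochastic read operation described above (hypothetical shapes and names; the variational training of the address distribution is omitted):

```python
import numpy as np

def read_memory(memory, logits, rng):
    # Stochastic addressing: sample a discrete address a ~ p(a) from the
    # given logits, then retrieve that slot. Marginalizing over a makes the
    # read a mixture distribution over memory entries.
    p = np.exp(logits - logits.max())   # stable softmax over addresses
    p /= p.sum()
    a = rng.choice(len(memory), p=p)
    return memory[a], a
```

A generator conditioned on the returned content then only has to model the residual variation around the retrieved template.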
 [4] arXiv:1709.07149 [pdf, ps, other]

Title: Learning RBM with a DC Programming Approach
Comments: Accepted in ACML 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
By exploiting the property that the RBM log-likelihood function is the difference of convex functions, we formulate a stochastic variant of difference-of-convex-functions (DC) programming to minimize the negative log-likelihood. Interestingly, the traditional contrastive divergence algorithm is a special case of the above formulation, and the hyperparameters of the two algorithms can be chosen such that the amount of computation per mini-batch is identical. We show that for a given computational budget the proposed algorithm almost always reaches a higher log-likelihood more rapidly than the standard contrastive divergence algorithm. Further, we modify this algorithm to use centered gradients and show that it is more efficient and effective than the standard centered-gradient algorithm on benchmark datasets.
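The RBM objective itself requires sampling, but the underlying DC idea is easy to illustrate on a toy one-dimensional objective (an illustration of DC programming in general, not the paper's stochastic algorithm):

```python
def dca(x, n_iters=100):
    # DC algorithm on f(x) = x^4 - x^2, written as the difference of the
    # convex functions g(x) = x^4 and h(x) = x^2. Each step linearizes h at
    # the current iterate x_k and minimizes the convex surrogate
    # g(x) - h'(x_k) * x, which here has a closed-form minimizer.
    for _ in range(n_iters):
        slope = 2.0 * x                    # h'(x_k)
        x = (slope / 4.0) ** (1.0 / 3.0)   # solve d/dx [x^4 - slope*x] = 0
    return x
```

Starting from x = 1, the iterates converge to the true minimizer x = 1/sqrt(2) of the nonconvex objective.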
 [5] arXiv:1709.07172 [pdf, ps, other]

Title: SpectralFPL: Online Spectral Learning for Single Topic Models
Comments: 8 pages, 5 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
This paper studies how to efficiently learn an optimal latent variable model online from large streaming data. Latent variable models can explain the observed data in terms of unobserved concepts. They are traditionally studied in the unsupervised learning setting and learned by iterative methods such as EM. Very few online learning algorithms for latent variable models have been developed, and the most popular one is online EM. Though online EM is computationally efficient, it typically converges to a local optimum. In this work, we motivate and develop SpectralFPL, a novel algorithm that learns latent variable models online from streaming data. SpectralFPL is computationally efficient, and we prove that it quickly learns the global optimum under a bag-of-words model by deriving an $O(\sqrt n)$ regret bound. Experimental results also demonstrate a consistent performance improvement of SpectralFPL over online EM: in both synthetic and real-world experiments, SpectralFPL performs as well as or better than online EM with optimally tuned parameters.
 [6] arXiv:1709.07314 [pdf, ps, other]

Title: Exact Learning of Lightweight Description Logic Ontologies
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
We study the problem of learning description logic (DL) ontologies in Angluin et al.'s framework of exact learning via queries. We admit membership queries ("is a given subsumption entailed by the target ontology?") and equivalence queries ("is a given ontology equivalent to the target ontology?"). We present three main results: (1) ontologies formulated in (two relevant versions of) the description logic DL-Lite can be learned with polynomially many queries of polynomial size; (2) this is not the case for ontologies formulated in the description logic EL, even when only acyclic ontologies are admitted; and (3) ontologies formulated in a fragment of EL related to the web ontology language OWL 2 RL can be learned in polynomial time. We also show that neither membership nor equivalence queries alone are sufficient in cases (1) and (3).
 [7] arXiv:1709.07377 [pdf, other]

Title: Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE
Comments: 22 pages, 15 figures
Subjects: Learning (cs.LG)
Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach than algorithmic modifications. The SMOTE algorithm and its variants generate synthetic samples along a line segment that joins minority class instances. In this paper we propose Geometric SMOTE (G-SMOTE) as a generalization of the SMOTE data generation mechanism. G-SMOTE generates synthetic samples in a geometric region of the input space around each selected minority instance. While in the basic configuration this region is a hypersphere, G-SMOTE allows its deformation into a hyper-spheroid and finally into a line segment, emulating, in the last case, the SMOTE mechanism. The performance of G-SMOTE is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when G-SMOTE is used as an oversampling algorithm.
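A sketch of the described generation mechanism (hypothetical parameterization; G-SMOTE's truncation step and its neighbor selection strategies are omitted):

```python
import numpy as np

def geometric_sample(center, neighbor, deformation=0.0, rng=None):
    # Sample a point in the unit ball, optionally flatten it toward the line
    # through `center` and `neighbor`, then scale by the neighbor distance.
    # deformation=0 -> hypersphere around the minority instance;
    # deformation=1 -> line (SMOTE-like, ignoring G-SMOTE's truncation).
    rng = rng or np.random.default_rng(0)
    radius_vec = neighbor - center
    radius = np.linalg.norm(radius_vec)
    e = rng.standard_normal(center.shape)
    e /= np.linalg.norm(e)
    point = (rng.random() ** (1.0 / center.size)) * e   # uniform in unit ball
    direction = radius_vec / radius
    parallel = (point @ direction) * direction
    point = parallel + (1.0 - deformation) * (point - parallel)
    return center + radius * point
```

Interpolating the deformation factor between 0 and 1 sweeps the sampling region from a full hypersphere down through hyper-spheroids to a segment.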
Cross-lists for Fri, 22 Sep 17
 [8] arXiv:1709.07077 (cross-list from cs.CV) [pdf, other]

Title: Estimated Depth Map Helps Image Classification
Authors: Yihui He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We consider image classification with estimated depth. This problem falls into the domain of transfer learning, since we use a model trained on a set of depth images to generate depth maps (additional features) for a separate classification problem over another, disjoint set of images. It is challenging because no direct depth information is provided. Though depth estimation has been well studied, no prior work has attempted to aid image classification with estimated depth. Therefore, we present a way of transferring domain knowledge from depth estimation to a separate image classification task over a disjoint set of train and test data. We build an RGB-D dataset from an RGB dataset, perform image classification on it, and then evaluate the performance of neural networks on the RGB-D dataset compared to the RGB dataset. In our experiments, the benefit is significant for both shallow and deep networks: it improves ResNet-20 by 0.55% and ResNet-56 by 0.53%. Our code and dataset are available publicly.
 [9] arXiv:1709.07089 (cross-list from cs.SY) [pdf, other]

Title: On the Design of LQR Kernels for Efficient Controller Learning
Comments: 8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and Control (CDC 2017)
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)
Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.
 [10] arXiv:1709.07109 (cross-list from cs.CL) [pdf, other]

Title: Deconvolutional Latent-Variable Model for Text Sequence Matching
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
A latent-variable model is introduced for text matching, inferring sentence representations by jointly optimizing generative and discriminative objectives. To alleviate typical optimization challenges in latent-variable models for text, we employ deconvolutional networks as the sequence decoder (generator), providing learned latent codes with more semantic information and better generalization. Our model, trained in an unsupervised manner, yields stronger empirical predictive performance than a decoder based on Long Short-Term Memory (LSTM), with fewer parameters and considerably faster training. Further, we apply it to text sequence-matching problems. The proposed model significantly outperforms several strong sentence-encoding baselines, especially in the semi-supervised setting.
 [11] arXiv:1709.07124 (cross-list from cs.SD) [pdf, other]

Title: Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
Comments: To be presented at WASPAA 2017
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network output only requires evaluating a few layers at each time step. Third, when a limited amount of training data is available, DR-NMF exhibits stronger generalization and separation performance compared to sparse NMF and state-of-the-art long short-term memory (LSTM) networks. When a large amount of training data is available, DR-NMF achieves lower yet competitive separation performance compared to LSTM networks.
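The iteration being unfolded can be sketched as plain nonnegative ISTA for sparse coding against a fixed dictionary W; in DR-NMF each loop iteration becomes a network layer whose (untied) weights are then trained by backpropagation. This is a simplification: the recurrent coupling across spectrogram time frames is omitted.

```python
import numpy as np

def unfolded_ista(W, x, n_layers=10, sparsity=0.1):
    # Solve h ~ argmin_{h >= 0} ||x - W h||^2 / 2 + sparsity * ||h||_1 with a
    # fixed number of iterations; each iteration corresponds to one layer of
    # the unfolded network.
    L = np.linalg.norm(W, 2) ** 2          # Lipschitz constant of the gradient
    h = np.zeros(W.shape[1])
    for _ in range(n_layers):
        grad = W.T @ (W @ h - x)
        # Gradient step followed by a nonnegative soft-threshold.
        h = np.maximum(h - (grad + sparsity) / L, 0.0)
    return h
```

Because the layer weights are initialized from NMF parameters, the trained network stays interpretable as an (accelerated) inference procedure for the same model.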
 [12] arXiv:1709.07150 (cross-list from cs.AI) [pdf, other]

Title: Feature Engineering for Predictive Modeling using Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of a given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and, most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance-driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.
 [13] arXiv:1709.07200 (cross-list from cs.CV) [pdf, other]

Title: Temporal Multimodal Fusion for Video Emotion Classification in the Wild
Journal-ref: ACM ICMI 2017, Nov 2017, Glasgow, United Kingdom
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
This paper addresses the question of emotion classification. The task consists in predicting emotion labels (taken from a set of possible labels) that best describe the emotions contained in short video clips. Building on a standard framework -- describing videos by audio and visual features used by a supervised classifier to infer the labels -- this paper investigates several novel directions. First of all, improved face descriptors based on 2D and 3D Convolutional Neural Networks are proposed. Second, the paper explores several fusion methods, temporal and multimodal, including a novel hierarchical method combining features and scores. In addition, we carefully reviewed the different stages of the pipeline and designed a CNN architecture adapted to the task; this is important as the size of the training set is small compared to the difficulty of the problem, making generalization difficult. The so-obtained model ranked 4th at the 2017 Emotion in the Wild challenge with an accuracy of 58.8%.
 [14] arXiv:1709.07223 (cross-list from cs.CV) [pdf, other]

Title: Convolutional neural networks that teach microscopes how to image
Comments: 13 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Optics (physics.optics)
Deep learning algorithms offer a powerful means to automatically analyze the content of medical images. However, many biological samples of interest are primarily transparent to visible light and contain features that are difficult to resolve with a standard optical microscope. Here, we use a convolutional neural network (CNN) not only to classify images, but also to optimize the physical layout of the imaging device itself. We increase the classification accuracy of a microscope's recorded images by merging an optical model of image formation into the pipeline of a CNN. The resulting network simultaneously determines an ideal illumination arrangement to highlight important sample features during image acquisition, along with a set of convolutional weights to classify the detected images post-capture. We demonstrate our joint optimization technique with an experimental microscope configuration that automatically identifies malaria-infected cells with 5-10% higher accuracy than standard and alternative microscope lighting designs.
 [15] arXiv:1709.07224 (cross-list from cs.MA) [pdf, other]

Title: Learning Complex Swarm Behaviors by Exploiting Local Communication Protocols with Deep Reinforcement Learning
Comments: 8 pages, 6 figures, version 1
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY); Machine Learning (stat.ML)
Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D physics environment, and compare the implications of different communication protocols.
 [16] arXiv:1709.07308 (cross-list from cs.DS) [pdf, other]

Title: Predicting Positive and Negative Links with Noisy Queries: Theory & Practice
Authors: Charalampos E. Tsourakakis, Michael Mitzenmacher, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos
Comments: 17 pages, 12 figures. arXiv admin note: text overlap with arXiv:1609.00750
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Learning (cs.LG); Social and Information Networks (cs.SI); Combinatorics (math.CO)
Social networks and interactions in social media involve both positive and negative relationships. Signed graphs capture both types of relationships: positive edges correspond to pairs of "friends", and negative edges to pairs of "foes". The edge sign prediction problem, that aims to predict whether an interaction between a pair of nodes will be positive or negative, is an important graph mining task for which many heuristics have recently been proposed [Leskovec 2010].
We model the edge sign prediction problem as follows: we are allowed to query any pair of nodes whether they belong to the same cluster or not, but the answer to the query is corrupted with some probability $0<q<\frac{1}{2}$. Let $\delta=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise for any constant gap $\delta$ with $O(\frac{n\log n}{\delta^4})$ queries. Our algorithm uses breadth-first search as its main algorithmic primitive. A byproduct of our proposed learning algorithm is the use of $s$-$t$ paths as an informative feature to predict the sign of the edge $(s,t)$. As a heuristic, we use edge-disjoint $s$-$t$ paths of short length as a feature for predicting edge signs in real-world signed networks. Our findings suggest that the use of paths improves the classification accuracy, especially for pairs of nodes with no common neighbors.
 [17] arXiv:1709.07359 (cross-list from stat.ML) [pdf, other]

Title: Class-Splitting Generative Adversarial Networks
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Generative Adversarial Networks (GANs) produce systematically better-quality samples when class label information is provided, i.e., in the conditional GAN setup. This is still observed for the recently proposed Wasserstein GAN formulation, which stabilized adversarial training and allows considering high-capacity network architectures such as ResNet. In this work we show how to boost conditional GANs by augmenting the available class labels. The new classes come from clustering in the representation space learned by the same GAN model. The proposed strategy is also feasible when no class information is available, i.e., in the unsupervised setup. Our generated samples reach state-of-the-art Inception scores for the CIFAR-10 and STL-10 datasets in both the supervised and unsupervised setups.
 [18] arXiv:1709.07409 (cross-list from quant-ph) [pdf, other]

Title: Quantum Autoencoders via Quantum Adders with Genetic Algorithms
Subjects: Quantum Physics (quant-ph); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The quantum autoencoder is a recent paradigm in the field of quantum machine learning, which may enable an enhanced use of resources in quantum technologies. To this end, quantum neural networks with fewer nodes in the inner than in the outer layers were considered. Here, we propose a useful connection between approximate quantum adders and quantum autoencoders. Specifically, this link allows us to employ optimized approximate quantum adders, obtained with genetic algorithms, for the implementation of quantum autoencoders for a variety of initial states. Furthermore, we can also directly optimize the quantum autoencoders via genetic algorithms. Our approach opens a different path for the design of quantum autoencoders in controllable quantum platforms.
 [19] arXiv:1709.07417 (cross-list from cs.AI) [pdf, other]

Title: Neural Optimizer Search with Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain-specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, the running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum, on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.
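For reference, the AddSign rule in its simplest form scales each gradient coordinate by how its sign agrees with a running average of past gradients (the paper also studies variants with internal decay schedules, omitted here):

```python
import numpy as np

def addsign_step(w, g, m, lr=0.1, beta=0.9):
    # AddSign: multiply the gradient by (1 + sign(g) * sign(m)), i.e. take a
    # 2x step when the gradient agrees in sign with its running average m,
    # and no step when it disagrees.
    m = beta * m + (1.0 - beta) * g        # running average of gradients
    w = w - lr * (1.0 + np.sign(g) * np.sign(m)) * g
    return w, m
```

PowerSign is analogous, using $\alpha^{\mathrm{sign}(g)\,\mathrm{sign}(m)}$ as the multiplier instead of $1 + \mathrm{sign}(g)\,\mathrm{sign}(m)$.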
 [20] arXiv:1709.07433 (cross-list from stat.ML) [pdf, other]

Title: Perturbative Black Box Variational Inference
Comments: Neural Information Processing Systems (NIPS 2017), to appear. Joint first authorship of first two mentioned authors
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. These divergences can be tuned to be more mass-covering (preventing overfitting in complex models), but are also often harder to optimize using Monte Carlo gradients. In this paper, we view BBVI with generalized divergences as a form of biased importance sampling. The choice of divergence determines a bias-variance trade-off between the tightness of the bound (low bias) and the variance of its gradient estimators. Drawing on variational perturbation theory of statistical physics, we use these insights to construct a new variational bound which is tighter than the KL bound and more mass-covering. Compared to alpha-divergences, its reparameterization gradients have a lower variance. We show in several experiments on Gaussian Processes and Variational Autoencoders that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
Replacements for Fri, 22 Sep 17
 [21] arXiv:1701.09177 (replaced) [pdf, other]

Title: A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering
Journal-ref: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [22] arXiv:1709.04514 (replaced) [pdf, other]

Title: Differentially Private Mixture of Generative Neural Networks
Comments: This is the full version of the paper with the same title, to appear at the 17th IEEE International Conference on Data Mining (ICDM 2017)
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)
 [23] arXiv:1709.06293 (replaced) [pdf, other]

Title: Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Comments: 15 pages, 9 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [24] arXiv:1706.00136 (replaced) [pdf, other]

Title: Scalable Generalized Linear Bandits: Online Computation and Hashing
Comments: accepted to NIPS'17
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [25] arXiv:1709.04090 (replaced) [pdf, other]

Title: A Constrained, Weighted-L1 Minimization Approach for Joint Discovery of Heterogeneous Neural Connectivity Graphs
Comments: 8 pages
Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG)
 [26] arXiv:1709.05289 (replaced) [pdf, ps, other]

Title: Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Subjects: Functional Analysis (math.FA); Learning (cs.LG); Machine Learning (stat.ML)