# Learning

## New submissions

[ total of 40 entries: 1-40 ]
[ showing up to 2000 entries per page: fewer | more ]

### New submissions for Fri, 20 Apr 18

[1]
Title: Co-sampling: Training Robust Networks for Extremely Noisy Supervision
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Training robust deep networks is challenging under noisy labels. Current methodologies focus on estimating the noise transition matrix. However, this matrix is not easy to be estimated exactly. In this paper, free of the matrix estimation, we present a simple but robust learning paradigm called "Co-sampling", which can train deep networks robustly under extremely noisy labels. Briefly, our paradigm trains two networks simultaneously. In each mini-batch data, each network samples its small-loss instances, and cross-trains on such instances from its peer network. We conduct experiments on several simulated noisy datasets. Empirical results demonstrate that, under extremely noisy labels, the Co-sampling approach trains deep learning models robustly.

[2]
Title: A Study on Overfitting in Deep Reinforcement Learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Recent years have witnessed significant progresses in deep Reinforcement Learning (RL). Empowered with large scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen robustly'': commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. The observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.

[3]
Title: A Multi-task Selected Learning Approach for Solving New Type 3D Bin Packing Problem
Comments: 9 pages, 3 figures. arXiv admin note: text overlap with arXiv:1708.05930
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This paper studies a new type of 3D bin packing problem (BPP), in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that can minimize the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. It is a new NP-hard combinatorial optimization problem on unfixed-sized bin packing, for which we propose a multi-task framework based on Selected Learning, generating the sequence and orientations of items packed into the bin simultaneously. During training steps, Selected Learning chooses one of loss functions derived from Deep Reinforcement Learning and Supervised Learning corresponding to the training procedure. Numerical results show that the method proposed significantly outperforms Lego baselines by a substantial gain of 7.52%. Moreover, we produce large scale 3D Bin Packing order data set for studying bin packing problems and will release it to the research community.

[4]
Title: Modeling and Simultaneously Removing Bias via Adversarial Neural Networks
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In real world systems, the predictions of deployed Machine Learned models affect the training data available to build subsequent models. This introduces a bias in the training data that needs to be addressed. Existing solutions to this problem attempt to resolve the problem by either casting this in the reinforcement learning framework or by quantifying the bias and re-weighting the loss functions. In this work, we develop a novel Adversarial Neural Network (ANN) model, an alternative approach which creates a representation of the data that is invariant to the bias. We take the Paid Search auction as our working example and ad display position features as the confounding features for this setting. We show the success of this approach empirically on both synthetic data as well as real world paid search auction data from a major search engine.

[5]
Title: K-Nearest Oracles Borderline Dynamic Classifier Ensemble Selection
Comments: Paper accepted for publication on IJCNN 2018
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Dynamic Ensemble Selection (DES) techniques aim to select locally competent classifiers for the classification of each new test sample. Most DES techniques estimate the competence of classifiers using a given criterion over the region of competence of the test sample (its the nearest neighbors in the validation set). The K-Nearest Oracles Eliminate (KNORA-E) DES selects all classifiers that correctly classify all samples in the region of competence of the test sample, if such classifier exists, otherwise, it removes from the region of competence the sample that is furthest from the test sample, and the process repeats. When the region of competence has samples of different classes, KNORA-E can reduce the region of competence in such a way that only samples of a single class remain in the region of competence, leading to the selection of locally incompetent classifiers that classify all samples in the region of competence as being from the same class. In this paper, we propose two DES techniques: K-Nearest Oracles Borderline (KNORA-B) and K-Nearest Oracles Borderline Imbalanced (KNORA-BI). KNORA-B is a DES technique based on KNORA-E that reduces the region of competence but maintains at least one sample from each class that is in the original region of competence. KNORA-BI is a variation of KNORA-B for imbalance datasets that reduces the region of competence but maintains at least one minority class sample if there is any in the original region of competence. Experiments are conducted comparing the proposed techniques with 19 DES techniques from the literature using 40 datasets. The results show that the proposed techniques achieved interesting results, with KNORA-BI outperforming state-of-art techniques.

[6]
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences. However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system has a crucial role to play in this line of research. We present preliminary research results that support this claim.

[7]
Title: Low Rank Structure of Learned Representations
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

A key feature of neural networks, particularly deep convolutional neural networks, is their ability to "learn" useful representations from data. The very last layer of a neural network is then simply a linear model trained on these "learned" representations. Despite their numerous applications in other tasks such as classification, retrieval, clustering etc., a.k.a. transfer learning, not much work has been published that investigates the structure of these representations or whether structure can be imposed on them during the training process.
In this paper, we study the dimensionality of the learned representations by models that have proved highly succesful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that when trained on CIFAR10 or CIFAR100 datasets, the learned representations exhibit a fairly low rank structure. We propose a modification to the training procedure, which further encourages low rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples.

[8]
Title: Scalable attribute-aware network embedding with localily
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Adding attributes for nodes to network embedding helps to improve the ability of the learned joint representation to depict features from topology and attributes simultaneously. Recent research on the joint embedding has exhibited a promising performance on a variety of tasks by jointly embedding the two spaces. However, due to the indispensable requirement of globality based information, present approaches contain a flaw of in-scalability. Here we propose \emph{SANE}, a scalable attribute-aware network embedding algorithm with locality, to learn the joint representation from topology and attributes. By enforcing the alignment of a local linear relationship between each node and its K-nearest neighbors in topology and attribute space, the joint embedding representations are more informative comparing with a single representation from topology or attributes alone. And we argue that the locality in \emph{SANE} is the key to learning the joint representation at scale. By using several real-world networks from diverse domains, We demonstrate the efficacy of \emph{SANE} in performance and scalability aspect. Overall, for performance on label classification, SANE successfully reaches up to the highest F1-score on most datasets, and even closer to the baseline method that needs label information as extra inputs, compared with other state-of-the-art joint representation algorithms. What's more, \emph{SANE} has an up to 71.4\% performance gain compared with the single topology-based algorithm. For scalability, we have demonstrated the linearly time complexity of \emph{SANE}. In addition, we intuitively observe that when the network size scales to 100,000 nodes, the "learning joint embedding" step of \emph{SANE} only takes $\approx10$ seconds.

[9]
Title: Large-scale Nonlinear Variable Selection via Kernel Random Features
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typical poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features. The algorithm discovers the variables relevant for the regression task together with learning the prediction model through learning the appropriate nonlinear random feature maps. We demonstrate the outstanding performance of our method on a set of large-scale synthetic and real datasets.

[10]
Title: Lipschitz Continuity in Model-based Reinforcement Learning
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Model-based reinforcement-learning methods learn transition and reward models and use them to guide behavior. We analyze the impact of learning models that are Lipschitz continuous---the distance between function values for two inputs is bounded by a linear function of the distance between the inputs. Our first result shows a tight bound on model errors for multi-step predictions with Lipschitz continuous models. We go on to prove an error bound for the value-function estimate arising from such models and show that the estimated value function is itself Lipschitz continuous. We conclude with empirical results that demonstrate significant benefits to enforcing Lipschitz continuity of neural net models during reinforcement learning.

[11]
Title: A sequential sampling strategy for extreme event statistics in nonlinear dynamical systems
Subjects: Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)

We develop a method for the evaluation of extreme event statistics associated with nonlinear dynamical systems, using a small number of samples. From an initial dataset of design points, we formulate a sequential strategy that provides the 'next-best' data point (set of parameters) that when evaluated results in improved estimates of the probability density function (pdf) for a scalar quantity of interest. The approach utilizes Gaussian process regression to perform Bayesian inference on the parameter-to-observation map describing the quantity of interest. We then approximate the desired pdf along with uncertainty bounds utilizing the posterior distribution of the inferred map. The 'next-best' design point is sequentially determined through an optimization procedure that selects the point in parameter space that maximally reduces uncertainty between the estimated bounds of the pdf prediction. Since the optimization process utilizes only information from the inferred map it has minimal computational cost. Moreover, the special form of the metric emphasizes the tails of the pdf. The method is practical for systems where the dimensionality of the parameter space is of moderate size, i.e. order O(10). We apply the method to estimate the extreme event statistics for a very high-dimensional system with millions of degrees of freedom: an offshore platform subjected to three-dimensional irregular waves. It is demonstrated that the developed approach can accurately determine the extreme event statistics using limited number of samples.

[12]
Title: Deep Transfer Network with Joint Distribution Adaptation: A New Intelligent Fault Diagnosis Framework for Industry Application
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In recent years, an increasing popularity of deep learning model for intelligent condition monitoring and diagnosis as well as prognostics used for mechanical systems and structures has been observed. In the previous studies, however, a major assumption accepted by default, is that the training and testing data are taking from same feature distribution. Unfortunately, this assumption is mostly invalid in real application, resulting in a certain lack of applicability for the traditional diagnosis approaches. Inspired by the idea of transfer learning that leverages the knowledge learnt from rich labeled data in source domain to facilitate diagnosing a new but similar target task, a new intelligent fault diagnosis framework, i.e., deep transfer network (DTN), which generalizes deep learning model to domain adaptation scenario, is proposed in this paper. By extending the marginal distribution adaptation (MDA) to joint distribution adaptation (JDA), the proposed framework can exploit the discrimination structures associated with the labeled data in source domain to adapt the conditional distribution of unlabeled target data, and thus guarantee a more accurate distribution matching. Extensive empirical evaluations on three fault datasets validate the applicability and practicability of DTN, while achieving many state-of-the-art transfer results in terms of diverse operating conditions, fault severities and fault types.

[13]
Title: A Dynamic Boosted Ensemble Learning Based on Random Forest
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose Dynamic Boosted Random Forest (DBRF), a novel ensemble algorithm that incorporates the notion of hard example mining into Random Forest (RF) and thus combines the high accuracy of Boosting algorithm with the strong generalization of Bagging algorithm. Specifically, we propose to measure the quality of each leaf node of every decision tree in the random forest to determine hard examples. By iteratively training and then removing easy examples and noise examples from training data, we evolve the random forest to focus on hard examples dynamically so as to learn decision boundaries better. Data can be cascaded through these random forests learned in each iteration in sequence to generate predictions, thus making RF deep. We also propose to use evolution mechanism, stacking mechanism and smart iteration mechanism to improve the performance of the model. DBRF outperforms RF on three UCI datasets and achieved state-of-the-art results compared to other deep models. Moreover, we show that DBRF is also a new way of sampling and can be very useful when learning from unbalanced data.

[14]
Title: Deep Triplet Ranking Networks for One-Shot Recognition
Authors: Meng Ye, Yuhong Guo
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Despite the breakthroughs achieved by deep learning models in conventional supervised learning scenarios, their dependence on sufficient labeled training data in each class prevents effective applications of these deep models in situations where labeled training instances for a subset of novel classes are very sparse -- in the extreme case only one instance is available for each class. To tackle this natural and important challenge, one-shot learning, which aims to exploit a set of well labeled base classes to build classifiers for the new target classes that have only one observed instance per class, has recently received increasing attention from the research community. In this paper we propose a novel end-to-end deep triplet ranking network to perform one-shot learning. The proposed approach learns class universal image embeddings on the well labeled base classes under a triplet ranking loss, such that the instances from new classes can be categorized based on their similarity with the one-shot instances in the learned embedding space. Moreover, our approach can naturally incorporate the available one-shot instances from the new classes into the embedding learning process to improve the triplet ranking model. We conduct experiments on two popular datasets for one-shot learning. The results show the proposed approach achieves better performance than the state-of-the- art comparison methods.

### Cross-lists for Fri, 20 Apr 18

[15]  arXiv:1708.07199 (cross-list from cs.CV) [pdf, other]
Title: 3D Morphable Models as Spatial Transformer Networks
Comments: Accepted to ICCV 2017 2nd Workshop on Geometry Meets Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.

[16]  arXiv:1804.02541 (cross-list from cs.CV) [pdf, other]
Title: Statistical transformer networks: learning shape and appearance models via self supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We generalise Spatial Transformer Networks (STN) by replacing the parametric transformation of a fixed, regular sampling grid with a deformable, statistical shape model which is itself learnt. We call this a Statistical Transformer Network (StaTN). By training a network containing a StaTN end-to-end for a particular task, the network learns the optimal nonrigid alignment of the input data for the task. Moreover, the statistical shape model is learnt with no direct supervision (such as landmarks) and can be reused for other tasks. Besides training for a specific task, we also show that a StaTN can learn a shape model using generic loss functions. This includes a loss inspired by the minimum description length principle in which an appearance model is also learnt from scratch. In this configuration, our model learns an active appearance model and a means to fit the model from scratch with no supervision at all, even identity labels.

[17]  arXiv:1804.06952 (cross-list from cs.DS) [pdf, ps, other]
Title: Distributed Simulation and Distributed Inference
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

Independent samples from an unknown probability distribution $\mathbf{p}$ on a domain of size $k$ are distributed across $n$ players, with each player holding one sample. Each player can communicate $\ell$ bits to a central referee in a simultaneous message passing (SMP) model of communication to help the referee infer a property of the unknown $\mathbf{p}$. When $\ell\geq\log k$ bits, the problem reduces to the well-studied collocated case where all the samples are available in one place. In this work, we focus on the communication-starved setting of $\ell < \log k$, in which the landscape may change drastically. We propose a general formulation for inference problems in this distributed setting, and instantiate it to two prototypical inference questions: learning and uniformity testing.

[18]  arXiv:1804.06964 (cross-list from cs.NE) [pdf, other]
Title: GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures. Typically, the conventional deep multi-attribute learning approaches follow the pipeline of manually designing the network architectures based on task-specific expertise prior knowledge and careful network tunings, leading to the inflexibility for various complicated scenarios in practice. Motivated by addressing this problem, we propose an efficient greedy neural architecture search approach (GNAS) to automatically discover the optimal tree-like deep architecture for multi-attribute learning. In a greedy manner, GNAS divides the optimization of global architecture into the optimizations of individual connections step by step. By iteratively updating the local architectures, the global tree-like architecture gets converged where the bottom layers are shared across relevant attributes and the branches in top layers more encode attribute-specific features. Experiments on three benchmark multi-attribute datasets show the effectiveness and compactness of neural architectures derived by GNAS, and also demonstrate the efficiency of GNAS in searching neural architectures.

[19]  arXiv:1804.07010 (cross-list from stat.ML) [pdf, other]
Title: Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations
Authors: Maziar Raissi
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Systems and Control (cs.SY); Analysis of PDEs (math.AP); Optimization and Control (math.OC)

Classical numerical methods for solving partial differential equations suffer from the curse dimensionality mainly due to their reliance on meticulously generated spatio-temporal grids. Inspired by modern deep learning based techniques for solving forward and inverse problems associated with partial differential equations, we circumvent the tyranny of numerical discretization by devising an algorithm that is scalable to high-dimensions. In particular, we approximate the unknown solution by a deep neural network which essentially enables us to benefit from the merits of automatic differentiation. To train the aforementioned neural network we leverage the well-known connection between high-dimensional partial differential equations and forward-backward stochastic differential equations. In fact, independent realizations of a standard Brownian motion will act as training data. We test the effectiveness of our approach for a couple of benchmark problems spanning a number of scientific domains including Black-Scholes-Barenblatt and Hamilton-Jacobi-Bellman equations, both in 100-dimensions.

[20]  arXiv:1804.07059 (cross-list from stat.ML) [pdf, other]
Title: Exploring Partially Observed Networks with Nonparametric Bandits
Comments: 15 pages, 6 figures, currently under review
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

Real-world networks such as social and communication networks are too large to be observed entirely. Such networks are often partially observed such that network size, network topology, and nodes of the original network are unknown. In this paper we formalize the Adaptive Graph Exploring problem. We assume that we are given an incomplete snapshot of a large network and additional nodes can be discovered by querying nodes in the currently observed network. The goal of this problem is to maximize the number of observed nodes within a given query budget. Querying which set of nodes maximizes the size of the observed network? We formulate this problem as an exploration-exploitation problem and propose a novel nonparametric multi-arm bandit (MAB) algorithm for identifying which nodes to be queried. Our contributions include: (1) $i$KNN-UCB, a novel nonparametric MAB algorithm, applies $k$-nearest neighbor UCB to the setting when the arms are presented in a vector space, (2) provide theoretical guarantee that $i$KNN-UCB algorithm has sublinear regret, and (3) applying $i$KNN-UCB algorithm on synthetic networks and real-world networks from different domains, we show that our method discovers up to 40% more nodes compared to existing baselines.

[21]  arXiv:1804.07091 (cross-list from stat.ML) [pdf, other]
Title: Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection
Comments: Accepted by TPAMI. Examples and code: this https URL
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Applications (stat.AP)

Automatic detection of anomalies in space- and time-varying measurements is an important tool in several fields, e.g., fraud detection, climate analysis, or healthcare monitoring. We present an algorithm for detecting anomalous regions in multivariate spatio-temporal time-series, which allows for spotting the interesting parts in large amounts of data, including video and text data. In opposition to existing techniques for detecting isolated anomalous data points, we propose the "Maximally Divergent Intervals" (MDI) framework for unsupervised detection of coherent spatial regions and time intervals characterized by a high Kullback-Leibler divergence compared with all other data given. In this regard, we define an unbiased Kullback-Leibler divergence that allows for ranking regions of different size and show how to enable the algorithm to run on large-scale data sets in reasonable time using an interval proposal technique. Experiments on both synthetic and real data from various domains, such as climate analysis, video surveillance, and text forensics, demonstrate that our method is widely applicable and a valuable tool for finding interesting events in different types of data.

[22]  arXiv:1804.07098 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Prostate Cancer Detection on H&E using Convolutional Adversarial Autoencoders
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We propose an unsupervised method using self-clustering convolutional adversarial autoencoders to classify prostate tissue as tumor or non-tumor without any labeled training data. The clustering method is integrated into the training of the autoencoder and requires only little post-processing. Our network trains on hematoxylin and eosin (H&E) input patches and we tested two different reconstruction targets, H&E and immunohistochemistry (IHC). We show that antibody-driven feature learning using IHC helps the network to learn relevant features for the clustering task. Our network achieves a F1 score of 0.62 using only a small set of validation labels to assign classes to clusters.

[23]  arXiv:1804.07101 (cross-list from stat.ML) [pdf, other]
Title: Dictionary learning - from local towards global and adaptive
Authors: Karin Schnass
Comments: 11 figures, 4 pages per figure
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

This paper studies the convergence behaviour of dictionary learning via the Iterative Thresholding and K-residual Means (ITKrM) algorithm. On one hand it is shown that there exist stable fixed points that do not correspond to the generating dictionary, which can be characterised as very coherent. On the other hand it is proved that ITKrM is a contraction under much relaxed conditions than previously necessary. Based on the characterisation of the stable fixed points, replacing coherent atoms with carefully designed replacement candidates is proposed. In experiments on synthetic data this outperforms random or no replacement and always leads to full dictionary recovery. Finally the question how to learn dictionaries without knowledge of the correct dictionary size and sparsity level is addressed. Decoupling the replacement strategy of coherent or unused atoms into pruning and adding, and slowly carefully increasing the sparsity level, leads to an adaptive version of ITKrM. In several experiments this adaptive dictionary learning algorithm is shown to recover a generating dictionary from randomly initialised dictionaries of various sizes on synthetic data and to learn meaningful dictionaries on image data.

[24]  arXiv:1804.07134 (cross-list from stat.ML) [pdf, other]
Title: varrank: an R package for variable ranking based on mutual information with applications to observed systemic datasets
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

This article describes the R package varrank. It has a flexible implementation of heuristic approaches which perform variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The core functionality is a general implementation of the minimum redundancy maximum relevance (mRMRe) model. This approach is based on information theory metrics. It is compatible with discrete and continuous data which are discretised using a large choice of possible rules. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.

[25]  arXiv:1804.07144 (cross-list from cs.NE) [pdf, other]
Title: Human Activity Recognition using Recurrent Neural Networks
Journal-ref: International Cross-Domain Conference for Machine Learning and Knowledge Extraction: CD-MAKE 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

Human activity recognition using smart home sensors is one of the bases of ubiquitous computing in smart environments and a topic undergoing intense research in the field of ambient assisted living. The increasingly large amount of data sets calls for machine learning methods. In this paper, we introduce a deep learning model that learns to classify human activities without using any prior knowledge. For this purpose, a Long Short Term Memory (LSTM) Recurrent Neural Network was applied to three real world smart home datasets. The results of these experiments show that the proposed approach outperforms the existing ones in terms of accuracy and performance.

[26]  arXiv:1804.07155 (cross-list from cs.CV) [pdf, other]
Title: Instance Selection Improves Geometric Mean Accuracy: A Study on Imbalanced Data Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

A natural way of handling imbalanced data is to attempt to equalise the class frequencies and train the classifier of choice on balanced data. For two-class imbalanced problems, the classification success is typically measured by the geometric mean (GM) of the true positive and true negative rates. Here we prove that GM can be improved upon by instance selection, and give the theoretical conditions for such an improvement. We demonstrate that GM is non-monotonic with respect to the number of retained instances, which discourages systematic instance selection. We also show that balancing the distribution frequencies is inferior to a direct maximisation of GM. To verify our theoretical findings, we carried out an experimental study of 12 instance selection methods for imbalanced data, using 66 standard benchmark data sets. The results reveal possible room for new instance selection methods for imbalanced data.

[27]  arXiv:1804.07209 (cross-list from cs.NE) [pdf, other]
Title: NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

This paper introduces "Non-Autonomous Input-Output Stable Network" (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. We prove that the network is globally asymptotically stable so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReLU units. An efficient implementation that enforces the stability under derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.

[28]  arXiv:1804.07237 (cross-list from cs.CV) [pdf, other]
Title: Multi-view Hybrid Embedding: A Divide-and-Conquer Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to tackle this problem is the multi-view subspace learning (MvSL) that aims to learn a latent subspace shared by multi-view data. Despite promising results obtained on some applications, the performance of existing methods deteriorates dramatically when the multi-view data is sampled from nonlinear manifolds or suffers from heavy outliers. To circumvent this drawback, motivated by the Divide-and-Conquer strategy, we propose Multi-view Hybrid Embedding (MvHE), a unique method of dividing the problem of cross-view classification into three subproblems and building one model for each subproblem. Specifically, the first model is designed to remove view discrepancy, whereas the second and third models attempt to discover the intrinsic nonlinear structure and to increase discriminability in intra-view and inter-view samples respectively. The kernel extension is conducted to further boost the representation power of MvHE. Extensive experiments are conducted on four benchmark datasets. Our methods demonstrate overwhelming advantages against the state-of-the-art MvSL based cross-view classification approaches in terms of classification accuracy and robustness.

[29]  arXiv:1804.07269 (cross-list from cs.RO) [pdf, other]
Title: Socially Guided Intrinsic Motivation for Robot Learning of Motor Skills
Journal-ref: Autonomous Robots, Springer Verlag, 2014, 36 (3), pp.273-294
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

This paper presents a technical approach to robot learning of motor skills which combines active intrinsically motivated learning with imitation learning. Our architecture, called SGIM-D, allows efficient learning of high-dimensional continuous sensorimotor inverse models in robots, and in particular learns distributions of parameterised motor policies that solve a corresponding distribution of parameterised goals/tasks. This is made possible by the technical integration of imitation learning techniques within an algorithm for learning inverse models that relies on active goal babbling. After reviewing social learning and intrinsic motivation approaches to action learning, we describe the general framework of our algorithm, before detailing its architecture. In an experiment where a robot arm has to learn to use a flexible fishing line , we illustrate that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation and benefits from human demonstration properties to learn how to produce varied outcomes in the environment, while developing more precise control policies in large spaces.

### Replacements for Fri, 20 Apr 18

[30]  arXiv:1702.08159 (replaced) [pdf, other]
Title: McKernel: A Library for Approximate Kernel Expansions in Log-linear Time
Subjects: Learning (cs.LG)
[31]  arXiv:1703.07015 (replaced) [pdf, other]
Title: Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
Subjects: Learning (cs.LG)
[32]  arXiv:1803.01271 (replaced) [pdf, other]
Title: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[33]  arXiv:1804.06352 (replaced) [pdf, other]
Title: High Dimensional Time Series Generators
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[34]  arXiv:1804.06518 (replaced) [pdf, ps, other]
Title: Online Non-Additive Path Learning under Full and Partial Information
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[35]  arXiv:1803.02400 (replaced) [pdf, other]
Title: Natural Language to Structured Query Generation via Meta-Learning
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
[36]  arXiv:1803.06442 (replaced) [pdf, other]
Title: Replica Symmetry Breaking in Bipartite Spin Glasses and Neural Networks
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)
[37]  arXiv:1804.03126 (replaced) [pdf, other]
Title: Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks
Comments: Early draft! Added a brief discussion on DNNs for program synthesis and translation, provided a brief justification for the LSTM use and fixed typos
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Learning (cs.LG)
[38]  arXiv:1804.03958 (replaced) [pdf, other]
Title: Interdependent Gibbs Samplers
Comments: Added a reference to a previous work which considered a very similar algorithm
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[39]  arXiv:1804.06216 (replaced) [pdf, other]
Title: Learning Sparse Latent Representations with the Deep Copula Information Bottleneck
Comments: Published as a conference paper at ICLR 2018. Aleksander Wieczorek and Mario Wieser contributed equally to this work
Journal-ref: Conference track - ICLR 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[40]  arXiv:1804.06234 (replaced) [pdf, other]
Title: Clustering Analysis on Locally Asymptotically Self-similar Processes