Learning
New submissions
New submissions for Fri, 20 Apr 18
 [1] arXiv:1804.06872 [pdf, other]

Title: Co-sampling: Training Robust Networks for Extremely Noisy Supervision
Authors: Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, Masashi Sugiyama
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Training robust deep networks is challenging under noisy labels. Current methodologies focus on estimating the noise transition matrix; however, this matrix is not easy to estimate accurately. In this paper, free of matrix estimation, we present a simple but robust learning paradigm called "Co-sampling", which can train deep networks robustly under extremely noisy labels. Briefly, our paradigm trains two networks simultaneously. In each mini-batch of data, each network samples its small-loss instances, and cross-trains on the instances selected by its peer network. We conduct experiments on several simulated noisy datasets. Empirical results demonstrate that, under extremely noisy labels, the Co-sampling approach trains deep learning models robustly.
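The peer-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `select_small_loss` and `co_sampling_step` are hypothetical names, and per-sample losses are assumed to be computed elsewhere by each network.

```python
import numpy as np

def select_small_loss(losses, keep_ratio):
    """Return indices of the keep_ratio fraction of samples with smallest loss."""
    k = max(1, int(len(losses) * keep_ratio))
    return np.argsort(losses)[:k]

def co_sampling_step(losses_a, losses_b, keep_ratio=0.5):
    """One mini-batch step of the paradigm: each network picks its
    small-loss instances, and the *peer* network is updated on them.
    Returns (indices to update network A on, indices to update network B on)."""
    idx_from_a = select_small_loss(losses_a, keep_ratio)
    idx_from_b = select_small_loss(losses_b, keep_ratio)
    return idx_from_b, idx_from_a  # swapped: cross-training
```

The intuition is that under label noise, small-loss samples are more likely to be cleanly labeled, and exchanging selections between two differently-initialized networks keeps their errors from reinforcing each other.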
 [2] arXiv:1804.06893 [pdf, other]

Title: A Study on Overfitting in Deep Reinforcement Learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large-scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. These observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of generalization behaviors from the perspective of inductive bias.
 [3] arXiv:1804.06896 [pdf, other]

Title: A Multi-task Selected Learning Approach for Solving New Type 3D Bin Packing Problem
Comments: 9 pages, 3 figures. arXiv admin note: text overlap with arXiv:1708.05930
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This paper studies a new type of 3D bin packing problem (BPP), in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a placement of these items that minimizes the surface area of the bin. This problem is motivated by the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. As in previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. It is a new NP-hard combinatorial optimization problem on unfixed-sized bin packing, for which we propose a multi-task framework based on Selected Learning that generates the sequence and orientations of items packed into the bin simultaneously. During training, Selected Learning chooses between loss functions derived from Deep Reinforcement Learning and Supervised Learning according to the training procedure. Numerical results show that the proposed method significantly outperforms Lego baselines by a substantial gain of 7.52%. Moreover, we produce a large-scale 3D bin packing order dataset for studying bin packing problems and will release it to the research community.
 [4] arXiv:1804.06909 [pdf, other]

Title: Modeling and Simultaneously Removing Bias via Adversarial Neural Networks
Authors: John Moore, Joel Pfeiffer, Kai Wei, Rishabh Iyer, Denis Charles, Ran Gilad-Bachrach, Levi Boyles, Eren Manavoglu
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In real-world systems, the predictions of deployed machine-learned models affect the training data available to build subsequent models. This introduces a bias in the training data that needs to be addressed. Existing solutions attempt to resolve the problem either by casting it in the reinforcement learning framework or by quantifying the bias and reweighting the loss functions. In this work, we develop a novel Adversarial Neural Network (ANN) model, an alternative approach which creates a representation of the data that is invariant to the bias. We take the Paid Search auction as our working example and ad display position features as the confounding features for this setting. We show the success of this approach empirically on both synthetic data and real-world paid search auction data from a major search engine.
 [5] arXiv:1804.06943 [pdf, other]

Title: K-Nearest Oracles Borderline Dynamic Classifier Ensemble Selection
Authors: Dayvid V. R. Oliveira, George D. C. Cavalcanti, Thyago N. Porpino, Rafael M. O. Cruz, Robert Sabourin
Comments: Paper accepted for publication on IJCNN 2018
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Dynamic Ensemble Selection (DES) techniques aim to select locally competent classifiers for the classification of each new test sample. Most DES techniques estimate the competence of classifiers using a given criterion over the region of competence of the test sample (its nearest neighbors in the validation set). The K-Nearest Oracles Eliminate (KNORA-E) DES selects all classifiers that correctly classify all samples in the region of competence of the test sample, if such classifiers exist; otherwise, it removes from the region of competence the sample that is furthest from the test sample, and the process repeats. When the region of competence has samples of different classes, KNORA-E can reduce the region of competence in such a way that only samples of a single class remain, leading to the selection of locally incompetent classifiers that classify all samples in the region of competence as belonging to the same class. In this paper, we propose two DES techniques: K-Nearest Oracles Borderline (KNORA-B) and K-Nearest Oracles Borderline Imbalanced (KNORA-BI). KNORA-B is a DES technique based on KNORA-E that reduces the region of competence but maintains at least one sample from each class that is in the original region of competence. KNORA-BI is a variation of KNORA-B for imbalanced datasets that reduces the region of competence but maintains at least one minority class sample if there is any in the original region of competence. Experiments are conducted comparing the proposed techniques with 19 DES techniques from the literature using 40 datasets. The results show that the proposed techniques achieve competitive results, with KNORA-BI outperforming state-of-the-art techniques.
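The borderline reduction rule described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `reduce_region_borderline` is a hypothetical name, and the region is assumed to be a list of validation-set indices sorted nearest-first with respect to the test sample.

```python
from collections import Counter

def reduce_region_borderline(region, labels):
    """One KNORA-B reduction step: drop the sample furthest from the
    test point (region is sorted nearest-first), but never remove the
    last remaining sample of any class present in the current region."""
    counts = Counter(labels[i] for i in region)
    # walk from the furthest sample inwards, skipping protected samples
    for pos in range(len(region) - 1, -1, -1):
        if counts[labels[region[pos]]] > 1:
            return region[:pos] + region[pos + 1:]
    return region  # every class is down to one sample: nothing removable
```

Repeating this step shrinks the region like KNORA-E does, but because each class keeps at least one representative, the selected classifiers can never be "oracles" that trivially predict a single class for the whole neighborhood.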
 [6] arXiv:1804.07045 [pdf, other]

Title: Semantic Adversarial Deep Learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences. However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system have a crucial role to play in this line of research. We present preliminary research results that support this claim.
 [7] arXiv:1804.07090 [pdf, other]

Title: Low Rank Structure of Learned Representations
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
A key feature of neural networks, particularly deep convolutional neural networks, is their ability to "learn" useful representations from data. The very last layer of a neural network is then simply a linear model trained on these "learned" representations. Despite their numerous applications in other tasks such as classification, retrieval, and clustering (i.e., transfer learning), not much work has been published that investigates the structure of these representations or whether structure can be imposed on them during the training process.
In this paper, we study the dimensionality of the representations learned by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that when trained on the CIFAR-10 or CIFAR-100 datasets, the learned representations exhibit a fairly low-rank structure. We propose a modification to the training procedure that further encourages low-rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples.
 [8] arXiv:1804.07152 [pdf, other]

Title: Scalable attribute-aware network embedding with locality
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Adding node attributes to network embedding helps the learned joint representation to capture features from topology and attributes simultaneously. Recent research on joint embedding has exhibited promising performance on a variety of tasks by jointly embedding the two spaces. However, due to the indispensable requirement of globality-based information, present approaches do not scale. Here we propose \emph{SANE}, a scalable attribute-aware network embedding algorithm with locality, to learn the joint representation from topology and attributes. By enforcing the alignment of a local linear relationship between each node and its K-nearest neighbors in topology and attribute space, the joint embedding representations are more informative than a single representation from topology or attributes alone. We argue that the locality in \emph{SANE} is the key to learning the joint representation at scale. Using several real-world networks from diverse domains, we demonstrate the efficacy of \emph{SANE} in terms of both performance and scalability. For label classification, SANE achieves the highest F1-score on most datasets among state-of-the-art joint representation algorithms, and even comes close to a baseline method that needs label information as extra input. Moreover, \emph{SANE} shows a performance gain of up to 71.4\% compared with the single topology-based algorithm. For scalability, we demonstrate the linear time complexity of \emph{SANE}; in addition, we observe that when the network scales to 100,000 nodes, the "learning joint embedding" step of \emph{SANE} takes only $\approx10$ seconds.
 [9] arXiv:1804.07169 [pdf, ps, other]

Title: Large-scale Nonlinear Variable Selection via Kernel Random Features
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typical poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features. The algorithm discovers the variables relevant for the regression task together with learning the prediction model through learning the appropriate nonlinear random feature maps. We demonstrate the outstanding performance of our method on a set of large-scale synthetic and real datasets.
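The abstract's variable-selection mechanism is not spelled out, but the random-feature mapping it builds on is standard: random Fourier features whose inner products approximate an RBF kernel. A minimal sketch (the function name and the choice of Gaussian/RBF kernel with parameter `gamma` are assumptions for illustration):

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map inputs to a random feature space where the inner product
    Z @ Z.T approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = random_fourier_features(X, n_features=2000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T  # approximates the 100 x 100 RBF kernel matrix
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

Working in the `n_features`-dimensional explicit space avoids forming the full kernel matrix, which is what makes kernel machines of this kind feasible on large datasets.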
 [10] arXiv:1804.07193 [pdf, other]

Title: Lipschitz Continuity in Model-based Reinforcement Learning
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Model-based reinforcement-learning methods learn transition and reward models and use them to guide behavior. We analyze the impact of learning models that are Lipschitz continuous: the distance between function values for two inputs is bounded by a linear function of the distance between the inputs. Our first result shows a tight bound on model errors for multi-step predictions with Lipschitz continuous models. We go on to prove an error bound for the value-function estimate arising from such models and show that the estimated value function is itself Lipschitz continuous. We conclude with empirical results that demonstrate significant benefits to enforcing Lipschitz continuity of neural-net models during reinforcement learning.
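The Lipschitz property in the abstract can be probed empirically: the largest ratio ||f(x) - f(y)|| / ||x - y|| over sampled pairs is a lower bound on the Lipschitz constant of a learned model f. This is a generic diagnostic sketch, not the paper's method; `empirical_lipschitz` is a hypothetical name.

```python
import numpy as np

def empirical_lipschitz(f, X):
    """Lower-bound the Lipschitz constant of f by the largest slope
    ||f(x) - f(y)|| / ||x - y|| over all pairs of sampled inputs."""
    best = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = np.linalg.norm(X[i] - X[j])
            if dx > 0:
                df = np.linalg.norm(f(X[i]) - f(X[j]))
                best = max(best, df / dx)
    return best
```

For a truly Lipschitz model the returned value stays bounded as more pairs are sampled; for a model with exploding local slopes it keeps growing.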
 [11] arXiv:1804.07240 [pdf, other]

Title: A sequential sampling strategy for extreme event statistics in nonlinear dynamical systems
Subjects: Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
We develop a method for the evaluation of extreme event statistics associated with nonlinear dynamical systems, using a small number of samples. From an initial dataset of design points, we formulate a sequential strategy that provides the 'next-best' data point (set of parameters) that, when evaluated, results in improved estimates of the probability density function (pdf) for a scalar quantity of interest. The approach utilizes Gaussian process regression to perform Bayesian inference on the parameter-to-observation map describing the quantity of interest. We then approximate the desired pdf along with uncertainty bounds utilizing the posterior distribution of the inferred map. The 'next-best' design point is sequentially determined through an optimization procedure that selects the point in parameter space that maximally reduces uncertainty between the estimated bounds of the pdf prediction. Since the optimization process utilizes only information from the inferred map, it has minimal computational cost. Moreover, the special form of the metric emphasizes the tails of the pdf. The method is practical for systems where the dimensionality of the parameter space is of moderate size, i.e., order O(10). We apply the method to estimate the extreme event statistics for a very high-dimensional system with millions of degrees of freedom: an offshore platform subjected to three-dimensional irregular waves. It is demonstrated that the developed approach can accurately determine the extreme event statistics using a limited number of samples.
 [12] arXiv:1804.07265 [pdf]

Title: Deep Transfer Network with Joint Distribution Adaptation: A New Intelligent Fault Diagnosis Framework for Industry Application
Comments: 10 pages, 10 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In recent years, deep learning models have become increasingly popular for intelligent condition monitoring, diagnosis and prognostics of mechanical systems and structures. In previous studies, however, a major assumption accepted by default is that the training and testing data are drawn from the same feature distribution. Unfortunately, this assumption is mostly invalid in real applications, resulting in a certain lack of applicability of traditional diagnosis approaches. Inspired by the idea of transfer learning, which leverages the knowledge learned from rich labeled data in a source domain to facilitate diagnosing a new but similar target task, a new intelligent fault diagnosis framework, i.e., the deep transfer network (DTN), which generalizes deep learning models to the domain adaptation scenario, is proposed in this paper. By extending marginal distribution adaptation (MDA) to joint distribution adaptation (JDA), the proposed framework can exploit the discrimination structures associated with the labeled data in the source domain to adapt the conditional distribution of unlabeled target data, and thus guarantee a more accurate distribution matching. Extensive empirical evaluations on three fault datasets validate the applicability and practicability of DTN, which achieves state-of-the-art transfer results across diverse operating conditions, fault severities and fault types.
 [13] arXiv:1804.07270 [pdf, other]

Title: A Dynamic Boosted Ensemble Learning Based on Random Forest
Authors: Xingzhang Ren, Chen Long, Leilei Zhang, Ye Wei, Dongdong Du, Jingxi Liang, Weiping Li, Shikun Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose Dynamic Boosted Random Forest (DBRF), a novel ensemble algorithm that incorporates the notion of hard example mining into Random Forest (RF) and thus combines the high accuracy of Boosting with the strong generalization of Bagging. Specifically, we propose to measure the quality of each leaf node of every decision tree in the random forest to determine hard examples. By iteratively training and then removing easy examples and noise examples from the training data, we evolve the random forest to focus on hard examples dynamically so as to learn decision boundaries better. Data can be cascaded through the random forests learned in each iteration in sequence to generate predictions, thus making RF deep. We also propose an evolution mechanism, a stacking mechanism and a smart iteration mechanism to improve the performance of the model. DBRF outperforms RF on three UCI datasets and achieves state-of-the-art results compared to other deep models. Moreover, we show that DBRF is also a new way of sampling and can be very useful when learning from unbalanced data.
 [14] arXiv:1804.07275 [pdf, other]

Title: Deep Triplet Ranking Networks for One-Shot Recognition
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Despite the breakthroughs achieved by deep learning models in conventional supervised learning scenarios, their dependence on sufficient labeled training data in each class prevents effective application of these deep models in situations where labeled training instances for a subset of novel classes are very sparse; in the extreme case, only one instance is available for each class. To tackle this natural and important challenge, one-shot learning, which aims to exploit a set of well-labeled base classes to build classifiers for new target classes that have only one observed instance per class, has recently received increasing attention from the research community. In this paper we propose a novel end-to-end deep triplet ranking network to perform one-shot learning. The proposed approach learns class-universal image embeddings on the well-labeled base classes under a triplet ranking loss, such that instances from new classes can be categorized based on their similarity to the one-shot instances in the learned embedding space. Moreover, our approach can naturally incorporate the available one-shot instances from the new classes into the embedding learning process to improve the triplet ranking model. We conduct experiments on two popular datasets for one-shot learning. The results show that the proposed approach achieves better performance than state-of-the-art comparison methods.
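The triplet ranking loss mentioned in the abstract has a standard hinge form, sketched below on plain vectors. The exact loss and distance used in the paper may differ; this shows the common squared-Euclidean variant with a margin.

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the anchor embedding should be closer
    to the positive (same class) than to the negative (different class)
    by at least `margin`; the loss is zero once that holds."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this loss over many triplets pulls same-class embeddings together and pushes different-class embeddings apart, which is what lets a single one-shot instance serve as a class prototype at test time.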
Cross-lists for Fri, 20 Apr 18
 [15] arXiv:1708.07199 (cross-list from cs.CV) [pdf, other]

Title: 3D Morphable Models as Spatial Transformer Networks
Comments: Accepted to ICCV 2017 2nd Workshop on Geometry Meets Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we show how a 3D Morphable Model (i.e., a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset, yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.
 [16] arXiv:1804.02541 (cross-list from cs.CV) [pdf, other]

Title: Statistical transformer networks: learning shape and appearance models via self-supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We generalise Spatial Transformer Networks (STN) by replacing the parametric transformation of a fixed, regular sampling grid with a deformable, statistical shape model which is itself learnt. We call this a Statistical Transformer Network (StaTN). By training a network containing a StaTN end-to-end for a particular task, the network learns the optimal nonrigid alignment of the input data for the task. Moreover, the statistical shape model is learnt with no direct supervision (such as landmarks) and can be reused for other tasks. Besides training for a specific task, we also show that a StaTN can learn a shape model using generic loss functions. This includes a loss inspired by the minimum description length principle in which an appearance model is also learnt from scratch. In this configuration, our model learns an active appearance model and a means to fit the model from scratch with no supervision at all, not even identity labels.
 [17] arXiv:1804.06952 (cross-list from cs.DS) [pdf, ps, other]

Title: Distributed Simulation and Distributed Inference
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
Independent samples from an unknown probability distribution $\mathbf{p}$ on a domain of size $k$ are distributed across $n$ players, with each player holding one sample. Each player can communicate $\ell$ bits to a central referee in a simultaneous message passing (SMP) model of communication to help the referee infer a property of the unknown $\mathbf{p}$. When $\ell\geq\log k$ bits, the problem reduces to the well-studied collocated case where all the samples are available in one place. In this work, we focus on the communication-starved setting of $\ell < \log k$, in which the landscape may change drastically. We propose a general formulation for inference problems in this distributed setting, and instantiate it to two prototypical inference questions: learning and uniformity testing.
 [18] arXiv:1804.06964 (cross-list from cs.NE) [pdf, other]

Title: GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures. Typically, conventional deep multi-attribute learning approaches follow the pipeline of manually designing the network architectures based on task-specific prior knowledge and careful network tuning, leading to inflexibility for the various complicated scenarios encountered in practice. To address this problem, we propose an efficient greedy neural architecture search approach (GNAS) to automatically discover the optimal tree-like deep architecture for multi-attribute learning. In a greedy manner, GNAS divides the optimization of the global architecture into the step-by-step optimization of individual connections. By iteratively updating the local architectures, the global tree-like architecture converges such that the bottom layers are shared across relevant attributes and the branches in the top layers encode attribute-specific features. Experiments on three benchmark multi-attribute datasets show the effectiveness and compactness of the neural architectures derived by GNAS, and also demonstrate the efficiency of GNAS in searching neural architectures.
 [19] arXiv:1804.07010 (cross-list from stat.ML) [pdf, other]

Title: Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations
Authors: Maziar Raissi
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Systems and Control (cs.SY); Analysis of PDEs (math.AP); Optimization and Control (math.OC)
Classical numerical methods for solving partial differential equations suffer from the curse of dimensionality, mainly due to their reliance on meticulously generated spatio-temporal grids. Inspired by modern deep-learning-based techniques for solving forward and inverse problems associated with partial differential equations, we circumvent the tyranny of numerical discretization by devising an algorithm that is scalable to high dimensions. In particular, we approximate the unknown solution by a deep neural network, which essentially enables us to benefit from the merits of automatic differentiation. To train this neural network we leverage the well-known connection between high-dimensional partial differential equations and forward-backward stochastic differential equations. In fact, independent realizations of a standard Brownian motion act as training data. We test the effectiveness of our approach on a couple of benchmark problems spanning a number of scientific domains, including the Black-Scholes-Barenblatt and Hamilton-Jacobi-Bellman equations, both in 100 dimensions.
 [20] arXiv:1804.07059 (cross-list from stat.ML) [pdf, other]

Title: Exploring Partially Observed Networks with Nonparametric Bandits
Comments: 15 pages, 6 figures, currently under review
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)
Real-world networks, such as social and communication networks, are too large to be observed entirely. Such networks are often partially observed, so that the size, topology and nodes of the original network are unknown. In this paper we formalize the Adaptive Graph Exploring problem. We assume that we are given an incomplete snapshot of a large network and that additional nodes can be discovered by querying nodes in the currently observed network. The goal of this problem is to maximize the number of observed nodes within a given query budget: querying which set of nodes maximizes the size of the observed network? We formulate this as an exploration-exploitation problem and propose a novel nonparametric multi-armed bandit (MAB) algorithm for identifying which nodes to query. Our contributions include: (1) $i$KNN-UCB, a novel nonparametric MAB algorithm that applies $k$-nearest-neighbor UCB to the setting where the arms are presented in a vector space; (2) a theoretical guarantee that the $i$KNN-UCB algorithm has sublinear regret; and (3) experiments applying $i$KNN-UCB to synthetic networks and real-world networks from different domains, showing that our method discovers up to 40% more nodes compared to existing baselines.
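The $k$-nearest-neighbor UCB idea can be sketched as a scoring rule over candidate arms embedded as vectors. This is an illustrative guess at the mechanism, not the paper's exact algorithm: the bonus term (mean neighbor distance scaled by `beta`) is an assumption, and `knn_ucb_score` is a hypothetical name.

```python
import numpy as np

def knn_ucb_score(arm_vec, hist_vecs, hist_rewards, k=3, beta=1.0):
    """Score a candidate arm by the mean reward of its k nearest
    previously played arms, plus an uncertainty bonus that grows with
    how far away those neighbors are (far neighbors = high uncertainty)."""
    d = np.linalg.norm(hist_vecs - arm_vec, axis=1)
    nn = np.argsort(d)[:k]
    return hist_rewards[nn].mean() + beta * d[nn].mean()
```

At each step the algorithm would query the node whose feature vector maximizes this score, trading off exploiting regions of the vector space that yielded many new nodes against exploring poorly covered regions.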
 [21] arXiv:1804.07091 (cross-list from stat.ML) [pdf, other]

Title: Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection
Comments: Accepted by TPAMI. Examples and code: this https URL
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Applications (stat.AP)
Automatic detection of anomalies in space- and time-varying measurements is an important tool in several fields, e.g., fraud detection, climate analysis, or healthcare monitoring. We present an algorithm for detecting anomalous regions in multivariate spatio-temporal time-series, which allows for spotting the interesting parts in large amounts of data, including video and text data. In contrast to existing techniques for detecting isolated anomalous data points, we propose the "Maximally Divergent Intervals" (MDI) framework for unsupervised detection of coherent spatial regions and time intervals characterized by a high Kullback-Leibler divergence compared with all other given data. In this regard, we define an unbiased Kullback-Leibler divergence that allows for ranking regions of different size and show how to enable the algorithm to run on large-scale data sets in reasonable time using an interval proposal technique. Experiments on both synthetic and real data from various domains, such as climate analysis, video surveillance, and text forensics, demonstrate that our method is widely applicable and a valuable tool for finding interesting events in different types of data.
 [22] arXiv:1804.07098 (cross-list from cs.CV) [pdf, other]

Title: Unsupervised Prostate Cancer Detection on H&E using Convolutional Adversarial Autoencoders
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose an unsupervised method using self-clustering convolutional adversarial autoencoders to classify prostate tissue as tumor or non-tumor without any labeled training data. The clustering method is integrated into the training of the autoencoder and requires only little post-processing. Our network trains on hematoxylin and eosin (H&E) input patches, and we tested two different reconstruction targets, H&E and immunohistochemistry (IHC). We show that antibody-driven feature learning using IHC helps the network to learn relevant features for the clustering task. Our network achieves an F1 score of 0.62 using only a small set of validation labels to assign classes to clusters.
 [23] arXiv:1804.07101 (cross-list from stat.ML) [pdf, other]

Title: Dictionary learning - from local towards global and adaptive
Authors: Karin Schnass
Comments: 11 figures, 4 pages per figure
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This paper studies the convergence behaviour of dictionary learning via the Iterative Thresholding and K-residual Means (ITKrM) algorithm. On the one hand, it is shown that there exist stable fixed points that do not correspond to the generating dictionary, which can be characterised as very coherent. On the other hand, it is proved that ITKrM is a contraction under much more relaxed conditions than previously required. Based on the characterisation of the stable fixed points, replacing coherent atoms with carefully designed replacement candidates is proposed. In experiments on synthetic data this outperforms random or no replacement and always leads to full dictionary recovery. Finally, the question of how to learn dictionaries without knowledge of the correct dictionary size and sparsity level is addressed. Decoupling the replacement strategy for coherent or unused atoms into pruning and adding, and slowly and carefully increasing the sparsity level, leads to an adaptive version of ITKrM. In several experiments this adaptive dictionary learning algorithm is shown to recover a generating dictionary from randomly initialised dictionaries of various sizes on synthetic data, and to learn meaningful dictionaries on image data.
 [24] arXiv:1804.07134 (cross-list from stat.ML) [pdf, other]

Title: varrank: an R package for variable ranking based on mutual information with applications to observed systemic datasets
Comments: 18 pages, 4 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This article describes the R package varrank, a flexible implementation of heuristic approaches that perform variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The core functionality is a general implementation of the minimum redundancy maximum relevance (mRMRe) model. This approach is based on information-theoretic metrics and is compatible with discrete and continuous data, which are discretised using a wide choice of possible rules. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.
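The underlying minimum-redundancy maximum-relevance idea can be sketched in Python for discrete data (the package itself is in R; `mutual_info` and `mrmr_rank` are hypothetical helpers, not the varrank API): each feature is scored by its mutual information with the target minus its average mutual information with already-selected features.

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information (nats) between two non-negative integer arrays,
    estimated from the joint histogram."""
    joint = np.histogram2d(x, y, bins=(x.max() + 1, y.max() + 1))[0]
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr_rank(X, y):
    """Greedy mRMR ranking: relevance to y minus mean redundancy with
    already-chosen features."""
    n_feat = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n_feat)]
    selected, remaining = [], list(range(n_feat))
    while remaining:
        def score(j):
            red = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected]) if selected else 0.0
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Continuous variables would first be discretised, which is where varrank's choice of discretisation rules comes in.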
 [25] arXiv:1804.07144 (cross-list from cs.NE) [pdf, other]

Title: Human Activity Recognition using Recurrent Neural Networks
Authors: Deepika Singh, Erinc Merdivan, Ismini Psychoula, Johannes Kropf, Sten Hanke, Matthieu Geist, Andreas Holzinger
Journal-ref: International Cross-Domain Conference for Machine Learning and Knowledge Extraction: CD-MAKE 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Human activity recognition using smart home sensors is one of the bases of ubiquitous computing in smart environments and a topic undergoing intense research in the field of ambient assisted living. The increasingly large number of available data sets calls for machine learning methods. In this paper, we introduce a deep learning model that learns to classify human activities without using any prior knowledge. For this purpose, a Long Short-Term Memory (LSTM) Recurrent Neural Network was applied to three real-world smart home datasets. The results of these experiments show that the proposed approach outperforms existing ones in terms of accuracy and performance.
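The forward pass of a single LSTM layer over one sensor sequence can be sketched in NumPy; the gate ordering and weight layout below are generic assumptions, not the paper's exact configuration. The final hidden state would typically be fed to a softmax classifier over activity labels.

```python
import numpy as np

def lstm_forward(X, Wx, Wh, b):
    """Run a single LSTM layer over a sequence X of shape (T, d).
    Weights are stacked gate-wise: input, forget, cell candidate, output.
    Wx: (4H, d), Wh: (4H, H), b: (4H,). Returns the final hidden state h."""
    H = Wh.shape[1]
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in X:
        z = Wx @ x_t + Wh @ h + b                    # pre-activations for all gates
        i, f = sig(z[:H]), sig(z[H:2 * H])           # input and forget gates
        g, o = np.tanh(z[2 * H:3 * H]), sig(z[3 * H:])  # cell candidate, output gate
        c = f * c + i * g                            # update cell state
        h = o * np.tanh(c)                           # update hidden state
    return h
```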
 [26] arXiv:1804.07155 (cross-list from cs.CV) [pdf, other]

Title: Instance Selection Improves Geometric Mean Accuracy: A Study on Imbalanced Data Classification
Comments: 11 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
A natural way of handling imbalanced data is to attempt to equalise the class frequencies and train the classifier of choice on balanced data. For two-class imbalanced problems, the classification success is typically measured by the geometric mean (GM) of the true positive and true negative rates. Here we prove that GM can be improved upon by instance selection, and give the theoretical conditions for such an improvement. We demonstrate that GM is non-monotonic with respect to the number of retained instances, which discourages systematic instance selection. We also show that balancing the distribution frequencies is inferior to a direct maximisation of GM. To verify our theoretical findings, we carried out an experimental study of 12 instance selection methods for imbalanced data, using 66 standard benchmark data sets. The results reveal possible room for new instance selection methods for imbalanced data.
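The GM criterion itself is straightforward to compute; a minimal sketch for two-class 0/1 predictions (a generic helper, not code from the paper):

```python
import numpy as np

def gm_score(y_true, y_pred):
    """Geometric mean of the true positive and true negative rates,
    the standard success measure for two-class imbalanced problems."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)   # sensitivity on the positive class
    tnr = np.mean(y_pred[y_true == 0] == 0)   # specificity on the negative class
    return float(np.sqrt(tpr * tnr))
```

Because GM is a product of the two rates, a classifier that ignores the minority class scores 0 regardless of overall accuracy, which is why it is preferred over plain accuracy for imbalanced data.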
 [27] arXiv:1804.07209 (cross-list from cs.NE) [pdf, other]

Title: NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations
Comments: 29 pages, 7 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
This paper introduces "Non-Autonomous Input-Output Stable Network" (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. We prove that the network is globally asymptotically stable so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReLU units. An efficient implementation that enforces the stability under derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.
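A non-autonomous unrolled block of this kind can be sketched in NumPy: the block input `u` is re-injected at every unrolled stage via a skip connection. The negative-definite choice of `A` below is an assumption standing in for the paper's derived stability conditions, not their exact parameterisation:

```python
import numpy as np

def nais_block(u, A, B, b, depth=30, h=0.05):
    """Unrolled non-autonomous residual block (sketch).
    Each stage applies x <- x + h * tanh(A x + B u + b); the input u
    enters every stage, which is what makes the dynamics non-autonomous."""
    x = np.zeros_like(b)
    for _ in range(depth):
        x = x + h * np.tanh(A @ x + B @ u + b)
    return x
```

With a stable `A`, the state approaches a single input-dependent equilibrium (where `tanh(A x + B u + b) = 0`), so the block can be unrolled to an adaptive, pattern-dependent depth without diverging.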
 [28] arXiv:1804.07237 (cross-list from cs.CV) [pdf, other]

Title: Multi-view Hybrid Embedding: A Divide-and-Conquer Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to tackle this problem is multi-view subspace learning (MvSL), which aims to learn a latent subspace shared by multi-view data. Despite promising results obtained on some applications, the performance of existing methods deteriorates dramatically when the multi-view data is sampled from nonlinear manifolds or suffers from heavy outliers. To circumvent this drawback, motivated by the Divide-and-Conquer strategy, we propose Multi-view Hybrid Embedding (MvHE), a unique method that divides the problem of cross-view classification into three subproblems and builds one model for each subproblem. Specifically, the first model is designed to remove view discrepancy, whereas the second and third models attempt to discover the intrinsic nonlinear structure and to increase discriminability in intra-view and inter-view samples respectively. A kernel extension is conducted to further boost the representation power of MvHE. Extensive experiments are conducted on four benchmark datasets. Our methods demonstrate significant advantages over state-of-the-art MvSL-based cross-view classification approaches in terms of classification accuracy and robustness.
 [29] arXiv:1804.07269 (cross-list from cs.RO) [pdf, other]

Title: Socially Guided Intrinsic Motivation for Robot Learning of Motor Skills
Journal-ref: Autonomous Robots, Springer Verlag, 2014, 36 (3), pp. 273-294
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
This paper presents a technical approach to robot learning of motor skills which combines active intrinsically motivated learning with imitation learning. Our architecture, called SGIM-D, allows efficient learning of high-dimensional continuous sensorimotor inverse models in robots, and in particular learns distributions of parameterised motor policies that solve a corresponding distribution of parameterised goals/tasks. This is made possible by the technical integration of imitation learning techniques within an algorithm for learning inverse models that relies on active goal babbling. After reviewing social learning and intrinsic motivation approaches to action learning, we describe the general framework of our algorithm, before detailing its architecture. In an experiment where a robot arm has to learn to use a flexible fishing line, we illustrate that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation and benefits from human demonstration properties to learn how to produce varied outcomes in the environment, while developing more precise control policies in large spaces.
Replacements for Fri, 20 Apr 18
 [30] arXiv:1702.08159 (replaced) [pdf, other]

Title: McKernel: A Library for Approximate Kernel Expansions in Log-linear Time
Authors: Joachim D. Curtó, Irene C. Zarza, Feng Yang, Alexander J. Smola, Fernando De La Torre, Chong-Wah Ngo, Luc Van Gool
Subjects: Learning (cs.LG)
 [31] arXiv:1703.07015 (replaced) [pdf, other]

Title: Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
Comments: Accepted by SIGIR 2018
Subjects: Learning (cs.LG)
 [32] arXiv:1803.01271 (replaced) [pdf, other]

Title: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
 [33] arXiv:1804.06352 (replaced) [pdf, other]

Title: High Dimensional Time Series Generators
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [34] arXiv:1804.06518 (replaced) [pdf, ps, other]

Title: Online Non-Additive Path Learning under Full and Partial Information
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [35] arXiv:1803.02400 (replaced) [pdf, other]

Title: Natural Language to Structured Query Generation via Meta-Learning
Comments: in NAACL HLT 2018
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
 [36] arXiv:1803.06442 (replaced) [pdf, other]

Title: Replica Symmetry Breaking in Bipartite Spin Glasses and Neural Networks
Comments: 33 pages, 14 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)
 [37] arXiv:1804.03126 (replaced) [pdf, other]

Title: Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks
Comments: Early draft! Added a brief discussion on DNNs for program synthesis and translation, provided a brief justification for the LSTM use and fixed typos
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Learning (cs.LG)
 [38] arXiv:1804.03958 (replaced) [pdf, other]

Title: Interdependent Gibbs Samplers
Comments: Added a reference to a previous work which considered a very similar algorithm
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [39] arXiv:1804.06216 (replaced) [pdf, other]

Title: Learning Sparse Latent Representations with the Deep Copula Information Bottleneck
Comments: Published as a conference paper at ICLR 2018. Aleksander Wieczorek and Mario Wieser contributed equally to this work
Journal-ref: Conference track - ICLR 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [40] arXiv:1804.06234 (replaced) [pdf, other]

Title: Clustering Analysis on Locally Asymptotically Self-similar Processes
Comments: arXiv admin note: substantial text overlap with arXiv:1801.09049
Subjects: Machine Learning (stat.ML); Learning (cs.LG)