# Learning

## New submissions

### New submissions for Fri, 19 Jan 18

[1]
Title: An Overview of Machine Teaching
Comments: A tutorial document grown out of NIPS 2017 Workshop on Teaching Machines, Robots, and Humans
Subjects: Learning (cs.LG)

In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then forms the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain a deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field.

[2]
Title: Faster Algorithms for Large-scale Machine Learning using Simple Sampling Techniques
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Nowadays, the major challenge in machine learning is the 'Big Data' challenge. In big data problems, because of a large number of data points, a large number of features in each data point, or both, the training of models has become very slow. The training time has two major components: time to access the data and time to process the data. In this paper, we propose one possible solution for handling big data problems in machine learning. The focus is on reducing the training time by reducing data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To demonstrate the effectiveness of the proposed sampling techniques, we use Empirical Risk Minimization, a commonly used machine learning problem, in the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batched SGD), each with two step-size determination techniques, namely constant step size and backtracking line search. Theoretical results prove the same convergence, in expectation, for systematic sampling, cyclic sampling and the widely used random sampling technique. Experimental results on benchmark datasets demonstrate the efficacy of the proposed sampling techniques.
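
To make the three mini-batch selection schemes named in the abstract above concrete, here is a minimal sketch. The abstract does not spell out the exact schemes, so the definitions below are assumptions: cyclic sampling walks the dataset in order, and systematic sampling picks a random start and then a fixed stride. The function names are illustrative, not taken from the paper, and `batch_size` is assumed to be at most `n`.

```python
import random


def cyclic_batches(n, batch_size):
    """Yield index lists that walk the dataset in storage order."""
    for start in range(0, n, batch_size):
        yield list(range(start, min(start + batch_size, n)))


def systematic_batch(n, batch_size, rng):
    """Random start, then every k-th index (k = n // batch_size), wrapping around."""
    k = max(1, n // batch_size)
    start = rng.randrange(n)
    return [(start + i * k) % n for i in range(batch_size)]


def random_batch(n, batch_size, rng):
    """Baseline: indices drawn uniformly at random without replacement."""
    return rng.sample(range(n), batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    print(list(cyclic_batches(10, 4)))   # contiguous blocks, last one shorter
    print(systematic_batch(10, 5, rng))  # one random draw fixes the whole batch
    print(random_batch(10, 5, rng))      # every index is an independent draw
```

Under these assumptions, the cyclic and systematic schemes need at most one random draw per batch and touch the data in a predictable pattern, which is the kind of access-time saving the abstract targets.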

[3]
Title: An Iterative Closest Point Method for Unsupervised Word Translation
Subjects: Learning (cs.LG)

Unsupervised word translation from non-parallel inter-lingual corpora has attracted much research interest. Very recently, neural network methods trained with adversarial loss functions achieved high accuracy on this task. Despite the impressive success of the recent techniques, they suffer from the typical drawbacks of generative adversarial models: sensitivity to hyper-parameters, long training time and lack of interpretability. In this paper, we make the observation that two sufficiently similar distributions can be aligned correctly with iterative matching methods. We present a novel method that first aligns the second moment of the word distributions of the two languages and then iteratively refines the alignment. Our simple linear method is able to achieve better or equal performance to recent state-of-the-art deep adversarial approaches and typically does a little better than the supervised baseline. Our method is also efficient, easy to parallelize and interpretable.
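
The idea of iterative matching in the abstract above can be illustrated with a toy two-dimensional iterative closest point loop. This is only a sketch under strong assumptions: the paper works with word embeddings and a second-moment initialization, whereas here the unknown map is a plain 2D rotation, the starting guess is the identity, and the names `rotate` and `icp_rotation` are hypothetical.

```python
import math


def rotate(p, theta):
    """Rotate a 2D point counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def icp_rotation(xs, ys, iters=10):
    """Estimate the rotation taking xs onto ys without known correspondences."""
    theta = 0.0
    for _ in range(iters):
        # Matching step: pair each rotated source point with its nearest target.
        pairs = []
        for x in xs:
            xr = rotate(x, theta)
            y = min(ys, key=lambda q: (q[0] - xr[0]) ** 2 + (q[1] - xr[1]) ** 2)
            pairs.append((x, y))
        # Alignment step: closed-form least-squares rotation for the matched pairs.
        cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
        dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
        theta = math.atan2(cross, dot)
    return theta


if __name__ == "__main__":
    xs = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -4.0)]
    # Targets are the same points rotated by 0.3 rad, listed in a different order.
    ys = [rotate(p, 0.3) for p in reversed(xs)]
    print(icp_rotation(xs, ys))
```

As the abstract notes, this kind of alternation only works when the two point sets are already sufficiently similar; a poor starting guess can lock in wrong matches.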

[4]
Title: Latitude: A Model for Mixed Linear-Tropical Matrix Factorization
Comments: 14 pages, 6 figures. To appear in 2018 SIAM International Conference on Data Mining (SDM '18). For the source code, see this https URL
Subjects: Learning (cs.LG)

Nonnegative matrix factorization (NMF) is one of the most frequently used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.
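
The difference between the two factorization models in the abstract above comes down to how the factor matrices are multiplied. The sketch below contrasts the ordinary (sum of products) product used by NMF with the max-times product used by SMF. The `blend` function is a purely hypothetical convex combination added to illustrate the idea of moving between the two; it is an assumption and not the Latitude model's actual parameterization.

```python
def linear_product(B, C):
    """Ordinary matrix product: each entry sums the contributions of all factors."""
    k, m = len(C), len(C[0])
    return [[sum(row[l] * C[l][j] for l in range(k)) for j in range(m)] for row in B]


def max_times_product(B, C):
    """Max-times product: each entry keeps only the largest single contribution."""
    k, m = len(C), len(C[0])
    return [[max(row[l] * C[l][j] for l in range(k)) for j in range(m)] for row in B]


def blend(B, C, alpha):
    """Hypothetical mixture: alpha = 1 is purely linear, alpha = 0 is purely max-times."""
    lin, trop = linear_product(B, C), max_times_product(B, C)
    return [[alpha * a + (1 - alpha) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(lin, trop)]


if __name__ == "__main__":
    B = [[1, 2], [0, 3]]
    C = [[4, 0], [1, 2]]
    print(linear_product(B, C))     # [[6, 4], [3, 6]]
    print(max_times_product(B, C))  # [[4, 4], [3, 6]]
```

For nonnegative factors the max-times entry never exceeds the linear one, which is where the 'winner takes it all' reading comes from.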

### Cross-lists for Fri, 19 Jan 18

[5]  arXiv:1801.05852 (cross-list from cs.SI) [pdf, other]
Title: Network Representation Learning: A Survey
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

With the widespread use of information technologies, information networks have become increasingly popular for capturing complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life such as the structure of society, information diffusion, and different patterns of communication. However, the large scale of information networks often makes network analytic tasks computationally expensive and intractable. Recently, network representation learning has been proposed as a new learning paradigm that embeds network vertices into a low-dimensional vector space, by preserving network topology structure, vertex content, and other side information. This allows the original network to be handled easily in the new vector space for further analysis. In this survey, we perform a thorough review of the current literature on network representation learning in the field of data mining and machine learning. We propose a new categorization to analyze and summarize state-of-the-art network representation learning techniques according to the methodology they employ and the network information they preserve. Finally, to facilitate research on this topic, we summarize benchmark datasets and evaluation methodologies, and discuss open issues and future research directions in this field.

[6]  arXiv:1801.05856 (cross-list from cs.SI) [pdf, other]
Title: Active Community Detection: A Maximum Likelihood Approach
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

We propose novel semi-supervised and active learning algorithms for the problem of community detection on networks. The algorithms are based on optimizing the likelihood function of the community assignments given a graph and an estimate of the statistical model that generated it. The optimization framework is inspired by prior work on the unsupervised community detection problem in Stochastic Block Models (SBM) using Semi-Definite Programming (SDP). In this paper we provide the next steps in the evolution of learning communities in this context, which involve a constrained semi-definite programming algorithm and a newly presented active learning algorithm. The active learner intelligently queries nodes that are expected to maximize the change in the model likelihood. Experimental results show that this active learning algorithm outperforms the random-selection semi-supervised version of the same algorithm as well as other state-of-the-art active learning algorithms. Our algorithms' significantly improved performance is demonstrated on both real-world and SBM-generated networks, even when the SBM has a signal-to-noise ratio (SNR) below the known unsupervised detectability threshold.

[7]  arXiv:1801.05894 (cross-list from math.HO) [pdf, other]
Title: Deep Learning: An Introduction for Applied Mathematicians
Subjects: History and Overview (math.HO); Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.

[8]  arXiv:1801.06024 (cross-list from cs.CL) [pdf, other]
Title: Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations
Comments: The 31st Annual Conference on Neural Information Processing (NIPS) - Workshop on Learning Disentangled Features: from Perception to Control, Long Beach, CA, December 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

We train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: The difference-vector between two sentences can be added to change a third sentence with similar features in a meaningful way.

[9]  arXiv:1801.06027 (cross-list from cs.DB) [pdf, other]
Title: In-RDBMS Hardware Acceleration of Advanced Analytics
Subjects: Databases (cs.DB); Hardware Architecture (cs.AR); Learning (cs.LG)

The data revolution is fueled by advances in several areas, including databases, high-performance computer architecture, and machine learning. Although timely, there is a void of solutions that bring these disjoint directions together. This paper sets out to be the initial step towards such a union. The aim is to devise a solution for the in-Database Acceleration of Advanced Analytics (DAnA). DAnA empowers database users to leap beyond traditional data summarization techniques and seamlessly utilize hardware-accelerated machine learning. Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of in-database analytics queries to the FPGA accelerator. The accelerator implementation is generated from a User Defined Function (UDF), expressed as part of a SQL query in a Python-embedded Domain Specific Language (DSL). To realize efficient in-database integration, DAnA accelerators contain a novel hardware structure, Striders, that directly interfaces with the buffer pool of the database. DAnA obtains the schema and page layout information from the database catalog to configure the Striders. In turn, Striders extract, cleanse, and process the training data tuples, which are consumed by a multi-threaded FPGA engine that executes the analytics algorithm. We integrated DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, an 11.3x end-to-end speedup over MADLib and is 5.4x faster than multi-threaded MADLib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in 30-60 lines of Python.

[10]  arXiv:1801.06048 (cross-list from cs.CY) [pdf, other]
Title: Deep Learning for Fatigue Estimation on the Basis of Multimodal Human-Machine Interactions
Comments: 12 pages, 10 figures, 1 table; presented at XXIX IUPAP Conference in Computational Physics (CCP2017) July 9-13, 2017, Paris, University Pierre et Marie Curie - Sorbonne (this https URL)
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Learning (cs.LG)

A new method is proposed to monitor the level of current physical load and accumulated fatigue using several objective and subjective characteristics. It was applied to a dataset intended for estimating physical load and fatigue with several statistical and machine learning methods. Data from peripheral sensors (accelerometer, GPS, gyroscope, magnetometer) and a brain-computing interface (electroencephalography) were collected, integrated, and analyzed by several statistical and machine learning methods (moment analysis, cluster analysis, principal component analysis, etc.). Hypothesis 1, that physical activity can be classified not only by objective parameters but also by subjective parameters, was presented and proved. Hypothesis 2 (experienced physical load and subsequent restoration as fatigue level can be estimated quantitatively and distinctive patterns can be recognized) was presented and some ways to prove it were demonstrated. Several "physical load" and "fatigue" metrics were proposed. The results presented make it possible to extend the application of machine learning methods to the characterization of complex human activity patterns (for example, to estimate their actual physical load and fatigue, and to give cautions and advice).

[11]  arXiv:1801.06077 (cross-list from q-fin.CP) [pdf, other]
Title: The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option Portfolios
Authors: Igor Halperin
Subjects: Computational Finance (q-fin.CP); Learning (cs.LG)

The QLBS model is a discrete-time option hedging and pricing model that is based on Dynamic Programming (DP) and Reinforcement Learning (RL). It combines the famous Q-Learning method for RL with the Black-Scholes (-Merton) model's idea of reducing the problem of option pricing and hedging to the problem of optimal rebalancing of a dynamic replicating portfolio for the option, which is made of a stock and cash. Here we expand on several NuQLear (Numerical Q-Learning) topics with the QLBS model. First, we investigate the performance of Fitted Q Iteration for an RL (data-driven) solution to the model, and benchmark it versus a DP (model-based) solution, as well as versus the BSM model. Second, we develop an Inverse Reinforcement Learning (IRL) setting for the model, where we only observe prices and actions (re-hedges) taken by a trader, but not rewards. Third, we outline how the QLBS model can be used for pricing portfolios of options, rather than a single option in isolation, thus providing its own data-driven and model-independent solution to the (in)famous volatility smile problem of the Black-Scholes model.

[12]  arXiv:1801.06146 (cross-list from cs.CL) [pdf, ps, other]
Title: Fine-tuned Language Models for Text Classification
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly outperforms the state-of-the-art on five text classification tasks, reducing the error by 18-24% on the majority of datasets. We open-source our pretrained models and code to enable adoption by the community.

[13]  arXiv:1801.06159 (cross-list from stat.ML) [pdf, other]
Title: When Does Stochastic Gradient Algorithm Work Well?
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method has improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that these assumptions hold for logistic regression and standard deep neural networks on classical data sets. Thus our analysis helps to explain when efficient behavior can be expected from the SGD method in training classification models and deep neural networks.
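
The phrase "to a neighborhood of the optimal solutions" in the abstract above can be seen on a toy problem. The sketch below runs SGD with a fixed step size on the average of the losses 0.5 * (w - a_i)^2, whose minimizer is the mean of the data. The objective, the function name `sgd_fixed_step`, and all parameter values are assumptions chosen for illustration and have nothing to do with the paper's own assumption or experiments.

```python
import random


def sgd_fixed_step(data, step, iters, w0=0.0, seed=0):
    """SGD on the average of 0.5 * (w - a)^2 over the data, with a constant step size."""
    rng = random.Random(seed)
    w = w0
    for _ in range(iters):
        a = rng.choice(data)   # sample one data point
        w -= step * (w - a)    # stochastic gradient of 0.5 * (w - a)^2 is (w - a)
    return w


if __name__ == "__main__":
    data = [1.0, 3.0]          # the exact minimizer is the mean, 2.0
    for step in (0.5, 0.1, 0.01):
        w = sgd_fixed_step(data, step, iters=2000, w0=10.0)
        print(step, w)
```

With a constant step each update is a weighted average of the current iterate and a sampled data point, so the iterate quickly enters the interval spanned by the data and then keeps fluctuating around the mean instead of settling on it. In this toy problem a smaller step tightens the fluctuation but slows the initial approach.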

[14]  arXiv:1801.06176 (cross-list from cs.CL) [pdf, other]
Title: Integrating planning for task-completion dialogue policy learning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Training a task-completion dialogue agent with real users via reinforcement learning (RL) could be prohibitively expensive, because it requires many interactions with users. One alternative is to resort to a user simulator, but the discrepancy between simulated and real users makes the learned policy unreliable in practice. This paper addresses these challenges by integrating planning into dialogue policy learning based on the Dyna-Q framework, and provides a more sample-efficient approach to learning dialogue policies. The proposed agent consists of a planner, trained on-line with limited real user experience, that can generate large amounts of simulated experience to supplement the limited real user experience, and a policy model trained on these hybrid experiences. The effectiveness of our approach is validated on a movie-booking task in both a simulation setting and a human-in-the-loop setting.
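
For readers unfamiliar with Dyna-Q, the framework the abstract above builds on, here is a minimal tabular sketch. Each real step updates the Q-table, is recorded in a model, and is followed by several planning updates replayed from that model. Everything here is an assumption made for illustration: the four-state chain environment `chain_step`, the lookup-table model, and the parameter values. The paper's agent concerns dialogue policies and a trained planner, not this toy setup.

```python
import random


def chain_step(s, a):
    """Toy environment: states 0..3, action 0 moves left, 1 moves right, goal is state 3."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == 3
    return s2, (1.0 if done else 0.0), done


def dyna_q(step_fn, n_states, n_actions, start, episodes, planning_steps,
           alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state, done), learned from real steps

    def update(s, a, r, s2, done):
        target = r + (0.0 if done else gamma * max(q[s2]))
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            if rng.random() < eps:
                a = rng.randrange(n_actions)  # explore
            else:
                best = max(q[s])              # exploit, breaking ties at random
                a = rng.choice([i for i in range(n_actions) if q[s][i] == best])
            s2, r, done = step_fn(s, a)       # one step of real experience
            update(s, a, r, s2, done)
            model[(s, a)] = (r, s2, done)
            for _ in range(planning_steps):   # simulated experience from the model
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


if __name__ == "__main__":
    q = dyna_q(chain_step, n_states=4, n_actions=2, start=0,
               episodes=50, planning_steps=10)
    for s in range(3):
        print(s, q[s])
```

Setting `planning_steps=0` reduces this to plain Q-learning, which makes it easy to compare how many real steps each variant needs.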

### Replacements for Fri, 19 Jan 18

[15]  arXiv:1702.07958 (replaced) [pdf, other]
Title: Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Comments: 22 pages, 2 figures; ICML 2017; this version includes additional discussions of Newtron, and a variant of SOBA that directly uses an online exp-concave optimization oracle
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

[16]  arXiv:1704.08443 (replaced) [pdf, other]
Title: DNA Steganalysis Using Deep Recurrent Neural Networks
Subjects: Learning (cs.LG); Multimedia (cs.MM)

[17]  arXiv:1711.04126 (replaced) [pdf, other]
Title: Disease Prediction from Electronic Health Records Using Generative Adversarial Networks
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

[18]  arXiv:1505.04252 (replaced) [pdf, ps, other]
Title: Global Convergence of Unmodified 3-Block ADMM for a Class of Convex Minimization Problems
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

[19]  arXiv:1605.02408 (replaced) [pdf, ps, other]
Title: Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis