Learning
New submissions
New submissions for Fri, 19 Jan 18
 [1] arXiv:1801.05927 [pdf, other]

Title: An Overview of Machine Teaching
Comments: A tutorial document grown out of NIPS 2017 Workshop on Teaching Machines, Robots, and Humans
Subjects: Learning (cs.LG)
In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then forms the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain a deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field.
 [2] arXiv:1801.05931 [pdf, ps, other]

Title: Faster Algorithms for Large-scale Machine Learning using Simple Sampling Techniques
Comments: 80 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Nowadays, the major challenge in machine learning is the `Big Data' challenge. In big data problems, due to a large number of data points, a large number of features in each data point, or both, the training of models becomes very slow. The training time has two major components: time to access the data and time to process the data. In this paper, we propose one possible solution to big data problems in machine learning. The focus is on reducing the training time through reducing data access time, by proposing systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of the proposed sampling techniques, we use Empirical Risk Minimization, a commonly used machine learning problem, for the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MB-SGD (Mini-Batched SGD), each with two step-size determination techniques, namely, a constant step size and the backtracking line search method. Theoretical results prove the same convergence, in expectation, for systematic sampling, cyclic sampling and the widely used random sampling technique. Experimental results with benchmark datasets prove the efficacy of the proposed sampling techniques.
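The three mini-batch selection schemes the abstract compares can be sketched as follows (a minimal NumPy illustration, not the authors' code):

```python
import numpy as np

def random_batches(n, batch_size, rng):
    # Widely used random sampling: shuffle all indices each epoch.
    idx = rng.permutation(n)
    return [idx[i:i + batch_size] for i in range(0, n, batch_size)]

def cyclic_batches(n, batch_size):
    # Cyclic/sequential sampling: take points in storage order, so each
    # mini-batch is a contiguous block (cheap, cache-friendly data access).
    idx = np.arange(n)
    return [idx[i:i + batch_size] for i in range(0, n, batch_size)]

def systematic_batch(n, batch_size, rng):
    # Systematic sampling: one random start, then every k-th point,
    # with k = n // batch_size (a single batch shown for brevity).
    k = n // batch_size
    start = rng.integers(k)
    return np.arange(start, n, k)[:batch_size]
```

Only the data access pattern differs between the three; the paper's point is that convergence in expectation is the same for all of them.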
 [3] arXiv:1801.06126 [pdf, other]

Title: An Iterative Closest Point Method for Unsupervised Word Translation
Subjects: Learning (cs.LG)
Unsupervised word translation from non-parallel inter-lingual corpora has attracted much research interest. Very recently, neural network methods trained with adversarial loss functions achieved high accuracy on this task. Despite the impressive success of the recent techniques, they suffer from the typical drawbacks of generative adversarial models: sensitivity to hyper-parameters, long training time and lack of interpretability. In this paper, we make the observation that two sufficiently similar distributions can be aligned correctly with iterative matching methods. We present a novel method that first aligns the second moment of the word distributions of the two languages and then iteratively refines the alignment. Our simple linear method achieves performance better than or equal to recent state-of-the-art deep adversarial approaches, and typically does a little better than the supervised baseline. Our method is also efficient, easy to parallelize and interpretable.
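The two stages described above can be sketched in NumPy, assuming plain arrays of word embeddings; the whitening, matching, and Procrustes details are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def align(X, Y, iters=5):
    """ICP-style alignment sketch: whiten both embedding sets to match
    second moments, then alternate nearest-neighbour matching with an
    orthogonal Procrustes fit."""
    def whiten(Z):
        # Zero mean, identity covariance (second-moment alignment).
        Z = Z - Z.mean(0)
        evals, evecs = np.linalg.eigh(np.cov(Z.T))
        return Z @ evecs / np.sqrt(evals)

    X, Y = whiten(X), whiten(Y)
    W = np.eye(X.shape[1])                    # current linear map
    for _ in range(iters):
        # Match each source vector to its nearest target vector.
        dists = (((X @ W)[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        match = dists.argmin(1)
        # Orthogonal Procrustes: best rotation onto the matched targets.
        U, _, Vt = np.linalg.svd(X.T @ Y[match])
        W = U @ Vt
    return W, match
```

The returned map W stays orthogonal by construction, which is what makes the method linear, fast, and interpretable compared with adversarial training.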
 [4] arXiv:1801.06136 [pdf, other]

Title: Latitude: A Model for Mixed Linear-Tropical Matrix Factorization
Comments: 14 pages, 6 figures. To appear in 2018 SIAM International Conference on Data Mining (SDM '18). For the source code, see this https URL
Subjects: Learning (cs.LG)
Nonnegative matrix factorization (NMF) is one of the most frequently used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF and is able to smoothly alternate between the two. In our model, the data is modeled using latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.
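For readers unfamiliar with the max-times algebra, a tiny NumPy sketch of the subtropical product that SMF factorizes against (illustrative only):

```python
import numpy as np

def maxtimes(B, C):
    # Subtropical (max-times) matrix product:
    #   (B box C)[i, j] = max_k B[i, k] * C[k, j]
    # i.e. the sum in the ordinary (NMF) product is replaced by a max.
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

B = np.array([[1.0, 2.0],
              [0.5, 0.0]])
C = np.array([[3.0, 0.0],
              [1.0, 4.0]])
print(maxtimes(B, C))  # each entry comes from a single "winning" factor pair
```

Because only the largest term survives, each reconstructed entry is explained by exactly one latent factor, which is the `winner takes it all' interpretation the abstract mentions.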
Cross-lists for Fri, 19 Jan 18
 [5] arXiv:1801.05852 (cross-list from cs.SI) [pdf, other]

Title: Network Representation Learning: A Survey
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
With the widespread use of information technologies, information networks have increasingly become popular to capture complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life such as the structure of society, information diffusion, and different patterns of communication. However, the large scale of information networks often makes network analytic tasks computationally expensive and intractable. Recently, network representation learning has been proposed as a new learning paradigm that embeds network vertices into a low-dimensional vector space by preserving network topology structure, vertex content, and other side information. This makes the original network easy to handle in the new vector space for further analysis. In this survey, we perform a thorough review of the current literature on network representation learning in the field of data mining and machine learning. We propose a new categorization to analyze and summarize state-of-the-art network representation learning techniques according to the methodology they employ and the network information they preserve. Finally, to facilitate research on this topic, we summarize benchmark datasets and evaluation methodologies, and discuss open issues and future research directions in this field.
 [6] arXiv:1801.05856 (cross-list from cs.SI) [pdf, other]

Title: Active Community Detection: A Maximum Likelihood Approach
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
We propose novel semi-supervised and active learning algorithms for the problem of community detection on networks. The algorithms are based on optimizing the likelihood function of the community assignments given a graph and an estimate of the statistical model that generated it. The optimization framework is inspired by prior work on the unsupervised community detection problem in Stochastic Block Models (SBM) using Semi-Definite Programming (SDP). In this paper we provide the next steps in the evolution of learning communities in this context, which involve a constrained semi-definite programming algorithm and a newly presented active learning algorithm. The active learner intelligently queries nodes that are expected to maximize the change in the model likelihood. Experimental results show that this active learning algorithm outperforms the random-selection semi-supervised version of the same algorithm as well as other state-of-the-art active learning algorithms. Our algorithm's significantly improved performance is demonstrated on both real-world and SBM-generated networks, even when the SBM has a signal-to-noise ratio (SNR) below the known unsupervised detectability threshold.
 [7] arXiv:1801.05894 (cross-list from math.HO) [pdf, other]

Title: Deep Learning: An Introduction for Applied Mathematicians
Subjects: History and Overview (math.HO); Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Multi-layered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics, notably in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large-scale image classification problem. We finish with references to the current literature.
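In the spirit of the article's short MATLAB code, here is a hedged Python sketch of setting up and training a tiny network with stochastic gradient steps (the architecture, data, and hyper-parameters are invented for illustration, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: classify points by whether they fall inside a circle.
X = rng.uniform(-1, 1, (200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 sigmoid units, one sigmoid output unit.
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)

lr = 0.5
for step in range(2000):
    i = rng.integers(len(X))                 # stochastic gradient: one sample
    h = sigmoid(X[i] @ W1 + b1)              # forward pass
    p = sigmoid(h @ W2.ravel() + b2)[0]
    # Backpropagate the squared-error loss (p - y[i])**2 / 2.
    dp = (p - y[i]) * p * (1 - p)
    dh = dp * W2.ravel() * h * (1 - h)
    W2 -= lr * np.outer(h, dp); b2 -= lr * dp
    W1 -= lr * np.outer(X[i], dh); b1 -= lr * dh
```

Each iteration answers the article's three questions in miniature: the forward pass is the network, backpropagation computes the gradient, and the single-sample update is the stochastic gradient method.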
 [8] arXiv:1801.06024 (cross-list from cs.CL) [pdf, other]

Title: Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations
Comments: The 31st Annual Conference on Neural Information Processing Systems (NIPS) - Workshop on Learning Disentangled Features: from Perception to Control, Long Beach, CA, December 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: the difference vector between two sentences can be added to change a third sentence with similar features in a meaningful way.
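The difference-vector property can be illustrated with a toy sketch; the sentences and vectors below are made-up stand-ins for the model's hidden representations, purely for illustration:

```python
import numpy as np

# Hypothetical sentence embeddings (stand-ins for the autoencoder's
# hidden representations); names and values are illustrative only.
emb = {
    "she walks":  np.array([1.0, 0.0, 2.0]),
    "she walked": np.array([1.0, 1.0, 2.0]),
    "he runs":    np.array([0.0, 0.0, 3.0]),
    "he ran":     np.array([0.0, 1.0, 3.0]),
}

# Difference vector capturing the "present -> past" feature.
offset = emb["she walked"] - emb["she walks"]

# Adding it to a third sentence lands near that sentence's past-tense variant.
target = emb["he runs"] + offset
nearest = min(emb, key=lambda s: np.linalg.norm(emb[s] - target))
print(nearest)  # -> "he ran"
```

The paper's observation is that the learned representation space behaves like this: shared features live along consistent directions that transfer between sentences.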
 [9] arXiv:1801.06027 (cross-list from cs.DB) [pdf, other]

Title: In-RDBMS Hardware Acceleration of Advanced Analytics
Subjects: Databases (cs.DB); Hardware Architecture (cs.AR); Learning (cs.LG)
The data revolution is fueled by advances in several areas, including databases, high-performance computer architecture, and machine learning. Although timely, there is a void of solutions that bring these disjoint directions together. This paper sets out to be the initial step towards such a union. The aim is to devise a solution for the in-Database Acceleration of Advanced Analytics (DAnA). DAnA empowers database users to leap beyond traditional data summarization techniques and seamlessly utilize hardware-accelerated machine learning. Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of in-database analytics queries to the FPGA accelerator. The accelerator implementation is generated from a User Defined Function (UDF), expressed as part of a SQL query in a Python-embedded Domain Specific Language (DSL). To realize efficient in-database integration, DAnA accelerators contain a novel hardware structure, Striders, that directly interfaces with the buffer pool of the database. DAnA obtains the schema and page layout information from the database catalog to configure the Striders. In turn, Striders extract, cleanse, and process the training data tuples, which are consumed by a multi-threaded FPGA engine that executes the analytics algorithm. We integrated DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, an 11.3x end-to-end speedup over MADLib and is 5.4x faster than multi-threaded MADLib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in 30-60 lines of Python.
 [10] arXiv:1801.06048 (cross-list from cs.CY) [pdf, other]

Title: Deep Learning for Fatigue Estimation on the Basis of Multimodal Human-Machine Interactions
Authors: Yuri Gordienko, Sergii Stirenko, Yuriy Kochura, Oleg Alienin, Michail Novotarskiy, Nikita Gordienko
Comments: 12 pages, 10 figures, 1 table; presented at XXIX IUPAP Conference in Computational Physics (CCP2017), July 9-13, 2017, Paris, University Pierre et Marie Curie - Sorbonne (this https URL)
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Learning (cs.LG)
A new method is proposed to monitor the level of current physical load and accumulated fatigue by several objective and subjective characteristics. It was applied to a dataset targeted at estimating physical load and fatigue. Data from peripheral sensors (accelerometer, GPS, gyroscope, magnetometer) and a brain-computing interface (electroencephalography) were collected, integrated, and analyzed by several statistical and machine learning methods (moment analysis, cluster analysis, principal component analysis, etc.). Hypothesis 1, that physical activity can be classified not only by objective parameters but also by subjective parameters, was presented and proved. Hypothesis 2, that experienced physical load and subsequent restoration, as a fatigue level, can be estimated quantitatively and distinctive patterns can be recognized, was presented, and some ways to prove it were demonstrated. Several "physical load" and "fatigue" metrics were proposed. The results presented allow us to extend the application of machine learning methods to the characterization of complex human activity patterns (for example, to estimate actual physical load and fatigue, and to give cautions and advice).
 [11] arXiv:1801.06077 (cross-list from q-fin.CP) [pdf, other]

Title: The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option Portfolios
Authors: Igor Halperin
Comments: 18 pages, 5 figures
Subjects: Computational Finance (q-fin.CP); Learning (cs.LG)
The QLBS model is a discrete-time option hedging and pricing model based on Dynamic Programming (DP) and Reinforcement Learning (RL). It combines the famous Q-Learning method for RL with the Black-Scholes (Merton) model's idea of reducing the problem of option pricing and hedging to the problem of optimal rebalancing of a dynamic replicating portfolio for the option, which is made of a stock and cash. Here we expand on several NuQLear (Numerical Q-Learning) topics with the QLBS model. First, we investigate the performance of Fitted Q Iteration for an RL (data-driven) solution to the model, and benchmark it against a DP (model-based) solution, as well as against the BSM model. Second, we develop an Inverse Reinforcement Learning (IRL) setting for the model, where we only observe prices and actions (re-hedges) taken by a trader, but not rewards. Third, we outline how the QLBS model can be used for pricing portfolios of options, rather than a single option in isolation, thus providing its own data-driven and model-independent solution to the (in)famous volatility smile problem of the Black-Scholes model.
 [12] arXiv:1801.06146 (cross-list from cs.CL) [pdf, ps, other]

Title: Fine-tuned Language Models for Text Classification
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly outperforms the state-of-the-art on five text classification tasks, reducing the error by 18-24% on the majority of datasets. We open-source our pretrained models and code to enable adoption by the community.
 [13] arXiv:1801.06159 (cross-list from stat.ML) [pdf, other]

Title: When Does Stochastic Gradient Algorithm Work Well?
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method has improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that these assumptions hold for logistic regression and standard deep neural networks on classical data sets. Thus our analysis helps to explain when efficient behavior can be expected from the SGD method in training classification models and deep neural networks.
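A minimal sketch of the setting studied here: plain SGD with a fixed (and deliberately large) step size on logistic regression. The data and hyper-parameters are synthetic, chosen only to illustrate the method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Nearly separable logistic-regression data.
n, d = 500, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sgd(step_size, iters=5000):
    # Plain SGD with a FIXED step size: no decay schedule.
    w = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w -= step_size * (p - y[i]) * X[i]   # stochastic gradient of the log-loss
    return w

w = sgd(step_size=1.0)   # large constant step: converges to a neighborhood
acc = np.mean((X @ w > 0) == (y > 0))
```

With a fixed step the iterates do not converge to a point but oscillate in a neighborhood of the optimum, which is exactly the regime whose radius and rate the paper's assumption controls.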
 [14] arXiv:1801.06176 (cross-list from cs.CL) [pdf, other]

Title: Integrating planning for task-completion dialogue policy learning
Comments: 11 pages, 6 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Training a task-completion dialogue agent with real users via reinforcement learning (RL) can be prohibitively expensive, because it requires many interactions with users. One alternative is to resort to a user simulator, but the discrepancy between simulated and real users makes the learned policy unreliable in practice. This paper addresses these challenges by integrating planning into dialogue policy learning based on the Dyna-Q framework, providing a more sample-efficient approach to learning dialogue policies. The proposed agent consists of a planner, trained online with limited real user experience, that can generate large amounts of simulated experience to supplement the limited real user experience, and a policy model trained on these hybrid experiences. The effectiveness of our approach is validated on a movie-booking task in both a simulation setting and a human-in-the-loop setting.
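The Dyna-Q idea the paper builds on, where each real transition also trains a world model that then generates simulated updates, can be sketched in tabular form. This is a generic textbook sketch, not the paper's dialogue agent; `env_step` is a hypothetical environment callback:

```python
import random
from collections import defaultdict

def dyna_q(env_step, n_actions, episodes=50,
           planning_steps=10, alpha=0.1, gamma=0.95, eps=0.1):
    """Dyna-Q sketch: every real transition also updates a learned model,
    which then supplies simulated transitions for extra Q updates."""
    Q = defaultdict(float)
    model = {}                               # (s, a) -> (r, s', done)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action from the current Q
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda b: Q[s, b])
            r, s2, done = env_step(s, a)     # one *real* interaction
            target = r if done else r + gamma * max(Q[s2, b] for b in range(n_actions))
            Q[s, a] += alpha * (target - Q[s, a])
            model[s, a] = (r, s2, done)      # learn the world model
            # planning: replay simulated experience drawn from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                t = pr if pdone else pr + gamma * max(Q[ps2, b] for b in range(n_actions))
                Q[ps, pa] += alpha * (t - Q[ps, pa])
            s = s2
    return Q
```

In the paper's setting, the role of `model` is played by a learned user simulator (the planner), and the real interactions are the expensive conversations with human users.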
Replacements for Fri, 19 Jan 18
 [15] arXiv:1702.07958 (replaced) [pdf, other]

Title: Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Comments: 22 pages, 2 figures; ICML 2017; this version includes additional discussions of Newtron, and a variant of SOBA that directly uses an online exp-concave optimization oracle
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [16] arXiv:1704.08443 (replaced) [pdf, other]

Title: DNA Steganalysis Using Deep Recurrent Neural Networks
Subjects: Learning (cs.LG); Multimedia (cs.MM)
 [17] arXiv:1711.04126 (replaced) [pdf, other]

Title: Disease Prediction from Electronic Health Records Using Generative Adversarial Networks
Comments: 6 pages, 3 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [18] arXiv:1505.04252 (replaced) [pdf, ps, other]

Title: Global Convergence of Unmodified 3-Block ADMM for a Class of Convex Minimization Problems
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
 [19] arXiv:1605.02408 (replaced) [pdf, ps, other]

Title: Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis
Comments: Section 4.1 is updated
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
 [20] arXiv:1710.02030 (replaced) [pdf, other]

Title: McDiarmid Drift Detection Methods for Evolving Data Streams
Comments: 9 pages, 3 figures, 3 tables
Subjects: Machine Learning (stat.ML); Databases (cs.DB); Learning (cs.LG)
 [21] arXiv:1801.00746 (replaced) [pdf, other]

Title: Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler
Comments: Accepted by ASPLOS 2018
Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET); Learning (cs.LG)