Machine Learning

New submissions

New submissions for Mon, 22 Oct 18

[1]  arXiv:1810.08217 [pdf, other]
Title: Well, how accurate is it? A Study of Deep Learning Methods for Reynolds-Averaged Navier-Stokes Simulations
Comments: Code and data available at: this https URL
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn); Machine Learning (stat.ML)

With this study we investigate the accuracy of deep learning models for the inference of Reynolds-Averaged Navier-Stokes solutions. We focus on a modernized U-net architecture, and evaluate a large number of trained neural networks with respect to their accuracy for the calculation of pressure and velocity distributions. In particular, we illustrate how training data size and the number of weights influence the accuracy of the solutions. With our best models we arrive at a mean relative pressure and velocity error of less than 3% across a range of previously unseen airfoil shapes. In addition all source code is publicly available in order to ensure reproducibility and to provide a starting point for researchers interested in deep learning methods for physics problems. While this work focuses on RANS solutions, the neural network architecture and learning setup are very generic, and applicable to a wide range of PDE boundary value problems on Cartesian grids.

[2]  arXiv:1810.08223 [pdf, other]
Title: Micro-Browsing Models for Search Snippets
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user. There has been considerable work on user browsing models, to model and analyze both the examination and the relevance components of CTR. In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. The snippet text of a result often plays a critical role in the perceived relevance of the result. We study how particular words within a line of snippet can influence user behavior. We validate this new micro-browsing user model by considering the problem of predicting which snippet will yield higher CTR, and show that classification accuracy is dramatically higher with our micro-browsing user model. The key insight in this paper is that varying relatively few words within a snippet, and even their location within a snippet, can have a significant influence on the clickthrough of a snippet.
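The two-component view of CTR described above can be written as a product of probabilities. The sketch below assumes the standard examination hypothesis (a click happens only if the result is examined and then judged relevant); the function name and arguments are illustrative and not taken from the paper.

```python
def click_through_rate(p_examine: float, p_relevant_given_examined: float) -> float:
    """CTR as P(examined) * P(perceived relevant | examined).

    Assumption: a click requires both examination and perceived relevance.
    """
    for p in (p_examine, p_relevant_given_examined):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_examine * p_relevant_given_examined
```

Under this reading, a snippet rewrite that changes only the perceived relevance term changes CTR proportionally while the examination term stays fixed.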

[3]  arXiv:1810.08280 [pdf, other]
Title: Exploring Adversarial Examples in Malware Detection
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

The Convolutional Neural Network (CNN) architecture is increasingly being applied to new domains, such as malware detection, where it is able to learn malicious behavior from raw bytes extracted from executables. These architectures reach impressive performance with no feature engineering effort involved, but their robustness against active attackers is yet to be understood. Such malware detectors could face a new attack vector in the form of adversarial interference with the classification model. Existing evasion attacks intended to cause misclassification on test-time instances, which have been extensively studied for image classifiers, are not applicable because of the input semantics that prevents arbitrary changes to the binaries. This paper explores the area of adversarial examples for malware detection. By training an existing model on a production-scale dataset, we show that some previous attacks are less effective than initially reported, while simultaneously highlighting architectural weaknesses that facilitate new attack strategies for malware classification. Finally, we explore more generalizable attack strategies that increase the potential effectiveness of evasion attacks.

[4]  arXiv:1810.08305 [pdf, other]
Title: Open Vocabulary Learning on Source Code with a Graph-Structured Cache
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task (with over 100% relative improvement on the latter) at the cost of a moderate increase in computation time.

[5]  arXiv:1810.08309 [pdf]
Title: Unsupervised Anomalous Data Space Specification
Authors: Ian J Davis
Comments: 18 Pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Computer algorithms are written with the intent that when run they perform a useful function. Typically any information obtained is unknown until the algorithm is run. However, if the behavior of an algorithm can be fully described by precomputing just once how this algorithm will respond when executed on any input, this precomputed result provides a complete specification for all solutions in the problem domain. We apply this idea to a previous anomaly detection algorithm, and in doing so transform it from one that merely detects individual anomalies when asked to discover potentially anomalous values, into an algorithm also capable of generating a complete specification for those values it would deem to be anomalous. This specification is derived by examining no more than a small training dataset, can be obtained in very small constant time, and is inherently far more useful than results obtained by repeated execution of this tool. For example, armed with such a specification one can ask how close an anomaly is to being deemed normal, and can validate this answer not by exhaustively testing the algorithm but by examining if the specification so generated is indeed correct. This powerful idea can be applied to any algorithm whose runtime behavior can be recovered from its construction and so has wide applicability.

[6]  arXiv:1810.08313 [pdf, other]
Title: Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

Large-scale machine learning training, in particular distributed stochastic gradient descent, needs to be robust to inherent system variability such as node straggling and random communication delays. This work considers a distributed training framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. We analyze the true speed of error convergence with respect to wall-clock time (instead of the number of iterations), and analyze how it is affected by the frequency of averaging. The main contribution is the design of AdaComm, an adaptive communication strategy that starts with infrequent averaging to save communication delay and improve convergence speed, and then increases the communication frequency in order to achieve a low error floor. Rigorous experiments on training deep neural networks show that AdaComm can take 3 times less time than fully synchronous SGD, and still reach the same final training loss.
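As a rough illustration of local-update SGD with periodic averaging, the toy sketch below lets each worker take several local steps on a 1-D quadratic loss before the models are averaged. The losses, the noise level and the decreasing schedule of local steps are assumptions chosen for illustration; this is not the AdaComm rule from the paper, only the general pattern of averaging rarely at first and more often later.

```python
import random

def local_update_sgd(targets, taus, lr=0.1, seed=0):
    """Periodic-averaging SGD on toy 1-D quadratic losses.

    Worker i minimizes 0.5 * (w - targets[i])**2 from noisy gradients.
    `taus` lists the number of local steps taken before each averaging round.
    """
    rng = random.Random(seed)
    w_global = 0.0
    for tau in taus:
        local_models = []
        for t in targets:
            w = w_global  # every worker starts from the shared model
            for _ in range(tau):
                grad = (w - t) + rng.gauss(0.0, 0.1)  # noisy gradient
                w -= lr * grad
            local_models.append(w)
        # Communication round: average the local models.
        w_global = sum(local_models) / len(local_models)
    return w_global

# Hypothetical schedule: infrequent averaging first, then more frequent.
schedule = [8, 8, 4, 4, 2, 2, 1, 1, 1, 1]
w = local_update_sgd(targets=[1.0, 2.0, 3.0], taus=schedule)
```

With these toy losses the averaged model drifts toward the mean of the workers' targets; the number of communication rounds, not the number of gradient steps, is what the schedule controls.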

[7]  arXiv:1810.08322 [pdf, ps, other]
Title: Sequenced-Replacement Sampling for Deep Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose sequenced-replacement sampling (SRS) for training deep neural networks. The basic idea is to assign a fixed sequence index to each sample in the dataset. Once a mini-batch is randomly drawn in each training iteration, we refill the original dataset by successively adding samples according to their sequence index. Thus we carry out replacement sampling but in a batched and sequenced way. In a sense, SRS could be viewed as a way of performing "mini-batch augmentation". It is particularly useful for a task where we have relatively few images per class, such as CIFAR-100. Together with a longer period of initial large learning rate, it significantly improves the classification accuracy in CIFAR-100 over the current state-of-the-art results. Our experiments indicate that training deeper networks with SRS is less prone to over-fitting. In the best case, we achieve an error rate as low as 10.10%.

[8]  arXiv:1810.08323 [pdf, other]
Title: Learning Multi-Layer Transform Models
Comments: In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, 2018
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Learned data models based on sparsity are widely used in signal processing and imaging applications. A variety of methods for learning synthesis dictionaries, sparsifying transforms, etc., have been proposed in recent years, often imposing useful structures or properties on the models. In this work, we focus on sparsifying transform learning, which enjoys a number of advantages. We consider multi-layer or nested extensions of the transform model, and propose efficient learning algorithms. Numerical experiments with image data illustrate the behavior of the multi-layer transform learning algorithm and its usefulness for image denoising. Multi-layer models provide better denoising quality than single layer schemes.

[9]  arXiv:1810.08351 [pdf, other]
Title: Exchangeability and Kernel Invariance in Trained MLPs
Comments: 26 pages, 16 Figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In the analysis of machine learning models, it is often convenient to assume that the parameters are IID. This assumption is not satisfied when the parameters are updated through training processes such as SGD. A relaxation of the IID condition is a probabilistic symmetry known as exchangeability. We show the sense in which the weights in MLPs are exchangeable. This yields the result that in certain instances, the layer-wise kernel of fully-connected layers remains approximately constant during training. We identify a sharp change in the macroscopic behavior of networks as the covariance between weights changes from zero.

[10]  arXiv:1810.08359 [pdf]
Title: Malicious Web Domain Identification using Online Credibility and Performance Data by Considering the Class Imbalance Issue
Comments: 20 pages
Journal-ref: Industrial Management & Data Systems, 2018
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Purpose: Malicious web domain identification is of significant importance to the security protection of Internet users. With online credibility and performance data, this paper aims to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e., there are more benign web domains than malicious ones). Design/methodology/approach: We propose an integrated resampling approach to handle class imbalance by combining the Synthetic Minority Over-sampling TEchnique (SMOTE) and Particle Swarm Optimisation (PSO), a population-based meta-heuristic algorithm. We use the SMOTE for over-sampling and PSO for under-sampling. Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain datasets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications: This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains, but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value: Online credibility and performance data is applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world datasets with different imbalance ratios.
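For readers unfamiliar with SMOTE, the over-sampling half of the approach above can be sketched in a few lines: each synthetic point is a random interpolation between a minority sample and one of its nearest minority neighbours. This is a minimal pure-Python sketch of the generic SMOTE idea only; the function name and parameters are illustrative, and the PSO-based under-sampling step from the paper is not shown.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    `minority` is a list of equal-length tuples with at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the convex hull of the minority class.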

[11]  arXiv:1810.08363 [pdf, other]
Title: Generative Low-Shot Network Expansion
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional deep learning classifiers are static in the sense that they are trained on a predefined set of classes and learning to classify a novel class typically requires re-training. In this work, we address the problem of Low-Shot network expansion learning. We introduce a learning framework which enables expanding a pre-trained (base) deep network to classify novel classes when the number of examples for the novel classes is particularly small. We present a simple yet powerful hard distillation method where the base network is augmented with additional weights to classify the novel classes, while keeping the weights of the base network unchanged. We show that since only a small number of weights needs to be trained, the hard distillation excels in low-shot training scenarios. Furthermore, hard distillation avoids detriment to classification performance on the base classes. Finally, we show that low-shot network expansion can be done with a very small memory footprint by using a compact generative model of the base classes training data with only a negligible degradation relative to learning with the full training set.

[12]  arXiv:1810.08379 [pdf, other]
Title: Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators
Comments: Accepted by ICCAD 2018
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural approximate computing gains enormous energy-efficiency at the cost of tolerable quality-loss. A neural approximator can map the input data to output while a classifier determines whether the input data are safe to approximate with quality guarantee. However, existing works cannot maximize the invocation of the approximator, resulting in limited speedup and energy saving. By exploring the mapping space of those target functions, in this paper, we observe a nonuniform distribution of the approximation error incurred by the same approximator. We thus propose a novel approximate computing architecture with a Multiclass-Classifier and Multiple Approximators (MCMA). These approximators have identical network topologies and thus can share the same hardware resource in a neural processing unit (NPU) chip. At runtime, MCMA can swap in the invoked approximator by merely shipping the synapse weights from the on-chip memory to the buffers near MAC within a cycle. We also propose efficient co-training methods for such MCMA architecture. Experimental results show a more substantial invocation of MCMA as well as the gain of energy-efficiency.

[13]  arXiv:1810.08515 [pdf, ps, other]
Title: Transfer Learning versus Multi-agent Learning regarding Distributed Decision-Making in Highway Traffic
Comments: Proc. of the 10th International Workshop on Agents in Traffic and Transportation (ATT 2018), co-located with ECAI/IJCAI, AAMAS and ICML 2018 conferences (FAIM 2018)
Journal-ref: CEUR Workshop Proceedings 2018
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Transportation and traffic are currently undergoing a rapid increase in terms of both scale and complexity. At the same time, an increasing share of traffic participants are being transformed into agents driven or supported by artificial intelligence resulting in mixed-intelligence traffic. This work explores the implications of distributed decision-making in mixed-intelligence traffic. The investigations are carried out on the basis of an online-simulated highway scenario, namely the MIT DeepTraffic simulation. In the first step, traffic agents are trained by means of a deep reinforcement learning approach, being deployed inside an elitist evolutionary algorithm for hyperparameter search. The resulting architectures and training parameters are then utilized in order to either train a single autonomous traffic agent and transfer the learned weights onto a multi-agent scenario or else to conduct multi-agent learning directly. Both learning strategies are evaluated on different ratios of mixed-intelligence traffic. The strategies are assessed according to the average speed of all agents driven by artificial intelligence. Traffic patterns that provoke a reduction in traffic flow are analyzed with respect to the different strategies.

[14]  arXiv:1810.08552 [pdf, ps, other]
Title: Nonlinear integro-differential operator regression with neural networks
Comments: 5 pages, 3 figures, preprint submitted to the Journal of Computational Physics
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

This note introduces a regression technique for finding a class of nonlinear integro-differential operators from data. The method parametrizes the spatial operator with neural networks and Fourier transforms such that it can fit a class of nonlinear operators without needing a library of a priori selected operators. We verify that this method can recover the spatial operators in the fractional heat equation and the Kuramoto-Sivashinsky equation from numerical solutions of the equations.

[15]  arXiv:1810.08575 [pdf, other]
Title: Supervising strong learners by amplifying weak experts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.

[16]  arXiv:1810.08591 [pdf, other]
Title: A Modern Take on the Bias-Variance Tradeoff in Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We revisit the bias-variance tradeoff for neural networks in light of modern empirical findings. The traditional bias-variance tradeoff in machine learning suggests that as model complexity grows, variance increases. Classical bounds in statistical learning theory point to the number of parameters in a model as a measure of model complexity, which means the tradeoff would indicate that variance increases with the size of neural networks. However, we empirically find that variance due to training set sampling is roughly constant (with both width and depth) in practice. Variance caused by the non-convexity of the loss landscape is different. We find that it decreases with width and increases with depth, in our setting. We provide theoretical analysis, in a simplified setting inspired by linear models, that is consistent with our empirical findings for width. We view bias-variance as a useful lens to study generalization through and encourage further theoretical explanation from this perspective.

Cross-lists for Mon, 22 Oct 18

[17]  arXiv:1712.08230 (cross-list from cs.IT) [pdf, other]
Title: Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers
Comments: To appear in IEEE Transactions on Communications
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)

We propose two coded schemes for the distributed computing problem of multiplying a matrix by a set of vectors. The first scheme is based on partitioning the matrix into submatrices and applying maximum distance separable (MDS) codes to each submatrix. For this scheme, we prove that up to a given number of partitions the communication load and the computational delay (not including the encoding and decoding delay) are identical to those of the scheme recently proposed by Li et al., based on a single, long MDS code. However, due to the use of shorter MDS codes, our scheme yields a significantly lower overall computational delay when the delay incurred by encoding and decoding is also considered. We further propose a second coded scheme based on Luby Transform (LT) codes under inactivation decoding. Interestingly, LT codes may reduce the delay over the partitioned scheme at the expense of an increased communication load. We also consider distributed computing under a deadline and show numerically that the proposed schemes outperform other schemes in the literature, with the LT code-based scheme yielding the best performance for the scenarios considered.
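The basic idea behind MDS-coded matrix-vector multiplication can be illustrated with the smallest possible case: a (3, 2) code over two row blocks, where any two of three servers suffice to recover the product. This is a toy sketch of the general principle under that assumption, not the block-diagonal or LT-code schemes proposed in the paper; all function names are illustrative.

```python
def matvec(rows, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def encode(A):
    """(3, 2) MDS code over row blocks: two systematic blocks plus their sum.

    Assumes A has an even number of rows.
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]

def decode(results):
    """Recover A @ x from any two of the three server results.

    `results` maps server index (0, 1 or 2) to that server's output vector.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # parity minus block 1
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # parity minus block 2
    return y1 + y2
```

A straggling server can then simply be ignored: the master decodes as soon as any two results arrive, at the price of one redundant block of computation.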

[18]  arXiv:1810.08229 (cross-list from cs.CV) [pdf, other]
Title: MRI Reconstruction via Cascaded Channel-wise Attention Network
Comments: 4 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We consider an MRI reconstruction problem with input of k-space data at a very low undersampled rate. This can benefit patients in practice by reducing MRI scan time, but it is also challenging since the quality of reconstruction may be compromised. Currently, deep learning based methods dominate MRI reconstruction over traditional approaches such as Compressed Sensing, but they rarely show satisfactory performance in the case of low undersampled k-space data. One explanation is that these methods treat channel-wise features equally, which results in degraded representation ability of the neural network. To solve this problem, we propose a new model called MRI Cascaded Channel-wise Attention Network (MICCAN), highlighted by three components: (i) a variant of U-net with Channel-wise Attention (UCA) module, (ii) a long skip connection and (iii) a combined loss. Our model is able to attend to salient information by filtering irrelevant features and also to concentrate on high-frequency information by having low-frequency information bypassed to the final output. We conduct both quantitative evaluation and qualitative analysis of our method on a cardiac dataset. The experiments show that our method achieves very promising results in terms of three common metrics on the MRI reconstruction with low undersampled k-space data.

[19]  arXiv:1810.08303 (cross-list from cs.AI) [pdf, other]
Title: Compositional Verification for Autonomous Systems with Deep Learning Components
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As autonomy becomes prevalent in many applications, ranging from recommendation systems to fully autonomous vehicles, there is an increased need to provide safety guarantees for such systems. The problem is difficult, as these are large, complex systems which operate in uncertain environments, requiring data-driven machine-learning components. However, learning techniques such as Deep Neural Networks, widely used today, are inherently unpredictable and lack the theoretical foundations to provide strong assurance guarantees. We present a compositional approach for the scalable, formal verification of autonomous systems that contain Deep Neural Network components. The approach uses assume-guarantee reasoning whereby contracts, encoding the input-output behavior of individual components, allow the designer to model and incorporate the behavior of the learning-enabled components working side-by-side with the other components. We illustrate the approach on an example taken from the autonomous vehicles domain.

[20]  arXiv:1810.08326 (cross-list from cs.CV) [pdf, other]
Title: Domain-Invariant Projection Learning for Zero-Shot Recognition
Comments: Accepted to NIPS 2018
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Zero-shot learning (ZSL) aims to recognize unseen object classes without any training samples, which can be regarded as a form of transfer learning from seen classes to unseen ones. This is made possible by learning a projection between a feature space and a semantic space (e.g. attribute space). Key to ZSL is thus to learn a projection function that is robust against the often large domain gap between the seen and unseen classes. In this paper, we propose a novel ZSL model termed domain-invariant projection learning (DIPL). Our model has two novel components: (1) A domain-invariant feature self-reconstruction task is introduced to the seen/unseen class data, resulting in a simple linear formulation that casts ZSL into a min-min optimization problem. Solving the problem is non-trivial, and a novel iterative algorithm is formulated as the solver, with rigorous theoretic algorithm analysis provided. (2) To further align the two domains via the learned projection, shared semantic structure among seen and unseen classes is explored via forming superclasses in the semantic space. Extensive experiments show that our model outperforms the state-of-the-art alternatives by significant margins.

[21]  arXiv:1810.08329 (cross-list from cs.CV) [pdf, other]
Title: Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning
Comments: Submitted to IJCV
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen ones so that the latter can be recognised without any training samples. This is made possible by learning a projection function between a feature space and a semantic space (e.g. attribute space). Considering the seen and unseen classes as two domains, a big domain gap often exists which challenges ZSL. Inspired by the fact that an unseen class is not exactly 'unseen' if it belongs to the same superclass as a seen class, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap. Specifically, we first build a class hierarchy of multiple superclass layers and a single class layer, where the superclasses are automatically generated by data-driven clustering over the semantic representations of all seen and unseen class names. We then exploit the superclasses from the class hierarchy to tackle the domain gap challenge in two aspects: deep feature learning and projection function learning. First, to narrow the domain gap in the feature space, we integrate a recurrent neural network (RNN) defined with the superclasses into a convolutional neural network (CNN), in order to enforce the superclass hierarchy. Second, to further learn a transferrable projection function for ZSL, a novel projection function learning method is proposed by exploiting the superclasses to align the two domains. Importantly, our transferrable feature and projection learning methods can be easily extended to a closely related task -- few-shot learning (FSL). Extensive experiments show that the proposed model significantly outperforms the state-of-the-art alternatives in both ZSL and FSL tasks.

[22]  arXiv:1810.08332 (cross-list from cs.CV) [pdf, other]
Title: Zero and Few Shot Learning with Semantic Feature Synthesis and Competitive Learning
Comments: Submitted to IEEE TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Zero-shot learning (ZSL) is made possible by learning a projection function between a feature space and a semantic space (e.g., an attribute space). Key to ZSL is thus to learn a projection that is robust against the often large domain gap between the seen and unseen class domains. In this work, this is achieved by unseen class data synthesis and robust projection function learning. Specifically, a novel semantic data synthesis strategy is proposed, by which semantic class prototypes (e.g., attribute vectors) are used to simply perturb seen class data for generating unseen class ones. As in any data synthesis/hallucination approach, there are ambiguities and uncertainties on how well the synthesised data can capture the targeted unseen class data distribution. To cope with this, the second contribution of this work is a novel projection learning model termed competitive bidirectional projection learning (BPL) designed to best utilise the ambiguous synthesised data. Specifically, we assume that each synthesised data point can belong to any unseen class; and the most likely two class candidates are exploited to learn a robust projection function in a competitive fashion. As a third contribution, we show that the proposed ZSL model can be easily extended to few-shot learning (FSL) by again exploiting semantic (class prototype guided) feature synthesis and competitive BPL. Extensive experiments show that our model achieves the state-of-the-art results on both problems.

[23]  arXiv:1810.08403 (cross-list from cs.DC) [pdf, other]
Title: Towards Efficient Large-Scale Graph Neural Network Computing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Recent deep learning models have moved beyond low-dimensional regular grids such as image, video, and speech, to high-dimensional graph-structured data, such as social networks, brain connections, and knowledge graphs. This evolution has led to large graph-based irregular and sparse models that go beyond what existing deep learning frameworks are designed for. Further, these models are not easily amenable to efficient acceleration at scale on parallel hardware (e.g. GPUs). We introduce NGra, the first parallel processing framework for graph-based deep neural networks (GNNs). NGra presents a new SAGA-NN model for expressing deep neural networks as vertex programs with each layer in well-defined (Scatter, ApplyEdge, Gather, ApplyVertex) graph operation stages. This model not only allows GNNs to be expressed intuitively, but also facilitates the mapping to an efficient dataflow representation. NGra addresses the scalability challenge transparently through automatic graph partitioning and chunk-based stream processing out of GPU core or over multiple GPUs, which carefully considers data locality, data movement, and overlapping of parallel processing and data movement. NGra further achieves efficiency through highly optimized Scatter/Gather operators on GPUs despite its sparsity. Our evaluation shows that NGra scales to large real graphs that none of the existing frameworks can handle directly, while achieving up to about 4 times speedup even at small scales over the multiple-baseline design on TensorFlow.
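The four stage names in the abstract suggest a per-layer pipeline that can be sketched on a tiny graph. The sketch below is an assumed reading of those stages (Scatter sends source features along edges, ApplyEdge transforms them per edge, Gather sums them at the destination, ApplyVertex updates each vertex); the function `saga_layer` and its signature are hypothetical and are not NGra's API.

```python
def saga_layer(vertex_feats, edges, apply_edge, apply_vertex):
    """One graph layer in four stages, on scalar vertex features.

    vertex_feats: dict mapping vertex id to a float feature.
    edges: list of (src, dst) pairs.
    """
    # Scatter: send each source vertex's feature along its outgoing edges.
    scattered = [(dst, vertex_feats[src]) for src, dst in edges]
    # ApplyEdge: per-edge computation on the scattered values.
    messages = [(dst, apply_edge(f)) for dst, f in scattered]
    # Gather: sum incoming messages at each destination vertex.
    acc = {v: 0.0 for v in vertex_feats}
    for dst, m in messages:
        acc[dst] += m
    # ApplyVertex: combine the old feature with the gathered value.
    return {v: apply_vertex(vertex_feats[v], acc[v]) for v in vertex_feats}
```

In a real GNN the two callables would be small neural networks acting on feature vectors; scalars and plain functions are used here only to keep the data flow visible.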

[24]  arXiv:1810.08452 (cross-list from cs.CV) [pdf, other]
Title: High Resolution Semantic Change Detection
Comments: Preprint submitted to Computer Vision and Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Change detection is one of the main problems in remote sensing, and is essential to the accurate processing and understanding of the large scale Earth observation data available through programs such as Sentinel and Landsat. Most of the recently proposed change detection methods bring deep learning to this context, but openly available change detection datasets are still very scarce, which limits the methods that can be proposed and tested. In this paper we present the first large scale high resolution semantic change detection (HRSCD) dataset, which enables the usage of deep learning methods for semantic change detection. The dataset contains coregistered RGB image pairs, pixel-wise change information and land cover information. We then propose several methods using fully convolutional neural networks to perform semantic change detection. Most notably, we present a network architecture that performs change detection and land cover mapping simultaneously, while using the predicted land cover information to help to predict changes. We also describe a sequential training scheme that allows this network to be trained without setting a hyperparameter that balances different loss functions and achieves the best overall results.

[25]  arXiv:1810.08462 (cross-list from cs.CV) [pdf, other]
Title: Fully Convolutional Siamese Networks for Change Detection
Comments: To appear in Proc. ICIP 2018, October 07-10, 2018, Athens, Greece
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper presents three fully convolutional neural network architectures which perform change detection using a pair of coregistered images. Most notably, we propose two Siamese extensions of fully convolutional networks which use heuristics about the current problem to achieve the best results in our tests on two open change detection datasets, using both RGB and multispectral images. We show that our system is able to learn from scratch using annotated change detection images. Our architectures achieve better performance than previously proposed methods, while being at least 500 times faster than related systems. This work is a step towards efficient processing of data from large scale Earth observation systems such as Copernicus or Landsat.

[26]  arXiv:1810.08468 (cross-list from cs.CV) [pdf, other]
Title: Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks
Comments: To appear in Proc. IGARSS 2018, July 22-27, 2018, Valencia, Spain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The Copernicus Sentinel-2 program now provides multispectral images at a global scale with a high revisit rate. In this paper we explore the usage of convolutional neural networks for urban change detection using such multispectral images. We first present the new change detection dataset that was used for training the proposed networks, which will be openly available to serve as a benchmark. The Onera Satellite Change Detection (OSCD) dataset is composed of pairs of multispectral aerial images, and the changes were manually annotated at pixel level. We then propose two architectures to detect changes, Siamese and Early Fusion, and compare the impact of using different numbers of spectral channels as inputs. These architectures are trained from scratch using the provided dataset.
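The difference between the two named architectures can be sketched at the level of data flow. The following NumPy example is only an illustration under stated assumptions: the toy per-pixel encoder, the absolute-difference comparison in the Siamese branch, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def encode(img, w):
    # Toy per-pixel "encoder" (assumption): linear map over channels, then ReLU.
    return np.maximum(img @ w, 0.0)

def early_fusion(img_a, img_b, w_fused):
    # Early Fusion: concatenate both images along the channel axis, encode once.
    return encode(np.concatenate([img_a, img_b], axis=-1), w_fused)

def siamese(img_a, img_b, w_shared):
    # Siamese: encode each image with the SAME weights, then compare features.
    # The absolute difference is an assumed comparison, used for illustration.
    return np.abs(encode(img_a, w_shared) - encode(img_b, w_shared))

rng = np.random.default_rng(0)
img_a = rng.random((4, 4, 3))   # H x W x C image pair (illustrative sizes)
img_b = rng.random((4, 4, 3))
w_fused = rng.standard_normal((6, 5))    # 2*C input channels
w_shared = rng.standard_normal((3, 5))   # C input channels, shared weights

fused_features = early_fusion(img_a, img_b, w_fused)
siamese_features = siamese(img_a, img_b, w_shared)
no_change = siamese(img_a, img_a, w_shared)
```

One consequence of weight sharing in this sketch is that an unchanged image pair yields exactly zero difference features, whereas the early-fusion path has no such built-in symmetry.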

[27]  arXiv:1810.08498 (cross-list from cs.SI) [pdf, other]
Title: Data-driven Analysis of Complex Networks and their Model-generated Counterparts
Comments: 14 pages
Subjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)

Data-driven analysis of complex networks has been a focus of research for decades. An important question is to discover the relation between various network characteristics in real-world networks and how these relationships vary across network domains. A related research question is to study how well the network models can capture the observed relations between the graph metrics. In this paper, we apply statistical and machine learning techniques to answer the aforementioned questions. We study 400 real-world networks along with 6 x 400 networks generated by five frequently used network models with previously fitted parameters to make the generated graphs as similar to the real network as possible. We find that the correlation profiles of the structural measures significantly differ across network domains and the domain can be efficiently determined using a small selection of graph metrics. The goodness-of-fit of the network models and the best performing models themselves highly depend on the domains. Using machine learning techniques, it turned out to be relatively easy to decide if a network is real or model-generated. We also investigate what structural properties make it possible to achieve a good accuracy, i.e., what features the network models cannot capture.

[28]  arXiv:1810.08509 (cross-list from cs.CR) [pdf, ps, other]
Title: Probabilistic Matrix Factorization with Personalized Differential Privacy
Comments: 24 pages, 12 figures, 4 tables
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Probabilistic matrix factorization (PMF) plays a crucial role in recommendation systems. It requires a large amount of user data (such as user shopping records and movie ratings) to predict personal preferences, and thereby provides users with high-quality recommendation services, which exposes users to the risk of privacy leakage. Differential privacy, as a provable privacy protection framework, has been applied widely to recommendation systems. It is common that different individuals have different levels of privacy requirements on items. However, traditional differential privacy can only provide a uniform level of privacy protection for all users.
In this paper, we mainly propose a probabilistic matrix factorization recommendation scheme with personalized differential privacy (PDP-PMF). It aims to meet users' privacy requirements specified at the item-level instead of giving the same level of privacy guarantees for all. We then develop a modified sampling mechanism (with bounded differential privacy) for achieving PDP. We also perform a theoretical analysis of the PDP-PMF scheme and demonstrate the privacy of the PDP-PMF scheme. In addition, we implement the probabilistic matrix factorization schemes both with traditional and with personalized differential privacy (DP-PMF, PDP-PMF) and compare them through a series of experiments. The results show that the PDP-PMF scheme performs well on protecting the privacy of each user and its recommendation quality is much better than the DP-PMF scheme.

[29]  arXiv:1810.08537 (cross-list from stat.ML) [pdf, other]
Title: Bayesian Distance Clustering
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data.
Keywords: Distance-based clustering; Mixture model; Model-based clustering; Model misspecification; Pairwise distance matrix; Partial likelihood; Robustness.

[30]  arXiv:1810.08553 (cross-list from stat.ML) [pdf, other]
Title: Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data
Comments: Federated learning, distributed databases, PCA, SVD, meta-analysis, brain disease
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)

At this moment, databanks worldwide contain brain images in previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.

[31]  arXiv:1810.08559 (cross-list from eess.AS) [pdf, other]
Title: EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge
Comments: 4 pages
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Signal Processing (eess.SP); Machine Learning (stat.ML)

Despite showing state-of-the-art performance, deep learning for speech recognition remains challenging to deploy in on-device edge scenarios such as mobile and other consumer devices. Recently, there have been greater efforts in the design of small, low-footprint deep neural networks (DNNs) that are more appropriate for edge devices, with much of the focus on design principles for hand-crafting efficient network architectures. In this study, we explore a human-machine collaborative design strategy for building low-footprint DNN architectures for speech recognition through a marriage of human-driven principled network design prototyping and machine-driven design exploration. The efficacy of this design strategy is demonstrated through the design of a family of highly-efficient DNNs (nicknamed EdgeSpeechNets) for limited-vocabulary speech recognition. Experimental results using the Google Speech Commands dataset for limited-vocabulary speech recognition showed that EdgeSpeechNets have higher accuracies than state-of-the-art DNNs (with the best EdgeSpeechNet achieving ~97% accuracy), while achieving significantly smaller network sizes (as much as 7.8x smaller) and lower computational cost (as much as 36x fewer multiply-add operations, 10x lower prediction latency, and 16x smaller memory footprint on a Motorola Moto E phone), making them very well-suited for on-device edge voice interface applications.

[32]  arXiv:1810.08564 (cross-list from stat.ME) [pdf, other]
Title: Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks
Comments: NIPS 2018
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

We propose Lomax delegate racing (LDR) to explicitly model the mechanism of survival under competing risks and to interpret how the covariates accelerate or decelerate the time to event. LDR explains non-monotonic covariate effects by racing a potentially infinite number of sub-risks, and consequently relaxes the ubiquitous proportional-hazards assumption which may be too restrictive. Moreover, LDR is naturally able to model not only censoring, but also missing event times or event types. For inference, we develop a Gibbs sampler under data augmentation for moderately sized data, along with a stochastic gradient descent maximum a posteriori inference algorithm for big data applications. Illustrative experiments are provided on both synthetic and real datasets, and comparison with various benchmark algorithms for survival analysis with competing risks demonstrates distinguished performance of LDR.

[33]  arXiv:1810.08597 (cross-list from cs.CV) [pdf, other]
Title: Detecting cities in aerial night-time images by learning structural invariants using single reference augmentation
Authors: Philipp Sadler
Comments: Project in Image Classification, Winter 2018, Prof. Dr. Tatjana Scheffler
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper examines whether it is possible to learn structural invariants of city images by using only a single reference picture when producing transformations along the variants in the dataset. Previous work explored the problem of learning from only a few examples and showed that data augmentation techniques benefit performance and generalization for machine learning approaches. First, a principal component analysis in conjunction with a Fourier transform is trained on a single reference augmentation training dataset using the city images. Second, a convolutional neural network is trained on a similar dataset with more samples. The findings are that the convolutional neural network is capable of finding images of the same category, whereas the applied principal component analysis in conjunction with a Fourier transform failed to solve this task.

Replacements for Mon, 22 Oct 18

[34]  arXiv:1802.03505 (replaced) [pdf, other]
Title: Coulomb Autoencoders
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[35]  arXiv:1804.05320 (replaced) [pdf, other]
Title: Generative Adversarial Network based Autoencoder: Application to fault detection problem for closed loop dynamical systems
Comments: 9 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[36]  arXiv:1808.03873 (replaced) [pdf, other]
Title: A Consistent Method for Learning OOMs from Asymptotically Stationary Time Series Data Containing Missing Values
Authors: Tianlin Liu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[37]  arXiv:1808.07576 (replaced) [pdf, other]
Title: Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[38]  arXiv:1810.00319 (replaced) [pdf, other]
Title: Modeling Uncertainty with Hedged Instance Embedding
Comments: 15 pages, 10 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[39]  arXiv:1810.02966 (replaced) [pdf, other]
Title: Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[40]  arXiv:1810.05157 (replaced) [pdf, other]
Title: Learning under Misspecified Objective Spaces
Comments: Conference on Robot Learning (CoRL) 2018
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO); Machine Learning (stat.ML)
[41]  arXiv:1810.05665 (replaced) [src]
Title: Is PGD-Adversarial Training Necessary? Alternative Training via a Soft-Quantization Network with Noisy-Natural Samples Only
Comments: Further improvement
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[42]  arXiv:1704.02890 (replaced) [pdf, other]
Title: Opinion Polarization by Learning from Social Feedback
Comments: Presented at the Social Simulation Conference (Dublin 2017)
Journal-ref: The Journal of Mathematical Sociology, 2018
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Adaptation and Self-Organizing Systems (nlin.AO)
[43]  arXiv:1705.07386 (replaced) [pdf, other]
Title: DeepMasterPrints: Generating MasterPrints for Dictionary Attacks via Latent Variable Evolution
Comments: 8 pages; added new verification systems and diagrams. Accepted to conference Biometrics: Theory, Applications, and Systems 2018
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[44]  arXiv:1710.04102 (replaced) [pdf, other]
Title: Combining learned and analytical models for predicting action effects
Comments: Submitted to IJRR, now includes experiments on learning error models on top of the analytical model and on using non-trivial camera viewpoints
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[45]  arXiv:1802.03569 (replaced) [pdf, other]
Title: Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams
Authors: Tam Le, Makoto Yamada
Comments: to appear at the 32nd Conference on Neural Information Processing Systems (NIPS), Canada, 2018
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Algebraic Topology (math.AT)
[46]  arXiv:1802.09514 (replaced) [pdf, ps, other]
Title: Best Arm Identification for Contaminated Bandits
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
[47]  arXiv:1803.01485 (replaced) [pdf, other]
Title: Totally Looks Like - How Humans Compare, Compared to Machines
Comments: ACCV 2018. Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[48]  arXiv:1805.07932 (replaced) [pdf, other]
Title: Bilinear Attention Networks
Comments: Accepted by NIPS 2018; Figure 1 was updated
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[49]  arXiv:1806.10961 (replaced) [pdf, other]
Title: Automatic Exploration of Machine Learning Experiments on OpenML
Comments: 6 pages, 0 figures
Subjects: Machine Learning (stat.ML); Databases (cs.DB); Machine Learning (cs.LG)
[50]  arXiv:1809.04365 (replaced) [pdf, other]
Title: NNCP: A citation count prediction methodology based on deep neural network learning techniques
Subjects: Digital Libraries (cs.DL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[51]  arXiv:1809.06035 (replaced) [pdf, other]
Title: Extracting Universal Representations of Cognition across Brain-Imaging Studies
Authors: Arthur Mensch (PARIETAL), Julien Mairal, Bertrand Thirion (ODYSSEE), Gaël Varoquaux (PARIETAL)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[52]  arXiv:1810.00138 (replaced) [pdf]
Title: Modelling Errors in X-ray Fluoroscopic Imaging Systems Using Photogrammetric Bundle Adjustment With a Data-Driven Self-Calibration Approach
Comments: ISPRS TC I Mid-term Symposium "Innovative Sensing - From Sensors to Methods and Applications", 10-12 October 2018. Karlsruhe, Germany
Journal-ref: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-1, 2018
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[53]  arXiv:1810.02876 (replaced) [pdf, ps, other]
Title: Adaptive Clinical Trials: Exploiting Sequential Patient Recruitment and Allocation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[54]  arXiv:1810.05291 (replaced) [pdf, other]
Title: signSGD with Majority Vote is Communication Efficient And Byzantine Fault Tolerant
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[55]  arXiv:1810.08010 (replaced) [pdf, other]
Title: Variational Noise-Contrastive Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)