Computation and Language

New submissions

[ total of 22 entries: 1-22 ]

New submissions for Fri, 21 Jul 17

[1]  arXiv:1707.06226 [pdf, other]
Title: The Role of Conversation Context for Sarcasm Detection in Online Interactions
Comments: SIGDial 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, a speaker's sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling the conversation context help sarcasm detection, and (2) can we identify which part of the conversation context triggered the sarcastic reply? To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence-level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of the attention weights produced by the LSTM models with attention and discuss the results in comparison with human performance on the task.
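To make the sentence-level attention concrete, here is a minimal numpy sketch of attending over context sentence vectors given a response vector; the names, shapes, and bilinear scoring function are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of sentence-level attention over conversation context;
# all dimensions and the bilinear scorer are illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(context_sents, response, W):
    """context_sents: (n, d) sentence vectors from the context encoder;
    response: (d,) vector of the reply; W: (d, d) bilinear scoring map."""
    scores = context_sents @ W @ response   # relevance of each context sentence
    alphas = softmax(scores)                # attention weights
    summary = alphas @ context_sents        # weighted context summary
    return alphas, summary

rng = np.random.default_rng(0)
ctx = rng.normal(size=(4, 8))               # 4 context sentences, dim 8
resp = rng.normal(size=8)
alphas, summary = attend(ctx, resp, rng.normal(size=(8, 8)))
print(alphas)  # which context sentence the model attends to
```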

[2]  arXiv:1707.06265 [pdf, other]
Title: Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation
Comments: Submitted to the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2017)
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue. In this paper, we address the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are presented, but word transcripts are only available for the source domain speech. We present novel augmentation-based methods that transform speech in a way that does not change the transcripts. Specifically, we first train a variational autoencoder on both source and target domain data (without supervision) to learn a latent representation of speech. We then transform nuisance attributes of speech that are irrelevant to recognition by modifying the latent representations, in order to augment labeled training data with additional data whose distribution is more similar to the target domain. The proposed method is evaluated on the CHiME-4 dataset and reduces the absolute word error rate (WER) by as much as 35% compared to the non-adapted baseline.
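As a rough illustration of the augmentation idea, the sketch below encodes labeled source-domain speech, shifts its latent code by the difference between target- and source-domain latent means, and decodes the result. The linear encode/decode stand-ins and the mean-shift transformation are assumptions for illustration, not the paper's trained VAE or exact transformation.

```python
# A hedged sketch of latent-space data augmentation: shift the latent code
# of labeled source speech toward the target domain, then decode. The
# linear encoder/decoder are stand-ins for a trained VAE.
import numpy as np

rng = np.random.default_rng(0)
D = 16
A = rng.normal(size=(D, D))
encode = lambda x: x @ A                    # stand-in VAE encoder
decode = lambda z: z @ np.linalg.inv(A)     # stand-in VAE decoder

mu_src = encode(rng.normal(size=(100, D))).mean(axis=0)           # source mean
mu_tgt = encode(rng.normal(loc=0.5, size=(100, D))).mean(axis=0)  # target mean

def augment(x_labeled):
    z = encode(x_labeled)
    z = z + (mu_tgt - mu_src)   # move nuisance attributes toward the target
    return decode(z)            # same transcript, target-like acoustics

x_aug = augment(rng.normal(size=(10, D)))
```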

[3]  arXiv:1707.06299 [pdf, other]
Title: Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning
Comments: Accepted at SIGDial 2017
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.
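As a toy illustration of the balancing problem, the sketch below scalarizes a success component and a length component under a weight w; `train_and_evaluate` is a hypothetical stand-in for learning and scoring a policy under that weighting, a retraining loop the paper's multi-objective method is designed to avoid.

```python
# An illustrative sketch of reward balancing: a scalarized reward mixes
# dialogue success and dialogue length under a weight w. The component
# magnitudes are made up for illustration.
def scalarized_reward(success: bool, num_turns: int, w: float) -> float:
    success_component = 20.0 if success else 0.0
    length_component = -float(num_turns)
    return w * success_component + (1.0 - w) * length_component

# naive grid search over weightings; multi-objective RL avoids retraining
# a policy from scratch for every candidate w
candidate_weights = [i / 10 for i in range(11)]
# best_w = max(candidate_weights, key=lambda w: train_and_evaluate(w))
```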

[4]  arXiv:1707.06320 [pdf, other]
Title: Learning Visually Grounded Sentence Representations
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

We introduce a variety of models for grounding sentence representations, trained on a supervised image captioning corpus to predict the image features of a given caption. We train a grounded sentence encoder that achieves good performance on COCO caption and image retrieval, and subsequently show that this encoder can successfully be transferred to various NLP tasks, with improved performance over text-only models. Lastly, we analyze the contribution of grounding and show that word embeddings learned by this system outperform non-grounded ones.
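A minimal sketch of this grounding signal, assuming a toy linear projection from a 300-dimensional sentence vector to 2048-dimensional image features; the dimensions and projection are illustrative, not the paper's architecture.

```python
# Illustrative grounding objective: the sentence representation is trained
# to predict the image features of the caption's image (MSE shown here).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(300, 2048)) * 0.01     # sentence dim -> image feature dim

def grounding_loss(sent_vec, img_feat):
    pred = sent_vec @ W                      # predicted image features
    return np.mean((pred - img_feat) ** 2)   # grounding training signal

loss = grounding_loss(rng.normal(size=300), rng.normal(size=2048))
```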

[5]  arXiv:1707.06341 [pdf, ps, other]
Title: A Sub-Character Architecture for Korean Language Processing
Authors: Karl Stratos
Comments: EMNLP 2017
Subjects: Computation and Language (cs.CL)

We introduce a novel sub-character architecture that exploits a unique compositional structure of the Korean language. Our method decomposes each character into a small set of primitive phonetic units called jamo letters from which character- and word-level representations are induced. The jamo letters divulge syntactic and semantic information that is difficult to access with conventional character-level units. They greatly alleviate the data sparsity problem, reducing the observation space to 1.6% of the original while increasing accuracy in our experiments. We apply our architecture to dependency parsing and achieve dramatic improvement over strong lexical baselines.
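For readers unfamiliar with jamo, the following sketch decomposes a composed Hangul syllable into its lead-consonant, vowel, and tail-consonant indices using the arithmetic layout of the Unicode Hangul Syllables block; this decomposition is standard Unicode, independent of the paper's model.

```python
# Decompose a composed Hangul syllable into jamo indices via the Unicode
# Hangul Syllables block: code = 0xAC00 + (lead*21 + vowel)*28 + tail.
def to_jamo_indices(ch):
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:   # 19 leads, 21 vowels, 28 tails
        return (ch,)                    # not a composed Hangul syllable
    lead, rest = divmod(code, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return lead, vowel, tail            # tail == 0 means no final consonant

print(to_jamo_indices('한'))            # (18, 0, 4): ㅎ + ㅏ + ㄴ
```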

[6]  arXiv:1707.06357 [pdf, ps, other]
Title: Improving Discourse Relation Projection to Build Discourse Annotated Corpora
Subjects: Computation and Language (cs.CL)

The naive approach to annotation projection is not effective for projecting discourse annotations from one language to another, because implicit discourse relations are often changed to explicit ones, and vice versa, in translation. In this paper, we propose a novel approach based on the intersection of statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse usage of French discourse connectives, and show a 15% improvement in F1-score over a classifier trained on the non-filtered annotations.
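A small sketch of the alignment-intersection idea: keep only links that appear in both the forward and reverse word alignments, and treat a projected annotation as supported only if its tokens retain links. The support criterion shown is an illustrative simplification, not the paper's exact rule.

```python
# Intersect forward and reverse word alignments to keep high-confidence
# links; alignments are (src_idx, tgt_idx) token-position pairs (toy data).
forward = {(0, 0), (1, 2), (2, 1), (3, 3)}   # English -> French alignment
reverse = {(0, 0), (1, 2), (3, 4)}           # French -> English, re-indexed
confident = forward & reverse                 # {(0, 0), (1, 2)} survive

def projection_supported(src_positions, alignment):
    """Keep a projected annotation only if every token of the source
    connective has an intersected link (a simplified criterion)."""
    return all(any(s == src for src, _ in alignment) for s in src_positions)

print(projection_supported([0, 1], confident))   # True
print(projection_supported([2, 3], confident))   # False -> filter out
```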

[7]  arXiv:1707.06378 [pdf, ps, other]
Title: Large-Scale Goodness Polarity Lexicons for Community Question Answering
Comments: SIGIR '17, August 07-11, 2017, Shinjuku, Tokyo, Japan; Community Question Answering; Goodness polarity lexicons; Sentiment Analysis
Subjects: Computation and Language (cs.CL)

We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question are ranked higher than the bad ones. We notice that good and bad comments use distinct vocabulary, and that one can often predict the goodness or badness of a comment from its contents alone, ignoring the question. This leads us to the idea of building a good/bad polarity lexicon, by analogy with the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information to build large-scale goodness polarity lexicons in a semi-supervised manner, starting from a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
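The PMI seed of such a lexicon can be sketched in a few lines: score each word by how strongly it co-occurs with good answers. The toy documents below are made up; the real lexicons are built semi-supervised at a much larger scale.

```python
# Sketch of a PMI-based goodness polarity lexicon from labeled comments:
# PMI(w, good) = log P(w, good) / (P(w) * P(good)), over toy data.
import math
from collections import Counter

good_docs = [["thanks", "this", "works"], ["use", "the", "settings", "menu"]]
bad_docs = [["same", "question", "here"], ["bump", "thanks"]]

def pmi_lexicon(good_docs, bad_docs):
    in_good, in_any = Counter(), Counter()
    for d in good_docs:
        in_good.update(set(d))
        in_any.update(set(d))
    for d in bad_docs:
        in_any.update(set(d))
    n_good, n = len(good_docs), len(good_docs) + len(bad_docs)
    lexicon = {}
    for w, c in in_any.items():
        if in_good[w] == 0:
            continue
        lexicon[w] = math.log((in_good[w] / n) / ((c / n) * (n_good / n)))
    return lexicon   # higher score: word is evidence of a good answer

print(sorted(pmi_lexicon(good_docs, bad_docs).items(), key=lambda kv: -kv[1]))
```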

[8]  arXiv:1707.06456 [pdf, other]
Title: Revisiting Selectional Preferences for Coreference Resolution
Comments: EMNLP 2017 - short paper
Subjects: Computation and Language (cs.CL)

Selectional preferences have long been claimed to be essential for coreference resolution. However, current coreference resolvers model them only implicitly. We propose a dependency-based embedding model of selectional preferences which allows fine-grained compatibility judgments with high coverage. We show that incorporating our model improves coreference resolution performance on the CoNLL dataset, matching the state-of-the-art results of a more complex system. However, it comes at a cost that makes it debatable how worthwhile such improvements are.

[9]  arXiv:1707.06480 [pdf, other]
Title: Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Comments: EMNLP 2017
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive character-aware model, has 18%-33% fewer parameters and is trained 1.2-2.2 times faster.

[10]  arXiv:1707.06519 [pdf, other]
Title: Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data
Comments: arXiv admin note: text overlap with arXiv:1603.00982
Subjects: Computation and Language (cs.CL)

Audio Word2Vec offers vector representations of fixed dimensionality for variable-length audio segments using a Sequence-to-sequence Autoencoder (SA). These vector representations have been shown to describe the sequential phonetic structure of the audio segments to a good degree, with real-world applications such as query-by-example Spoken Term Detection (STD). This paper examines the language transfer capability of Audio Word2Vec. We train the SA on one language (the source language) and use it to extract vector representations of audio segments in another language (the target language). We find that the SA can still capture phonetic structure in audio segments of the target language if the source and target languages are similar. In query-by-example STD, we obtain vector representations from an SA learned from a large amount of source language data, and find that they surpass the representations from a naive encoder and from an SA trained directly on a small amount of target language data. The results show that it is possible to learn an Audio Word2Vec model on high-resource languages and use it on low-resource languages, which further expands the usability of Audio Word2Vec.
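A minimal PyTorch sketch of the Sequence-to-sequence Autoencoder idea: a recurrent encoder compresses a variable-length sequence of acoustic frames into a fixed-length vector, and a decoder is trained to reconstruct the input from it. All hyperparameters are illustrative, and the paper's exact architecture may differ.

```python
# Sketch of an SA: GRU encoder -> fixed vector z -> GRU decoder that
# reconstructs the input frames; z is the audio segment representation.
import torch
import torch.nn as nn

class SA(nn.Module):
    def __init__(self, feat_dim=39, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, x):                       # x: (batch, frames, feat_dim)
        _, h = self.encoder(x)                  # h: (1, batch, hidden)
        z = h[-1]                               # fixed-length audio vector
        dec_in = z.unsqueeze(1).expand(-1, x.size(1), -1)
        y, _ = self.decoder(dec_in)
        return self.out(y), z                   # reconstruction and embedding

model = SA()
x = torch.randn(4, 70, 39)                      # e.g. 39-dim MFCC frames
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)         # train to reconstruct
```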

[11]  arXiv:1707.06556 [pdf, other]
Title: High-risk learning: acquiring new word vectors from tiny data
Comments: Accepted as short paper at EMNLP 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Distributional semantics models are known to struggle with small data. It is generally accepted that in order to learn 'a good vector' for a word, a model must have sufficient examples of its usage. This contradicts the fact that humans can guess the meaning of a word from only a few occurrences. In this paper, we show that a neural language model such as Word2Vec requires only minor modifications to its standard architecture to learn new terms from tiny data, using background knowledge from a previously learnt semantic space. We test our model on word definitions and on a nonce task involving 2-6 sentences of context, showing a large increase in performance over state-of-the-art models on the definitional task.
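As a baseline intuition for the approach, the sketch below initializes a new word's vector additively from the background vectors of its few context words; the paper's model goes further, making risky, aggressive updates on top of the standard Word2Vec architecture. The toy space and sentence are made up.

```python
# Additive baseline for inferring a nonce word's vector from tiny data:
# average the background-space vectors of its known context words.
import numpy as np

background = {"drink": np.array([0.9, 0.1]),
              "fermented": np.array([0.7, 0.3]),
              "grain": np.array([0.2, 0.8])}

def nonce_vector(contexts, space):
    vecs = [space[w] for sent in contexts for w in sent if w in space]
    return np.mean(vecs, axis=0)

v = nonce_vector([["a", "fermented", "drink", "made", "from", "grain"]],
                 background)
print(v)   # starting point for 'a good vector' of the unseen word
```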

Cross-lists for Fri, 21 Jul 17

[12]  arXiv:1707.06355 (cross-list from cs.CV) [pdf, ps, other]
Title: Video Question Answering via Attribute-Augmented Attention Network Learning
Comments: Accepted for SIGIR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Video question answering is a challenging problem in visual information retrieval: it requires answering a question about the referenced video content. However, existing visual question answering approaches mainly tackle questions about static images and may be ineffective for video question answering because they insufficiently model the temporal dynamics of video content. In this paper, we study the problem of video question answering by modeling its temporal dynamics with a frame-level attention mechanism. We propose an attribute-augmented attention network learning framework that enables joint frame-level attribute detection and unified video representation learning for video question answering. We then incorporate a multi-step reasoning process into our proposed attention network to further improve performance. We construct a large-scale video question answering dataset and conduct experiments on both multiple-choice and open-ended video question answering tasks to show the effectiveness of the proposed method.

[13]  arXiv:1707.06527 (cross-list from cs.SD) [pdf, other]
Title: Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training
Comments: 11 pages, 6 figures, Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:1704.01985
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech. In this paper, we propose and evaluate several architectures to address this problem under the assumption that only a single channel of the mixed signal is available. Our technique extends permutation invariant training (PIT) by introducing a front-end feature separation module with a minimum mean square error (MSE) criterion and a back-end recognition module with a minimum cross entropy (CE) criterion. More specifically, during training we compute the average MSE or CE over the whole utterance for each possible utterance-level output-target assignment, pick the one with the minimum MSE or CE, and optimize for that assignment. This strategy elegantly solves the label permutation problem observed in deep learning based multi-talker mixed speech separation and recognition systems. The proposed architectures are evaluated and compared on an artificially mixed AMI dataset with both two- and three-talker mixed speech. The experimental results indicate that our proposed architectures can cut the word error rate (WER) by 45.0% and 25.0% relative, for two- and three-talker mixed speech respectively, against the state-of-the-art single-talker speech recognition system across all speakers when their energies are comparable. To our knowledge, this is the first work on multi-talker mixed speech recognition for the challenging speaker-independent spontaneous large-vocabulary continuous speech task.
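The core of PIT can be sketched compactly: evaluate the utterance-level loss under every output-target assignment and train on the minimum. The shapes and the MSE criterion below are illustrative.

```python
# Permutation invariant training in miniature: take the minimum
# utterance-level loss over all output-target assignments.
import numpy as np
from itertools import permutations

def pit_mse(outputs, targets):
    """outputs, targets: (num_speakers, frames, dims). Returns the minimum
    utterance-level MSE over all output-target assignments."""
    n = outputs.shape[0]
    return min(
        np.mean((outputs[list(perm)] - targets) ** 2)
        for perm in permutations(range(n))
    )

rng = np.random.default_rng(0)
out = rng.normal(size=(2, 50, 40))   # 2 speakers, 50 frames, 40-dim features
tgt = rng.normal(size=(2, 50, 40))
print(pit_mse(out, tgt))             # loss of the best assignment
```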

[14]  arXiv:1707.06562 (cross-list from cs.IR) [pdf, ps, other]
Title: From Task Classification Towards Similarity Measures for Recommendation in Crowdsourcing Systems
Comments: Work in Progress Paper at HCOMP 2017
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Task selection in micro-task markets can be supported by recommender systems that help individuals find appropriate tasks. Previous work showed that, when selecting a micro-task, semantic aspects such as the required action and the comprehensibility are rated as more important than factual aspects such as the payment or the required completion time. This work lays a foundation for creating such similarity measures: we show that automatic classification based on task descriptions is possible, and we propose similarity measures to cluster micro-tasks according to semantic aspects.

[15]  arXiv:1707.06588 (cross-list from cs.LG) [pdf, other]
Title: Voice Synthesis for in-the-Wild Speakers via a Phonological Loop
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD)

We present a new neural text-to-speech method that is able to transform text to speech in voices that are sampled in the wild. Unlike other text-to-speech systems, our solution is able to deal with unconstrained samples obtained from public speeches. The network architecture is simpler than those in the existing literature and is based on a novel shifting buffer working memory. The same buffer is used for estimating the attention, computing the output audio, and updating the buffer itself. The input sentence is encoded using a context-free lookup table that contains one entry per character or phoneme. The speakers are similarly represented by a short vector that can also be fitted to new speakers, and variability in the generated speech is achieved by priming the buffer prior to generating the audio. Experimental results on two datasets demonstrate convincing multi-speaker and in-the-wild capabilities. To promote reproducibility, we release our source code and models: PyTorch code and sample audio files are available at ytaigman.github.io/loop.

[16]  arXiv:1707.06598 (cross-list from cs.IR) [pdf, other]
Title: Toward Incorporation of Relevant Documents in word2vec
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Recent advances in neural word embedding provide significant benefits to various information retrieval tasks. However, as shown by recent studies, adapting the embedding models to the needs of IR tasks can bring considerable further improvements. Embedding models generally define term relatedness by exploiting term co-occurrences in short-window contexts. An alternative (and well-studied) IR approach to finding terms related to a query uses local information, i.e., a set of top-retrieved documents. In view of these two notions of term relatedness, in this work we report our study on incorporating the local information of the query into word embeddings. One main challenge in this direction is that the dense vectors of word embeddings, and their estimates of term-to-term relatedness, remain difficult to interpret and hard to analyze. As an alternative, explicit word representations propose vectors whose dimensions are easily interpretable, and recent methods show performance competitive with dense vectors. We introduce a neural-based explicit representation, rooted in the conceptual ideas of the word2vec Skip-Gram model. The method provides interpretable explicit vectors while keeping the effectiveness of the Skip-Gram model. Evaluation of various explicit representations on word association collections shows that the newly proposed method outperforms the state-of-the-art explicit representations when tasked with ranking highly similar terms. Based on the introduced explicit representation, we discuss our approaches to integrating local documents into globally trained embedding models and present preliminary results.
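As background for what an "explicit" representation looks like, the sketch below builds sparse vectors whose dimensions are context words weighted by shifted PMI, following the classical correspondence between Skip-Gram with negative sampling and PMI matrices; the paper's neural method is a refinement of this idea, not this exact construction.

```python
# Classical explicit vectors: dimension = context word, value = shifted
# positive PMI computed from window co-occurrences (toy corpus).
import math
from collections import Counter

def explicit_vectors(corpus, window=2, shift=math.log(5)):
    word_counts, pair_counts = Counter(), Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            word_counts[w] += 1
            for c in sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]:
                pair_counts[(w, c)] += 1
    total_pairs = sum(pair_counts.values())
    total_words = sum(word_counts.values())
    vecs = {w: {} for w in word_counts}
    for (w, c), n in pair_counts.items():
        pmi = math.log((n / total_pairs)
                       / ((word_counts[w] / total_words)
                          * (word_counts[c] / total_words)))
        if pmi - shift > 0:            # keep sparse, interpretable dimensions
            vecs[w][c] = pmi - shift
    return vecs

print(explicit_vectors([["neural", "word", "embedding", "models"]]))
```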

Replacements for Fri, 21 Jul 17

[17]  arXiv:1602.08844 (replaced) [pdf]
Title: Bioinformatics and Classical Literary Study
Subjects: Computation and Language (cs.CL)
[18]  arXiv:1704.04520 (replaced) [pdf, other]
Title: Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy
Comments: ACL 2017 rejected
Subjects: Computation and Language (cs.CL)
[19]  arXiv:1707.05850 (replaced) [pdf, ps, other]
Title: A Short Survey of Biomedical Relation Extraction Techniques
Authors: Elham Shahab
Comments: Updated keywords and changed the format
Subjects: Computation and Language (cs.CL)
[20]  arXiv:1602.05875 (replaced) [pdf, other]
Title: Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL)
[21]  arXiv:1705.02315 (replaced) [pdf, other]
Title: ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Comments: CVPR 2017 spotlight. V1: CVPR submission plus supplementary; V2: statistics and benchmark results on the published ChestX-ray14 dataset updated in Appendix B; V3: minor corrections. NOTE: dataset link will be updated soon for public access
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[22]  arXiv:1706.05656 (replaced) [pdf, ps, other]
Title: Lexical representation explains cortical entrainment during speech comprehension
Comments: Submitted for publication
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL)