Showing 1–33 of 33 results for author: Shah, N B

.
  1. arXiv:1806.06266  [pdf, ps, other cs.GT

    On Strategyproof Conference Peer Review

    Authors: Yichong Xu, Han Zhao, Xiaofei Shi, Nihar B. Shah

    Abstract: We consider peer review in a conference setting where there is typically an overlap between the set of reviewers and the set of authors. This overlap can incentivize strategic reviews to influence the final ranking of one's own papers. In this work, we address this problem through the lens of social choice, and present a theoretical framework for strategyproof and efficient peer review. We first p… ▽ More

    Submitted 16 June, 2018; originally announced June 2018.

  2. arXiv:1806.06237  [pdf, other stat.ML

    PeerReview4All: Fair and Accurate Reviewer Assignment in Peer Review

    Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh

    Abstract: We consider the problem of automated assignment of papers to reviewers in conference peer review, with a focus on fairness and statistical accuracy. Our fairness objective is to maximize the review quality of the most disadvantaged paper, in contrast to the commonly used objective of maximizing the total quality over all papers. We design an assignment algorithm based on an incremental max-flow pr… ▽ More

    Submitted 16 June, 2018; originally announced June 2018.

  3. arXiv:1806.05085  [pdf, other stat.ML

    Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings

    Authors: Jingyan Wang, Nihar B. Shah

    Abstract: Cardinal scores (numeric ratings) collected from people are well known to suffer from miscalibrations. A popular approach to address this issue is to assume simplistic models of miscalibration (such as linear biases) to de-bias the scores. This approach, however, often fares poorly because people's miscalibrations are typically far more complex and not well understood. In the absence of simplifyin… ▽ More

    Submitted 13 June, 2018; originally announced June 2018.

  4. arXiv:1709.00127  [pdf, ps, other stat.ML

    Low Permutation-rank Matrices: Structural Properties and Noisy Completion

    Authors: Nihar B. Shah, Sivaraman Balakrishnan, Martin J. Wainwright

    Abstract: We consider the problem of noisy matrix completion, in which the goal is to reconstruct a structured matrix whose entries are partially observed in noise. Standard approaches to this underdetermined inverse problem are based on assuming that the underlying matrix has low rank, or is well-approximated by a low rank matrix. In this paper, we propose a richer model based on what we term the "permutat… ▽ More

    Submitted 31 August, 2017; originally announced September 2017.

  5. arXiv:1708.09794  [pdf, other cs.DL

    Design and Analysis of the NIPS 2016 Review Process

    Authors: Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, Ulrike von Luxburg

    Abstract: Neural Information Processing Systems (NIPS) is a top-tier annual conference in machine learning. The 2016 edition of the conference comprised more than 2,400 paper submissions, 3,000 reviewers, and 8,000 attendees. This represents a growth of nearly 40% in terms of submissions, 96% in terms of reviewers, and over 100% in terms of attendees as compared to the previous year. The massive scale as we… ▽ More

    Submitted 23 April, 2018; v1 submitted 31 August, 2017; originally announced August 2017.

  6. arXiv:1606.09632  [pdf, other cs.LG

    A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness

    Authors: Nihar B. Shah, Sivaraman Balakrishnan, Martin J. Wainwright

    Abstract: The aggregation and denoising of crowd labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a h… ▽ More

    Submitted 30 June, 2016; originally announced June 2016.

  7. arXiv:1606.08842  [pdf, other cs.LG

    Active Ranking from Pairwise Comparisons and when Parametric Assumptions Don't Help

    Authors: Reinhard Heckel, Nihar B. Shah, Kannan Ramchandran, Martin J. Wainwright

    Abstract: We consider sequential or active ranking of a set of n items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of pre-specified sizes according to their scores. This notion of ranking includes as special cases the identification of the top-k items and the total or… ▽ More

    Submitted 23 September, 2016; v1 submitted 28 June, 2016; originally announced June 2016.

    Comments: improved log factor in main result; added discussion on comparison probabilities close to zero; added numerical results

  8. arXiv:1603.06881  [pdf, ps, other cs.LG

    Feeling the Bern: Adaptive Estimators for Bernoulli Probabilities of Pairwise Comparisons

    Authors: Nihar B. Shah, Sivaraman Balakrishnan, Martin J. Wainwright

    Abstract: We study methods for aggregating pairwise comparison data in order to estimate outcome probabilities for future comparisons among a collection of n items. Working within a flexible framework that imposes only a form of strong stochastic transitivity (SST), we introduce an adaptivity index defined by the indifference sets of the pairwise comparison probabilities. In addition to measuring the usual… ▽ More

    Submitted 22 March, 2016; originally announced March 2016.

  9. arXiv:1602.07435  [pdf, other cs.GT

    Parametric Prediction from Parametric Agents

    Authors: Yuan Luo, Nihar B. Shah, Jianwei Huang, Jean Walrand

    Abstract: We consider a problem of prediction based on opinions elicited from heterogeneous rational agents with private information. Making an accurate prediction with a minimal cost requires a joint design of the incentive mechanism and the prediction algorithm. Such a problem lies at the nexus of statistical learning theory and game theory, and arises in many domains such as consumer surveys and mobile c… ▽ More

    Submitted 24 February, 2016; originally announced February 2016.

  10. arXiv:1512.08949  [pdf, other cs.LG

    Simple, Robust and Optimal Ranking from Pairwise Comparisons

    Authors: Nihar B. Shah, Martin J. Wainwright

    Abstract: We consider data in the form of pairwise comparisons of n items, with the goal of precisely identifying the top k items for some value of k < n, or alternatively, recovering a ranking of all the items. We analyze the Copeland counting algorithm that ranks the items in order of the number of pairwise comparisons won, and show it has three attractive features: (a) its computational efficiency leads… ▽ More

    Submitted 26 April, 2016; v1 submitted 30 December, 2015; originally announced December 2015.

    Comments: Changes in version 2: In addition to recovery in the exact and Hamming metrics, v2 analyzes a general, abstract recovery criterion based on a notion of "allowed sets"

  11. arXiv:1510.05610  [pdf, other stat.ML

    Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues

    Authors: Nihar B. Shah, Sivaraman Balakrishnan, Adityanand Guntuboyina, Martin J. Wainwright

    Abstract: There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class include… ▽ More

    Submitted 27 September, 2016; v1 submitted 19 October, 2015; originally announced October 2015.

  12. arXiv:1508.03787  [pdf, other cs.IT

    Information-theoretically Secure Erasure Codes for Distributed Storage

    Authors: Nihar B. Shah, K. V. Rashmi, Kannan Ramchandran, P. Vijay Kumar

    Abstract: Repair operations in distributed storage systems potentially expose the data to malicious acts of passive eavesdroppers or active adversaries, which can be detrimental to the security of the system. This paper presents erasure codes and repair algorithms that ensure security of the data in the presence of passive eavesdroppers and active adversaries, while maintaining high availability, reliabilit… ▽ More

    Submitted 15 August, 2015; originally announced August 2015.

  13. arXiv:1505.01462  [pdf, other cs.LG

    Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence

    Authors: Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin J. Wainwright

    Abstract: Data in the form of pairwise comparisons arises in many domains, including preference elicitation, sporting competitions, and peer grading among others. We consider parametric ordinal models for such pairwise comparison data involving a latent vector $w^* \in \mathbb{R}^d$ that represents the "qualities" of the $d$ items being compared; this class of models includes the two most widely used parame… ▽ More

    Submitted 6 May, 2015; originally announced May 2015.

    Comments: 39 pages, 5 figures. Significant extension of arXiv:1406.6618

  14. arXiv:1503.07240  [pdf, ps, other cs.LG

    Regularized Minimax Conditional Entropy for Crowdsourcing

    Authors: Dengyong Zhou, Qiang Liu, John C. Platt, Christopher Meek, Nihar B. Shah

    Abstract: There is a rapidly increasing interest in crowdsourcing for data labeling. By crowdsourcing, a large number of labels can be often quickly gathered at low cost. However, the labels provided by the crowdsourcing workers are usually not of high quality. In this paper, we propose a minimax conditional entropy principle to infer ground truth from noisy crowdsourced labels. Under this principle, we der… ▽ More

    Submitted 24 March, 2015; originally announced March 2015.

    Comments: 31 pages

  15. arXiv:1502.05696  [pdf, other cs.GT

    Approval Voting and Incentives in Crowdsourcing

    Authors: Nihar B. Shah, Dengyong Zhou, Yuval Peres

    Abstract: The growing need for labeled training data has made crowdsourcing an important part of machine learning. The quality of crowdsourced labels is, however, adversely affected by three factors: (1) the workers are not experts; (2) the incentives of the workers are not aligned with those of the requesters; and (3) the interface does not allow workers to convey their knowledge accurately, by forcing the… ▽ More

    Submitted 7 September, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

  16. arXiv:1411.5977  [pdf, other stat.ML

    On the Impossibility of Convex Inference in Human Computation

    Authors: Nihar B. Shah, Dengyong Zhou

    Abstract: Human computation or crowdsourcing involves joint inference of the ground-truth-answers and the worker-abilities by optimizing an objective function, for instance, by maximizing the data likelihood based on an assumed underlying model. A variety of methods have been proposed in the literature to address this inference problem. As far as we know, none of the objective functions in existing methods… ▽ More

    Submitted 21 November, 2014; originally announced November 2014.

    Comments: AAAI 2015

  17. Fundamental Limits on Communication for Oblivious Updates in Storage Networks

    Authors: Preetum Nakkiran, Nihar B. Shah, K. V. Rashmi

    Abstract: In distributed storage systems, storage nodes intermittently go offline for numerous reasons. On coming back online, nodes need to update their contents to reflect any modifications to the data in the interim. In this paper, we consider a setting where no information regarding modified data needs to be logged in the system. In such a setting, a 'stale' node needs to update its contents by download… ▽ More

    Submitted 5 September, 2014; originally announced September 2014.

    Comments: IEEE Global Communications Conference (GLOBECOM) 2014

  18. arXiv:1408.1387  [pdf, other cs.GT

    Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing

    Authors: Nihar B. Shah, Dengyong Zhou

    Abstract: Crowdsourcing has gained immense popularity in machine learning applications for obtaining large amounts of labeled data. Crowdsourcing is cheap and fast, but suffers from the problem of low-quality data. To address this fundamental challenge in crowdsourcing, we propose a simple payment mechanism to incentivize workers to answer only the questions that they are sure of and skip the rest. We show… ▽ More

    Submitted 16 December, 2015; v1 submitted 6 August, 2014; originally announced August 2014.

  19. arXiv:1406.6618  [pdf, other stat.ML

    When is it Better to Compare than to Score?

    Authors: Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin Wainwright

    Abstract: When eliciting judgements from humans for an unknown quantity, one often has the choice of making direct-scoring (cardinal) or comparative (ordinal) measurements. In this paper we study the relative merits of either choice, providing empirical and theoretical guidelines for the selection of a measurement scheme. We provide empirical evidence based on experiments on Amazon Mechanical Turk that in a… ▽ More

    Submitted 25 June, 2014; originally announced June 2014.

  20. arXiv:1311.2851  [pdf, other cs.NI

    When Do Redundant Requests Reduce Latency ?

    Authors: Nihar B. Shah, Kangwook Lee, Kannan Ramchandran

    Abstract: Several systems possess the flexibility to serve requests in more than one way. For instance, a distributed storage system storing multiple replicas of the data can serve a request from any of the multiple servers that store the requested data, or a computational task may be performed in a compute-cluster by any one of multiple processors. In such systems, the latency of serving the requests may p… ▽ More

    Submitted 6 November, 2013; originally announced November 2013.

    Comments: Extended version of paper presented at Allerton Conference 2013

  21. arXiv:1309.0186  [pdf, other cs.NI

    A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster

    Authors: K. V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, Kannan Ramchandran

    Abstract: Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal storage efficiency, they require significantly high network and disk usage during recovery of missing data. In this paper, we first present a study on the impact of recovery operations of erasure-coded dat… ▽ More

    Submitted 1 September, 2013; originally announced September 2013.

    Comments: In proceedings of USENIX HotStorage, San Jose, June 2013

  22. arXiv:1302.5872  [pdf, other cs.IT

    A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes

    Authors: K. V. Rashmi, Nihar B. Shah, Kannan Ramchandran

    Abstract: We present a new 'piggybacking' framework for designing distributed storage codes that are efficient in data-read and download required during node-repair. We illustrate the power of this framework by constructing classes of explicit codes that entail the smallest data-read and download for repair among all existing solutions for three important settings: (a) codes meeting the constraints of being… ▽ More

    Submitted 24 February, 2013; originally announced February 2013.

    Comments: Extended version of ISIT 2013 submission

  23. On Minimizing Data-read and Download for Storage-Node Recovery

    Authors: Nihar B. Shah

    Abstract: We consider the problem of efficient recovery of the data stored in any individual node of a distributed storage system, from the rest of the nodes. Applications include handling failures and degraded reads. We measure efficiency in terms of the amount of data-read and the download required. To minimize the download, we focus on the minimum bandwidth setting of the 'regenerating codes' model for d… ▽ More

    Submitted 2 April, 2013; v1 submitted 31 December, 2012; originally announced December 2012.

    Comments: IEEE Communications Letters

  24. arXiv:1211.5405  [pdf, other cs.IT

    The MDS Queue: Analysing the Latency Performance of Erasure Codes

    Authors: Nihar B. Shah, Kangwook Lee, Kannan Ramchandran

    Abstract: In order to scale economically, data centers are increasingly evolving their data storage methods from the use of simple data replication to the use of more powerful erasure codes, which provide the same level of reliability as replication but at a significantly lower storage cost. In particular, it is well known that Maximum-Distance-Separable (MDS) codes, such as Reed-Solomon codes, provide the… ▽ More

    Submitted 10 November, 2013; v1 submitted 22 November, 2012; originally announced November 2012.

  25. arXiv:1207.0120  [pdf, other cs.CR

    Distributed Secret Dissemination Across a Network

    Authors: Nihar B. Shah, K. V. Rashmi, Kannan Ramchandran

    Abstract: Shamir's (n, k) threshold secret sharing is an important component of several cryptographic protocols, such as those for secure multiparty-computation and key management. These protocols typically assume the presence of direct communication links from the dealer to all participants, in which case the dealer can directly pass the shares of the secret to each participant. In this paper, we consider… ▽ More

    Submitted 22 October, 2014; v1 submitted 30 June, 2012; originally announced July 2012.

    Comments: Extended version of a paper presented at the International Symposium on Information Theory (ISIT) 2013

  26. arXiv:1202.1050  [pdf, other cs.IT

    Regenerating Codes for Errors and Erasures in Distributed Storage

    Authors: K. V. Rashmi, Nihar B. Shah, Kannan Ramchandran, P. Vijay Kumar

    Abstract: Regenerating codes are a class of codes proposed for providing reliability of data and efficient repair of failed nodes in distributed storage systems. In this paper, we address the fundamental problem of handling errors and erasures during the data-reconstruction and node-repair operations. We provide explicit regenerating codes that are resilient to errors and erasures, and show that these codes… ▽ More

    Submitted 23 May, 2012; v1 submitted 6 February, 2012; originally announced February 2012.

    Comments: ISIT 2012

  27. arXiv:1107.5279  [pdf, ps, other cs.IT

    Information-theoretically Secure Regenerating Codes for Distributed Storage

    Authors: Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar

    Abstract: Regenerating codes are a class of codes for distributed storage networks that provide reliability and availability of data, and also perform efficient node repair. Another important aspect of a distributed storage network is its security. In this paper, we consider a threat model where an eavesdropper may gain access to the data stored in a subset of the storage nodes, and possibly also, to the da… ▽ More

    Submitted 26 July, 2011; originally announced July 2011.

    Comments: Globecom 2011

  28. Enabling Node Repair in Any Erasure Code for Distributed Storage

    Authors: K. V. Rashmi, Nihar B. Shah, P. Vijay Kumar

    Abstract: Erasure codes are an efficient means of storing data across a network in comparison to data replication, as they tend to reduce the amount of data stored in the network and offer increased resilience in the presence of node failures. The codes perform poorly though, when repair of a failed node is called for, as they typically require the entire file to be downloaded to repair a failed node. A new… ▽ More

    Submitted 30 June, 2011; v1 submitted 30 December, 2010; originally announced January 2011.

    Comments: IEEE International Symposium on Information Theory (ISIT) 2011 (to be presented)

  29. Distributed Storage Codes with Repair-by-Transfer and Non-achievability of Interior Points on the Storage-Bandwidth Tradeoff

    Authors: Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar, Kannan Ramchandran

    Abstract: Regenerating codes are a class of recently developed codes for distributed storage that, like Reed-Solomon codes, permit data recovery from any subset of k nodes within the n-node network. However, regenerating codes possess in addition, the ability to repair a failed node by connecting to an arbitrary subset of d nodes. It has been shown that for the case of functional-repair, there is a tradeoff… ▽ More

    Submitted 16 November, 2010; v1 submitted 10 November, 2010; originally announced November 2010.

    Comments: 30 pages, 6 figures. Submitted to IEEE Transactions on Information Theory

  30. Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction

    Authors: K. V. Rashmi, Nihar B. Shah, P. Vijay Kumar

    Abstract: Regenerating codes are a class of distributed storage codes that optimally trade the bandwidth needed for repair of a failed node with the amount of data stored per node of the network. Minimum Storage Regenerating (MSR) codes minimize first, the amount of data stored per node, and then the repair bandwidth, while Minimum Bandwidth Regenerating (MBR) codes carry out the minimization in the reverse… ▽ More

    Submitted 20 January, 2011; v1 submitted 23 May, 2010; originally announced May 2010.

    Comments: Submitted to IEEE Transactions on Information Theory. Contains 20 pages, 2 figures

    Journal ref: IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 5227 - 5239, August 2011

  31. Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions

    Authors: Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar, Kannan Ramchandran

    Abstract: Regenerating codes are a class of recently developed codes for distributed storage that, like Reed-Solomon codes, permit data recovery from any arbitrary k of n nodes. However regenerating codes possess in addition, the ability to repair a failed node by connecting to any arbitrary d nodes and downloading an amount of data that is typically far less than the size of the data file. This amount of d… ▽ More

    Submitted 13 September, 2010; v1 submitted 10 May, 2010; originally announced May 2010.

    Comments: 38 pages, 12 figures, submitted to the IEEE Transactions on Information Theory;v3 - The title has been modified to better reflect the contributions of the submission. The paper is extensively revised with several carefully constructed figures and examples

  32. arXiv:0908.2984  [pdf, other cs.IT

    Explicit Codes Minimizing Repair Bandwidth for Distributed Storage

    Authors: Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar, Kannan Ramchandran

    Abstract: We consider the setting of data storage across n nodes in a distributed manner. A data collector (DC) should be able to reconstruct the entire data by connecting to any k out of the n nodes and downloading all the data stored in them. When a node fails, it has to be regenerated back using the existing nodes. In a recent paper, Wu et al. have obtained an information theoretic lower bound for the… ▽ More

    Submitted 5 September, 2009; v1 submitted 20 August, 2009; originally announced August 2009.

    Comments: 11 pages, 4 figures v2: corrected typos

  33. arXiv:0906.4913  [pdf, other cs.IT

    Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage

    Authors: K. V. Rashmi, Nihar B. Shah, P. Vijay Kumar, Kannan Ramchandran

    Abstract: Erasure coding techniques are used to increase the reliability of distributed storage systems while minimizing storage overhead. Also of interest is minimization of the bandwidth required to repair the system following a node failure. In a recent paper, Wu et al. characterize the tradeoff between the repair bandwidth and the amount of data stored per node. They also prove the existence of regene… ▽ More

    Submitted 6 October, 2009; v1 submitted 26 June, 2009; originally announced June 2009.

    Comments: 7 pages, 2 figures, in the Proceedings of Allerton Conference on Communication, Control and Computing, September 2009