Between-Subjects Elicitation Studies
- 7 May 2016
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 3390-3402
- https://doi.org/10.1145/2858036.2858228
Abstract
Elicitation studies, in which users supply proposals meant to effect system commands, have become a popular method for system designers. To date, however, the method has assumed a within-subjects procedure and statistics. Despite the benefits of examining the relative agreement of independent groups (e.g., men versus women, children versus adults, novices versus experts), the lack of appropriate tools for between-subjects agreement rate analysis has so far prevented such comparative investigations. In this work, we extend the elicitation method to between-subjects designs. We introduce a new measure for evaluating coagreement between groups and a new statistical test for agreement rate analysis that reports the exact p-value for the significance of the difference between agreement rates calculated for independent groups. We demonstrate the usefulness of these tools by re-examining previously published gesture elicitation data, for which we discuss significant differences in agreement between technical and non-technical participants, between men and women, and between different acquisition technologies. Our new tools will enable practitioners to properly analyze user-elicited data resulting from complex experimental designs with multiple independent groups and, consequently, will help them understand agreement data and verify hypotheses about agreement at more sophisticated levels of analysis.
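For context, the within-subjects agreement rate that this work generalizes to independent groups is, per referent, the fraction of participant pairs that made the same proposal: AR = Σᵢ |Pᵢ|(|Pᵢ|−1) / (|P|(|P|−1)), where P is the multiset of proposals for a referent and the Pᵢ are its subsets of identical proposals. A minimal sketch of that computation (the function name and input format are illustrative, not taken from the paper):

```python
from collections import Counter

def agreement_rate(proposals):
    """Agreement rate AR for one referent: the fraction of
    participant pairs that supplied the same proposal,
    AR = sum_i |P_i|(|P_i|-1) / (|P|(|P|-1))."""
    n = len(proposals)
    if n < 2:
        return 0.0
    counts = Counter(proposals)  # sizes |P_i| of identical-proposal groups
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Example: 3 of 4 participants agree -> 6 agreeing pairs out of 12
print(agreement_rate(["pinch", "pinch", "swipe", "pinch"]))  # 0.5
```

Computing this rate separately for each independent group (e.g., men versus women) yields the per-group agreement rates whose difference the paper's new exact test evaluates.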
Funding Information
- UEFISCDI (PN-II-RU-TE-2014-4-1187)