Communications of the ACM

Journal Information
ISSN / EISSN: 0001-0782 / 1557-7317
Total articles ≈ 13,707

Latest articles in this journal

Michael Gardiner, Alexander Truskovsky, George Neville-Neil, Atefeh Mashatan
Communications of the ACM, Volume 64, pp. 54-61.

A discussion with Michael Gardiner, Alexander Truskovsky, George Neville-Neil, and Atefeh Mashatan.
Atefeh Mashatan, Douglas Heintzman
Communications of the ACM, Volume 64, pp. 46-53.

Is your organization prepared?
Kelly Idell, David Gefen, Arik Ragowsky
Communications of the ACM, Volume 64, pp. 72-77.

Organizational distrust, not compensation, is more likely to send IT pros packing.
Logan Kugler
Communications of the ACM, Volume 64, pp. 19-20.

A new blockchain-based technology is changing how the art world works, and changing how we think about asset ownership in the process.
Peter J. Denning
Communications of the ACM, Volume 64, pp. 35-37.

Back-of-the-envelope calculations are a powerful professional practice.
Thomas Haigh
Communications of the ACM, Volume 64, pp. 28-34.

Exploring Ellen Ullman's 'Close to the Machine' and AMC's 'Halt and Catch Fire.'
Marjory S. Blumenthal
Communications of the ACM, Volume 64, pp. 25-27.

Seeking security improvements for smart cities.
Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
Communications of the ACM, Volume 64, pp. 99-106.

Commonsense reasoning remains a major challenge in AI, yet recent progress on benchmarks may seem to suggest otherwise. In particular, recent neural language models have reported above 90% accuracy on the Winograd Schema Challenge (WSC), a commonsense benchmark originally designed to be unsolvable for statistical models that rely simply on word associations. This raises an important question: have these models truly acquired robust commonsense capabilities, or do they rely on spurious biases in the dataset that lead to an overestimation of the true capabilities of machine commonsense? To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction are (1) large-scale crowdsourcing, followed by (2) systematic bias reduction using a novel AFLITE algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Our experiments demonstrate that state-of-the-art models achieve considerably lower accuracy (59.4%-79.1%) on WinoGrande than humans (94%), confirming that the high performance on the original WSC was inflated by spurious biases in the dataset. Furthermore, we report new state-of-the-art results on five related benchmarks, with emphasis on their dual implications. On the one hand, they demonstrate the effectiveness of WinoGrande as a resource for transfer learning. On the other hand, the high performance on all these benchmarks suggests the extent to which spurious biases are prevalent in such datasets, which motivates further research on algorithmic bias reduction.
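The AFLITE-style bias reduction described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes precomputed instance embeddings, substitutes a plain least-squares linear probe for the paper's classifiers, and the function name, parameters, and cutoff values are hypothetical.

```python
import numpy as np

def aflite_sketch(embeddings, labels, n_iters=20, train_frac=0.5,
                  cutoff=0.75, remove_per_round=2, rng_seed=0):
    """Hedged sketch of AFLITE-style bias reduction: repeatedly train
    cheap linear probes on random splits of the data, score each
    instance by how often held-out probes classify it correctly, and
    discard the most predictable (i.e., most biased) instances."""
    rng = np.random.default_rng(rng_seed)
    keep = np.arange(len(labels))          # indices still in the dataset
    while len(keep) >= 4:
        X, y = embeddings[keep], labels[keep]
        correct = np.zeros(len(keep))
        counts = np.zeros(len(keep))
        for _ in range(n_iters):
            idx = rng.permutation(len(keep))
            split = int(train_frac * len(keep))
            tr, te = idx[:split], idx[split:]
            # linear probe: regularized-free least squares on embeddings
            A = np.hstack([X[tr], np.ones((len(tr), 1))])
            w, *_ = np.linalg.lstsq(A, 2 * y[tr] - 1, rcond=None)
            B = np.hstack([X[te], np.ones((len(te), 1))])
            preds = (B @ w > 0).astype(int)
            correct[te] += preds == y[te]
            counts[te] += 1
        # predictability score: fraction of held-out probes that got it right
        score = np.divide(correct, counts,
                          out=np.zeros_like(correct), where=counts > 0)
        biased = np.where(score >= cutoff)[0]
        if len(biased) == 0:               # nothing left looks spurious
            break
        # drop the most predictable instances first
        drop = biased[np.argsort(-score[biased])][:remove_per_round]
        keep = np.delete(keep, drop)
    return keep
```

On synthetic data where one embedding dimension spuriously encodes the label for part of the dataset, the filter preferentially removes exactly those instances, mirroring how AFLITE shrinks a crowdsourced pool down to its harder, less word-association-solvable core.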