The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords
Open Access
- 1 May 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- pp. 538-552
- https://doi.org/10.1109/sp.2012.49
Abstract
We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users. This approach mitigates privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics, including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution that would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty: every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords, and an attacker never gains more than a factor of 2 in efficiency by switching from the globally optimal dictionary to a population-specific list.
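The partial guessing metrics mentioned above can be illustrated with a small sketch. The code below works directly on a password histogram (password to count), which is the same kind of anonymized data the abstract describes. It computes a β-success rate (fraction of accounts broken by the β most popular guesses), an α-work-factor (guesses needed to break a fraction α of accounts), and an α-guesswork (expected guesses per account for an attacker who gives up after reaching α), and converts each into "effective bits" by comparison with a uniform distribution. The function names and the exact conversion formulas are our own assumptions, not taken from the paper; they are calibrated so that a uniform distribution over 2^k passwords scores exactly k bits on every metric. The sketch also ignores sampling error, which is the central statistical difficulty the paper addresses.

```python
import math
from collections import Counter


def sorted_probs(counts):
    """Convert a histogram {password: count} into probabilities, most common first."""
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def success_rate(probs, beta):
    """lambda_beta: fraction of accounts broken by the beta most popular guesses."""
    return sum(probs[:beta])


def work_factor(probs, alpha):
    """mu_alpha: smallest number of guesses whose cumulative success reaches alpha."""
    cum = 0.0
    for i, p in enumerate(probs, start=1):
        cum += p
        if cum >= alpha - 1e-12:  # small tolerance for floating-point sums
            return i
    return len(probs)


def guesswork(probs, alpha):
    """G_alpha: expected guesses per account when the attacker stops after mu_alpha guesses."""
    mu = work_factor(probs, alpha)
    covered = success_rate(probs, mu)  # actual fraction broken, at least alpha
    expected_hits = sum(i * p for i, p in enumerate(probs[:mu], start=1))
    # Accounts never broken still cost the attacker all mu guesses.
    return (1 - covered) * mu + expected_hits


# Effective key lengths: bit strength of a uniform distribution with the same metric value.
# These conversions are an assumed reconstruction (uniform over 2^k items maps to k bits).
def bits_success_rate(probs, beta):
    return math.log2(beta / success_rate(probs, beta))


def bits_work_factor(probs, alpha):
    mu = work_factor(probs, alpha)
    return math.log2(mu / success_rate(probs, mu))


def bits_guesswork(probs, alpha):
    mu = work_factor(probs, alpha)
    covered = success_rate(probs, mu)
    g = guesswork(probs, alpha)
    return math.log2(2 * g / covered - 1) + math.log2(1 / (2 - covered))


if __name__ == "__main__":
    # Hypothetical toy data: 1024 equally common passwords should rate about 10 bits.
    uniform = sorted_probs({f"pw{i}": 1 for i in range(1024)})
    print(bits_success_rate(uniform, 10), bits_work_factor(uniform, 0.5),
          bits_guesswork(uniform, 0.5))

    # Hypothetical skewed data: half the users chose "123456".
    skewed = sorted_probs(Counter(["123456"] * 500 + ["password"] * 250
                                  + [f"u{i}" for i in range(250)]))
    print(bits_success_rate(skewed, 1))
```

For the uniform toy histogram all three printed values are (up to floating-point rounding) 10. For the skewed one, guessing only "123456" breaks half of the accounts, so the single-guess success rate is worth just 1 bit, even though the histogram contains 252 distinct passwords. This gap between the number of distinct passwords and the effective security against a few guesses is the kind of weakness the abstract's "fewer than 10 bits" figure describes.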