Regional Patterns and Vulnerability Analysis of Chinese Web Passwords
- 14 October 2015
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Forensics and Security
- Vol. 11 (2), 258-272
- https://doi.org/10.1109/tifs.2015.2490620
Abstract
Current research on password security pays much attention to users who speak Indo-European languages (English, Spanish, and so on), and thus the countermeasures are heavily influenced by Indo-European speakers' choices as well. However, languages have a strong impact on passwords. Analysis that does not consider other languages (e.g., Chinese) might lead to biased results, such as the conclusion that Chinese passwords are among the most difficult to guess. We believe that such a conclusion could be biased because, to the best of our knowledge, few empirical studies have examined the regional differences of passwords at a large scale, especially for Chinese passwords. In this paper, we comprehensively study the differences between passwords from Chinese and English-dominant users, leveraging over 100 million leaked and publicly available passwords from Chinese and international websites in recent years. First, we find that Chinese users prefer digits when composing their passwords, while English-dominant users prefer letters, especially lowercase letters. However, their strength against password guessing is similar. Second, we observe that both groups of users prefer to use the patterns that they are familiar with, e.g., Chinese Pinyins for Chinese users and English words for English-dominant users. In particular, since multiple input methods require various sequences of letters to enter the same Chinese characters, we evaluate the impacts of various Chinese input methods, in addition to Pinyin. Third, we observe that both Chinese and English-dominant users prefer their conventional format when they use dates to construct passwords. Based on these observations, we improve two password guessing methods: 1) the probabilistic context-free grammar (PCFG)-based password guessing method and 2) the Markov model-based password guessing method. 
For the PCFG-based method, the guessing efficiency increases by up to 48% after inserting Pinyins (about 2.3% more entries) into the attack dictionary and inserting the observed composition rules into the guessing rule set. For the Markov model-based method, the guessing efficiency increases by up to 4.7% after we increase the percentage of Pinyins in the training set. Our research sheds light on understanding the impact of regional patterns on passwords.
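The composition findings in the abstract (digits versus letters) and the "composition rules" fed to a PCFG-based guesser can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the letter/digit/symbol template notation commonly used in PCFG password work (for example `L6D4` for six letters followed by four digits), and the function names `char_class`, `base_structure`, `structure_counts`, and `digit_only_share` are hypothetical, as are the sample passwords.

```python
import string
from collections import Counter
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to an assumed PCFG-style class: L, D, or S."""
    if ch in string.ascii_letters:
        return "L"  # ASCII letter
    if ch in string.digits:
        return "D"  # ASCII digit
    return "S"      # anything else is treated as a symbol


def base_structure(password: str) -> str:
    """Collapse runs of the same class into a template such as 'L6D4'."""
    return "".join(
        f"{cls}{len(list(run))}"
        for cls, run in groupby(password, key=char_class)
    )


def structure_counts(passwords) -> Counter:
    """Count how often each template occurs in a password list."""
    return Counter(base_structure(p) for p in passwords)


def digit_only_share(passwords) -> float:
    """Fraction of passwords made up entirely of ASCII digits."""
    passwords = list(passwords)
    if not passwords:
        return 0.0
    digit_only = sum(
        1 for p in passwords if p and all(c in string.digits for c in p)
    )
    return digit_only / len(passwords)


if __name__ == "__main__":
    # Made-up sample passwords, used only to exercise the functions.
    sample = ["woaini1314", "123456", "password1", "iloveyou"]
    print(structure_counts(sample))
    print(digit_only_share(sample))
```

Comparing the template counts and the digit-only share across two corpora is one simple way to surface the kind of regional difference the abstract reports; a real PCFG guesser would additionally attach probabilities to the templates and to the strings that fill each segment.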
Funding Information
- National Natural Science Foundation of China (61572136, 61472358)
- Key Laboratory of Information Network Security, Ministry of Public Security through an open project
- National Key Science and Technology Program