Linguistic Economy Applied to Programming Language Identifiers

Open Access

1 January 2021

journal article
research article
Published by Scientific Research Publishing, Inc. in Journal of Software Engineering and Applications

Vol. 14 (01), 1-10
https://doi.org/10.4236/jsea.2021.141001

Abstract

Although many readability metrics have been proposed, there is still no universal agreement on how to define the readability of software source code. This lack of agreement has ramifications in many areas of the software development life cycle, not least software maintainability. We propose a measurement based on Linguistic Economy to bridge the gap between the mathematical and behavioral aspects of readability. Linguistic Economy describes efficiencies of speech and is generally applied to natural languages. In our study, we create a large corpus of words that are likely to be found in a programmer’s vocabulary and a corpus of existing identifiers found in a collection of open-source projects. We perform a usage analysis on both corpora to create a database. Linguistic Economy suggests that words requiring less effort to speak are used more often than words requiring more effort. We apply this concept to measure how difficult program identifiers are to understand by extracting them from the program source and comparing their usage to the database. Through this process, we can identify source code that programmers find difficult to review. We validate our work using data from a survey in which programmers identified source files that were unpleasant to review. The results indicate that source files identified as unpleasant to review have more linguistically complicated identifiers than files identified as pleasant.
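
The pipeline described in the abstract (extract identifiers, split them into words, compare word usage against a database) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give their scoring formula, so the negative log-frequency cost used here is an assumption, and the `USAGE_COUNTS` table and all function names are hypothetical stand-ins for the usage database built from the two corpora.

```python
import math
import re
from collections import Counter

# Hypothetical toy usage database (word -> occurrence count). The paper builds
# its database from a programmer-vocabulary corpus and open-source identifiers.
USAGE_COUNTS = Counter(
    {"get": 900, "user": 400, "name": 500, "index": 300, "buffer": 120, "reify": 2}
)

# Simplification: matches every identifier-like token, including language
# keywords and words inside strings or comments.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Matches acronym runs ("HTTP") or capitalized/lowercase words ("Server", "get").
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def extract_identifiers(source):
    """Return identifier-like tokens in the order they appear in the source."""
    return IDENTIFIER_RE.findall(source)


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    return [word.lower() for word in WORD_RE.findall(identifier)]


def word_cost(word, counts=USAGE_COUNTS):
    """Assumed cost: negative log of the add-one-smoothed relative frequency.

    Rarely used words get a higher cost than frequently used ones.
    """
    total = sum(counts.values()) + len(counts) + 1
    return -math.log((counts.get(word, 0) + 1) / total)


def identifier_cost(identifier, counts=USAGE_COUNTS):
    """Average word cost of an identifier; 0.0 if it contains no words."""
    words = split_identifier(identifier)
    if not words:
        return 0.0
    return sum(word_cost(word, counts) for word in words) / len(words)


if __name__ == "__main__":
    for name in extract_identifiers("reifyBuffer = getUserName(index)"):
        print(f"{name}: {identifier_cost(name):.2f}")
```

Under these assumptions, an identifier built from rarely used words (such as the made-up `reifyBuffer`) scores higher than one built from common words (such as `getUserName`). A file-level score could then be some aggregate of its identifier costs, though how the paper aggregates is not stated in the abstract.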
