A Procedure for Extending Input Selection Algorithms to Low Quality Data in Modelling Problems with Application to the Automatic Grading of Uploaded Assignments

Open Access

7 July 2014

journal article
research article
Published by Hindawi Limited in The Scientific World Journal

Vol. 2014, 1-11
https://doi.org/10.1155/2014/468405

Abstract

When relevant inputs are selected in modeling problems with low quality data, the ranking of the most informative inputs is itself uncertain. This paper addresses the issue through a new procedure that extends different crisp feature selection algorithms to vague data. The partial knowledge about the ordinal position (rank) of each feature is modeled by means of a possibility distribution, and a ranking method is then applied to sort these distributions. It will be shown that this technique makes the most of the available information in some vague datasets. The approach is demonstrated in a real-world application. In the context of massive online computer science courses, methods are sought for automatically assigning the student a grade on the basis of code metrics. Feature selection methods are used to find the metrics involved in the most meaningful predictions. In this study, 800 source code files, collected and revised by the authors in classroom Computer Science lectures taught between 2013 and 2014, are analyzed with the proposed technique, and the most relevant metrics for the automatic grading task are discussed.
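
The general idea of carrying a crisp feature ranking over to vague data can be illustrated with a minimal sketch. It is not the procedure of the paper, whose details the abstract does not give. The sketch assumes that vague data are given as intervals, that the crisp ranker is the absolute Pearson correlation with the target, that the possibility of a rank is approximated by its sampled frequency divided by the highest frequency, and that the distributions are sorted by their centroid. All function names are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Crisp relevance score (assumed here): absolute Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def crisp_ranks(columns, target):
    """Return the rank of each feature (0 = most relevant) on crisp data."""
    scores = [abs_correlation(col, target) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -scores[j])
    ranks = [0] * len(columns)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def rank_possibilities(interval_columns, target, n_samples=200, seed=0):
    """Approximate a possibility distribution over ranks for each feature.

    Each feature column is a list of (low, high) intervals. Crisp datasets
    compatible with the intervals are sampled, ranked with the crisp method,
    and the rank frequencies are normalised so that the maximum is 1.
    This normalisation is an assumption of the sketch.
    """
    rng = random.Random(seed)
    n_features = len(interval_columns)
    counts = [[0] * n_features for _ in range(n_features)]
    for _ in range(n_samples):
        sample = [[rng.uniform(lo, hi) for lo, hi in col]
                  for col in interval_columns]
        for j, r in enumerate(crisp_ranks(sample, target)):
            counts[j][r] += 1
    return [[c / max(row) for c in row] for row in counts]


def order_features(possibilities):
    """Sort features by the centroid of their rank distribution (assumed)."""
    def centroid(pi):
        return sum(r * p for r, p in enumerate(pi)) / sum(pi)
    return sorted(range(len(possibilities)),
                  key=lambda j: centroid(possibilities[j]))
```

Calling `rank_possibilities` on a list of interval-valued columns and passing the result to `order_features` yields an ordering of feature indices. Any other crisp ranker, such as one based on mutual information, could replace `abs_correlation` without changing the rest of the sketch.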

Funding Information

Spanish Ministerio de Economia y Competitividad (TIN2011-24302)
