Abstract
Many education professionals in Britain believe that school pupils have difficulty accessing academic texts because of inadequate vocabulary knowledge. Previous research has suggested that some high-frequency words used in non-specialised contexts also have academic meanings that can cause problems for school pupils. We take corpus techniques used in the study of higher education texts and apply them to a corpus of texts designed for school pupils aged 11 to 14, attempting to identify such words automatically, with the Spoken BNC2014 as a reference corpus. We identify a list of semi-technical words (Baker, 1988), many of which are polysemous, having everyday meanings and related school subject meanings that may not be familiar to pupils. We investigate how semi-technical vocabulary can be identified and distinguished from both specialised and general vocabulary. Supplementary qualitative analysis, using collocation and concordance analysis, was also needed. Although this analysis is time-consuming, the potential benefits for pupils struggling with school language make it a worthwhile exercise.