Identifying and describing functional discourse units in the BNC Spoken 2014

Abstract
On the surface, it appears that conversational language is produced in a stream of spoken utterances. In reality conversation is composed of contiguous units that are characterized by coherent communicative purposes. A large number of important research questions about the nature of conversational discourse could be addressed if researchers could investigate linguistic variation across functional discourse units. To date, however, no corpus of conversational language has been annotated according to functional units, and there are no existing methods for carrying out this type of annotation. We introduce a new method for segmenting transcribed conversation files into discourse units and characterizing those units based on their communicative purposes. In this paper, the development and piloting of this method is described in detail and the final framework is presented. We conclude with a discussion of an ongoing project where we are applying this coding framework to the British National Corpus Spoken 2014.