Intention and Attention in Image-Text Presentations: A Coherence Approach

Abstract
In image-text presentations from online discourse, pronouns can refer to entities depicted in images, even if these entities are not otherwise referred to in a text caption. While visual salience may be enough to allow a writer to use a pronoun to refer to a prominent entity in the image, coherence theory suggests that pronoun use is more restricted. Specifically, language users may need an appropriate coherence relation between text and imagery to license and resolve pronouns. To explore this hypothesis and better understand the relationship between image context and text interpretation, we annotated an image-text data set with coherence relations and pronoun information. We find that pronoun use reflects a complex interaction between the content of the pronoun, the grammar of the text, and the relation of text and image.