HeidelGram: A network of evaluative terms in 19th-century British grammars – Methodological challenges and practical solutions

Beatrix Busse and Ingo Kleiber


The HeidelGram project has a twofold aim. Firstly, it makes an essential contribution to historical grammar studies by compiling, analysing, and giving open access to a representative 10-million-word corpus of historical English grammar books from the 16th to the 19th centuries. Secondly, it introduces state-of-the-art network analysis into diachronic corpus linguistics, thereby considerably extending the set of concepts and methods applied in historical linguistics. Our overall aim is to examine discourses in English grammar writing by implementing and analysing three exemplary networks: a network of grammars and grammarians, a network of evaluative terms associated with verbal hygiene (Cameron 2012 [1995]), and a network of lexemes referring to grammatical phenomena.

While network analytical methods have been applied to historical textual material (e.g. Bergs 2005; Sairio 2009; Fitzmaurice 2010) and fictional texts (e.g. Agarwal et al. 2012; Moretti 2013), the combination of corpus-based diachronic linguistics and network analysis is largely uncharted territory. This new approach poses significant methodological challenges and requires new ways of extracting, annotating, and analysing historical linguistic data. A series of exploratory studies (Busse et al. 2016a and 2016b; Busse and Gather 2016), based on a systematically compiled and representative corpus of 19th-century British grammar books (40 texts, approx. 2.6 million words), has already shown the potential of this approach for historical grammar studies. In the present paper we present initial findings regarding the network of evaluative terms and discuss some of the major methodological and technical challenges associated with this approach. Such evaluative terms include expressions like “greatly erred” in Crombie’s 1802 grammar: “Priestley, in defending the other phraseology, appears to me to have greatly erred” (Crombie 1802: 302).

This second network will not only help us to critically reflect upon the concepts of prescriptivism and descriptivism, but also to uncover linguistic practices and patterns that may have led to these discursive turns. Based on an extended and optimized version of our pilot corpus, containing the most well-known and widely distributed grammars of the 19th century (cf. Leitner 1986, 1991; Linn 2006; Michael 1987; Görlach 1998), we will begin to quantitatively investigate terms associated with verbal hygiene (Cameron 2012 [1995]), i.e. active practices of filtering, evaluating, and modifying normative language usage, and the relationships between these terms.

Furthermore, informed by this initial analysis, we will discuss three major challenges associated with historical corpus-based network analysis and potential strategies for mitigating them. First, we will discuss typical issues with optical character recognition (OCR) and state-of-the-art workflows, procedures, and tools, both automatic and manual, for reducing misreadings. Second, we will look at problems and solutions associated with automatically generating meaningful graphs (i.e. networks) out of unstructured and unannotated linguistic data. Finally, we will present an early approach to visualizing such graphs in a way that allows for visual diachronic analysis.
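To illustrate what generating a graph out of unannotated text can involve, the following minimal sketch builds a weighted co-occurrence network in Python. It is not the HeidelGram procedure: the seed list `EVALUATIVE_TERMS`, the function name `cooccurrence_edges`, and the choice of sentence-level co-occurrence as the linking criterion are all assumptions made for illustration, and the second and third sample sentences are invented.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical seed list of evaluative terms (an assumption, not project data).
EVALUATIVE_TERMS = {"erred", "improper", "correct", "vulgar", "elegant"}


def cooccurrence_edges(sentences, terms=EVALUATIVE_TERMS):
    """Count how often two evaluative terms occur in the same sentence.

    Returns a Counter mapping alphabetically ordered term pairs to weights,
    which can be read as the weighted edge list of an undirected graph.
    """
    edges = Counter()
    for sentence in sentences:
        # Naive tokenisation: lower-case alphabetic strings only.
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        found = sorted(tokens & terms)
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


sample = [
    # Quoted in the abstract (Crombie 1802: 302).
    "Priestley, in defending the other phraseology, appears to me to have greatly erred.",
    # Invented sentences, for illustration only.
    "This phrase is vulgar and improper.",
    "The improper form is vulgar; the other is correct.",
]

edges = cooccurrence_edges(sample)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (weight {weight})")
```

In this sketch a sentence containing only one listed term, such as the Crombie quotation, contributes a node but no edge. A real workflow would additionally need OCR correction, spelling normalisation, and multi-word expressions such as “greatly erred”, none of which are handled here.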


