

Annotation

The XML markup covers the linguistic levels of description and individual grammar topics, as well as all kinds of judgemental statements on language use.

The XML structure is defined by a documented RELAX NG schema that is provided alongside the corpus files.

The idea behind the annotation of this corpus is that we preserve only relevant information instead of trying to copy the grammar book's layout as closely as possible. For instance, we mark italicised elements if they have a function in the grammar, e.g. emphasis or highlighting. Lists, tables and tree diagrams are rebuilt as structural elements, preserving the logic behind them rather than their layout.

The planned annotation of references to other authors and grammar books will help us make the networks among them visible. Furthermore, the markup of judgemental statements will illustrate a grammar book's degree of prescriptivism and might also hint at its potential for innovation.

Attributes of elements provide additional information that is not part of the text itself, e.g. the hierarchy of headings and the different kinds of paragraphs.

Inline Elements

"bold", "italic", "underline" - Font Styling

<bold>Text</bold>
<italic>Text</italic>
<underline>Text</underline>

"footnote" - Footnote

There is a footnote* here.

*This is an addition.
There is a footnote <footnote indicator="Asterisk">This is an addition.</footnote> here.

Footnotes are inserted where the footnote indicator (e.g. Asterisk or Dagger) occurs. If a footnote continues across a page break, the page break is omitted inside the footnote because it is already recorded in the main text and should not be duplicated.
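
Because footnotes sit inline at the position of their indicator, a reader of the corpus may want to separate them from the running text again. The following is a minimal sketch of how this could be done with Python's standard library; the wrapping "paragraph" element and the function name split_footnotes are assumptions made for illustration, not part of the corpus tooling.

```python
import xml.etree.ElementTree as ET

# Footnote example from above, wrapped in a <paragraph> (wrapper assumed for illustration).
SAMPLE = (
    '<paragraph>There is a footnote '
    '<footnote indicator="Asterisk">This is an addition.</footnote>'
    ' here.</paragraph>'
)

def split_footnotes(elem):
    """Return (running text, [(indicator, footnote text), ...]).

    Only footnotes that are direct children of elem are separated out;
    footnotes nested inside other inline elements are not handled here.
    """
    main = elem.text or ""
    notes = []
    for child in elem:
        if child.tag == "footnote":
            notes.append((child.get("indicator"), "".join(child.itertext())))
        else:
            main += "".join(child.itertext())
        main += child.tail or ""
    # Collapse the double space left where the footnote was removed.
    return " ".join(main.split()), notes

main_text, footnotes = split_footnotes(ET.fromstring(SAMPLE))
print(main_text)
print(footnotes)
```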


Structural Elements

"list" - Lists

Lists can either be “simple”, “bulleted”, or “numbered”. If the list is “numbered”, the label element is required.

<list rend="numbered">
    <head>Parts of Speech</head>
    <label>1</label><item>A Noun is a Name.</item>
    <label>2</label><item>A Verb is a Telling Word.</item>
    <label>3</label><item>An Adjective is a Noun-Marking Word.</item>
    <label>4</label><item>An Adverb is a Modifying Word.</item>
    <label>5</label><item>A Preposition is a Noun-Connecting Word.</item>
    <label>6</label><item>A Conjunction is a Sentence-Connecting Word.</item>
    <label>7</label><item>A Pronoun is a For-Name.</item>
</list>
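
The rule that numbered lists require labels can be checked mechanically. The sketch below assumes, as in the example above, that each label directly precedes its item; the function name unlabelled_items and the shortened sample data are hypothetical, and the authoritative constraint remains the RELAX NG schema.

```python
import xml.etree.ElementTree as ET

# Shortened version of the list example above.
LIST_XML = """
<list rend="numbered">
    <head>Parts of Speech</head>
    <label>1</label><item>A Noun is a Name.</item>
    <label>2</label><item>A Verb is a Telling Word.</item>
</list>
"""

def unlabelled_items(list_elem):
    """Return the text of every item in a numbered list with no label directly before it."""
    if list_elem.get("rend") != "numbered":
        return []  # labels are only required for numbered lists
    missing = []
    previous = None
    for child in list_elem:
        if child.tag == "item" and (previous is None or previous.tag != "label"):
            missing.append(child.text)
        previous = child
    return missing

good = ET.fromstring(LIST_XML)
# Hypothetical faulty list: numbered, but the item has no label.
bad = ET.fromstring('<list rend="numbered"><item>A Noun is a Name.</item></list>')

print(unlabelled_items(good))  # []
print(unlabelled_items(bad))   # ['A Noun is a Name.']
```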

"table" - Table

<table cols="2" rows="2">
    <row>
        <cell><small_caps>Adjectives : Nouns</small_caps></cell>
        <cell><small_caps>Adverbs : Verbs</small_caps></cell>
    </row>
    <row>
        <cell><small_caps>Prepositions : Nouns</small_caps></cell>
        <cell><small_caps>Conjunctions : Verbs</small_caps></cell>
    </row>
</table>

"toc" - Table of Contents

A table of contents consists of a number of entries, each of which has a level, indicating its hierarchy, a section name, and an optional page number.

<paragraph>
    <toc>
        <entry level="1">
            <section_name>Introduction</section_name>
            <page_no>5</page_no>
        </entry>
        <entry level="1">
            <section_name>Etymology</section_name>
            <page_no>10</page_no>
        </entry>
            <entry level="2">
                <section_name>Nouns</section_name>
                <page_no>12</page_no>
            </entry>
        <entry level="1">
            <section_name>Syntax</section_name>
            <page_no>20</page_no>
        </entry>
    </toc>
</paragraph>
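
Since the entries are stored as a flat sequence, the hierarchy has to be recovered from the level attribute. The following sketch shows one way to turn the entries into an indented outline; the function name toc_outline is hypothetical, and the page number of the last entry has been dropped from the sample to illustrate that it is optional.

```python
import xml.etree.ElementTree as ET

# Condensed version of the toc example above; "Syntax" has no page_no on purpose.
TOC_XML = """
<toc>
    <entry level="1"><section_name>Introduction</section_name><page_no>5</page_no></entry>
    <entry level="1"><section_name>Etymology</section_name><page_no>10</page_no></entry>
    <entry level="2"><section_name>Nouns</section_name><page_no>12</page_no></entry>
    <entry level="1"><section_name>Syntax</section_name></entry>
</toc>
"""

def toc_outline(toc_elem):
    """Return one line per entry, indented by two spaces per level below 1."""
    lines = []
    for entry in toc_elem.iter("entry"):
        level = int(entry.get("level"))
        name = entry.findtext("section_name")
        page = entry.findtext("page_no")  # None if the entry has no page number
        indent = "  " * (level - 1)
        lines.append(f"{indent}{name}" + (f" ({page})" if page else ""))
    return lines

outline = toc_outline(ET.fromstring(TOC_XML))
print("\n".join(outline))
```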

"tree" - Tree Diagrams

The arity indicates the maximum number of children a node can have. The child nodes can be ordered, partially ordered, or unordered.

<tree arity="2" ord="false" order="2">
    <root ref_node="R" children="#C1 #C2">
        <label>Names of Nouns</label>
    </root>
    <leaf ref_node="C1" parent="#R">
        <label>Names of Things</label>
    </leaf>
    <leaf ref_node="C2" parent="#R">
        <label>Names of Notions</label>
    </leaf>
</tree>
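
The arity constraint and the cross-references between nodes lend themselves to a simple consistency check. The sketch below assumes, based only on the example above, that values such as "#C1" point to the node whose ref_node is "C1"; the function name check_tree is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tree example from above.
TREE_XML = """
<tree arity="2" ord="false" order="2">
    <root ref_node="R" children="#C1 #C2"><label>Names of Nouns</label></root>
    <leaf ref_node="C1" parent="#R"><label>Names of Things</label></leaf>
    <leaf ref_node="C2" parent="#R"><label>Names of Notions</label></leaf>
</tree>
"""

def check_tree(tree_elem):
    """Return a list of problems: arity violations and unresolved node references."""
    arity = int(tree_elem.get("arity"))
    # Index all nodes (root, leaf, ...) by their ref_node identifier.
    nodes = {node.get("ref_node"): node for node in tree_elem}
    problems = []
    for node_id, node in nodes.items():
        children = [c.lstrip("#") for c in node.get("children", "").split()]
        if len(children) > arity:
            problems.append(f"{node_id}: {len(children)} children exceed arity {arity}")
        parent = node.get("parent")
        refs = children + ([parent.lstrip("#")] if parent else [])
        problems += [f"{node_id}: unknown node {r}" for r in refs if r not in nodes]
    return problems

ok_problems = check_tree(ET.fromstring(TREE_XML))
# Hypothetical faulty variant: the same tree declared with arity 1.
bad_problems = check_tree(ET.fromstring(TREE_XML.replace('arity="2"', 'arity="1"')))
print(ok_problems)
print(bad_problems)
```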



Evaluative and Normative Utterances

The "judgement" element marks evaluative and normative statements by the author about other authors, grammars, society, etc. It can enclose single words, phrases, or sentences.

<judgement tendency="positive" type="praise" addressee_explicit="Kigan, John" addressee_implicit="Kigan, John">Appraisal</judgement>



References and Quotations

"quotation" - Quotations

The attribute “source_added” indicates whether “title” or “author” has been added by the editors.

<quotation author="Surname, First name" title="Title" source_added="0">A quotation</quotation>

"reference" - References

Indicates a referenced author. The element encapsulates the author's name and title.

<reference referenced="Referenced Author" referencing="Referencing Author" type="dedication" judgemental="0" source="Source (Year)">Dedication</reference>



Miscellaneous

"ed_note" - Editor's Note

Attributes: type = addition | correction | omission | note

An editor's note provides additional information given by the editor. It can be a note, an addition, an omission, or a correction (e.g. of a typo).

For corrections, the corrected form stands in the running text and the uncorrected form from the original is placed inside the element, so that the correct form is the one that is counted.

if they have loved <ed_note type="correction">love</ed_note>
<!-- In this case, the wrong spelling "love" in the original is corrected to "loved". -->
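
To illustrate why the uncorrected form is placed inside the element, the following sketch counts words while skipping the content of correction notes. The wrapping "paragraph" element, the trailing word "him", and the function name countable_text are assumptions added for illustration; how the other ed_note types should be treated when counting is not covered here.

```python
import xml.etree.ElementTree as ET

# Correction example from above; wrapper element and trailing "him" are illustrative.
SAMPLE = (
    '<paragraph>if they have loved '
    '<ed_note type="correction">love</ed_note> him</paragraph>'
)

def countable_text(elem):
    """Collect the running text, skipping the content of correction notes."""
    parts = [elem.text or ""]
    for child in elem:
        is_correction = child.tag == "ed_note" and child.get("type") == "correction"
        if not is_correction:
            parts.append(countable_text(child))
        # Text following the child belongs to the running text in either case.
        parts.append(child.tail or "")
    return "".join(parts)

words = countable_text(ET.fromstring(SAMPLE)).split()
print(words)  # ['if', 'they', 'have', 'loved', 'him']
```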

"graphic" - Graphic Elements

Complex graphic elements that cannot be fully represented in XML are supplied as digital images. The “desc” attribute contains a written description of the graphic element and serves as a fallback for systems that are not capable of displaying digital images.

<graphic image_id="1" url="images/0_1.jpg" desc="Exercise 36. Prepositions with Sheperds" filetype="jpg"/>
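
The fallback role of the description can be sketched as follows. The function name render_graphic, the can_show_images flag, and the HTML and bracketed output formats are all hypothetical; they only illustrate the choice between image and description.

```python
import html
import xml.etree.ElementTree as ET

# Graphic example from above (description text kept exactly as given).
GRAPHIC_XML = (
    '<graphic image_id="1" url="images/0_1.jpg" '
    'desc="Exercise 36. Prepositions with Sheperds" filetype="jpg"/>'
)

def render_graphic(graphic_elem, can_show_images):
    """Return an HTML image tag, or the written description as a text fallback."""
    desc = graphic_elem.get("desc", "")
    if can_show_images:
        url = html.escape(graphic_elem.get("url", ""))
        return f'<img src="{url}" alt="{html.escape(desc)}"/>'
    return f"[Graphic: {desc}]"

graphic = ET.fromstring(GRAPHIC_XML)
print(render_graphic(graphic, can_show_images=True))
print(render_graphic(graphic, can_show_images=False))
```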