This month’s feature is contributed by Annabelle Lukin (Macquarie University) and Rodrigo Araújo e Castro (Universidade Minas Gerais/Macquarie University). They introduce a new corpus, based on the key texts of international war law, which can now be searched using corpus linguistics techniques. This corpus enables critical law scholars and linguists to collaborate on studies of c.170 years of international war law, informed by both legal and linguistic theories and methods.
Every text makes its mark on the world. Some marks are like a small ripple on a pond, while others, like the texts of international war law, are like powerful waves, as they set the legal framework for the use of lethal violence by nation states. These texts construct a semiotic universe in which, among other extreme forms of human behaviour, the killing of children can either be given legal imprimatur, or can be labelled a ‘war crime’.
To enable better interdisciplinary collaboration on these key texts, we have created the Macquarie Laws of War Corpus (MQLWC), based on the texts included by the International Committee of the Red Cross (ICRC) in their International Humanitarian Law Database.
The MQLWC is hosted by the Sydney Corpus Lab. It begins with the 1856 Paris Declaration Respecting Maritime Law, the first open-ended multilateral treaty to which any state could become a party. The most recent document is the latest amendment to the Rome Statute (2019), the legal instrument which established the International Criminal Court, the body with responsibility for trying individuals charged with war crimes, crimes against humanity, crimes of aggression, and genocide.
Figure 1: a sample of concordance lines for “civilian” in the MQLWC
The corpus includes a total of 110 texts, nearly 392,000 words, and can be searched using basic corpus linguistic techniques such as word frequencies (see Table 1 for the top 20 lexical items in the MQLWC), text dispersion, concordances and collocations. The corpus can also be searched by the categories to which the ICRC assigns these documents, such as ‘victims of armed conflicts’, ‘methods and means of warfare’ and ‘criminal repression’. The data set can also be downloaded for use in other programs, such as #LancsBox or Voyant Tools.
Table 1: Twenty most frequent lexical (content) items in the MQLWC
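For readers who download the data set and want to try these searches outside a dedicated tool, the frequency and concordance techniques mentioned above can be sketched in a few lines of Python. This is only a minimal illustration: the sample sentences, the stopword list and the function names are our own assumptions, written for this sketch rather than taken from the MQLWC or from any particular corpus tool.

```python
import re
from collections import Counter

# Hypothetical sample sentences written for this sketch; not taken from the MQLWC.
SAMPLE_TEXT = (
    "The civilian population shall be protected. "
    "Attacks against the civilian population are prohibited. "
    "Military objectives may be attacked."
)

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "shall", "be", "are", "may", "against"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_word_frequencies(tokens, stopwords=STOPWORDS):
    """Count every token that is not in the stopword list."""
    return Counter(t for t in tokens if t not in stopwords)


def concordance(tokens, node, window=3):
    """Return keyword-in-context lines: up to `window` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


tokens = tokenize(SAMPLE_TEXT)
freqs = content_word_frequencies(tokens)
print(freqs.most_common(2))
for line in concordance(tokens, "civilian"):
    print(line)
```

A real study would replace the toy stopword list with a principled definition of lexical (content) items, and would read the downloaded texts from disk instead of a hard-coded string.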
Because the data is tagged by year of adoption, diachronic questions (i.e., how do patterns change over the time period of almost 170 years of the data?) can be asked. Figure 2, using Voyant Tools, compares the words ‘military*’ and ‘civilian*’ (with the asterisk denoting that the search includes all related word forms, e.g., ‘civilian/s’), and shows the relative dominance of ‘military’ over ‘civilian’ in international war law, and that ‘civilian’ grows as a preoccupation of international war law over time.
Figure 2: Comparing ‘military*’ and ‘civilian*’ across the timeline of international war law
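The kind of diachronic comparison described above can be approximated with a short script. The sketch below assumes each document is available as a (year, text) pair; the three sample texts are invented for illustration and are not MQLWC data. The wildcard search is simplified to a prefix match, and counts are normalised per 1,000 tokens so that documents of different lengths can be compared.

```python
import re

# Hypothetical (year, text) pairs written for this sketch; not MQLWC data.
DOCS = [
    (1907, "Military forces and military authority govern the occupied territory."),
    (1949, "Civilian persons and civilians in military hands shall be protected."),
    (1977, "The civilian population and individual civilians shall enjoy protection."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency(tokens, prefix, per=1000):
    """Occurrences of tokens starting with `prefix`, normalised per `per` tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.startswith(prefix))
    return hits * per / len(tokens)


def timeline(docs, prefixes):
    """Return one (year, {prefix: rate}) row per document, in chronological order."""
    rows = []
    for year, text in sorted(docs):
        tokens = tokenize(text)
        rows.append((year, {p: relative_frequency(tokens, p) for p in prefixes}))
    return rows


trend = timeline(DOCS, ["military", "civilian"])
for year, rates in trend:
    print(year, {p: round(r, 1) for p, r in rates.items()})
```

With real data it would usually make sense to group documents by decade or period before normalising, since a single short treaty can distort the picture.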
Collocational searches allow us to see the typical words that accompany key words in these texts. Collocates are crucial to understanding the meaning of a word, and the way it is being used in a particular register. With a program like #LancsBox, we can visualise the collocates of a word, and investigate whether it has proximity to another key word. Figure 3 compares two words, ‘violence’ (on the left) and ‘war’ (on the right). The diagram shows how distinct these two words are in this corpus, with ‘war’ being a clearly dominant concept, and ‘violence’ being kept at a distance from ‘war’ – a finding that echoes studies of other data. Despite what should be a logical association, we continue to use ‘war’ in a way that protects it from the negative semantics of ‘violence’.
Figure 3: Collocations of ‘violence’ and ‘war’ in the MQLWC
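The underlying idea of a collocation search can be illustrated with a simple window-based count. The sketch below is a deliberate simplification: it uses raw co-occurrence counts instead of the statistical association measures that dedicated tools typically offer, its window ignores sentence boundaries, and its sample sentences are invented for illustration and are not quoted from the MQLWC.

```python
import re
from collections import Counter

# Hypothetical sample sentences written for this sketch; not quoted from the MQLWC.
SAMPLE_TEXT = (
    "Prisoners of war shall be protected against acts of violence. "
    "The laws of war bind the parties. "
    "Violence to life and person is prohibited."
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens either side of `node`.

    Simplification: the window is applied to the flat token stream,
    so it can cross sentence boundaries.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts


tokens = tokenize(SAMPLE_TEXT)
war = collocates(tokens, "war")
violence = collocates(tokens, "violence")

# Collocates shared by both node words hint at how closely the two are linked.
shared = set(war) & set(violence)
print(war.most_common(3))
print(violence.most_common(3))
print(sorted(shared))
```

In this toy example the only shared collocates are function words and ‘laws’, which is the sort of pattern one would inspect more rigorously, with proper association statistics, in the full corpus.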
To find out more, read our recently published paper where we give examples of how this corpus can be used to understand the powerful role of the laws of war, not only in restraining geopolitical violence, but also in very clear ways enabling and legitimating it.