When Henrietta Vaughan Stannard opened her novel Army Society: Life in a Garrison Town with the question “What elements constitute army society?”, she answered that there are three: mothers, daughters and the army. A word cloud of the novel’s text reveals that there was an awful lot of “Oh!”, “Yes” and “Mrs” going on! In this blog post, Michelle Crowther, Learning and Research Librarian, finds out more about the novel using distant reading techniques.
Arrival of the Collection
When new collections arrive at the university special collections and archives, I’m always excited to find out more about them and to encourage researchers to come and use them. However, the size and breadth of some collections can be overwhelming, and a quick tweet is sometimes all I have time for. When the Henrietta Vaughan Stannard (pseud. John Strange Winter) books arrived, I decided to let Voyant Tools, an open-source text analysis tool, do the work for me. There are other methods for analysing large corpora, but Voyant is simple and effective: it lets you see through the text, visualize it as a whole and carry out keyword-in-context analysis. This approach is known as ‘distant reading’ or non-consumptive research, and it deploys different strategies from the more traditional close reading and interpretation of a text.
Finding the books
To begin, I had to find machine-readable copies of Stannard’s work because, although visually sumptuous, the CCCU collection of her works is print-based. I found 86 texts on Hathi Trust, a not-for-profit collaborative of academic and research libraries that has preserved more than 17 million digitized items. I wanted to profile the CCCU collection of Stannard’s work rather than her whole oeuvre, so I located the 70 titles from our collection that are available on Hathi Trust and downloaded them as text files. This was easy enough, as Hathi Trust allows digital download for non-consumptive research. However, the texts contain a lot of extraneous data or ‘noise’ that needs cleaning up, including copyright statements. The cleaned result is known as ‘smart data’ (Schöch, 2013) because it provides more accurate and trustworthy results. At this point, I did wonder whether simply reading one of her books would be quicker, since the human labour involved in cleaning data can be time-consuming. Text mining is not for the faint-hearted.
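Some of that clean-up can be automated. Below is a minimal Python sketch of line-based noise removal. It is not the method described above. The patterns in NOISE_PATTERNS are hypothetical examples, so you would need to inspect your own downloads and write patterns that match their actual copyright statements and headers.

```python
import re

# Hypothetical noise patterns: adapt them to the statements that actually
# appear in your downloaded files.
NOISE_PATTERNS = [
    re.compile(r"copyright", re.IGNORECASE),     # rights/copyright notices
    re.compile(r"digitized by", re.IGNORECASE),  # digitization stamps
    re.compile(r"https?://\S+"),                 # stray URLs
]


def clean_text(raw: str) -> str:
    """Drop any line matching a noise pattern, then tidy blank lines."""
    kept = [
        line.rstrip()
        for line in raw.splitlines()
        if not any(p.search(line) for p in NOISE_PATTERNS)
    ]
    cleaned = "\n".join(kept)
    # Collapse runs of blank lines left behind by removed lines.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


sample = (
    "ARMY SOCIETY.\nDigitized by Google\n\n\n\nCHAPTER I.\n"
    "http://example.org/x\nWhat elements constitute army society?\n"
)
print(clean_text(sample))
```

Blunt patterns like these will also drop any genuine line of the novel that happens to contain the word “copyright”. The output still needs a human check.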
Cleaning the data
I decided to start small by cleaning one text and uploading it to Voyant to see how the data performed. I chose Army Society: Life in a Garrison Town: A discursive story, as I was aware that Stannard was known for her regimental tales. There were a lot of publishers’ advertisements in the text file listing contemporaneous authors and their works, and it was hard to know whether to keep them or remove them as non-essential to the literary text. These sorts of decisions need to be made at the beginning of a research project because they can affect the data. This is why it’s important to have clear research objectives and questions. What exactly was I trying to find out? I thought I knew when I started out, i.e. that I was going to unmask Stannard at CCCU and delve into those densely-packed pages with minimal effort – hmmm!
Formulating a research question
Non-consumptive research involves computational analysis performed on one or more texts. Literally, it means you don’t read the books; you search them. Some researchers take this very seriously and believe that you shouldn’t look at the texts at all, only at the data output. I wasn’t sure that I could maintain that level of tunnel vision on this project, as I wanted to engage with the richness of the content to promote the collection more fully. Some theorists, such as Johanna Drucker (2016), would argue that it’s good to think critically about the results, and that it is an epistemological fallacy to trust results implicitly without a critical understanding of the texts the data has been extracted from. I agree. I prefer to see tools like Voyant as a brilliant way to analyse multiple texts simultaneously, looking for trends that can then inform and inspire more targeted close reading. After all, why would I want to know about a ‘bag of words’ without some wider context?
Using Voyant
I was able to paste my text file into the box and click Reveal. Another option is to paste an HTML file into the box. This works well with texts from Project Gutenberg, but Gutenberg has only 10 of Stannard’s works, so I decided that text files from Hathi Trust would be easier.
Voyant provides a dashboard of interpretive tools with clear explanations, although if it’s your first time conducting this type of research, you may want to read the help material first. The cirrus, or word cloud, is a convenient, easy-to-understand (although reductive) visualization of the most frequent words in the text or corpus (body of texts).
A word cloud or cirrus of Army Society reveals some interesting high-frequency words. It’s quite clear that a lot was SAID (259), some people KNOW (136) who were possibly LITTLE (192) and GOOD (174), a few DEAR (97) PEOPLE (85) THINK (97), and others CAME (89) and WENT (96). It sounds like it was a rollicking read.
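A word cloud is essentially a ranked frequency count with common function words filtered out. The sketch below shows the idea in Python. The tokenizer regex and the short STOP_WORDS set are simplifying assumptions, and Voyant applies its own, much longer stop-word list. Counts from this code will therefore not match Voyant’s exactly.

```python
import re
from collections import Counter

# Illustrative stop words only; real stop-word lists run to hundreds of entries.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "was", "he", "she", "it",
              "i", "you", "her", "his"}


def top_terms(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words, most frequent first."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


print(top_terms("Oh! said Mrs Trafford. Yes, said the colonel. Oh, said she.", 2))
```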
However, the Summary pane reveals that there are 63,292 words in the book and 6,943 unique word forms. So, proportionally, how often does the word SAID appear, and can it be argued that this novel is jam-packed with dialogue?
In reality, SAID accounts for only 0.4% of the words in the book, which feels like a drop in the ocean. Other verbs associated with dialogue include: SAY (98), ASKED (95), CRIED (55), TOLD (48), ANSWERED (45), CALLED (38), TELL (37), ASK (33), LAUGHED (24), EXCLAIMED (22), ADDED (21), REPEATED (13) and ANNOUNCED (11). If these are all used in the context of dialogue then, together with SAID, around 1.3% of the novel is devoted to words describing dialogue. I’d imagined the novel to be full of gossipy ‘he-said, she-said’; perhaps I was wrong.
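The proportions are simple arithmetic on the Summary figures. Here is a quick check in Python using the counts quoted above and the 63,292-word total, with SAID included in the dialogue total.

```python
TOTAL_WORDS = 63_292  # from Voyant's Summary pane

# Raw counts quoted above for SAID and the other dialogue verbs.
dialogue_verbs = {
    "said": 259, "say": 98, "asked": 95, "cried": 55, "told": 48,
    "answered": 45, "called": 38, "tell": 37, "ask": 33, "laughed": 24,
    "exclaimed": 22, "added": 21, "repeated": 13, "announced": 11,
}


def percent(count: int, total: int = TOTAL_WORDS) -> float:
    """Relative frequency of a count, as a percentage of all words."""
    return 100 * count / total


said_share = percent(dialogue_verbs["said"])
all_share = percent(sum(dialogue_verbs.values()))
print(f"SAID: {said_share:.2f}%  all dialogue verbs: {all_share:.2f}%")
```

The fourteen verbs together come to 799 words. The script prints about 0.41% for SAID and about 1.26% for all of them combined.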
Mothers, daughters and the army
The frequency of the words MRS (486), YOUNG (79), COLONEL (155) and ARMY (116) in the word cloud reflects the three elements of army society that Stannard outlines on her first page, revealing that her discursive story sticks to its brief. There are some mis-hits, and I regret not stripping out the publishers’ advertisements, as Mrs Annie Edwards, Mrs Lynn Linton and Mrs Pender Cudlip are adding noise to my search results. However, I can quickly establish that Mrs Hugh Antrobus, Mrs Trafford and Mrs Trelawney appear frequently in the text. There are plenty of other Mrs too, but as these three are mentioned most often, I decide they must be important to the plot.
To find out who the daughters are, I decide to look for the word MISS in the text. Miss Trafford, Miss Madge Trafford, Miss Laura Trafford and Miss Antrobus emerge. I have now developed a list of characters without having opened the book beyond the first page, but I know little about their actions and motivations. I was feeling slightly anxious that I was treating the text as though it were a shopping list. I had ticked off mothers and daughters; now I became obsessed with the army. Who was the enigmatic colonel who scored so highly in the cirrus?
By searching the Contexts skin, I discovered that COLONEL* (which includes all variations, e.g. Colonel’s) is mentioned 170 times in the novel. In fact, there is more than one colonel: Colonels COLES, URQUHART, DACRE and TRELAWNEY.
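A trailing wildcard like COLONEL* matches any word that begins with the given letters. Below is a rough Python equivalent using a regular expression. count_prefix is a hypothetical helper for illustration, not part of Voyant.

```python
import re
from collections import Counter


def count_prefix(text: str, prefix: str) -> Counter:
    """Count every word form starting with `prefix`, like a trailing wildcard."""
    # \b anchors the match at the start of a word; [\w'’]* takes the rest,
    # so Colonel, colonel's and Colonels all match a "colonel" prefix.
    pattern = re.compile(rf"\b{re.escape(prefix)}[\w'’]*", re.IGNORECASE)
    return Counter(match.lower() for match in pattern.findall(text))


forms = count_prefix("Colonel Coles met the colonel's wife and two Colonels.", "colonel")
print(forms, sum(forms.values()))
```

Grouping by word form shows how a single wildcard total is built from several spellings, which may explain why this search returns more hits than COLONEL alone does in the word cloud.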
I wanted to know more about these colonels, and also which other ranks of soldier are mentioned in the novel, as this would give me a better understanding of the interplay between characters. I did a keyword search for Colonel Coles in the Hathi Trust version of the text (cover your eyes and ears at this point, digital humanists, as I have broken the non-consumptive code) and found him described as “the most good-natured and inveterate gossip in the whole of the garrison.” That little bit of richness and connection with the text gave me a warm glow as I drank my coffee, before I returned to the data.
I searched for other army ranks in the Contexts skin. This skin allows you to dive straight into the sentence, seeing the words to the left and right of the keyword. There is one mention of lieutenant in the story, but this is no ordinary lieutenant: it refers to the patronage of the Lord-lieutenant for a theatre performance. I decided to look for high-ranking soldiers to find out more about army society à la Stannard. There are two major generals, three majors and 28 mentions of captains. I was getting the impression that this wasn’t a novel about the rank and file.
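The core idea behind keyword in context fits in a few lines. In this sketch, which uses a hypothetical kwic function and an arbitrary window of a few words, the text is split into word tokens and the function returns what sits to the left and right of each match.

```python
import re


def kwic(text: str, keyword: str, width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each match."""
    # Hyphens are not word characters here, so "Lord-lieutenant" becomes two tokens.
    tokens = re.findall(r"[\w'’]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


for left, key, right in kwic("Captain Orford laughed, and Captain Murphy said nothing at all.", "captain", 2):
    print(f"{left:>15} [{key}] {right}")
```

Because this tokenizer splits on hyphens, Lord-lieutenant yields a separate “lieutenant” token. That is exactly the kind of match that needs a human eye.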
Through the keyword in context skin, I discovered that there are in fact three captains: Captains Murphy, Orford and Dayrell. Aha! ORFORD is the fourth most frequent word in the text, with 216 mentions of his name, so he must be a key character. URQUHART is also high in the word cloud, so I’m guessing these two had a few exchanges… or is that totally wrong of me to say? Let’s find out.
Links
You can change tools within the dashboard to look for links between words. The Links tool allows you to see how words connect using a Collocates graph.
Keywords are shown in blue and collocates (words in proximity) are shown in orange. On the live site, you can hover over words to see how they connect. COLONEL collocates with ORFORD and ORFORD collocates with URQUHART. Quod erat demonstrandum (or am I drawing false conclusions?)
In essence, what I have asked the tool to do is look for collocates between these two characters. I haven’t asked to see collocates of other characters, so I am in danger of attributing too much significance to the relationship between these two characters at this stage.
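Collocates are typically computed by counting the words that appear within a fixed window on either side of each occurrence of the keyword, which is why a collocation count can understate a relationship. Here is a minimal sketch assuming a window of five words. That figure is an arbitrary choice; tools such as Voyant typically let you adjust the window. The collocates function is a hypothetical helper.

```python
import re
from collections import Counter


def collocates(text: str, keyword: str, window: int = 5) -> Counter:
    """Count words occurring within `window` tokens either side of `keyword`."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    key = keyword.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != key:
            continue
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        counts.update(t for t in context if t != key)
    return counts


text = "Orford and Urquhart dined. Much later, far away, the colonel spoke of Orford."
print(collocates(text, "Orford", window=2))
```

Anything further away than the window is never counted. In the example, “colonel” sits just outside both two-word windows around “Orford”.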
Terms
By clicking on the Terms tool in the first pane and selecting ORFORD, I can see the most frequent collocates, and URQUHART collocates with Orford only seven times. This seems quite low, so I decide to delve into the actual text on Hathi Trust to answer the question. It may look as if I am abandoning the non-consumptive research strategy and my digital journey is fizzling out like a damp squib. Truth be told, though, I’m not looking at Stannard as a statistic. I’m using distant reading techniques as a tool to verify research questions, not as a standalone research methodology.
It appears that Colonel Urquhart and Captain Orford have a good relationship:
In his way Colonel Urquhart was more attached to him [Orford] than to any other officer in his regiment, and when alone the two invariably relapsed into the familiar tone of the old days when they had been subalterns together. (77)
Their names are 21 words apart in this part of the text. Because collocates are counted only within a few words of the keyword, the two do not register statistically as a strong collocation, but in terms of plot their association is very strong.
A search of the Hathi Trust copy using a Boolean search Urquhart AND Orford retrieves 34 results (i.e. 34 pages where both names occur), with pages 163-164 having nineteen matching terms – Urquhart mentioned 11 times and Orford 8. There is clearly a lot going on between these guys.
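The Boolean page search works at a coarser grain than collocation: a page counts if both names appear anywhere on it. Here is a sketch that assumes the text has already been split into a list of page strings. pages_with_all is a hypothetical helper, and its plain substring matching is a simplification that would also match longer words containing a name.

```python
def pages_with_all(pages: list[str], *names: str) -> list[tuple[int, int]]:
    """Return (page number, total mentions) for pages containing every name."""
    results = []
    for number, page in enumerate(pages, start=1):
        lowered = page.lower()
        # Boolean AND: every name must appear somewhere on the page.
        if all(name.lower() in lowered for name in names):
            mentions = sum(lowered.count(name.lower()) for name in names)
            results.append((number, mentions))
    return results


pages = ["Urquhart spoke to Orford.", "Orford alone.", "Urquhart and Orford; Urquhart again."]
matches = pages_with_all(pages, "Urquhart", "Orford")
print(len(matches), max(matches, key=lambda m: m[1]))
```

Sorting or taking the maximum by mention count is how a page as dense as 163-164 would stand out.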
Trends
I decide to look for trends in the text using the Trends skin, which chunks the text into ten segments. These do not correspond to chapters, as there are in fact 22 chapters in the text. The Hathi Trust copy has 308 page scans (some of which will be the cover and title page, containing little text) and 63,292 words in total. Pages 163-4 are 53% of the way through the book, so, allowing for pages with less text, they most likely fall in segment 6 of the Trends chart.
Orford appears to be a more dominant character than Urquhart for most of the story. So, if Orford is one of the protagonists, what is he like? A skim through the keywords in context skin allows me to find out. He’s called Marcus, he’s the only son and heir, he laughs and persists a lot, and he speaks promptly. He sounds like a good egg. If I had time, I would tell you more about his relationship with Madge Trafford and how mothers, daughters and army society all come together in this delightful story (but then I’d have to read the book first, and you know I don’t have the time!)
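The segment arithmetic can be sketched as well. The first function below splits the tokens into ten roughly equal slices and reports the relative frequency of a term in each. It approximates what a trends view shows and is not Voyant’s own code. The second function maps a position in the book to a segment number, assuming text is spread evenly across the pages. Both function names are hypothetical.

```python
import re


def segment_trend(text: str, term: str, segments: int = 10) -> list[float]:
    """Relative frequency of `term` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    size = len(tokens) / segments
    trend = []
    for s in range(segments):
        chunk = tokens[round(s * size):round((s + 1) * size)]
        trend.append(chunk.count(term.lower()) / len(chunk) if chunk else 0.0)
    return trend


def segment_for_position(fraction: float, segments: int = 10) -> int:
    """1-based segment holding a point `fraction` of the way through the text."""
    return min(int(fraction * segments) + 1, segments)


print(segment_for_position(163 / 308))
```

Page 163 of 308 is about 0.53 of the way through, which lands in segment 6. Uneven pages, such as covers and chapter openings, can shift this by a segment in practice.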
What next?
When I complete my digital analysis, I hope to understand more about both the themes and the methods that Stannard used in her writing, and that will be one big, wonderful word cloud. My preliminary investigations have enabled me to undertake this with my eyes wide open. However, I won’t be ignoring the minutiae. After all, I wouldn’t be a literary scholar if I didn’t occasionally drop in and admire the texts as they were originally designed to be read, would I?
If you are interested in finding out more about the works of Henrietta Vaughan Stannard, please ask at the Library Point for access to the special collections.