Data Harvesting

Most of the data on this site was gathered using the method depicted in the diagram. The paratextual search method [1] discovers fiction by searching for the words that commonly surround and introduce stories in historical Australian newspapers, such as “our novelist”, “serial story”, and “chapter”. Trove’s API (Application Programming Interface) was used to query the National Library of Australia’s digitised newspapers [2], and the results were checked against records already in the database to exclude duplicates [3]. Batches of records were then exported as CSV files [4] and processed to remove irrelevant or duplicated results [5] before being fed back into the database [6], where an editable interface [7] enabled further editing, correction and testing of the data [8].

The “Curated Dataset” [9] is a subset of around 9,200 titles that were published in nineteenth-century Australian newspapers and are “extended” (either serialised, or comprising 10,000 or more words in a single newspaper edition). The titles in the curated dataset were subjected to particularly intensive checking and analysis, and form the basis for Katherine Bode’s book, A World of Fiction.
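
The duplicate check [3], the CSV export [4] and the “extended” criterion used for the curated dataset can be illustrated with a minimal Python sketch. It is not the project's actual code: the record fields (`id`, `title`, `word_count`, `serialised`) and the function names are assumptions made for illustration, and the sketch works on records already held in memory rather than calling Trove's API.

```python
import csv
import io

# Example paratextual search terms named in the text above.
PARATEXT_TERMS = ["our novelist", "serial story", "chapter"]


def exclude_known(results, known_ids):
    """Keep only records whose id is neither in the database nor repeated in the batch."""
    seen = set(known_ids)
    fresh = []
    for record in results:
        if record["id"] in seen:
            continue  # already stored, or already seen earlier in this batch
        seen.add(record["id"])
        fresh.append(record)
    return fresh


def to_csv(records, fieldnames=("id", "title", "word_count", "serialised")):
    """Serialise a batch of records as CSV text (hypothetical column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def is_extended(record, min_words=10_000):
    """Apply the 'extended' criterion: serialised, or at least min_words in one edition."""
    return bool(record["serialised"]) or record["word_count"] >= min_words
```

In this sketch, `exclude_known(results, known_ids)` would be applied to each batch of search results before `to_csv` writes it out, and `is_extended` would then be used to pick candidate titles for the curated subset. The manual checking and correction described above is not modelled here.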