CWGK Annotation Preview at DH2017

CWGK’s development partners at Brumfield Labs have been working on a series of NHPRC grants to develop MashBill, CWGK’s annotation management system. Through the work of NHPRC-funded Graduate Research Associates, CWGK has annotated approximately 1,200 documents to date, identifying over 8,000 unique people, places, organizations, and geographical features.

At the Digital Humanities 2017 conference in Montreal, Brumfield Labs presented a paper co-authored by CWGK staff, “Beyond Coocurrence: Network Visualization in the Civil War Governors of Kentucky Digital Documentary Edition.”

Read a full recap of the presentation here, complete with fascinating visualizations of CWGK annotation data drawn from MashBill. The recap was named an Editors’ Choice story by Digital Humanities Now in August 2017.

Annotating CWGK Documents with MashBill

CWGK is working with Brumfield Labs of Austin, Texas, to build an annotation and entity management system that will allow CWGK to locate, identify, and link together every person, place, organization, and geographical feature in every CWGK document. The annotation application, MashBill, has been live since February 2017, and CWGK staff and Graduate Research Associates working remotely from eight university campuses across the country have (as of April 2017) identified nearly 5,000 unique entities which appear over 8,000 times in nearly 700 CWGK texts.

CWGK published a preliminary plan for MashBill in the fall of 2016. Now that the system is up and running, this post walks through each step of the annotation process with screenshots.


The first step is to search for and select the assigned document on the CWGK website.

In the document view screen, the annotator activates a browser plugin called Hypothes.is, which enables annotation and commentary on any web page. All CWGK staff and GRAs are members of an invitation-only Hypothes.is group, whose annotations are collected and fed into the MashBill system.

The next step is to highlight each entity (person, place, organization, or geographical feature) at its first mention in the text of the document, select “Annotate” when the Hypothes.is icon appears above the text, and click “Post to CWGK”.

Once an annotator completes this process, they can click the Hypothes.is icon in the browser toolbar to review all of the highlighted entities.

The annotator then moves into MashBill itself, where each user sees a dashboard of their own previous work, a running tab of the latest work in the database, and search fields for finding an entity or document. The annotator uses these search fields to look up, by document number, the document they have just highlighted in Hypothes.is.

Each of the character strings highlighted in Hypothes.is appears on the MashBill document screen.
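The post doesn't say how MashBill receives the Hypothes.is highlights. As a minimal sketch, assuming MashBill reads them from the Hypothes.is search API (`GET https://api.hypothes.is/api/search`, which accepts `group` and `uri` query parameters), the highlighted character strings can be pulled from each annotation's `TextQuoteSelector`. The function name `highlighted_strings` and the group ID are hypothetical, and fetching the JSON is left out so the example runs offline.

```python
from typing import Any


def highlighted_strings(search_response: dict[str, Any], group_id: str) -> list[str]:
    """Collect the exact highlighted text of every annotation in one group.

    `search_response` is assumed to be the parsed JSON body of a Hypothes.is
    search request: a dict whose "rows" list holds annotation objects.
    """
    strings: list[str] = []
    for row in search_response.get("rows", []):
        # Skip annotations posted outside the (hypothetical) CWGK group.
        if row.get("group") != group_id:
            continue
        for target in row.get("target", []):
            for selector in target.get("selector", []):
                # TextQuoteSelector stores the highlighted characters verbatim.
                if selector.get("type") == "TextQuoteSelector":
                    strings.append(selector["exact"])
    return strings
```

Filtering on `group` mirrors the invitation-only CWGK group described above, so anything annotated in the public layer is ignored.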

The user selects “identify” to search the database for entity names which are at least a 30% match to the transcribed character string. This degree of proximity suggests likely matches, but still allows flexibility to account for name abbreviations, misspellings, and the use of titles to identify individuals.
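MashBill's actual similarity measure isn't described here. The sketch below swaps in Python's `difflib.SequenceMatcher.ratio()` as a stand-in, so “30% match” in this example means a ratio of at least 0.30 under that metric; `suggest_entities` is a hypothetical name, not part of MashBill.

```python
from difflib import SequenceMatcher


def suggest_entities(highlight: str, known_names: list[str],
                     threshold: float = 0.30) -> list[tuple[str, float]]:
    """Return known entity names scoring at least `threshold`, best first."""
    needle = highlight.casefold()  # ignore capitalization differences
    scored: list[tuple[str, float]] = []
    for name in known_names:
        # ratio() is 2*M/T: matched characters over the combined length.
        score = SequenceMatcher(None, needle, name.casefold()).ratio()
        if score >= threshold:
            scored.append((name, round(score, 2)))
    # Highest similarity first, so the likeliest match tops the list.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A threshold this low favors recall over precision: a bare surname such as “Bramlette” still clears it against a full name with a first name and middle initial, leaving the annotator to choose among the candidates.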

MashBill suggests known entities, but if the entity in question has not yet been added to the database, the annotator moves to the entity creation screen.

After research in approved, authoritative, and reliable sources, the annotator writes a short entity “biography”, fills out a bibliography section, marks up any textual features, such as italics and underlining, in Markdown, and fills in the metadata fields relevant to the entity type.


The annotator confirms the information is correct and creates the entity, which is automatically linked to the character string highlighted in Hypothes.is.

If an entity already exists in the MashBill database, the user simply chooses the correct entity from the suggested list and MashBill automatically links the entity record to the character string.

The annotator proceeds until all of the entities in the document have been identified, then clicks “Document Needs Reviewed”, which sends the document into the fact-checking queue.

Once another staff member has checked the work for accuracy and adherence to editorial style, the document will be marked complete, and MashBill will insert reference tags containing the unique identifier of each entity biography into the document's TEI-XML transcription stored in GitHub. These files will then be re-imported into the existing CWGK Omeka site along with the entity biographies, allowing hyperlinked navigation between text and biography.
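To illustrate the reference-tagging step, here is a minimal sketch that wraps annotated character ranges of a paragraph in TEI `<rs>` elements whose `ref` attribute points to an entity identifier. `<rs>` with `type` and `ref` is standard TEI, but whether CWGK's transcriptions use `<rs>` or more specific elements such as `<persName>` and `<placeName>`, and what its identifiers look like, are assumptions; `tag_paragraph` and the `#` + ID format are hypothetical. Reading files from and committing them to GitHub is left out.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tag_paragraph(text: str, spans: list[tuple[int, int, str, str]]) -> ET.Element:
    """Wrap annotated character ranges of `text` in TEI <rs> elements.

    Each span is (start, end, entity_type, entity_id); spans are assumed
    sorted by start offset and non-overlapping.
    """
    p = ET.Element(f"{{{TEI_NS}}}p")
    cursor = 0
    last = None  # most recent <rs>; text after it belongs in its .tail

    def put(chunk: str) -> None:
        # Text before the first <rs> is p.text; later text is a tail.
        if last is None:
            p.text = chunk
        else:
            last.tail = chunk

    for start, end, entity_type, entity_id in spans:
        put(text[cursor:start])
        rs = ET.SubElement(p, f"{{{TEI_NS}}}rs", type=entity_type,
                           ref=f"#{entity_id}")  # hypothetical ID format
        rs.text = text[start:end]
        last = rs
        cursor = end
    put(text[cursor:])
    return p
```

Storing the text between entities in each element's `tail` is how ElementTree represents mixed content, so the paragraph's plain text survives tagging unchanged; `ET.tostring(p, encoding="unicode")` gives the serialized XML.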

The final step in the current CWGK annotation process is social networking: documenting all of the relationships between individuals and organizations that appear in the text of the document itself.

Each relationship between entities is classified as one of seven types: familial, political, legal, economic, social, military, and slavery. Two entities can have multiple relationships within a document if their relationship is multifaceted or evolves as the document proceeds. They can also have the same type of relationship documented in multiple documents, adding weight to the edge between those two nodes. In this way, entities become part of a complex network of relationships.
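The weighting described above can be sketched as counting edges in a graph. This minimal sketch assumes relationships are undirected and that a given pair and type counts at most once per document, so an edge's weight is the number of documents that record it; both choices, and the name `edge_weights`, are assumptions rather than MashBill's documented behavior.

```python
from collections import Counter
from collections.abc import Iterable

# The seven relationship types named in the text.
RELATIONSHIP_TYPES = {"familial", "political", "legal", "economic",
                      "social", "military", "slavery"}


def edge_weights(relationships: Iterable[tuple[str, str, str, str]]) -> Counter:
    """Aggregate (document_id, entity_a, entity_b, type) rows into weighted edges.

    Keys are ((entity, entity), type); values count the distinct documents
    that record that relationship.
    """
    seen: set = set()
    weights: Counter = Counter()
    for doc_id, a, b, rel_type in relationships:
        if rel_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {rel_type!r}")
        pair = tuple(sorted((a, b)))  # undirected: (a, b) == (b, a)
        key = (doc_id, pair, rel_type)
        if key in seen:
            continue  # count each document at most once per edge
        seen.add(key)
        weights[(pair, rel_type)] += 1
    return weights
```

Keying edges by relationship type keeps, for example, a political tie and a military tie between the same two people as separate edges, matching the multifaceted relationships described in the text.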

When the relationships have been identified and created, the annotation stage on this document is complete and the annotator moves on to the next assignment.

Civil War Governors of Kentucky Editor Hosts Webinar for Kentucky’s Librarians and Archivists

Civil War Governors of Kentucky (CWGK) assistant editor Tony Curtis hosted a webinar on October 14, 2016, entitled “Researching the Civil War Governors of Kentucky” for Kentucky’s librarians and archivists as part of the Continuing Education program offered through the Kentucky Department for Libraries and Archives (KDLA). The webinar focused on the June 2016 launch of “Early Access,” the first stage of public access to CWGK, which allows users to browse and keyword-search over 10,000 documents.

The next stage, “Annotation Beta,” will use NHPRC funding to deliver approximately 1,500 documents, annotated and set within dense social and geographic networks. The presentation also demonstrated how CWGK will shape the ways researchers, students, and teachers explore the past.

Click HERE to listen to the webinar.

Voices of the Filson Interview on WXOX 97.1FM (Louisville, Ky.)

Listen to Civil War Governors of Kentucky assistant editor Tony Curtis as he returns to the Filson Historical Society archives to discuss the project and its plans for annotation and social networking on an episode of Voices of the Filson on WXOX 97.1 FM with the Filson’s associate curator of collections, Aaron Rosenblum.

Audio provided by Voices of the Filson on WXOX 97.1FM and the Filson Historical Society.

The Rogue Historian Podcast

Listen to #CWGK project director Patrick Lewis discuss the project on an episode of The Rogue Historian with Keith Harris.

We discuss:

  • Digital history and how it is useful
  • A historical “social network” being developed through CWGK annotation
  • The place in digital humanities for early career historians
  • How to use the documentary project’s user guides

Listen to the episode here
