Using CWGK Annotations

The Civil War Governors of Kentucky (CWGK) team is happy to announce the launch of the new, annotated CWGK website! The updated and expanded site publishes 350 fully annotated documents for the first time. It combines the previously launched Omeka platform with the power and versatility of our annotation tool, MashBill, to produce complex social network visualizations for each entity (person, organization, place, and geographical feature). The CWGK team worked with Brumfield Labs and Dazhi Jiao to complete this latest digital publishing platform.

Let’s explore the entity page of Governor James Fisher Robinson. You can see Robinson’s social network and a visualization of this network—for example, his connection to G. F. Cook (circled in black).

And looking at the entity page for G. F. Cook, we see his connection to Governor Robinson.

The legend on the left depicts the different types of entities you will see in the visualization and the different types of relationships that link them together.

Back on Robinson’s entity page, there are several more important pieces of information. First, you can see the full biographical entry for Robinson and the citations for the sources consulted in writing his biography.

Below the visualization is a series of tabs that give you access to additional information and tools for each entity. There are four tabs: Metadata, Citation, Documents, and Download.

The metadata tab will give you access to the entity’s birth date, death date, gender, race, and entity type.

The citation tab will give the full entity citation for the convenience of the researcher.

The documents tab will give you a list of EVERY document that this particular entity is linked to throughout the CWGK website. The list is quite long for Robinson.

The download tab will allow you to download the XML code for that particular entity.

This is just the beginning of publishing fully annotated documents and visualizations on the CWGK website. Eventually, tens of thousands more documents and hundreds of thousands more entities will be published. Updates to the site will appear continually as the editing process continues and each document is completed.

So stay tuned and visit often!

Annotating CWGK Documents with MashBill

CWGK is working with Brumfield Labs of Austin, Texas, to build an annotation and entity management system that will allow CWGK to locate, identify, and link together every person, place, organization, and geographical feature in every CWGK document. The annotation application, MashBill, has been live since February 2017, and CWGK staff and Graduate Research Associates working remotely from eight university campuses across the country have (as of April 2017) identified nearly 5,000 unique entities which appear over 8,000 times in nearly 700 CWGK texts.

CWGK published a preliminary plan for MashBill in the fall of 2016, but with the system now up and running, this post will move through each step of the annotation process with screenshots.


The first step is to search for and select the assigned document on the CWGK website.

In the document view screen, the annotator activates a browser plugin called Hypothes.is, which enables annotation and commentary on any web page. All CWGK staff and GRAs are members of an invitation-only Hypothes.is group, which collects data and feeds it into the MashBill system.

The next step is to highlight each entity (person, place, organization, or geographical feature) at its first mention in the text of the document, select annotate when the Hypothes.is icon appears above the text, and click “Post to CWGK”.

Once an annotator completes this process, they can click on the Hypothes.is icon in the browser toolbar to review all of the highlighted entities.

The annotator then moves into MashBill itself, where each user sees a dashboard of their own previous work, a running tab of the latest work in the database, and search fields to find an entity or document. Using those search fields, the annotator looks up in MashBill the number of the document that has just been highlighted.

Each of the character strings highlighted in Hypothes.is appears on the MashBill document screen.
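To make this hand-off concrete, here is a minimal sketch of how highlighted strings could be pulled out of Hypothes.is annotation data. It assumes the JSON layout returned by the public Hypothes.is search API, where each annotation carries a TextQuoteSelector holding the exact highlighted text. The sample payload, the example URL, and the function name highlighted_strings are invented for illustration; how MashBill actually ingests the group's annotations is not described here.

```python
# Sample payload shaped like a Hypothes.is search response. The values are
# invented for illustration; only the field layout follows the public API.
sample_response = {
    "total": 2,
    "rows": [
        {
            "uri": "https://example.org/document/1",
            "target": [{"selector": [
                {"type": "RangeSelector"},
                {"type": "TextQuoteSelector", "exact": "Gov Robinson"},
            ]}],
        },
        {
            "uri": "https://example.org/document/1",
            "target": [{"selector": [
                {"type": "TextQuoteSelector", "exact": "G. F. Cook"},
            ]}],
        },
    ],
}


def highlighted_strings(response: dict) -> list:
    """Collect the exact text of every highlight, in response order."""
    strings = []
    for row in response.get("rows", []):
        for target in row.get("target", []):
            for selector in target.get("selector", []):
                # Only the text-quote selector stores the highlighted string.
                if selector.get("type") == "TextQuoteSelector":
                    strings.append(selector["exact"])
    return strings


print(highlighted_strings(sample_response))
```

Each string returned this way corresponds to one entry that an annotator would then need to identify.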

The user selects “identify” to search the database for entity names which are at least a 30% match to the transcribed character string. This degree of proximity suggests likely matches, but still allows flexibility to account for name abbreviations, misspellings, and the use of titles to identify individuals.
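The idea of a loose percentage match can be illustrated with a short sketch. This is not MashBill's matching code: the post does not say how the 30% figure is computed, so the example substitutes the similarity ratio from Python's standard difflib module as a stand-in. The function names and the list of known names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(query: str, known_names: list, threshold: float = 0.30) -> list:
    """Return known names scoring at or above the threshold, best first."""
    scored = [(similarity(query, name), name) for name in known_names]
    matches = [pair for pair in scored if pair[0] >= threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in matches]


# Hypothetical database contents; "12345" stands in for an unrelated record.
known = ["G. F. Cook", "James Fisher Robinson", "12345"]
print(suggest("Gov Robinson", known))
```

A low threshold keeps an abbreviated or titled form such as "Gov Robinson" within reach of the full name, at the cost of some unrelated suggestions that the annotator has to dismiss.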

MashBill suggests known entities, but if the entity in question has not yet been added to the database, the annotator moves to the entity creation screen.

After research in approved, authoritative, and reliable sources, the annotator writes a short entity “biography”, fills out a bibliography section, marks up any textual features including italics and underlining in Markdown, and fills in the metadata fields relevant to the entity type.

The annotator confirms the information is correct and creates the entity, which is automatically linked to the character string highlighted in Hypothes.is.

If an entity already exists in the MashBill database, the user simply chooses the correct entity from the suggested list and MashBill automatically links the entity record to the character string.

The annotator proceeds until all of the entities for the document have been identified. They then click “Document Needs Reviewed”, which sends the document into the fact-checking queue.

When another staff member checks work for accuracy and adherence to editorial style, the document will be marked complete, and MashBill will insert reference tags containing the unique identifier for each entity biography into the TEI-XML transcription of the document stored in GitHub. These files will be re-imported into the existing CWGK Omeka site along with the entity biographies, allowing hyperlinked navigation between text and biography.
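The tagging step might look something like the sketch below, which wraps the first mention of an entity in a TEI name element that points at its identifier. The element names (persName, placeName, orgName, geogName) and the ref attribute are standard TEI, but whether CWGK uses these particular elements is an assumption, as are the identifier "N001", the sample sentence, and the function name. The sketch also assumes the mention appears literally in the text and contains no characters that need XML escaping.

```python
from xml.sax.saxutils import quoteattr

# Assumed mapping from CWGK entity types to standard TEI name elements.
TEI_TAGS = {
    "person": "persName",
    "place": "placeName",
    "organization": "orgName",
    "geographical feature": "geogName",
}


def tag_first_mention(text: str, mention: str, entity_type: str, entity_id: str) -> str:
    """Wrap the first occurrence of mention in a TEI element with a ref."""
    tag = TEI_TAGS[entity_type]
    start = text.find(mention)
    if start == -1:
        # Mention not present: return the transcription unchanged.
        return text
    end = start + len(mention)
    wrapped = f"<{tag} ref={quoteattr(entity_id)}>{mention}</{tag}>"
    return text[:start] + wrapped + text[end:]


line = "Letter from G. F. Cook to the Governor."
print(tag_first_mention(line, "G. F. Cook", "person", "N001"))
```

Because the identifier travels with the text, the site can turn each tagged name into a link to the matching entity biography.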

The final step in the current CWGK annotation process is social networking: documenting all of the relationships between individuals and organizations present in the text of the document itself.

Each relationship between entities is classified as one of seven types: familial, political, legal, economic, social, military, and slavery. Two entities can have multiple relationships within a document if the relationship between them is multifaceted or evolves as the document proceeds. Entities can also have the same type of relationship documented in multiple documents, adding weight to the vector between those two nodes. In this way, entities can be involved in a complex network of relationships.
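One way to picture how repeated relationships add weight is the sketch below. It counts how many distinct documents attest each type of relationship between a pair of entities. The record layout, the document identifiers, and the relationship types assigned to Robinson and Cook are invented for illustration, and treating relationships as undirected is an assumption rather than a description of MashBill's data model.

```python
from collections import defaultdict

RELATIONSHIP_TYPES = {
    "familial", "political", "legal", "economic",
    "social", "military", "slavery",
}


def edge_weights(records: list) -> dict:
    """Map (entity pair, relationship type) to the number of documents attesting it."""
    documents = defaultdict(set)
    for document_id, entity_a, entity_b, rel_type in records:
        if rel_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {rel_type}")
        # Sort the pair so A-B and B-A count as the same undirected edge.
        pair = tuple(sorted((entity_a, entity_b)))
        documents[(pair, rel_type)].add(document_id)
    return {key: len(docs) for key, docs in documents.items()}


# Hypothetical records: (document, entity, entity, relationship type).
records = [
    ("doc-1", "Robinson", "Cook", "political"),
    ("doc-2", "Cook", "Robinson", "political"),
    ("doc-2", "Robinson", "Cook", "legal"),
]
for (pair, rel_type), weight in edge_weights(records).items():
    print(pair, rel_type, weight)
```

Keeping the relationship type in the key lets a single pair of entities carry several differently weighted connections at once.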

When the relationships have been identified and created, the annotation stage on this document is complete and the annotator moves on to the next assignment.