GutenTag is an NLP-driven tool for digital humanities research on the Project Gutenberg corpus. I co-developed it with Julian Brooke.

The high-level goal of the project is to create an ongoing two-way flow of resources between computational linguists and digital humanists, allowing computational linguists to identify pressing problems in the large-scale analysis of literary texts, while giving digital humanists access to a wider variety of NLP tools for exploring literary phenomena. GutenTag is intended to be a standalone software tool for non-programmers, but the source code is also available, and we welcome others in the computational linguistics and digital humanities communities to contribute to its development or adapt it as needed.

For more information, see my blog post introducing GutenTag to humanists, the academic paper that introduced it, or the poster we presented at NAACL in 2015.

GutenTag is available in both web-based and downloadable versions. See the main Project GutenTag website for details.