corporaexplorer: An R package for dynamic exploration of text collections


This article presents 'corporaexplorer', an open source R package that uses the Shiny GUI (graphical user interface) framework for dynamic exploration of text collections. The package is designed for use with a wide range of text collections. Its intended primary audience is qualitatively oriented researchers in the social sciences and humanities who rely on close reading of textual documents as part of their academic activity. However, the package should also be useful for those doing quantitative textual research who wish to have convenient access to the texts under study.

The main elements of the interactive apps are:

1) Input: filtering the corpus and/or highlighting documents based on search patterns (in the main text or in metadata, including date range).
2) Corpus visualisation: an interactive heat-map of the corpus, based on the search input (either a calendar heat-map or a heat-map where each tile represents one document, optionally grouped by metadata properties).
3) Document visualisation and display: easy navigation to and within full-text documents, with pattern matches highlighted.
4) Document retrieval: extraction of subsets of the corpus in a format suitable for close reading.

While collecting and preparing the text collections to be explored requires some familiarity with R programming, using the Shiny apps to explore and extract documents from the corpus should be fairly intuitive even for those with no programming knowledge, once the apps have been set up by a collaborator.
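The R programming needed for setup can be quite small. The following is a minimal sketch, not taken from the article, of how a collaborator might prepare a corpus and launch the apps. It assumes the package is installed from CRAN and uses the package functions prepare_data(), explore() and run_document_extractor(). The toy data frame `documents` and its contents are hypothetical. The requirement of a character column named "Text" plus a "Date" column of class Date for a date-based corpus is stated here as an assumption and should be checked against the package documentation.

```r
# Minimal, hedged sketch: assumes install.packages("corporaexplorer") has been run.
library(corporaexplorer)

# Hypothetical toy corpus. Assumption: prepare_data() expects a character
# column named "Text" and, for a date-based corpus, a "Date" column of class Date.
documents <- data.frame(
  Text = c("The parliament debated the budget.",
           "A new budget proposal was published.",
           "Elections were announced for the spring."),
  Date = as.Date(c("2019-01-15", "2019-02-03", "2019-03-20")),
  stringsAsFactors = FALSE
)

# Convert the data frame into an object that the Shiny apps can read.
corpus <- prepare_data(documents, date_based_corpus = TRUE)

# The apps open in a browser and block until closed, so only launch them
# in an interactive session.
if (interactive()) {
  explore(corpus)                 # app for exploring the corpus (heat-maps, search, reading)
  run_document_extractor(corpus)  # app for retrieving subsets of documents
}
```

Once `corpus` has been created, it can be saved and handed over, so that users without programming knowledge only need to work within the apps.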

  • Published year: 2019
  • DOI: 10.21105/joss.01342
  • Journal: The Journal of Open Source Software