Data: Sources and Selection

The data source for this experiment is the Hymnary.org database. It is a relatively young project housed at Calvin College in Grand Rapids, MI, and its digitization of hymnals is powered in significant part by volunteer effort. It is an example of a crowdsourced project that serves a community need and also benefits from institutional support.

One benefit of using the Hymnary database is that the data is easily exportable. A CSV dump of the texts, tunes, hymnals, and people is available for download. In addition, the search results for particular hymn texts are available in JSON format. These two export features made the database a promising option as a data source.

While the initial dump of the data was promising, the process of whittling the data down to a manageable size revealed some problems. I began by isolating the authors born between 1800 and 1809. I was interested in songs written during the middle of the century, and chose this range because those born at the beginning of the century would have been in their mature years by the middle of it. I then isolated the songs documented as having been written by these authors and downloaded the associated JSON files. While this produced a large amount of data, further investigation revealed that many of the JSON files were empty. As a consequence, many of the authors identified at first had no transcribed hymns in the database. The original set of over a thousand hymns was reduced to a couple hundred with complete information.
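
The selection steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script used for the experiment: the column names `name` and `birth_year` are hypothetical stand-ins for whatever the Hymnary CSV export actually uses, and "empty" is assumed to mean a blank file, invalid JSON, or a JSON value with no content.

```python
import csv
import io
import json


def authors_born_between(csv_text, start=1800, end=1809):
    """Return the CSV rows whose birth year falls within [start, end].

    Assumes a header row containing a 'birth_year' column (hypothetical name).
    Rows with a missing or non-numeric birth year are skipped.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            year = int(row["birth_year"])
        except (KeyError, TypeError, ValueError):
            continue
        if start <= year <= end:
            selected.append(row)
    return selected


def is_empty_result(json_text):
    """Return True if a downloaded JSON result carries no usable content.

    Assumption: 'empty' means blank text, unparseable JSON, null,
    or an empty object, list, or string.
    """
    if not json_text.strip():
        return True
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError:
        return True
    return data in (None, {}, [], "")


# Placeholder rows for illustration only; these are not Hymnary records.
sample_csv = "name,birth_year\nAuthor A,1805\nAuthor B,1795\nAuthor C,\nAuthor D,1809\n"
selected_names = [row["name"] for row in authors_born_between(sample_csv)]
```

Running a check like `is_empty_result` over every downloaded file before any further processing would surface the gap between the number of authors selected and the number with transcribed hymns at the start of the process rather than partway through it.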