Looking for Patterns

One of the goals of this project was to explore whether text mining could tease out meaningful patterns in the language used by male and female hymn authors. Using R, I isolated the frequent terms across the whole corpus, then ran the same analysis after dividing the corpus by the gender of the author. The resulting lists overlap heavily, but there are some telling differences. For example, female authors favored words like "home", "hope", "rest", and "sweet", while the frequent words of male authors include "father", "king", "mind", "soul", "spirit", and "world"; none of these words appears in the other group's list. (In the lists below, "oer" and "heavn" are the poetic contractions "o'er" and "heav'n" with their apostrophes stripped.)
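The post doesn't show its R code; in R, the tm package's `findFreqTerms` is a common way to pull out terms above a frequency cutoff. As a rough illustration of the same idea, here is a minimal Python sketch, assuming a simple regex tokenizer, a small hypothetical stopword list, and a hypothetical absolute cutoff `low_freq`. The sample lines are made up and are not from the project's corpus.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real analysis would use a
# fuller list (R's tm package, for example, ships English stopwords).
STOPWORDS = {"a", "all", "and", "at", "for", "him", "his", "in", "is", "my",
             "o", "of", "shall", "the", "thee", "thou", "thy", "to", "with"}


def tokenize(text):
    """Lowercase, drop apostrophes (so "o'er" -> "oer"), keep alphabetic
    runs, and remove stopwords."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]


def frequent_terms(texts, low_freq):
    """Terms occurring at least `low_freq` times across all texts, sorted."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return sorted(term for term, n in counts.items() if n >= low_freq)


def frequent_terms_by_gender(hymns, low_freq):
    """`hymns` is a list of (gender, text) pairs; returns {gender: terms}."""
    groups = {}
    for gender, text in hymns:
        groups.setdefault(gender, []).append(text)
    return {g: frequent_terms(texts, low_freq) for g, texts in groups.items()}


# Made-up sample lines, for illustration only (not the project's corpus).
hymns = [
    ("female", "Sweet hour of prayer, sweet rest at home"),
    ("female", "Rest, my soul, in hope; sweet home o'er the sea"),
    ("male", "Praise the Father, King of glory"),
    ("male", "O'er all the world the King shall reign; praise Him"),
]

all_terms = frequent_terms([text for _, text in hymns], low_freq=2)
by_gender = frequent_terms_by_gender(hymns, low_freq=2)
print(all_terms)
print(by_gender)
```

On this toy input, "oer" clears the cutoff for the pooled corpus but for neither gender alone, which is one reason the combined list is not simply the union of the two group lists. Because the cutoff is an absolute count, subcorpora of different sizes may also need different thresholds to produce comparable lists.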

Frequent Words

All Hymns

"art" "bright" "christ" "day" "death" "earth" "father" "glory" "god" "grace" "hath" "heart" "heaven" "holy" "jesus" "king" "life" "light" "lord" "love" "mind" "oer" "peace" "praise" "sing" "son" "soul" "spirit" "word" "world"

Hymns by Female Authors

"alleluia" "christ" "day" "earth" "glory" "god" "hath" "heart" "heavn" "holy" "home" "hope" "jesus" "joy" "life" "light" "lord" "love" "name" "nearer" "oer" "peace" "precious" "rest" "sweet"

Hymns by Male Authors

"art" "bright" "christ" "day" "death" "earth" "father" "glory" "god" "grace" "heart" "heaven" "holy" "jesus" "king" "life" "light" "lord" "love" "mind" "peace" "praise" "sing" "son" "soul" "spirit" "word" "world"
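One way to make the differences between the two gendered lists concrete is to compare them as sets. The short Python sketch below copies the female and male lists exactly as reported above and computes the shared and exclusive terms. It compares membership in the frequent-term lists only, not the underlying word counts.

```python
# Frequent-word lists copied exactly from the lists above.
female = set(
    "alleluia christ day earth glory god hath heart heavn holy home hope "
    "jesus joy life light lord love name nearer oer peace precious rest sweet".split()
)
male = set(
    "art bright christ day death earth father glory god grace heart heaven holy "
    "jesus king life light lord love mind peace praise sing son soul spirit word world".split()
)

shared = sorted(female & male)        # frequent for both groups
only_female = sorted(female - male)   # frequent only in hymns by women
only_male = sorted(male - female)     # frequent only in hymns by men

print("shared:", shared)
print("female only:", only_female)
print("male only:", only_male)
```

This yields 13 shared terms; the female-only set includes "home", "hope", "rest", and "sweet", and the male-only set includes "father", "king", "mind", "soul", "spirit", and "world". It also shows "heaven" (male only) and "heavn" (female only) on opposite sides purely because of spelling, so normalizing contractions before counting would make the comparison cleaner.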


Frequency over Documents

Another way to use the frequent words is to examine how their usage varies from document to document. The charts below visualize the document-term matrix restricted to the frequent words: for each document in the corpus, they show how many times each frequent word occurs.
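In R, the tm package's `DocumentTermMatrix` builds this kind of matrix; the post doesn't show which code produced the charts. As a minimal Python sketch of the underlying data, assuming a simple regex tokenizer, a hypothetical list of frequent terms, and made-up sample lines, each row below is one document and each column counts one frequent term. Plotting is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase, drop apostrophes, and split into alphabetic tokens."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"[a-z]+", text)


def document_term_matrix(docs, terms):
    """One row per document: the count of each term in `terms`, in order."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[t] for t in terms])
    return rows


# Made-up sample lines and a hypothetical frequent-term list, for illustration.
docs = [
    "Sweet hour of prayer, sweet rest at home",
    "Rest, my soul, in hope; sweet home o'er the sea",
    "Praise the Father, King of glory",
]
terms = ["home", "rest", "sweet", "king"]

dtm = document_term_matrix(docs, terms)

# Print as a small tab-separated table: documents down, terms across.
print("doc", *terms, sep="\t")
for i, row in enumerate(dtm, start=1):
    print(i, *row, sep="\t")
```

Summing each column gives the corpus-wide count that a frequent-term cutoff is applied to, so the charts and the frequent-word lists are two views of the same matrix.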

Frequency over Texts by Female Authors
Frequency over Texts by Male Authors