Looking for Patterns
One of the goals of this project was to explore whether text mining could tease out meaningful patterns in the language used by male and female hymn authors. Using R, I isolated the frequent terms across the whole corpus, then ran a similar analysis after dividing the corpus by the gender of the author. An initial pattern is that, while the two sets of frequent terms overlap substantially, there are some telling differences. For example, the frequent words for female authors include "home", "hope", "rest", and "sweet", none of which appear in the list for male authors, while the frequent words for male authors include "father", "king", "mind", "soul", "spirit", and "world", none of which appear in the list for female authors.
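As a rough check on that comparison, the frequent-word lists for female and male authors (given under Frequent Words in this section) can be compared as sets. The analysis itself was done in R; what follows is only a minimal Python sketch of the comparison step, with illustrative variable names, and it assumes the two lists are taken exactly as printed.

```python
# Frequent-word lists copied from the "Hymns by Female Authors" and
# "Hymns by Male Authors" lists in this section.
female = set("""alleluia christ day earth glory god hath heart heavn holy
home hope jesus joy life light lord love name nearer oer peace precious
rest sweet""".split())

male = set("""art bright christ day death earth father glory god grace
heart heaven holy jesus king life light lord love mind peace praise sing
son soul spirit word world""".split())

shared = female & male        # frequent in both groups
female_only = female - male   # frequent only among female authors
male_only = male - female     # frequent only among male authors

print("shared:", sorted(shared))
print("female only:", sorted(female_only))
print("male only:", sorted(male_only))
```

One caveat when reading the result: "heavn" (female list) and "heaven" (male list) are spellings of the same word, so a plain set comparison counts them as distinct terms.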
Frequent Words
All Hymns
"art" "bright" "christ" "day" "death" "earth" "father" "glory" "god" "grace" "hath" "heart" "heaven" "holy" "jesus" "king" "life" "light" "lord" "love" "mind" "oer" "peace" "praise" "sing" "son" "soul" "spirit" "word" "world"
Hymns by Female Authors
"alleluia" "christ" "day" "earth" "glory" "god" "hath" "heart" "heavn" "holy" "home" "hope" "jesus" "joy" "life" "light" "lord" "love" "name" "nearer" "oer" "peace" "precious" "rest" "sweet"
Hymns by Male Authors
"art" "bright" "christ" "day" "death" "earth" "father" "glory" "god" "grace" "heart" "heaven" "holy" "jesus" "king" "life" "light" "lord" "love" "mind" "peace" "praise" "sing" "son" "soul" "spirit" "word" "world"
Frequency over Documents
Another way to use frequent words is to examine how often each one occurs in the individual documents. The charts below are a graphical depiction of the document-term matrix: they show the number of instances of each frequent word in each document of the corpus.
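For readers unfamiliar with the structure, a document-term matrix has one row per document and one column per term, and each cell holds the number of times that term occurs in that document. The following is a minimal Python sketch of the idea, not the R code used for this project. The three sample texts, the small stop-word set, and the function names are all hypothetical and are not drawn from the hymn corpus.

```python
import re
from collections import Counter

# Hypothetical toy documents for illustration; not text from the hymn corpus.
docs = {
    "hymn_a": "Love divine, all love excelling, joy of heaven",
    "hymn_b": "Sweet rest and sweet hope of home",
    "hymn_c": "The King of love, the Lord of glory",
}

# Assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "and", "all"}

def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def document_term_matrix(docs):
    """Return (terms, matrix): matrix maps each document name to a row of counts."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    terms = sorted(set().union(*counts.values()))
    matrix = {name: [c[t] for t in terms] for name, c in counts.items()}
    return terms, matrix

def frequent_terms(terms, matrix, min_count):
    """Terms whose total count across all documents is at least min_count."""
    totals = [sum(column) for column in zip(*matrix.values())]
    return [t for t, n in zip(terms, totals) if n >= min_count]

terms, matrix = document_term_matrix(docs)
for name, row in matrix.items():
    # Show only the non-zero cells of each row.
    print(name, {t: n for t, n in zip(terms, row) if n})
print("frequent:", frequent_terms(terms, matrix, min_count=2))
```

Each row of the matrix corresponds to one bar chart of the kind described here: the frequent words on one axis and their counts within a single document on the other.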