Composition of our corpus Sources (to create your own corpus)

The following is the number of talks and the number of words for each decade in the corpus:

Decade    # Talks      # Words
1850s         411    1,670,652
1860s         381    1,302,245
1870s         380    1,793,929
1880s         307    1,473,067
1890s         441    1,342,525
1900s         636    1,234,568
1910s         728    1,538,949
1920s         828    1,625,485
1930s         738    1,270,173
1940s         749    1,440,411
1950s         776    1,466,420
1960s         857    1,642,640
1970s         836    1,674,325
1980s         665    1,312,379
1990s         724    1,403,418
2000s         743    1,381,268
2010s         728    1,280,982
2020s         243      436,859
TOTAL      11,175   25,436,093
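
The per-decade counts above can also be read as an average talk length. The following is a minimal sketch in Python, assuming the figures are copied by hand from the table; the names CORPUS and words_per_talk are illustrative and not part of the corpus or its tools.

```python
# Per-decade counts copied from the table above: decade -> (talks, words).
CORPUS = {
    "1850s": (411, 1_670_652),
    "1860s": (381, 1_302_245),
    "1870s": (380, 1_793_929),
    "1880s": (307, 1_473_067),
    "1890s": (441, 1_342_525),
    "1900s": (636, 1_234_568),
    "1910s": (728, 1_538_949),
    "1920s": (828, 1_625_485),
    "1930s": (738, 1_270_173),
    "1940s": (749, 1_440_411),
    "1950s": (776, 1_466_420),
    "1960s": (857, 1_642_640),
    "1970s": (836, 1_674_325),
    "1980s": (665, 1_312_379),
    "1990s": (724, 1_403_418),
    "2000s": (743, 1_381_268),
    "2010s": (728, 1_280_982),
    "2020s": (243, 436_859),
}


def words_per_talk(corpus):
    """Return the mean number of words per talk for each decade, rounded."""
    return {
        decade: round(words / talks)
        for decade, (talks, words) in corpus.items()
    }


if __name__ == "__main__":
    for decade, average in words_per_talk(CORPUS).items():
        print(f"{decade}: {average:,} words per talk")
```

Rounding to whole words is only a presentation choice here; drop the call to round() if fractional averages are needed.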

The 11,000+ General Conference talks were taken from a number of online sites that had highly accurate versions of the talks. Not all of these sites are still available.

For users who are interested in creating their own corpus, the best source is probably scriptures.byu.edu (not associated with our corpus), which contains all 1,426 talks from 1851-1886 (see list) and all 1,823 talks from 1942-1970 (see list).

Two other sites (#1 and #2) also contain all talks from the Journal of Discourses (1850s-1880s). Many talks from the early 1900s through the 1960s can be found in the issues of the Improvement Era, which are available from Google Books and www.archive.org (see sample).

All of the conference talks from the 1970s-2020s can be found in the General Conference reports online or in the online issues of the Ensign.

We use these texts under US Fair Use Law. More information...