Composition of our corpus

Sources (to create your own corpus)

The table below gives the number of talks and the number of words for each decade in the corpus:

Decade	# Talks	# Words
1850s	411	1,670,652
1860s	381	1,302,245
1870s	380	1,793,929
1880s	307	1,473,067
1890s	441	1,342,525
1900s	636	1,234,568
1910s	728	1,538,949
1920s	828	1,625,485
1930s	738	1,270,173
1940s	749	1,440,411
1950s	776	1,466,420
1960s	857	1,642,640
1970s	836	1,674,325
1980s	665	1,312,379
1990s	724	1,403,418
2000s	743	1,381,268
2010s	728	1,280,982
2020s	243	436,859
TOTAL	11,175	25,436,093

The 11,000+ General Conference talks were taken from several websites that had highly accurate versions of the talks. Not all of these sites are still available.

For users who are interested in creating their own corpus, the best source is probably scriptures.byu.edu (not associated with our corpus), which contains all 1,426 talks from 1851-1886 (see list) and all 1,823 talks from 1942-1970 (see list).

Two other sites (#1 and #2) also contain all talks from the Journal of Discourses (1850s-1880s). Many talks from the early 1900s through the 1960s can be found in the issues of the Improvement Era, which are available from Google Books and www.archive.org (see sample).

All of the conference talks from the 1970s-2020s can be found in General Conference reports online or the online issues of the Ensign.
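
For users who assemble their own collection from these sources, per-decade counts like those in the table above could be tallied roughly as follows. This is only a minimal sketch: the function names, the (year, text) input format, and the sample talks are hypothetical, and words are counted by splitting on whitespace, which may not match how the word counts in our corpus were produced.

```python
from collections import defaultdict


def decade_label(year: int) -> str:
    """Map a year such as 1857 to a decade label such as '1850s'."""
    return f"{year // 10 * 10}s"


def tally_by_decade(talks):
    """Count talks and words per decade.

    `talks` is an iterable of (year, text) pairs (a hypothetical format).
    Words are counted by whitespace splitting, which is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # decade -> [talk count, word count]
    for year, text in talks:
        entry = counts[decade_label(year)]
        entry[0] += 1
        entry[1] += len(text.split())
    return {decade: tuple(pair) for decade, pair in sorted(counts.items())}


# Made-up sample data, only to show the shape of the input and output.
sample = [
    (1851, "a short example talk"),
    (1859, "another one"),
    (1942, "third talk here"),
]

for decade, (n_talks, n_words) in tally_by_decade(sample).items():
    print(f"{decade}\t{n_talks}\t{n_words:,}")
```

Counts produced this way depend on how the texts are cleaned and tokenized, so they should not be expected to match the table exactly.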

We use these texts under US Fair Use Law. More information...