Friday, October 06, 2006

Peter Head of the Evangelical Textual Criticism blog asks a Statistical Question.

Actually, it's just a counting question — I'll let others derive statistics based on counts.

He wants to know the number of letters in each book of the Greek New Testament.

Since I have easy-to-query data to hand here at Logos, I thought I'd write a quick script to generate some word and letter counts for various texts.

Note, however, that the simple act of counting "words" gets complex pretty quickly. For these purposes, words are things delimited by spaces and punctuation. Thus instances of crasis (e.g. KAGW) are counted as one word, not two. Anthony Kenny, in his book A Stylometric Study of the New Testament, has a decent discussion of this. He uses the Friberg morphology and ends up with a total word count of 138019. You'll note my total count is 138020. I'm fairly sure this is due to MHTIGH in 1Co 6.3 being counted as two words instead of one word. NA27 has "MHTI GH" while UBS4 has "MHTIGH". So UBS4-oriented counts (Friberg uses the UBS as source) have 138019, while NA27-oriented counts should end up with 138020.
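
To make the word definition concrete, here is a minimal sketch in Python. It is not the script that produced the counts in this post; the function name `count_words` and the regular expression are assumptions chosen for illustration, and the sample strings are the transliterated forms quoted above.

```python
import re

# Assumption: a "word" is a maximal run of letters. Spaces, punctuation
# and brackets all act as delimiters. This works on transliterated text
# or on precomposed (NFC) Unicode Greek.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text: str) -> int:
    """Count runs of letters separated by spaces or punctuation."""
    return len(WORD_RE.findall(text))


# Crasis is written as a single run of letters, so it counts once.
crasis_count = count_words("KAGW")

# The 1Co 6.3 difference: two words in NA27, one word in UBS4.
na27_count = count_words("MHTI GH")
ubs4_count = count_words("MHTIGH")

print(crasis_count, na27_count, ubs4_count)
```

Under this definition the NA27 reading contributes one more word than the UBS4 reading, which would account for a difference of exactly one between 138019 and 138020.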

Letters are letters. I've counted a Unicode source, but I've stripped all breathings, accents and iota subscripts. I've also stripped all brackets from the text, even those intervening words, and counted the bracketed text (including things like the longer ending of Mark) as part of this source.
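
A rough sketch of that letter count in Python follows. Again, this is an illustration under stated assumptions rather than the script actually used: it assumes the source is Unicode Greek, relies on NFD decomposition to separate diacritics from base letters, and the name `count_letters` is hypothetical.

```python
import unicodedata


def count_letters(text: str) -> int:
    """Count base letters only, ignoring diacritics, brackets and spaces."""
    # NFD splits each accented letter into a base letter followed by
    # combining marks (breathings, accents, iota subscript).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only upper- and lowercase letters. Combining marks have
    # category "Mn"; brackets, punctuation and spaces are not letters.
    return sum(
        1 for ch in decomposed if unicodedata.category(ch) in ("Lu", "Ll")
    )


# Bracketed text is counted; the brackets themselves are not.
sample = "[ἐν] τῷ λόγῳ"
print(count_letters(sample))
```

Note that an iota subscript decomposes to a combining mark, so it is dropped rather than counted as an extra iota, which matches the description above.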

I have counts for the NA27, for Maurice Robinson's 2005 edition of his Byzantine text, and for Scrivener's 1881 edition representing the Greek text behind the KJV. [Update: Added data for Tischendorf's Eighth edition.] Some overall totals; details in the files themselves if you're interested.

  • NA27 words / letters: 138020 / 680942
  • Byzantine words / letters: 140155 / 690536
  • Scrivener words / letters: 140597 / 689960
  • Tischendorf (8th) words / letters: 137548 / 679688

Please note that the NA27 letter counts are at variance with the counts reported in the comments by Casey Perkins on the Evangelical Textual Criticism blog.

More info (broken down by book) in the respective text files:

Update I (2006-10-06): Added data for Tischendorf's Eighth edition.

Update II (2006-10-09): Some responses to comments. First, Casey Perkins who notes:

My program had a bug in it. My figures now match yours for NA, with the exception of Acts (1 char diff), and 2 Cor and Hebrews (2 char diff). Probably a difference in our source files. Beyond that I won't pursue it.

So we're close to the same page. That's good.

Second, Peter Head who notes:

You said: "I've also stripped all brackets from the text, even those intervening words, and counted the bracketed text (including things like the longer ending of Mark) as part of this source."

Strictly speaking you should have distinguished between single square brackets [in which the bracketed words are considered to be part of the NA27 text] from double square brackets [[in which the bracketed words are NOT considered to be part of the text]]. This may require a little human intervention.

True, true. But that wasn't as easy to distinguish in the source files I was working with. I knew it would matter, which is why I noted exactly what was included in the figures. I also knew that I would run the same comparison on different texts, so I figured I'd stay consistent by counting the text as it appears on the page. If someone comes up with updated figures such as Dr. Head describes, I'll gladly post a pointer to them here or even host the files. Please let me know if you're aware of such data.

Update III (2006-10-10): Please note that Casey Perkins has provided further adjustments to account for [[double-bracketed text]] in the NA27. Casey reports that double-brackets are "only relevant in Mark, Luke and John". You can retrieve the figures in the comments to the original post on the Evangelical Textual Criticism blog. I've also saved the entire comment reporting the figures as a text file with due attribution; you can reach it here. Thanks, Casey!
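
For anyone who wants to try the single versus double bracket distinction themselves, here is a minimal sketch in Python. It is not Casey's program or mine. It assumes the double brackets appear literally as `[[` and `]]` in the source text and never nest, which may not hold for every file; the function name is hypothetical.

```python
import re

# Assumption: double-bracketed passages are marked with literal "[[" and
# "]]" and do not nest. DOTALL lets a passage span several lines.
DOUBLE_BRACKETED = re.compile(r"\[\[.*?\]\]", re.DOTALL)


def strip_double_bracketed(text: str) -> str:
    """Drop [[...]] passages entirely; keep words inside single brackets."""
    without_double = DOUBLE_BRACKETED.sub(" ", text)
    # Single brackets mark text that is still part of the edition, so only
    # the bracket characters are removed, not the words inside them.
    return without_double.replace("[", "").replace("]", "")


# Placeholder tokens stand in for Greek words.
sample = "A [B] C [[D E]] F"
print(strip_double_bracketed(sample).split())
```

The output of a step like this could then be fed to whatever word and letter counting is already in place.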

Disclaimer/Note: Data that produced these counts was used with permission from my employer, Logos Research Systems, Inc.

Post Author: rico
Friday, October 06, 2006 4:43:57 PM (Pacific Daylight Time, UTC-07:00) 

Monday, October 09, 2006 8:29:45 AM (Pacific Daylight Time, UTC-07:00)
Thanks Rick,

Very helpful.

Pete
Peter Head
Monday, October 09, 2006 9:02:46 AM (Pacific Daylight Time, UTC-07:00)
Rick,

You said: "I've also stripped all brackets from the text, even those intervening words, and counted the bracketed text (including things like the longer ending of Mark) as part of this source."
Strictly speaking you should have distinguished between single square brackets [in which the bracketed words are considered to be part of the NA27 text] from double square brackets [[in which the bracketed words are NOT considered to be part of the text]]. This may require a little human intervention.
Peter Head
Monday, October 09, 2006 4:15:48 PM (Pacific Daylight Time, UTC-07:00)
Hi Rick,
My program had a bug in it. My figures now match yours for NA, with the exception of Acts (1 char diff), and 2 Cor and Hebrews (2 char diff). Probably a difference in our source files. Beyond that I won't pursue it.

Thanks for the figures on the other texts.

Casey
Casey Perkins
Tuesday, October 10, 2006 1:08:54 PM (Pacific Daylight Time, UTC-07:00)
"If someone comes up with updated figures such as Dr. Head describes, I'll gladly post a pointer to them here or even host the files."

Check the comment page of Peter's original post.

Casey
Casey Perkins