Thursday, July 05, 2007

My friend and colleague Sean Boisen, who blogs at the aptly-named Blogos, has called me out.

Sean is thinking about (and doing -- trust me!) all sorts of cool stuff, but one project you can read about has to do with the way Bible references are indexed on the web. The idea is to use a "microformat" to mark, in a semi-consistent manner, where Bible references are cited, so that web crawlers can parse them in a somewhat standard way.

If this sounds groovy to you, then check out Sean's initial post. If you're really interested, you can see an overview and a more formal spec he's been working on as well.

My own initial response: Sure, mostly. My primary sticking point (which is now null and void, see 'update' below) is/will be with a canonical list of supported names. I'd recommend preferred names but include a list of aliases (alternates) for all abbreviations. I think this is necessary for ease of adoption. Instead of forcing the tagger/blogger/whatever to use the proper abbreviation, the app/crawler that is processing Bible refs in the citation standard should deal with that conversion.

To illustrate my point, let me show you how I make Bible refs hot (like this one, 1Ti 2.3-6) here at ricoblog.

The blog software I use (dasBlog) supports a concept of text macros that are essentially regular expressions. This allows me to change something like this: $esv[1Ti 2.3-6] (only I use parens instead of brackets) into something that jumps to the ref: 1Ti 2.3-6. The software itself expands the macro as it processes the page display (or the RSS feed, or whatever). Now, if I was on top of my game, I could write a component for dasBlog in C# that would isolate references in context, or that would 'canonicalize' tagged references in post text. But that's something I don't want to do. Why? Because it is hard, not easy, and I have other hard things I'd rather do.
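For the curious, here is a rough sketch in Python of the kind of regex expansion a text macro like that performs. This is not dasBlog's actual macro engine or configuration; the `$esv(...)` pattern follows the description above, while `BASE_URL` and `expand_macros` are made-up names, and the URL is a placeholder rather than a real service endpoint.

```python
import re
from html import escape
from urllib.parse import quote

# Macro syntax as described in the post: $esv(1Ti 2.3-6)
MACRO = re.compile(r"\$esv\(([^)]+)\)")

# Placeholder link target (an assumption, not a real service URL).
BASE_URL = "https://example.org/passage?q="


def expand_macros(text):
    """Replace every $esv(...) macro in text with a link to the reference."""

    def to_link(match):
        ref = match.group(1).strip()
        # Percent-encode the reference for the URL, escape it for the link text.
        return '<a href="{}{}">{}</a>'.format(BASE_URL, quote(ref), escape(ref))

    return MACRO.sub(to_link, text)


print(expand_macros("like this one, $esv(1Ti 2.3-6), here at ricoblog"))
```

Note that the macro never has to know what "1Ti" means; it just passes the text along and lets whatever sits behind the link sort it out.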

Now, I jump to the ESV and I rely on the ESV web service to know that "1Ti" means First Timothy. The ESV web service (as well as the Bible Gateway) supports a number of abbreviations for each book of the Bible. I think it is important to make the tagging of references on web pages easy. There is a relatively small universe of known abbreviations for each language, so let the processors that handle the Bible refs build those tables and deal with the issue.

This has a few benefits. First, it makes tagging easy. I don't have to remember that "1Tim" means First Timothy; I can use my own preferred abbreviation (assuming it is logical, descriptive, and human-readable) and the processing app can take care of it -- or throw an error when it can't figure something out.

Second, it means that multiple languages can be supported. It means that if I'm Swedish, I can type "1Mo 1.1" for Genesis 1:1. I don't have to think, "yeah, 'Gen' is the abbreviation for what I call '1Mo'".

Third ... I hate to break it to y'all, but even the most conscientious taggers make mistakes. The data will not be pure. So I say embrace the messiness of alternate booknames and even alternate languages from the get-go; it'll make life easier down the road. And it'll make life easier for those who do use the bibleref proposal. Heck, I'll begin by altering my macro to insert the proper <cite> tags around the reference ... though I'll be using my own booknames.
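To make the alias idea concrete, here is a minimal sketch in Python of what a processor-side lookup could look like. Everything in it is an assumption for illustration: the tiny `ALIASES` table, the canonical ids "Gen" and "1Tim", and the `normalize` function are hypothetical, and a real processor would need a full table per language.

```python
import re

# Hypothetical alias table: canonical book id -> abbreviations a processor accepts.
# Only two books are listed; a real table would cover every book and language.
ALIASES = {
    "Gen": ["Gen", "Ge", "Gn", "Genesis", "1Mo"],
    "1Tim": ["1Tim", "1Ti", "1 Tim", "1 Timothy"],
}

# Reverse lookup keyed on lowercased aliases with spaces and dots removed.
LOOKUP = {
    re.sub(r"[\s.]", "", alias).lower(): canon
    for canon, aliases in ALIASES.items()
    for alias in aliases
}

# Book name, chapter, verse, and optional end verse: "1Ti 2.3-6" or "Gen 1:1".
REF = re.compile(r"^\s*(.+?)\s+(\d+)[.:](\d+)(?:-(\d+))?\s*$")


def normalize(ref):
    """Return (canonical_book, chapter, verse, end_verse_or_None) or raise ValueError."""
    match = REF.match(ref)
    if match is None:
        raise ValueError("cannot parse reference: {!r}".format(ref))
    book, chapter, verse, end = match.groups()
    canon = LOOKUP.get(re.sub(r"[\s.]", "", book).lower())
    if canon is None:
        # The "throw an error when it can't figure something out" case.
        raise ValueError("unknown book name: {!r}".format(book))
    return canon, int(chapter), int(verse), int(end) if end else None


print(normalize("1Ti 2.3-6"))
print(normalize("1Mo 1.1"))
```

The tagger types whatever abbreviation comes naturally, and the table does the remembering. Adding another language means adding rows, not retraining bloggers.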

Update (2007-07-06): Two things. First, I really need to read the whole paragraph of the proper section of Sean's spec; an appendix recommends that alternate booknames are to be supported by the processor:

Bibleref processors MUST recognize the book designators specified in Appendix C of the OSIS specification (the current version is 2.1.1: note this is a large PDF file).
Bibleref processors for English or other languages MAY recognize additional book identifiers, provided there is an unambiguous mapping to canonical book names.

So, as usual, Sean was ahead of me and most of my blathering up there is needless. Once again, Sean proves his awesome-ness.

Second, I've updated my ref macro to incorporate bibleref tagging. So now there are <cite class="bibleref" title="ref"> elements around all hot Bible references.
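Here is a small Python sketch of that wrapping step. It is not my actual dasBlog macro: `bibleref_cite` is a hypothetical helper, and the exact format expected in the `title` attribute is an assumption, so check Sean's spec before copying it.

```python
from html import escape


def bibleref_cite(display_text, ref):
    """Wrap a reference in a cite element carrying the bibleref class."""
    return '<cite class="bibleref" title="{}">{}</cite>'.format(
        escape(ref, quote=True),  # attribute value: escape quotes too
        escape(display_text),     # visible text of the reference
    )


# The title value here is only an illustrative guess at a reference format.
print(bibleref_cite("1Ti 2.3-6", "1Tim 2:3-6"))
```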

Post Author: rico
Friday, July 06, 2007 4:46:25 AM (Pacific Daylight Time, UTC-07:00) 
