Cikitsā: Crowdsourcing manuscript transcription

Friday, June 01, 2012

Crowdsourcing manuscript transcription

The TEI world is discussing, amongst many things, the crowdsourcing of MS transcription. This idea seems to hold great promise for the Indian case. After all, we've got crowds, right? As always, the issue is quality control.

But just for a moment, imagine an open, public, collaborative website where anybody can bring up an image of a Sanskrit manuscript and write a transcription in an adjacent window. Like a Wikipedia article, that transcription would be open for others to improve or annotate, and it would rely on crowdsourced cognitive surplus for contributions and gradual quality improvement. It would be under a history/version control system, so everything would be trackable. Contributors would earn trust points, as in eBay's feedback score.
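To make the idea concrete, here is a minimal sketch of how such a site might store its data: each page keeps an append-only list of revisions, and contributors accumulate a simple feedback score. Every name here (PageTranscription, Revision, TrustLedger, the image identifier, the contributor names) is hypothetical and invented for illustration. It is not how any existing transcription tool works.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Revision:
    """One saved state of a page transcription (hypothetical structure)."""
    author: str
    text: str
    note: str = ""


@dataclass
class PageTranscription:
    """A manuscript page image together with its full revision history."""
    image_id: str
    revisions: List[Revision] = field(default_factory=list)

    def save(self, author: str, text: str, note: str = "") -> None:
        # Append rather than overwrite, so every change stays trackable.
        self.revisions.append(Revision(author, text, note))

    def current(self) -> str:
        # The latest revision is the visible transcription.
        return self.revisions[-1].text if self.revisions else ""

    def revert(self, index: int, author: str) -> None:
        # A revert is recorded as a new revision; history is never deleted.
        old = self.revisions[index]
        self.save(author, old.text, note=f"revert to revision {index}")


@dataclass
class TrustLedger:
    """Assumed scoring rule: +1 for positive feedback, -1 for negative."""
    scores: Dict[str, int] = field(default_factory=dict)

    def feedback(self, contributor: str, positive: bool) -> None:
        delta = 1 if positive else -1
        self.scores[contributor] = self.scores.get(contributor, 0) + delta

    def score(self, contributor: str) -> int:
        return self.scores.get(contributor, 0)


# Illustrative usage with invented identifiers and placeholder text.
page = PageTranscription(image_id="example-ms-folio-1r")
page.save("contributor_a", "first attempt at the opening line")
page.save("contributor_b", "corrected reading of the opening line", note="fixed a misreading")

ledger = TrustLedger()
ledger.feedback("contributor_b", positive=True)

for number, rev in enumerate(page.revisions):
    print(number, rev.author, rev.note or "(no note)")
```

Keeping reverts as ordinary revisions is one possible design choice; it means the quality-control question becomes one of reviewing history rather than policing who may edit.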

Ben Brumfield has created an extremely useful survey of MS transcription tools.

His own FromThePage service looks simple to use and very attractive for a proof-of-concept pilot project.

For example, the Transcribe Bentham project has developed its own way of working, which it demonstrates in a video.

See also the other video about markup, on their "getting started" page.

All the exciting manuscript and edition work today is happening in connection with the TEI framework, and is based on transcribed MSS with TEI encoding: Juxta, the Versioning Machine, etc. We need to start thinking about creating a public, high-quality corpus of transcribed MSS. Such a corpus would be the basis for many future projects.
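For readers who have not seen a TEI-encoded transcription, the following is a rough sketch of how a contributor's plain lines of text might be wrapped in a minimal TEI document. The function name build_tei, the sample title, the folio number and the placeholder lines are all assumptions made for illustration. The elements used (teiHeader, fileDesc, pb, lb) are standard TEI, but a real project would need a far richer header and encoding policy.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements in the default namespace (no prefix).
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, source_note, folio, lines):
    """Wrap transcribed lines in a minimal TEI skeleton (illustrative only)."""
    root = ET.Element(tei("TEI"))

    # Minimal header: title, publication statement, source description.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished draft transcription."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_note

    # Body: one page break marker, then one <lb/> per manuscript line.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    para = ET.SubElement(body, tei("p"))
    ET.SubElement(para, tei("pb")).set("n", folio)
    for number, line in enumerate(lines, start=1):
        lb = ET.SubElement(para, tei("lb"))
        lb.set("n", str(number))
        lb.tail = line  # the transcribed text follows the line-break marker
    return root


# Illustrative usage with placeholder content.
doc = build_tei(
    title="Sample transcription",
    source_note="Hypothetical manuscript, used only as an example.",
    folio="1r",
    lines=["first transcribed line", "second transcribed line"],
)
print(ET.tostring(doc, encoding="unicode"))
```

Because the output is ordinary XML, transcriptions stored this way could in principle be passed on to collation and display tools that expect TEI input.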

See also:

http://scripto.org/ Scripto (and Omeka)
http://mel.hofstra.edu/textlab.html John Bryant's TextLab
http://t-pen.org/TPEN/ T-Pen

-----

References

Clay Shirky's Cognitive Surplus: Creativity and Generosity in a Connected Age (2011), and a TED talk on the same subject.
Transcribe Bentham project at UCL.