Showing posts with label manuscripts. Show all posts

Monday, January 20, 2014

Fwd: Work Flows and Wish Lists: Reflections on Juxta as an Editorial Tool

---------- Forwarded message ----------
From: Dominik Wujastyk
Date: 18 January 2014 21:58
Subject: Work Flows and Wish Lists: Reflections on Juxta as an Editorial Tool
To: Philipp André Maas, Alessandro Graheli, Karin Preisendanz, Dominik Wujastyk

Some interesting reflections on Juxta...
I have had the opportunity to use Juxta Commons for several editorial projects, and while taking a breath between a Juxta-intensive term project last semester and my Juxta-intensive MA thesis this semester, I would like to offer a few thoughts on Juxta as an editorial tool.
For my term project for Jerome McGann's American Historiography class last semester, I conducted a collation of Martin R. Delany's novel, Blake, or, The Huts of America, one of the earliest African American novels published in the United States. Little did I know that my exploration would conduct me into an adventure as much technological as textual, but when Professor McGann recommended I use Juxta for conducting the collation and displaying the results, that is exactly what happened. I input my texts into Juxta Commons, collated them, and produced HTML texts of the individual chapters, each with an apparatus of textual variants, using Juxta's Edition Starter. I linked these HTML files together into an easily navigable website to present the results to Professor McGann. I'll be posting on the intriguing results themselves next week, but in the meantime, they can also be viewed on the website I constructed, hosted by GitHub: Blake Project home.
Juxta helped me enormously in this project. First, it was incredibly useful in helping me clean up my texts. My collation involved an 1859 serialization of the novel, and another serialization in 1861-62. The first, I was able to digitize using OCR; the second, I had to transcribe myself. Anyone who has done OCR work knows that every minute of scanning leads to (in my case) an average of five or ten minutes of cleaning up OCR errors. I also had my own transcription errors to catch and correct. By checking Juxta's highlighted variants, I was able to—relatively quickly—fix the errors and produce reliable texts. Secondly, once collated, I had the results stored in Juxta Commons; I did not have to write down in a collation chart every variant to avoid losing that information, as I would if I were machine- or sight-collating. Juxta's heat-map display allows the editor to see variants in-line, as well, which saves an immense amount of time when it comes to analyzing results: you do not have to reference page and line numbers to see the context of the variants. Lastly, Juxta enabled me to organize a large amount of text in individual collation sets—one for each chapter. I was able to jump between chapters and view their variants easily.
As helpful as Juxta was, however, I caution all those new to digital collation that no tool can perfectly collate or create an apparatus from an imperfect text. In this respect, there is still no replacement for human discretion—which is, ultimately, a good thing. For instance, while the Juxta user can turn off punctuation variants in the display, if the user does want punctuation and the punctuation is not spaced exactly the same in both witnesses, the program highlights this anomalous spacing. Thus, when 59 reads
' Henry, wat…
and 61 reads
'Henry, wat…
Juxta will show that punctuation spacing as a variant, while the human editor knows it is the result of typesetting idiosyncrasies rather than a meaningful variant. Such variants can carry over into the Juxta Edition Builder, as well, resulting in meaningless apparatus entries. For these reasons, you must make your texts perfect to get a perfect Juxta heat map and especially before using Edition Starter; otherwise, you'll need to fix the spacing in Juxta and output another apparatus, or edit the text or HTML files to remove undesirable entries.
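The spacing problem described above can also be reduced before the witnesses are uploaded. The following is a minimal Python sketch of such a cleanup pass; it is not part of Juxta, the function name normalize_spacing is hypothetical, and the rules are assumptions that an editor would need to adjust to the typesetting habits of the witnesses in question.

```python
import re

def normalize_spacing(text: str) -> str:
    """Collapse typesetting-only spacing so it does not register as a variant."""
    # Collapse runs of whitespace (including line breaks) to a single space.
    text = re.sub(r"\s+", " ", text)
    # Drop the space after a quotation mark that opens the text or follows
    # whitespace: "' Henry" becomes "'Henry". (Assumed rule, not Juxta's.)
    text = re.sub(r"(?:^|(?<=\s))(['\"‘“])\s+", r"\1", text)
    # Drop any space before closing punctuation: "wat ," becomes "wat,".
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    return text.strip()

witness_59 = "' Henry, wat"
witness_61 = "'Henry, wat"
print(normalize_spacing(witness_59) == normalize_spacing(witness_61))
```

Running the normalized copies through the collation, while keeping the untouched transcriptions on file, leaves the editor free to decide later whether any spacing difference is worth recording.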
Spacing issues can also result in disjointed apparatus entries, as occurred in my apparatus for Chapter XI in the case of the contraction needn't. Notice how because of the spacing in needn t and need nt, Juxta recognized the two parts of the contraction as two separate variants (lines 130 and 131):
This one variant was broken into two apparatus entries because Juxta recognized it as two words. There is really no way of rectifying this problem except by checking and editing the text and HTML apparatuses after the fact.
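Checking the apparatus after the fact can be partly assisted by a script. The sketch below, in Python, flags pairs of readings that differ only in where the spaces fall, as with needn t and need nt; the function name spacing_only_variant is hypothetical, and the check only suggests candidates for the editor to review rather than deciding anything itself.

```python
import re

def spacing_only_variant(reading_a: str, reading_b: str) -> bool:
    """True when two readings differ, but only in their whitespace."""
    def without_spaces(reading: str) -> str:
        return re.sub(r"\s+", "", reading)
    return reading_a != reading_b and without_spaces(reading_a) == without_spaces(reading_b)

print(spacing_only_variant("needn t", "need nt"))   # True
print(spacing_only_variant("needn't", "need not"))  # False
```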
I mean simply to caution scholars going into this sort of work so that they can better estimate the time required for digital collation. This being my first major digital collation project, I averaged about two hours per chapter (chapters ranging between 1000 and 4000 words each) to transcribe the 61-62 text and then collate both witnesses in Juxta. I then needed an extra one or two hours per chapter to correct OCR and transcription errors.
While it did take me time to clean up the digital texts so that Juxta could do its job most efficiently, in the end, Juxta certainly saved me time—time I would have spent keeping collation records, constructing an apparatus, and creating the HTML files (as I wanted to do a digital presentation). I would be remiss, however, if I did not recommend a few improvements and future directions.
As useful as Juxta is, it nevertheless has limitations. One difficulty I had while cleaning my texts was that I could not correct them while viewing the collation sets; I had, rather, to open the witnesses in separate windows.
The ability to edit the witnesses in the collation set directly would make correction of digitization errors much easier. This is not a serious impediment, though, and is easily dealt with in the manner I mentioned. The Juxta download does allow this in a limited capacity: the user can open a witness in the "Source" field below the collation visualization, then click "Edit" to enable editing in that screen. However, while the editing capability is turned on for the "Source," you cannot scroll in the visualization—and so navigate to the next error which may need to be corrected.
A more important limitation is the fact that the Edition Starter does not allow for the creation of eclectic texts, texts constructed with readings from multiple witnesses; rather, the user can only select one witness as the "base text," and all readings in the edition are from that base text.
Most scholarly editors, however, likely will need to adopt readings from different witnesses at some point in the preparation of their editions. Juxta's developers need to mastermind a way of selecting which reading to adopt per variant; selected readings would then be adopted in the text in Edition Starter. For the sake of visualizing, I did some screenshot melding in Paint of what this function might look like:
Currently, an editor wishing to use the Edition Starter to construct an edition would need to select either the copy-text or the text with the most adopted readings for the base text. The editor would then need to adopt readings from other witnesses by editing the output DOCX or HTML files. I do not know the intricacies of the code which runs Juxta. I looked at it on GitHub, but, alas! my very elementary coding knowledge was completely inadequate to the task. I intend to delve more as my expertise improves, and in the meantime, I encourage all the truly code-savvy scholars out there to look at the code and consider this problem. In my opinion, this is the one hurdle which, once overcome, would make Juxta the optimal choice as an edition-preparation tool, not just a collation tool.
Another feature which would be fantastic to include eventually would be a way of digitally categorizing variants: accidental versus substantive; printer errors, editor corrections, or author revisions; etc. Then, an option to adopt all substantives from text A, for instance, would (perhaps) leave nothing to be desired by the digitally inclined textual editor.
I am excited about Juxta. I am amazed by what it can do and exhilarated by what it may yet be capable of, and taking its limitations with its vast benefits, I will continue to use it for all future editorial projects.
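The two wished-for features (choosing a reading per variant, and categorizing variants so that, say, all substantives from one witness can be adopted at once) can be pictured with a small data structure. This Python sketch is only an illustration of the idea and has nothing to do with Juxta's actual code; the Variant class, the build_eclectic_text function, and the sample readings are all invented for the purpose.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Variant:
    position: int             # index of the affected token in the base text
    readings: Dict[str, str]  # witness label -> reading at that position
    category: str             # editor-assigned, e.g. "accidental" or "substantive"

def build_eclectic_text(base_tokens: List[str], variants: List[Variant],
                        adopt_from: str, category: str) -> List[str]:
    """Copy the base text, adopting one witness's readings for one category."""
    tokens = list(base_tokens)
    for variant in variants:
        if variant.category == category and adopt_from in variant.readings:
            tokens[variant.position] = variant.readings[adopt_from]
    return tokens

# Invented placeholder readings, not taken from any real collation.
base = ["the", "old", "house"]
variants = [
    Variant(1, {"A": "olde"}, "accidental"),
    Variant(2, {"A": "home"}, "substantive"),
]
print(" ".join(build_eclectic_text(base, variants, "A", "substantive")))
```

Here only the variant marked substantive is taken over from witness A, so the accidental spelling difference stays as it is in the base text.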
Stephanie Kingsley is a second-year English MA student specializing in 19th-century American literature, textual studies, and digital humanities. She is one of this year's Praxis Fellows [see Praxis blogs] and Rare Book School Fellows. For more information, visit, and remember to watch for Ms. Kingsley's post next week on the results of her collation of Delany's Blake.
Dominik Wujastyk, from Android phone.

Wednesday, January 15, 2014

Zooniverse and Intelligent Machine-assisted Semantic Tagging of Manuscripts

I'm very impressed by the technology being used in the War Diaries Project.  To see what I mean, click on "Get Started" and try the guided tutorial.

Once there's a critical mass of digitized Sanskrit manuscripts available, I think it would be very interesting to contact the people at Zooniverse and discuss the possibility of a Sanskrit MS-tagging project, like the War Diaries.

Tuesday, December 17, 2013

Tools for cataloguing Sanskrit manuscripts, no.1

In the post-office today I saw this piece of board that's used as a size-template to quickly assess which envelope to choose.  This is a formalized version of the same tool that I used for the many years that I spent cataloguing and packing Sanskrit manuscripts at the Wellcome Library in London.  I made a piece of board with three main size-outlines, for MSS of α, β, γ sizes.  Anything larger than γ counted as δ.  Palm-leaf MSS were all ε.

It was nice to see the same tool being used for a similar job, in an Austrian post-office!

Friday, October 18, 2013

Sanskrit manuscripts lost with the Titanic

Adheesh Sathaye mentioned today that the terrible sinking of the Titanic in 1912 was also the occasion of the loss of fourteen Sanskrit manuscripts of the Vikramacarita.  The MSS were on their way from Bombay to Edgerton in the USA.

Here is Edgerton's account, also kindly supplied by Adheesh.

There's an uncomfortable ambiguity in Edgerton's prose, regarding the predicate of his expression "terrible disaster."

Tuesday, April 02, 2013

Future philology

A very interesting and enjoyable Skype with Elena Pierazzo at KCL left me with lots to think about, and links to all sorts of digital projects that I was unaware of or only half-aware of previously, including

Tuesday, July 17, 2012

Smallpox MS in Sanskrit

MS R.15.86 in the library of Trinity College Cambridge is a tract in Sanskrit in the Bengali script about smallpox.  To the right is the description of this MS in Aufrecht's 1869 catalogue of the Trinity collection.
The work is described as Rājasiṃhasudhāsaṃgrahanāmni granthe Masūrikācikitsādhyāyaḥ, meaning "The chapter on therapies for smallpox in the book called The Collection of Nectar of Rajasinha."  Aufrecht says it's by a Mahādeva.  It's hard to know who either Mahādeva or Rājasiṃha might be.  A work called Siṃhasudhānidhi "Collected Nectar of the Lion" was composed by one Prince Devīsiṃha of the Bundela dynasty in the 17th century (Meulenbeld, History of Indian Medical Literature, v. IIA, p. 299), but that's a long shot.  Rājasiṃha is a bit of a generic name, "King-Lion".  All Sikhs have "siṃha" (=Singh) as part of their names.  Could be anyone, really.  The MS collection comes from John Bentley (d.1824), who was a historian of astronomy in early 19th-century Calcutta (he wrote A historical view of the Hindu astronomy (1825)).  The MS has written on it in a copper-plate script on the last leaf, "The Forgery of the Hindu respecting the Cowpock-innoculation."  Probably Bentley's hand, though I'm not certain.  The verses on p.25 that Aufrecht says "are open to the suspicion of modern authorship" say,
There are plukes (grantha, knot, lump) on the breasts of cows, with discharge.
One should collect the pus from them, and protect it carefully.
Preceded by
the illnesses of Śītalā, having placed on the surface (pratīka?) of a child,
with a small knife a wound like the wound of a mosquito,
having made it enter into the blood, with the pus itself,
and with the bloods on a little brush, the wise person to what is cured.
The very best physician fearlessly approaches (m-? upaiti) on the child _ _ _

My translation is a bit incoherent, because the original is too.  Maybe if I thought about it longer, I might come up with something better, but probably not.  The vocabulary is a bit strange: pratīka for a limb or the surface of the body is unusual; the stuff about a brush may be wrong. Any suggestions gratefully received.

Friday, June 01, 2012

Crowdsourcing manuscript transcription

The TEI world is discussing, amongst many things, the crowdsourcing of MS transcription.  This idea seems to hold great promise for the Indian case.  After all, we've got crowds, right?  As always, the issue is quality control.

But just for a moment, imagine the scenario of an open, public, collaborative website where anybody can bring up an image of a Sanskrit manuscript and write a transcription in an adjacent window.  A transcription that - like a Wikipedia article - would be open for others to improve or annotate, that would rely on crowdsourced cognitive surplus for contribution and gradual quality improvement.  It would be under a history/version control system, so everything would be trackable.  Contributors would earn trust points, as with eBay's feedback score.
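To make the scenario a little more concrete, here is a rough Python sketch of the two ingredients just mentioned: a transcription whose every revision is kept, and trust points for contributors.  It is only a thought experiment, not a description of any existing transcription tool; the names TranscriptionPage and endorse_latest, the image identifier and the sample texts are all made up for illustration.

```python
from collections import Counter

class TranscriptionPage:
    """One manuscript image with a full history of proposed transcriptions."""

    def __init__(self, image_id: str):
        self.image_id = image_id
        self.history = []  # (contributor, text) pairs, oldest first

    def submit(self, contributor: str, text: str) -> None:
        # Never overwrite: every revision is appended, so all changes stay trackable.
        self.history.append((contributor, text))

    def current_text(self) -> str:
        return self.history[-1][1] if self.history else ""

def endorse_latest(page: TranscriptionPage, trust: Counter) -> None:
    """Give one trust point to the contributor of the current revision."""
    if page.history:
        trust[page.history[-1][0]] += 1

trust = Counter()
page = TranscriptionPage("folio-4r")        # hypothetical image identifier
page.submit("reader_a", "first draft")      # placeholder transcription text
page.submit("reader_b", "corrected draft")
endorse_latest(page, trust)
print(page.current_text(), dict(trust))
```

A real system would of course need more than this (moderation, reverting to earlier revisions, weighting endorsements by the endorser's own trust), but the append-only history is what makes quality control possible at all.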

Ben Brumfield has created an extremely useful survey of MS transcription tools here.

His own FromThePage service looks simple to use and very attractive for a proof-of-concept pilot project.

For example, the Transcribe Bentham project has developed this way of working:
See also the other video about markup, on their "getting started" page.

All the exciting work on manuscripts and editions today is happening in connection with the TEI framework, and is based on transcribed MSS with TEI encoding: Juxta, the Versioning Machine, etc.  We need to start thinking about creating a public, high-quality corpus of transcribed MSS.  Such a corpus would be the basis for many future projects.

See also:



  • Clay Shirky's Cognitive Surplus: Creativity and Generosity in a Connected Age (2011). And a TED talk on the same subject.
  • Transcribe Bentham project at UCL.

Friday, February 24, 2012

Scribal abbreviation 2

Here's another instance of the same abbreviation from the same scribe, proving HI's conjecture about it being a ring.

Thursday, February 23, 2012

Scribal abbreviation in Sanskrit manuscript

Here is an extract from folio 4r of MS Baroda 12489 (includes the Carakasaṃhitā), showing इति iti followed by a ह ha with a loop to the right of the glyph.  A bit like the loop on the syllable ॐ oṃ. This is probably an abbreviation for the phrase इति स्माह भगवानात्रेयः iti smāha bhagavān ātreyaḥ that occurs as the second phrase in most chapters.

Here is the phrase from the next chapter, f.5v of MS Baroda 12489.

Baroda 12489 dates from AD 1816/17.
Scribal abbreviations are not as common in Sanskrit manuscripts as they are in medieval European ones.

Thursday, January 26, 2012

colophons, names of text portions in Sanskrit manuscripts

I believe that David Pingree introduced the term "post-colophon" into Indian manuscript studies when he wrote his catalogue of the Bodleian Chandra Shum Shere jyotiṣa collection.

Am I right that nobody outside Indological circles (and those influenced by indologists in the last few decades) uses the term "post-colophon"?

Here's a grid of usages:

Key: Pingree (various catalogues, starting 1984)
Tripathi: C. Tripathi, Cat. of Jaina MSS at Strasbourg
Wikipedia: see here and links.
X: no special term

Description            Pingree          Tripathi           Wikipedia (and non-indologists)

Final verse of text    X                X                  explicit
iti...samāptam         colophon         colophon           X (or colophon?)
saṃvat phrase          post-colophon    Scribal Remarks    colophon
after saṃvat phrase    X                post-              X

Pratapaditya Pal uses "post-colophon" in his 1978 Arts of Nepal book, in the same sense as Pingree.  Perhaps that's where David got it?

Monday, October 03, 2011


Sanskrit booklets, or guṭkās, contain several works collected between one set of covers.  They were presumably copied sequentially by their owners as a vade mecum of useful knowledge.

Biswas 0891 (available digitized, no. 090393) is a series of catalogues of MSS in Jaina libraries in Rajasthan.  Volume 2 (1954), 73 ff. has a section that describes 222 such booklets, and lists their contents in detail.  A study of these particular collocations of texts would provide a valuable insight into reading habits, the circulation of texts and knowledge, and the personal tastes and obsessions of pre-modern Indian readers.