Cikitsā: Sanskrit


Tuesday, July 03, 2012

Sanskrit hyphenation list

I'm gradually building up a file of hyphenated Sanskrit words and compounds, written in the Latin alphabet. The file is called sanskrit-hyphenations.tex, and you are welcome to download it.

It contains hyphenation points for words in English (ayur-veda), and for words in Sanskrit (āyur-veda).
To use it, do something like this in your style file:

\setotherlanguage{sanskrit}
\newfontfamily\sanskritfont{Sanskrit 2003}
% Define \sansk{} which is the same as \emph{}, except
% that it causes appropriate hyphenation
% for Sanskrit words. Use \sansk{} for Sanskrit and
% \emph{} for English.
\newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}

and \input the sanskrit-hyphenations.tex file after \begin{document}, thus:

\begin{document}
\input{sanskrit-hyphenations.tex}
...
\end{document}

XeTeX already has built-in hyphenation rules for Devanāgarī and Romanized Sanskrit. The above file is intended to extend the hyphenation coverage for Romanized words, using etymological and stylistic considerations.
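
For illustration, a hyphenation file of this kind could look roughly like the sketch below. It is not an excerpt from sanskrit-hyphenations.tex: the layout is an assumption, and it uses only the standard TeX \hyphenation primitive and polyglossia's \selectlanguage. Because \hyphenation records exceptions for whichever language is active at that moment, the Sanskrit entries are declared while Sanskrit is selected. That is presumably also why such a file is read after \begin{document}, once polyglossia has set up its languages.

```latex
% Hypothetical sketch of a hyphenation-exception file
% (not the actual contents of sanskrit-hyphenations.tex).

% Exceptions for the main (English) language, which is active by default:
\hyphenation{ayur-veda}

% Exceptions for Sanskrit: select the language inside a group so that the
% language switch stays local; \hyphenation itself always acts globally.
\begingroup
  \selectlanguage{sanskrit}
  \hyphenation{āyur-veda}
\endgroup
```

Each further word or compound would be added, separated by spaces, inside the matching \hyphenation{...} list.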

Friday, June 01, 2012

Crowdsourcing manuscript transcription

The TEI world is discussing, amongst many things, the crowdsourcing of MS transcription. This idea seems to hold great promise for the Indian case. After all, we've got crowds, right? As always, the issue is quality control.

But just for a moment, imagine the scenario of an open, public, collaborative website where anybody can bring up an image of a Sanskrit manuscript and write a transcription in an adjacent window. Like a Wikipedia article, the transcription would be open for others to improve or annotate, and it would rely on crowdsourced cognitive surplus for contributions and gradual quality improvement. It would be under a history/version control system, so everything would be trackable. Contributors would earn trust points, as with eBay's feedback score.

Ben Brumfield has created an extremely useful survey of MS transcription tools here.

His own FromThePage service looks simple to use and very attractive for a proof-of-concept pilot project.

For example, the Transcribe Bentham project has developed this way of working:

See also the other video about markup, on their "getting started" page.

Most of the exciting work on manuscripts and editions today is happening in connection with the TEI framework, and is based on transcribed MSS with TEI encoding: Juxta, the Versioning Machine, and so on. We need to start thinking about creating a public, high-quality corpus of transcribed MSS. Such a corpus would be the basis for many future projects.

See also:

http://scripto.org/ Scripto (and Omeka)
http://mel.hofstra.edu/textlab.html John Bryant's TextLab
http://t-pen.org/TPEN/ T-Pen

-----

References

Clay Shirky's Cognitive Surplus: Creativity and Generosity in a Connected Age (2011). And a TED talk on the same subject.
Transcribe Bentham project at UCL.

Thursday, May 24, 2012

Two more good Devanāgarī fonts

I have posted an update to this post with some new material.

Steve White has recently done a great deal of work updating the FreeSerif and FreeSans Unicode fonts (that are, er, free). He has done especially important work on the Devanagari characters in the font, as well as several other Indian writing systems. See here for a listing of what has changed. Steve's work means that the Devanāgarī in the Free* fonts now works not only with Xe(La)TeX but also in Firefox, LibreOffice and other programs. Thanks, Steve!

Zdeněk Wagner recently announced (here) that,

A few days ago version 20120503 of GNU FreeFont was released. This (OpenType as well as TrueType) version contains working Devanagari. FreeSans is based on Gargi with bugs fixed and positions of matras fine-tuned (and changes were reported back to the Gargi developers), FreeSerif is based on Velthuis fonts. Both fonts contain the Indian Rupee sign. In XeLaTeX, conjuncts in FreeSerif can be switched on/off according to the language.

Zdeněk provided an example file showing FreeSans and FreeSerif, and demonstrating the different conjuncts of Sanskrit and Hindī. I have taken the liberty of expanding his file to compare some of the other leading Unicode fonts that contain both Devanāgarī and Latin typefaces in the same font:

XeLaTeX Input:
\documentclass{article}
\usepackage{polyglossia}
\defaultfontfeatures{Mapping=RomDev,Script=Devanagari,Language=Sanskrit}
\newfontfamily\eng[Mapping=tex-text]{TeX Gyre Pagella}
\begin{document}
\setmainfont{FreeSerif} \newfontfamily\eng[Mapping=tex-text]{FreeSerif}
{\eng FreeSerif} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{FreeSans} \newfontfamily\eng[Mapping=tex-text]{FreeSans}
{\eng FreeSans} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{Sanskrit 2003} \newfontfamily\eng[Mapping=tex-text]{Sanskrit 2003}
{\eng Sanskrit 2003} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont[FakeStretch=1.08]{Sanskrit 2003}
{\eng Sanskrit 2003+} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{Nakula} \newfontfamily\eng[Mapping=tex-text]{Nakula}
{\eng Nakula} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{Sahadeva} \newfontfamily\eng[Mapping=tex-text]{Sahadeva}
{\eng Sahadeva} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\end{document}
Output:


Note the use of the RomDev mapping to get "कार्त्स्न्यम्" out of "kārtsnyam", just for fun. I've included Sanskrit 2003 twice, the second time with a bit of horizontal stretch, which I think makes it look nicer.
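
To isolate what the RomDev mapping does, here is a stripped-down sketch. It assumes that the RomDev mapping used in the listing above is available to XeTeX and that FreeSerif is installed; \romdev is a hypothetical command name chosen only for this example.

```latex
% Compile with xelatex. \romdev is a hypothetical name used for this example.
\documentclass{article}
\usepackage{fontspec}
% Text set with \romdev is passed through the RomDev mapping, which
% converts romanized input into Devanagari before it reaches the font.
\newfontfamily\romdev[Script=Devanagari,Mapping=RomDev]{FreeSerif}
\begin{document}
Romanized input: kārtsnyam

Mapped to Devanāgarī: {\romdev kārtsnyam}
\end{document}
```

Keeping the mapping on a separate font family, rather than in \defaultfontfeatures, leaves ordinary Latin text untouched.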

The official web page of the newly-updated FreeSans and FreeSerif fonts is:

https://savannah.gnu.org/projects/freefont/

As Zdeněk adds, "Hopefully the font will soon appear in TeX Live and (some) Linux distributions. If you install it independently, be sure that you do not have font conflicts."

Be sure to delete all earlier versions of FreeSerif and FreeSans that might be lurking on your hard drive. Then install the new version. If you find the conjuncts aren't working as promised, you probably have an old FreeSerif or Sans lurking in a directory somewhere that you have forgotten about.
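
If tracking down stale copies proves difficult, one possible alternative is to point fontspec at the font files directly, which sidesteps the system font lookup. The sketch below assumes that the OpenType files from the new release have been unpacked into a directory called ./freefont/ (a hypothetical location) and that they are named FreeSerif.otf, FreeSerifBold.otf, and so on; adjust both to match your own installation.

```latex
% Compile with xelatex. The Path value is a hypothetical location.
\documentclass{article}
\usepackage{fontspec}
% Load FreeSerif by file name so that no other installed copy can be picked up.
\setmainfont[
  Path           = ./freefont/ ,
  Extension      = .otf ,
  UprightFont    = * ,
  BoldFont       = *Bold ,
  ItalicFont     = *Italic ,
  BoldItalicFont = *BoldItalic ,
  Script         = Devanagari ,
  Language       = Sanskrit ]{FreeSerif}
\begin{document}
शक्ति, कार्त्स्न्यम्
\end{document}
```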

Friday, February 24, 2012

Scribal abbreviation 2

Here's another instance of the same abbreviation from the same scribe, proving HI's conjecture about it being a ring.

Thursday, February 23, 2012

Scribal abbreviation in Sanskrit manuscript

Here is an extract from folio 4r of MS Baroda 12489 (includes the Carakasaṃhitā), showing इति iti followed by a ह ha with a loop to the right of the glyph. A bit like the loop on the syllable ॐ oṃ. This is probably an abbreviation for the phrase इति स्माह भगवानात्रेयः iti smāha bhagavān ātreyaḥ that occurs as the second phrase in most chapters.

Here is the phrase from the next chapter, f.5v of MS Baroda 12489.

Baroda 12489 dates from AD 1816/17.

Scribal abbreviations are not as common in Sanskrit manuscripts as they are in medieval European ones.

Tuesday, October 04, 2011

Simplest Sanskrit XeLaTeX file

Input:

\documentclass{article}
\usepackage{polyglossia}
\setmainfont[Script=Devanagari]{Nakula}

\begin{document}
Your Devanāgarī looks like this: आसीद्राजा नलो नाम and your romanized stuff looks like this: āsīd rājā nalo nāma.
\end{document}

Output:

You can get the Nakula font (and its twin, Sahadeva) from John Smith's website, http://bombay.indology.info

Monday, October 03, 2011

Guṭkās

Sanskrit booklets, or guṭkās, contain several works collected between one set of covers. They were presumably copied sequentially by their owners as a vade mecum of useful knowledge.

Biswas 0891 (available digitized, no. 090393 at http://www.jainlibrary.org/menus_cate.php) is a series of catalogues of MSS in Jaina libraries in Rajasthan. Volume 2 (1954), 73 ff. has a section that describes 222 such booklets, and lists their contents in detail. A study of these particular collocations of texts would provide a valuable insight into reading habits, the circulation of texts and knowledge, and the personal tastes and obsessions of pre-modern Indian readers.

Wednesday, September 01, 2010

XeLaTeX, Velthuis encoding, and palatal nasals

When using the Velthuis input coding for Devanāgarī, and wanting to have it handled by XeLaTeX, one finds the palatal ñ disappears in the Nāgarī.

input: sa~njaya

output: स न्जय

That's because the Velthuis input code for ञ् is ~n, and the "~" is a special code in TeX, meaning "hard space".
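
The special status of the tilde can be checked directly. The following small test file is only an illustration and not part of the workaround: it prints the tilde's category code (13 means "active") and the definition that LaTeX attaches to it.

```latex
\documentclass{article}
\begin{document}
% Category code 13 marks an "active" character, one that behaves like a macro.
Category code of tilde: \the\catcode`\~

% \meaning prints the current definition of the active tilde.
Definition of tilde: \texttt{\meaning~}
\end{document}
```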

Here's the workaround. I define a font-switching command \dev that will turn Velthuis into Devanāgarī. \dev is mostly made up of "\textsanskrit" which is set up using the standard XeLaTeX/polyglossia \newfontfamily commands. \textsanskrit does the work of invoking the mapping-conversion (from XeTeX's velthuis-sanskrit.tec file).

But just before \textsanskrit, we change tilde into a normal character. And after \textsanskrit, we turn tilde back into an "active" hard space. We use the \aftergroup command so that the "active" version of tilde is activated after the closing of the group that contains the Devanāgarī.

Here's the code:

\newfontfamily\textsanskrit [Script=Devanagari,Mapping=velthuis-sanskrit]{Nakula}

% Make the tilde into a normal letter of the alphabet

\def\maketildeletter{\catcode`\~=11 }

% Return tilde to being the default TeX "active" character for hard space

\def\maketildeactive{\catcode`\~=13 }

\def\dev{\maketildeletter\textsanskrit \aftergroup\maketildeactive}

Here's how you use it:

input: {\dev sa~njaya uvaaca}. What did Dr~Sañjaya say?

output: सञ्जय उवाच. What did Dr Sañjaya say?

where that space between "Dr" and "Sañjaya" is hard, and you can't break a line there.

Enjoy.

Update 2020:

Using David Carlisle's much better idea from the comments below, here's the new code:

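\documentclass{article}

\usepackage{fontspec}

\newfontfamily\textsanskrit [Script=Devanagari,Mapping=velthuis-sanskrit]{Nakula}

% Within the current group, make the active tilde expand to an ordinary "~" character

\def\dev{\edef~{\string~}\textsanskrit }

\begin{document}

{\dev sa~njaya uvaaca}. What did Dr~Sañjaya say?

\end{document}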
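
One consequence of this approach, as far as can be judged from how TeX tokenizes its input, is that \dev also works inside the argument of another command. A category-code change would arrive too late there, because the argument has already been read by the time \dev runs. The sketch below illustrates this with \mbox. It reuses the definitions above and assumes that the Nakula font and the velthuis-sanskrit mapping are installed.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\newfontfamily\textsanskrit[Script=Devanagari,Mapping=velthuis-sanskrit]{Nakula}
% Locally redefine the active tilde so that it expands to a plain "~" character.
\def\dev{\edef~{\string~}\textsanskrit}
\begin{document}
% The tilde below is already an active-character token when \mbox reads its
% argument, but \dev has redefined that token by the time it is expanded.
\mbox{\dev sa~njaya uvaaca}. What did Dr~Sañjaya say?
\end{document}
```

Because \mbox typesets its argument inside a box, which forms a group, the redefinition ends with the box and the tilde in "Dr~Sañjaya" is an ordinary hard space again.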