Showing posts with label TeX. Show all posts

Tuesday, July 26, 2016

Getting Xindy to work for IAST-encoded text

Xindy is an index-processor for use with TeX and LaTeX.  It is a successor to Makeindex, and is the standard system for formatting and sorting indexes and glossaries that go with LaTeX documents.

Sources

The main distribution of Xindy is included in TeXlive and is downloadable at CTAN. At the time of writing (July 2016), this is version 2.5.1 (2014). The source code used to be maintained at SourceForge (Xindy at SourceForge, currently version 2.3 from 2008), but a later version is now available at GitHub (Xindy at GitHub). Since GitHub only has version 2.5.0, source development and compilation for TeXlive must be going on somewhere else, but I don't know where. If you want to tinker with Xindy, the best place to fetch it is the CTAN base directory.

Use and benefits

The development of Xindy is uneven, given the various repositories with different versions.  The documentation is also of limited use to beginners, being technical and out of date (the examples in the tutorial do not work with the current software release).   Nevertheless, it is a very flexible and powerful program, and does a great job when it works.  And for many texts and nearly fifty modern languages, it "just works," which is great.

In LaTeX documents, the packages index, makeidx or imakeidx will normally be used to provide the macros needed for indexes.  Xindy does the rest.
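
As a minimal sketch of what this looks like in practice, the following document uses makeidx and marks two terms for the index. The font (Charis SIL) and the sample entries are illustrative assumptions, not taken from a real project; any Unicode font with the IAST glyphs should do.

```latex
% !TeX program = xelatex
% Minimal sketch: a XeLaTeX document that writes index entries to \jobname.idx.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL} % assumption: any Unicode font covering IAST will do
\usepackage{makeidx}     % provides \printindex; \index and \makeindex are in the LaTeX kernel
\makeindex               % open \jobname.idx for writing

\begin{document}
The word \emph{āyurveda}\index{āyurveda} should sort under A,
and \emph{ṛṣi}\index{ṛṣi} under R.

\printindex % reads \jobname.ind, which the index processor produces
\end{document}
```

Running XeLaTeX once writes the .idx file; the index processor then turns it into the .ind file, and a second XeLaTeX run typesets the index.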

Sanskrit and IAST

For my writing, I normally use XeTeX with LaTeX and I write using Unicode UTF8 encoding and the IAST transliteration scheme when doing Sanskrit in Roman script.  (For Devanagari I use normal Unicode encoding.)

Xindy by itself doesn't recognize the IAST accented characters like vowels with a macron or consonants with under-dot. I found that setting Xindy's language to "general" handled nearly all the characters well, but not all: words beginning with ā-, ṛ- etc. were sorted at the very start of the index, before "A."

Difficulties

The program for creating a new "alphabet" for Xindy is in Perl and is called make-rules.  I couldn't initially find it at all, because it isn't at the Sourceforge or GitHub repositories (or I couldn't find it).  Later, I found it at CTAN, and I wish I'd seen that earlier.

Finally, I could not get make-rules to work.  The documentation and tutorials simply didn't provide me with enough accurate information to start, as a beginner, and get a workable result from make-rules.

Solution (aka kludge)

I therefore made up a very simple Xindy style file, IAST.xdy, with the following content:
(merge-rule "ā" "a")
(merge-rule "Ā" "a")
(merge-rule "ḍ" "d")
(merge-rule "Ḍ" "d")
(merge-rule "ḥ" "h")
(merge-rule "Ḥ" "h")
(merge-rule "ī" "i")
(merge-rule "Ī" "i")
(merge-rule "ḹ" "l")
(merge-rule "ḷ" "l")
(merge-rule "Ḹ" "l")
(merge-rule "Ḷ" "l")
(merge-rule "ṃ" "m")
(merge-rule "Ṃ" "m")
(merge-rule "ṅ" "n")
(merge-rule "Ṅ" "n")
(merge-rule "ṇ" "n")
(merge-rule "Ṇ" "n")
(merge-rule "ṝ" "r")
(merge-rule "ṛ" "r")
(merge-rule "Ṝ" "r")
(merge-rule "Ṛ" "r")
(merge-rule "ṣ" "s")
(merge-rule "Ṣ" "s")
(merge-rule "ś" "s")
(merge-rule "Ś" "s")
(merge-rule "ṭ" "t")
(merge-rule "Ṭ" "t")
(merge-rule "ū" "u")
(merge-rule "Ū" "u")
Then, 
  • I place IAST.xdy in my local TeX tree, namely as  ../localtexmf/xindy/modules/IAST.xdy 
  • I run "sudo mktexlsr" to rebuild the TeXlive indexes so that Xindy can find IAST.xdy
  • I then run Xindy from the Linux command line with the following syntax:

    texindy -I xelatex -M IAST.xdy -L general -o foobar.ind foobar.idx
This last point is a Knuthian white lie (TeXbook, vii).  I currently use TeXStudio for actual writing, so the above "command line" is entered into TeXStudio's "options/configure/commands" menu and invoked with a convenient function-key shortcut. 

Explanation

  • texindy is just xindy with some tweaks for use with LaTeX
  • -I means the input file uses "xelatex" encoding, i.e., UTF8
  • -M means please use this style file
  • -L means please use the pseudo-language "general" which does the right thing with most UTF8-encoded Roman / European text.
  • "foobar" is replaced by your TeX filename; in TeXStudio's syntax it's "%", which stands for whatever file you're currently working on.
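
If you would rather not run texindy by hand, the same options can probably be passed through the imakeidx package. This is an untested sketch: it assumes imakeidx's program and options keys, that IAST.xdy is installed in the local TeX tree as described above, and that XeLaTeX is run with -shell-escape so that it is allowed to call texindy. The font and the sample entries are also just examples.

```latex
% !TeX program = xelatex
% Sketch only: compile with  xelatex -shell-escape foobar.tex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL} % assumption: any Unicode font covering IAST
\usepackage{imakeidx}
% Ask imakeidx to call texindy with the same options as on the command line.
\makeindex[program=texindy, options={-I xelatex -M IAST.xdy -L general}]

\begin{document}
\emph{āyurveda}\index{āyurveda}, \emph{ṛṣi}\index{ṛṣi}

\printindex
\end{document}
```

The design choice here is only convenience: the index is rebuilt on every XeLaTeX run, at the cost of enabling shell escape. If that does not work on your system, the command line above remains the fallback.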



I'm now getting results that look like this, which is what I was after:



I'm sure that all this could be done more elegantly and completely.  In the longer run, I hope a successor to IAST.xdy might take its place alongside all the other languages formally supported by Xindy.
While working on this, I sent cries for help to the Xindy discussion list.  Zdenek Wagner replied, and shared with me work he has done towards indexing Hindi and Marathi in Devanagari script.



Tuesday, January 22, 2013

TeX implementations in the Cloud

The more mature products (2014)

Both the above have collaborative-editing features.  Both have free access for limited projects, but require subscription for larger projects or collaborative teams.


Others of varying levels of activity (2014)

  • FlyLaTeX (self-hosting; free and open-source)
  • TeXTouch (iTunes, iPhone editor, can compile when online)
  • Verbosus (with Android and iOS apps)
  • Blue Publications
  • LaTeXLab - requires your Google login details :-( 
  • Pine from Sayahna.org
    In alpha test (a document processing system in the cloud that makes use of MediaWiki and its resources)
  • CloudTeX from Sayahna.org
    XeLaTeX and LuaTeX supported.   Working prototype available to testers. 
  • A different CloudTeX
    Seems to have gone quiet as of 2013.
  • SpanDex
    XeLaTeX and LuaTeX available, but limited Unicode fonts.
    Closed.
  • ScribTeX  (phased out as of Feb 2013, in favour of ShareLaTeX, but still exists)
  • MonkeyTeX (4/2014)

Tuesday, July 03, 2012

Sanskrit hyphenation list

I'm gradually building up a file of hyphenated Sanskrit words and compounds, written in the Latin alphabet.  The file is called sanskrit-hyphenations.tex, and you are welcome to download it.

It contains hyphenation points for words in English (ayur-veda), and for words in Sanskrit (āyur-veda).
 To use it, do something like this in your style file:
\setotherlanguage{sanskrit}
\newfontfamily\sanskritfont{Sanskrit 2003}
% Define \sansk{} which is the same as \emph{}, except
% that it causes appropriate hyphenation
% for Sanskrit words.  Use \sansk{} for Sanskrit and
% \emph{} for English.

\newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
 and \input the sanskrit-hyphenations.tex file after \begin{document}, thus:
\begin{document}
  \input{sanskrit-hyphenations.tex}

...
\end{document}

XeTeX  already has built-in hyphenation rules for Devanāgarī and Romanized Sanskrit. The above file is intended to extend the hyphenation coverage for Romanized words, using etymological and stylistic considerations.

Tuesday, October 04, 2011

Simplest Sanskrit XeLaTeX file

Input:

\documentclass{article}
\usepackage{polyglossia}
\setmainfont[Script=Devanagari]{Nakula}

\begin{document}
Your Devanāgarī looks like this:  आसीद्राजा नलो नाम and your romanized stuff looks like this: āsīd rājā nalo nāma.  
\end{document}

Output:






You can get the Nakula font (and its twin, Sahadeva) from John Smith's website, http://bombay.indology.info

Monday, November 22, 2010

Hyphenating Sanskrit in roman transliteration

%!TeX program = xelatex
%
% Thanks to Yves Codet for the first version of this test file, and to Yves
% and Jonathan Kew for the hyphenation tables
% for Sanskrit (hyph-sa.tex):
%
% This file exemplifies the case where some Sanskrit is embedded in a
% mainly-English document, but the Sanskrit words are appropriately
% hyphenated. The Sanskrit words are in the argument of the
% \textsanskrit{} command.

\documentclass[12pt]{article}

\usepackage{fontspec}
\usepackage{polyglossia}

\setdefaultlanguage{english}
\setmainfont{Charis SIL}

\setotherlanguage{sanskrit}
\newfontfamily\sanskritfont{Charis SIL}

\textwidth=0.5cm
\parindent 0pt

\begin{document}

Sanskrit hyphenation:
\par\smallskip

\textsanskrit{manum ekāgram āsīnam abhigamya maharṣayaḥ |\par}

\bigskip

English hyphenation:
\par\smallskip

manum ekāgram āsīnam abhigamya maharṣayaḥ |

\end{document}

Tuesday, July 06, 2010

Switching from Devanāgarī to Roman with a single command

I have to admit even I am startled by the success of this.
In the input file below, I changed the single command:
  • \setdefaultlanguage{sanskrit}

to

  • \setdefaultlanguage{english}
and the result was the following:

A minimal edition of a Sanskrit verse, using XeLaTeX and Ledmac


And here's the input for the above:


\documentclass{book}
% Set up things for XeLaTeX, and Devanagari.
% Simplified version of http://cikitsa.blogspot.com/2010/07/xelatex-for-sanskrit.html
\usepackage{polyglossia} % the multilingual support package
% Next, from the polyglossia manual:
\setdefaultlanguage{sanskrit} % this is mostly going to be Sanskrit,
\setotherlanguage{french} % with some French embedded in it,
\setotherlanguage{english} % and some English.
% These will call appropriate hyphenation.
\usepackage{xltxtra} % standard for nearly all XeLaTeX documents
\defaultfontfeatures{Mapping=tex-text} % ditto
\setmainfont{Gandhari Unicode} % could be any Unicode font
% Now define the Devanagari font:
% John Smith's Sahadeva, input using standard UTF8 transliteration
\newfontfamily\sanskritfont [Script=Devanagari,Mapping=RomDev]{Sahadeva}

% Now come the commands for the critical edition formatting:
\usepackage{ledmac}
% customizations to Ledmac, and macros to make life easier.
\def\Variant#1{\Afootnote{\relax#1}}
\def\Lemma#1{\lemma{\relax#1}}
\let\Reference=\Bfootnote
\let\Grammatical=\Cfootnote
\let\Tibetan=\Dfootnote
% in a real edition, I'd probably also make
% abbreviations for \textfrench (perhaps \tf) etc.
\def\Omission#1{$\langle$#1$\rangle$}
\def\ScribalDeletion#1{{\rm[\kern-.15em[}#1{\rm]\kern-.15em]}}
\def\hardspace{\texttt{\char`\ }}
\def\And{{\rm\penalty-1\quad$\mid\mid$~}} % divider between variants to the same lemma
% more customizations: make the A notes
% (\Variants and \Lemmas)into two-column format,
% and make the B notes (\Reference) normal footnotes.
%
% changes to stuff cut-and-pasted from ledmac.sty:
\makeatletter
\renewcommand*{\twocolfootfmt}[3]{%
\normal@pars
% \hsize .45\hsize
\hsize .49\hsize
\parindent=0pt
\tolerance=5000
\raggedright
\leavevmode\hangindent1.5em\hangafter1
\strut{\notenumfont\printlines#1|}\enspace
{\select@lemmafont#1|#2}\rbracket\enskip
#3\strut\par\allowbreak}
\foottwocol{A}
\renewcommand*{\normalfootfmt}[3]{%
\normal@pars
\parindent=0pt \parfillskip=0pt plus 1fil
\hangindent1.5em\hangafter1
{\notenumfont\printlines#1|}\strut\enspace
{\select@lemmafont#1|#2}\rbracket\enskip#3\strut\par}
\footnormal{B}
\makeatother
\firstlinenum{1}
\linenumincrement{1}


% and here begins the edition:
%
\begin{document}
\chapter*{yogaśatakam}
\large


\section*{\textenglish{The example verse by itself}}

\textenglish{From \emph{Yogaśataka: Texte m\'edical attribu\'e
\`a Nāgārjuna\ldots par Jean Filliozat} (Pondich\'ery, 1979), pp.\,1, 59:\par}

\bigskip

kṛtsnasya tantrasya gṛhītadhāmna-\\
ścikitsitādviprasṛtasya dūram|
vidagdhavaidyapratipūjitasya\\
kariṣyate yogaśatasya bandhaḥ|| 1||

\bigskip

\section*{\textenglish{The example verse, with apparatus}}
% we could use the \stanza command, but I haven't bothered.

%
% I find that the judicious use of indentation
% and newlines helps enormously to see what's what.
% Using a good "folding editor" would be even better.
%

\begingroup
\beginnumbering
\autopar
\edtext{
\edtext{kṛtsnasya}{
\Variant{%
\textfrench{N1 détruit, C1 }kṛtas tasya,
\textfrench{C2 }kṛtasya.}
\Tibetan{\textfrench{T \emph{mth'yas}, ``sans limite, immense''
traduit }kṛtsnasya.}}
tantrasya
\edtext{gṛhītadhāmna-}{
\Variant{\textfrench{Ca, JK }dhamnā.}}\\
\edtext{ścikitsitā}{
\Lemma{cikitsitād} % not ``ścikitsitā'', of course. We're preserving
% the sandhyakṣaras.
\Variant{\textfrench{C1, C2 } cikitsitāt.}
\Tibetan{\textfrench{T \emph{gso-spyad} ``pratique de la
thérapeutique''. Ordinairement
  \emph{gso spyad} est ``investigation de la th.''}}}% comment sign to stop a break after the conjunct
\edtext{dviprasṛtasya}{
\Lemma{viprasṛtasya} % as above with cikitsitād.
\Variant{\textfrench{Ca} cikitsitārthaprasṛtasya, \textfrench{C1, C2}
viprasutasya.}}
\edtext{dūram}{
\Variant{\textfrench{Ca} dūrāt}}|
\\ \indent
%
% the above line is annoying. Because the whole verse is
% inside an \edtext{} macro, in order to get the
% \Grammatical note naming the upajāti verse, we have to
% avoid having paragraph breaks, which are not allowed
% inside \edtext{}.
% instead, we use \\ (newline) and \indent (paragraph indent)
% to get the same visual effect. A nasty kludge.
%
vidagdhavaidyapratipūjitasya\\
\edtext{kariṣyate}{
\Variant{\textfrench{N1} karikṣete.}}
yogaśatasya bandhaḥ|| 1||
}{\Lemma{}\Grammatical{Upajāti.}}
\par % necessary to stop \autopar complaining. Thanks to Alessandro Graheli.
\endgroup
\end{document}

Monday, July 05, 2010

XeLaTeX for Sanskrit

This example worked well in July 2010, but some TeX packages have since been updated slightly.  See the new, updated version of this example, posted on 27 May 2013.



Sunday, April 04, 2010

TeXworks for Linux

TeXworks is a nice editor with an emphasis on multilingual use, simplicity and rapid document preview. It is from Jonathan Kew, author of XeTeX.

Binary downloads for Mac and Windows are available from the TeXworks home page. For Ubuntu Linux, there's a PPA here.

Monday, July 30, 2007

Critical edition typesetting

Some years ago, John Lavagnino and I wrote the EDMAC software for typesetting critical editions. EDMAC was an application for use with plain TeX. Later, adaptations were made to allow EDMAC to work with LaTeX etc. More recently, the ConTeXt package, also based on TeX, has been developing methods for handling critical edition typesetting.

Idris Hamid, Colorado State University, recently gave this talk at the TUG 2007 conference, San Diego, about doing critical editions using ConTeXt.