Showing posts with label XeLaTeX. Show all posts

Tuesday, July 26, 2016

Getting Xindy to work for IAST-encoded text

Xindy is an index-processor for use with TeX and LaTeX.  It is a successor to Makeindex, and is the standard system for formatting and sorting indexes and glossaries that go with LaTeX documents.


The main distribution of Xindy is included in TeXlive and is downloadable at CTAN.  At the time of writing (July 2016), this is version 2.5.1 (2014).  The source code used to be maintained at Sourceforge (Xindy at sourceforge, currently version 2.3 from 2008), but a later version is now available at Github (Xindy at Github).  Since Github has version 2.5.0, source development and compilation for TeXlive must be going on somewhere else, but I don't know where.   The best place to fetch Xindy if you want to tinker with it is from the CTAN base directory.

Use and benefits

The development of Xindy is uneven, given the various repositories with different versions.  The documentation is also of limited use to beginners, being technical and out of date (the examples in the tutorial do not work with the current software release).   Nevertheless, it is a very flexible and powerful program, and does a great job when it works.  And for many texts and nearly fifty modern languages, it "just works," which is great.

In LaTeX documents, the packages index, makeidx or imakeidx will normally be used to provide the macros needed for indexes.  Xindy does the rest.
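
For concreteness, here is a minimal sketch of a XeLaTeX file that produces index entries for Xindy to sort. The font (Charis SIL) and the sample entries are my assumptions for illustration; any Unicode font with the IAST glyphs should do.

```latex
% !TeX program = xelatex
% Illustrative sketch only: font choice and index entries are assumptions.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL} % assumed to be installed; any font with IAST glyphs
\usepackage{makeidx}     % provides \makeindex, \index and \printindex
\makeindex               % write the raw index entries to the .idx file

\begin{document}
Ayurveda\index{āyurveda} is discussed together with
yoga\index{yoga} and the seers\index{ṛṣi}.

\printindex              % reads the sorted .ind file made by the index processor
\end{document}
```

Running xelatex on this writes the raw entries to the .idx file. The index itself only appears once the index processor has produced the .ind file and xelatex has been run again.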

Sanskrit and IAST

For my writing, I normally use XeTeX with LaTeX and I write using Unicode UTF8 encoding and the IAST transliteration scheme when doing Sanskrit in Roman script.  (For Devanagari I use normal Unicode encoding.)

Xindy by itself doesn't recognize the IAST accented characters like vowels with a macron or consonants with under-dot.  I found that setting Xindy's language to "general" did a pretty good job of nearly all the characters, but not all.  I got words with ā-, ṛ- etc. at the beginning of the index, before "A."


The program for creating a new "alphabet" for Xindy is in Perl and is called make-rules.  I couldn't initially find it at all, because it isn't at the Sourceforge or GitHub repositories (or I couldn't find it).  Later, I found it at CTAN, and I wish I'd seen that earlier.

Finally, I could not get make-rules to work.  The documentation and tutorials simply didn't provide me with enough accurate information to start, as a beginner, and get a workable result from make-rules.

Solution (aka kludge)

I therefore made up a very simple Xindy style file, IAST.xdy, with the following content:
    (merge-rule "ā" "a")    (merge-rule "Ā" "a")
    (merge-rule "ḍ" "d")    (merge-rule "Ḍ" "d")
    (merge-rule "ḥ" "h")    (merge-rule "Ḥ" "h")
    (merge-rule "ī" "i")    (merge-rule "Ī" "i")
    (merge-rule "ḹ" "l")    (merge-rule "ḷ" "l")
    (merge-rule "Ḹ" "l")    (merge-rule "Ḷ" "l")
    (merge-rule "ṃ" "m")    (merge-rule "Ṃ" "m")
    (merge-rule "ṅ" "n")    (merge-rule "Ṅ" "n")
    (merge-rule "ṇ" "n")    (merge-rule "Ṇ" "n")
    (merge-rule "ṝ" "r")    (merge-rule "ṛ" "r")
    (merge-rule "Ṝ" "r")    (merge-rule "Ṛ" "r")
    (merge-rule "ṣ" "s")    (merge-rule "Ṣ" "s")
    (merge-rule "ś" "s")    (merge-rule "Ś" "s")
    (merge-rule "ṭ" "t")    (merge-rule "Ṭ" "t")
    (merge-rule "ū" "u")    (merge-rule "Ū" "u")
  • I place IAST.xdy in my local TeX tree, namely as  ../localtexmf/xindy/modules/IAST.xdy 
  • I run "sudo mktexlsr" to rebuild the TeXlive indexes so that Xindy can find IAST.xdy
  • I then run Xindy from the Linux command line with the following syntax:

    texindy -I xelatex -M IAST.xdy -L general -o foobar.ind foobar.idx
This last point is a Knuthian white lie (TeXbook, vii).  I currently use TeXStudio for actual writing, so the above "command line" is entered into TeXStudio's "options/configure/commands" menu and invoked with a convenient function-key shortcut. 


  • texindy is just xindy with some tweaks for use with LaTeX
  • -I means the input file uses the "xelatex" encoding, i.e., UTF8
  • -M means please use this style file
  • -L means please use the pseudo-language "general" which does the right thing with most UTF8-encoded Roman / European text.
  • "foobar" is replaced by your TeX filename; in TeXStudio's syntax it's "%", which stands for whatever file you're currently working on.

I'm now getting results that look like this, which is what I was after:

I'm sure that all this could be done more elegantly and completely.  In the longer run, I hope a successor to IAST.xdy might take its place alongside all the other languages formally supported by Xindy.
While working on this, I sent cries for help to the Xindy discussion list.  Zdenek Wagner replied, and shared with me work he has done towards indexing Hindi and Marathi in Devanagari script.

Tuesday, December 16, 2014

Gnu-Freefont fonts and XeLaTeX

The problem

There's been a long-standing issue about using the Gnu-Freefont fonts with XeLaTeX.  The fonts are "Free Serif", "Free Sans" and "Free Mono", and each has normal, italic, bold and bold-italic versions.
These fonts are maintained by Stevan White, who has done a lot of support and maintenance work on them.  
These fonts are of special interest to people who type Indian languages because they include nice and rather complete Devanāgarī character sets, in addition to glyphs for
  • Bengali
  • Gujarati
  • Gurmukhi
  • Oriya
  • Sinhala
  • Tamil
  • Malayalam
The Gnu Freefonts are excellent for an exceptionally wide range of scripts and languages, as well as symbols.  See the coverage chart.

At the time of writing this blog, December 2014, the release version of the fonts is 4-beta, dated May 2012.  This is the release that's distributed with TeXLive 2014, and is generally available with other programs that include or require the FreeFonts.

But the 2012 release of the FreeFonts causes problems with the current versions of XeTeX.  Basically, the Devanagari conjunct consonants in the 2012 fonts are incompatible with the current XeTeX compositing engine. (For the technical: Up to TL 2012 XeTeX used ICU; since TL 2013 it's used HarfBuzz.)

In the last couple of years, Stevan has done a great deal of work on the Devanagari parts of the FreeFonts, and he has solved these problems.  But his improvements and developments are only available in the Subversion repository.   For technically-able users, it's not hard to download and compile this pre-release version of the fonts.  But then to make sure that XeTeX calls the right version of the FreeFonts, it's also necessary to weed out the 2012 version of the fonts that's distributed with TeX Live 2014.  And that's a bit hard.  In short, things get fiddly.

Now, Norbert Preining has created a special TeX Live repository for the Subversion version of the FreeFonts.  TeX Live 2014 users can now just invoke that repo and sit back and enjoy the correct Devanagari typesetting.

New warning June 2017: 
the procedure below is no longer supported.  Don't do it.

Be warned that the version distributed here is a development version, not meant for production. Expect severe breakage. You need to know what you are doing!

Here follow Norbert's instructions (as of Dec 2014).  Remember to use sudo if you have TeX Live installed system-wide.

The solution. A new TeX Live repository for the pre-release Gnu FreeFonts

Norbert says (Dec 2014):

Here we go: Please do:
tlmgr repository add tlptexlive
tlmgr pinning add tlptexlive gnu-freefont
tlmgr install --reinstall gnu-freefont
You should see something like:  
[~] tlmgr install --reinstall gnu-freefont
[1/1, ??:??/??:??] reinstall: gnu-freefont @tlptexlive [12311k]
Note the @tlptexlive.
After that you can do  
tlmgr info gnu-freefont
and should see: 
Package installed:   Yes
revision:    3007
sizes:       src: 27157k, doc: 961k, run: 19769k
relocatable: No
collection:  collection-fontsextra
Note the
revision: 3007
which corresponds to the freefont subversion revision!!!

From now on, after the pinning action, updates for gnu-freefont will
always be pulled from tlptexlive (see man page of tlmgr).

Reverting the change:

In case you ever want to return to the versions as distributed in TeX Live, please do
tlmgr pinning remove tlptexlive gnu-freefont
tlmgr install --reinstall gnu-freefont

Thank you, Norbert!

Wednesday, July 24, 2013

Minimal example of XeLaTeX with Velthuis input mapping



  \newfontfamily\sanskritfont [Script=Devanagari,Mapping=velthuis-sanskrit]{Sanskrit 2003}



aasiidraajaa nalo naama viirasenasuto balii|\\
upapannairgu.nairi.s.tai ruupavaana"svakovida.h||
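
The listing above lost its surrounding lines somewhere along the way. A complete file around it might look like the following sketch, assuming that polyglossia is used with Sanskrit as the main language, and that the Sanskrit 2003 font and the velthuis-sanskrit mapping are installed. The preamble here is my guess, not the original.

```latex
% !TeX program = xelatex
% Sketch only: the preamble is an assumption, not the lost original.
\documentclass{article}
\usepackage{polyglossia}       % loads fontspec as well
\setdefaultlanguage{sanskrit}  % main text is set in \sanskritfont
% Velthuis-coded input is converted to Devanagari by the font mapping:
\newfontfamily\sanskritfont[Script=Devanagari,Mapping=velthuis-sanskrit]{Sanskrit 2003}

\begin{document}
aasiidraajaa nalo naama viirasenasuto balii|\\
upapannairgu.nairi.s.tai ruupavaana"svakovida.h||
\end{document}
```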


Monday, May 27, 2013

XeLaTeX for Sanskrit: update

In a post on 5 July 2010, I gave an example of how to use XeLaTeX with various fonts and various ways of inputting text.   Some time later, the commands in the Fontspec and Polyglossia packages were updated, and my example didn't work as advertised any more.  Here is an update that works again.

Tuesday, February 26, 2013

Converting XeLaTeX into ODT or MS Word

TeX4ht can do a lot of the work of converting from LaTeX to word-processor formats.  But when one adds in UTF8 characters, multiple scripts, and XeLaTeX, things can get complicated.

C. V. Radhakrishnan today pointed me to this discussion on the TeX4ht mailing list:
What Radhakrishnan says is:
As far as I understand, TeX4ht won't support fontspec or XeLaTeX
technologies of using system fonts that do not have *.tfm's. In effect, by
adopting TeX4ht, one is likely to loose the features brought in by XeTeX.
However, here is another approach.

   1. We translate all the Unicode character representations in the
   document to Unicode code points in 7bit ascii which is very much palatable
   to TeX4ht. A simple perl script, in the attached archive does
   the job.
   2. We run TeX4ht on the output of step 1.
   3. Open the *html in a browser, I believe, we get what you wanted. See
   the attached screen shot as it appeared in Firefox in my Linux box.

Here is what I did with your specimen document.

   1. commented out lines that related to fontspec package from your
   sources named as alex.tex.
   2. added four lines of macro code to digest the converted TeX sources
   3. ran the command: perl alex.tex > alex-ent.tex
   4. ran the command: htlatex alex-ent "xhtml,charset=utf-8,fn-in" -utf8
   (fn-in option is to keep the footnotes in the same document). I have used a
   local bib file, mn.bib as I didn't have your bib database. biber was also
   run in the meantime to process the bibliography database.
   5. open the output, alex-ent.html in a browser. I got it as you see in
   the attached alex.png.
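Radhakrishnan's Perl script is

use strict;
use warnings;

for my $file ( @ARGV ){
  open my $fh, '<:utf8', $file or die "cannot open file: $file";
  while( <$fh> ){
    # The rest of the loop was lost from this post.  What follows is a guess
    # based on step 1 above: each non-ASCII character becomes its decimal
    # code point inside a 7-bit macro call.  The name \entity is hypothetical.
    s/([^\x00-\x7f])/sprintf('\\entity{%d}', ord $1)/ge;
    print;
  }
}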

For Radhakrishnan's continuing comments on TeX4ht development, see
TeX4ht's homepage:

Tuesday, January 22, 2013

TeX implementations in the Cloud

The more mature products (2014)

Both the above have collaborative-editing features.  Both have free access for limited projects, but require subscription for larger projects or collaborative teams.

Others of varying levels of activity (2014)

  • FlyLaTeX (self-hosting; free and open-source)
  • TeXTouch (iTunes, iPhone editor, can compile when online)
  • Verbosus (with Android and iOS apps)
  • Blue Publications
  • LaTeXLab - requires your Google login details :-( 
  • Pine from Sayahna.org.  In alpha test (a document processing system in the cloud that makes use of MediaWiki and its resources)
  • CloudTeX from
    XeLaTeX and LuaTeX supported.   Working prototype available to testers. 
  • A different CloudTeX.  Seems to have gone quiet as of 2013.
  • SpanDex
    XeLaTeX and LuaTeX available, but limited Unicode fonts.
  • ScribTeX  (phased out as of Feb 2013, in favour of ShareLaTeX, but still exists)
  • MonkeyTeX (4/2014)

Tuesday, July 03, 2012

Sanskrit hyphenation list

I'm gradually building up a file of hyphenated Sanskrit words and compounds, written in the Latin alphabet.  The file is called sanskrit-hyphenations.tex, and you are welcome to download it.

It contains hyphenation points for words in English (ayur-veda), and for words in Sanskrit (āyur-veda).
 To use it, do something like this in your style file:
\newfontfamily\sanskritfont{Sanskrit 2003}
% Define \sansk{} which is the same as \emph{}, except
% that it causes appropriate hyphenation

% for Sanskrit words.  Use \sansk{} for Sanskrit and
% \emph{} for English.

 and \input the sanskrit-hyphenations.tex file after \begin{document}, thus:
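
The listings for these two steps did not survive in this post. The following is only a sketch of how the pieces might fit together, assuming polyglossia with English as the main language. The definition of \sansk is my reconstruction from the comments above and may differ from the original.

```latex
% !TeX program = xelatex
% Sketch under stated assumptions; not the original listing.
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{sanskrit}
\newfontfamily\sanskritfont{Sanskrit 2003}

% Assumed definition: \sansk{} emphasizes like \emph{}, but switches to
% Sanskrit hyphenation through polyglossia's \textsanskrit.
\newcommand{\sansk}[1]{\textsanskrit{\emph{#1}}}

\begin{document}
% Load the hyphenation exceptions after \begin{document}:
\input{sanskrit-hyphenations}

The \sansk{āyurveda} literature is large.
\end{document}
```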


XeTeX  already has built-in hyphenation rules for Devanāgarī and Romanized Sanskrit. The above file is intended to extend the hyphenation coverage for Romanized words, using etymological and stylistic considerations.

Thursday, May 24, 2012

Two more good Devanāgarī fonts

I have posted an update to this post with some new material.

Steve White has recently done a great deal of work updating the FreeSerif and FreeSans Unicode fonts (that are, er, free).   He has done especially important work on the Devanagari characters in the font, as well as several other Indian writing systems.  See here for a listing of what has changed. Steve's work means that the Devanāgarī in the Free* fonts now works not only with Xe(La)TeX but also in Firefox, LibreOffice and other programs.  Thanks, Steve!

Zdeněk Wagner recently announced (here) that,
A few days ago version 20120503 of GNU FreeFont was released. This (OpenType as well as TrueType) version contains working Devanagari. FreeSans is based on Gargi with bugs fixed and positions of matras fine-tuned (and changes were reported back to the Gargi developers), FreeSerif is based on Velthuis fonts. Both fonts contain the Indian Rupee sign. In XeLaTeX, conjuncts in FreeSerif can be switched on/off according to the language.
Zdeněk provided an example file showing FreeSans and FreeSerif, and demonstrating the different conjuncts of Sanskrit and Hindī.  I have taken the liberty of expanding his file to compare some of the other leading Unicode fonts that contain both Devanāgarī and Latin typefaces in the same font:

XeLaTeX Input:

\newfontfamily\eng[Mapping=tex-text]{TeX Gyre Pagella}
\setmainfont{FreeSerif} \newfontfamily\eng[Mapping=tex-text]{FreeSerif}
{\eng FreeSerif} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{FreeSans} \newfontfamily\eng[Mapping=tex-text]{FreeSans}
{\eng FreeSans} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{Sanskrit 2003} \newfontfamily\eng[Mapping=tex-text]{Sanskrit 2003}
{\eng Sanskrit 2003} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont[FakeStretch=1.08]{Sanskrit 2003}
{\eng Sanskrit 2003+} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{Nakula} \newfontfamily\eng[Mapping=tex-text]{Nakula}
{\eng Nakula} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}
\setmainfont{Sahadeva} \newfontfamily\eng[Mapping=tex-text]{Sahadeva}
{\eng Sahadeva} = शक्ति, kārtsnyam {\addfontfeatures{Language=Hindi} Hindī = शक्ति}

Note the use of the RomDev mapping to get "कार्त्स्न्यम्" out of "kārtsnyam", just for fun.  I've included Sanskrit 2003 twice, the second time with a bit of horizontal stretch, that I think makes it look nicer.
The official web page of the newly-updated FreeSans and FreeSerif fonts is:
As Zdeněk adds, ``Hopefully the font will soon appear in TeX Live and (some) Linux distributions. If you install it independently, be sure that you do not have font conflicts."

Be sure to delete all earlier versions of FreeSerif and FreeSans that might be lurking on your hard drive.  Then install the new version.  If you find the conjuncts aren't working as promised, you probably have an old FreeSerif or Sans lurking in a directory somewhere that you have forgotten about.

Tuesday, October 04, 2011

Simplest Sanskrit XeLaTeX file



Your Devanāgarī looks like this:  आसीद्राजा नलो नाम and your romanized stuff looks like this: āsīd rājā nalo nāma.  
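
The listing has lost everything but its one line of text. A minimal file around that line might look like this sketch, which assumes fontspec and an installed copy of the Nakula font; the preamble is my guess rather than the original.

```latex
% !TeX program = xelatex
% Sketch only: the preamble is an assumption.
\documentclass{article}
\usepackage{fontspec}
% Script=Devanagari asks for Devanagari shaping, so that conjuncts form.
\setmainfont[Script=Devanagari]{Nakula}

\begin{document}
Your Devanāgarī looks like this:  आसीद्राजा नलो नाम
and your romanized stuff looks like this: āsīd rājā nalo nāma.
\end{document}
```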


You can get the Nakula font (and its twin, Sahadeva) from John Smith's website,

Monday, November 22, 2010

Hyphenating Sanskrit in roman transliteration

%!TeX program = xelatex
% Thanks to Yves Codet for the first version of this test file, and to Yves
% and Jonathan Kew for the hyphenation tables
% for Sanskrit (hyph-sa.tex):
% This file exemplifies the case where some Sanskrit is embedded in a
% mainly-English document, but the Sanskrit words are appropriately
% hyphenated. The Sanskrit words are in the argument of the
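% \textsanskrit{} command.
% (The preamble and document-environment lines marked "reconstructed" were
% lost from this post and are an assumption about the original.)

\documentclass{article}      % reconstructed
\usepackage{polyglossia}     % reconstructed
\setdefaultlanguage{english} % reconstructed
\setotherlanguage{sanskrit}  % reconstructed

\setmainfont{Charis SIL}

\newfontfamily\sanskritfont{Charis SIL}

\parindent 0pt

\begin{document}             % reconstructed

Sanskrit hyphenation:

\textsanskrit{manum ekāgram āsīnam abhigamya maharṣayaḥ |\par}

English hyphenation:

manum ekāgram āsīnam abhigamya maharṣayaḥ |

\end{document}               % reconstructed
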

Wednesday, September 01, 2010

XeLaTeX, Velthuis encoding, and palatal nasals

When using the Velthuis input coding for Devanāgarī, and wanting to have it handled by XeLaTeX, one finds the palatal ñ disappears in the Nāgarī.

input: sa~njaya

output: स न्जय

That's because the Velthuis input code for ञ् is ~n, and the "~" is a special code in TeX, meaning "hard space".

Here's the workaround. I define a font-switching command \dev that will turn Velthuis into Devanāgarī. \dev is mostly made up of "\textsanskrit" which is set up using the standard XeLaTeX/polyglossia \newfontfamily commands. \textsanskrit does the work of invoking the mapping-conversion (from XeTeX's velthuis-sanskrit.tec file).

But just before \textsanskrit, we change tilde into a normal character. And after \textsanskrit, we turn tilde back into an "active" hard space. We use the \aftergroup command so that the "active" version of tilde is activated after the closing of the group that contains the Devanāgarī.

Here's the code:

\newfontfamily\textsanskrit [Script=Devanagari,Mapping=velthuis-sanskrit]{Nakula}

% Make the tilde into a normal letter of the alphabet
\def\maketildeletter{\catcode`\~=11 }

% Return tilde to being the default TeX "active" character for hard space
\def\maketildeactive{\catcode`\~=13 }

\def\dev{\maketildeletter\textsanskrit \aftergroup\maketildeactive}

Here's how you use it:

input: {\dev sa~njaya uvaaca}. What did Dr~Sañjaya say?

output: सञ्जय उवाच. What did Dr Sañjaya say?

where that space between "Dr" and "Sañjaya" is hard, and you can't break a line there.
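
Put together as a complete file, the workaround might look like the sketch below. It assumes fontspec, an installed Nakula font, and the velthuis-sanskrit mapping; the surrounding preamble is my assumption.

```latex
% !TeX program = xelatex
% Sketch only: preamble and font availability are assumptions.
\documentclass{article}
\usepackage{fontspec}
\newfontfamily\textsanskrit[Script=Devanagari,Mapping=velthuis-sanskrit]{Nakula}

% Make the tilde into a normal letter of the alphabet
\def\maketildeletter{\catcode`\~=11 }
% Return tilde to being the default TeX "active" character for hard space
\def\maketildeactive{\catcode`\~=13 }
% Switch to Devanagari, with the tilde readable as Velthuis input
\def\dev{\maketildeletter\textsanskrit \aftergroup\maketildeactive}

\begin{document}
{\dev sa~njaya uvaaca}. What did Dr~Sañjaya say?
\end{document}
```

Note that \dev has to be used at the top level of the text, as here. If the Devanāgarī is passed inside the argument of another macro, the tilde has already been read as an active character before \dev gets a chance to change it.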


Tuesday, July 06, 2010

Switching from Devanāgarī to Roman with a single command

I have to admit even I am startled by the success of this.
In the input file below, I changed the single command
  • \setdefaultlanguage{sanskrit}
to
  • \setdefaultlanguage{english}
and the result was the following:

How do I install RomDev mapping for XeLaTeX (Unicode transliteration -> Devanāgarī)?

[Update, February 2011: Somdev has moved his blog to]

Somdev Vasudev's RomDev mapping is installed as follows:
  1. The actual mapping file is published by Somdev in his blog, here: 
    [Update Feb 2011: now at; update March 2012: now at]
  2. Cut and paste this text, and save it in a Unicode file called RomDev.map.  Save that file in a place which XeTeX can "see," e.g., something like local/texmf/fonts/misc/xetex/fontmapping/
  3. You now need to compile the human-readable *.map file into a binary *.tec file, so that XeTeX can read it directly.  This is done by the program Teckit, which you can get here:
  4. I'm working with Ubuntu GNU/Linux.  For me, the command is,

    teckit_compile RomDev.map -o RomDev.tec

    I'm afraid I don't know the Windows or Mac command invocation.

  5. Now you have a file in a place like

  6. Run the command that rebuilds the database of files that TeX knows about.  In Linux it's
    sudo mktexlsr
  7. That's it!  XeTeX and XeLaTeX can now see, and make use of the RomDev mapping, that converts Unicode transliteration into Devanāgarī, as exemplified in my earlier blog posts below. 
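
    A quick way to check the installation is a tiny test file like the sketch below. It assumes the Sahadeva font is installed and that the compiled mapping is named RomDev.tec; the command name \devfont is made up for this test.

```latex
% !TeX program = xelatex
% Sketch only: \devfont is a hypothetical name chosen for this test.
\documentclass{article}
\usepackage{fontspec}
% Unicode transliteration in, Devanagari out, through the RomDev mapping:
\newfontfamily\devfont[Script=Devanagari,Mapping=RomDev]{Sahadeva}

\begin{document}
{\devfont āsīd rājā nalo nāma}
\end{document}
```

    If the mapping is found, the transliterated words should come out in Devanāgarī; if XeTeX cannot find it, the log should contain a complaint about the mapping and the text stays in Roman.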

    A minimal edition of a Sanskrit verse, using XeLaTeX and Ledmac

    And here's the input for the above:

    % Set up things for XeLaTeX, and Devanagari.
    % Simplified version of
    \usepackage{polyglossia} % the multilingual support package
    % Next, from the polyglossia manual:
    \setdefaultlanguage{sanskrit} % this is mostly going to be Sanskrit,
    \setotherlanguage{french} % with some French embedded in it,
    \setotherlanguage{english} % and some English.
    % These will call appropriate hyphenation.
    \usepackage{xltxtra} % standard for nearly all XeLaTeX documents
    \defaultfontfeatures{Mapping=tex-text} % ditto
    \setmainfont{Gandhari Unicode} % could be any Unicode font
    % Now define the Devanagari font:
    % John Smith's Sahadeva, input using standard UTF8 transliteration
    \newfontfamily\sanskritfont [Script=Devanagari,Mapping=RomDev]{Sahadeva}

    % Now come the commands for the critical edition formatting:
    % customizations to Ledmac, and macros to make life easier.
    % in a real edition, I'd probably also make
    % abbreviations for \textfrench (perhaps \tf) etc.
    \def\hardspace{\texttt{\char`\ }}
    \def\And{{\rm\penalty-1\quad$\mid\mid$~}} % divider between variants to the same lemma
    % more customizations: make the A notes
    % (\Variants and \Lemmas)into two-column format,
    % and make the B notes (\Reference) normal footnotes.
    % changes to stuff cut-and-pasted from ledmac.sty:
    % \hsize .45\hsize
    \hsize .49\hsize
    \parindent=0pt \parfillskip=0pt plus 1fil

    % and here begins the edition:

    \section*{\textenglish{The example verse by itself}}

    \textenglish{From \emph{Yogaśataka: Texte m\'edical attribu\'e
    \`a Nāgārjuna\ldots par Jean Filliozat} (Pondich\'ery, 1979), pp.\,1, 59:\par}


    kṛtsnasya tantrasya gṛhītadhāmna-\\
    ścikitsitādviprasṛtasya dūram|
    kariṣyate yogaśatasya bandhaḥ|| 1||


    \section*{\textenglish{The example verse, with apparatus}}
    % we could use the \stanza command, but I haven't bothered.

    % I find that the judicious use of indentation
    % and newlines helps enormously to see what's what.
    % Using a good "folding editor" would be even better.

    \textfrench{N1 détruit, C1 }kṛtas tasya,
    \textfrench{C2 }kṛtasya.}
    \Tibetan{\textfrench{T \emph{mth'yas}, ``sans limite, immense''
    traduit }kṛtsnasya.}}
    \Variant{\textfrench{Ca, JK }dhamnā.}}\\
    \Lemma{cikitsitād} % not ``ścikitsitā'', of course. We're preserving
    the sandhyakṣaras.
    \Variant{\textfrench{C1, C2 } cikitsitāt.}
    \Tibetan{\textfrench{T \emph{gso-spyad} ``pratique de la
    thérapeutique''. Ordinairement
      \emph{gso spyad} est ``investigation de la th.''}}}% comment sign to stop a break after the conjunct
    \Lemma{viprasṛtasya} % as above with cikitsitād.
    \Variant{\textfrench{Ca} cikitsitārthaprasṛtasya, \textfrench{C1, C2}
    \Variant{\textfrench{Ca} dūrāt}}|
    \\ \indent
    % the above line is annoying. Because the whole verse is
    % inside an \edtext{} macro, in order to get the
    % \Grammatical note naming the upajāti verse, we have to
    % avoid having paragraph breaks, which are not allowed
    % inside \edtext{}.
    % instead, we use \\ (newline) and \indent (paragraph indent)
    % to get the same visual effect. A nasty kludge.
    \Variant{\textfrench{N1} karikṣete.}}
    yogaśatasya bandhaḥ|| 1||
    \par % necessary to stop \autopar complaining. Thanks to Alessandro Graheli.

    Monday, July 05, 2010

    XeLaTeX for Sanskrit

    This example worked well in July 2010, but some TeX packages have since been updated slightly.  See the new, updated version of this example, posted on 27 May 2013.