Monday, January 20, 2014

Fwd: Work Flows and Wish Lists: Reflections on Juxta as an Editorial Tool

---------- Forwarded message ----------
From: Dominik Wujastyk <wujastyk@gmail.com>
Date: 18 January 2014 21:58
Subject: Work Flows and Wish Lists: Reflections on Juxta as an Editorial Tool
To: Philipp André Maas <Philipp.A.Maas@gmail.com>, Alessandro Graheli <a.graheli@gmail.com>, Karin Preisendanz <karin.preisendanz@univie.ac.at>, Dominik Wujastyk <wujastyk.cikitsa@blogspot.com>


Some interesting reflections on Juxta...
I have had the opportunity to use Juxta Commons for several editorial projects, and while taking a breath between a Juxta-intensive term project last semester and my Juxta-intensive MA thesis this semester, I would like to offer a few thoughts on Juxta as an editorial tool.
For my term project for Jerome McGann's American Historiography class last semester, I conducted a collation of Martin R. Delany's novel, Blake, or, The Huts of America, one of the earliest African American novels published in the United States. Little did I know that my exploration would conduct me into an adventure as much technological as textual, but when Professor McGann recommended I use Juxta for conducting the collation and displaying the results, that is exactly what happened. I input my texts into Juxta Commons, collated them, and produced HTML texts of the individual chapters, each with an apparatus of textual variants, using Juxta's Edition Starter. I linked these HTML files together into an easily navigable website to present the results to Professor McGann. I'll be posting on the intriguing results themselves next week, but in the meantime, they can also be viewed on the website I constructed, hosted by GitHub: Blake Project home.
Juxta helped me enormously in this project. First, it was incredibly useful in helping me clean up my texts. My collation involved an 1859 serialization of the novel, and another serialization in 1861-62. The first, I was able to digitize using OCR; the second, I had to transcribe myself. Anyone who has done OCR work knows that every minute of scanning leads to (in my case) an average of five or ten minutes of cleaning up OCR errors. I also had my own transcription errors to catch and correct. By checking Juxta's highlighted variants, I was able to—relatively quickly—fix the errors and produce reliable texts. Secondly, once collated, I had the results stored in Juxta Commons; I did not have to write down in a collation chart every variant to avoid losing that information, as I would if I were machine- or sight-collating. Juxta's heat-map display allows the editor to see variants in-line, as well, which saves an immense amount of time when it comes to analyzing results: you do not have to reference page and line numbers to see the context of the variants. Lastly, Juxta enabled me to organize a large amount of text in individual collation sets—one for each chapter. I was able to jump between chapters and view their variants easily.
As helpful as Juxta was, however, I caution all those new to digital collation that no tool can perfectly collate or create an apparatus from an imperfect text. In this respect, there is still no replacement for human discretion—which is, ultimately, a good thing. For instance, while the Juxta user can turn off punctuation variants in the display, if the user does want punctuation and the punctuation is not spaced exactly the same in both witnesses, the program highlights this anomalous spacing. Thus, when 59 reads
' Henry, wat…
and 61 reads
'Henry, wat…
Juxta will show that punctuation spacing as a variant, while the human editor knows it is the result of typesetting idiosyncrasies rather than a meaningful variant. Such variants can carry over into the Juxta Edition Builder, as well, resulting in meaningless apparatus entries. For these reasons, you must make your texts perfect to get a perfect Juxta heat map and especially before using Edition Starter; otherwise, you'll need to fix the spacing in Juxta and output another apparatus, or edit the text or HTML files to remove undesirable entries.
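To see concretely why this happens, here is a toy word-level collation in Python (using difflib, not Juxta's actual algorithm) on the two readings above: because the 1859 witness spaces its opening quotation mark, the tokenizer sees two tokens where the 1861-62 witness has one, and a purely typographical difference is reported as a variant.

```python
import difflib

# The 1859 witness spaces the opening quotation mark; the 1861-62
# witness does not, so word-splitting yields different token lists.
reading_1859 = "' Henry, wat".split()   # ["'", "Henry,", "wat"]
reading_1861 = "'Henry, wat".split()    # ["'Henry,", "wat"]

matcher = difflib.SequenceMatcher(a=reading_1859, b=reading_1861)
variants = [(reading_1859[i1:i2], reading_1861[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
# variants pairs ["'", "Henry,"] against ["'Henry,"]: a spacing
# idiosyncrasy surfaces as a textual variant.
```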
Spacing issues can also result in disjointed apparatus entries, as occurred in my apparatus for Chapter XI in the case of the contraction "needn't". Notice how, because of the spacing in "needn t" and "need nt", Juxta recognized the two parts of the contraction as two separate variants (lines 130 and 131):
This one variant was broken into two apparatus entries because Juxta recognized it as two words. There is really no way of rectifying this problem except by checking and editing the text and HTML apparatuses after the fact.
I mean simply to caution scholars going into this sort of work so that they can better estimate the time required for digital collation. This being my first major digital collation project, I averaged about two hours per chapter (chapters ranging between 1000 and 4000 words each) to transcribe the 61-62 text and then collate both witnesses in Juxta. I then needed an extra one or two hours per chapter to correct OCR and transcription errors.
While it did take me time to clean up the digital texts so that Juxta could do its job most efficiently, in the end, Juxta certainly saved me time—time I would have spent keeping collation records, constructing an apparatus, and creating the HTML files (as I wanted to do a digital presentation). I would be remiss, however, if I did not recommend a few improvements and future directions.
As useful as Juxta is, it nevertheless has limitations. One difficulty I had while cleaning my texts was that I could not correct them while viewing the collation sets; I had, rather, to open the witnesses in separate windows.
The ability to edit the witnesses in the collation set directly would make correction of digitization errors much easier. This is not a serious impediment, though, and is easily dealt with in the manner I mentioned. The Juxta download does allow this in a limited capacity: the user can open a witness in the "Source" field below the collation visualization, then click "Edit" to enable editing in that screen. However, while the editing capability is turned on for the "Source," you cannot scroll in the visualization—and so navigate to the next error which may need to be corrected.
A more important limitation is the fact that the Edition Starter does not allow for the creation of eclectic texts, texts constructed with readings from multiple witnesses; rather, the user can only select one witness as the "base text," and all readings in the edition are from that base text.
Most scholarly editors, however, likely will need to adopt readings from different witnesses at some point in the preparation of their editions. Juxta's developers need to mastermind a way of selecting which reading to adopt per variant; selected readings would then be adopted in the text in Edition Starter. For the sake of visualizing, I did some screenshot melding in Paint of what this function might look like:
Currently, an editor wishing to use the Edition Starter to construct an edition would need to select either the copy-text or the text with the most adopted readings for the base text. The editor would then need to adopt readings from other witnesses by editing the output DOCX or HTML files. I do not know the intricacies of the code which runs Juxta. I looked at it on GitHub, but, alas! my very elementary coding knowledge was completely inadequate to the task. I intend to delve more as my expertise improves, and in the meantime, I encourage all the truly code-savvy scholars out there to look at the code and consider this problem. In my opinion, this is the one hurdle which, once overcome, would make Juxta the optimal choice as an edition-preparation tool—not just a collation tool.

Another feature which would be fantastic to include eventually would be a way of digitally categorizing variants: accidental versus substantive; printer errors, editor corrections, or author revisions; etc. Then, an option to adopt all substantives from text A, for instance, would—perhaps—leave nothing to be desired by the digitally inclined textual editor.

I am excited about Juxta. I am amazed by what it can do and exhilarated by what it may yet be capable of, and taking its limitations with its vast benefits, I will continue to use it for all future editorial projects.
Stephanie Kingsley is a second-year English MA student specializing in 19th-century American literature, textual studies, and digital humanities. She is one of this year's Praxis Fellows [see Praxis blogs] and Rare Book School Fellows. For more information, visit http://stephanie-kingsley.github.io/, and remember to watch for Ms. Kingsley's post next week on the results of her collation of Delany's Blake.
----
Shared via my feedly reader
Dominik Wujastyk, from Android phone.

Wednesday, January 15, 2014

Zooniverse and Intelligent Machine-assisted Semantic Tagging of Manuscripts

I'm very impressed by the technology being used in the War Diaries Project.  To see what I mean, click on "Get Started" and try the guided tutorial.

Once there's a critical mass of digitized Sanskrit manuscripts available, I think it would be very interesting to contact the people at Zooniverse and discuss the possibility of a Sanskrit MS-tagging project, like the War Diaries.

Tuesday, December 17, 2013

Tools for cataloguing Sanskrit manuscripts, no.1



In the post-office today I saw this piece of board that's used as a size-template to quickly assess which envelope to choose.  This is a formalized version of the same tool that I used for the many years that I spent cataloguing and packing Sanskrit manuscripts at the Wellcome Library in London.  I made a piece of board with three main size-outlines, for MSS of α, β, γ sizes.  Anything larger than γ counted as δ.  Palm-leaf MSS were all ε.
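The template logic can be sketched in a few lines of Python (the centimetre thresholds here are invented for illustration; the post does not give the real dimensions of the α, β, and γ outlines):

```python
# Toy version of the size-template board. Threshold values are made up;
# only the classification logic follows the post.
SIZE_LIMITS_CM = [("alpha", 20.0), ("beta", 30.0), ("gamma", 40.0)]

def classify_ms(longest_side_cm: float, palm_leaf: bool = False) -> str:
    """Return the size class a manuscript would fall into."""
    if palm_leaf:
        return "epsilon"          # all palm-leaf MSS counted as epsilon
    for name, limit in SIZE_LIMITS_CM:
        if longest_side_cm <= limit:
            return name
    return "delta"                # anything larger than gamma
```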

It was nice to see the same tool being used for a similar job, in an Austrian post-office!

Friday, December 13, 2013

Corrupted font spacing in terminal


http://i.stack.imgur.com/D6HgO.jpg

I had this problem, that was solved by purging pango-graphite:
sudo apt-get purge pango-graphite

Thursday, December 12, 2013

From Gnome to Cinnamon

Gnome 2 and 3

Ubuntu with Gnome 2
Ubuntu with Unity
After moving to Ubuntu GNU/Linux for all my work, in 2009, I used the default interface, Gnome 2, for a while.  Later, with version 3, Gnome moved to a completely new concept, but Ubuntu forked the development and moved to Unity, so I did that too.  


Gnome 3

Gnome 2 and Unity both had their virtues and their flaws.  The six-monthly upgrade cycle ("cadence") was never as smooth as it should be, so there have often been niggles that lasted a few weeks or months.  I didn't like Unity's two different search boxes.
Ubuntu with Gnome 3
I started using the Gnome Shell, ver. 3, on Linux (Ubuntu) after seeing my friends and colleagues using it at the TeX conference in Trivandrum in 2011.  I really liked Gnome 3, but with the update from 3.6 to 3.8 and 3.10, they did some major, major things wrong, and I've finally dumped it, in favour of Cinnamon.

The biggest boo-boo in the development of Gnome from version 3.8, was fooling with the default file-manager, Nautilus.  Many people have complained about the stripping out of function, like split-screen, and that was bad enough.  So was the nonsense about shifting the menus to the panel bar (or not!).  But what hasn't got mentioned so much (at all?) is that the new Nautilus changed all the keyboard shortcuts and rearranged the shortcuts relating to the menu system.  So Alt-F didn't bring up a "File" menu any more, for  example.  Right mouse-click+R didn't begin renaming a file.  If one uses computers all day, then one's fingers get trained, and no interface designer should mess with that stuff without expecting backlash.  With Nautilus 3.8, it was like being a beginning typist again, looking at my fingers, chicken-pecking for keys.

I liked the general design model of Gnome 3, with the corner switch to the meta level for choosing programs, desktops, and so on.  Searching for lesser-used programs with a few keystrokes rather than poking hopelessly through nested menus.  Much better.  A genuine and valuable contribution to the vision of how a computer should work.

Thanks to Webupd8, I was able to work around the Nautilus problem by uninstalling it and using Nemo instead.

But things just kept going wrong.  The shell crashed too often.  On two of my machines it stopped coming up at login, and had to be started manually.  Only after a couple of weeks did I track this down to a bad file in ~/.config/gnome-session (and I'm still not 100% sure).  Frequent crashes of the gnome-control-panel and other utilities.  More and more extraordinary tweaking to make it comfortable and useable.  Finally, I've had enough.


Cinnamon

Ubuntu with Cinnamon
I'm in my first few days of using Cinnamon, and so far things are okay.  I'm running Cinnamon on top of Ubuntu.  It's like stepping back in time, a bit, all those menus.  But one doesn't have to use them, and with a bit of tweaking one can set things up so that actual shell behaviour is very similar to Gnome 3.6.  Nemo is there - what a relief.  Alacarte actually works, but I've dumped it in favour of Menulibre in any case.  Configuration and tweaking is much nicer.  Many useful add-ons, and although I liked the http://extensions.gnome.org system, Cinnamon handles the add-on business in a much more integrated way.  Ibus+m17n work as expected again.  In general, it's an update from Gnome 2 in the direction of Gnome 3 but not the whole way.  And it seems more stable, which is critical to getting work done.

Tuesday, November 12, 2013

Human interfaces

“Because I believe interfaces to government can be simple, beautiful, and easy to use.”
-- Jen Pahlka, via Tim O'Reilly's blog.

Friday, October 18, 2013

Sanskrit manuscripts lost with the Titanic





Adheesh Sathaye mentioned today that the terrible sinking of the Titanic in 1912 was also the occasion of the loss of fourteen Sanskrit manuscripts of the Vikramacarita.  The MSS were on their way from Bombay to Edgerton in the USA.

Here is Edgerton's account, also kindly supplied by Adheesh.

There's an uncomfortable ambiguity in Edgerton's prose, regarding the predicate of his expression "terrible disaster."

Wednesday, October 16, 2013

Kenney on the poor success-rate of editorial conjectures

I cite this page of Kenney's The Classical Text often, partly in my own mind, and partly to friends and colleagues.  So I thought I'd reproduce it here.

The point being, be careful with conjectures, and remain sanguine that - if it is ever possible to check - over 95% of your conjectures will be wrong.

-----

Thursday, October 03, 2013

Checklist of things to do on reinstalling Ubuntu

I'm finally moving to 64 bit Linux (since all my machines are fine with that).  My disks have my /home and root files in different partitions, so I can erase and reinstall Linux itself without touching any of "my" files.  This works flawlessly, and the new installation comes up with all my old desktop settings etc.  Since the Ubuntu Software Center's "sync" function still doesn't work properly, here are some installations and customizations that I like:

Wednesday, September 25, 2013

"ucgadkw" and UCL

Dear colleagues,

For about 25 years I had an email account at University College London.  It was "ucgadkw@euclid.ucl.ac.uk" from 1988, and then "ucgadkw@ucl.ac.uk" with aliases to "wujastyk@ucl.ac.uk" and "d.wujastyk@ucl.ac.uk".  After leaving UCL in 2009 I was given honorary status, so the accounts continued, as did access to my filestore, which for many years housed the INDOLOGY website.

But that has all now been unplugged and switched off. I was not informed or warned about this, and a UCL computer support person has just told me that it is unlikely that I'll be given access to my old files because apparently they don't belong to me, but to UCL ("work for hire").  I have a backup, but it is slightly out of date.

There's obviously a lesson here: if you care about your work, don't store it exclusively on a university computer.  Strange thing to say, but unfortunately true. 

Wednesday, July 24, 2013

Minimal example of XeLaTeX with Velthuis input mapping



\documentclass{article}

\usepackage{polyglossia}

\newfontfamily
  \sanskritfont [Script=Devanagari,Mapping=velthuis-sanskrit]{Sanskrit 2003}

\begin{document}

\sanskritfont
\noindent\huge

aasiidraajaa nalo naama viirasenasuto balii|\\
upapannairgu.nairi.s.tai ruupavaana"svakovida.h||

\end{document}
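As a rough illustration of what the velthuis-sanskrit mapping does at the input level, here is a toy transliterator covering only a handful of the Velthuis conventions (the real TECkit mapping is far more complete, and renders Devanagari rather than romanization):

```python
# Toy Velthuis-to-IAST substitutions. Only a few of the mapping's rules
# are shown; the actual velthuis-sanskrit map targets Devanagari glyphs.
VELTHUIS_RULES = [
    ("aa", "ā"), ("ii", "ī"), ("uu", "ū"),   # long vowels
    (".s", "ṣ"), (".t", "ṭ"), (".d", "ḍ"),   # retroflexes
    (".n", "ṇ"), (".h", "ḥ"), ('"s', "ś"),   # retroflex nasal, visarga, palatal sibilant
]

def velthuis_to_iast(text: str) -> str:
    """Apply the substitution rules left to right over the input."""
    for ascii_seq, iast in VELTHUIS_RULES:
        text = text.replace(ascii_seq, iast)
    return text
```

Applied to the first words of the example above, `velthuis_to_iast("aasiidraajaa nalo naama")` yields "āsīdrājā nalo nāma".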

Monday, July 08, 2013

Resources for OA book publication

My post "Some OA journals that publish S-Asia related research," was self-evidently devoted to journals.

There is also a growing field of services for publishing OA books.  Some research funding agencies, such as the FWF in Austria, require contractually that books too should be published OA.  To me, the business model for OA book publishing is less clear than that for journals, and I see many difficulties.  Nevertheless, the field is growing.

Some resources:
  • Directory of Open Access Books

    "The primary aim of DOAB is to increase discoverability of Open Access books. Academic publishers are invited to provide metadata of their Open Access books to DOAB. Metadata will be harvestable in order to maximize dissemination, visibility and impact."
  • Knowledge Unlatched

    "Knowledge Unlatched is a not-for-profit organisation committed to helping global communities share the costs of Open Access publishing so that good books continue to be published and more readers are able to engage with them."
  • Open Humanities Press

    "The basic idea is simple: making peer-reviewed literature permanently available, free of charge and freely redistributable by taking advantage of the low cost and wide access of internet distribution." ... "After looking at the various efforts underway, we concluded that an editorially-driven international press, focused on building respect through its brand, is what is required to tackle the digital 'credibility' problem. With OHP, we aim to emulate the strengths and flexibility of commercial presses, while avoiding the institutional limitations of the university-based e-presses."
  • Open Edition

    "OpenEdition is the umbrella portal for OpenEdition Books, Revues.org, Hypotheses and Calenda, four platforms dedicated to electronic resources in the humanities and social sciences."
  • Open Edition Books

    "OpenEdition Books est une plateforme de livres en sciences humaines et sociales. Plus de la moitié d'entre eux est en libre accès. Des services complémentaires sont proposés via les bibliothèques et institutions abonnées."
  • Open Access Publishing European Network

    "Online library and publication platform.  The OAPEN Library contains freely accessible academic books, mainly in the area of Humanities and Social Sciences. OAPEN works with publishers to build a quality controlled collection of Open Access books, and provides services for publishers, libraries and research funders in the areas of dissemination, quality assurance and digital preservation."

Monday, June 24, 2013

Where's that Book? On the physical locations of knowledge

Many years ago, my then wife – whose academic specialism was similar to my own – decided that it would make sense to merge our book collections. When I got home from work, our two libraries had been unified, and arranged alphabetically by author. I experienced this as a negative change: I felt lost, ignorant, almost lobotomized. I took the rearrangement as an act of mild aggression, although I admitted freely that this was not at all my wife's intention. From her point of view, she was making it easier for both of us to access our complementary book collections. She was being tidy, logical, orderly. From my point of view, I had lost the ready mental association of book with spatial location, which was integral to the way I habitually remembered references.

For me, the act of remembering academic matters often goes as a rapid internal dialogue, something like this: “Ah yes! So-and-so said something about this. Now what was it exactly? Hmm.” Then I would turn from my desk towards the shelves, thinking, “With the books on topic X. Top right, at about eye-level.” In my mind's eye, my attention roams to the spot on the wall where the book was located. Then I would get up, fetch the book, and find the passage I was after.
In my spatial memory, the phrase “books on topic X” need not actually refer to a subject location. It could be books of a certain colour, especially if a series were all printed in the same style of binding. Or books of a certain size, or that I bought or was given at a certain time. The arrangement of books on my shelves was – and is again – mostly subject-wise, but there are exceptions. But this doesn't affect my ability to remember spatially where books are. As long as the books are where I put them in the first place, then I can rapidly and efficiently find them again. 

The distress I experienced when the books were put into somebody else's logical order was a very real reduction in cognitive power.

I have experienced this several times since then, especially when moving house or office. Physically displacing my personal book collection results in a loss of bibliographical and cognitive control. That control is never quite recovered, in spite of attempts to reproduce the original arrangement of the books. Such a rearrangement is never quite possible. This is partly because the layout of the new storage space does not lend itself to reproducing the original ordering. But also because one's physical resources of energy and time do not allow for a full recovery of the original arrangement.

All this raises the broader question of books, locations, and memory. My own method of relating cognitively to my books is, I suppose, a pale version of the famous medieval European concept of the Theatre of Memory, so eloquently described by Frances Yates. [At this point, I recall that the Yates book is at above head-level on the second column of books in the next room. I fetch it, and write out the bibliographical information for the following footnote.]1
Further points for development:

  • On moving libraries, the Kern Library.
  • What is lost, what is gained.
  • Open access versus closed access.
---
1 Frances A. Yates, The Art of Memory (London, ARK Paperbacks, 1984). First published in 1966.

Reflections on translation from Sanskrit

The following was written by me in another context, as part of a conversation.  I'm reproducing it here in case it is of general interest:

Monday, May 27, 2013

XeLaTeX for Sanskrit: update

In a post on 5 July 2010, I gave an example of how to use XeLaTeX with various fonts and various ways of inputting text.   Some time later, the commands in the Fontspec and Polyglossia packages were updated, and my example didn't work as advertised any more.  Here is an update that works again.

Wednesday, May 15, 2013

Changing publication models

With the growth of good desktop document processing software and the universality of good, free Unicode fonts, it is now entirely feasible for an individual to produce excellent camera-ready copy of an academic book for themselves, with modest effort over a modest period of time.

With services like Lulu and Createspace, the transition from a PDF on your computer to a hard-bound, published book sold online and through Amazon, Barnes & Noble, etc., is also very easy and cheap.  I mean, less than about $100, total cost.  I did a book with Lulu a couple of years ago (my father's memoirs), and I paid $60 to cover distribution through Amazon and all other big bookshops and online services.  Everything else was free.  The book is large, 650 pages, and costs about $50 for hardback, with free shipping in the USA (e.g., Amazon, B&N).    I also made the PDF downloadable directly from Lulu at $12.

What does all this mean?

What it means is that publishers are no longer necessary for performing the traditional roles of book production and distribution.   Authors can now do this satisfactorily for themselves at marginal cost, high quality, and with international distribution.

What remains?  What I call "Gatekeeping" services.  With today's deluge of free online resources, what we all really do need is someone to take responsibility for guaranteeing high intellectual quality.  Trustworthiness.

Traditionally, this was also a role performed by some publishers, especially the university presses.  A book on Buddhism from Cambridge University Press *should* be of a different calibre from a book on Buddhism from, say, Harlequin or Mills & Boon.   The good academic publishers acted as gatekeepers, offering an implicit guarantee of intellectual quality.

But if you look more closely at this arrangement, the university presses rely heavily on the free services of university staff for refereeing, book acquisition, series curation, and sometimes even content-editing and copy-editing.  In-house copy-editing was usual, however, and often of a high standard.

Another service that a big university press provides is prestige.  A young scholar with a book published by Princeton is likely to do better at getting a job than another with a book published with a publisher of less prestige.  This is because appointment committees are willing to take the implied quality-guarantee of Princeton UP.  But again, Princeton only publishes books because unpaid academic referees at universities give the thumbs-up.  The process is circular.

What does all this mean?

If books can be produced and distributed by academics themselves, and refereed and edited by them too, what is left for publishers?  Not much, I think, unless they dramatically change their business and service models.  

What we see going on today, I believe, are the last convulsions of a dying industry.  Yes, they're making a lot of money, but only because of the inertia and uncertainty of academics.  What used to be called FUD ("fear, uncertainty and doubt").  The upcoming younger generation of scholars with different preconceptions will probably not be so smitten by the prestige of old publishing houses, and will be more adept at self-publishing.

What remains is the need for gatekeeping, for the guaranteeing of quality.  If publishers really took that seriously, and divorced their editorial selections and quality judgements from their need to remain profitable, then they might salvage for themselves a genuine role in the future.  I cannot see a way in which genuine academic quality can be guaranteed by an institution that simultaneously has to satisfy criteria of profitability.  As long as there are two goals - quality and profit - there will inevitably arise cases of conflict and compromise.  In short, gatekeeping is the job of (publicly-funded) university staff, not a (commercial) publisher.

The alternative to this is that university staff take back into their own hands all the processes of the production and distribution of knowledge.  In fact, this is the change that the major funding bodies are pressing upon us, with the widespread requirement that publicly-funded academic research be published Open Access.  It is also the original idea of the university press.

Here's a hypothetical model for a future academic book series. 

  • Author on a research grant or university salary writes a book. 
  • The book is typeset using LibreOffice or TeX.  The university department provides some secretarial support to help, or some money from the research grant pays for smart word-processing by an agency.
  • The book is sent to an external commercial copy-editing company to tidy up the details.  A smart, accurate PDF results. 
    This is paid for by the university department, or out of the research grant (this is already common).
  • The PDF is submitted to a panel of academics somewhere who curate a book series, judging the intellectual quality of the submissions.  The book is accepted as an important intellectual contribution.
  • The PDF is uploaded to Lulu.com or Createspace, where it is turned into a print-on-demand hardback book for sale internationally through Amazon etc., and in bookshops. 
    Lulu are the printers and distributors. 
    The ISBN is provided by the university department, so they are the publishers, not Lulu.  
  • The book is advertised through a prestige university website that promotes the book as an intellectual contribution, contextualizes it as a university-curated product, and made available for sale through a simple click link to PayPal, Amazon, etc.  The university's series name is printed in the book, and splashed all over the website.
Oops: high quality production, high quality intellectual content, university curation, international sales, but no "traditional" publisher!

Please blow holes in what I've said. There must be an elephant in the room that I'm not seeing.

(reproduced from my post to the INDOLOGY discussion list, 15 May 2013)

Saturday, May 11, 2013

emusic.com solution for Linux

Emusic.com is a truly great music service, particularly since they have been non-DRM from their inception.

But - strangely - they have never supported Linux properly.

Many thanks indeed to Matt Woodward for pointing out that all Linux users need to do is install emusicj and then point and click this link:

Tuesday, April 02, 2013

Future philology

A very interesting and enjoyable Skype with Elena Pierazzo at KCL left me with lots to think about, and links to all sorts of digital projects that I was unaware of or only half-aware of previously, including

Tuesday, February 26, 2013

Converting XeLaTeX into ODT or MS Word

TeX4ht can do a lot of the work of converting from LaTeX to word-processor formats.  But when one adds in the complications of UTF8 characters, multiple scripts, and XeLaTeX, things can get complicated.

C. V. Radhakrishnan today pointed me to this discussion on the TeX4ht mailing list:
What Radhakrishnan says is:
As far as I understand, TeX4ht won't support fontspec or XeLaTeX
technologies of using system fonts that do not have *.tfm's. In effect, by
adopting TeX4ht, one is likely to lose the features brought in by XeTeX.
However, here is another approach.

   1. We translate all the Unicode character representations in the
   document to Unicode code points in 7bit ascii which is very much palatable
   to TeX4ht. A simple perl script, utf2ent.pl in the attached archive does
   the job.
   2. We run TeX4ht on the output of step 1.
   3. Open the *html in a browser, I believe, we get what you wanted. See
   the attached screen shot as it appeared in Firefox in my Linux box.

Here is what I did with your specimen document.

   1. commented out lines that related to fontspec package from your
   sources named as alex.tex.
   2. added four lines of macro code to digest the converted TeX sources
   3. ran the command: perl utf2ent.pl alex.tex > alex-ent.tex
   4. ran the command: htlatex alex-ent "xhtml,charset=utf-8,fn-in" -utf8
   (fn-in option is to keep the footnotes in the same document). I have used a
   local bib file, mn.bib as I didn't have your bib database. biber was also
   run in the meantime to process the bibliography database.
   5. open the output, alex-ent.html in a browser. I got it as you see in
   the attached alex.png.
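The "four lines of macro code" mentioned in step 2 are not reproduced in the message. A hypothetical reconstruction of what such a definition might look like, using TeX4ht's \HCode command, is given below; this is my sketch, not Radhakrishnan's actual code, and the escaping of the literal # inside \HCode may need adjustment (the doubled ## is TeX's escape for a literal # in a macro body):

```latex
% Hypothetical sketch, not the original macro: convert each \entity{N}
% emitted by utf2ent.pl into the HTML numeric character reference &#N;
\def\entity#1{\HCode{&###1;}}
```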
Radhakrishnan's Perl script utf2ent.pl is:
#!/usr/bin/perl

use strict;
use warnings;

for my $file ( @ARGV ){
  open my $fh, '<:utf8', $file or die "Cannot open file $file: $!";
  while ( <$fh> ){
    s/([\x7f-\x{ffffff}])/'\\entity{'.ord($1).'}'/ge;
    print;
  }
}
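For comparison, here is a rough Python equivalent of the same transformation (my sketch, not part of Radhakrishnan's archive): every character above U+007E is rewritten as a \entity{N} macro call, N being the decimal Unicode code point.

```python
import re

def utf_to_entities(text: str) -> str:
    """Rewrite each character above U+007E as \\entity{<decimal code point>},
    mirroring the substitution performed by utf2ent.pl."""
    return re.sub(
        r"[\x7f-\U0010FFFF]",
        lambda m: "\\entity{%d}" % ord(m.group(0)),
        text,
    )
```

For example, "nāma" becomes "n\entity{257}ma", since ā is U+0101 (decimal 257).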


For Radhakrishnan's continuing comments on TeX4ht development, see
TeX4ht's homepage:

Thursday, January 31, 2013

Some OA journals that publish S-Asia related research

Name and URL | Online | Print | Fee?* | Copyright | Licence | DOAJ
------------ | ------ | ----- | ----- | --------- | ------- | ----
Bhāṣā: Journal of South Asian Linguistics, Philology and Grammatical Traditions | Y | N | N | author | CC BY | no entry
Himalaya | Y | N | N | not stated | CC | DOAJ
Studia Orientalia Electronica | Y | N | N | author | Full OA | no entry
वागर्थः (An International Journal of Sanskrit Research) | Y | Y | ₹5900/- | | unstated but OA | no entry
KERVAN - International Journal of Afro-Asiatic Studies | Y | N | N | author | CC BY | no entry
Social Sciences | Y | Y | N | author | OA | DOAJ
Journal of World Philosophies (formerly Confluence) | Y | N | N | journal | CC BY | no entry
Journal of Bengali Studies | Y | N? | N | author | a benign muddle | DOAJ
Acta Poética | Y | N? | N | journal | CC BY-NC | DOAJ
Linguistica | Y | Y? | N | author | CC BY-SA | DOAJ
Hiperboreea | Y | N? | N | journal | CC BY-NC-ND | DOAJ
Scripta Instituti Donneriani Aboensis | Y | ? | N | journal | CC BY-NC-ND | DOAJ
Ancient Science of Life | Y | Y | N | journal | CC BY-NC-SA | DOAJ
Asian Studies | Y | N | N | author | CC BY-SA | DOAJ
Acta Linguistica Asiatica | Y | N | N | author | CC BY-SA | DOAJ
South Asia Multidisciplinary Academic Journal | Y | N | N | journal | CC BY-NC-ND | DOAJ
Journal of Ayurveda and Integrative Medicine | Y | Y | N | journal | CC BY-NC-SA | DOAJ
Journal of Ayurveda and Holistic Medicine | Y | N? | ₹1500 for Indian nationals | journal | CC BY-NC-SA |
History of Science in South Asia | Y | Y | N | author | CC BY-SA | DOAJ
Asian Literature and Translation | Y | not yet | N | author | CC |
Ancient Asia | Y | N? | N? | author | CC BY | DOAJ
Approaching Religion | Y | N? | N? | | |
Relegere: Studies in Religion and Reception | Y | N | N | author | |
Bijdragen tot de Taal-, Land- en Volkenkunde | Y | N | N | journal | CC BY-NC | DOAJ
Asian Social Science | Y | Y | $300 | | |
Studi Linguistici e Filologici Online | Y | N | N | | |
eJIM - eJournal of Indian Medicine | Y | Y, cost | N | | |
Journal of History and Social Sciences | Y | N | N? | | |
Annals of Ayurvedic Medicine | Y | N | N | | |
Rivista di Studi Sudasiatici | Y | Y | N | | |
Himalayan Linguistics | Y | N | N | | |
Journal of South Asian Linguistics | Y | N | N? | | |
Annual of Urdu Studies | Y | Y, cost | N | | |
Pacific World: Journal of the Institute of Buddhist Studies | Y | Y, free | N? | journal | |
Health, Culture and Society | Y | N | N | author | CC BY | DOAJ
Open Journal of Philosophy | Y | N | $400 + $50/page above 10 pages | journal | |
International Journal of Jaina Studies | Y | Y | N | journal | print copies from HGK |
Asian Ethnology (olim Asian Folklore Studies) | Y | Y | N | journal | |
Electronic Journal of Vedic Studies | Y | N | N? | | restricted OA policy |
Transcultural Studies | Y | N | N | | |

And see the India-related list that used to be maintained by Scholars Without Borders (mostly science and medicine):

Saying that a journal is OA still leaves some critical questions unanswered.  E.g.,
  1. Is there a "going in" fee, or Article Processing Fee (APF)?
    One of the items in the list above charges $300.  Several of the big houses, like Brill, Elsevier and Springer, will also publish your article as OA, even in an otherwise non-OA journal, if you pay them enough.  Their APF prices are typically $3000 (Springer, Elsevier).  I am not interested in including such journals in the list above, as I consider anything above $300-500 to be profiteering.  APFs of $300-500 are typical of even some very large OA publishers, like Hindawi, showing that this is a viable business model.
    Quite apart from my personal view, I do not think APFs of $3000 meet most people's normal expectation of what an Open Access journal means.  As South Asianists, we are interested in access for both readers and authors in countries where scholars are relatively poor.  A high APF shuts out less wealthy authors.  As such, both Gratis OA (also called "Diamond OA") and low or zero APFs are indices of relevance.
  2. Copyright: being OA means that whoever owns the copyright has given permission for the article to be disseminated at zero cost.  But it doesn't say anything about who owns the copyright of the article.  Many OA journals allow the authors to retain copyright, but not all.
  3. Is the journal online-only, or both online and in print?
  4. The online version is free by definition, but the print issues would usually cost something.  How much?
  5. Is the journal indexed by the main global indexing services?
  6. Is the journal peer-reviewed? Strongly or weakly?

More distinctions (e.g., Gratis OA, free of price, versus Libre OA, free of price and rights restrictions) and discussion in Wikipedia (consulted 13 Feb 2012).  The Sherpa/Romeo website helps with some of this.

I'm putting some indicators in parentheses after the journal title, for those cases where I can find out the information without correspondence.

It is often hard to find out these facts from the journals' websites.  This suggests to me that for some of the editors, the various business models of OA publishing are not always well understood. 

[2022-11: I keep the above list up to date as I hear of new journals.  But some of the work of this blog post has now been superseded by FOASAS.]



---
*APF = Article Processing Fee, a fee that the publisher charges the author or the author's institution for publication in the Open Access journal.  See discussion in Wikipedia (consulted 12 Feb 2012).