SOAS Wa Dictionary Project

What's New at the SOAS Wa Dictionary Project



8 April 2013

Microsoft Keyboard Layout Creator (MKLC) ver. 1.4 SOASShan Shan keyboard for typing Shan in conformance with Unicode v.5.2 (no Private Use Area characters) (self-extracting zip archive, 292KB, updated 2013-04-10). This is work in progress! The keyboard arrangement (PDF, 68KB) is essentially identical to that of the also new Keyman SOASShan keyboard. The principles are the same phonetic/mnemonic ones as those of the SOASMyanmar keyboard, but with the much smaller set of characters needed for Shan mapped to the sound-alike keys of the QWERTY keyboard. Burmese characters not needed for Shan are mapped to Ctrl+Alt+x (i.e., AltGr+x, Right Alt+x) keys. This keyboard may be downloaded and installed in Windows 2000, XP, Vista, or Windows 7.

New separate web page on SOAS Wa and other Keyboards

4 August 2006

Adobe Acrobat PDF files of fourth draft (2006-07-11) of Wa print dictionary (various combinations of orthographies, languages, serial orders, features, and sizes)

  • Chinese orthography (Wa-CN) version with translations into Chinese and English. (PDF - 8,315KB, created 2006-07-18)
  • Myanmar orthography (Wa-MM) version in a-b-c order, with translations into Burmese (Unicode 4.0), Chinese, and English. (PDF - 10,298KB, created 2006-07-18)
  • Myanmar orthography (Wa-MM) version in Indic (k- kh- g- ng-...) order, with translations into Burmese, Chinese, and English. (PDF - 10,332KB, created 2006-07-18)
  • Unabridged version, in Wa-CN order, primarily for staff use. Includes parallel orthographies (Wa-CN, Wa-MM, and Unified Wa), with translations into Burmese (Unicode 4.0), Chinese, and English. Also includes editing notes, sources of entries and examples, etc., in a-b-c order. (PDF - 14,678KB, created 2006-07-18)

23 April 2006

Three new titles have been added to the corpus of Wa texts:

  1. Līg lāi hmai jei jīe noung quing (佤族农村实用知识读本) (this is an especially important addition, since it fills a gap in the corpus with over 350 pages of how-to info about daily life, from tractor maintenance to castrating pigs to recipes for pickled vegetables)
  2. Ngai līg ngai lāi nblōng blag kaix līh Gon Būi: Dox noh gah Mīex Geeing Gon Nyōm Ming Qux ba roi dūx an gix (a brief health-care manual for expectant mothers)
  3. Nbēen loux gon nyōm Joung Gox ("Chinese Fairy Tales", but despite the title the tales are not restricted to China)
    • Been lox kawn: nyawm Cong: Kawx (Wa-MM version of Nbēen loux gon nyōm Joung Gox; a semi-automatic conversion of the preceding text into Myanmar Wa orthography)

1 April 2006

The text strings of synonyms, near-synonyms, and phonetic variants used to keep track of the links among these in the Wa dictionary database can be viewed in a plain-text file here.

29 March 2006

We finished cleaning up the entire Chinese text of Prof. Wang Jingliu's 王敬骝 magnum opus Loux Gāb Vax (佤语熟语汇释) and, as a way of celebrating, we also cleaned up the correspondence list of Wa-CN orthography and International Phonetic Alphabet (IPA) in the appendix. It's a nice, easy-to-grasp supplement to our existing correspondence charts, so we added a link to it on our Wa Resources page as a small separate document too.

27 March 2006

We redid the Wa Bibliography in paragraph form (actually a bulleted list), rather than in table form, along the lines of the recent conversion of the Gazetteer of Wa Area Placenames. It is available both as a dynamic XML-XSLT document and as a static HTML document.

25 March 2006

We're ba-a-ck. We converted the Gazetteer of Wa Area Placenames from an Excel spreadsheet to its permanent XML form. It had been living somewhere in between these two data structures for the last six months. It is available in lots of different "live" views in both matrix/tabular form and paragraph form on our Wa Resources page. We are able to edit the Wa, Burmese, Chinese, Shan etc. XML data simultaneously in WYSIWYG and structured form, using "XMLMind XML Editor", a great tool from the French company Pixware. We then upload it to our website, and XSLT stylesheets do the rest, rendering it in comprehensible (!?) form.

10 October 2005

Adobe Acrobat PDF files of third draft (2005-09-20) of Wa print dictionary (various combinations of orthographies, languages, serial orders, features, and sizes)

17 August 2005

Improved the Wa dictionary database search algorithm and search user interface in several ways, including a more powerful "Starts with" search and the ability to choose which fields in the database to display in paragraph format.

15 August 2005

We have a new, improved way of displaying our data tables and various XML-encoded Wa text files, based on XSLT stylesheet technology. For tables, it allows one to change the order in which the data is displayed, without rearranging the XML data itself. We have also used frames to give a "split-screen" effect, hopefully improving the ability to grasp the content in tables displayed in a matrix larger than a computer screen can hold at one time.

The first table to which this technique has been applied is our draft gazetteer of Wa Area Placenames, which at the same time has had quite a bit of new info added (but still has a great deal of wrong or missing information). Here are the links to the new views, from our Resources page:

Wa Area Placenames (draft): Wa names, with equivalents in neighboring languages (updated 2005-08-15)
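
As a rough illustration of the idea of changing the display order without rearranging the stored XML, here is a minimal Python sketch. It is not our XSLT stylesheet: the element names (places, place, wa, zh) and the sample values are placeholders invented for the example, and the real gazetteer has many more fields.

```python
import xml.etree.ElementTree as ET

# Placeholder data: element names and values are invented for illustration only.
SAMPLE = """<places>
  <place><wa>wa-name-2</wa><zh>zh-name-1</zh></place>
  <place><wa>wa-name-1</wa><zh>zh-name-3</zh></place>
  <place><wa>wa-name-3</wa><zh>zh-name-2</zh></place>
</places>"""


def view_sorted_by(xml_text, field):
    """Return the rows ordered by the chosen field; the XML text is not modified."""
    root = ET.fromstring(xml_text)
    rows = sorted(root.findall("place"),
                  key=lambda place: place.findtext(field, default=""))
    return [(place.findtext("wa"), place.findtext("zh")) for place in rows]


# Two different views of the same stored data.
by_wa = view_sorted_by(SAMPLE, "wa")
by_zh = view_sorted_by(SAMPLE, "zh")
```

Each call produces a fresh view, so the same source file can serve a Wa-ordered table and a Chinese-ordered table at once.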

28 July 2005

Updated all orthography tables to reflect changes in our IPA (International Phonetic Alphabet) notation for palatal final stops and nasals; aspirated initials; and pre-syllable si-. Some samples of the changes are shown below:

Formerly     Is now
hoik         hoc
si daiŋ      s.daɲ
ti̤ŋ          ti̤ɲ
lhɛʔ         lʰɛʔ

16 July 2005

Recently it was pointed out that the serial order used for the Myanmar version of the Wa dictionary is English-biased, being based on the a-b-c alphabetical order, while the Wa authorities in Pang Kham (Pang Hsang) present the Roman letters in a phonetically logical matrix, in the style of Indic scripts. A serial order can be tentatively constructed on the basis of the matrices for 58 syllable-initial consonants and consonant clusters, 26 nuclear vowels and polyphthongs, and 16 final consonants, as shown in the table below:

Initials (58): k-, kr-, kl-, kh-, khr-, khl-, g-, gr-, gl-, gh-, grh- (ghr-), glh- (ghl-), ng-, ngh-, p-, pr-, pl-, ph-, phr-, phl-, b-, br-, bl-, bh-, brh- (bhr-), blh- (bhl-), m-, mh-, t-, ts-*, th-, tsh-*, d-, dh-, n-, nh-, c-, ch-, j-, jh-, ny-, nyh-, s-, sh-, [z-,] y-, yh-, r-, rh-, l-, lh-, v-, vh-, w-, f-, h-, x (zero)-
Nuclei (26): a, au, ao, ai, o, u, aw, oi, i, e, ie, ee, eu, ia, iao, io, iu, yaw**, iie(?), ui (wi), ua (wa), uai*, wie* (uie)*, oe, eei, eue
  or
Nuclei (26): a, au, ao, ai, o, u, aw, oi, i, e, ie, ee, eei, eu, eue, ia, iao, io, iu, yaw**, iie(?), ui (wi), ua (wa), uai*, wie* (uie)*, oe
Finals (16): -(zero), -k, -t, -p, -h, -x, -(zero):, -g, -d, -b, -ng, -ng:, -n, -n:, -m, -m:

*New since August 2005's 1st serialising of Wadict DB.

  1. ts-: e.g., tsi: tsi: chi:, tsi: peung: cu: yi:
  2. tsh-: e.g., tshix, tshi: ting:
  3. -wie-: e.g., gwieh, gwiex, bwie
  4. -uai-: e.g., kuaik, kuaih, kuaix, puaik (OT "hawt pheet puaik tix" -- only case of -uai- in whole OT-NT), juaing, vuai, etc.
    Note to ourselves: if not -uai-, what should these syllables be in Wa-MM instead? As a stopgap, make -wa- the same ID as -ua-, and use the -wa- ID for -uai-?

**New since late September 2005's 2nd serialising of Wadict DB.

  1. -yaw-: e.g., phyawk

If this order were to be applied to entries in the Wa dictionary, it would look like this small sample of Wa-MM entries in Indic order.
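
To make the idea concrete, here is a minimal Python sketch of a sort key that ranks a Wa-MM word by its initial, using the order of initials listed above. It is only an approximation of a full Indic serial order: the nucleus and final are compared as plain strings rather than by their own matrices, the bracketed z- is simply included, and words with no listed initial are treated as zero-initial and placed last. The function name indic_key is our own invention for this example.

```python
# Initials in the tentative Indic order given above (zero initial handled separately).
INITIALS = [
    "k", "kr", "kl", "kh", "khr", "khl", "g", "gr", "gl", "gh", "grh", "glh",
    "ng", "ngh", "p", "pr", "pl", "ph", "phr", "phl",
    "b", "br", "bl", "bh", "brh", "blh", "m", "mh",
    "t", "ts", "th", "tsh", "d", "dh", "n", "nh",
    "c", "ch", "j", "jh", "ny", "nyh", "s", "sh", "z",
    "y", "yh", "r", "rh", "l", "lh", "v", "vh", "w", "f", "h",
]
RANK = {initial: i for i, initial in enumerate(INITIALS)}
# Alternate spellings shown in parentheses in the list share the same rank.
RANK.update({"ghr": RANK["grh"], "ghl": RANK["glh"],
             "bhr": RANK["brh"], "bhl": RANK["blh"]})
ZERO_RANK = len(INITIALS)  # zero initial sorts after h-
BY_LENGTH = sorted(RANK, key=len, reverse=True)  # try the longest clusters first


def indic_key(word):
    """Sort key: (rank of the initial, remainder of the word as a plain string)."""
    w = word.lower()
    for initial in BY_LENGTH:
        if w.startswith(initial):
            return (RANK[initial], w[len(initial):])
    return (ZERO_RANK, w)


words = ["mai:", "grawm", "phyawk", "kuaik", "rheung:", "tshix", "paox"]
indic_order = sorted(words, key=indic_key)
abc_order = sorted(words)
```

With these sample words, indic_order runs k-, gr-, p-, ph-, m-, tsh-, rh-, whereas abc_order begins with grawm and ends with tshix.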

15 July 2005

Reorganised the List of texts in the digitised and searched Wa corpus, to make it easier to find works, as the list has grown. Also, metadata has been normalised and the abbreviation which is used to refer to a work has been added to the metadata itself.

13 July 2005

There are updated HTML versions of the following major reference works on the Wa language in our corpus of Wa texts:

  • Loux gāb Vax (佤语熟语汇释), by Wang Jingliu 王敬骝 et al.
  • Wayu yufa 佤语语法, by Zhao Yanshe 赵岩社 and Zhao Fuhe 赵福和

As noted in the section on secondary works at the bottom of the list of texts in the Wa Corpus, there are still many typos, but perhaps 30-35% of the text in each of these works is now clean.

12 July 2005

Two new tables have been added to our Resources. One is a rough draft of a table of Wa Names for Punctuation Marks and Mathematical Symbols. The other is a Concise Table of Abbreviations for Titles of Texts in the Wa Corpus, to give easier and quicker reference than the full PDF file Guide to Symbols and Abbreviations used in the Wa Dictionary.

11 July 2005

Added two new texts to the indexed Wa corpus. Both are Wa translations of speeches by Jiang Zemin, at the Chinese Communist Party congresses of 1992 and 1997:

Due to improvements in our text-processing techniques, these are probably the "cleanest" documents yet in our corpus, at least among those which were not already digital documents and which were OCRed and cleaned up in a multi-stage process. Even better news is that these texts have breathy register marked with macrons. (Some other recent corpus additions ignored register variation.) This is not to say that the source texts do not have the usual challenges. For example, note the titles above. The same publisher (Yunnan Minzu Chubanshe) and same editors chose to write the name of China as "Jōung Gox" throughout the 1992 text and as "Joung Gox" throughout the 1997 text. But in general, it is the 1997 text which sprinkles the breathy macron about more liberally than our current dictionary database would indicate was proper (e.g., 'si nyēig' for 'si nyeig'). Another idiosyncrasy of the 1997 text is that it spells some words with an -ie- which in "standard" Chinese Wa would be written with an -ia-. E.g.,

Uses         For
diem         diam
dīem         dīam
hngied       hngiad
nīed         nīad
njied        njiad
The use of -ie- for this diphthong is similar to Myanmar Wa usage. Search for [sic] in the texts to see other anomalies. But many unfamiliar forms marked with a "sic" are candidates for adding to the Wa dictionary.

With these additions we debut a new feature of the corpus: numbered paragraphs. In a Web browser which supports CSS2 (Mozilla Firefox, Opera, etc. -- not Microsoft Internet Explorer v.6.0 or earlier), the paragraph numbers are displayed in square brackets at the end of each paragraph. Numbered paragraphs will enable us to locate and cite text more accurately (including in concordances). In parallel texts such as the Wa texts above and their Chinese original versions shown below, parallel passages in each text can be located more quickly by eye (or with the appropriate XML tool, since the numbering uses XML syntax), even though there is no explicit parallel text markup, as there is in our TEI XML-encoded texts. We can also use hypertext links to numbered paragraphs, as well as numbered pages.

Desiderata: we still don't have an automatic paragraph numbering mechanism, but maybe we'll have one soon. Also, even though there is a general paragraph-to-paragraph equivalence in these texts, they still don't match up entirely. So the plus-or-minus-one rule, familiar to anyone who has looked up Chinese characters by stroke count, applies here when matching up paragraphs (and is indeed more like plus or minus 3!).
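
One way an automatic numbering pass might look is sketched below in Python. This is an assumption-laden illustration, not our actual markup: it supposes that paragraphs are p elements in well-formed XHTML and that the number is stored in an attribute called n, both of which are choices made only for this example. The second function lists the paragraph numbers worth checking in the parallel text, given the plus-or-minus slack described above.

```python
import xml.etree.ElementTree as ET


def is_paragraph(elem):
    """True for <p>, with or without an XML namespace prefix on the tag."""
    return elem.tag == "p" or elem.tag.endswith("}p")


def number_paragraphs(xhtml_text, attr="n"):
    """Add a running number to every paragraph, in document order."""
    root = ET.fromstring(xhtml_text)
    count = 0
    for elem in root.iter():
        if is_paragraph(elem):
            count += 1
            elem.set(attr, str(count))  # 'n' is a hypothetical attribute name
    return ET.tostring(root, encoding="unicode")


def candidate_paragraphs(n, total, slack=3):
    """Paragraph numbers in the parallel text to check for a match with paragraph n."""
    return list(range(max(1, n - slack), min(total, n + slack) + 1))


numbered = number_paragraphs(
    "<body><p>first</p><div><p>second</p></div><p>third</p></body>")
```

A stylesheet could then display the stored number in square brackets, and a link or concordance entry could cite it directly.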

1 July 2005

A "live" link to the Wa dictionary database itself has been added to our page describing the goal and state of the Wa Dictionary Database.

30 June 2005

Recent changes to the Wa dictionary database and search engine:

  • An exact-match search within the two main headword fields (Wa in Chinese orthography and Wa in Myanmar orthography) has been enhanced (hopefully!) by making it inexact in two specific ways:
    1. It continues to match a full headword entry as before.
    2. It also matches headword entries which begin with the search item but have variant spellings in parentheses. Thus a search for "nyīex" will also match "nyīex (nyīiex)" and a search for "mai:" will also match "mai: (mai)".
    3. And it also matches variant spellings anywhere within parentheses. Thus a search for "rheung", "rheung:", "rheeng", or "rheeng:" (all common variations in Myanmar Wa texts) will all find the entry "rheung: (rheeng:, rheeng, rheung)". Similarly either "paoxgrawm" or "paox grawm" will find the entry "paoxgrawm (paox grawm)".
    4. Incidentally a few bonus entries are occasionally retrieved which are not wanted, but isn't this a small price to pay for a more powerful search? And it may in fact be good during our dictionary development period, by showing those entries which are believed to have alternative spellings indicated within parentheses (and thus may need tweaking).
  • Added a generic Reference field, which is being used not only for cross-references within the dictionary which the other x-ref fields did not lend themselves to, but also for external references to our other data resources. E.g., unit of weight "kaox".
  • Several small formatting changes to make the editors' jobs easier.
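
The headword-matching behaviour described in the first item above can be sketched in Python as follows. This is an illustration only, not the SQL used by the search engine; the function names are invented here, and the sketch is stricter than the live search, since it does not return the occasional unwanted "bonus" entries.

```python
import re


def spellings(headword):
    """Return the main spelling followed by any variants listed in parentheses."""
    m = re.fullmatch(r"\s*([^()]+?)\s*(?:\(([^()]*)\))?\s*", headword)
    if not m:
        return [headword.strip()]
    main = m.group(1)
    variants = [v.strip() for v in (m.group(2) or "").split(",") if v.strip()]
    return [main] + variants


def matches(query, headword):
    """True if the query equals the main spelling or any parenthesised variant."""
    return query.strip() in spellings(headword)


entry = "rheung: (rheeng:, rheeng, rheung)"
found = [q for q in ["rheung", "rheung:", "rheeng", "rheeng:"] if matches(q, entry)]
```

Here all four spellings in the list find the entry, while a partial form such as "rheu" does not.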

24 June 2005

As a key to the tone numbers on Tai/Shan words in our etymologies, we provide a simple table, based on pp. 598 and 978 in Thomas Hudak, ed. William J. Gedney's Southwestern Tai Dialects: Glossaries, Texts, and Translations (Michigan Papers on South and Southeast Asia, Number 42, Ann Arbor: Center for S and SE Asian Studies, U of Michigan 1994); and pp.xx-xxi in Thomas Hudak, ed. William J. Gedney's The Lue Language: Glossary, Texts, and Translations (Michigan Papers on South and Southeast Asia, Number 44, Ann Arbor: Center for S and SE Asian Studies, U of Michigan 1996). Since so-called Shan and SW Tai dialects are spoken over a wide area and the Wa borrowed different words at different times and places, the table does not accurately reflect the Tai source for any single word or region, but it will give a general idea.

23 June 2005

Began to analyse the numerous words and variations with the general meaning "jump", which appear to differ in how many legs are used, whether the jumper is human or animal, the direction of jumping, etc. To see most of them, search for 跳 (tiào) in the Chinese definition field, or click here.

20 June 2005

Made some minor updates to the Comparative Chart of Wa Orthographies (Based on Initials and Finals). While we were at it, we added a column for Unified Wa.

10 June 2005

Added or updated some files in the indexed Wa corpus, including Lāi Loux (1989) Pug Puan (Vol.5), Si ngian rang mai si mgang lih (佤族神话与历史传说), Nbēen oud mgrong goui gon Ba rāog / Wazu fengqing 佤族风情 (Wa text), and Loux Gāb Vax.

Wa has many words and variations with the general meaning "tie, bind, bundle, lash, hitch, tether, hobble, etc.". Beginning to sort out the various roles of these words has been a challenge and much more work is needed. To see some of the early results, search for 捆 (kǔn) in the Chinese definition field, or click here.

09 June 2005

Information about the 1958 documentary film "The Kawa" and links have been added to the Working Documents page.

08 June 2005

Did a general freshening of all data files on our main website, including both the public HTML versions (links on Resources page and this What's New page) and the source files on the Working Documents page.

31 May 2005

Added much more detail to identification of mammals in the Wa dictionary database, including tentative scientific names and picture illustrations of tentative IDs for most entries. To see these concentrated on a single page, search for 'dou sad' (animal) in Usage (CN) field (or click here) or 'tosat' in Usage (MM) field (or click here). Also began a database of pictures, sources, credits, and conditions of use, with link on Working Documents page.

11 May 2005

Posted updated PDFs for both the Chinese-orthography and Myanmar-orthography versions of the dictionary on the Working Documents page.

26 April 2005

Added a couple of new features to the Wa dictionary database system: one is an improvement in paragraph-mode formatting, which distinguishes the formatting of collocations from that of ordinary examples. Collocations are now bolded so as to make them stand out as the sub-headwords that they are. The second feature is an enhancement to editing entries in the database, making it much easier to rearrange definitions and collocations/examples under a single entry.

15 April 2005

Chinese Pinyin (and Wa) Keyboard, created with Microsoft Keyboard Layout Creator (Self-extracting zip archive, 293KB, updated 2013-04-04). Type all Chinese pinyin vowel+tone mark combinations and Wa vowel+macron combinations (= Chinese pinyin "level tone" or "tone 1") using the vowels a-e-i-o-u and v and A-E-I-O-U and V as dead keys, followed by the numbers 1-2-3-4 for the four tones. The dead key is typed with the AltGr key (or right Alt key) + vowel (or v for ǖ ǘ ǚ ǜ). Plain ü is typed with AltGr + " + u; ê is typed with AltGr + ^ + e. Miscellaneous IPA symbols used for Chinese can be typed with AltGr+number key combinations. E.g., AltGr+1 = ə. For use on Windows 2000 or Windows XP or later only.
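
The dead-key logic (vowel, then a tone number from 1 to 4) can be mimicked in a few lines of Python. This is just a sketch of the character composition using Unicode combining marks and NFC normalisation; it is not how the MKLC keyboard itself is implemented, and the function name compose is our own for this example.

```python
import unicodedata

# Combining marks for the four pinyin tones: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030C", "4": "\u0300"}
# The v key stands in for u-umlaut, as on the keyboard.
V_KEYS = {"v": "\u00FC", "V": "\u00DC"}


def compose(vowel, tone):
    """Combine a vowel key and a tone number into one precomposed character."""
    base = V_KEYS.get(vowel, vowel)
    return unicodedata.normalize("NFC", base + TONE_MARKS[tone])


macron_a = compose("a", "1")   # Wa vowel+macron, same as pinyin tone 1
caron_v = compose("v", "3")    # u-umlaut with third-tone mark
```

Tone 1 gives the vowel+macron combinations used in Wa, so the same four-entry table covers both languages.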

Install in Windows by downloading and executing the self-extracting zip archive. Then double-click on the setup.exe file to install. Use the Control Panel, Regional [and Language] Options, to assign this keyboard as an alternate keyboard for an existing language or add a dummy language like Icelandic, Faeroese, Afrikaans, etc., with Chinese Pinyin as its keyboard. Then add a "Hot key" keyboard toggle, such as Ctrl+Shift+2, to switch to that language. Toggle between Chinese Pinyin and the Microsoft Chinese (PRC) Keyboard with Ctrl+Space. Note that since the special accented characters are only activated by pressing the AltGr key along with a keyboard key, this keyboard behaves just like a standard U.S. English keyboard when the AltGr key is not pressed (unlike the United States-International keyboard, which can be challenging to use for rapid typing of English, since the apostrophe/single quote key is itself a dead key in that keyboard).

There are two other new keyboards available, also created with Microsoft Keyboard Layout Creator (MKLC). One is an implementation of the SOASMyanmar Burmese keyboard for typing Burmese in pure Unicode v.5.1 only (self-extracting zip archive, 292KB, updated 2013-04-04). This is work in progress! The keyboard arrangement (PDF, 68KB) is identical to that of the Keyman SOASMyanmar keyboard, but this keyboard does not have, or need, the extra features of the Keyman version, which are needed for typing Burmese contextual glyphs assigned to the PUA (Private Use Area) in the "Unicode-transitional" system we employed.

The other new MKLC keyboard is a Russian Cyrillic phonetic keyboard, based on the arrangement of the AATSEEL (American Association of Teachers of Slavic and East European Languages) Student keyboard (self-extracting zip archive, 290KB, updated 2013-04-05). This provides an alternative to the single Russian keyboard distributed with Windows, which is based on the Russian typewriter keyboard, in which the phonetic values of the Cyrillic letters have no relation to the Latin letters found on the same keys in the familiar QWERTY keyboard arrangement. Either of these keyboards may be downloaded and installed by following the suggestions in the second paragraph above.

14 April 2005

Added a Wa Dictionary Project Staff Roster, so as to keep track of who's who among our various collaborators (password required).

13 April 2005

There are now HTML versions of the following major reference works on the Wa language in our corpus of Wa texts:

  • Loux gāb Vax (佤语熟语汇释), by Wang Jingliu 王敬骝 et al.
  • Wayu jianzhi 佤语简志, by Zhou Zhizhi 周植志 and Yan Qixiang 颜其香
  • Wayu yufa 佤语语法, by Zhao Yanshe 赵岩社 and Zhao Fuhe 赵福和

For links and caveats (namely, these have zillions of typos), see the section on secondary works at the bottom of the List of texts in the Wa Corpus.

5 April 2005

Updated Working Documents page. Several updated tables and lists (noted on the Resources page) and some new documents, including a very rough draft digital version of Wang Jingliu's magnum opus Loux gāb Vax.

31 March 2005

Added some new orthography-related documents:

21 February 2005

Dictionary search/edit functionality has been enhanced. New features include:

    For all searches (both public and restricted staff searches)

  • ability to display results in parallel for both the orthographies used in the database. The default is to display only one orthography, based on an assumed preference indicated by whichever orthography was last used in searching the main headword field. Sometimes for searches of other fields only the x-wa-CN orthography is displayed as an expedient, or entries are displayed in x-wa-CN order.
  • ability to perform a dictionary search from a URL with a "query string" (e.g., sample entry)
  • ability to search the "Unified Wa" orthography field
  • searches of the collocation/example fields have been restored, after instituting an improved SQL command string
    For staff searches:

  • the ability to search the ID field directly and to display the ID on the search results page
  • the ability to retrieve records which have been edited within a specified interval of days before or after a specified date.
  • the maximum number of definitions per entry allowed in the database has been increased from 9 to 99; similarly the maximum number of examples per definition has been increased from 9 to 99.
  • if an entry in the database includes a URL link to an illustrative picture, the picture will appear at the bottom of the main editing page (a link just added will need to be saved first, before the picture can be viewed on the edit page).
  • a search of the "Updated By" field will list the records retrieved in ascending order by date.
  • a dropdown box has been added to the Etymology field on the edit page, to make it easier to insert references to other languages.
  • a limited ability to send e-mail to another staff member has been added to the Update page (still being worked on).
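
As an illustration of the staff search for records edited within a specified interval of days before or after a date, here is a minimal Python sketch. The record layout (a dict with id and updated keys) and the sample values are hypothetical, and the sign convention (negative days for "before", positive for "after") is an assumption made for this example rather than a description of the actual search form.

```python
from datetime import date, timedelta


def edit_window(anchor, days):
    """Return (start, end) dates; days < 0 looks back, days > 0 looks forward."""
    other = anchor + timedelta(days=days)
    return (min(anchor, other), max(anchor, other))


def edited_within(records, anchor, days):
    """Keep the records whose 'updated' date falls inside the window, inclusive."""
    start, end = edit_window(anchor, days)
    return [r for r in records if start <= r["updated"] <= end]


# Hypothetical records; field names and values are invented for the example.
records = [
    {"id": 1, "updated": date(2005, 2, 10)},
    {"id": 2, "updated": date(2005, 2, 19)},
    {"id": 3, "updated": date(2005, 2, 25)},
]
recent = edited_within(records, date(2005, 2, 21), -7)
```

In this sample only record 2 falls within the seven days up to 21 February 2005.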

17 February 2005

We now have a web page related to the new bilingual Chinese-Wa language text Lāi Loux (actual publication date 2003), for use in primary education in the Wa-speaking areas of Yunnan.

16 February 2005

You can now sample a track from the Lai Rhax Vax Wa Hymns CD.

15 February 2005

Uploaded experimental data to the Unified Wa orthography field in the database. To see what Unified Wa, or Lái Vax Róub Róum, is all about, see our new Unified Wa Orthography page.

14 February 2005

Significantly updated the Young family page, complete with illustrations! For another wildlife snapshot, search the dictionary database for 'giah / kiah' and click on the illustration. (Not sure if the ID as a serow is correct though.)

26 January 2005

After several months of extensive travelling, we are now back in the office and updating things, including many of the pages on this website. These include some new and updated tables and data files, new maps, new texts added to the Wa corpus, and of course daily additions and modifications to the Wa dictionary database itself.

2 December 2004

The Internet-based on-line dictionary database now is capable of displaying examples in the same orthography as the user has selected during the initial dictionary search. Multiple entries are displayed in the alphabetical order of the selected orthography.

22 November 2004

The Internet-based on-line dictionary database has been significantly expanded. There are currently about 10,500 entries, with definitions in Chinese for approximately 90% of entries, in English for approximately 30% of entries, and in Burmese for approximately 3% of entries. Search functions and rendering have been upgraded and enhanced.

4 October 2004

The XML framework for the Waic Etymological Lexicon portion of Diffloth 1980 (see 27 September 2004 entry) has progressed to the point that we've added another couple of links on the Working Documents page to the data formatted with various technologies. The most successful uses Cascading Style Sheets level 2 (CSS2), with browsers which support it, such as recent versions of Mozilla, Netscape, or Opera, or Safari on the Mac.

1 October 2004

We have added the ability to view our ad hoc working metadata in the files which constitute the expanding Wa corpus.

27 September 2004

There is now a link to an XML framework for the Waic Etymological Lexicon portion of Diffloth 1980 (see 16 September 2004 entry) on the Working Documents page (password required). There is no XSLT stylesheet for it yet, but it can be viewed using the built-in stylesheets in current versions of Internet Explorer, Mozilla, and Netscape (the latter two make an effort to display all the exotic characters, but since these are still inaccurate, it doesn't matter much).

22 September 2004

Added to the Wa corpus the lyrics from two audio music tapes (one of which was accompanied by a karaoke VCD) of the popular Wa singer and impresario Ai Yawn Rai. Also broke up the very large single document file for the New Testament of the Bible into separate files, each containing one of the constituent books, in order to promote faster download time and more fine-grained searching.

20 September 2004

Added several texts in Revised Bible orthography (x-wa-MM) to the Wa corpus. Actually these are two different versions of the same textbook, with some overlap, accompanied by considerable variation in spelling of the same words. One version was there already as a plain-text file, but it has now been replaced by three XHTML files, with some text markup and with metadata added.

There are two main studies of Wa grammar, Wayu yufa 佤语语法 and Wayu jianzhi 佤语简志, (both in Chinese with Wa example sentences) and we now have OCRed "dirty" copies of each of these. The plan is to edit/retype the Wa example sentences, so that we at least have clean copies of those, along with their surrounding Chinese descriptive text. Copies of the MS Word documents with the OCRed text are available to collaborators for download in the working documents folder, with links to both the OCRed text and the TIFF image files on the Working Documents page (password required).

17 September 2004

Did more fine-tuning to the SOASMyanmar transitional Unicode font and the SOASShan keyboard for use with the [no longer] free Keyman utility.

The draft of some speculation on how to handle the Shan alphabet in Unicode is now also perhaps ready for public exposure. Each of the three treatments considered would allow for Shan plain text which looks reasonably like Shan; in addition, Shan and Burmese could be mixed in the same plain-text document, with each one looking correct in their respective differing contextual combinations. The approach taken with the SOASShan keyboard is more or less the middle-of-the-road "conservative" option.

Scanned portions of some fairly rare documents to image PDFs as a reference for people who are interested in Shan but who wouldn't be likely to have access to the docs themselves.

Updated various documents and programs on the Working Documents page (password required).

16 September 2004

Began the process of turning Gérard Diffloth's seminal 1980 work, The Wa Languages into a modern searchable, Unicode (eventually XML) document. The English text and the framework of abbreviations for sources is now fairly accurate, but the OCRed data itself (much of it handwritten IPA) is a mess, and it will be a long process cleaning it up. The document is available from the Working Documents page (password required).

15 September 2004

Did more fine-tuning to Wa orthography conversion tables for use with Wa Orthography Converter.

Updated the Writing of the Wa Language web page to more closely reflect our evolving practices in Wa lexicography. Note: as we add more vernacular text to our website pages in Wa, Chinese, Burmese, or International Phonetic Alphabet, it is useful to call your attention to the fact that the free open-source browser Mozilla (or its commercial counterpart Netscape) is usually superior to Microsoft Internet Explorer in displaying the characters of all these languages properly.

14 September 2004

Added this "What's New" page to website.

13 September 2004

The on-line dictionary search pages now have point-and-click buttons for inserting vowel+macron combinations in the search textbox.

There is also a separate dedicated page for doing the same thing with both vowel+macron characters and IPA characters used with Wa. With this page, you need to copy the text to the clipboard and paste it into the application where you want the characters to appear. Let us know if there are other characters which would be useful to have here.

10 September 2004

Added a link on the bottom of the Wa Corpus page to a list of the small number of reasonably clean Wa texts in our small but growing corpus of digital Wa texts.

8 September 2004

The SOASMyanmar transitional Unicode font for Burmese now also has characters needed for Shan, and there is also a new SOASShan keyboard for use with the [no longer] free Keyman utility.

5 September 2004

Added password protection to the on-line dictionary search-and-edit pages for staff use. Userid and password are the same as for access to the private section of the Mercury website (e.g., working documents page).

3 September 2004

Used the improved Burmese Encoding Converter to convert several Burmese files to SOASMyanmar Unicode-compatible encoding, including converting the Book of Matthew in the New Testament (Judson translation) from Win Researcher font encoding.

1 September 2004

You can now search the SOAS Wa Dictionary project website. We're using the Texis Webinator [ => Google] search engine, and a lot of problems remain with our implementation and configuration.

You can also search a small starter set of the Wa text corpus.

30 August 2004

Improvements to Burmese Encoding Converter, which now supports the conversion of several more encodings.

26 August 2004

Added a section on Fonts and Displaying Burmese Characters to the Wa Resources web page.

20 August 2004

All books with substantial amounts of Wa text currently in our library have now been scanned and stored in digital image format, mostly multipage compressed TIFFs, with some PDFs. Copies are available to collaborators by FTP or on CD-R.



The Wa Dictionary Project is funded by the Arts and Humanities Research Council and hosted at the School of Oriental and African Studies, University of London.

Please send suggestions, queries or comments to Justin Watkins or Richard Kunst.