Unspaced Writing Systems
Many of the world’s writing systems do not use spaces to delineate words. For the purposes of this paper we will focus on two representative instances: Thai and Chinese. Thai is of interest because it employs an alphabetic writing system. Chinese, on the other hand, uses a hybrid logographic/syllabic system.
Thai is a tonal language written in an alphabetic script with several distinctive orthographic features. It is a monosyllabic, monomorphemic language: written words average about three to four characters, although word compounding is pervasive. Text runs from left to right, and all consonants and some vowel characters are written horizontally on the main line. Other vowel characters, however, are written vertically, above or below the initial consonant character. Tone markers are placed above the initial consonant character, and some diacritics and special symbols are placed above the final consonant character. As a result, Thai is considered to have a complex orthography, albeit one with a high degree of grapheme-to-phoneme correspondence. Thai is written without spaces between words, but spaces are sometimes used as a form of punctuation to mark the ends of phrases, clauses, or sentences, and occasionally for dramatic emphasis.
Compound words can be a source of word segmentation problems in Thai (Aroonmanakun 2002; 2007). Correct segmentation depends strongly on sentential context, since many Thai character sequences have several possible readings. For example, ตากลม can be read as ตาก ลม [ta:k lom] (“exposed to wind”) or ตา กลม [ta: klom] (“round eyes”). This raises the question of how Thai readers deal with this type of inherent ambiguity. Interestingly, Kohsom and Gobet (1997) found that adding spaces between words in Thai text significantly reduced overall reading time. There is also anecdotal evidence that newsreaders and public speakers manually segment their text before reading.
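The segmentation ambiguity described above can be made concrete with a small sketch. The Python snippet below enumerates every way of splitting an unspaced string into words drawn from a lexicon, using the example transcribed above as [ta:k lom] versus [ta: klom]. The four-word lexicon and the function name `segmentations` are assumptions made purely for illustration; a realistic Thai segmenter would need a full dictionary and would have to use sentential context to choose among the candidates.

```python
from functools import lru_cache

# Toy lexicon: an assumption for illustration, not a real Thai dictionary.
LEXICON = frozenset({"ตา", "ตาก", "กลม", "ลม"})


def segmentations(text, lexicon=LEXICON):
    """Return every way to split `text` into a sequence of lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string yields one (empty) segmentation.
        if start == len(text):
            return [[]]
        results = []
        # Try every prefix that starts at `start` and is a known word.
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


if __name__ == "__main__":
    # Print each candidate segmentation with spaces inserted between words.
    for words in segmentations("ตากลม"):
        print(" ".join(words))
```

With this toy lexicon the function finds exactly two candidate segmentations, corresponding to the “round eyes” and “exposed to wind” readings. A dictionary alone cannot decide between them, which is why the choice has to come from context.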