# Jan Labanowski, jkl@osc.edu, Dec. 30, 1992
# File koi7_8.rus
#
# This is a transliteration data file for converting from KOI7 to
# KOI8 (RELCOM-GOST 19768-74). The KOI7 character codes for Russian letters
# overlap with Latin letters. To mark what is in Russian and what is in
# English, the SHIFT-OUT and SHIFT-IN characters are used to switch between
# the Latin and Russian character sets. SHIFT-OUT switches to Russian
# letters. If a sequence of Russian characters does not start with the
# SHIFT-OUT character, it will be treated as English text!
# The SHIFT-OUT character is CTRL-N (\14 = \0x0E = \0o16).
# The SHIFT-IN character is CTRL-O (\15 = \0x0F = \0o17). It switches back to
# Latin characters.
# If the SHIFT-OUT character is not present, the whole file is assumed to be
# written in the Latin alphabet.
#
# On the practical side: KOI7 characters are frequently obtained
# from the KOI8 character set as a result of transmission through the network.
# In most cases, electronic mail strips the 8th bit of the KOI8 character
# set, changing it to KOI7. The problem is that in this case there are no
# SHIFT-IN and SHIFT-OUT codes to signal which characters are Latin and
# which are Russian. In this case, by using an editor, YOU HAVE TO ENCLOSE
# RUSSIAN TEXT inside the appropriate SHIFT-OUT and SHIFT-IN sequence.
# To obtain a true KOI7 file, these should be CTRL-N and CTRL-O, respectively.
# However, it is sometimes difficult to enter these control characters within
# an editor. In this case, you may use your own characters, but they should
# not appear elsewhere in the text. Unfortunately, most "good" characters
# are taken. I think that it is better to use a two-character sequence in
# this situation. You are free to use your own, but I would suggest that you
# stick with {{ (as SHIFT-OUT) and }} (as SHIFT-IN); they correspond to
# sh-sh and shch-shch in KOI7. Of course, if they appear in your KOI7
# text, you need to use something else.
# To use these characters, you need to make the
# following changes in the section of this file where the input SHIFT-OUT/IN
# sequences are read in:
#  1) uncomment (i.e., delete the # character in the first column):
#     # "{{"      ""   ""   ""   "}}"      ""   # Russian letters {{...}}
#  2) comment out (i.e., put # in the first column):
#     "\0x0E"   ""   ""   ""   "\0x0F"   ""   # Russian letters ^N...^O
#
# To be used with translit.c program by Jan Labanowski
#
1         file version number
#
" "       # string delimiters
[ ]       # list delimiters
{ }       # regular expression delimiters
#
#starting sequence
""
#ending sequence
""
#
2  # number of input SHIFT sequences, only one set of input characters
""        ""   ""   ""   ""        ""   # Latin characters
"\0x0E"   ""   ""   ""   "\0x0F"   ""   # Russian letters ^N...^O
# "{{"      ""   ""   ""   "}}"      ""   # Russian letters {{...}}
#
0  # number of output SHIFT sequences, only one set of output characters
#
# conversion table
#
# ASCII characters (set 1) no-change
# Russian letters have to be entered explicitly (we could use a simple
# range like:
#   2   [#$"@-~]   0   [\0xA3\0xB3\0xFF\0xC0-\0xFE]
# but then we would miss the nice table for the humans).
# inp_set_numb   inp_seq   out_set_numb   out_seq
1   [\0x21-\0x7F]   0   [\0x21-\0x7F]   #Pass ASCII, no change
#
# Leave the " as a quote if it is at the beginning or end of the word
# Change it to hard sign only if it is inside the word
2   {([][\0x7d{A-Za-z\|~@'/])\0x22([][\0x7d{A-Za-z\|~@'/])}   -1   {\1\0xFF\2}   # hard sign
#
2   "#"   0   "\0xA3"   #small yo
2   "$"   0   "\0xB3"   #capital YO
2   "a"   0   "\0xE1"   #capital A
2   "b"   0   "\0xE2"   #capital Be
2   "w"   0   "\0xF7"   #capital Ve
2   "g"   0   "\0xE7"   #capital Ghe
2   "d"   0   "\0xE4"   #capital De
2   "e"   0   "\0xE5"   #capital Ie
2   "v"   0   "\0xF6"   #capital Zhe
2   "z"   0   "\0xFA"   #capital Ze
2   "i"   0   "\0xE9"   #capital I
2   "j"   0   "\0xEA"   #capital short I
2   "k"   0   "\0xEB"   #capital Ka
2   "l"   0   "\0xEC"   #capital El
2   "m"   0   "\0xED"   #capital Em
2   "n"   0   "\0xEE"   #capital En
2   "o"   0   "\0xEF"   #capital O
2   "p"   0   "\0xF0"   #capital Pe
2   "r"   0   "\0xF2"   #capital Er
2   "s"   0   "\0xF3"   #capital Es
2   "t"   0   "\0xF4"   #capital Te
2   "u"   0   "\0xF5"   #capital U
2   "f"   0   "\0xE6"   #capital Ef
2   "h"   0   "\0xE8"   #capital Kha
2   "c"   0   "\0xE3"   #capital Tse
2   "~"   0   "\0xFE"   #capital Che
2   "{"   0   "\0xFB"   #capital Sha
2   "}"   0   "\0xFD"   #capital Shcha
2   "y"   0   "\0xF9"   #capital Y (Iery)
2   "x"   0   "\0xF8"   #capit soft sign(Ierik)
2   "|"   0   "\0xFC"   #capit reverse rounded E
2   "`"   0   "\0xE0"   #capital Yu
2   "q"   0   "\0xF1"   #capital Ya
2   "A"   0   "\0xC1"   #small a
2   "B"   0   "\0xC2"   #small be
2   "W"   0   "\0xD7"   #small ve
2   "G"   0   "\0xC7"   #small ghe
2   "D"   0   "\0xC4"   #small de
2   "E"   0   "\0xC5"   #small ie
2   "V"   0   "\0xD6"   #small zhe
2   "Z"   0   "\0xDA"   #small z
2   "I"   0   "\0xC9"   #small i
2   "J"   0   "\0xCA"   #small short i
2   "K"   0   "\0xCB"   #small ka
2   "L"   0   "\0xCC"   #small el
2   "M"   0   "\0xCD"   #small em
2   "N"   0   "\0xCE"   #small en
2   "O"   0   "\0xCF"   #small o
2   "P"   0   "\0xD0"   #small pe
2   "R"   0   "\0xD2"   #small er
2   "S"   0   "\0xD3"   #small es
2   "T"   0   "\0xD4"   #small te
2   "U"   0   "\0xD5"   #small u
2   "F"   0   "\0xC6"   #small ef
2   "H"   0   "\0xC8"   #small kha
2   "C"   0   "\0xC3"   #small tse
2   "^"   0   "\0xDE"   #small che
2   "["   0   "\0xDB"   #small sha
2   "]"   0   "\0xDD"   #small shcha
2   "_"   0   "\0xDF"   #small hard sign (ier)
2   "Y"   0   "\0xD9"   #small y (iery)
2   "X"   0   "\0xD8"   #small soft sign (ierik)
2   "\"   0   "\0xDC"   #small rev rounded e
2   "@"   0   "\0xC0"   #small yu
2   "Q"   0   "\0xD1"   #small ya