|
Afar | aa |
|
Afar (Djibouti)
| aa-DJ |
|
Afar (Dominica)
| aa-DM |
|
Afar (Eritrea)
| aa-ER |
|
Afar (Ethiopia)
| aa-ET |
Metadata |
---|
Tokenization: | c-441 |
|
|
Abkhazian | ab |
Metadata |
---|
Tokenization: | L-1 | Punctuation: | –‐ | Letter: | ЏАБВГДЕЖЗИКЛМНОПРСТУФХЦЧШЫЬабвгдежзиклмнопрстуфхцчшыьџҔҕҚқҞҟҦҧҨҩҬҭҲҳҴҵҶҷҼҽҾҿӘәӠӡӶӷ |
|
|
Achinese | ace |
Metadata |
---|
Punctuation: | ‐“” | Letter: | ÈÉËÔÖèéëôö | Mark: | ̀́̂̈ |
|
Achinese {Arab} (Indonesia)
| ace-Arab-ID |
|
Achinese {Latn} (Indonesia)
| ace-Latn-ID |
|
|
Acoli | ach |
Metadata |
---|
Tokenization: | L-349 |
|
Acoli (Uganda)
| ach-UG |
Metadata |
---|
Tokenization: | c-440 |
|
|
Mesopotamian Arabic | acm |
Metadata |
---|
Tokenization: | L-730 |
|
|
Achuar-Shiwiar | acu |
Metadata |
---|
Punctuation: | ¿ | Letter: | úáÚÁ | Mark: | ́ |
|
|
Adangme | ada |
Metadata |
---|
Letter: | íÍƆƐɔɛ | Mark: | ́ |
|
|
Adyghe, Adygei | ady |
Metadata |
---|
Letter: | ЁАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёӏӀ | Mark: | ̆̈ |
|
|
Avestan | ae |
Metadata |
---|
Tokenization: | L-31 |
|
|
Afrikaans | af |
Metadata |
---|
Tokenization: | L-3 | Punctuation: | §‐–—…‘’“”†‡′″‰ | Letter: | áâéèêëîïôöûÁÂÉÈÊËÎÏÔÖÛ | Mark: | ́̂̀̈ |
|
Afrikaans (Namibia)
| af-NA |
|
Afrikaans (South Africa)
| af-ZA |
|
|
Aghem | agq |
Metadata |
---|
Punctuation: | ‰ | Letter: | àâèêìîòôùûÀÂÈÊÌÎÒÔÙÛǎǐǒǔǍƐǏƗǑƆǓɄāěēīŋōūĀĚĒĪŊŌŪɛɨɔʉʔ | Mark: | ̀̂̌̄ |
|
Aghem (Cameroon)
| agq-CM |
|
|
Aguaruna | agr |
Metadata |
---|
Punctuation: | ¡¿‐ | Letter: | áíÁÍ | Mark: | ́ |
|
|
Assyrian Neo-Aramaic | aii |
Metadata |
---|
Punctuation: | ،܆܇؛.؟ | Letter: | ܐܝܘܦܒܬܛܕܟܓܩܣܨܙܫܚܥܗܡܢܪܠ | Mark: | ܼܹܸ̰̮݂̱݈̣ܿܲܵ݁̃̄݇̈̇݀ |
|
|
Aja (Benin) | ajg |
Metadata |
---|
Letter: | úóòùàèéìíõáÚÓÒÙÀÈÉÌÍÕÁƆƉƐƷŋŊɔɖɛʒ | Mark: | ̀́̃ |
|
|
Akan | ak |
Metadata |
---|
Tokenization: | L-5 | Punctuation: | ‰ | Letter: | ɛɔƐƆ |
|
Akan (Ghana)
| ak-GH |
Metadata |
---|
Tokenization: | c-442 |
|
|
Tosk Albanian | als |
Metadata |
---|
Punctuation: | «»§‐–—…‘’“”′″‰ | Letter: | çëÇË | Mark: | ̧̈ |
|
|
Southern Altai | alt |
Metadata |
---|
Punctuation: | ‐ | Letter: | кижнҥтапэрешдлцязыгьйсмбјчӱоуӧвщюъфхКИЖНҤТАПЭРЕШДЛЦЯЗЫГЬЙСМБЈЧӰОУӦВЩЮЪФХ |
|
|
Amharic | am |
Metadata |
---|
Tokenization: | L-8 | Punctuation: | ፡፣፤፥፦።‐–‹›«» | Letter: | ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖሗመሙሚማሜምሞሟሠሡሢሣሤሥሦሧረሩሪራሬርሮሯሰሱሲሳሴስሶሷሸሹሺሻሼሽሾሿቀቁቂቃቄቅቆቈቊቋቌቍበቡቢባቤብቦቧቨቩቪቫቬቭቮቯተቱቲታቴትቶቷቸቹቺቻቼችቾቿኀኁኂኃኄኅኆኈኊኋኌኍነኑኒናኔንኖኗኘኙኚኛኜኝኞኟአኡኢኣኤእኦኧከኩኪካኬክኮኰኲኳኴኵኸኹኺኻኼኽኾወዉዊዋዌውዎዐዑዒዓዔዕዖዘዙዚዛዜዝዞዟዠዡዢዣዤዥዦዧየዩዪያዬይዮደዱዲዳዴድዶዷጀጁጂጃጄጅጆጇገጉጊጋጌግጎጐጒጓጔጕጠጡጢጣጤጥጦጧጨጩጪጫጬጭጮጯጰጱጲጳጴጵጶጷጸጹጺጻጼጽጾጿፀፁፂፃፄፅፆፈፉፊፋፌፍፎፏፐፑፒፓፔፕፖፗ | Number: | ፩፪፫፬፭፮፯፰፱፲፳፴ |
|
Amharic (Ethiopia)
| am-ET |
Metadata |
---|
Tokenization: | c-443 |
|
|
Amahuaca | amc |
|
|
Yanesha' | ame |
Metadata |
---|
Letter: | ñëóíãõáÑËÓÍÃÕÁẽẼʼ | Mark: | ̃̈́ |
|
|
Amis | ami |
|
|
Amarakaeri | amr |
Metadata |
---|
Punctuation: | ¿’ | Mark: | ̱ |
|
|
Aragonese | an |
Metadata |
---|
Tokenization: | L-26 |
|
Aragonese (Spain)
| an-ES |
Metadata |
---|
Tokenization: | c-444 |
|
|
Sudanese Arabic | apd |
Metadata |
---|
Tokenization: | L-719 |
|
Sudanese Arabic (Sudan)
| apd-SD |
Metadata |
---|
Tokenization: | c-749 |
|
Sudanese Arabic {Latn} (Sudan)
| apd-Latn-SD |
Metadata |
---|
Tokenization: | c-709 |
|
|
Arabic | ar |
|
Arabic {Latn}
| ar-Latn |
Metadata |
---|
Tokenization: | L-753 |
|
Arabic (United Arab Emirates)
| ar-AE |
Metadata |
---|
Tokenization: | c-10 |
|
Arabic {Arab} (United Arab Emirates)
| ar-Arab-AE |
|
Arabic (Bahrain)
| ar-BH |
Metadata |
---|
Tokenization: | c-11 |
|
Arabic (Algeria)
| ar-DZ |
Metadata |
---|
Tokenization: | c-12 |
|
Arabic (Egypt)
| ar-EG |
Metadata |
---|
Tokenization: | c-13 |
|
Arabic (Western Sahara)
| ar-EH |
Metadata |
---|
Tokenization: | c-710 |
|
Arabic (Israel)
| ar-IL |
Metadata |
---|
Tokenization: | c-648 |
|
Arabic {Arab} (International)
| ar-Arab-INT |
|
Arabic {Latn} (International)
| ar-Latn-INT |
|
Arabic (Iraq)
| ar-IQ |
Metadata |
---|
Tokenization: | c-14 |
|
Arabic (Jordan)
| ar-JO |
Metadata |
---|
Tokenization: | c-15 |
|
Arabic (Comoros)
| ar-KM |
|
Arabic (Kuwait)
| ar-KW |
Metadata |
---|
Tokenization: | c-16 |
|
Arabic (Lebanon)
| ar-LB |
Metadata |
---|
Tokenization: | c-17 |
|
Arabic (Levant)
| ar-LEV |
Metadata |
---|
Tokenization: | c-778 |
|
Arabic (Libya)
| ar-LY |
Metadata |
---|
Tokenization: | c-18 |
|
Arabic (Morocco)
| ar-MA |
Metadata |
---|
Tokenization: | c-19 |
|
Arabic (Maghreb)
| ar-MGB |
Metadata |
---|
Tokenization: | c-779 |
|
Arabic (Mauritania)
| ar-MR |
Metadata |
---|
Tokenization: | c-711 |
|
Arabic (Oman)
| ar-OM |
Metadata |
---|
Tokenization: | c-20 |
|
Arabic (Palestine)
| ar-PS |
Metadata |
---|
Tokenization: | c-712 |
|
Arabic (Qatar)
| ar-QA |
Metadata |
---|
Tokenization: | c-21 |
|
Arabic (Saudi Arabia)
| ar-SA |
Metadata |
---|
Tokenization: | c-22 |
|
Arabic (Sudan)
| ar-SD |
Metadata |
---|
Tokenization: | c-649 |
|
Arabic (Somalia)
| ar-SO |
|
Arabic (South Sudan)
| ar-SS |
|
Arabic (Syria)
| ar-SY |
Metadata |
---|
Tokenization: | c-23 |
|
Arabic (Chad)
| ar-TD |
Metadata |
---|
Tokenization: | c-713 |
|
Arabic (Tunisia)
| ar-TN |
Metadata |
---|
Tokenization: | c-24 |
|
Arabic (Yemen)
| ar-YE |
Metadata |
---|
Tokenization: | c-25 |
|
|
Standard Arabic | arb |
Metadata |
---|
Tokenization: | L-599 | Punctuation: | ؉،؛؟٪٫٬‐–—…‰«» | Letter: | ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي | Mark: | ًٌٍَُِّْٰٕٓٔ | Number: | ١٢٣٤٥٦٧٨٩ |
|
|
Arabela | arl |
Metadata |
---|
Punctuation: | ¿ | Letter: | úÚ | Mark: | ́ |
|
|
Mapudungun, Mapuche | arn |
Metadata |
---|
Tokenization: | L-400 | Letter: | ñáíóÑÁÍÓ | Mark: | ̃́ |
|
Mapudungun, Mapuche (Chile)
| arn-CL |
|
|
Najdi Arabic | ars |
Metadata |
---|
Tokenization: | L-755 |
|
Najdi Arabic (Saudi Arabia)
| ars-SA |
Metadata |
---|
Tokenization: | c-828 |
|
|
Assamese | as |
Metadata |
---|
Tokenization: | L-29 | Punctuation: | ‰ | Letter: | অআইঈউঊঋএঐওঔকখগঘঙচছজঝঞটঠডঢণতথদধনপফবভমযৰলৱশষসহ | Mark: | ়ংঁঃ্ািীুূৃেৈোৌৗ | Number: | ১২৩৪৫৬৭৮৯ |
|
Assamese (India)
| as-IN |
Metadata |
---|
Tokenization: | c-445 |
|
|
Asu (Tanzania) | asa |
|
Asu (Tanzania) (Tanzania, United Republic of)
| asa-TZ |
|
|
American Sign Language | ase |
Metadata |
---|
Tokenization: | L-582 |
|
|
Asturian, Asturleonese, Bable, Leonese | ast |
Metadata |
---|
Tokenization: | L-352 | Punctuation: | ¡¿«»§‐–—…‘’“”†‡′″‰ | Letter: | áéíñóúüÁÉÍÑÓÚÜḥḷḤḶ | Mark: | ̣́̃̈ |
|
Asturian, Asturleonese, Bable, Leonese (Spain)
| ast-ES |
Metadata |
---|
Tokenization: | c-446 |
|
|
Waorani | auc |
Metadata |
---|
Letter: | ñíéóÑÍÉÓ | Mark: | ̃́ |
|
|
Avaric | av |
Metadata |
---|
Tokenization: | L-30 |
|
Avaric (Russia)
| av-RU |
Metadata |
---|
Tokenization: | c-448 |
|
|
Aymara | ay |
Metadata |
---|
Tokenization: | L-32 |
|
Aymara (Bolivia)
| ay-BO |
Metadata |
---|
Tokenization: | c-449 |
|
|
North Mesopotamian Arabic | ayp |
Metadata |
---|
Tokenization: | L-731 |
|
|
Central Aymara | ayr |
Metadata |
---|
Letter: | ñïäíáëúÑÏÄÍÁËÚ | Mark: | ̃̈́ |
|
|
Azerbaijani | az |
Metadata |
---|
Tokenization: | L-33 |
|
Azerbaijani {Cyrl} (Azerbaijan)
| az-Cyrl-AZ |
Metadata |
---|
Tokenization: | c-35 |
|
Azerbaijani {Latn} (Azerbaijan)
| az-Latn-AZ |
Metadata |
---|
Tokenization: | c-34 |
|
|
South Azerbaijani | azb |
Metadata |
---|
Tokenization: | L-711 | Letter: | آؤئاتثجحخدذرزسشصضطظعغفقلمنهويٮپچژکگۆۇیەݣ | Mark: | َْٓٔ |
|
South Azerbaijani {Arab} (Iran)
| azb-Arab-IR |
Metadata |
---|
Tokenization: | c-450 |
|
|
North Azerbaijani | azj |
Metadata |
---|
Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | çöüÇÖÜƏğışĞŞİə | Mark: | ̧̇̆̈ |
|
North Azerbaijani {Cyrl}
| azj-Cyrl |
Metadata |
---|
Punctuation: | ‐–—…‘’“”†‡′″‰§ | Letter: | аәбвгғдежзийјкҝлмноөпрстуүфхһчҹшыАӘБВГҒДЕЖЗИЙЈКҜЛМНОӨПРСТУҮФХҺЧҸШЫ | Mark: | ̆ |
|
|
Bashkir | ba |
Metadata |
---|
Tokenization: | L-37 |
|
Bashkir (Russia)
| ba-RU |
Metadata |
---|
Tokenization: | c-454 |
|
|
Baluchi | bal |
Metadata |
---|
Tokenization: | L-355 |
|
Baluchi (Iran)
| bal-IR |
Metadata |
---|
Tokenization: | c-452 |
|
|
Balinese | ban |
Metadata |
---|
Tokenization: | L-354 |
|
Balinese {Bali}
| ban-Bali |
Metadata |
---|
Punctuation: | ᭞᭟᭚᭛᭜᭝᭠ | Letter: | ᬅᬆᬇᬈᬉᬊᬋᬌᬍᬎᬏᬐᬑᬒᬓᬔᬕᬖᬗᬘᬙᬚᬛᬜᬝᬞᬟᬠᬡᬢᬣᬤᬥᬦᬧᬨᬩᬪᬫᬬᬭᬮᬯᬰᬱᬲᬳ | Mark: | ᬂᬃᬄ᬴ᬵᬶᬷᬸᬹᬺᬻᬼᬽᬾᬿᭀᭁᭂᭃ᭄ | Number: | ᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙ |
|
Balinese (Indonesia)
| ban-ID |
Metadata |
---|
Tokenization: | c-451 |
|
|
Basaa | bas |
Metadata |
---|
Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛǎǐǹǒǔǍƁƐǏǸǑƆǓāěēīńŋōūĀĚĒĪŃŊŌŪɓɛɔ | Mark: | ᷆᷇́̀̂̌̄ |
|
Basaa (Cameroon)
| bas-CM |
|
|
Bamun | bax |
Metadata |
---|
Punctuation: | ‘’ | Letter: | úéêüûâôîáèùàÚÉÊÜÛÂÔÎÁÈÙÀṅṄ | Mark: | ́̂̈̀̇ |
|
Bamun {Bamu}
| bax-Bamu |
Metadata |
---|
Punctuation: | ꛲꛳꛴꛵꛶꛷ | Letter: | ꚠꚡꚢꚣꚤꚥꚦꚧꚨꚩꚪꚫꚬꚭꚮꚯꚰꚱꚲꚳꚴꚵꚶꚷꚸꚹꚺꚻꚼꚽꚾꚿꛀꛁꛂꛃꛄꛅꛆꛇꛈꛉꛊꛋꛌꛍꛎꛏꛐꛑꛒꛓꛔꛕꛖꛗꛘꛙꛚꛛꛜꛝꛞꛟꛠꛡꛢꛣꛤꛥꛦꛧꛨꛩꛪꛫꛬꛭꛮꛯ | Mark: | ꛰꛱ |
|
|
Baatonum | bba |
Metadata |
---|
Letter: | àéùèóÀÉÙÈÓǹƐƆǸɛɔ | Mark: | ̀́ |
|
|
Central Bikol | bcl |
|
|
Belarusian | be |
Metadata |
---|
Tokenization: | L-40 | Punctuation: | ‐«» | Letter: | абвгджзеёійклмнопрстуўфхцчшыьэюяиАБВГДЖЗЕЁІЙКЛМНОПРСТУЎФХЦЧШЫЬЭЮЯИʼ | Mark: | ̈̆ |
|
Belarusian (Belarus)
| be-BY |
Metadata |
---|
Tokenization: | c-41 |
|
|
Bemba (Zambia) | bem |
Metadata |
---|
Tokenization: | L-721 |
|
Bemba (Zambia) (Zambia)
| bem-ZM |
Metadata |
---|
Tokenization: | c-757 |
|
|
Berber languages | ber |
Metadata |
---|
Tokenization: | L-425 |
|
|
Bena (Tanzania) | bez |
|
|
Malba Birifor | bfo |
Metadata |
---|
Tokenization: | L-399 |
|
Malba Birifor (Burkina Faso)
| bfo-BF |
Metadata |
---|
Tokenization: | c-508 |
|
|
Bulgarian | bg |
Metadata |
---|
Tokenization: | L-48 | Punctuation: | ‐–—…‘‚“„″§ | Letter: | абвгдежзийклмнопрстуфхцчшщъьюяАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ | Mark: | ̆ |
|
Bulgarian (Bulgaria)
| bg-BG |
Metadata |
---|
Tokenization: | c-49 |
|
Bulgarian {Latn} (Bulgaria)
| bg-Latn-BG |
Metadata |
---|
Tokenization: | c-816 |
|
|
Haryanvi | bgc |
Metadata |
---|
Tokenization: | L-577 |
|
Haryanvi (India)
| bgc-IN |
Metadata |
---|
Tokenization: | c-578 |
|
|
Bihari languages | bh |
Metadata |
---|
Tokenization: | L-43 |
|
Bihari languages (India)
| bh-IN |
Metadata |
---|
Tokenization: | c-714 |
|
|
Bhojpuri | bho |
Metadata |
---|
Tokenization: | L-723 | Punctuation: | । | Letter: | मनवधकरखतसयषटउचबहलघणपगठदभअएआओथशजडइछऔफढईझऐञ | Mark: | ािंु्ेोी़ूौृै |
|
Bhojpuri {Deva} (India)
| bho-Deva-IN |
Metadata |
---|
Tokenization: | c-763 |
|
Bhojpuri {Deva} (Nepal)
| bho-Deva-NP |
Metadata |
---|
Tokenization: | c-764 |
|
|
Bislama | bi |
Metadata |
---|
Tokenization: | L-44 |
|
Bislama (Vanuatu)
| bi-VU |
Metadata |
---|
Tokenization: | c-456 |
|
|
Bikol | bik |
|
|
Bini, Edo | bin |
Metadata |
---|
Letter: | ÀÁÈÉÌÍÒÓÙÚàáèéìíòóùúẸẹỌọ | Mark: | ̣̀́ |
|
Bini, Edo (Nigeria)
| bin-NG |
|
|
Southern Birifor | biv |
Metadata |
---|
Tokenization: | L-421 |
|
Southern Birifor (Ghana)
| biv-GH |
Metadata |
---|
Tokenization: | c-538 |
|
|
Banjarese | bjn |
|
Banjarese {Arab} (Indonesia)
| bjn-Arab-ID |
|
Banjarese {Latn} (Indonesia)
| bjn-Latn-ID |
|
|
Buhid | bku |
Metadata |
---|
Punctuation: | ᜵᜶ | Letter: | ᝉᝆᝃᝊᝇᝄᝐᝑᝋᝈᝅᝏᝍᝎᝌᝀᝁᝂ | Mark: | ᝒᝓ |
|
Buhid {Buhd}
| bku-Buhd |
Metadata |
---|
Punctuation: | ᜵᜶ | Letter: | ᝉᝆᝃᝊᝇᝄᝐᝑᝋᝈᝅᝏᝍᝎᝌᝀᝁᝂ | Mark: | ᝒᝓ |
|
|
Tai Dam | blt |
Metadata |
---|
Letter: | ꪀꪁꪂꪃꪄꪅꪆꪇꪈꪉꪊꪋꪌꪍꪎꪏꪐꪑꪒꪓꪔꪕꪖꪗꪘꪙꪚꪛꪜꪝꪞꪟꪠꪡꪢꪣꪤꪥꪦꪧꪨꪩꪪꪫꪬꪭꪮꪯꪱꪵꪶꪹꪺꪻꪼꪽꫀꫂꫛꫜꫝ | Mark: | ꪴꪰꪲꪳꪷꪸꪾ꪿꫁ |
|
|
Bambara | bm |
Metadata |
---|
Tokenization: | L-36 | Punctuation: | ’ | Letter: | ƐƝƆŋŊɛɲɔ |
|
Bambara (Mali)
| bm-ML |
Metadata |
---|
Tokenization: | c-453 |
|
Bambara {Latn} (Mali)
| bm-Latn-ML |
|
|
Bengali | bn |
Metadata |
---|
Tokenization: | L-42 | Punctuation: | ।॥“”‘’ | Letter: | অআইঈউঊঋএঐওঔকষখগঘঙচছজঝঞটঠডঢণতৎথদধনপফবভমযরলশসহঽ | Mark: | ়ংঃঁ্ািীুূৃেৈোৌৗ | Number: | ১২৩৪৫৬৭৮৯০ |
|
Bengali (Bangladesh)
| bn-BD |
Metadata |
---|
Tokenization: | c-356 |
|
Bengali (India)
| bn-IN |
Metadata |
---|
Tokenization: | c-455 |
|
Bengali {Latn} (India)
| bn-Latn-IN |
Metadata |
---|
Tokenization: | c-821 |
|
|
Tibetan | bo |
Metadata |
---|
Tokenization: | L-69 | Punctuation: | ༄༅༈་༌།༎ | Letter: | ཀཁགངཅཆཇཉཊཋཌཎཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤཥསཧཨཪ | Mark: | ིེོུྐྑྒྔྕྖྗྙྚྛྜྞྟྠྡྣྤྥྦྨྩྪྫྭྮྯྰྱྲླྴྵྶྷྸྺྻྼ | Number: | ༡༢༣༤༥༦༧༨༩ |
|
Tibetan (China)
| bo-CN |
Metadata |
---|
Tokenization: | c-550 |
|
Tibetan (India)
| bo-IN |
|
|
Bora | boa |
Metadata |
---|
Letter: | úáéñíóÚÁÉÑÍÓɨȉƗȈ | Mark: | ́̃̏ |
|
|
Breton | br |
Metadata |
---|
Tokenization: | L-47 | Punctuation: | ’– | Letter: | êñùÊÑÙʼ | Mark: | ̂̃̀ |
|
Breton (France)
| br-FR |
Metadata |
---|
Tokenization: | c-457 |
|
|
Bodo (India) | brx |
Metadata |
---|
Tokenization: | L-726 | Letter: | अआइईउऊऍएऐऑओऔकखगघचछजझञटठडढणतथदधनपफबभमयरलळवशषसह | Mark: | ़ँंािीुूृॅेैॉोौ् |
|
Bodo (India) {Deva} (India)
| brx-Deva-IN |
Metadata |
---|
Tokenization: | c-773 |
|
|
Bosnian | bs |
Metadata |
---|
Tokenization: | L-45 | Punctuation: | ‐–—…‘’“”′″ | Letter: | čćžđšČĆŽĐŠ | Mark: | ̌́ |
|
Bosnian {Cyrl}
| bs-Cyrl |
Metadata |
---|
Punctuation: | ‐–—…‘’“”′″ | Letter: | абвгдђежзијклљмнњопрстћуфхцчџшАБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ |
|
Bosnian (Bosnia and Herzegovina)
| bs-BA |
|
Bosnian {Cyrl} (Bosnia and Herzegovina)
| bs-Cyrl-BA |
Metadata |
---|
Tokenization: | c-357 |
|
Bosnian {Latn} (Bosnia and Herzegovina)
| bs-Latn-BA |
Metadata |
---|
Tokenization: | c-358 |
|
|
Bassa | bsq |
Metadata |
---|
Letter: | ɓɔɖɛḾḿṸṹĒēĚěĨĩŃńŨũŪūƁƆƉƐǍǎǏǐǑǒǓǔǸǹÀÁÃÈÉÌÍÒÓÙÚàáãèéìíòóùú | Mark: | ̀́̃̄̌ |
|
Bassa {Bass}
| bsq-Bass |
Metadata |
---|
Punctuation: | 𖫵 | Letter: | 𖫐𖫑𖫒𖫓𖫔𖫕𖫖𖫗𖫘𖫙𖫚𖫛𖫜𖫝𖫞𖫟𖫠𖫡𖫢𖫣𖫤𖫥𖫦𖫧𖫨𖫩𖫪𖫫𖫬𖫭 | Mark: | 𖫰𖫱𖫲𖫳𖫴 |
|
|
Buriat | bua |
Metadata |
---|
Tokenization: | L-612 |
|
|
Bushi | buc |
Metadata |
---|
Punctuation: | ’ | Letter: | ìàãÌÀÃɓŋĩŊĨƁɗƊ | Mark: | ̀̃ |
|
|
Buginese | bug |
|
Buginese {Bugi}
| bug-Bugi |
Metadata |
---|
Punctuation: | ᨞᨟ | Letter: | ᨀᨁᨂᨄᨅᨆᨈᨉᨊᨌᨍᨎᨐᨑᨒᨓᨔᨖᨃᨏᨋᨇᨕ | Mark: | ᨘᨗᨙᨚᨛ |
|
|
Bulu (Cameroon) | bum |
Metadata |
---|
Letter: | óñôéáÓÑÔÉÁōńŌŃ | Mark: | ̄́̃̂ |
|
|
Bukusu, Lubukusu | bxk |
|
|
Bilin, Blin | byn |
|
Bilin, Blin (Eritrea)
| byn-ER |
|
|
Catalan, Valencian | ca |
Metadata |
---|
Tokenization: | L-51 | Punctuation: | ·¡¿«»§‐–—…‘’“”†‡′″ | Letter: | àçéèíïóòúüÀÇÉÈÍÏÓÒÚÜ | Mark: | ̧̀́̈ |
|
Catalan, Valencian (Andorra)
| ca-AD |
Metadata |
---|
Tokenization: | c-715 |
|
Catalan, Valencian (Spain)
| ca-ES |
Metadata |
---|
Tokenization: | c-52 |
|
Catalan, Valencian (Spain; Valenciana, Comunidad)
| ca-ES-VC |
|
Catalan, Valencian (France)
| ca-FR |
|
Catalan, Valencian (International; Valenciana, Comunidad)
| ca-INT-VC |
|
Catalan, Valencian (Italy)
| ca-IT |
|
|
Garifuna | cab |
Metadata |
---|
Letter: | üúñáéíèóÜÚÑÁÉÍÈÓ | Mark: | ̈́̃̀ |
|
|
Kaqchikel, Cakchiquel | cak |
Metadata |
---|
Letter: | äïöüÄÏÖÜ | Mark: | ̈ |
|
|
Carolinian | cal |
Metadata |
---|
Tokenization: | L-712 |
|
|
Chachi | cbi |
Metadata |
---|
Punctuation: | ¿¡ | Letter: | ñóúáíéÑÓÚÁÍÉ | Mark: | ̃́ |
|
|
Chavacano | cbk |
|
|
Cashibo-Cacataibo | cbr |
Metadata |
---|
Punctuation: | ¿ | Letter: | ñëúíáéóÑËÚÍÁÉÓ | Mark: | ́̃̈́ |
|
|
Cashinahua | cbs |
Metadata |
---|
Punctuation: | ¿ | Letter: | íÍ | Mark: | ́ |
|
|
Chayahuita | cbt |
Metadata |
---|
Punctuation: | ¿ | Letter: | ëóíËÓÍ | Mark: | ̈́ |
|
|
Candoshi-Shapra | cbu |
Metadata |
---|
Punctuation: | ¿¡ | Letter: | íáÍÁ | Mark: | ́ |
|
|
Chakma | ccp |
Metadata |
---|
Punctuation: | 𑅀𑅁𑅂𑅃‐–—…‘’“”†‡′″§ | Letter: | 𑄃𑄄𑄅𑄆𑄇𑄈𑄉𑄊𑄋𑄌𑄍𑄎𑄏𑄐𑄑𑄒𑄓𑄔𑄕𑄖𑄗𑄘𑄙𑄚𑄛𑄜𑄝𑄞𑄟𑄠𑄡𑄢𑄣𑄤𑄥𑄦 | Mark: | 𑄀𑄁𑄂𑄧𑄨𑄩𑄪𑄫𑄬𑄭𑄮𑄯𑄰𑄱𑄲𑄳𑄴 | Number: | ০১২৩৪৫৬৭৮৯𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿 |
|
|
Chechen | ce |
Metadata |
---|
Tokenization: | L-54 | Punctuation: | ‐–—…‘‚“„«»§ | Letter: | аьбвгӏдеёжзийкхлмнопрстуфцчшщъыэюяАЬБВГӀДЕЁЖЗИЙКХЛМНОПРСТУФЦЧШЩЪЫЭЮЯ | Mark: | ̈̆ |
|
Chechen (Russia)
| ce-RU |
|
Chechen {Cyrl} (Russia)
| ce-Cyrl-RU |
Metadata |
---|
Tokenization: | c-461 |
|
Chechen {Latn} (Russia)
| ce-Latn-RU |
Metadata |
---|
Tokenization: | c-462 |
|
|
Cebuano | ceb |
Metadata |
---|
Tokenization: | L-359 |
|
Cebuano (Philippines)
| ceb-PH |
Metadata |
---|
Tokenization: | c-459 |
|
Cebuano {Latn} (Philippines)
| ceb-Latn-PH |
|
|
Falam Chin | cfm |
|
Falam Chin (Myanmar)
| cfm-MM |
|
|
Chiga | cgg |
|
Chiga (Uganda)
| cgg-UG |
|
|
Chamorro | ch |
Metadata |
---|
Tokenization: | L-53 | Letter: | ÅÑåñ | Mark: | ̃̊ |
|
|
Ojitlán Chinantec | chj |
Metadata |
---|
Punctuation: | – | Letter: | öíäñáéúïüëóÖÍÄÑÁÉÚÏÜËÓ | Mark: | ̈́̃ |
|
|
Chuukese | chk |
Metadata |
---|
Tokenization: | L-613 |
|
|
Mari (Russia) | chm |
|
|
Cherokee | chr |
Metadata |
---|
Tokenization: | L-432 | Letter: | ᏸᏹᏺᏻᏼᎠᎡᎢᎣᎤᎥᎦᎧᎨᎩᎪᎫᎬᎭᎮᎯᎰᎱᎲᎳᎴᎵᎶᎷᎸᎹᎺᎻᎼᎽᎾᎿᏀᏁᏂᏃᏄᏅᏆᏇᏈᏉᏊᏋᏌᏍᏎᏏᏐᏑᏒᏓᏔᏕᏖᏗᏘᏙᏚᏛᏜᏝᏞᏟᏠᏡᏢᏣᏤᏥᏦᏧᏨᏩᏪᏫᏬᏭᏮᏯᏰᏱᏲᏳᏴꭰꭱꭲꭳꭴꭵꭶꭷꭸꭹꭺꭻꭼꭽꭾꭿꮀꮁꮂꮃꮄꮅꮆꮇꮈꮉꮊꮋꮌꮍꮎꮏꮐꮑꮒꮓꮔꮕꮖꮗꮘꮙꮚꮛꮜꮝꮞꮟꮠꮡꮢꮣꮤꮥꮦꮧꮨꮩꮪꮫꮬꮭꮮꮯꮰꮱꮲꮳꮴꮵꮶꮷꮸꮹꮺꮻꮼꮽꮾꮿ |
|
Cherokee (United States)
| chr-US |
Metadata |
---|
Tokenization: | c-463 |
|
|
Chickasaw | cic |
Metadata |
---|
Punctuation: | — | Letter: | óáíÓÁÍ | Mark: | ̱́ |
|
|
Cimbrian | cim |
Metadata |
---|
Tokenization: | L-732 |
|
Cimbrian (Italy)
| cim-IT |
Metadata |
---|
Tokenization: | c-780 |
|
|
Western Cham | cja |
Metadata |
---|
Tokenization: | L-745 |
|
Western Cham {Arab} (Cambodia)
| cja-Arab-KH |
Metadata |
---|
Tokenization: | c-797 |
|
Western Cham {Cham} (Cambodia)
| cja-Cham-KH |
Metadata |
---|
Tokenization: | c-798 |
|
|
Chokwe | cjk |
Metadata |
---|
Tokenization: | L-748 |
|
Chokwe (Angola)
| cjk-AO |
Metadata |
---|
Tokenization: | c-803 |
|
|
Shor | cjs |
Metadata |
---|
Letter: | кижнтолағыңудерцязчқшйъӱгьсмбюпӧэвфхКИЖНТОЛАҒЫҢУДЕРЦЯЗЧҚШЙЪӰГЬСМБЮПӦЭВФХЁЩщё | Mark: | ̆̈ |
|
|
Central Kurdish | ckb |
Metadata |
---|
Tokenization: | L-360 | Punctuation: | ٫٬٪؉ | Letter: | ئابپتجچحخدرزڕژسشعغفڤقکگلڵمنھەوۆیێي | Mark: | ٔ | Number: | ١٢٣٤٥٦٧٨٩ |
|
Central Kurdish {Latn}
| ckb-Latn |
Metadata |
---|
Letter: | şŞûîêçÛÎÊÇ | Mark: | ̧̂ |
|
Central Kurdish {Arab} (Iraq)
| ckb-Arab-IQ |
Metadata |
---|
Tokenization: | c-460 |
|
|
Chaldean Neo-Aramaic | cld |
Metadata |
---|
Tokenization: | L-614 |
|
|
Mandarin Chinese | cmn |
Metadata |
---|
Tokenization: | L-600 |
|
Mandarin Chinese (China)
| cmn-CN |
Metadata |
---|
Tokenization: | c-601 |
|
|
Hakha Chin, Haka Chin | cnh |
Metadata |
---|
Tokenization: | L-615 |
|
Hakha Chin, Haka Chin {Latn} (Myanmar)
| cnh-Latn-MM |
|
Hakha Chin, Haka Chin {Mymr} (Myanmar)
| cnh-Mymr-MM |
|
|
Asháninka | cni |
Metadata |
---|
Letter: | áéÁÉÑñ | Mark: | ́̃ |
|
|
Montenegrin | cnr |
Metadata |
---|
Tokenization: | L-405 |
|
Montenegrin (Montenegro)
| cnr-ME |
Metadata |
---|
Tokenization: | c-734 |
|
Montenegrin {Cyrl} (Montenegro)
| cnr-Cyrl-ME |
Metadata |
---|
Tokenization: | c-512 |
|
Montenegrin {Latn} (Montenegro)
| cnr-Latn-ME |
Metadata |
---|
Tokenization: | c-513 |
|
|
Corsican | co |
Metadata |
---|
Tokenization: | L-58 | Punctuation: | ’ | Letter: | àèìùòÀÈÌÙÒ | Mark: | ̀ |
|
Corsican (France)
| co-FR |
Metadata |
---|
Tokenization: | c-468 |
|
Corsican (Italy)
| co-IT |
|
|
Colorado | cof |
|
|
Caquinte | cot |
Metadata |
---|
Punctuation: | ¿ | Letter: | óÓ | Mark: | ́ |
|
|
Chinese Pidgin English | cpi |
Metadata |
---|
Tokenization: | L-724 |
|
|
Pichis Ashéninka | cpu |
Metadata |
---|
Letter: | ñáéÑÁÉ | Mark: | ̃́ |
|
|
Cree | cr |
Metadata |
---|
Tokenization: | L-59 |
|
|
Crimean Tatar, Crimean Turkish | crh |
|
Crimean Tatar, Crimean Turkish {Cyrl}
| crh-Cyrl |
Metadata |
---|
Tokenization: | L-547 |
|
Crimean Tatar, Crimean Turkish {Latn}
| crh-Latn |
Metadata |
---|
Tokenization: | L-548 |
|
|
Sãotomense | cri |
Metadata |
---|
Letter: | çóêéáâôºíÇÓÊÉÁÂÔÍ | Mark: | ̧́̂ |
|
|
Seselwa Creole French | crs |
Metadata |
---|
Tokenization: | L-419 | Punctuation: | ’ | Letter: | íÍ | Mark: | ́ |
|
Seselwa Creole French (Seychelles)
| crs-SC |
Metadata |
---|
Tokenization: | c-533 |
|
|
Czech | cs |
Metadata |
---|
Tokenization: | L-63 | Punctuation: | ‐–…‘‚“„§ | Letter: | áéíóúýÁÉÍÓÚÝčďěňřšťůžČĎĚŇŘŠŤŮŽ | Mark: | ́̌̊ |
|
Czech (Czech Republic)
| cs-CZ |
Metadata |
---|
Tokenization: | c-64 |
|
|
Chiltepec Chinantec | csa |
Metadata |
---|
Punctuation: | † | Letter: | öüïóáñäëéíúÖÜÏÓÁÑÄËÉÍÚ | Mark: | ̷̱̍̎̈́̃ |
|
|
Kashubian | csb |
Metadata |
---|
Tokenization: | L-386 |
|
Kashubian (Poland)
| csb-PL |
Metadata |
---|
Tokenization: | c-494 |
|
|
Swampy Cree | csw |
Metadata |
---|
Punctuation: | ᙮ | Letter: | ᐁᐢᐱᑕᑲᒥᐠᐊᑭᒋᐃᑗᐎᐣᓂᑯᓯᓇᐅᔑᒧᓀᐡᑐᑌᑎᐸᐗᐳᒪᒶᐌᔭᓄᑾᔦᒣᐤᓴᓶᔕᑴᐯᐟᑫᓱᓉᐺᑡᐨᔓᑺᓋᔗᔾᔀᑊᔡᒬᒼ |
|
|
Tedim Chin | ctd |
|
|
Old Slavonic | cu |
Metadata |
---|
Tokenization: | L-70 | Punctuation: | ꙾꙳–—‐ | Letter: | абвгдеєжѕзиіїйклмнѻоѡѽѿпрстуфхцчшщъыьѣюѧѫѯѱѳѵѷАБВГДЕЄЖЅЗИІЇЙКЛМНѺОѠѼѾПРСТУФХЦЧШЩЪЫЬѢЮѦѪѮѰѲѴѶꙿꙁꙍꙋꙗꙀꙌꙊꙖⸯ | Mark: | ҇҃ⷠⷡⷢⷣⷤⷥⷦⷧⷨⷩⷪⷬⷭⷯⷱⷴ꙽ |
|
Old Slavonic (Russia)
| cu-RU |
|
|
Chuvash | cv |
Metadata |
---|
Tokenization: | L-56 |
|
Chuvash (Russia)
| cv-RU |
Metadata |
---|
Tokenization: | c-466 |
|
|
Welsh | cy |
Metadata |
---|
Tokenization: | L-71 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | áàâäéèêëíìîïóòôöúùûüýÿÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÝŵŷŴŶŸẃẁẅỳẂẀẄỲ | Mark: | ́̀̂̈ |
|
Welsh (United Kingdom)
| cy-GB |
Metadata |
---|
Tokenization: | c-72 |
|
|
Danish | da |
Metadata |
---|
Tokenization: | L-65 | Punctuation: | §‐–…‘’“”†′″ | Letter: | æøåÆØÅ | Mark: | ̊ |
|
Danish (Denmark)
| da-DK |
Metadata |
---|
Tokenization: | c-66 |
|
Danish (Greenland)
| da-GL |
|
|
Dagbani | dag |
Metadata |
---|
Letter: | ƐƆƔƷŋŊɛɔɣʒ’ |
|
Dagbani (Ghana)
| dag-GH |
|
|
Taita, Dawida | dav |
|
Taita, Dawida (Kenya)
| dav-KE |
|
|
Dendi (Benin) | ddn |
Metadata |
---|
Letter: | ãâõÃÂÕǎǒƆƐǍƉǑŋŊɔɛɖ | Mark: | ̃̌̂ |
|
|
German | de |
Metadata |
---|
Tokenization: | L-73 | Punctuation: | «»§‐–—…‘‚“„ | Letter: | äößüÄÖÜ | Mark: | ̈ |
|
German (Austria)
| de-AT |
Metadata |
---|
Tokenization: | c-74 |
|
German (Belgium)
| de-BE |
Metadata |
---|
Tokenization: | c-379 |
|
German (Switzerland)
| de-CH |
Metadata |
---|
Tokenization: | c-75 |
|
German (Germany)
| de-DE |
Metadata |
---|
Tokenization: | c-76 |
|
German (Liechtenstein)
| de-LI |
Metadata |
---|
Tokenization: | c-77 |
|
German (Luxembourg)
| de-LU |
Metadata |
---|
Tokenization: | c-78 |
|
German (Netherlands)
| de-NL |
Metadata |
---|
Tokenization: | c-380 |
|
|
Southern Dagaare | dga |
Metadata |
---|
Letter: | ãÃƐƆũŨɛɔ | Mark: | ̃ |
|
|
Dogri (individual language) | dgo |
Metadata |
---|
Tokenization: | L-728 |
|
Dogri (individual language) {Deva} (India)
| dgo-Deva-IN |
Metadata |
---|
Tokenization: | c-776 |
|
|
Dimli (individual language) | diq |
Metadata |
---|
Tokenization: | L-569 |
|
Dimli (individual language) (Turkey)
| diq-TR |
Metadata |
---|
Tokenization: | c-570 |
|
|
Zarma | dje |
Metadata |
---|
Letter: | ãõÃÕƝŋšžŊŠŽẽẼɲ | Mark: | ̃̌ |
|
Zarma (Niger)
| dje-NE |
|
|
Dass | dot |
Metadata |
---|
Tokenization: | L-714 |
|
|
Lower Sorbian | dsb |
Metadata |
---|
Tokenization: | L-396 | Punctuation: | «»§‐–—…‘’‚“„ | Letter: | óÓčćěłńŕšśžźČĆĚŁŃŔŠŚŽŹ | Mark: | ̌́ |
|
Lower Sorbian (Germany)
| dsb-DE |
Metadata |
---|
Tokenization: | c-502 |
|
|
Kadazan Dusun, Central Dusun | dtp |
|
|
Duala | dua |
Metadata |
---|
Letter: | áéíóúÁÉÍÓÚƁƊƐƆŋūŊŪɓɗɛɔ | Mark: | ́̄ |
|
Duala (Cameroon)
| dua-CM |
|
|
Drung | duu |
|
|
Maldivian | dv |
Metadata |
---|
Tokenization: | L-67 | Punctuation: | ،؛ | Letter: | ޑސމބރގއދޖލހޢނފކށވޙޤތޕޓޔޝޞޅޚޣޒޠޗޏޘޛޟޜޡޥޱ | Mark: | ިެްަީުާޮޭޫޯ |
|
Maldivian (India)
| dv-IN |
Metadata |
---|
Tokenization: | c-716 |
|
Maldivian (Maldives)
| dv-MV |
Metadata |
---|
Tokenization: | c-68 |
|
|
Jola-Fonyi | dyo |
Metadata |
---|
Punctuation: | “”‰ | Letter: | áéíñóúàÁÉÍÑÓÚÀŋŊ | Mark: | ́̃̀ |
|
Jola-Fonyi (Senegal)
| dyo-SN |
|
|
Dyula | dyu |
Metadata |
---|
Tokenization: | L-747 | Punctuation: | ’‘ | Letter: | úàìóáòùèíéÚÀÌÓÁÒÙÈÍÉƐƆƝŋŊɛɔɲ | Mark: | ́̀ |
|
Dyula {Arab} (Côte d'Ivoire)
| dyu-Arab-CI |
Metadata |
---|
Tokenization: | c-802 |
|
Dyula {Latn} (Côte d'Ivoire)
| dyu-Latn-CI |
Metadata |
---|
Tokenization: | c-801 |
|
Dyula {Nkoo} (Côte d'Ivoire)
| dyu-Nkoo-CI |
Metadata |
---|
Tokenization: | c-800 |
|
|
Dzongkha | dz |
Metadata |
---|
Tokenization: | L-79 | Punctuation: | ༼༽༄༅༆༈༉༊࿐࿑༒࿒࿓࿔༌།༎༏༐༑༔་§‐–—…‘’“”†‡ | Letter: | ཀཁགངཅཆཇཉཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤསཧཨ | Mark: | ིེོུྐྑྒྔྗྙྟྠྡྣྤྥྦྨྩྪྫྭྱྲླྵྶྷཱྕ | Number: | ༡༢༣༤༥༦༧༨༩༠ |
|
Dzongkha (Bhutan)
| dz-BT |
Metadata |
---|
Tokenization: | c-470 |
|
|
Embu, Kiembu | ebu |
Metadata |
---|
Letter: | ĩũĨŨ | Mark: | ̃ |
|
Embu, Kiembu (Kenya)
| ebu-KE |
|
|
Ewe | ee |
Metadata |
---|
Tokenization: | L-80 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | áàãéèíìóòõúùÁÀÃÉÈÍÌÓÒÕÚÙƒƉƐƑƔƆƲĩŋũĨŊŨẽẼɖɛɣɔʋ | Mark: | ́̀̃ |
|
Ewe (Ghana)
| ee-GH |
Metadata |
---|
Tokenization: | c-471 |
|
Ewe (Togo)
| ee-TG |
|
|
Standard Estonian | ekk |
Metadata |
---|
Letter: | õäöüÕÄÖÜšžŠŽ | Mark: | ̌̃̈ |
|
|
Greek | el |
Metadata |
---|
Tokenization: | L-81 | Punctuation: | «»§‐–—… | Letter: | ΆΈΉΊΌΎΏΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ | Mark: | ́̈ |
|
Greek (Cyprus)
| el-CY |
Metadata |
---|
Tokenization: | c-381 |
|
Greek (Greece)
| el-GR |
Metadata |
---|
Tokenization: | c-82 |
|
Greek {Latn} (Greece)
| el-Latn-GR |
Metadata |
---|
Tokenization: | c-819 |
|
|
Eastern Maninkakan | emk |
|
Eastern Maninkakan {Latn} (Guinea)
| emk-Latn-GN |
|
|
English | en |
Metadata |
---|
Tokenization: | L-83 | Punctuation: | §‐–—…‘’“”†‡′″ |
|
English (United Arab Emirates)
| en-AE |
Metadata |
---|
Tokenization: | c-658 |
|
English (Antigua and Barbuda)
| en-AG |
|
English (Anguilla)
| en-AI |
|
English (Antarctica)
| en-AQ |
Metadata |
---|
Tokenization: | c-659 |
|
English (American Samoa)
| en-AS |
|
English (Asia)
| en-ASIA |
Metadata |
---|
Tokenization: | c-717 |
|
English (Austria)
| en-AT |
Metadata |
---|
Tokenization: | c-660 |
|
English (Australia)
| en-AU |
Metadata |
---|
Tokenization: | c-84 |
|
English (Barbados)
| en-BB |
|
English (Bangladesh)
| en-BD |
Metadata |
---|
Tokenization: | c-363 |
|
English (Belgium)
| en-BE |
Metadata |
---|
Tokenization: | c-661 |
|
English (Bulgaria)
| en-BG |
Metadata |
---|
Tokenization: | c-662 |
|
English (Bahrain)
| en-BH |
Metadata |
---|
Tokenization: | c-663 |
|
English (Burundi)
| en-BI |
|
English (Bermuda)
| en-BM |
|
English (Bahamas)
| en-BS |
|
English (Botswana)
| en-BW |
|
English (Belize)
| en-BZ |
Metadata |
---|
Tokenization: | c-85 |
|
English (Canada)
| en-CA |
Metadata |
---|
Tokenization: | c-86 |
|
English (Caribbean)
| en-CB |
Metadata |
---|
Tokenization: | c-87 |
|
English (Cocos (Keeling) Islands)
| en-CC |
|
English (Switzerland)
| en-CH |
Metadata |
---|
Tokenization: | c-675 |
|
English (Cook Islands)
| en-CK |
|
English (Cameroon)
| en-CM |
|
English (China)
| en-CN |
Metadata |
---|
Tokenization: | c-664 |
|
English (Christmas Island)
| en-CX |
|
English (Cyprus)
| en-CY |
Metadata |
---|
Tokenization: | c-665 |
|
English (Czech Republic)
| en-CZ |
Metadata |
---|
Tokenization: | c-666 |
|
English (Germany)
| en-DE |
Metadata |
---|
Tokenization: | c-667 |
|
English (Denmark)
| en-DK |
Metadata |
---|
Tokenization: | c-668 |
|
English (Estonia)
| en-EE |
Metadata |
---|
Tokenization: | c-669 |
|
English (Egypt)
| en-EG |
Metadata |
---|
Tokenization: | c-670 |
|
English (Eritrea)
| en-ER |
|
English (Finland)
| en-FI |
Metadata |
---|
Tokenization: | c-671 |
|
English (Fiji)
| en-FJ |
|
English (Falkland Islands (Malvinas))
| en-FK |
|
English (Micronesia)
| en-FM |
|
English (France)
| en-FR |
Metadata |
---|
Tokenization: | c-672 |
|
English (United Kingdom)
| en-GB |
Metadata |
---|
Tokenization: | c-88 |
|
English (Grenada)
| en-GD |
|
English (Guernsey)
| en-GG |
|
English (Ghana)
| en-GH |
|
English (Gibraltar)
| en-GI |
|
English (Gambia)
| en-GM |
|
English (Greece)
| en-GR |
Metadata |
---|
Tokenization: | c-673 |
|
English (South Georgia and the South Sandwich Islands)
| en-GS |
|
English (Guam)
| en-GU |
|
English (Guyana)
| en-GY |
|
English (Hong Kong)
| en-HK |
Metadata |
---|
Tokenization: | c-362 |
|
English (Hungary)
| en-HU |
Metadata |
---|
Tokenization: | c-674 |
|
English (Indonesia)
| en-ID |
Metadata |
---|
Tokenization: | c-365 |
|
English (Ireland)
| en-IE |
Metadata |
---|
Tokenization: | c-89 |
|
English (Israel)
| en-IL |
Metadata |
---|
Tokenization: | c-676 |
|
English (Isle of Man)
| en-IM |
|
English (India)
| en-IN |
Metadata |
---|
Tokenization: | c-364 |
|
English (International)
| en-INT |
Metadata |
---|
Tokenization: | c-366 |
|
English (British Indian Ocean Territory)
| en-IO |
|
English (Iceland)
| en-IS |
Metadata |
---|
Tokenization: | c-677 |
|
English (Italy)
| en-IT |
Metadata |
---|
Tokenization: | c-678 |
|
English (Jersey)
| en-JE |
|
English (Jamaica)
| en-JM |
Metadata |
---|
Tokenization: | c-90 |
|
English (Jordan)
| en-JO |
Metadata |
---|
Tokenization: | c-368 |
|
English (Japan)
| en-JP |
Metadata |
---|
Tokenization: | c-367 |
|
English (Kenya)
| en-KE |
Metadata |
---|
Tokenization: | c-759 |
|
English (Cambodia)
| en-KH |
Metadata |
---|
Tokenization: | c-679 |
|
English (Kiribati)
| en-KI |
|
English (Saint Kitts and Nevis)
| en-KN |
|
English (Kuwait)
| en-KW |
Metadata |
---|
Tokenization: | c-680 |
|
English (Cayman Islands)
| en-KY |
|
English (Laos)
| en-LA |
Metadata |
---|
Tokenization: | c-681 |
|
English (Lebanon)
| en-LB |
Metadata |
---|
Tokenization: | c-650 |
|
English (Saint Lucia)
| en-LC |
|
English (Sri Lanka)
| en-LK |
Metadata |
---|
Tokenization: | c-683 |
|
English (Liberia)
| en-LR |
|
English (Lesotho)
| en-LS |
|
English (Lithuania)
| en-LT |
Metadata |
---|
Tokenization: | c-684 |
|
English (Luxembourg)
| en-LU |
Metadata |
---|
Tokenization: | c-685 |
|
English (Latvia)
| en-LV |
Metadata |
---|
Tokenization: | c-686 |
|
English (Morocco)
| en-MA |
Metadata |
---|
Tokenization: | c-687 |
|
English (Madagascar)
| en-MG |
|
English (Marshall Islands)
| en-MH |
|
English (Macau)
| en-MO |
|
English (Northern Mariana Islands)
| en-MP |
|
English (Montserrat)
| en-MS |
|
English (Malta)
| en-MT |
Metadata |
---|
Tokenization: | c-651 |
|
English (Mauritius)
| en-MU |
|
English (Malawi)
| en-MW |
|
English (Malaysia)
| en-MY |
Metadata |
---|
Tokenization: | c-369 |
|
English (Namibia)
| en-NA |
|
English (Neutral)
| en-NEUTRAL |
Metadata |
---|
Tokenization: | c-718 |
|
English (Norfolk Island)
| en-NF |
|
English (Nigeria)
| en-NG |
Metadata |
---|
Tokenization: | c-689 |
|
English (Netherlands)
| en-NL |
Metadata |
---|
Tokenization: | c-690 |
|
English (Norway)
| en-NO |
Metadata |
---|
Tokenization: | c-691 |
|
English (Nauru)
| en-NR |
|
English (Niue)
| en-NU |
|
English (New Zealand)
| en-NZ |
Metadata |
---|
Tokenization: | c-91 |
|
English (Oman)
| en-OM |
Metadata |
---|
Tokenization: | c-692 |
|
English (Papua New Guinea)
| en-PG |
|
English (Philippines)
| en-PH |
Metadata |
---|
Tokenization: | c-92 |
|
English (Pirate)
| en-PI |
Metadata |
---|
Tokenization: | c-371 |
|
English (Pakistan)
| en-PK |
Metadata |
---|
Tokenization: | c-370 |
|
English (Pitcairn)
| en-PN |
|
English (Puerto Rico)
| en-PR |
Metadata |
---|
Tokenization: | c-372 |
|
English (Portugal)
| en-PT |
Metadata |
---|
Tokenization: | c-693 |
|
English (Palau)
| en-PW |
|
English (Qatar)
| en-QA |
Metadata |
---|
Tokenization: | c-694 |
|
English (Romania)
| en-RO |
Metadata |
---|
Tokenization: | c-695 |
|
English (Rwanda)
| en-RW |
|
English (Saudi Arabia)
| en-SA |
Metadata |
---|
Tokenization: | c-652 |
|
English (Solomon Islands)
| en-SB |
|
English (Seychelles)
| en-SC |
|
English (Sudan)
| en-SD |
|
English (Sweden)
| en-SE |
Metadata |
---|
Tokenization: | c-697 |
|
English (Singapore)
| en-SG |
Metadata |
---|
Tokenization: | c-373 |
|
English (Saint Helena, Ascension and Tristan da Cunha)
| en-SH |
|
English (Slovenia)
| en-SI |
Metadata |
---|
Tokenization: | c-699 |
|
English (Slovakia)
| en-SK |
Metadata |
---|
Tokenization: | c-698 |
|
English (Sierra Leone)
| en-SL |
|
English (South Sudan)
| en-SS |
|
English (Sint Maarten (Dutch part))
| en-SX |
|
English (Swaziland)
| en-SZ |
|
English (Turks and Caicos Islands)
| en-TC |
|
English (Thailand)
| en-TH |
Metadata |
---|
Tokenization: | c-700 |
|
English (Tokelau)
| en-TK |
|
English (Tonga)
| en-TO |
|
English (Trinidad and Tobago)
| en-TT |
Metadata |
---|
Tokenization: | c-93 |
|
English (Tuvalu)
| en-TV |
|
English (Taiwan)
| en-TW |
Metadata |
---|
Tokenization: | c-701 |
|
English (Tanzania, United Republic of)
| en-TZ |
|
English (Upside Down)
| en-UD |
Metadata |
---|
Tokenization: | c-374 |
|
English (Uganda)
| en-UG |
|
English (United States Minor Outlying Islands)
| en-UM |
|
English (United States)
| en-US |
Metadata |
---|
Tokenization: | c-94 |
|
English (Uruguay)
| en-UY |
Metadata |
---|
Tokenization: | c-702 |
|
English (Saint Vincent and the Grenadines)
| en-VC |
|
English (Virgin Islands, British)
| en-VG |
|
English (Virgin Islands, U.S.)
| en-VI |
|
English (Vietnam)
| en-VN |
Metadata |
---|
Tokenization: | c-703 |
|
English (Vanuatu)
| en-VU |
|
English (Samoa)
| en-WS |
|
English (South Africa)
| en-ZA |
Metadata |
---|
Tokenization: | c-95 |
|
English (Zambia)
| en-ZM |
|
English (Zimbabwe)
| en-ZW |
Metadata |
---|
Tokenization: | c-96 |
|
|
Esperanto | eo |
Metadata |
---|
Tokenization: | L-97 | Punctuation: | ‐–—…‘’“” | Letter: | ĉĝĥĵŝŭĈĜĤĴŜŬ | Mark: | ̂̆ |
|
Esperanto (International)
| eo-INT |
|
|
Spanish | es |
Metadata |
---|
Tokenization: | L-98 | Punctuation: | ‐–—…‘’“”†‡′″¡¿«»§ | Letter: | áéíïñóúüýÁÉÍÏÑÓÚÜÝ | Mark: | ́̈̃ |
|
Spanish (Andorra)
| es-AD |
Metadata |
---|
Tokenization: | c-704 |
|
Spanish (Argentina)
| es-AR |
Metadata |
---|
Tokenization: | c-99 |
|
Spanish (Bolivia)
| es-BO |
Metadata |
---|
Tokenization: | c-100 |
|
Spanish (Chile)
| es-CL |
Metadata |
---|
Tokenization: | c-101 |
|
Spanish (Colombia)
| es-CO |
Metadata |
---|
Tokenization: | c-102 |
|
Spanish (Costa Rica)
| es-CR |
Metadata |
---|
Tokenization: | c-103 |
|
Spanish (Cuba)
| es-CU |
Metadata |
---|
Tokenization: | c-721 |
|
Spanish (Dominican Republic)
| es-DO |
Metadata |
---|
Tokenization: | c-104 |
|
Spanish (Ecuador)
| es-EC |
Metadata |
---|
Tokenization: | c-105 |
|
Spanish (Spain)
| es-ES |
Metadata |
---|
Tokenization: | c-106 |
|
Spanish (Equatorial Guinea)
| es-GQ |
|
Spanish (Guatemala)
| es-GT |
Metadata |
---|
Tokenization: | c-107 |
|
Spanish (Heard Island and McDonald Islands)
| es-HM |
|
Spanish (Honduras)
| es-HN |
Metadata |
---|
Tokenization: | c-108 |
|
Spanish (International)
| es-INT |
Metadata |
---|
Tokenization: | c-719 |
|
Spanish (Latin America)
| es-LAT |
Metadata |
---|
Tokenization: | c-422 |
|
Spanish (Mexico)
| es-MX |
Metadata |
---|
Tokenization: | c-109 |
|
Spanish (Neutral)
| es-NEUTRAL |
Metadata |
---|
Tokenization: | c-720 |
|
Spanish (Nicaragua)
| es-NI |
Metadata |
---|
Tokenization: | c-110 |
|
Spanish (Panama)
| es-PA |
Metadata |
---|
Tokenization: | c-111 |
|
Spanish (Peru)
| es-PE |
Metadata |
---|
Tokenization: | c-112 |
|
Spanish (Philippines)
| es-PH |
|
Spanish (Puerto Rico)
| es-PR |
Metadata |
---|
Tokenization: | c-113 |
|
Spanish (Paraguay)
| es-PY |
Metadata |
---|
Tokenization: | c-114 |
|
Spanish (El Salvador)
| es-SV |
Metadata |
---|
Tokenization: | c-115 |
|
Spanish (Universal)
| es-UN |
Metadata |
---|
Tokenization: | c-427 |
|
Spanish (United States)
| es-US |
Metadata |
---|
Tokenization: | c-424 |
|
Spanish (Uruguay)
| es-UY |
Metadata |
---|
Tokenization: | c-116 |
|
Spanish (Venezuela)
| es-VE |
Metadata |
---|
Tokenization: | c-117 |
|
|
Estonian | et |
Metadata |
---|
Tokenization: | L-118 |
|
Estonian (Estonia)
| et-EE |
Metadata |
---|
Tokenization: | c-119 |
|
|
Basque | eu |
Metadata |
---|
Tokenization: | L-38 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | çñÇÑ | Mark: | ̧̃ |
|
Basque (Spain)
| eu-ES |
Metadata |
---|
Tokenization: | c-39 |
|
|
Even | eve |
Metadata |
---|
Punctuation: | ‐ | Letter: | стаьябэйилокчурмнхдеҥгөыцпвһюзѳшжъфщСТАЬЯБЭЙИЛОКЧУРМНХДЕҤГӨЫЦПВҺЮЗѲШЖЪФЩ | Mark: | ̆ |
|
|
Evenki | evn |
Metadata |
---|
Punctuation: | – | Letter: | упкатңилэбгдерӣынӯмвчзоюцяьйсёһъщжхфУПКАТҢИЛЭБГДЕРӢЫНӮМВЧЗОЮЦЯЬЙСЁҺЪЩЖХФ | Mark: | ̄̆̈ |
|
|
Ewondo | ewo |
Metadata |
---|
Tokenization: | L-734 | Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛǎǐǹǒǔǍƏƐǏǸǑƆǓěńŋĚŃŊəɛɔ | Mark: | ́̀̂̌ |
|
Ewondo (Cameroon)
| ewo-CM |
Metadata |
---|
Tokenization: | c-784 |
|
|
Persian | fa |
Metadata |
---|
Tokenization: | L-120 |
|
Persian (Iran)
| fa-IR |
Metadata |
---|
Tokenization: | c-121 |
|
Persian {Latn} (Iran)
| fa-Latn-IR |
Metadata |
---|
Tokenization: | c-815 |
|
|
Guinean Fang | fan |
|
Guinean Fang (Equatorial Guinea)
| fan-GQ |
|
|
Fanti | fat |
Metadata |
---|
Tokenization: | L-616 | Letter: | ãõÃÕƆƐɔɛ | Mark: | ̃ |
|
|
Fulah | ff |
Metadata |
---|
Tokenization: | L-122 | Letter: | ñÑƴƁƊƳŋŊɓɗ | Mark: | ̃ |
|
Fulah {Latn} (Senegal)
| ff-Latn-SN |
|
|
Maasina Fulfulde | ffm |
Metadata |
---|
Tokenization: | L-710 |
|
Maasina Fulfulde {Latn} (Mali)
| ffm-Latn-ML |
Metadata |
---|
Tokenization: | c-478 |
|
|
Finnish | fi |
Metadata |
---|
Tokenization: | L-123 | Punctuation: | »§‐–…’” | Letter: | åäöÅÄÖšžŠŽ | Mark: | ̌̊̈ |
|
Finnish (Finland)
| fi-FI |
Metadata |
---|
Tokenization: | c-124 |
|
|
Filipino, Pilipino | fil |
Metadata |
---|
Tokenization: | L-375 | Punctuation: | §‐–—…‘’“”′″ | Letter: | ñÑ | Mark: | ̃ |
|
Filipino, Pilipino (Philippines)
| fil-PH |
Metadata |
---|
Tokenization: | c-473 |
|
|
Tornedalen Finnish | fit |
Metadata |
---|
Tokenization: | L-402 |
|
|
Fijian | fj |
Metadata |
---|
Tokenization: | L-125 |
|
Fijian (Fiji)
| fj-FJ |
Metadata |
---|
Tokenization: | c-472 |
|
|
Faroese | fo |
Metadata |
---|
Tokenization: | L-126 | Punctuation: | ́§‐–…‘’“”†′″ | Letter: | áðíóúýæøÁÐÍÓÚÝÆØ | Mark: | ́ |
|
Faroese (Denmark)
| fo-DK |
|
Faroese (Faroe Islands)
| fo-FO |
Metadata |
---|
Tokenization: | c-127 |
|
|
Fon | fon |
Metadata |
---|
Letter: | óéòèáúàìùíÓÉÒÈÁÚÀÌÙÍǎǐǔƐƆƉǍǏǓěđĚĐɛɔɖ | Mark: | ́̌̀ |
|
Fon (Benin)
| fon-BJ |
|
|
French | fr |
Metadata |
---|
Tokenization: | L-128 | Punctuation: | «»§‐–—…’“”†‡ | Letter: | àâæçéèêëîïôùûüÿÀÂÆÇÉÈÊËÎÏÔÙÛÜœŒŸ | Mark: | ̧̀̂́̈ |
|
French (Sub-Saharan Africa)
| fr-202 |
|
French (Belgium)
| fr-BE |
Metadata |
---|
Tokenization: | c-129 |
|
French (Burkina Faso)
| fr-BF |
|
French (Burundi)
| fr-BI |
|
French (Benin)
| fr-BJ |
|
French (Saint Barthélemy)
| fr-BL |
|
French (Canada)
| fr-CA |
Metadata |
---|
Tokenization: | c-130 |
|
French (Caribbean)
| fr-CB |
|
French (Democratic Republic of the Congo)
| fr-CD |
Metadata |
---|
Tokenization: | c-760 |
|
French (Central African Republic)
| fr-CF |
|
French (Congo)
| fr-CG |
|
French (Switzerland)
| fr-CH |
Metadata |
---|
Tokenization: | c-131 |
|
French (Côte d'Ivoire)
| fr-CI |
|
French (Cameroon)
| fr-CM |
|
French (Djibouti)
| fr-DJ |
|
French (Algeria)
| fr-DZ |
|
French (France)
| fr-FR |
Metadata |
---|
Tokenization: | c-132 |
|
French (Gabon)
| fr-GA |
|
French (French Guiana)
| fr-GF |
|
French (Guinea)
| fr-GN |
|
French (Guadeloupe)
| fr-GP |
|
French (Haiti)
| fr-HT |
|
French (International)
| fr-INT |
Metadata |
---|
Tokenization: | c-829 |
|
French (Comoros)
| fr-KM |
|
French (Luxembourg)
| fr-LU |
Metadata |
---|
Tokenization: | c-133 |
|
French (Morocco)
| fr-MA |
Metadata |
---|
Tokenization: | c-653 |
|
French (Monaco)
| fr-MC |
Metadata |
---|
Tokenization: | c-134 |
|
French (Saint Martin)
| fr-MF |
|
French (Madagascar)
| fr-MG |
|
French (Maghreb)
| fr-MGB |
|
French (Mali)
| fr-ML |
|
French (Martinique)
| fr-MQ |
|
French (Mauritania)
| fr-MR |
|
French (Mauritius)
| fr-MU |
|
French (New Caledonia)
| fr-NC |
|
French (Niger)
| fr-NE |
|
French (French Polynesia)
| fr-PF |
|
French (Saint Pierre and Miquelon)
| fr-PM |
|
French (Quebec)
| fr-QC |
Metadata |
---|
Tokenization: | c-376 |
|
French (Réunion)
| fr-RE |
|
French (Rwanda)
| fr-RW |
|
French (Seychelles)
| fr-SC |
|
French (Senegal)
| fr-SN |
|
French (Syria)
| fr-SY |
|
French (Chad)
| fr-TD |
|
French (Togo)
| fr-TG |
|
French (Tunisia)
| fr-TN |
|
French (Vanuatu)
| fr-VU |
|
French (Wallis and Futuna)
| fr-WF |
|
French (Mayotte)
| fr-YT |
|
|
Arpitan, Francoprovençal | frp |
Metadata |
---|
Tokenization: | L-351 |
|
|
Adamawa Fulfulde | fub |
|
Adamawa Fulfulde {Arab} (Cameroon)
| fub-Arab-CM |
|
Adamawa Fulfulde {Latn} (Cameroon)
| fub-Latn-CM |
|
|
Pulaar | fuc |
|
Pulaar {Latn}
| fuc-Latn |
Metadata |
---|
Tokenization: | L-477 |
|
Pulaar {Arab} (Gambia)
| fuc-Arab-GM |
|
Pulaar {Latn} (Gambia)
| fuc-Latn-GM |
|
Pulaar {Arab} (Senegal)
| fuc-Arab-SN |
|
Pulaar {Latn} (Senegal)
| fuc-Latn-SN |
|
|
Pular | fuf |
|
Pular {Adlm}
| fuf-Adlm |
Metadata |
---|
Punctuation: | 𞥟𞥞⹁⁏؟ | Letter: | 𞤭𞤋𞤵𞤓𞤫𞤉𞤮𞤌𞤢𞤀𞤨𞤆𞤼𞤚𞤷𞤕𞤳𞤑𞤹𞤗𞤦𞤄𞤩𞤇𞤣𞤁𞤯𞤍𞤶𞤔𞤺𞤘𞤬𞤊𞤧𞤅𞤸𞤖𞤥𞤃𞤲𞤐𞤻𞤙𞤽𞤛𞤱𞤏𞤪𞤈𞤤𞤂𞤴𞤒𞤰𞤎𞤾𞤜𞤿𞤝𞥀𞤞𞥁𞤟𞥂𞤠𞥃𞤡𞥋 | Mark: | 𞥊𞥆𞥅𞥄𞥈𞥉𞥇 | Number: | 𞥐𞥑𞥒𞥓𞥔𞥕𞥖𞥗𞥘𞥙 |
|
Pular {Arab} (Guinea)
| fuf-Arab-GN |
Metadata |
---|
Tokenization: | c-476 |
|
Pular {Latn} (Guinea)
| fuf-Latn-GN |
Metadata |
---|
Tokenization: | c-475 |
|
Pular {Latn} (Nigeria)
| fuf-Latn-NG |
Metadata |
---|
Tokenization: | c-762 |
|
|
Western Niger Fula | fuh |
|
Western Niger Fula {Arab} (Niger)
| fuh-Arab-NE |
|
Western Niger Fula {Latn} (Niger)
| fuh-Latn-NE |
|
|
Friulian | fur |
Metadata |
---|
Tokenization: | L-377 | Letter: | àâçèêìîòôùûÀÂÇÈÊÌÎÒÔÙÛ | Mark: | ̧̀̂ |
|
Friulian (Italy)
| fur-IT |
Metadata |
---|
Tokenization: | c-474 |
|
|
Nigerian Fulfulde | fuv |
Metadata |
---|
Tokenization: | L-720 |
|
Nigerian Fulfulde {Latn} (Nigeria)
| fuv-Latn-NG |
Metadata |
---|
Tokenization: | c-756 |
|
|
Fur | fvr |
|
|
Western Frisian | fy |
Metadata |
---|
Tokenization: | L-135 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | ûâêúôòëïáàäéèíóöüýÛÂÊÚÔÒËÏÁÀÄÉÈÍÓÖÜÝ | Mark: | ̂́̀̈ |
|
Western Frisian (Netherlands)
| fy-NL |
Metadata |
---|
Tokenization: | c-561 |
|
|
Irish | ga |
Metadata |
---|
Tokenization: | L-136 | Letter: | áéíóúÁÉÍÓÚ | Mark: | ́ |
|
Irish (Ireland)
| ga-IE |
Metadata |
---|
Tokenization: | c-490 |
|
|
Ga | gaa |
Metadata |
---|
Tokenization: | L-378 | Letter: | ãÃƆƐŋŊɔɛ | Mark: | ̃ |
|
Ga (Ghana)
| gaa-GH |
Metadata |
---|
Tokenization: | c-479 |
|
|
Gagauz | gag |
Metadata |
---|
Punctuation: | — | Letter: | üäêöçÜÄÊÖÇışţŞİŢ | Mark: | ̧̇̈̂ |
|
Gagauz (Moldova)
| gag-MD |
|
|
Borana-Arsi-Guji Oromo | gax |
|
|
Gaelic | gd |
Metadata |
---|
Tokenization: | L-137 | Letter: | ìàòèùÌÀÒÈÙ | Mark: | ̀ |
|
Gaelic (United Kingdom)
| gd-GB |
Metadata |
---|
Tokenization: | c-532 |
|
Gaelic (Ireland)
| gd-IE |
Metadata |
---|
Tokenization: | c-755 |
|
|
Gilbertese | gil |
|
|
Gonja | gjn |
|
|
Galician | gl |
Metadata |
---|
Tokenization: | L-138 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | áéíñóúüªÁÉÍÑÓÚÜ | Mark: | ́̃̈ |
|
Galician (Spain)
| gl-ES |
Metadata |
---|
Tokenization: | c-139 |
|
|
Nanai | gld |
Metadata |
---|
Punctuation: | – | Letter: | найпрвослиебщдкцягьмзюуёчэӈтхӣӯъфжНАЙПРВОСЛИЕБЩДКЦЯГЬМЗЮУЁЧЭӇТХӢӮЪФЖ | Mark: | ̄̆̈ |
|
|
Guarani | gn |
Metadata |
---|
Tokenization: | L-140 |
|
Guarani (Paraguay)
| gn-PY |
Metadata |
---|
Tokenization: | c-481 |
|
|
Gronings | gos |
|
|
Gothic | got |
Metadata |
---|
Tokenization: | L-751 |
|
|
Ancient Greek | grc |
Metadata |
---|
Tokenization: | L-350 |
|
Ancient Greek (Greece)
| grc-GR |
Metadata |
---|
Tokenization: | c-771 |
|
|
Swiss German, Alemannic, Alsatian | gsw |
Metadata |
---|
Letter: | äöüÄÖÜ | Mark: | ̈ |
|
Swiss German, Alemannic, Alsatian (Switzerland)
| gsw-CH |
|
Swiss German, Alemannic, Alsatian (France)
| gsw-FR |
|
Swiss German, Alemannic, Alsatian (Liechtenstein)
| gsw-LI |
|
|
Gujarati | gu |
Metadata |
---|
Tokenization: | L-141 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | ૐઅઆઇઈઉઊઋૠઍએઐઑઓઔકખગઘઙચછજઝઞટઠડઢણતથદધનપફબભમયરલવશષસહળઽ | Mark: | ઼ંઁઃાિીુૂૃૄૅેૈૉોૌ્ | Number: | ૧૨૩૪૫૬૭૮૯૦ |
|
Gujarati (India)
| gu-IN |
Metadata |
---|
Tokenization: | c-142 |
|
|
Wayuu | guc |
Metadata |
---|
Letter: | üñÜÑ | Mark: | ̈̃ |
|
|
Paraguayan Guaraní | gug |
Metadata |
---|
Letter: | óáñéãíúõèÓÁÑÉÃÍÚÕÈʼĩũĨŨẽẼ | Mark: | ́̃̀ |
|
|
Yanomamö | guu |
Metadata |
---|
Letter: | ëãáõíËÃÁÕÍĩũĨŨẽẼ | Mark: | ̈̃́ |
|
|
Gusii, Ekegusii | guz |
Metadata |
---|
Tokenization: | L-740 |
|
Gusii, Ekegusii (Kenya)
| guz-KE |
Metadata |
---|
Tokenization: | c-792 |
|
|
Manx | gv |
Metadata |
---|
Tokenization: | L-143 | Punctuation: | ’ | Letter: | çÇ | Mark: | ̧ |
|
Manx (United Kingdom)
| gv-GB |
Metadata |
---|
Tokenization: | c-509 |
|
Manx (Isle of Man)
| gv-IM |
|
|
Guarayu | gyr |
Metadata |
---|
Punctuation: | ’ | Letter: | ëñäüöéïËÑÄÜÖÉÏ | Mark: | ̈̃́ |
|
|
Hausa | ha |
Metadata |
---|
Tokenization: | L-144 | Punctuation: | ‐’‘ | Letter: | ƙƴƁƊƘƳɓɗʼ |
|
Hausa {Arab}
| ha-Arab |
Metadata |
---|
Punctuation: | ،؟‹›«» | Letter: | أإابتثجحدرزسشطعغلموىٻڟکیۑࢻࢼࢽݣࣃࣄ | Mark: | َُِّْٰٕٜٔ |
|
Hausa {Latn} (Ghana)
| ha-Latn-GH |
|
Hausa {Latn} (Niger)
| ha-Latn-NE |
|
Hausa (Nigeria)
| ha-NG |
Metadata |
---|
Tokenization: | c-722 |
|
Hausa {Latn} (Nigeria)
| ha-Latn-NG |
Metadata |
---|
Tokenization: | c-483 |
|
|
Hakka Chinese | hak |
Metadata |
---|
Tokenization: | L-617 |
|
|
Hawaiian | haw |
Metadata |
---|
Tokenization: | L-382 | Punctuation: | ’‘“” | Letter: | āēīōūĀĒĪŌŪʻ | Mark: | ̄ |
|
Hawaiian (United States)
| haw-US |
Metadata |
---|
Tokenization: | c-484 |
|
|
Serbo-Croatian | hbs |
Metadata |
---|
Tokenization: | L-618 |
|
|
Hebrew | he |
Metadata |
---|
Tokenization: | L-145 | Punctuation: | ׳״־‐–— | Letter: | אבגדהוזחטיכךלמםנןסעפףצץקרשת |
|
Hebrew (Israel)
| he-IL |
Metadata |
---|
Tokenization: | c-146 |
|
|
Hindi | hi |
Metadata |
---|
Tokenization: | L-147 | Punctuation: | ।॥॰‘’“”— | Letter: | अआइईउऊऋएऐओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहक़ख़ग़ज़ड़ढ़फ़ | Mark: | ँंः़ािीुूृेैोौ् | Number: | १२३४५६७८९ |
|
Hindi (India)
| hi-IN |
Metadata |
---|
Tokenization: | c-148 |
|
Hindi {Latn} (India)
| hi-Latn-IN |
Metadata |
---|
Tokenization: | c-822 |
|
|
Fiji Hindi | hif |
|
Fiji Hindi (Fiji)
| hif-FJ |
|
|
Hiligaynon | hil |
Metadata |
---|
Tokenization: | L-383 |
|
Hiligaynon (Philippines)
| hil-PH |
Metadata |
---|
Tokenization: | c-485 |
|
|
Matu Chin | hlt |
Metadata |
---|
Tokenization: | L-619 |
|
|
Hmong, Mong | hmn |
Metadata |
---|
Tokenization: | L-384 |
|
Hmong, Mong (United States)
| hmn-US |
Metadata |
---|
Tokenization: | c-723 |
|
|
Mina (Cameroon) | hna |
Metadata |
---|
Letter: | éáìóòúíàèùÉÁÌÓÒÚÍÀÈÙǒǐǔǹƉƐƆǑǏǓǸŋŊɖɛɔ | Mark: | ̀́̌ |
|
|
Hani | hni |
|
|
Hanunoo | hnn |
Metadata |
---|
Punctuation: | ᜵᜶ | Letter: | ᜩᜦᜣᜪᜧᜤᜰᜱᜫᜨᜥᜯᜭᜮᜬᜠᜡᜢ | Mark: | ᜲᜳ᜴ |
|
Hanunoo {Hano}
| hnn-Hano |
Metadata |
---|
Punctuation: | ᜵᜶ | Letter: | ᜩᜦᜣᜪᜧᜤᜰᜱᜫᜨᜥᜯᜭᜮᜬᜠᜡᜢ | Mark: | ᜲᜳ᜴ |
|
|
Caribbean Hindustani | hns |
Metadata |
---|
Punctuation: | ‘’ | Letter: | áêòíèàëÁÊÒÍÈÀË | Mark: | ́̂̀̈ |
|
|
Hiri Motu | ho |
Metadata |
---|
Tokenization: | L-149 |
|
Hiri Motu (Papua New Guinea)
| ho-PG |
Metadata |
---|
Tokenization: | c-486 |
|
|
Croatian | hr |
Metadata |
---|
Tokenization: | L-60 | Punctuation: | ‐–—…‘’‚“”„′″ | Letter: | čćžđšČĆŽĐŠ | Mark: | ̌́ |
|
Croatian (Bosnia and Herzegovina)
| hr-BA |
Metadata |
---|
Tokenization: | c-61 |
|
Croatian (Croatia)
| hr-HR |
Metadata |
---|
Tokenization: | c-62 |
|
|
Upper Sorbian | hsb |
Metadata |
---|
Tokenization: | L-428 | Punctuation: | «»§‐–—…‘’‚“„ | Letter: | čćźěłńřšžČĆŹĚŁŃŘŠŽóÓ | Mark: | ̌́ |
|
Upper Sorbian (Germany)
| hsb-DE |
Metadata |
---|
Tokenization: | c-555 |
|
|
Haitian | ht |
Metadata |
---|
Tokenization: | L-150 | Letter: | èéòÈÉÒ | Mark: | ̀́ |
|
Haitian (Haiti)
| ht-HT |
Metadata |
---|
Tokenization: | c-482 |
|
|
Hungarian | hu |
Metadata |
---|
Tokenization: | L-151 | Punctuation: | «»§–…’”„ | Letter: | áéíóöúüÁÉÍÓÖÚÜőűŐŰ | Mark: | ́̈̋ |
|
Hungarian (Hungary)
| hu-HU |
Metadata |
---|
Tokenization: | c-152 |
|
|
Hupa | hup |
|
|
Huastec | hus |
Metadata |
---|
Letter: | íáúéóàÍÁÚÉÓÀ | Mark: | °́̀ |
|
|
Murui Huitoto | huu |
Metadata |
---|
Letter: | úñáÚÑÁƗɨ | Mark: | ́̃ |
|
|
Armenian | hy |
Metadata |
---|
Tokenization: | L-27 | Punctuation: | ֊՝՜՞՛։․«» | Letter: | աբգդեզէըթժիլխծկհձղճմյնշոչպջռսվտրցւփքևօֆԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՏՐՑՒՓՔՕՖ |
|
Armenian (Armenia)
| hy-AM |
Metadata |
---|
Tokenization: | c-28 |
|
Armenian {Latn} (Armenia)
| hy-Latn-AM |
Metadata |
---|
Tokenization: | c-812 |
|
|
Herero | hz |
Metadata |
---|
Tokenization: | L-153 |
|
|
Interlingua | ia |
Metadata |
---|
Tokenization: | L-154 |
|
Interlingua (France)
| ia-FR |
|
Interlingua (International)
| ia-INT |
|
|
Iban | iba |
|
|
Ibibio | ibb |
|
Ibibio (Nigeria)
| ibb-NG |
|
|
Indonesian | id |
Metadata |
---|
Tokenization: | L-155 | Punctuation: | ‐–—…‘’“” |
|
Indonesian (Indonesia)
| id-ID |
Metadata |
---|
Tokenization: | c-156 |
|
|
Interlingue | ie |
Metadata |
---|
Tokenization: | L-157 |
|
|
Igbo | ig |
Metadata |
---|
Tokenization: | L-158 | Punctuation: | ‐ | Letter: | ẹịṅọụẸỊṄỌỤ | Mark: | ̣̇ |
|
Igbo (Nigeria)
| ig-NG |
Metadata |
---|
Tokenization: | c-487 |
|
|
Nuosu | ii |
Metadata |
---|
Tokenization: | L-159 | Punctuation: | 《》。、,(): | Letter: | ꀀꀁꀂꀃꀄꀅꀆꀇꀈꀉꀊꀋꀌꀍꀎꀏꀐꀑꀒꀓꀔꀕꀖꀗꀘꀙꀚꀛꀜꀝꀞꀟꀠꀡꀢꀣꀤꀥꀦꀧꀨꀩꀪꀫꀬꀭꀮꀯꀰꀱꀲꀳꀴꀵꀶꀷꀸꀹꀺꀻꀼꀽꀾꀿꁀꁁꁂꁃꁄꁅꁆꁇꁈꁉꁊꁋꁌꁍꁎꁏꁐꁑꁒꁓꁔꁕꁖꁗꁘꁙꁚꁛꁜꁝꁞꁟꁠꁡꁢꁣꁤꁥꁦꁧꁨꁩꁪꁫꁬꁭꁮꁯꁰꁱꁲꁳꁴꁵꁶꁷꁸꁹꁺꁻꁼꁽꁾꁿꂀꂁꂂꂃꂄꂅꂆꂇꂈꂉꂊꂋꂌꂍꂎꂏꂐꂑꂒꂓꂔꂕꂖꂗꂘꂙꂚꂛꂜꂝꂞꂟꂠꂡꂢꂣꂤꂥꂦꂧꂨꂩꂪꂫꂬꂭꂮꂯꂰꂱꂲꂳꂴꂵꂶꂷꂸꂹꂺꂻꂼꂽꂾꂿꃀꃁꃂꃃꃄꃅꃆꃇꃈꃉꃊꃋꃌꃍꃎꃏꃐꃑꃒꃓꃔꃕꃖꃗꃘꃙꃚꃛꃜꃝꃞꃟꃠꃡꃢꃣꃤꃥꃦꃧꃨꃩꃪꃫꃬꃭꃮꃯꃰꃱꃲꃳꃴꃵꃶꃷꃸꃹꃺꃻꃼꃽꃾꃿꄀꄁꄂꄃꄄꄅꄆꄇꄈꄉꄊꄋꄌꄍꄎꄏꄐꄑꄒꄓꄔꄕꄖꄗꄘꄙꄚꄛꄜꄝꄞꄟꄠꄡꄢꄣꄤꄥꄦꄧꄨꄩꄪꄫꄬꄭꄮꄯꄰꄱꄲꄳꄴꄵꄶꄷꄸꄹꄺꄻꄼꄽꄾꄿꅀꅁꅂꅃꅄꅅꅆꅇꅈꅉꅊꅋꅌꅍꅎꅏꅐꅑꅒꅓꅔꅕꅖꅗꅘꅙꅚꅛꅜꅝꅞꅟꅠꅡꅢꅣꅤꅥꅦꅧꅨꅩꅪꅫꅬꅭꅮꅯꅰꅱꅲꅳꅴꅵꅶꅷꅸꅹꅺꅻꅼꅽꅾꅿꆀꆁꆂꆃꆄꆅꆆꆇꆈꆉꆊꆋꆌꆍꆎꆏꆐꆑꆒꆓꆔꆕꆖꆗꆘꆙꆚꆛꆜꆝꆞꆟꆠꆡꆢꆣꆤꆥꆦꆧꆨꆩꆪꆫꆬꆭꆮꆯꆰꆱꆲꆳꆴꆵꆶꆷꆸꆹꆺꆻꆼꆽꆾꆿꇀꇁꇂꇃꇄꇅꇆꇇꇈꇉꇊꇋꇌꇍꇎꇏꇐꇑꇒꇓꇔꇕꇖꇗꇘꇙꇚꇛꇜꇝꇞꇟꇠꇡꇢꇣꇤꇥꇦꇧꇨꇩꇪꇫꇬꇭꇮꇯꇰꇱꇲꇳꇴꇵꇶꇷꇸꇹꇺꇻꇼꇽꇾꇿꈀꈁꈂꈃꈄꈅꈆꈇꈈꈉꈊꈋꈌꈍꈎꈏꈐꈑꈒꈓꈔꈕꈖꈗꈘꈙꈚꈛꈜꈝꈞꈟꈠꈡꈢꈣꈤꈥꈦꈧꈨꈩꈪꈫꈬꈭꈮꈯꈰꈱꈲꈳꈴꈵꈶꈷꈸꈹꈺꈻꈼꈽꈾꈿꉀꉁꉂꉃꉄꉅꉆꉇꉈꉉꉊꉋꉌꉍꉎꉏꉐꉑꉒꉓꉔꉕꉖꉗꉘꉙꉚꉛꉜꉝꉞꉟꉠꉡꉢꉣꉤꉥꉦꉧꉨꉩꉪꉫꉬꉭꉮꉯꉰꉱꉲꉳꉴꉵꉶꉷꉸꉹꉺꉻꉼꉽꉾꉿꊀꊁꊂꊃꊄꊅꊆꊇꊈꊉꊊꊋꊌꊍꊎꊏꊐꊑꊒꊓꊔꊕꊖꊗꊘꊙꊚꊛꊜꊝꊞꊟꊠꊡꊢꊣꊤꊥꊦꊧꊨꊩꊪꊫꊬꊭꊮꊯꊰꊱꊲꊳꊴꊵꊶꊷꊸꊹꊺꊻꊼꊽꊾꊿꋀꋁꋂꋃꋄꋅꋆꋇꋈꋉꋊꋋꋌꋍꋎꋏꋐꋑꋒꋓꋔꋕꋖꋗꋘꋙꋚꋛꋜꋝꋞꋟꋠꋡꋢꋣꋤꋥꋦꋧꋨꋩꋪꋫꋬꋭꋮꋯꋰꋱꋲꋳꋴꋵꋶꋷꋸꋹꋺꋻꋼꋽꋾꋿꌀꌁꌂꌃꌄꌅꌆꌇꌈꌉꌊꌋꌌꌍꌎꌏꌐꌑꌒꌓꌔꌕꌖꌗꌘꌙꌚꌛꌜꌝꌞꌟꌠꌡꌢꌣꌤꌥꌦꌧꌨꌩꌪꌫꌬꌭꌮꌯꌰꌱꌲꌳꌴꌵꌶꌷꌸꌹꌺꌻꌼꌽꌾꌿꍀꍁꍂꍃꍄꍅꍆꍇꍈꍉꍊꍋꍌꍍꍎꍏꍐꍑꍒꍓꍔꍕꍖꍗꍘꍙꍚꍛꍜꍝꍞꍟꍠꍡꍢꍣꍤꍥꍦꍧꍨꍩꍪꍫꍬꍭꍮꍯꍰꍱꍲꍳꍴꍵꍶꍷꍸꍹꍺꍻꍼꍽꍾꍿꎀꎁꎂꎃꎄꎅꎆꎇꎈꎉꎊꎋꎌꎍꎎꎏꎐꎑꎒꎓꎔꎕꎖꎗꎘꎙꎚꎛꎜꎝꎞꎟꎠꎡꎢꎣꎤꎥꎦꎧꎨꎩꎪꎫꎬꎭꎮꎯꎰꎱꎲꎳꎴꎵꎶꎷꎸꎹꎺꎻꎼꎽꎾꎿꏀꏁꏂꏃꏄꏅꏆꏇꏈꏉꏊꏋꏌꏍꏎꏏꏐꏑꏒꏓꏔꏕꏖꏗꏘꏙꏚꏛꏜꏝꏞꏟꏠꏡꏢꏣꏤꏥꏦꏧ |
|
Nuosu (China)
| ii-CN |
Metadata |
---|
Tokenization: | c-521 |
|
|
Inupiaq | ik |
Metadata |
---|
Tokenization: | L-160 |
|
Inupiaq (United States)
| ik-US |
Metadata |
---|
Tokenization: | c-834 |
|
|
Eastern Canadian Inuktitut | ike |
Metadata |
---|
Tokenization: | L-622 | Letter: | ᐁᐃᐄᐅᐆᐊᐋᐯᐱᐲᐳᐴᐸᐹᑉᑌᑎᑏᑐᑑᑕᑖᑦᑫᑭᑮᑯᑰᑲᑳᒃᒉᒋᒌᒍᒎᒐᒑᒡᒣᒥᒦᒧᒨᒪᒫᒻᓀᓂᓃᓄᓅᓇᓈᓐᓓᓕᓖᓗᓘᓚᓛᓪᓭᓯᓰᓱᓲᓴᓵᔅᔦᔨᔩᔪᔫᔭᔮᔾᕃᕆᕇᕈᕉᕋᕌᕐᕓᕕᕖᕗᕘᕙᕚᕝᕼᕿᖀᖁᖂᖃᖄᖅᖏᖐᖑᖒᖓᖔᖕᖠᖡᖢᖣᖤᖥᖦᖯᙯᙰᙱᙲᙳᙴᙵᙶ |
|
|
Inuinnaqtun, Western Canadian Inuktitut | ikt |
Metadata |
---|
Tokenization: | L-623 |
|
|
Iloko | ilo |
Metadata |
---|
Tokenization: | L-385 |
|
Iloko (Philippines)
| ilo-PH |
Metadata |
---|
Tokenization: | c-488 |
|
|
Ingush | inh |
|
|
Ido | io |
Metadata |
---|
Tokenization: | L-161 |
|
|
Icelandic | is |
Metadata |
---|
Tokenization: | L-162 | Punctuation: | §‐–—…‘‚“„†‡′″ | Letter: | áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ | Mark: | ́̈ |
|
Icelandic (Iceland)
| is-IS |
Metadata |
---|
Tokenization: | c-163 |
|
|
Italian | it |
Metadata |
---|
Tokenization: | L-164 | Punctuation: | «»—…’“” | Letter: | àéèìóòùÀÉÈÌÓÒÙ | Mark: | ̀́ |
|
Italian (Switzerland)
| it-CH |
Metadata |
---|
Tokenization: | c-165 |
|
Italian (Italy)
| it-IT |
Metadata |
---|
Tokenization: | c-166 |
|
Italian (San Marino)
| it-SM |
|
|
Inuktitut | iu |
Metadata |
---|
Tokenization: | L-167 |
|
Inuktitut {Cans}
| iu-Cans |
Metadata |
---|
Tokenization: | L-624 |
|
Inuktitut {Latn}
| iu-Latn |
|
Inuktitut (Canada)
| iu-CA |
Metadata |
---|
Tokenization: | c-489 |
|
Inuktitut {Cans} (Canada)
| iu-Cans-CA |
Metadata |
---|
Tokenization: | c-654 |
|
|
Iu Mien | ium |
Metadata |
---|
Tokenization: | L-621 |
|
|
Japanese | ja |
Metadata |
---|
Tokenization: | L-168 |
|
Japanese (Japan)
| ja-JP |
Metadata |
---|
Tokenization: | c-169 |
|
|
Jamaican Creole English | jam |
Metadata |
---|
Tokenization: | L-574 |
|
Jamaican Creole English (Jamaica)
| jam-JM |
Metadata |
---|
Tokenization: | c-575 |
|
|
Lojban | jbo |
Metadata |
---|
Tokenization: | L-392 |
|
|
Ngomba | jgo |
Metadata |
---|
Punctuation: | «»‹› | Letter: | áâíîúûÁÂÍÎÚÛꞌꞋǎǐǹǔǍƐǏǸƆǓɄńŋŃŊḿẅḾẄɛɔʉ | Mark: | ́̀̂̌̄̈ |
|
Ngomba (Cameroon)
| jgo-CM |
|
|
Shuar | jiv |
Metadata |
---|
Letter: | áíúéÁÍÚÉ | Mark: | ́ |
|
|
Machame | jmc |
|
Machame (Tanzania, United Republic of)
| jmc-TZ |
|
|
Javanese | jv |
Metadata |
---|
Tokenization: | L-170 | Punctuation: | ‰ | Letter: | ÂÅÈÉÊÌÒÙâåèéêìòù | Mark: | ̀́̂̊ |
|
Javanese {Java}
| jv-Java |
Metadata |
---|
Punctuation: | ꧁꧂꧃꧄꧅꧆꧇꧈꧉꧊꧋꧌꧍ | Letter: | ꦄꦆꦇꦈꦉꦊꦋꦌꦎꦏꦑꦒꦓꦔꦕꦖꦗꦘꦚꦛꦝꦟꦠꦡꦢꦤꦥꦦꦧꦨꦩꦪꦫꦭꦮꦱꦲꧏ | Mark: | ꦀꦁꦂꦃ꦳ꦴꦶꦸꦺꦼꦽꦾꦿ꧀ | Number: | ꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙ |
|
Javanese (Indonesia)
| jv-ID |
Metadata |
---|
Tokenization: | c-491 |
|
Javanese {Latn} (Indonesia)
| jv-Latn-ID |
|
|
Georgian | ka |
Metadata |
---|
Tokenization: | L-171 | Punctuation: | ჻«»§‐–—…‘‚“„†‡′″ | Letter: | აბგდევზთიკლმნოპჟრსტუფქღყშჩცძწჭხჯჰ |
|
Georgian (Georgia)
| ka-GE |
Metadata |
---|
Tokenization: | c-172 |
|
|
Kabyle | kab |
Metadata |
---|
Punctuation: | ‰ | Letter: | ǧƐǦƔčČḍḥṛṣṭẓḌḤṚṢṬẒɛɣ | Mark: | ̣̌ |
|
Kabyle {Latn} (Algeria)
| kab-Latn-DZ |
|
|
Kachin, Jingpho | kac |
Metadata |
---|
Tokenization: | L-744 |
|
Kachin, Jingpho {Latn} (Myanmar)
| kac-Latn-MM |
Metadata |
---|
Tokenization: | c-796 |
|
|
Kamba (Kenya) | kam |
Metadata |
---|
Tokenization: | L-738 | Letter: | ĩũĨŨ | Mark: | ̃ |
|
Kamba (Kenya) (Kenya)
| kam-KE |
Metadata |
---|
Tokenization: | c-790 |
|
|
Karen languages | kar |
Metadata |
---|
Tokenization: | L-625 |
|
|
Kabardian | kbd |
Metadata |
---|
Letter: | цӏыхуэфащмтеднйпсожлъкрзгьибяшвчіюЦӀЫХУЭФАЩМТЕДНЙПСОЖЛЪКРЗГЬИБЯШВЧІЮ | Mark: | ̆ |
|
Kabardian (Russia)
| kbd-RU |
|
|
Kabiyè | kbp |
Metadata |
---|
Letter: | ñÑƆƐƱƉƖƔŋŊɔɛʊɖɩɣ | Mark: | ̃ |
|
|
Makonde | kde |
|
Makonde (Tanzania, United Republic of)
| kde-TZ |
|
|
Tem | kdh |
Metadata |
---|
Letter: | íáéúóÿÍÁÉÚÓƖƱƐƉƆńŋŃŸŊḿḾɩʊɛɖɔ | Mark: | ́̈ |
|
|
Kam | kdx |
|
|
Kabuverdianu | kea |
Metadata |
---|
Tokenization: | L-733 | Punctuation: | ’ | Letter: | ñçêéâíèáôóãºõúàòÑÇÊÉÂÍÈÁÔÓÃÕÚÀÒ | Mark: | ̧̃̂́̀ |
|
Kabuverdianu (Cabo Verde)
| kea-CV |
Metadata |
---|
Tokenization: | c-781 |
|
|
Kekchí | kek |
|
|
Kongo | kg |
Metadata |
---|
Tokenization: | L-173 |
|
Kongo (Angola)
| kg-AO |
Metadata |
---|
Tokenization: | c-782 |
|
Kongo (Congo)
| kg-CG |
Metadata |
---|
Tokenization: | c-724 |
|
|
Khasi | kha |
Metadata |
---|
Letter: | ïñÏÑ | Mark: | ̈̃ |
|
|
Lü | khb |
Metadata |
---|
Letter: | ᦀᦁᦂᦃᦄᦅᦆᦇᦈᦉᦊᦋᦌᦍᦎᦏᦐᦑᦒᦓᦔᦕᦖᦗᦘᦙᦚᦛᦜᦝᦞᦟᦠᦡᦢᦣᦤᦥᦦᦧᦨᦩᦪᦫᦰᦱᦲᦳᦴᦵᦶᦷᦸᦹᦺᦻᦼᦽᦾᦿᧀᧁᧂᧃᧄᧅᧆᧇ | Number: | ᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᧚ |
|
|
Halh Mongolian | khk |
Metadata |
---|
Punctuation: | ̈̆‐–—…‘’“”†‡′″§ | Letter: | абвгдеёжзийклмноөпрстуүфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОӨПРСТУҮФХЦЧШЩЪЫЬЭЮЯ | Mark: | ̈̆ |
|
Halh Mongolian {Mong}
| khk-Mong |
Metadata |
---|
Punctuation: | ᠊᠁᠂᠃᠄()〈〉《》〔〕?! | Letter: | ᠢᠦᠤᠡᠧᠥᠣᠠᠫᠪᠲᠳᠴᠵᠬᠰᠱᠭᠨᠩᠮᠯᠶᠷᠸᠹᠺᠻᠼᠽᠾᠿᡀᡁᡂ | Number: | ᠑᠒᠓᠔᠕᠖᠗᠘᠙ |
|
|
Koyra Chiini Songhay | khq |
Metadata |
---|
Tokenization: | L-737 | Letter: | ãõÃÕƝŋšžŊŠŽẽẼɲ | Mark: | ̃̌ |
|
Koyra Chiini Songhay {Latn} (Mali)
| khq-Latn-ML |
Metadata |
---|
Tokenization: | c-786 |
|
|
Kikuyu | ki |
Metadata |
---|
Tokenization: | L-174 | Letter: | ĩũĨŨ | Mark: | ̃ |
|
Kikuyu (Kenya)
| ki-KE |
Metadata |
---|
Tokenization: | c-830 |
|
|
Kirmanjki (individual language) | kiu |
Metadata |
---|
Tokenization: | L-567 |
|
Kirmanjki (individual language) (Turkey)
| kiu-TR |
Metadata |
---|
Tokenization: | c-568 |
|
|
Kwanyama | kj |
Metadata |
---|
Tokenization: | L-175 |
|
|
Khakas | kjh |
Metadata |
---|
Letter: | прайтиксізледјвоцяыгнмбңюьчуғхжҷэфщъПРАЙТИКСІЗЛЕДЈВОЦЯЫГНМБҢЮЬЧУҒХЖҶЭФЩЪ | Mark: | ̆ |
|
|
Kazakh | kk |
Metadata |
---|
Tokenization: | L-176 | Punctuation: | ‐–—…‘’“”«»§ | Letter: | аәбвгғдеёжзийкқлмнңоөпрстуұүфхһцчшщъыіьэюяАӘБВГҒДЕЁЖЗИЙКҚЛМНҢОӨПРСТУҰҮФХҺЦЧШЩЪЫІЬЭЮЯ |
|
Kazakh (Kazakhstan)
| kk-KZ |
|
Kazakh {Cyrl} (Kazakhstan)
| kk-Cyrl-KZ |
Metadata |
---|
Tokenization: | c-177 |
|
Kazakh {Latn} (Kazakhstan)
| kk-Latn-KZ |
Metadata |
---|
Tokenization: | c-789 |
|
|
Khün | kkh |
Metadata |
---|
Punctuation: | ᪨᪩᪪᪫ | Letter: | ᨠᨡᨣᨤᨥᨦᨧᨨᨩᨪᨫᨬᨭᨮᨯᨰᨱᨲᨳᨴᨵᨶᨷᨸᨹᨺᨻᨼᨽᨾᨿᩀᩁᩃᩅᩆᩇᩈᩉᩊᩋᩌᩍᩎᩏᩐᩑᩒᩓᩔᪧ | Mark: | ᩕᩖᩘᩙᩛᩜᩝᩞ᩠ᩡᩢᩣᩤᩥᩦᩧᩨᩩᩪᩫᩬᩭᩮᩯᩰᩱᩳᩴ᩵᩶᩺᩼ | Number: | ᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉ |
|
|
Kako | kkj |
Metadata |
---|
Punctuation: | «»…‘‹›“” | Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛnjƁƊƐNJƆŋŊɓɗɛɔ | Mark: | ̧́̀̂ |
|
Kako (Cameroon)
| kkj-CM |
|
|
Greenlandic | kl |
Metadata |
---|
Tokenization: | L-178 |
|
Greenlandic (Greenland)
| kl-GL |
Metadata |
---|
Tokenization: | c-725 |
|
|
Kalenjin | kln |
Metadata |
---|
Tokenization: | L-739 |
|
Kalenjin (Kenya)
| kln-KE |
Metadata |
---|
Tokenization: | c-791 |
|
|
Khmer | km |
Metadata |
---|
Tokenization: | L-179 | Punctuation: | ៖។៕៙៚‘’“” | Letter: | ឥឦឪឧឩឯឰឱឳឲឫឬឭឮកខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមយរឡលវសហអៗ | Mark: | ៈាិីឹឺុូួើឿៀេែៃោៅំះ៉៊់៍័្ | Number: | ១២៣៤៥៦៧៨៩ |
|
Khmer (Cambodia)
| km-KH |
Metadata |
---|
Tokenization: | c-495 |
|
|
Kimbundu | kmb |
Metadata |
---|
Tokenization: | L-742 | Punctuation: | ’ | Letter: | êâôÊÂÔ | Mark: | ̂ |
|
Kimbundu (Angola)
| kmb-AO |
Metadata |
---|
Tokenization: | c-794 |
|
|
Northern Kurdish | kmr |
Metadata |
---|
Tokenization: | L-409 | Letter: | ûîêçÛÎÊÇşŞ | Mark: | ̧̂ |
|
Northern Kurdish {Arab} (Iraq)
| kmr-Arab-IQ |
Metadata |
---|
Tokenization: | c-519 |
|
Northern Kurdish {Arab} (Iran)
| kmr-Arab-IR |
Metadata |
---|
Tokenization: | c-610 |
|
Northern Kurdish {Latn} (Syria)
| kmr-Latn-SY |
Metadata |
---|
Tokenization: | c-783 |
|
Northern Kurdish {Latn} (Turkey)
| kmr-Latn-TR |
Metadata |
---|
Tokenization: | c-518 |
|
|
Kannada | kn |
Metadata |
---|
Tokenization: | L-180 | Punctuation: | ‐–—…‘’“”′″ | Letter: | ಅಆಇಈಉಊಋೠಌೡಎಏಐಒಓಔಕಖಗಘಙಚಛಜಝಞಟಠಡಢಣತಥದಧನಪಫಬಭಮಯರಱಲವಶಷಸಹಳಽ | Mark: | ಼̃ಂಃಾಿೀುೂೃೄೆೇೈೊೋೌ್ೕೖ | Number: | ೧೨೩೪೫೬೭೮೯ |
|
Kannada (India)
| kn-IN |
Metadata |
---|
Tokenization: | c-181 |
|
|
Central Kanuri | knc |
|
|
Koongo | kng |
|
|
Konkani (individual language) | knn |
Metadata |
---|
Letter: | ॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहळऽ | Mark: | ़ंँःािीुूृॅेैॉोौ् | Number: | १२३४५६७८९ |
|
|
Korean | ko |
Metadata |
---|
Tokenization: | L-182 |
|
Korean (North Korea)
| ko-KP |
|
Korean (Korea)
| ko-KR |
Metadata |
---|
Tokenization: | c-183 |
|
|
Komi-Permyak | koi |
Metadata |
---|
Punctuation: | – | Letter: | мортпавэзлӧнбыдсиьекцяюгйучішжёщъфхМОРТПАВЭЗЛӦНБЫДСИЬЕКЦЯЮГЙУЧІШЖЁЩЪФХ | Mark: | ̈̆ |
|
|
Konkani | kok |
Metadata |
---|
Tokenization: | L-184 |
|
Konkani (India)
| kok-IN |
Metadata |
---|
Tokenization: | c-185 |
|
|
Konzo | koo |
|
|
Kosraean | kos |
Metadata |
---|
Tokenization: | L-713 |
|
|
Kpelle | kpe |
Metadata |
---|
Tokenization: | L-626 |
|
|
Kaonde | kqn |
|
|
Kanuri | kr |
Metadata |
---|
Tokenization: | L-186 |
|
Kanuri {Arab}
| kr-Arab |
Metadata |
---|
Tokenization: | L-492 |
|
Kanuri {Latn}
| kr-Latn |
Metadata |
---|
Tokenization: | L-191 |
|
|
Krio | kri |
Metadata |
---|
Tokenization: | L-627 | Punctuation: | – | Letter: | ƐƆŋŊɛɔ |
|
Krio (Sierra Leone)
| kri-SL |
|
|
Karelian | krl |
Metadata |
---|
Punctuation: | ’ | Letter: | äöÄÖčžšČŽŠ | Mark: | ̈̌ |
|
|
Kashmiri | ks |
Metadata |
---|
Tokenization: | L-187 | Punctuation: | ‰ | Letter: | ؠءآأؤابتثجحخدذرزسشصضطظعغفقلمنوٲٹپچڈڑژکگںھہۄۆیۍے | Mark: | ٓٔ | Number: | ۱۲۳۴۵۶۷۸۹ |
|
Kashmiri {Deva}
| ks-Deva |
Metadata |
---|
Punctuation: | । | Letter: | अआइईउऊऎएऐऒओऔकखगचछजटठडतथदनपफबमयरलवशसहॳॴॵॶॷ | Mark: | ँंऺऻ़ािीुूॆेैॊोौ्ॏॖॗ |
|
Kashmiri (India)
| ks-IN |
|
Kashmiri {Arab} (India)
| ks-Arab-IN |
|
Kashmiri {Deva} (India)
| ks-Deva-IN |
Metadata |
---|
Tokenization: | c-493 |
|
Kashmiri (Pakistan)
| ks-PK |
Metadata |
---|
Tokenization: | c-387 |
|
|
Shambala | ksb |
|
Shambala (Tanzania, United Republic of)
| ksb-TZ |
|
|
Bafia | ksf |
Metadata |
---|
Letter: | áéíóúÁÉÍÓÚǝƎƐƆŋŊɛɔ | Mark: | ́ |
|
Bafia (Cameroon)
| ksf-CM |
|
|
Kölsch | ksh |
Metadata |
---|
Punctuation: | ‐–—…‘‚“„†‡§⸗ | Letter: | ėœůĖŒŮåäæëößüÅÄÆËÖÜ | Mark: | ̊̈̇ |
|
Kölsch (Germany)
| ksh-DE |
|
|
Kituba (Democratic Republic of Congo) | ktu |
|
|
Kurdish | ku |
Metadata |
---|
Tokenization: | L-188 |
|
Kurdish {Arab}
| ku-Arab |
Metadata |
---|
Tokenization: | L-628 |
|
Kurdish (Iraq)
| ku-IQ |
Metadata |
---|
Tokenization: | c-726 |
|
Kurdish {Arab} (Iraq)
| ku-Arab-IQ |
|
Kurdish {Arab} (Iran)
| ku-Arab-IR |
|
Kurdish (Turkey)
| ku-TR |
Metadata |
---|
Tokenization: | c-706 |
|
|
Kunama | kun |
Metadata |
---|
Tokenization: | L-629 |
|
|
Komi | kv |
Metadata |
---|
Tokenization: | L-189 |
|
Komi (Russia)
| kv-RU |
|
|
Cornish | kw |
Metadata |
---|
Tokenization: | L-57 |
|
Cornish (United Kingdom)
| kw-GB |
Metadata |
---|
Tokenization: | c-467 |
|
|
Awa-Cuaiquer | kwi |
Metadata |
---|
Punctuation: | · | Letter: | áñëóçâùéàêÁÑËÓÇÂÙÉÀÊ | Mark: | ̧́̃̈̂̀ |
|
|
Kyrgyz | ky |
Metadata |
---|
Tokenization: | L-190 | Punctuation: | ‐–—…‘‚“„«»§ | Letter: | абгдеёжзийклмнӊоөпрстуүхчшъыэюяцңвьфАБГДЕЁЖЗИЙКЛМНӉОӨПРСТУҮХЧШЪЫЭЮЯЦҢВЬФ | Mark: | ̈̆ |
|
Kyrgyz (Kyrgyzstan)
| ky-KG |
Metadata |
---|
Tokenization: | c-191 |
|
Kyrgyz (Kazakhstan)
| ky-KZ |
Metadata |
---|
Tokenization: | c-192 |
|
|
Latin | la |
Metadata |
---|
Tokenization: | L-193 |
|
Latin (International)
| la-INT |
|
Latin (Holy See)
| la-VA |
Metadata |
---|
Tokenization: | c-770 |
|
|
Ladino | lad |
Metadata |
---|
Punctuation: | – | Letter: | íÍ | Mark: | ́ |
|
|
Langi | lag |
Metadata |
---|
Letter: | áéíóúÁÉÍÓÚƗɄɨʉ | Mark: | ́ |
|
Langi (Tanzania, United Republic of)
| lag-TZ |
|
|
Luxembourgish | lb |
Metadata |
---|
Tokenization: | L-194 | Punctuation: | «»§‐–—…‘‚“„ | Letter: | äéëêüöôàÄÉËÊÜÖÔÀ | Mark: | ̈́̂̀ |
|
Luxembourgish (Belgium)
| lb-BE |
Metadata |
---|
Tokenization: | c-503 |
|
Luxembourgish (Luxembourg)
| lb-LU |
Metadata |
---|
Tokenization: | c-504 |
|
|
Lingua Franca Nova | lfn |
|
|
Luganda | lg |
Metadata |
---|
Tokenization: | L-195 | Letter: | ŋŊ |
|
Luganda (Uganda)
| lg-UG |
Metadata |
---|
Tokenization: | c-480 |
|
|
Limburgish | li |
Metadata |
---|
Tokenization: | L-196 |
|
Limburgish (Netherlands)
| li-NL |
Metadata |
---|
Tokenization: | c-769 |
|
|
West-Central Limba | lia |
|
|
Ligurian | lij |
Metadata |
---|
Tokenization: | L-391 | Punctuation: | ’ | Letter: | çòæéùöôâîàêÇÒÆÉÙÖÔÂÎÀÊ | Mark: | ̧̀́̈̂ |
|
|
Lisu | lis |
Metadata |
---|
Punctuation: | 《》…꓾꓿ | Letter: | ꓐꓑꓒꓓꓔꓕꓖꓗꓘꓙꓚꓛꓜꓝꓞꓟꓠꓡꓢꓣꓤꓥꓦꓧꓨꓩꓪꓫꓬꓭꓮꓯꓰꓱꓲꓳꓴꓵꓶꓷꓸꓹꓺꓻꓼꓽʼˍ |
|
|
Lakota | lkt |
Metadata |
---|
Punctuation: | ́̌‐–—“” | Letter: | ʼáéíóúÁÉÍÓÚǧȟǦȞŋčšžŊČŠŽ | Mark: | ́̌ |
|
Lakota (United States)
| lkt-US |
|
|
Ladin | lld |
Metadata |
---|
Punctuation: | ’ | Letter: | ëéüêàèöìùîâôòóûËÉÜÊÀÈÖÌÙÎÂÔÒÓÛćĆ | Mark: | ̈́̂̀ |
|
|
Lombard | lmo |
Metadata |
---|
Tokenization: | L-393 |
|
|
Lingala | ln |
Metadata |
---|
Tokenization: | L-197 | Punctuation: | ’ | Letter: | áâéêíîóôúÁÂÉÊÍÎÓÔÚǎǐǒǍƐǏǑƆěĚɛɔ | Mark: | ́̂̌ |
|
Lingala (Angola)
| ln-AO |
|
Lingala (Democratic Republic of the Congo)
| ln-CD |
|
Lingala {Latn} (Democratic Republic of the Congo)
| ln-Latn-CD |
Metadata |
---|
Tokenization: | c-571 |
|
Lingala (Central African Republic)
| ln-CF |
|
Lingala (Congo)
| ln-CG |
|
Lingala {Latn} (Congo)
| ln-Latn-CG |
Metadata |
---|
Tokenization: | c-727 |
|
|
Lamnso' | lns |
Metadata |
---|
Punctuation: | ’ | Letter: | áéùìòúíóàèÁÉÙÌÒÚÍÓÀÈƏŋŊə | Mark: | ̀́ |
|
|
Lao | lo |
Metadata |
---|
Tokenization: | L-198 | Letter: | ໆກຂຄງຈສຊຍດຕຖທນບປຜຝພຟມຢຣລວຫໜໝອຮຯະາຳຽເແໂໃໄ | Mark: | ່້໊໋́໌ໍັິີຶືຸູົຼ |
|
Lao (Laos)
| lo-LA |
Metadata |
---|
Tokenization: | c-501 |
|
|
Lobi | lob |
Metadata |
---|
Letter: | àáäÀÁÄƲƖƆƐʋɩɔɛʔ | Mark: | ̀́̈ |
|
|
Otuho | lot |
|
|
Lozi | loz |
|
|
Northern Luri | lrc |
Metadata |
---|
Punctuation: | ،٫٬؛؟‐…‹›«» | Letter: | آأؤئابپتثجچحخدذرزژسشصضطظعغفڤقکگلمنھەوۉۊیؽي | Mark: | ٙٛٓٔ |
|
Northern Luri (Iraq)
| lrc-IQ |
|
Northern Luri (Iran)
| lrc-IR |
|
|
Lithuanian | lt |
Metadata |
---|
Tokenization: | L-199 | Punctuation: | ‐–—…“„ | Letter: | éÉąčęėįšųūžĄČĘĖĮŠŲŪŽ | Mark: | ̨̌̇̄́ |
|
Lithuanian (Lithuania)
| lt-LT |
Metadata |
---|
Tokenization: | c-200 |
|
|
Latgalian | ltg |
|
|
Luba-Katanga | lu |
Metadata |
---|
Tokenization: | L-201 | Letter: | áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙƐƆɛɔ | Mark: | ́̀ |
|
Luba-Katanga (Democratic Republic of the Congo)
| lu-CD |
|
|
Luba-Lulua | lua |
|
|
Luvale | lue |
|
|
Lunda | lun |
|
|
Luo (Kenya and Tanzania), Dholuo | luo |
Metadata |
---|
Tokenization: | L-746 |
|
Luo (Kenya and Tanzania), Dholuo {Latn} (Kenya)
| luo-Latn-KE |
Metadata |
---|
Tokenization: | c-799 |
|
|
Mizo, Lushai, Duhlian | lus |
Metadata |
---|
Letter: | âêûîãÂÊÛÎà | Mark: | ̂̃ |
|
Mizo, Lushai, Duhlian {Beng} (India)
| lus-Beng-IN |
|
Mizo, Lushai, Duhlian {Latn} (India)
| lus-Latn-IN |
|
|
Luyia, Oluluyia | luy |
Metadata |
---|
Tokenization: | L-397 |
|
Luyia, Oluluyia (Kenya)
| luy-KE |
|
|
Latvian | lv |
Metadata |
---|
Tokenization: | L-202 |
|
Latvian (Latvia)
| lv-LV |
Metadata |
---|
Tokenization: | c-203 |
|
|
Standard Latvian | lvs |
Metadata |
---|
Punctuation: | §‐–—…‘’‚“”„†‡′″ | Letter: | āčēģīķļņšūžĀČĒĢĪĶĻŅŠŪŽ | Mark: | ̧̄̌ |
|
|
Literary Chinese | lzh |
|
|
San Jerónimo Tecóatl Mazatec | maa |
Metadata |
---|
Tokenization: | L-583 |
|
San Jerónimo Tecóatl Mazatec (Mexico)
| maa-MX |
Metadata |
---|
Tokenization: | c-584 |
|
|
Madurese | mad |
|
Madurese {Java} (Indonesia)
| mad-Java-ID |
|
Madurese {Latn} (Indonesia)
| mad-Latn-ID |
|
|
Magahi | mag |
Metadata |
---|
Punctuation: | । | Letter: | मनवधकरलसयतषटउचबघणपगठदहभअएऔथओशईजखआडइछफढझञऐ | Mark: | ािेंु्ोी़ूौृैँ |
|
|
Maithili | mai |
Metadata |
---|
Tokenization: | L-398 | Punctuation: | ।– | Letter: | सरवभमनधकघषणटदबएतआउलजपठगअछहऐयशओचथखफइढडङईञʼ | Mark: | ा्ौिोंेँीृूुःै़ |
|
Maithili (Nepal)
| mai-NP |
Metadata |
---|
Tokenization: | c-505 |
|
|
Jalapa De Díaz Mazatec | maj |
Metadata |
---|
Tokenization: | L-585 |
|
Jalapa De Díaz Mazatec (Mexico)
| maj-MX |
Metadata |
---|
Tokenization: | c-586 |
|
|
Mam | mam |
|
|
Mandingo, Manding | man |
|
|
Chiquihuitlán Mazatec | maq |
Metadata |
---|
Tokenization: | L-587 |
|
Chiquihuitlán Mazatec (Mexico)
| maq-MX |
Metadata |
---|
Tokenization: | c-588 |
|
|
Masai | mas |
Metadata |
---|
Tokenization: | L-576 | Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛƐƗƆɄāēīŋōūĀĒĪŊŌŪɛɨɔʉ | Mark: | ́̀̂̄ |
|
Masai (Kenya)
| mas-KE |
|
Masai (Tanzania, United Republic of)
| mas-TZ |
|
|
Huautla Mazatec | mau |
Metadata |
---|
Tokenization: | L-589 |
|
Huautla Mazatec (Mexico)
| mau-MX |
Metadata |
---|
Tokenization: | c-590 |
|
|
Central Mazahua | maz |
Metadata |
---|
Letter: | ñÑ | Mark: | ̸̱̃ |
|
|
Sharanahua | mcd |
Metadata |
---|
Punctuation: | ¿ | Letter: | úíóáÚÍÓÁ | Mark: | ́ |
|
|
Matsés | mcf |
|
|
Mende (Sierra Leone) | men |
Metadata |
---|
Punctuation: | –‐ | Letter: | ƆƐŋŊɔɛ |
|
|
Meru | mer |
Metadata |
---|
Letter: | ĩũĨŨ | Mark: | ̃ |
|
Meru (Kenya)
| mer-KE |
|
|
Morisyen | mfe |
Metadata |
---|
Tokenization: | L-401 |
|
Morisyen (Mauritius)
| mfe-MU |
Metadata |
---|
Tokenization: | c-511 |
|
|
Malagasy | mg |
Metadata |
---|
Tokenization: | L-204 |
|
Malagasy (Madagascar)
| mg-MG |
Metadata |
---|
Tokenization: | c-506 |
|
|
Makhuwa-Meetto | mgh |
|
Makhuwa-Meetto (Mozambique)
| mgh-MZ |
|
|
Meta' | mgo |
Metadata |
---|
Punctuation: | ‘’“” | Letter: | ʼàèìòùÀÈÌÒÙƏƆŋŊəɔ | Mark: | ̀ |
|
Meta' (Cameroon)
| mgo-CM |
|
|
Marshallese | mh |
Metadata |
---|
Tokenization: | L-205 |
|
Marshallese (Marshall Islands)
| mh-MH |
Metadata |
---|
Tokenization: | c-510 |
|
|
Eastern Mari | mhr |
|
Eastern Mari (Russia)
| mhr-RU |
|
|
Maori | mi |
Metadata |
---|
Tokenization: | L-206 | Punctuation: | ‰ | Letter: | ĀāĒēĪīŌōŪūïÏ | Mark: | ̄̈ |
|
Maori (New Zealand)
| mi-NZ |
Metadata |
---|
Tokenization: | c-207 |
|
|
Mi'kmaq, Micmac | mic |
|
|
Mandaic | mid |
Metadata |
---|
Punctuation: | ࡞ | Letter: | ࡀࡁࡂࡃࡄࡅࡆࡇࡈࡉࡊࡋࡌࡍࡎࡏࡐࡑࡒࡓࡔࡕࡖࡗࡘ | Mark: | ࡙࡚࡛ |
|
|
Minangkabau | min |
Metadata |
---|
Tokenization: | L-606 |
|
Minangkabau {Arab} (Indonesia)
| min-Arab-ID |
|
Minangkabau {Latn} (Indonesia)
| min-Latn-ID |
Metadata |
---|
Tokenization: | c-607 |
|
|
Mískito | miq |
Metadata |
---|
Letter: | áâÁ | Mark: | ́̂ |
|
|
Macedonian | mk |
Metadata |
---|
Tokenization: | L-208 | Punctuation: | ‐–—…‘‚“„ | Letter: | абвгдѓежзѕијклљмнњопрстќуфхцчџшАБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШ | Mark: | ́ |
|
Macedonian (Macedonia)
| mk-MK |
Metadata |
---|
Tokenization: | c-209 |
|
|
Malayalam | ml |
Metadata |
---|
Tokenization: | L-210 | Punctuation: | ‘’“” | Letter: | അആഇഈഉഊഋൠഌൡഎഏഐഒഓഔകൿഖഗഘങചഛജഝഞടഠഡഢണൺതഥദധനൻപഫബഭമയരർലൽവശഷസഹളൾഴറ | Mark: | ഃംാിീുൂൃെേൈൊോൌൗ് |
|
Malayalam (India)
| ml-IN |
Metadata |
---|
Tokenization: | c-507 |
|
|
Mongolian | mn |
Metadata |
---|
Tokenization: | L-211 |
|
Mongolian {Mong} (China)
| mn-Mong-CN |
|
Mongolian (Mongolia)
| mn-MN |
Metadata |
---|
Tokenization: | c-728 |
|
Mongolian {Cyrl} (Mongolia)
| mn-Cyrl-MN |
Metadata |
---|
Tokenization: | c-566 |
|
Mongolian {Mong} (Mongolia)
| mn-Mong-MN |
|
|
Manipuri | mni |
Metadata |
---|
Tokenization: | L-727 |
|
Manipuri (India)
| mni-IN |
|
Manipuri {Beng} (India)
| mni-Beng-IN |
Metadata |
---|
Tokenization: | c-774 |
|
Manipuri {Mtei} (India)
| mni-Mtei-IN |
Metadata |
---|
Tokenization: | c-775 |
|
|
Mandinka | mnk |
Metadata |
---|
Tokenization: | L-630 |
|
|
Mon | mnw |
Metadata |
---|
Punctuation: | ၊။ | Letter: | လကၚအခရမဟပဍစတသဂဒဇနဘဝဗဓထၜယညဆဏဖဿဥဋဉဌဠ | Mark: | ိ်ောါၞုံွဲ္ဵၟဳြှူၠးဴီျ | Number: | ၁၉၄၈၀၂၃၅၆၇ |
|
|
Moldovan | mo |
Metadata |
---|
Tokenization: | L-404 |
|
Moldovan (Moldova)
| mo-MD |
Metadata |
---|
Tokenization: | c-729 |
|
|
Mohawk | moh |
Metadata |
---|
Tokenization: | L-403 |
|
Mohawk (Canada)
| moh-CA |
|
|
Mossi | mos |
Metadata |
---|
Punctuation: | ’ | Letter: | ãõÃÕƖƱƐĩũœĨŨŒẽẼɩʊɛ | Mark: | ̃ |
|
Mossi (Burkina Faso)
| mos-BF |
|
|
Marathi | mr |
Metadata |
---|
Tokenization: | L-214 | Punctuation: | ‐–—…‘’“”′″ | Letter: | ऱॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहळऽ | Mark: | ़ंँःािीुूृॅेैॉोौ् | Number: | १२३४५६७८९० |
|
Marathi (India)
| mr-IN |
Metadata |
---|
Tokenization: | c-215 |
|
|
Malay | ms |
Metadata |
---|
Tokenization: | L-216 |
|
Malay (Brunei Darussalam)
| ms-BN |
Metadata |
---|
Tokenization: | c-217 |
|
Malay (Malaysia)
| ms-MY |
Metadata |
---|
Tokenization: | c-218 |
|
Malay (Singapore)
| ms-SG |
Metadata |
---|
Tokenization: | c-708 |
|
|
Maltese | mt |
Metadata |
---|
Tokenization: | L-219 | Punctuation: | ‘’“” | Letter: | àèìòùÀÈÌÒÙċġħżĊĠĦŻ | Mark: | ̀̇ |
|
Maltese (Malta)
| mt-MT |
Metadata |
---|
Tokenization: | c-220 |
|
|
Totontepec Mixe | mto |
Metadata |
---|
Punctuation: | ’ | Letter: | äüëöéÄÜËÖÉ | Mark: | ̈́ |
|
|
Mundang | mua |
Metadata |
---|
Letter: | ãëõÃËÕǝƁƊƎĩŋĨŊṽṼɓɗ | Mark: | ̃̈ |
|
Mundang (Cameroon)
| mua-CM |
|
|
Creek | mus |
|
|
Marwari | mwr |
|
Marwari (India)
| mwr-IN |
|
|
Hmong Daw | mww |
|
|
Mozarabic | mxi |
Metadata |
---|
Punctuation: | ’ | Letter: | àùèòÀÙÈÒ | Mark: | ̀ |
|
|
Jamiltepec Mixtec | mxt |
Metadata |
---|
Tokenization: | L-631 |
|
|
Burmese | my |
Metadata |
---|
Tokenization: | L-50 | Punctuation: | ၏၊။၍၌၎‘’“” | Letter: | ကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဣဤဥဦဧဩဪဿ | Mark: | ာါိီုူေဲံျြွှ့္်း | Number: | ၁၉၄၈၀၂၃၅၆၇ |
|
Burmese (Myanmar)
| my-MM |
Metadata |
---|
Tokenization: | c-458 |
|
Burmese {zwgy} (Myanmar)
| my-zwgy-MM |
Metadata |
---|
Tokenization: | c-751 |
|
|
Ixcatlán Mazatec | mzi |
Metadata |
---|
Tokenization: | L-591 | Punctuation: | ’ | Letter: | áñíóéÁÑÍÓÉ | Mark: | ́̃ |
|
Ixcatlán Mazatec (Mexico)
| mzi-MX |
Metadata |
---|
Tokenization: | c-592 |
|
|
Mazanderani | mzn |
Metadata |
---|
Punctuation: | ،٫٬؛؟‐…‹›«» | Letter: | ءآأؤئابپةتثجچحخدذرزژسشصضطظعغفقکگلمنهویي | Mark: | ًٌٍّٔٓ |
|
Mazanderani (Iran)
| mzn-IR |
|
|
Nauru | na |
Metadata |
---|
Tokenization: | L-221 |
|
Nauru (Nauru)
| na-NR |
Metadata |
---|
Tokenization: | c-514 |
|
|
Nahuatl languages | nah |
|
Nahuatl languages (Mexico)
| nah-MX |
|
|
Neapolitan | nap |
Metadata |
---|
Tokenization: | L-750 |
|
Neapolitan (Italy)
| nap-IT |
Metadata |
---|
Tokenization: | c-810 |
|
|
Khoekhoe, Nama (Namibia) | naq |
Metadata |
---|
Letter: | ǀǁǂǃâîôûÂÎÔÛ | Mark: | ̂ |
|
Khoekhoe, Nama (Namibia) (Namibia)
| naq-NA |
|
|
Norwegian Bokmål | nb |
Metadata |
---|
Tokenization: | L-222 | Punctuation: | «»§– | Letter: | àéóòôæøåÀÉÓÒÔÆØÅ | Mark: | ̀́̂̊ |
|
Norwegian Bokmål (Norway)
| nb-NO |
Metadata |
---|
Tokenization: | c-223 |
|
Norwegian Bokmål (Svalbard and Jan Mayen)
| nb-SJ |
|
|
Nyemba | nba |
|
|
Central Huasteca Nahuatl | nch |
|
|
North Ndebele | nd |
Metadata |
---|
Tokenization: | L-224 |
|
North Ndebele (Zimbabwe)
| nd-ZW |
Metadata |
---|
Tokenization: | c-520 |
|
|
Low German, Low Saxon | nds |
Metadata |
---|
Tokenization: | L-395 | Punctuation: | ’ | Letter: | åäöüÅÄÖÜ | Mark: | ̊̈ |
|
|
Nepali | ne |
Metadata |
---|
Tokenization: | L-225 |
|
Nepali (India)
| ne-IN |
|
Nepali (Nepal)
| ne-NP |
Metadata |
---|
Tokenization: | c-516 |
|
|
Ndonga | ng |
Metadata |
---|
Tokenization: | L-226 |
|
Ndonga (Namibia)
| ng-NA |
Metadata |
---|
Tokenization: | c-515 |
|
|
Guerrero Nahuatl | ngu |
|
|
Nganasan | nio |
Metadata |
---|
Punctuation: | ” | Letter: | нерәзытбуоясикаӈҫүдйхлмпвгөъцьчэщжюНЕРӘЗЫТБУОЯСИКАӇҪҮДЙХЛМПВГӨЪЦЬЧЭЩЖЮ | Mark: | ̆ |
|
|
Niuean | niu |
|
|
Bouna Kulango | nku |
Metadata |
---|
Punctuation: | ’ | Letter: | ƖƆƐƝƲŋŊɩɔɛɲʋ |
|
|
Dutch | nl |
Metadata |
---|
Tokenization: | L-227 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | áäéëíïóöúüÁÄÉËÍÏÓÖÚÜ | Mark: | ́̈ |
|
Dutch (Aruba)
| nl-AW |
|
Dutch (Belgium)
| nl-BE |
Metadata |
---|
Tokenization: | c-228 |
|
Dutch (Bonaire, Sint Eustatius and Saba)
| nl-BQ |
|
Dutch (Curaçao)
| nl-CW |
|
Dutch (Germany)
| nl-DE |
Metadata |
---|
Tokenization: | c-826 |
|
Dutch (Netherlands)
| nl-NL |
Metadata |
---|
Tokenization: | c-229 |
|
Dutch (Suriname)
| nl-SR |
Metadata |
---|
Tokenization: | c-430 |
|
Dutch (Sint Maarten (Dutch part))
| nl-SX |
|
|
Flemish, Vlaams | nld |
|
Flemish, Vlaams (Belgium)
| nld-BE |
|
Flemish, Vlaams (Netherlands)
| nld-NL |
|
|
Kwasio | nmg |
Metadata |
---|
Letter: | áâäéêíîïóôöúûÁÂÄÉÊÍÎÏÓÔÖÚÛǎǝǐǒǔǍƁƎƐǏǑƆǓāěēīńŋōŕūĀĚĒĪŃŊŌŔŪɓɛɔ | Mark: | ́̂̌̄̈ |
|
Kwasio (Cameroon)
| nmg-CM |
|
|
Norwegian Nynorsk | nn |
Metadata |
---|
Tokenization: | L-230 | Punctuation: | ‰ | Letter: | àéóòôæøåÀÉÓÒÔÆØÅ | Mark: | ̀́̂̊ |
|
Norwegian Nynorsk (Norway)
| nn-NO |
Metadata |
---|
Tokenization: | c-231 |
|
|
Ngiemboon | nnh |
Metadata |
---|
Punctuation: | «»‘’ | Letter: | ʼáàâéèêíìóòôúùûÿÁÀÂÉÈÊÍÌÓÒÔÚÙÛǎǒǔǍƐǑƆǓɄěńŋĚŃŊŸḿẅḾẄɛɔʉ | Mark: | ́̀̂̌̈ |
|
Ngiemboon (Cameroon)
| nnh-CM |
|
|
Norwegian | no |
Metadata |
---|
Tokenization: | L-232 |
|
Norwegian (Norway)
| no-NO |
Metadata |
---|
Tokenization: | c-730 |
|
|
Northern Thai | nod |
Metadata |
---|
Punctuation: | ᪨᪩᪪᪫ | Letter: | ᨠᨡᨢᨣᨤᨥᨦᨧᨨᨩᨪᨫᨬᨭᨮᨯᨰᨱᨲᨳᨴᨵᨶᨷᨸᨹᨺᨻᨼᨽᨾᨿᩀᩁᩃᩅᩆᩇᩈᩉᩊᩋᩌᩍᩎᩏᩐᩑᩓᩔᪧ | Mark: | ᩕᩖᩘᩙᩛᩝᩞ᩠ᩡᩢᩣᩤᩥᩦᩧᩨᩩᩪᩫᩬᩮᩯᩰᩱᩲᩳᩴ᩵᩶᩺᩻ | Number: | ᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉ |
|
|
Nomatsiguenga | not |
Metadata |
---|
Letter: | ëíáóñËÍÁÓÑ | Mark: | ̈́̃ |
|
|
Nepali (individual language) | npi |
Metadata |
---|
Punctuation: | । | Letter: | ॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसहऽ | Mark: | ़ँंःािीुूृॅेैॉोौ् | Number: | १२३४५६७८९० |
|
|
N'Ko, N’Ko | nqo |
Metadata |
---|
Punctuation: | ߷߸߹﴾﴿،؛؟⸜⸝ | Letter: | ߊߋߌߍߎߏߐߑߒߓߔߕߖߗߘߙߚߛߜߝߞߟߠߡߢߣߤߥߦߧߴߵߺ | Mark: | ߲߽߫߬߭߮߯߰߱߳ | Number: | ߀߁߂߃߄߅߆߇߈߉ |
|
N'Ko, N’Ko (Guinea)
| nqo-GN |
|
|
South Ndebele | nr |
Metadata |
---|
Tokenization: | L-233 |
|
South Ndebele (South Africa)
| nr-ZA |
Metadata |
---|
Tokenization: | c-539 |
|
|
Pedi, Northern Sotho, Sepedi | nso |
Metadata |
---|
Tokenization: | L-410 | Letter: | šŠ | Mark: | ̌ |
|
Pedi, Northern Sotho, Sepedi (Cameroon)
| nso-CM |
Metadata |
---|
Tokenization: | c-765 |
|
Pedi, Northern Sotho, Sepedi (South Africa)
| nso-ZA |
Metadata |
---|
Tokenization: | c-235 |
|
|
Nuer | nus |
Metadata |
---|
Letter: | äëïöÄËÏÖƐƔƆŋŊɛɣɔ | Mark: | ̱̈ |
|
Nuer (South Sudan)
| nus-SS |
|
|
Navajo | nv |
Metadata |
---|
Tokenization: | L-236 | Letter: | ʼéóáíÉÓÁÍǫǪąłįęĄŁĮĘ | Mark: | ̨́ |
|
|
Chewa | ny |
Metadata |
---|
Tokenization: | L-55 |
|
Chewa (Malawi)
| ny-MW |
Metadata |
---|
Tokenization: | c-464 |
|
Chewa (Zimbabwe)
| ny-ZW |
Metadata |
---|
Tokenization: | c-465 |
|
|
Nyamwezi | nym |
|
|
Nyankole | nyn |
|
Nyankole (Uganda)
| nyn-UG |
|
|
Nzima | nzi |
|
|
Orok | oaa |
Metadata |
---|
Punctuation: | – | Letter: | ƝūŪɲԩԨчипалнесдкробуӡгэӈмхтөвӯзЧИПАЛНЕСДКРОБУӠГЭӇМХТӨВӮЗ | Mark: | ̄ |
|
|
Occitan | oc |
Metadata |
---|
Tokenization: | L-237 | Punctuation: | «»’— | Letter: | óèéçàïòìùúâêîëáôüûÓÈÉÇÀÏÒÌÙÚÂÊÎËÁÔÜÛ | Mark: | ̧́̀̈̂ |
|
Occitan (France)
| oc-FR |
Metadata |
---|
Tokenization: | c-731 |
|
|
Ojibwa | oj |
Metadata |
---|
Tokenization: | L-238 |
|
|
Northwestern Ojibwa | ojb |
Metadata |
---|
Letter: | ᐯᒪᑎᓯᑦᑌᐸᑫᑕᑯᐎᓇᓐᒥᐌᑲᒃᔭᐊᓂᐃᔑᑭᔝᐤᐅᑾᐱᔦᑐᐗᒣᒋᐁᔅᓱᓀᓄᒧᓭᔥᐨᑡᔕᓴᓶᓉᐺᓪᑉᐼᑴᑄᒐᒬᔐᔗᑺᔡᒻᒡᑶ |
|
|
Okiek | oki |
|
|
Oromo | om |
Metadata |
---|
Tokenization: | L-239 |
|
Oromo (Ethiopia)
| om-ET |
Metadata |
---|
Tokenization: | c-523 |
|
Oromo (Kenya)
| om-KE |
|
|
Oriya, Odia | or |
Metadata |
---|
Tokenization: | L-240 |
|
Oriya, Odia (India)
| or-IN |
Metadata |
---|
Tokenization: | c-522 |
|
Oriya, Odia {Latn} (India)
| or-Latn-IN |
Metadata |
---|
Tokenization: | c-811 |
|
|
Odia, Oriya (individual language) | ory |
Metadata |
---|
Letter: | ଅଆଇଈଉଊଋଏଐଓଔକଖଗଘଙଚଛଜଝଞଟଠଡଢଣତଥଦଧନପଫବଭମଯୟରଲଳଵୱଶଷସହ | Mark: | ଼ଁଂଃାିୀୁୂୃେୈୋୌ୍ୖୗ | Number: | ୧୨୩୪୫୬୭୮୯ |
|
|
Ossetian | os |
Metadata |
---|
Tokenization: | L-241 |
|
Ossetian (Georgia)
| os-GE |
|
Ossetian (Russia)
| os-RU |
Metadata |
---|
Tokenization: | c-524 |
|
|
Osage | osa |
Metadata |
---|
Letter: | 𐒰𐒱𐒲𐒳𐒴𐒵𐒶𐒷𐒸𐒹𐒺𐒻𐒼𐒽𐒾𐒿𐓀𐓁𐓂𐓃𐓄𐓅𐓆𐓇𐓈𐓉𐓊𐓋𐓌𐓍𐓎𐓏𐓐𐓑𐓒𐓓𐓘𐓙𐓚𐓛𐓜𐓝𐓞𐓟𐓠𐓡𐓢𐓣𐓤𐓥𐓦𐓧𐓨𐓩𐓪𐓫𐓬𐓭𐓮𐓯𐓰𐓱𐓲𐓳𐓴𐓵𐓶𐓷𐓸𐓹𐓺𐓻 | Mark: | ̄́̋͘ |
|
|
Ottoman Turkish | ota |
Metadata |
---|
Tokenization: | L-715 |
|
|
Mezquital Otomi | ote |
Metadata |
---|
Letter: | öüäéñúíáèÖÜÄÉÑÚÍÁÈ | Mark: | ̱̈́̃̀ |
|
|
Querétaro Otomi | otq |
|
|
Punjabi | pa |
Metadata |
---|
Tokenization: | L-242 | Punctuation: | ‐–—‘’“”′″। | Letter: | ੴਉਊਓਅਆਐਔਇਈਏਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਲ਼ਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ | Mark: | ੱੰ਼੍ਾਿੀੁੂੇੈੋੌਂ | Number: | ੧੨੩੪੫੬੭੮੯ |
|
Punjabi {Arab}
| pa-Arab |
Metadata |
---|
Punctuation: | ‰ | Letter: | ءآؤئابتثجحخدذرزسشصضطظعغفقلمنهويٹپچڈڑژکگںھہیے | Mark: | ُٓٔ |
|
Punjabi (India)
| pa-IN |
Metadata |
---|
Tokenization: | c-243 |
|
Punjabi (Pakistan)
| pa-PK |
Metadata |
---|
Tokenization: | c-412 |
|
Punjabi {Arab} (Pakistan)
| pa-Arab-PK |
Metadata |
---|
Tokenization: | c-647 |
|
|
Pangasinan | pag |
|
Pangasinan (Philippines)
| pag-PH |
|
|
Pampanga, Kapampangan | pam |
|
|
Papiamento | pap |
Metadata |
---|
Tokenization: | L-632 | Punctuation: | ’ | Letter: | ñÑ | Mark: | ̃ |
|
Papiamento (Caribbean)
| pap-CB |
|
|
Palauan | pau |
Metadata |
---|
Tokenization: | L-633 |
|
|
Páez | pbb |
Metadata |
---|
Letter: | üëäïáÜËÄÏÁ | Mark: | ̈́ |
|
|
Northern Pashto | pbu |
Metadata |
---|
Punctuation: | ٫٬٪؉‰ | Letter: | آاأءبپتټثجځچڅحخدډذرړزژږسشښصضطظعغفقکګگلمنڼهةوؤیيېۍئ | Mark: | ًٌٍَُِّْٰٔٓ | Number: | ۱۲۳۴۵۶۷۸۹ |
|
|
Picard | pcd |
Metadata |
---|
Tokenization: | L-634 | Letter: | èåûîéôçÈÅÛÎÉÔÇ | Mark: | ̧̀̊̂́ |
|
|
Nigerian Pidgin | pcm |
Metadata |
---|
Tokenization: | L-408 |
|
Nigerian Pidgin (Nigeria)
| pcm-NG |
Metadata |
---|
Tokenization: | c-517 |
|
|
Iranian Persian | pes |
Metadata |
---|
Punctuation: | ٫٬٪؉،؛؟‰‐…‹›«» | Letter: | آاءأؤئبپتثجچحخدذرزژسشصضطظعغفقکگلمنوهةیإي | Mark: | ًٌٍِّٕٔٓ | Number: | ۱۲۳۴۵۶۷۸۹ |
|
|
Pali | pi |
Metadata |
---|
Tokenization: | L-244 |
|
|
Pijin | pis |
Metadata |
---|
Tokenization: | L-635 |
|
|
Pintupi-Luritja | piu |
|
|
Polish | pl |
Metadata |
---|
Tokenization: | L-245 | Punctuation: | «»§‐–—…”„†‡′″ | Letter: | óÓąćęłńśźżĄĆĘŁŃŚŹŻ | Mark: | ̨́̇ |
|
Polish (Poland)
| pl-PL |
Metadata |
---|
Tokenization: | c-246 |
|
|
Plateau Malagasy | plt |
Metadata |
---|
Letter: | àâéèêëìîïñôÀÂÉÈÊËÌÎÏÑÔ | Mark: | ̀̂́̈̃ |
|
|
Pam | pmn |
|
|
Western Panjabi | pnb |
Metadata |
---|
Punctuation: | ‐–—‘’“”′″ | Letter: | ءآؤئابپتثٹجچحخدذڈرزڑژسشصضطظعغفقکگلمنںهھہویےي | Mark: | ُٓٔ |
|
|
Pohnpeian | pon |
Metadata |
---|
Tokenization: | L-716 |
|
|
Pipil, Nicarao | ppl |
Metadata |
---|
Letter: | áéÁÉ | Mark: | ́ |
|
|
Prussian | prg |
Metadata |
---|
Punctuation: | ‐–—…“„ | Letter: | țȚāēģīķņōŗšūžĀĒĢĪĶŅŌŖŠŪŽḑḐ | Mark: | ̧̦̄̌ |
|
Prussian (International)
| prg-INT |
|
|
Ashéninka Perené | prq |
Metadata |
---|
Punctuation: | ¿ | Letter: | íÍ | Mark: | ́ |
|
|
Dari, Afghan Persian | prs |
Metadata |
---|
Tokenization: | L-361 | Punctuation: | ،‐ | Letter: | اعلمیهجنحقوبشرصدسزآکئثتذضخپگظفغطأچژءي | Mark: | ًٔٓ | Number: | ۱۹۴۸۲۳۵۶۷۰ |
|
Dari, Afghan Persian (Afghanistan)
| prs-AF |
Metadata |
---|
Tokenization: | c-469 |
|
|
Pashto | ps |
Metadata |
---|
Tokenization: | L-247 |
|
Pashto (Afghanistan)
| ps-AF |
Metadata |
---|
Tokenization: | c-248 |
|
|
Portuguese | pt |
Metadata |
---|
Tokenization: | L-249 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | áàâãçéêíóòôõúºÁÀÂÃÇÉÊÍÓÒÔÕÚ | Mark: | ̧́̀̂̃ |
|
Portuguese (Africa)
| pt-002 |
|
Portuguese (Angola)
| pt-AO |
Metadata |
---|
Tokenization: | c-411 |
|
Portuguese (Brazil)
| pt-BR |
Metadata |
---|
Tokenization: | c-250 |
|
Portuguese (Cabo Verde)
| pt-CV |
|
Portuguese (Guinea-Bissau)
| pt-GW |
|
Portuguese (Macau)
| pt-MO |
|
Portuguese (Mozambique)
| pt-MZ |
Metadata |
---|
Tokenization: | c-732 |
|
Portuguese (Portugal)
| pt-PT |
Metadata |
---|
Tokenization: | c-251 |
|
Portuguese (Sao Tome and Principe)
| pt-ST |
|
Portuguese (Timor-Leste)
| pt-TL |
|
|
Hinglish | qhi |
|
Hinglish (India)
| qhi-IN |
|
|
Simple Hindi | qsh |
|
Simple Hindi {Deva} (India)
| qsh-Deva-IN |
|
|
Taiwanese Hokkien | qtg |
|
Taiwanese Hokkien {Hant} (Taiwan)
| qtg-Hant-TW |
|
|
Thoda English | qth |
|
Thoda English {Deva} (India)
| qth-Deva-IN |
|
|
Quechua | qu |
Metadata |
---|
Tokenization: | L-252 |
|
Quechua (Bolivia)
| qu-BO |
Metadata |
---|
Tokenization: | c-253 |
|
Quechua (Ecuador)
| qu-EC |
Metadata |
---|
Tokenization: | c-254 |
|
Quechua (Peru)
| qu-PE |
Metadata |
---|
Tokenization: | c-255 |
|
|
K'iche', Quiché | quc |
Metadata |
---|
Tokenization: | L-388 |
|
K'iche', Quiché (Guatemala)
| quc-GT |
Metadata |
---|
Tokenization: | c-496 |
|
K'iche', Quiché {Latn} (Guatemala)
| quc-Latn-GT |
|
K'iche', Quiché (Peru)
| quc-PE |
Metadata |
---|
Tokenization: | c-525 |
|
|
Ayacucho Quechua | quy |
|
Ayacucho Quechua (Peru)
| quy-PE |
|
|
Cusco Quechua | quz |
|
Cusco Quechua (Bolivia)
| quz-BO |
|
Cusco Quechua (Ecuador)
| quz-EC |
|
Cusco Quechua (Peru)
| quz-PE |
|
|
Puno Quechua | qxp |
Metadata |
---|
Punctuation: | ‰ | Letter: | Ññʼ | Mark: | ̃ |
|
|
Quenya | qya |
Metadata |
---|
Tokenization: | L-413 |
|
|
Rarotongan, Cook Islands Maori | rar |
|
|
Rohingya | rhg |
Metadata |
---|
Tokenization: | L-754 |
|
Rohingya {Rohg} (Myanmar)
| rhg-Rohg-MM |
Metadata |
---|
Tokenization: | c-825 |
|
|
Rakhine | rki |
Metadata |
---|
Tokenization: | L-572 |
|
Rakhine (Myanmar)
| rki-MM |
Metadata |
---|
Tokenization: | c-573 |
|
|
Romansh | rm |
Metadata |
---|
Tokenization: | L-256 | Letter: | àüöéèìòùÀÜÖÉÈÌÒÙ | Mark: | ̀̈́ |
|
Romansh (Switzerland)
| rm-CH |
Metadata |
---|
Tokenization: | c-526 |
|
|
Balkan Romani | rmn |
Metadata |
---|
Letter: | àõùèìòâÀÕÙÈÌÒÂƟśěćŕăąňűźőģůščžŚĚĆŔĂĄŇŰŹŐĢŮŠČŽɵ | Mark: | ̨̧̀́̌̃̆̋̂̊ |
|
|
Rundi | rn |
Metadata |
---|
Tokenization: | L-257 |
|
Rundi (Burundi)
| rn-BI |
Metadata |
---|
Tokenization: | c-498 |
|
|
Romanian | ro |
Metadata |
---|
Tokenization: | L-258 | Punctuation: | «»‐–—…‘“”„ | Letter: | âîÂÎșțȘȚăĂ | Mark: | ̦̆̂ |
|
Romanian (Moldova)
| ro-MD |
Metadata |
---|
Tokenization: | c-260 |
|
Romanian (Romania)
| ro-RO |
Metadata |
---|
Tokenization: | c-259 |
|
|
Rombo | rof |
|
Rombo (Tanzania, United Republic of)
| rof-TZ |
|
|
Romany | rom |
Metadata |
---|
Tokenization: | L-414 |
|
|
Russian | ru |
Metadata |
---|
Tokenization: | L-261 | Punctuation: | ‐–—…‘‚“„«»§ | Letter: | всеобщаядклрципчнтзгшюйьмуыхъжэфёВСЕОБЩАЯДКЛРЦИПЧНТЗГШЮЙЬМУЫХЪЖЭФЁ | Mark: | ̆̈ |
|
Russian (Belarus)
| ru-BY |
|
Russian (Estonia)
| ru-EE |
Metadata |
---|
Tokenization: | c-655 |
|
Russian (Israel)
| ru-IL |
Metadata |
---|
Tokenization: | c-656 |
|
Russian (Kyrgyzstan)
| ru-KG |
|
Russian (Kazakhstan)
| ru-KZ |
|
Russian (Latvia)
| ru-LV |
Metadata |
---|
Tokenization: | c-657 |
|
Russian (Moldova)
| ru-MD |
Metadata |
---|
Tokenization: | c-263 |
|
Russian (Russia)
| ru-RU |
Metadata |
---|
Tokenization: | c-262 |
|
Russian {Latn} (Russia)
| ru-Latn-RU |
Metadata |
---|
Tokenization: | c-817 |
|
Russian (Ukraine)
| ru-UA |
Metadata |
---|
Tokenization: | c-415 |
|
Russian (Uzbekistan)
| ru-UZ |
Metadata |
---|
Tokenization: | c-824 |
|
|
Rusyn | rue |
Metadata |
---|
Tokenization: | L-416 |
|
Rusyn (Ukraine)
| rue-UA |
Metadata |
---|
Tokenization: | c-527 |
|
|
Macedo-Romanian, Aromanian, Arumanian | rup |
Metadata |
---|
Letter: | ãâÃÂ | Mark: | ̃̂ |
|
|
Kinyarwanda | rw |
Metadata |
---|
Tokenization: | L-264 |
|
Kinyarwanda (Rwanda)
| rw-RW |
Metadata |
---|
Tokenization: | c-497 |
|
|
Rwa | rwk |
|
Rwa (Tanzania, United Republic of)
| rwk-TZ |
|
|
Sanskrit | sa |
Metadata |
---|
Tokenization: | L-265 | Punctuation: | । | Letter: | मनवधकरणजगतअभघषयपचशसएछबदटडहइआञउठथलढऽ | Mark: | ािंो्ूेुौैीृॄ़ |
|
Sanskrit (India)
| sa-IN |
Metadata |
---|
Tokenization: | c-266 |
|
|
Yakut | sah |
Metadata |
---|
Letter: | абгҕдьийклмнҥоөпрстуүхһчыэецязювщъжфАБГҔДЬИЙКЛМНҤОӨПРСТУҮХҺЧЫЭЕЦЯЗЮВЩЪЖФ | Mark: | ̆ |
|
Yakut (Russia)
| sah-RU |
|
|
Samburu | saq |
|
Samburu (Kenya)
| saq-KE |
|
|
Santali | sat |
Metadata |
---|
Tokenization: | L-417 |
|
Santali (India)
| sat-IN |
Metadata |
---|
Tokenization: | c-529 |
|
|
Sangu (Tanzania) | sbp |
|
Sangu (Tanzania) (Tanzania, United Republic of)
| sbp-TZ |
|
|
Sardinian | sc |
Metadata |
---|
Tokenization: | L-267 |
|
Sardinian (Italy)
| sc-IT |
Metadata |
---|
Tokenization: | c-530 |
|
|
Sicilian | scn |
|
Sicilian (Italy)
| scn-IT |
|
|
Scots | sco |
Metadata |
---|
Tokenization: | L-418 |
|
Scots (United Kingdom)
| sco-GB |
Metadata |
---|
Tokenization: | c-531 |
|
|
Sindhi | sd |
Metadata |
---|
Tokenization: | L-268 | Punctuation: | ‰ | Letter: | آابٻپڀتثٺٽٿجھڃڄچڇحخدذڊڌڍڏرزڙسشصضطظعغفڦقکڪگڱڳلمنڻهوي | Mark: | ٓ |
|
Sindhi {Deva} (India)
| sd-Deva-IN |
Metadata |
---|
Tokenization: | c-772 |
|
Sindhi (Pakistan)
| sd-PK |
Metadata |
---|
Tokenization: | c-733 |
|
Sindhi {Arab} (Pakistan)
| sd-Arab-PK |
Metadata |
---|
Tokenization: | c-535 |
|
Sindhi {Deva} (Pakistan)
| sd-Deva-PK |
Metadata |
---|
Tokenization: | c-605 |
|
|
Southern Kurdish | sdh |
Metadata |
---|
Tokenization: | L-499 |
|
Southern Kurdish {Arab} (Iran)
| sdh-Arab-IR |
Metadata |
---|
Tokenization: | c-500 |
|
|
Northern Sami | se |
Metadata |
---|
Tokenization: | L-269 | Letter: | áÁčđŋšŧžČĐŊŠŦŽ | Mark: | ́̌ |
|
Northern Sami (Finland)
| se-FI |
Metadata |
---|
Tokenization: | c-270 |
|
Northern Sami (Norway)
| se-NO |
Metadata |
---|
Tokenization: | c-271 |
|
Northern Sami (Sweden)
| se-SE |
Metadata |
---|
Tokenization: | c-272 |
|
|
Sena | seh |
Metadata |
---|
Letter: | áàâãçéêíóòôõúÁÀÂÃÇÉÊÍÓÒÔÕÚ | Mark: | ̧́̀̂̃ |
|
Sena (Mozambique)
| seh-MZ |
|
|
Koyraboro Senni Songhai | ses |
Metadata |
---|
Letter: | ãõÃÕƝŋšžŊŠŽẽẼɲ | Mark: | ̃̌ |
|
Koyraboro Senni Songhai (Mali)
| ses-ML |
|
|
Secoya | sey |
Metadata |
---|
Letter: | ëñàéËÑÀÉ | Mark: | ̱̈̃̀́ |
|
|
Sango | sg |
Metadata |
---|
Tokenization: | L-273 | Letter: | âäêëîïôöùûüÂÄÊËÎÏÔÖÙÛÜ | Mark: | ̂̈̀ |
|
Sango (Central African Republic)
| sg-CF |
Metadata |
---|
Tokenization: | c-528 |
|
|
Tachelhit | shi |
Metadata |
---|
Letter: | ⴰⴱⴳⵯⴷⴹⴻⴼⴽⵀⵃⵄⵅⵇⵉⵊⵍⵎⵏⵓⵔⵕⵖⵙⵚⵛⵜⵟⵡⵢⵣⵥ |
|
Tachelhit {Latn}
| shi-Latn |
Metadata |
---|
Letter: | ḍḥṛṣṭḌḤṚṢṬƐƔɛɣʷ | Mark: | ̣ |
|
Tachelhit {Latn} (Morocco)
| shi-Latn-MA |
|
Tachelhit {Tfng} (Morocco)
| shi-Tfng-MA |
|
|
Shilluk | shk |
Metadata |
---|
Letter: | ÀÁÄÈÉËÌÍÏÓÖØÙÚàáäèéëìíïóöøùú | Mark: | ́̈̀ |
|
|
Shan | shn |
Metadata |
---|
Tokenization: | L-743 | Punctuation: | ။၊ | Letter: | လၵပၼၽဝငသဢတမၸၾႁယၶၺထရ | Mark: | ိ်ႈုၢႇွႆူးဵီႊႅႃႉေႂႄြ |
|
Shan {Mymr} (Myanmar)
| shn-Mymr-MM |
Metadata |
---|
Tokenization: | c-795 |
|
|
Shipibo-Conibo | shp |
Metadata |
---|
Punctuation: | ¿ | Letter: | íáóéñúÍÁÓÉÑÚ | Mark: | ́̃ |
|
|
Sinhala | si |
Metadata |
---|
Tokenization: | L-274 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | අආඇඈඉඊඋඌඍඑඒඓඔඕඖකඛගඝඞඟචඡජඣඥටඨඩඪණඬතථදධනඳපඵබභමඹයරලවශෂසහළෆ | Mark: | ංඃ්ාැෑිීුූෘෙේෛොෝෞෟ |
|
Sinhala (Sri Lanka)
| si-LK |
Metadata |
---|
Tokenization: | c-536 |
|
|
Sidama | sid |
|
Sidama {Latn} (Ethiopia)
| sid-Latn-ET |
|
|
Epena | sja |
|
|
Sindarin | sjn |
Metadata |
---|
Tokenization: | L-717 |
|
|
Slovak | sk |
Metadata |
---|
Tokenization: | L-275 | Punctuation: | ‐–…‘‚“„§ | Letter: | čďĺľňŕšťžűČĎĹĽŇŔŠŤŽŰáäéíóôúýÁÄÉÍÓÔÚÝ | Mark: | ́̈̌̂̋ |
|
Slovak (Slovakia)
| sk-SK |
Metadata |
---|
Tokenization: | c-276 |
|
|
Saraiki, Seraiki | skr |
Metadata |
---|
Punctuation: | ۔، | Letter: | انسیحقودعلمشرپہڱھےکڄئتڻزںگڈفظجچبڑصڋخڔٹطآذضغةثٻي | Mark: | ُٔٓ | Number: | ۱۲۳۴۵۶۷۸۹۰ |
|
|
Slovenian | sl |
Metadata |
---|
Tokenization: | L-277 | Letter: | čšžČŠŽ | Mark: | ̌ |
|
Slovenian (Slovenia)
| sl-SI |
Metadata |
---|
Tokenization: | c-278 |
|
|
Samoan | sm |
Metadata |
---|
Tokenization: | L-279 |
|
Samoan (Samoa)
| sm-WS |
Metadata |
---|
Tokenization: | c-735 |
|
|
Southern Sami | sma |
Metadata |
---|
Tokenization: | L-637 |
|
Southern Sami (Norway)
| sma-NO |
|
Southern Sami (Sweden)
| sma-SE |
|
|
Lule Sami | smj |
Metadata |
---|
Tokenization: | L-638 |
|
Lule Sami (Norway)
| smj-NO |
|
Lule Sami (Sweden)
| smj-SE |
|
|
Inari Sami | smn |
Metadata |
---|
Tokenization: | L-639 | Letter: | âäáÂÄÁčđŋšžČĐŊŠŽ | Mark: | ̂̌̈́ |
|
Inari Sami (Finland)
| smn-FI |
|
|
Skolt Sami | sms |
Metadata |
---|
Tokenization: | L-640 |
|
Skolt Sami (Finland)
| sms-FI |
|
|
Shona | sn |
Metadata |
---|
Tokenization: | L-280 |
|
Shona (Zimbabwe)
| sn-ZW |
Metadata |
---|
Tokenization: | c-534 |
|
Shona {Latn} (Zimbabwe)
| sn-Latn-ZW |
|
|
Soninke | snk |
Metadata |
---|
Tokenization: | L-641 | Letter: | ñÑŋŊ | Mark: | ̃ |
|
Soninke {Latn} (Mali)
| snk-Latn-ML |
|
|
Siona | snn |
Metadata |
---|
Letter: | ëñíäéËÑÍÄÉ | Mark: | ̱̈̃́ |
|
|
Somali | so |
Metadata |
---|
Tokenization: | L-281 |
|
Somali (Djibouti)
| so-DJ |
Metadata |
---|
Tokenization: | c-752 |
|
Somali (Ethiopia)
| so-ET |
Metadata |
---|
Tokenization: | c-753 |
|
Somali (Kenya)
| so-KE |
Metadata |
---|
Tokenization: | c-754 |
|
Somali (Somalia)
| so-SO |
Metadata |
---|
Tokenization: | c-537 |
|
|
Songhai languages | son |
Metadata |
---|
Tokenization: | L-420 |
|
|
Sabaot | spy |
|
|
Albanian | sq |
|
Albanian (Albania)
| sq-AL |
|
Albanian (Macedonia)
| sq-MK |
|
|
Serbian | sr |
Metadata |
---|
Tokenization: | L-282 | Punctuation: | ‐–…‘‚“„ | Letter: | абвгдђежзијклљмнњопрстћуфхцчџшАБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ |
|
Serbian {Cyrl}
| sr-Cyrl |
Metadata |
---|
Tokenization: | L-287 |
|
Serbian {Latn}
| sr-Latn |
Metadata |
---|
Tokenization: | L-283 | Punctuation: | ‐–…‘‚“„ | Letter: | čćžđšČĆŽĐŠ | Mark: | ̌́ |
|
Serbian {Cyrl} (Bosnia and Herzegovina)
| sr-Cyrl-BA |
Metadata |
---|
Tokenization: | c-285 |
|
Serbian {Latn} (Bosnia and Herzegovina)
| sr-Latn-BA |
Metadata |
---|
Tokenization: | c-284 |
|
Serbian {Cyrl} (Montenegro)
| sr-Cyrl-ME |
Metadata |
---|
Tokenization: | c-289 |
|
Serbian {Latn} (Montenegro)
| sr-Latn-ME |
Metadata |
---|
Tokenization: | c-288 |
|
Serbian {Cyrl} (Serbia)
| sr-Cyrl-RS |
Metadata |
---|
Tokenization: | c-290 |
|
Serbian {Latn} (Serbia)
| sr-Latn-RS |
Metadata |
---|
Tokenization: | c-286 |
|
|
Logudorese Sardinian | src |
Metadata |
---|
Letter: | òìàèùÒÌÀÈÙ | Mark: | ̀ |
|
|
Serer | srr |
Metadata |
---|
Letter: | ñÑƭƴƊƁƬƳŋćŊĆṕṔɗɓ | Mark: | ̃́ |
|
|
Swati | ss |
Metadata |
---|
Tokenization: | L-291 |
|
Swati (Swaziland)
| ss-SZ |
|
Swati (South Africa)
| ss-ZA |
|
|
Saho | ssy |
|
Saho (Eritrea)
| ssy-ER |
|
|
Southern Sotho | st |
Metadata |
---|
Tokenization: | L-292 |
|
Southern Sotho (Lesotho)
| st-LS |
|
Southern Sotho (South Africa)
| st-ZA |
Metadata |
---|
Tokenization: | c-540 |
|
|
Siberian Tatar | sty |
Metadata |
---|
Tokenization: | L-602 |
|
Siberian Tatar (Russia)
| sty-RU |
Metadata |
---|
Tokenization: | c-549 |
|
|
Sundanese | su |
Metadata |
---|
Tokenization: | L-293 |
|
Sundanese {Sund}
| su-Sund |
Metadata |
---|
Letter: | ᮊᮋᮌᮍᮎᮏᮐᮑᮒᮓᮔᮕᮖᮗᮘᮙᮚᮛᮜᮝᮞᮟᮠᮮᮯᮃᮄᮅᮆᮇᮈᮉ | Mark: | ᮡᮢᮣᮀᮁᮂᮤᮥᮦᮧᮨᮩ᮪ | Number: | ᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹ |
|
Sundanese {Latn} (Indonesia)
| su-Latn-ID |
Metadata |
---|
Tokenization: | c-541 |
|
Sundanese {Sund} (Indonesia)
| su-Sund-ID |
Metadata |
---|
Tokenization: | c-542 |
|
|
Sukuma | suk |
|
|
Susu | sus |
|
|
Swedish | sv |
Metadata |
---|
Tokenization: | L-294 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | àéåäöÀÉÅÄÖ | Mark: | ̀́̊̈ |
|
Swedish (Åland Islands)
| sv-AX |
|
Swedish (Finland)
| sv-FI |
Metadata |
---|
Tokenization: | c-295 |
|
Swedish (Sweden)
| sv-SE |
Metadata |
---|
Tokenization: | c-296 |
|
|
Swahili | sw |
Metadata |
---|
Tokenization: | L-297 |
|
Swahili (Democratic Republic of the Congo)
| sw-CD |
|
Swahili (Democratic Republic of the Congo; Kinshasa)
| sw-CD-KN |
|
Swahili (Kenya)
| sw-KE |
Metadata |
---|
Tokenization: | c-298 |
|
Swahili (Somalia)
| sw-SO |
Metadata |
---|
Tokenization: | c-736 |
|
Swahili (Tanzania, United Republic of)
| sw-TZ |
Metadata |
---|
Tokenization: | c-737 |
|
Swahili (Uganda)
| sw-UG |
Metadata |
---|
Tokenization: | c-738 |
|
|
Maore Comorian | swb |
Metadata |
---|
Letter: | ãÃƁƊĩĨẽẼɓɗ | Mark: | ̃ |
|
|
Swahili (individual language), Kiswahili | swh |
|
|
Sutu | sx |
Metadata |
---|
Tokenization: | L-642 |
|
|
Classical Syriac | syc |
Metadata |
---|
Punctuation: | ،؛.؟܀܁܂܃܄܅܆܇܈܉܊܋܌܍ | Letter: | ܐܝܘܦܒܬܛܕܟܓܩܥܣܤܨܫܙܚܗܡܢܪܠـ | Mark: | ّܼܸܹܻܾܷܱܴ݂̥̣݄̤݈̱̭̮ܿܲܵܺܽܶܰܳ݁̊݀̇݃̈݇̄݉݊ |
|
|
Syriac | syr |
Metadata |
---|
Tokenization: | L-299 |
|
Syriac (Syria)
| syr-SY |
Metadata |
---|
Tokenization: | c-300 |
|
Syriac (Turkey)
| syr-TR |
Metadata |
---|
Tokenization: | c-739 |
|
|
Silesian | szl |
Metadata |
---|
Tokenization: | L-611 |
|
Silesian (Poland)
| szl-PL |
Metadata |
---|
Tokenization: | c-761 |
|
|
Tamil | ta |
Metadata |
---|
Tokenization: | L-301 | Punctuation: | “”‘’ | Letter: | ஃஅஆஇஈஉஊஎஏஐஒஓஔகஙசஜஞடணதநனபமயரறலளழவஶஷஸஹ | Mark: | ாிீுூெேைொோௌ்ௗ |
|
Tamil (India)
| ta-IN |
Metadata |
---|
Tokenization: | c-302 |
|
Tamil {Latn} (India)
| ta-Latn-IN |
Metadata |
---|
Tokenization: | c-814 |
|
Tamil (Sri Lanka)
| ta-LK |
Metadata |
---|
Tokenization: | c-581 |
|
Tamil (Malaysia)
| ta-MY |
|
Tamil (Singapore)
| ta-SG |
|
|
Tamasheq | taq |
Metadata |
---|
Tokenization: | L-749 |
|
Tamasheq {Latn} (Mali)
| taq-Latn-ML |
Metadata |
---|
Tokenization: | c-805 |
|
Tamasheq {Tfng} (Mali)
| taq-Tfng-ML |
Metadata |
---|
Tokenization: | c-804 |
|
|
Atayal | tay |
Metadata |
---|
Tokenization: | L-353 |
|
Atayal (Taiwan)
| tay-TW |
Metadata |
---|
Tokenization: | c-447 |
|
|
Tagbanwa | tbw |
Metadata |
---|
Punctuation: | ᜵᜶ | Letter: | ᝩᝦᝣᝪᝧᝤᝰᝫᝨᝥᝯᝮᝬᝠᝡᝢ | Mark: | ᝲᝳ |
|
Tagbanwa {Tagb}
| tbw-Tagb |
Metadata |
---|
Punctuation: | ᜵᜶ | Letter: | ᝩᝦᝣᝪᝧᝤᝰᝫᝨᝥᝯᝮᝬᝠᝡᝢ | Mark: | ᝲᝳ |
|
|
Ditammari | tbz |
Metadata |
---|
Letter: | úàóãìùÚÀÓÃÌÙƉƐƆũŋĩŨŊĨɖɛɔ | Mark: | ̃́̀ |
|
|
Ticuna | tca |
Metadata |
---|
Letter: | üéãñõúáíóÜÉÃÑÕÚÁÍÓĩũĨŨẽṯḏṉẼṮḎṈ | Mark: | ̱̃́̈͟ |
|
|
Tai Nüa | tdd |
Metadata |
---|
Letter: | ᥐᥑᥒᥓᥔᥕᥖᥗᥘᥙᥚᥛᥜᥝᥞᥟᥠᥡᥢᥣᥤᥥᥦᥧᥨᥩᥪᥫᥬᥭᥰᥱᥲᥳᥴ |
|
|
Tetun Dili | tdt |
Metadata |
---|
Punctuation: | ’ | Letter: | áíúóÁÍÚÓ | Mark: | ́ |
|
|
Telugu | te |
Metadata |
---|
Tokenization: | L-303 | Punctuation: | ‘’“” | Letter: | అఆఇఈఉఊఋఎఏఐఒఓఔకఖగఘఙచఛజఝఞటఠడఢణతథదధనపఫబభమయరఱలళవశషసహ | Mark: | ంఃాిీుూృెేైొోౌ్ౖ |
|
Telugu (India)
| te-IN |
Metadata |
---|
Tokenization: | c-304 |
|
Telugu {Latn} (India)
| te-Latn-IN |
Metadata |
---|
Tokenization: | c-813 |
|
|
Timne | tem |
Metadata |
---|
Punctuation: | ‐ | Letter: | ɅƆƏƐŋŊʌɔəɛ |
|
|
Teso | teo |
Metadata |
---|
Tokenization: | L-579 |
|
Teso (Kenya)
| teo-KE |
|
Teso (Uganda)
| teo-UG |
Metadata |
---|
Tokenization: | c-580 |
|
|
Tetum | tet |
Metadata |
---|
Tokenization: | L-643 |
|
Tetum (Indonesia)
| tet-ID |
Metadata |
---|
Tokenization: | c-740 |
|
Tetum (Timor-Leste)
| tet-TL |
Metadata |
---|
Tokenization: | c-741 |
|
|
Tajik | tg |
Metadata |
---|
Tokenization: | L-305 | Punctuation: | ‰ | Letter: | эъломияуҳқбашрпегфтднзкхсвӣёҷчғюӯйжьЭЪЛОМИЯУҲҚБАШРПЕГФТДНЗКХСВӢЁҶЧҒЮӮЙЖЬ | Mark: | ̄̈̆ |
|
Tajik (Tajikistan)
| tg-TJ |
Metadata |
---|
Tokenization: | c-544 |
|
Tajik {Cyrl} (Tajikistan)
| tg-Cyrl-TJ |
|
|
Thai | th |
Metadata |
---|
Tokenization: | L-306 | Punctuation: | ‐–—‘’“”…′″๏๚๛ | Letter: | ฯๆกขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮะาๅำเแโใไ | Mark: | ์็่้๊๋ัิีึืุู | Number: | ๑๒๓๔๕๖๗๘๙ |
|
Thai (Thailand)
| th-TH |
Metadata |
---|
Tokenization: | c-307 |
|
|
Tigrinya | ti |
Metadata |
---|
Tokenization: | L-308 | Punctuation: | ፣፡’ | Letter: | ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖሗመሙሚማሜምሞሟሠሡሢሣሤሥሦሧረሩሪራሬርሮሯሰሱሲሳሴስሶሷሸሹሺሻሼሽሾሿቀቁቂቃቄቅቆቈቊቋቌቍቐቑቒቓቔቕቖቘቚቛቜቝበቡቢባቤብቦቧቨቩቪቫቬቭቮቯተቱቲታቴትቶቷቸቹቺቻቼችቾቿኀኁኂኃኄኅኆኈኊኋኌኍነኑኒናኔንኖኗኘኙኚኛኜኝኞኟአኡኢኣኤእኦኧከኩኪካኬክኮኰኲኳኴኵኸኹኺኻኼኽኾዀዂዃዄዅወዉዊዋዌውዎዐዑዒዓዔዕዖዘዙዚዛዜዝዞዟዠዡዢዣዤዥዦዧየዩዪያዬይዮደዱዲዳዴድዶዷጀጁጂጃጄጅጆጇገጉጊጋጌግጎጐጒጓጔጕጠጡጢጣጤጥጦጧጨጩጪጫጬጭጮጯጰጱጲጳጴጵጶጷጸጹጺጻጼጽጾጿፀፁፂፃፄፅፆፇፈፉፊፋፌፍፎፏፐፑፒፓፔፕፖፗ | Mark: | ፟ |
|
Tigrinya (Eritrea)
| ti-ER |
Metadata |
---|
Tokenization: | c-827 |
|
Tigrinya (Ethiopia)
| ti-ET |
Metadata |
---|
Tokenization: | c-551 |
|
|
Tigre | tig |
|
Tigre (Eritrea)
| tig-ER |
|
|
Tiv | tiv |
|
|
Turkmen | tk |
Metadata |
---|
Tokenization: | L-309 | Punctuation: | §–—…“”‐‰ | Letter: | çäöüýÇÄÖÜÝžňşŽŇŞ | Mark: | ̧̈̌́ |
|
Turkmen {Cyrl}
| tk-Cyrl |
Metadata |
---|
Punctuation: | ‐– | Letter: | адмхуклрынңәиецясгшбптчвзэоҗйөүъюжфёАДМХУКЛРЫНҢӘИЕЦЯСГШБПТЧВЗЭОҖЙӨҮЪЮЖФЁ | Mark: | ̆̈ |
|
Turkmen (Turkmenistan)
| tk-TM |
Metadata |
---|
Tokenization: | c-554 |
|
|
Tagalog | tl |
Metadata |
---|
Tokenization: | L-310 |
|
Tagalog (Philippines)
| tl-PH |
Metadata |
---|
Tokenization: | c-311 |
|
|
Klingon | tlh |
Metadata |
---|
Tokenization: | L-390 |
|
|
Talysh | tly |
Metadata |
---|
Letter: | çÇƏığşİĞŞə | Mark: | ̧̇̆ |
|
|
Tswana | tn |
Metadata |
---|
Tokenization: | L-312 | Punctuation: | ·‐ | Letter: | šŠ | Mark: | ̌ |
|
Tswana (Botswana)
| tn-BW |
Metadata |
---|
Tokenization: | c-742 |
|
Tswana (South Africa)
| tn-ZA |
Metadata |
---|
Tokenization: | c-313 |
|
|
Tonga | to |
Metadata |
---|
Tokenization: | L-314 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | āēīōūĀĒĪŌŪáéíóúÁÉÍÓÚʻ | Mark: | ́̄ |
|
Tonga (Tonga)
| to-TO |
Metadata |
---|
Tokenization: | c-743 |
|
|
Toba | tob |
Metadata |
---|
Tokenization: | L-756 | Letter: | ỹỸíÍ | Mark: | ̃́ |
|
Toba (Argentina)
| tob-AR |
Metadata |
---|
Tokenization: | c-831 |
|
|
Tonga (Zambia) | toi |
|
|
Tojolabal | toj |
|
|
Papantla Totonac | top |
|
|
Tok Pisin | tpi |
Metadata |
---|
Tokenization: | L-644 |
|
Tok Pisin (Papua New Guinea)
| tpi-PG |
|
|
Turkish | tr |
Metadata |
---|
Tokenization: | L-315 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | çöüâÇÖÜÂğışĞŞİ | Mark: | ̧̇̆̈̂ |
|
Turkish (Cyprus)
| tr-CY |
Metadata |
---|
Tokenization: | c-426 |
|
Turkish (Turkey)
| tr-TR |
Metadata |
---|
Tokenization: | c-316 |
|
|
Turoyo, Surayt | tru |
Metadata |
---|
Punctuation: | ،؛؟܆܇ | Letter: | ܐܝܘܦܒܬܛܕܟܓܩܫܔܣܨܙܚܥܗܡܢܪܠ | Mark: | ܷܱ̰݂݆ܶܳܰ݁݅ |
|
|
Tsonga | ts |
Metadata |
---|
Tokenization: | L-317 | Punctuation: | ’ | Letter: | ìàçãòèùíéáúÌÀÇÃÒÈÙÍÉÁÚ | Mark: | ̧̀̃́ |
|
Tsonga (South Africa)
| ts-ZA |
Metadata |
---|
Tokenization: | c-552 |
|
Tsonga (Zimbabwe)
| ts-ZW |
Metadata |
---|
Tokenization: | c-553 |
|
|
Tausug | tsg |
Metadata |
---|
Tokenization: | L-736 |
|
Tausug {Arab} (Philippines)
| tsg-Arab-PH |
Metadata |
---|
Tokenization: | c-787 |
|
Tausug {Latn} (Philippines)
| tsg-Latn-PH |
Metadata |
---|
Tokenization: | c-788 |
|
|
Purepecha | tsz |
Metadata |
---|
Letter: | áïéíÁÏÉÍⱭƲŋŊɑʋ | Mark: | ́̈ |
|
|
Tatar | tt |
Metadata |
---|
Tokenization: | L-318 | Letter: | кешхоуларынңгмидцясбәтьвзпөъһҗчүйфюэжКЕШХОУЛАРЫНҢГМИДЦЯСБӘТЬВЗПӨЪҺҖЧҮЙФЮЭЖёщЁЩ | Mark: | ̈̆ |
|
Tatar (Russia)
| tt-RU |
Metadata |
---|
Tokenization: | c-744 |
|
Tatar {Cyrl} (Russia)
| tt-Cyrl-RU |
Metadata |
---|
Tokenization: | c-319 |
|
|
Tumbuka | tum |
|
Tumbuka {Latn}
| tum-Latn |
|
Tumbuka {Mwng}
| tum-Mwng |
|
|
Tuvalu | tvl |
|
|
Twi | tw |
Metadata |
---|
Tokenization: | L-320 |
|
Twi (Ghana)
| tw-GH |
|
|
Tasawaq | twq |
Metadata |
---|
Letter: | ɲẽẼŋšžŊŠŽƝãõÃÕ | Mark: | ̃̌ |
|
Tasawaq (Niger)
| twq-NE |
|
|
Tahitian | ty |
Metadata |
---|
Tokenization: | L-321 | Letter: | āūōēīĀŪŌĒĪ | Mark: | ̄ |
|
Tahitian (French Polynesia)
| ty-PF |
Metadata |
---|
Tokenization: | c-543 |
|
|
Tuvinian | tyv |
Metadata |
---|
Letter: | кижнңэргелбүтуазычдьсмяоюцхпшөйвъфёКИЖНҢЭРГЕЛБҮТУАЗЫЧДЬСМЯОЮЦХПШӨЙВЪФЁ | Mark: | ̆̈ |
|
|
Tzeltal | tzh |
|
|
Central Atlas Tamazight | tzm |
Metadata |
---|
Letter: | ɛɣḍḥṛṣṭẓỵḌḤṚṢṬẒỴƐƔâéçÂÉÇʷ | Mark: | ̧̣̂́ |
|
Central Atlas Tamazight {Tfng}
| tzm-Tfng |
Metadata |
---|
Tokenization: | L-709 |
|
Central Atlas Tamazight {Latn} (Algeria)
| tzm-Latn-DZ |
|
Central Atlas Tamazight {Arab} (Morocco)
| tzm-Arab-MA |
|
Central Atlas Tamazight {Latn} (Morocco)
| tzm-Latn-MA |
|
Central Atlas Tamazight {Tfng} (Morocco)
| tzm-Tfng-MA |
Metadata |
---|
Tokenization: | c-545 |
|
|
Tzotzil | tzo |
Metadata |
---|
Punctuation: | ’ | Letter: | óáéíúÓÁÉÍÚ | Mark: | ́ |
|
|
Udmurt | udm |
|
|
Uyghur | ug |
Metadata |
---|
Tokenization: | L-322 | Punctuation: | ،؛ | Letter: | ئاەبپتجچخدرزژسشغفقكگڭلمنھوۇۆۈۋېىي | Mark: | ٔ |
|
Uyghur {Latn}
| ug-Latn |
Metadata |
---|
Punctuation: | ’ | Letter: | öéüÖÉÜ | Mark: | ̈́ |
|
Uyghur (China)
| ug-CN |
Metadata |
---|
Tokenization: | c-745 |
|
Uyghur {Arab} (China)
| ug-Arab-CN |
Metadata |
---|
Tokenization: | c-556 |
|
Uyghur {Cyrl} (Kazakhstan)
| ug-Cyrl-KZ |
Metadata |
---|
Tokenization: | c-557 |
|
|
Ukrainian | uk |
Metadata |
---|
Tokenization: | L-323 | Punctuation: | –’“„‐«»§ | Letter: | абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯʼ | Mark: | ̈̆ |
|
Ukrainian (Ukraine)
| uk-UA |
Metadata |
---|
Tokenization: | c-324 |
|
Ukrainian {Latn} (Ukraine)
| uk-Latn-UA |
Metadata |
---|
Tokenization: | c-818 |
|
|
Umbundu | umb |
Metadata |
---|
Tokenization: | L-741 | Punctuation: | ’ | Letter: | ñêãîõâÑÊÃÎÕ | Mark: | ̃̂ |
|
Umbundu (Angola)
| umb-AO |
Metadata |
---|
Tokenization: | c-793 |
|
|
Undetermined | und |
Metadata |
---|
Tokenization: | L-604 |
|
|
Urdu | ur |
Metadata |
---|
Tokenization: | L-325 | Punctuation: | ،؍٫٬؛؟۔”“٪ | Letter: | اآبپتٹثجچحخدڈذرڑزژسشصضطظعغفقکگلمنںوؤہۂھءیئےهي | Mark: | ًَُِّٰٔٓ | Number: | ۱۲۳۴۵۶۷۸۹ |
|
Urdu (India)
| ur-IN |
Metadata |
---|
Tokenization: | c-746 |
|
Urdu (Pakistan)
| ur-PK |
Metadata |
---|
Tokenization: | c-326 |
|
Urdu {Latn} (Pakistan)
| ur-Latn-PK |
Metadata |
---|
Tokenization: | c-823 |
|
|
Urarina | ura |
Metadata |
---|
Letter: | úóíÚÓÍ | Mark: | ́ |
|
|
Uzbek | uz |
Metadata |
---|
Tokenization: | L-327 |
|
Uzbek (Afghanistan)
| uz-AF |
Metadata |
---|
Tokenization: | c-747 |
|
Uzbek {Cyrl} (Uzbekistan)
| uz-Cyrl-UZ |
Metadata |
---|
Tokenization: | c-329 |
|
Uzbek {Latn} (Uzbekistan)
| uz-Latn-UZ |
Metadata |
---|
Tokenization: | c-328 |
|
|
Northern Uzbek | uzn |
Metadata |
---|
Punctuation: | ‐–—…‘’“”„′″«»§ | Letter: | ʻʼ |
|
Northern Uzbek {Arab}
| uzn-Arab |
Metadata |
---|
Punctuation: | ؉٪٫٬ | Letter: | ءآأؤئابةتثجحخدذرزسشصضطظعغفقلمنهويپچژکگۇۉی | Mark: | ًٌٍَُِّْٰٓٔ | Number: | ۱۲۳۴۵۶۷۸۹ |
|
Northern Uzbek {Cyrl}
| uzn-Cyrl |
Metadata |
---|
Punctuation: | ‐– | Letter: | инсоҳуқлармждекцяйбшгтўвэъпчзёфхюғИНСОҲУҚЛАРМЖДЕКЦЯЙБШГТЎВЭЪПЧЗЁФХЮҒ | Mark: | ̆̈ |
|
|
Vai | vai |
Metadata |
---|
Letter: | ꔀꔁꔂꔃꔄꔅꔆꔇꔈꔉꔊꔋꔌꔍꔎꔏꔐꔑꔒꔓꔔꔕꔖꔗꔘꔙꔚꔛꔜꔝꔞꔟꔠꔡꔢꔣꔤꔥꔦꔧꔨꔩꔪꔫꔬꔭꔮꔯꔰꔱꔲꔳꔴꔵꔶꔷꔸꔹꔺꔻꔼꔽꔾꔿꕀꕁꕂꕃꕄꕅꕆꕇꕈꕉꕊꕋꕌꕍꕎꕏꕐꕑꕒꕓꕔꕕꕖꕗꕘꕙꕚꕛꕜꕝꕞꕟꕠꕡꕢꕣꕤꕥꕦꕧꕨꕩꕪꕫꕬꕭꕮꕯꕰꕱꕲꕳꕴꕵꕶꕷꕸꕹꕺꕻꕼꕽꕾꕿꖀꖁꖂꖃꖄꖅꖆꖇꖈꖉꖊꖋꖌꖍꖎꖏꖐꖑꖒꖓꖔꖕꖖꖗꖘꖙꖚꖛꖜꖝꖞꖟꖠꖡꖢꖣꖤꖥꖦꖧꖨꖩꖪꖫꖬꖭꖮꖯꖰꖱꖲꖳꖴꖵꖶꖷꖸꖹꖺꖻꖼꖽꖾꖿꗀꗁꗂꗃꗄꗅꗆꗇꗈꗉꗊꗋꗌꗍꗎꗏꗐꗑꗒꗓꗔꗕꗖꗗꗘꗙꗚꗛꗜꗝꗞꗟꗠꗡꗢꗣꗤꗥꗦꗧꗨꗩꗪꗫꗬꗭꗮꗯꗰꗱꗲꗳꗴꗵꗶꗷꗸꗹꗺꗻꗼꗽꗾꗿꘀꘁꘂꘃꘄꘅꘆꘇꘈꘉꘊꘋꘌ |
|
Vai {Latn}
| vai-Latn |
Metadata |
---|
Letter: | áãéíóõúÁÃÉÍÓÕÚƁƊƐƆĩŋũĨŊŨẽẼɓɗɛɔ | Mark: | ́̃ |
|
Vai {Vaii} (Liberia)
| vai-Vaii-LR |
|
|
Venda | ve |
Metadata |
---|
Tokenization: | L-330 | Punctuation: | “” | Letter: | ṱḽḓṅṋṰḼḒṄṊ | Mark: | ̭̇ |
|
Venda (South Africa)
| ve-ZA |
Metadata |
---|
Tokenization: | c-559 |
|
|
Venetian | vec |
Metadata |
---|
Punctuation: | ’— | Letter: | óàòèùéìçÓÀÒÈÙÉÌÇƚȽđĐ | Mark: | ̧́̀ |
|
Venetian (Italy)
| vec-IT |
|
|
Veps | vep |
Metadata |
---|
Punctuation: | ’ | Letter: | üäöÜÄÖšžčŠŽČ | Mark: | ̈̌ |
|
|
Vietnamese | vi |
Metadata |
---|
Tokenization: | L-331 | Punctuation: | §‐–—…‘’“”†‡′″ | Letter: | àãáâèéêìíòõóôùúýÀÃÁÂÈÉÊÌÍÒÕÓÔÙÚÝơưƠƯăđĩũĂĐĨŨảạằẳẵắặầẩẫấậẻẽẹềểễếệỉịỏọồổỗốộờởỡớợủụừửữứựỳỷỹỵẢẠẰẲẴẮẶẦẨẪẤẬẺẼẸỀỂỄẾỆỈỊỎỌỒỔỖỐỘỜỞỠỚỢỦỤỪỬỮỨỰỲỶỸỴ | Mark: | ̛̣̀̉̃́̆̂ |
|
Vietnamese (Vietnam)
| vi-VN |
Metadata |
---|
Tokenization: | c-332 |
|
|
Soyaltepec Mazatec | vmp |
Metadata |
---|
Tokenization: | L-593 |
|
Soyaltepec Mazatec (Mexico)
| vmp-MX |
Metadata |
---|
Tokenization: | c-594 |
|
|
Makhuwa | vmw |
Metadata |
---|
Punctuation: | ’… | Letter: | çõãÇÕà | Mark: | ̧̃ |
|
|
Ayautla Mazatec | vmy |
Metadata |
---|
Tokenization: | L-595 |
|
Ayautla Mazatec (Mexico)
| vmy-MX |
Metadata |
---|
Tokenization: | c-596 |
|
|
Mazatlán Mazatec | vmz |
Metadata |
---|
Tokenization: | L-597 |
|
Mazatlán Mazatec (Mexico)
| vmz-MX |
Metadata |
---|
Tokenization: | c-598 |
|
|
Volapük | vo |
Metadata |
---|
Tokenization: | L-333 | Punctuation: | «»§‐–—…‘’“” | Letter: | äöüÄÖÜ | Mark: | ̈ |
|
Volapük (International)
| vo-INT |
|
|
Võro | vro |
Metadata |
---|
Tokenization: | L-752 |
|
Võro (Estonia)
| vro-EE |
Metadata |
---|
Tokenization: | c-820 |
|
|
Vunjo | vun |
|
Vunjo (Tanzania, United Republic of)
| vun-TZ |
|
|
Walloon | wa |
Metadata |
---|
Tokenization: | L-334 | Letter: | éåèûîôâêçàÉÅÈÛÎÔÂÊÇÀ | Mark: | ̧́̊̀̂ |
|
Walloon (Belgium)
| wa-BE |
Metadata |
---|
Tokenization: | c-560 |
|
|
Walser | wae |
Metadata |
---|
Letter: | áäãéíóöõúüÁÄÃÉÍÓÖÕÚÜčšũČŠŨ | Mark: | ́̈̃̌ |
|
Walser (Switzerland)
| wae-CH |
|
|
Wolaytta, Wolaitta | wal |
|
Wolaytta, Wolaitta (Ethiopia)
| wal-ET |
|
|
Waray (Philippines) | war |
Metadata |
---|
Tokenization: | L-608 |
|
Waray (Philippines) {Latn} (Philippines)
| war-Latn-PH |
Metadata |
---|
Tokenization: | c-609 |
|
|
Sorbian languages | wen |
Metadata |
---|
Tokenization: | L-636 |
|
|
Cameroon Pidgin | wes |
Metadata |
---|
Tokenization: | L-735 |
|
Cameroon Pidgin (Cameroon)
| wes-CM |
Metadata |
---|
Tokenization: | c-785 |
|
|
Wolof | wo |
Metadata |
---|
Tokenization: | L-335 | Punctuation: | ‰ | Letter: | ëñàéóËÑÀÉÓŋŊ | Mark: | ̈̃̀́ |
|
Wolof (Gambia)
| wo-GM |
Metadata |
---|
Tokenization: | c-563 |
|
Wolof (Senegal)
| wo-SN |
Metadata |
---|
Tokenization: | c-562 |
|
|
Waama | wwa |
Metadata |
---|
Letter: | ãìàùèÃÌÀÙÈǹƆƐǸũŋŨŊɔɛ | Mark: | ̃̀ |
|
|
Xhosa | xh |
Metadata |
---|
Tokenization: | L-336 |
|
Xhosa (South Africa)
| xh-ZA |
Metadata |
---|
Tokenization: | c-337 |
|
|
Kangri | xnr |
Metadata |
---|
Tokenization: | L-729 |
|
Kangri {Deva} (India)
| xnr-Deva-IN |
Metadata |
---|
Tokenization: | c-777 |
|
|
Soga | xog |
|
Soga (Uganda)
| xog-UG |
|
|
Liberia Kpelle | xpe |
Metadata |
---|
Letter: | ƐƁƆƝƏĝŋĜŊɛɓɔɲə | Mark: | ̂ |
|
|
Kasem | xsm |
|
|
Yagua | yad |
Metadata |
---|
Letter: | ñíéáÑÍÉÁ | Mark: | ̃́ |
|
|
Yao | yao |
|
Yao (Malawi)
| yao-MW |
|
|
Yapese | yap |
Metadata |
---|
Tokenization: | L-718 | Punctuation: | ‐ | Letter: | ʼ |
|
|
Yangben | yav |
Metadata |
---|
Letter: | áàâéèíìîóòôúùûÁÀÂÉÈÍÌÎÓÒÔÚÙÛǎǒǔǍƐǑƆǓāīŋōūĀĪŊŌŪɛɔ | Mark: | ́̀̂̌̄ |
|
Yangben (Cameroon)
| yav-CM |
|
|
Eastern Yiddish | ydd |
Metadata |
---|
Punctuation: | ׳״־‐–— | Letter: | אבגדזשהויחטײכךלמםנןסעפףצץקרתװױ | Mark: | ִַָּֿׂ |
|
|
Yiddish | yi |
Metadata |
---|
Tokenization: | L-338 |
|
Yiddish (Germany)
| yi-DE |
Metadata |
---|
Tokenization: | c-767 |
|
Yiddish (Israel)
| yi-IL |
Metadata |
---|
Tokenization: | c-564 |
|
Yiddish (International)
| yi-INT |
|
Yiddish (United States)
| yi-US |
Metadata |
---|
Tokenization: | c-748 |
|
|
Northern Yukaghir | ykg |
Metadata |
---|
Letter: | эльистачйкөдҥнбпрумогецяҕхжѳқзвфыющЭЛЬИСТАЧЙКӨДҤНБПРУМОГЕЦЯҔХЖѲҚЗВФЫЮЩ | Mark: | ̆ |
|
|
Maay Maay | ymm |
Metadata |
---|
Tokenization: | L-645 |
|
Maay Maay (Somalia)
| ymm-SO |
|
|
Yoruba | yo |
Metadata |
---|
Tokenization: | L-339 | Punctuation: | ‐ | Letter: | áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙńŃẹọṣẸỌṢ | Mark: | ̩̣́̀̄ |
|
Yoruba (Benin)
| yo-BJ |
|
Yoruba (Nigeria)
| yo-NG |
Metadata |
---|
Tokenization: | c-565 |
|
|
Yucateco, Yucatec Maya | yua |
Metadata |
---|
Punctuation: | ‐ | Letter: | ʼóíáúéÓÍÁÚÉ | Mark: | ́ |
|
|
Yue Chinese, Cantonese | yue |
Metadata |
---|
Tokenization: | L-722 |
|
Yue Chinese, Cantonese (China)
| yue-CN |
|
Yue Chinese, Cantonese (Hong Kong)
| yue-HK |
Metadata |
---|
Tokenization: | c-758 |
|
|
Zhuang | za |
Metadata |
---|
Tokenization: | L-340 |
|
Zhuang (China)
| za-CN |
|
|
Miahuatlán Zapotec | zam |
Metadata |
---|
Letter: | óáñíÓÁÑÍʼ | Mark: | ́̃ |
|
|
Ngazidja Comorian | zdj |
|
|
Standard Moroccan Tamazight | zgh |
Metadata |
---|
Letter: | ⴰⵍⵖⵓⵎⴹⵏⵉⵣⵔⴼⴳⴷⵊⴱⵜⵡⴽⵢⵙⵀⵛⵥⵇⵯⴻⵕⵟⵃⵄⵅⵚ |
|
Standard Moroccan Tamazight {Tfng} (Morocco)
| zgh-Tfng-MA |
Metadata |
---|
Tokenization: | c-546 |
|
|
Chinese | zh |
Metadata |
---|
Tokenization: | L-343 |
|
Chinese (China)
| zh-CN |
Metadata |
---|
Tokenization: | c-344 |
|
Chinese {Hans} (China)
| zh-Hans-CN |
|
Chinese (Hong Kong)
| zh-HK |
Metadata |
---|
Tokenization: | c-345 |
|
Chinese {Hans} (Hong Kong)
| zh-Hans-HK |
|
Chinese (Macau)
| zh-MO |
Metadata |
---|
Tokenization: | c-346 |
|
Chinese {Hans} (Macau)
| zh-Hans-MO |
|
Chinese (Malaysia)
| zh-MY |
Metadata |
---|
Tokenization: | c-431 |
|
Chinese (Singapore)
| zh-SG |
Metadata |
---|
Tokenization: | c-347 |
|
Chinese (Taiwan)
| zh-TW |
Metadata |
---|
Tokenization: | c-348 |
|
|
Malay (individual language) | zlm |
|
Malay (individual language) {Arab}
| zlm-Arab |
Metadata |
---|
Punctuation: | ، | Letter: | ڤراشتهنحقسيمأجڬدبوڽڠعفكلچخظصزطۏؤئذ | Mark: | ٔ | Number: | ٢ |
|
|
Zou | zom |
Metadata |
---|
Tokenization: | L-646 |
|
|
Záparo | zro |
|
|
Güilá Zapotec | ztu |
Metadata |
---|
Letter: | ëíéËÍÉ | Mark: | ̈́ |
|
|
Zulu | zu |
Metadata |
---|
Tokenization: | L-341 |
|
Zulu (South Africa)
| zu-ZA |
Metadata |
---|
Tokenization: | c-342 |
|
|
No linguistic content, Not applicable | zxx |
Metadata |
---|
Tokenization: | L-603 |
|
|
Yongbei Zhuang | zyb |
|
|
Zaza, Dimili, Dimli, Kirdki, Kirmanjki, Zazaki | zza |
Metadata |
---|
Tokenization: | L-725 |
|
Zaza, Dimili, Dimli, Kirdki, Kirmanjki, Zazaki (Turkey)
| zza-TR |
Metadata |
---|
Tokenization: | c-768 |
|