|
| Afar | aa |
|
| Afar (Djibouti)
| aa-DJ |
|
| Afar (Dominica)
| aa-DM |
|
| Afar (Eritrea)
| aa-ER |
|
| Afar (Ethiopia)
| aa-ET |
| Metadata |
|---|
| Tokenization: | c-441 |
|
|
| Abkhazian | ab |
| Metadata |
|---|
| Tokenization: | L-1 | | Punctuation: | –‐ | | Letter: | ЏАБВГДЕЖЗИКЛМНОПРСТУФХЦЧШЫЬабвгдежзиклмнопрстуфхцчшыьџҔҕҚқҞҟҦҧҨҩҬҭҲҳҴҵҶҷҼҽҾҿӘәӠӡӶӷ |
|
|
| Achinese | ace |
| Metadata |
|---|
| Punctuation: | ‐“” | | Letter: | ÈÉËÔÖèéëôö | | Mark: | ̀́̂̈ |
|
| Achinese {Arab} (Indonesia)
| ace-Arab-ID |
|
| Achinese {Latn} (Indonesia)
| ace-Latn-ID |
|
|
| Acoli | ach |
| Metadata |
|---|
| Tokenization: | L-349 |
|
| Acoli (Uganda)
| ach-UG |
| Metadata |
|---|
| Tokenization: | c-440 |
|
|
| Mesopotamian Arabic | acm |
| Metadata |
|---|
| Tokenization: | L-730 |
|
|
| Achuar-Shiwiar | acu |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | úáÚÁ | | Mark: | ́ |
|
|
| Adangme | ada |
| Metadata |
|---|
| Letter: | íÍƆƐɔɛ | | Mark: | ́ |
|
|
| Adyghe, Adygei | ady |
| Metadata |
|---|
| Letter: | ЁАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёӏӀ | | Mark: | ̆̈ |
|
|
| Avestan | ae |
| Metadata |
|---|
| Tokenization: | L-31 |
|
|
| Afrikaans | af |
| Metadata |
|---|
| Tokenization: | L-3 | | Punctuation: | §‐–—…‘’“”†‡′″‰ | | Letter: | áâéèêëîïôöûÁÂÉÈÊËÎÏÔÖÛ | | Mark: | ́̂̀̈ |
|
| Afrikaans (Namibia)
| af-NA |
|
| Afrikaans (South Africa)
| af-ZA |
|
|
| Aghem | agq |
| Metadata |
|---|
| Punctuation: | ‰ | | Letter: | àâèêìîòôùûÀÂÈÊÌÎÒÔÙÛǎǐǒǔǍƐǏƗǑƆǓɄāěēīŋōūĀĚĒĪŊŌŪɛɨɔʉʔ | | Mark: | ̀̂̌̄ |
|
| Aghem (Cameroon)
| agq-CM |
|
|
| Aguaruna | agr |
| Metadata |
|---|
| Punctuation: | ¡¿‐ | | Letter: | áíÁÍ | | Mark: | ́ |
|
|
| Assyrian Neo-Aramaic | aii |
| Metadata |
|---|
| Punctuation: | ،܆܇؛.؟ | | Letter: | ܐܝܘܦܒܬܛܕܟܓܩܣܨܙܫܚܥܗܡܢܪܠ | | Mark: | ܼܹܸ̰̮݂̱݈̣ܿܲܵ݁̃̄݇̈̇݀ |
|
|
| Aja (Benin) | ajg |
| Metadata |
|---|
| Letter: | úóòùàèéìíõáÚÓÒÙÀÈÉÌÍÕÁƆƉƐƷŋŊɔɖɛʒ | | Mark: | ̀́̃ |
|
|
| Akan | ak |
| Metadata |
|---|
| Tokenization: | L-5 | | Punctuation: | ‰ | | Letter: | ɛɔƐƆ |
|
| Akan (Ghana)
| ak-GH |
| Metadata |
|---|
| Tokenization: | c-442 |
|
|
| Tosk Albanian | als |
| Metadata |
|---|
| Punctuation: | «»§‐–—…‘’“”′″‰ | | Letter: | çëÇË | | Mark: | ̧̈ |
|
|
| Southern Altai | alt |
| Metadata |
|---|
| Punctuation: | ‐ | | Letter: | кижнҥтапэрешдлцязыгьйсмбјчӱоуӧвщюъфхКИЖНҤТАПЭРЕШДЛЦЯЗЫГЬЙСМБЈЧӰОУӦВЩЮЪФХ |
|
|
| Amharic | am |
| Metadata |
|---|
| Tokenization: | L-8 | | Punctuation: | ፡፣፤፥፦።‐–‹›«» | | Letter: | ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖሗመሙሚማሜምሞሟሠሡሢሣሤሥሦሧረሩሪራሬርሮሯሰሱሲሳሴስሶሷሸሹሺሻሼሽሾሿቀቁቂቃቄቅቆቈቊቋቌቍበቡቢባቤብቦቧቨቩቪቫቬቭቮቯተቱቲታቴትቶቷቸቹቺቻቼችቾቿኀኁኂኃኄኅኆኈኊኋኌኍነኑኒናኔንኖኗኘኙኚኛኜኝኞኟአኡኢኣኤእኦኧከኩኪካኬክኮኰኲኳኴኵኸኹኺኻኼኽኾወዉዊዋዌውዎዐዑዒዓዔዕዖዘዙዚዛዜዝዞዟዠዡዢዣዤዥዦዧየዩዪያዬይዮደዱዲዳዴድዶዷጀጁጂጃጄጅጆጇገጉጊጋጌግጎጐጒጓጔጕጠጡጢጣጤጥጦጧጨጩጪጫጬጭጮጯጰጱጲጳጴጵጶጷጸጹጺጻጼጽጾጿፀፁፂፃፄፅፆፈፉፊፋፌፍፎፏፐፑፒፓፔፕፖፗ | | Number: | ፩፪፫፬፭፮፯፰፱፲፳፴ |
|
| Amharic (Ethiopia)
| am-ET |
| Metadata |
|---|
| Tokenization: | c-443 |
|
|
| Amahuaca | amc |
|
|
| Yanesha' | ame |
| Metadata |
|---|
| Letter: | ñëóíãõáÑËÓÍÃÕÁẽẼʼ | | Mark: | ̃̈́ |
|
|
| Amis | ami |
|
|
| Amarakaeri | amr |
| Metadata |
|---|
| Punctuation: | ¿’ | | Mark: | ̱ |
|
|
| Aragonese | an |
| Metadata |
|---|
| Tokenization: | L-26 |
|
| Aragonese (Spain)
| an-ES |
| Metadata |
|---|
| Tokenization: | c-444 |
|
|
| Sudanese Arabic | apd |
| Metadata |
|---|
| Tokenization: | L-719 |
|
| Sudanese Arabic (Sudan)
| apd-SD |
| Metadata |
|---|
| Tokenization: | c-749 |
|
| Sudanese Arabic {Latn} (Sudan)
| apd-Latn-SD |
| Metadata |
|---|
| Tokenization: | c-709 |
|
|
| Arabic | ar |
|
| Arabic {Latn}
| ar-Latn |
| Metadata |
|---|
| Tokenization: | L-753 |
|
| Arabic (United Arab Emirates)
| ar-AE |
| Metadata |
|---|
| Tokenization: | c-10 |
|
| Arabic {Arab} (United Arab Emirates)
| ar-Arab-AE |
|
| Arabic (Bahrain)
| ar-BH |
| Metadata |
|---|
| Tokenization: | c-11 |
|
| Arabic (Algeria)
| ar-DZ |
| Metadata |
|---|
| Tokenization: | c-12 |
|
| Arabic (Egypt)
| ar-EG |
| Metadata |
|---|
| Tokenization: | c-13 |
|
| Arabic (Western Sahara)
| ar-EH |
| Metadata |
|---|
| Tokenization: | c-710 |
|
| Arabic (Israel)
| ar-IL |
| Metadata |
|---|
| Tokenization: | c-648 |
|
| Arabic {Arab} (International)
| ar-Arab-INT |
|
| Arabic {Latn} (International)
| ar-Latn-INT |
|
| Arabic (Iraq)
| ar-IQ |
| Metadata |
|---|
| Tokenization: | c-14 |
|
| Arabic (Jordan)
| ar-JO |
| Metadata |
|---|
| Tokenization: | c-15 |
|
| Arabic (Comoros)
| ar-KM |
|
| Arabic (Kuwait)
| ar-KW |
| Metadata |
|---|
| Tokenization: | c-16 |
|
| Arabic (Lebanon)
| ar-LB |
| Metadata |
|---|
| Tokenization: | c-17 |
|
| Arabic (Levant)
| ar-LEV |
| Metadata |
|---|
| Tokenization: | c-778 |
|
| Arabic (Libya)
| ar-LY |
| Metadata |
|---|
| Tokenization: | c-18 |
|
| Arabic (Morocco)
| ar-MA |
| Metadata |
|---|
| Tokenization: | c-19 |
|
| Arabic (Maghreb)
| ar-MGB |
| Metadata |
|---|
| Tokenization: | c-779 |
|
| Arabic (Mauritania)
| ar-MR |
| Metadata |
|---|
| Tokenization: | c-711 |
|
| Arabic (Oman)
| ar-OM |
| Metadata |
|---|
| Tokenization: | c-20 |
|
| Arabic (Palestine)
| ar-PS |
| Metadata |
|---|
| Tokenization: | c-712 |
|
| Arabic (Qatar)
| ar-QA |
| Metadata |
|---|
| Tokenization: | c-21 |
|
| Arabic (Saudi Arabia)
| ar-SA |
| Metadata |
|---|
| Tokenization: | c-22 |
|
| Arabic (Sudan)
| ar-SD |
| Metadata |
|---|
| Tokenization: | c-649 |
|
| Arabic (Somalia)
| ar-SO |
|
| Arabic (South Sudan)
| ar-SS |
|
| Arabic (Syria)
| ar-SY |
| Metadata |
|---|
| Tokenization: | c-23 |
|
| Arabic (Chad)
| ar-TD |
| Metadata |
|---|
| Tokenization: | c-713 |
|
| Arabic (Tunisia)
| ar-TN |
| Metadata |
|---|
| Tokenization: | c-24 |
|
| Arabic (Yemen)
| ar-YE |
| Metadata |
|---|
| Tokenization: | c-25 |
|
|
| Standard Arabic | arb |
| Metadata |
|---|
| Tokenization: | L-599 | | Punctuation: | ؉،؛؟٪٫٬‐–—…‰«» | | Letter: | ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي | | Mark: | ًٌٍَُِّْٰٕٓٔ | | Number: | ١٢٣٤٥٦٧٨٩ |
|
|
| Arabela | arl |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | úÚ | | Mark: | ́ |
|
|
| Mapudungun, Mapuche | arn |
| Metadata |
|---|
| Tokenization: | L-400 | | Letter: | ñáíóÑÁÍÓ | | Mark: | ̃́ |
|
| Mapudungun, Mapuche (Chile)
| arn-CL |
|
|
| Najdi Arabic | ars |
| Metadata |
|---|
| Tokenization: | L-755 |
|
| Najdi Arabic (Saudi Arabia)
| ars-SA |
| Metadata |
|---|
| Tokenization: | c-828 |
|
|
| Assamese | as |
| Metadata |
|---|
| Tokenization: | L-29 | | Punctuation: | ‰ | | Letter: | অআইঈউঊঋএঐওঔকখগঘঙচছজঝঞটঠডঢণতথদধনপফবভমযৰলৱশষসহ | | Mark: | ়ংঁঃ্ািীুূৃেৈোৌৗ | | Number: | ১২৩৪৫৬৭৮৯ |
|
| Assamese (India)
| as-IN |
| Metadata |
|---|
| Tokenization: | c-445 |
|
|
| Asu (Tanzania) | asa |
|
| Asu (Tanzania) (Tanzania, United Republic of)
| asa-TZ |
|
|
| American Sign Language | ase |
| Metadata |
|---|
| Tokenization: | L-582 |
|
|
| Asturian, Asturleonese, Bable, Leonese | ast |
| Metadata |
|---|
| Tokenization: | L-352 | | Punctuation: | ¡¿«»§‐–—…‘’“”†‡′″‰ | | Letter: | áéíñóúüÁÉÍÑÓÚÜḥḷḤḶ | | Mark: | ̣́̃̈ |
|
| Asturian, Asturleonese, Bable, Leonese (Spain)
| ast-ES |
| Metadata |
|---|
| Tokenization: | c-446 |
|
|
| Waorani | auc |
| Metadata |
|---|
| Letter: | ñíéóÑÍÉÓ | | Mark: | ̃́ |
|
|
| Avaric | av |
| Metadata |
|---|
| Tokenization: | L-30 |
|
| Avaric (Russia)
| av-RU |
| Metadata |
|---|
| Tokenization: | c-448 |
|
|
| Aymara | ay |
| Metadata |
|---|
| Tokenization: | L-32 |
|
| Aymara (Bolivia)
| ay-BO |
| Metadata |
|---|
| Tokenization: | c-449 |
|
|
| North Mesopotamian Arabic | ayp |
| Metadata |
|---|
| Tokenization: | L-731 |
|
|
| Central Aymara | ayr |
| Metadata |
|---|
| Letter: | ñïäíáëúÑÏÄÍÁËÚ | | Mark: | ̃̈́ |
|
|
| Azerbaijani | az |
| Metadata |
|---|
| Tokenization: | L-33 |
|
| Azerbaijani {Cyrl} (Azerbaijan)
| az-Cyrl-AZ |
| Metadata |
|---|
| Tokenization: | c-35 |
|
| Azerbaijani {Latn} (Azerbaijan)
| az-Latn-AZ |
| Metadata |
|---|
| Tokenization: | c-34 |
|
|
| South Azerbaijani | azb |
| Metadata |
|---|
| Tokenization: | L-711 | | Letter: | آؤئاتثجحخدذرزسشصضطظعغفقلمنهويٮپچژکگۆۇیەݣ | | Mark: | َْٓٔ |
|
| South Azerbaijani {Arab} (Iran)
| azb-Arab-IR |
| Metadata |
|---|
| Tokenization: | c-450 |
|
|
| North Azerbaijani | azj |
| Metadata |
|---|
| Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | çöüÇÖÜƏğışĞŞİə | | Mark: | ̧̇̆̈ |
|
| North Azerbaijani {Cyrl}
| azj-Cyrl |
| Metadata |
|---|
| Punctuation: | ‐–—…‘’“”†‡′″‰§ | | Letter: | аәбвгғдежзийјкҝлмноөпрстуүфхһчҹшыАӘБВГҒДЕЖЗИЙЈКҜЛМНОӨПРСТУҮФХҺЧҸШЫ | | Mark: | ̆ |
|
|
| Bashkir | ba |
| Metadata |
|---|
| Tokenization: | L-37 |
|
| Bashkir (Russia)
| ba-RU |
| Metadata |
|---|
| Tokenization: | c-454 |
|
|
| Baluchi | bal |
| Metadata |
|---|
| Tokenization: | L-355 |
|
| Baluchi (Iran)
| bal-IR |
| Metadata |
|---|
| Tokenization: | c-452 |
|
|
| Balinese | ban |
| Metadata |
|---|
| Tokenization: | L-354 |
|
| Balinese {Bali}
| ban-Bali |
| Metadata |
|---|
| Punctuation: | ᭞᭟᭚᭛᭜᭝᭠ | | Letter: | ᬅᬆᬇᬈᬉᬊᬋᬌᬍᬎᬏᬐᬑᬒᬓᬔᬕᬖᬗᬘᬙᬚᬛᬜᬝᬞᬟᬠᬡᬢᬣᬤᬥᬦᬧᬨᬩᬪᬫᬬᬭᬮᬯᬰᬱᬲᬳ | | Mark: | ᬂᬃᬄ᬴ᬵᬶᬷᬸᬹᬺᬻᬼᬽᬾᬿᭀᭁᭂᭃ᭄ | | Number: | ᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙ |
|
| Balinese (Indonesia)
| ban-ID |
| Metadata |
|---|
| Tokenization: | c-451 |
|
|
| Basaa | bas |
| Metadata |
|---|
| Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛǎǐǹǒǔǍƁƐǏǸǑƆǓāěēīńŋōūĀĚĒĪŃŊŌŪɓɛɔ | | Mark: | ᷆᷇́̀̂̌̄ |
|
| Basaa (Cameroon)
| bas-CM |
|
|
| Bamun | bax |
| Metadata |
|---|
| Punctuation: | ‘’ | | Letter: | úéêüûâôîáèùàÚÉÊÜÛÂÔÎÁÈÙÀṅṄ | | Mark: | ́̂̈̀̇ |
|
| Bamun {Bamu}
| bax-Bamu |
| Metadata |
|---|
| Punctuation: | ꛲꛳꛴꛵꛶꛷ | | Letter: | ꚠꚡꚢꚣꚤꚥꚦꚧꚨꚩꚪꚫꚬꚭꚮꚯꚰꚱꚲꚳꚴꚵꚶꚷꚸꚹꚺꚻꚼꚽꚾꚿꛀꛁꛂꛃꛄꛅꛆꛇꛈꛉꛊꛋꛌꛍꛎꛏꛐꛑꛒꛓꛔꛕꛖꛗꛘꛙꛚꛛꛜꛝꛞꛟꛠꛡꛢꛣꛤꛥꛦꛧꛨꛩꛪꛫꛬꛭꛮꛯ | | Mark: | ꛰꛱ |
|
|
| Baatonum | bba |
| Metadata |
|---|
| Letter: | àéùèóÀÉÙÈÓǹƐƆǸɛɔ | | Mark: | ̀́ |
|
|
| Central Bikol | bcl |
|
|
| Belarusian | be |
| Metadata |
|---|
| Tokenization: | L-40 | | Punctuation: | ‐«» | | Letter: | абвгджзеёійклмнопрстуўфхцчшыьэюяиАБВГДЖЗЕЁІЙКЛМНОПРСТУЎФХЦЧШЫЬЭЮЯИʼ | | Mark: | ̈̆ |
|
| Belarusian (Belarus)
| be-BY |
| Metadata |
|---|
| Tokenization: | c-41 |
|
|
| Bemba (Zambia) | bem |
| Metadata |
|---|
| Tokenization: | L-721 |
|
| Bemba (Zambia) (Zambia)
| bem-ZM |
| Metadata |
|---|
| Tokenization: | c-757 |
|
|
| Berber languages | ber |
| Metadata |
|---|
| Tokenization: | L-425 |
|
|
| Bena (Tanzania) | bez |
|
|
| Malba Birifor | bfo |
| Metadata |
|---|
| Tokenization: | L-399 |
|
| Malba Birifor (Burkina Faso)
| bfo-BF |
| Metadata |
|---|
| Tokenization: | c-508 |
|
|
| Bulgarian | bg |
| Metadata |
|---|
| Tokenization: | L-48 | | Punctuation: | ‐–—…‘‚“„″§ | | Letter: | абвгдежзийклмнопрстуфхцчшщъьюяАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ | | Mark: | ̆ |
|
| Bulgarian (Bulgaria)
| bg-BG |
| Metadata |
|---|
| Tokenization: | c-49 |
|
| Bulgarian {Latn} (Bulgaria)
| bg-Latn-BG |
| Metadata |
|---|
| Tokenization: | c-816 |
|
|
| Haryanvi | bgc |
| Metadata |
|---|
| Tokenization: | L-577 |
|
| Haryanvi (India)
| bgc-IN |
| Metadata |
|---|
| Tokenization: | c-578 |
|
|
| Bihari languages | bh |
| Metadata |
|---|
| Tokenization: | L-43 |
|
| Bihari languages (India)
| bh-IN |
| Metadata |
|---|
| Tokenization: | c-714 |
|
|
| Bhojpuri | bho |
| Metadata |
|---|
| Tokenization: | L-723 | | Punctuation: | । | | Letter: | मनवधकरखतसयषटउचबहलघणपगठदभअएआओथशजडइछऔफढईझऐञ | | Mark: | ािंु्ेोी़ूौृै |
|
| Bhojpuri {Deva} (India)
| bho-Deva-IN |
| Metadata |
|---|
| Tokenization: | c-763 |
|
| Bhojpuri {Deva} (Nepal)
| bho-Deva-NP |
| Metadata |
|---|
| Tokenization: | c-764 |
|
|
| Bislama | bi |
| Metadata |
|---|
| Tokenization: | L-44 |
|
| Bislama (Vanuatu)
| bi-VU |
| Metadata |
|---|
| Tokenization: | c-456 |
|
|
| Bikol | bik |
|
|
| Bini, Edo | bin |
| Metadata |
|---|
| Letter: | ÀÁÈÉÌÍÒÓÙÚàáèéìíòóùúẸẹỌọ | | Mark: | ̣̀́ |
|
| Bini, Edo (Nigeria)
| bin-NG |
|
|
| Southern Birifor | biv |
| Metadata |
|---|
| Tokenization: | L-421 |
|
| Southern Birifor (Ghana)
| biv-GH |
| Metadata |
|---|
| Tokenization: | c-538 |
|
|
| Banjarese | bjn |
|
| Banjarese {Arab} (Indonesia)
| bjn-Arab-ID |
|
| Banjarese {Latn} (Indonesia)
| bjn-Latn-ID |
|
|
| Buhid | bku |
| Metadata |
|---|
| Punctuation: | ᜵᜶ | | Letter: | ᝉᝆᝃᝊᝇᝄᝐᝑᝋᝈᝅᝏᝍᝎᝌᝀᝁᝂ | | Mark: | ᝒᝓ |
|
| Buhid {Buhd}
| bku-Buhd |
| Metadata |
|---|
| Punctuation: | ᜵᜶ | | Letter: | ᝉᝆᝃᝊᝇᝄᝐᝑᝋᝈᝅᝏᝍᝎᝌᝀᝁᝂ | | Mark: | ᝒᝓ |
|
|
| Tai Dam | blt |
| Metadata |
|---|
| Letter: | ꪀꪁꪂꪃꪄꪅꪆꪇꪈꪉꪊꪋꪌꪍꪎꪏꪐꪑꪒꪓꪔꪕꪖꪗꪘꪙꪚꪛꪜꪝꪞꪟꪠꪡꪢꪣꪤꪥꪦꪧꪨꪩꪪꪫꪬꪭꪮꪯꪱꪵꪶꪹꪺꪻꪼꪽꫀꫂꫛꫜꫝ | | Mark: | ꪴꪰꪲꪳꪷꪸꪾ꪿꫁ |
|
|
| Bambara | bm |
| Metadata |
|---|
| Tokenization: | L-36 | | Punctuation: | ’ | | Letter: | ƐƝƆŋŊɛɲɔ |
|
| Bambara (Mali)
| bm-ML |
| Metadata |
|---|
| Tokenization: | c-453 |
|
| Bambara {Latn} (Mali)
| bm-Latn-ML |
|
|
| Bengali | bn |
| Metadata |
|---|
| Tokenization: | L-42 | | Punctuation: | ।॥“”‘’ | | Letter: | অআইঈউঊঋএঐওঔকষখগঘঙচছজঝঞটঠডঢণতৎথদধনপফবভমযরলশসহঽ | | Mark: | ়ংঃঁ্ািীুূৃেৈোৌৗ | | Number: | ১২৩৪৫৬৭৮৯০ |
|
| Bengali (Bangladesh)
| bn-BD |
| Metadata |
|---|
| Tokenization: | c-356 |
|
| Bengali (India)
| bn-IN |
| Metadata |
|---|
| Tokenization: | c-455 |
|
| Bengali {Latn} (India)
| bn-Latn-IN |
| Metadata |
|---|
| Tokenization: | c-821 |
|
|
| Tibetan | bo |
| Metadata |
|---|
| Tokenization: | L-69 | | Punctuation: | ༄༅༈་༌།༎ | | Letter: | ཀཁགངཅཆཇཉཊཋཌཎཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤཥསཧཨཪ | | Mark: | ིེོུྐྑྒྔྕྖྗྙྚྛྜྞྟྠྡྣྤྥྦྨྩྪྫྭྮྯྰྱྲླྴྵྶྷྸྺྻྼ | | Number: | ༡༢༣༤༥༦༧༨༩ |
|
| Tibetan (China)
| bo-CN |
| Metadata |
|---|
| Tokenization: | c-550 |
|
| Tibetan (India)
| bo-IN |
|
|
| Bora | boa |
| Metadata |
|---|
| Letter: | úáéñíóÚÁÉÑÍÓɨȉƗȈ | | Mark: | ́̃̏ |
|
|
| Breton | br |
| Metadata |
|---|
| Tokenization: | L-47 | | Punctuation: | ’– | | Letter: | êñùÊÑÙʼ | | Mark: | ̂̃̀ |
|
| Breton (France)
| br-FR |
| Metadata |
|---|
| Tokenization: | c-457 |
|
|
| Bodo (India) | brx |
| Metadata |
|---|
| Tokenization: | L-726 | | Letter: | अआइईउऊऍएऐऑओऔकखगघचछजझञटठडढणतथदधनपफबभमयरलळवशषसह | | Mark: | ़ँंािीुूृॅेैॉोौ् |
|
| Bodo (India) {Deva} (India)
| brx-Deva-IN |
| Metadata |
|---|
| Tokenization: | c-773 |
|
|
| Bosnian | bs |
| Metadata |
|---|
| Tokenization: | L-45 | | Punctuation: | ‐–—…‘’“”′″ | | Letter: | čćžđšČĆŽĐŠ | | Mark: | ̌́ |
|
| Bosnian {Cyrl}
| bs-Cyrl |
| Metadata |
|---|
| Punctuation: | ‐–—…‘’“”′″ | | Letter: | абвгдђежзијклљмнњопрстћуфхцчџшАБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ |
|
| Bosnian (Bosnia and Herzegovina)
| bs-BA |
|
| Bosnian {Cyrl} (Bosnia and Herzegovina)
| bs-Cyrl-BA |
| Metadata |
|---|
| Tokenization: | c-357 |
|
| Bosnian {Latn} (Bosnia and Herzegovina)
| bs-Latn-BA |
| Metadata |
|---|
| Tokenization: | c-358 |
|
|
| Bassa | bsq |
| Metadata |
|---|
| Letter: | ɓɔɖɛḾḿṸṹĒēĚěĨĩŃńŨũŪūƁƆƉƐǍǎǏǐǑǒǓǔǸǹÀÁÃÈÉÌÍÒÓÙÚàáãèéìíòóùú | | Mark: | ̀́̃̄̌ |
|
| Bassa {Bass}
| bsq-Bass |
| Metadata |
|---|
| Punctuation: | 𖫵 | | Letter: | 𖫐𖫑𖫒𖫓𖫔𖫕𖫖𖫗𖫘𖫙𖫚𖫛𖫜𖫝𖫞𖫟𖫠𖫡𖫢𖫣𖫤𖫥𖫦𖫧𖫨𖫩𖫪𖫫𖫬𖫭 | | Mark: | 𖫰𖫱𖫲𖫳𖫴 |
|
|
| Buriat | bua |
| Metadata |
|---|
| Tokenization: | L-612 |
|
|
| Bushi | buc |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | ìàãÌÀÃɓŋĩŊĨƁɗƊ | | Mark: | ̀̃ |
|
|
| Buginese | bug |
|
| Buginese {Bugi}
| bug-Bugi |
| Metadata |
|---|
| Punctuation: | ᨞᨟ | | Letter: | ᨀᨁᨂᨄᨅᨆᨈᨉᨊᨌᨍᨎᨐᨑᨒᨓᨔᨖᨃᨏᨋᨇᨕ | | Mark: | ᨘᨗᨙᨚᨛ |
|
|
| Bulu (Cameroon) | bum |
| Metadata |
|---|
| Letter: | óñôéáÓÑÔÉÁōńŌŃ | | Mark: | ̄́̃̂ |
|
|
| Bukusu, Lubukusu | bxk |
|
|
| Bilin, Blin | byn |
|
| Bilin, Blin (Eritrea)
| byn-ER |
|
|
| Catalan, Valencian | ca |
| Metadata |
|---|
| Tokenization: | L-51 | | Punctuation: | ·¡¿«»§‐–—…‘’“”†‡′″ | | Letter: | àçéèíïóòúüÀÇÉÈÍÏÓÒÚÜ | | Mark: | ̧̀́̈ |
|
| Catalan, Valencian (Andorra)
| ca-AD |
| Metadata |
|---|
| Tokenization: | c-715 |
|
| Catalan, Valencian (Spain)
| ca-ES |
| Metadata |
|---|
| Tokenization: | c-52 |
|
| Catalan, Valencian (Spain; Valenciana, Comunidad)
| ca-ES-VC |
|
| Catalan, Valencian (France)
| ca-FR |
|
| Catalan, Valencian (International; Valenciana, Comunidad)
| ca-INT-VC |
|
| Catalan, Valencian (Italy)
| ca-IT |
|
|
| Garifuna | cab |
| Metadata |
|---|
| Letter: | üúñáéíèóÜÚÑÁÉÍÈÓ | | Mark: | ̈́̃̀ |
|
|
| Kaqchikel, Cakchiquel | cak |
| Metadata |
|---|
| Letter: | äïöüÄÏÖÜ | | Mark: | ̈ |
|
|
| Carolinian | cal |
| Metadata |
|---|
| Tokenization: | L-712 |
|
|
| Chachi | cbi |
| Metadata |
|---|
| Punctuation: | ¿¡ | | Letter: | ñóúáíéÑÓÚÁÍÉ | | Mark: | ̃́ |
|
|
| Chavacano | cbk |
|
|
| Cashibo-Cacataibo | cbr |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | ñëúíáéóÑËÚÍÁÉÓ | | Mark: | ́̃̈́ |
|
|
| Cashinahua | cbs |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | íÍ | | Mark: | ́ |
|
|
| Chayahuita | cbt |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | ëóíËÓÍ | | Mark: | ̈́ |
|
|
| Candoshi-Shapra | cbu |
| Metadata |
|---|
| Punctuation: | ¿¡ | | Letter: | íáÍÁ | | Mark: | ́ |
|
|
| Chakma | ccp |
| Metadata |
|---|
| Punctuation: | 𑅁𑅃𑅂𑅀‰‐–—…‘’“”†‡′″§ | | Letter: | 𑄃𑄄𑄅𑄆𑄇𑄈𑄉𑄊𑄋𑄌𑄍𑄎𑄏𑄐𑄑𑄒𑄓𑄔𑄕𑄖𑄗𑄘𑄙𑄚𑄛𑄜𑄝𑄞𑄟𑄠𑄡𑄢𑄣𑄤𑄥𑄦 | | Mark: | 𑄀𑄁𑄂𑄧𑄨𑄩𑄪𑄫𑄬𑄭𑄮𑄯𑄰𑄱𑄲𑄳𑄴 | | Number: | ০১২৩৪৫৬৭৮৯𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿𑄶 |
|
|
| Chechen | ce |
| Metadata |
|---|
| Tokenization: | L-54 | | Punctuation: | ‐–—…‘‚“„«»§ | | Letter: | аьбвгӏдеёжзийкхлмнопрстуфцчшщъыэюяАЬБВГӀДЕЁЖЗИЙКХЛМНОПРСТУФЦЧШЩЪЫЭЮЯ | | Mark: | ̈̆ |
|
| Chechen (Russia)
| ce-RU |
|
| Chechen {Cyrl} (Russia)
| ce-Cyrl-RU |
| Metadata |
|---|
| Tokenization: | c-461 |
|
| Chechen {Latn} (Russia)
| ce-Latn-RU |
| Metadata |
|---|
| Tokenization: | c-462 |
|
|
| Cebuano | ceb |
| Metadata |
|---|
| Tokenization: | L-359 |
|
| Cebuano (Philippines)
| ceb-PH |
| Metadata |
|---|
| Tokenization: | c-459 |
|
| Cebuano {Latn} (Philippines)
| ceb-Latn-PH |
|
|
| Falam Chin | cfm |
|
| Falam Chin (Myanmar)
| cfm-MM |
|
|
| Chiga | cgg |
|
| Chiga (Uganda)
| cgg-UG |
|
|
| Chamorro | ch |
| Metadata |
|---|
| Tokenization: | L-53 | | Letter: | ÅÑåñ | | Mark: | ̃̊ |
|
|
| Ojitlán Chinantec | chj |
| Metadata |
|---|
| Punctuation: | – | | Letter: | öíäñáéúïüëóÖÍÄÑÁÉÚÏÜËÓ | | Mark: | ̈́̃ |
|
|
| Chuukese | chk |
| Metadata |
|---|
| Tokenization: | L-613 |
|
|
| Mari (Russia) | chm |
|
|
| Cherokee | chr |
| Metadata |
|---|
| Tokenization: | L-432 | | Letter: | ᏸᏹᏺᏻᏼᎠᎡᎢᎣᎤᎥᎦᎧᎨᎩᎪᎫᎬᎭᎮᎯᎰᎱᎲᎳᎴᎵᎶᎷᎸᎹᎺᎻᎼᎽᎾᎿᏀᏁᏂᏃᏄᏅᏆᏇᏈᏉᏊᏋᏌᏍᏎᏏᏐᏑᏒᏓᏔᏕᏖᏗᏘᏙᏚᏛᏜᏝᏞᏟᏠᏡᏢᏣᏤᏥᏦᏧᏨᏩᏪᏫᏬᏭᏮᏯᏰᏱᏲᏳᏴꭰꭱꭲꭳꭴꭵꭶꭷꭸꭹꭺꭻꭼꭽꭾꭿꮀꮁꮂꮃꮄꮅꮆꮇꮈꮉꮊꮋꮌꮍꮎꮏꮐꮑꮒꮓꮔꮕꮖꮗꮘꮙꮚꮛꮜꮝꮞꮟꮠꮡꮢꮣꮤꮥꮦꮧꮨꮩꮪꮫꮬꮭꮮꮯꮰꮱꮲꮳꮴꮵꮶꮷꮸꮹꮺꮻꮼꮽꮾꮿ |
|
| Cherokee (United States)
| chr-US |
| Metadata |
|---|
| Tokenization: | c-463 |
|
|
| Chickasaw | cic |
| Metadata |
|---|
| Punctuation: | — | | Letter: | óáíÓÁÍ | | Mark: | ̱́ |
|
|
| Cimbrian | cim |
| Metadata |
|---|
| Tokenization: | L-732 |
|
| Cimbrian (Italy)
| cim-IT |
| Metadata |
|---|
| Tokenization: | c-780 |
|
|
| Western Cham | cja |
| Metadata |
|---|
| Tokenization: | L-745 |
|
| Western Cham {Arab} (Cambodia)
| cja-Arab-KH |
| Metadata |
|---|
| Tokenization: | c-797 |
|
| Western Cham {Cham} (Cambodia)
| cja-Cham-KH |
| Metadata |
|---|
| Tokenization: | c-798 |
|
|
| Chokwe | cjk |
| Metadata |
|---|
| Tokenization: | L-748 |
|
| Chokwe (Angola)
| cjk-AO |
| Metadata |
|---|
| Tokenization: | c-803 |
|
|
| Shor | cjs |
| Metadata |
|---|
| Letter: | кижнтолағыңудерцязчқшйъӱгьсмбюпӧэвфхКИЖНТОЛАҒЫҢУДЕРЦЯЗЧҚШЙЪӰГЬСМБЮПӦЭВФХЁЩщё | | Mark: | ̆̈ |
|
|
| Central Kurdish | ckb |
| Metadata |
|---|
| Tokenization: | L-360 | | Punctuation: | ٫٬٪؉ | | Letter: | ئابپتجچحخدرزڕژسشعغفڤقکگلڵمنھەوۆیێي | | Mark: | ٔ | | Number: | ١٢٣٤٥٦٧٨٩ |
|
| Central Kurdish {Latn}
| ckb-Latn |
| Metadata |
|---|
| Letter: | şŞûîêçÛÎÊÇ | | Mark: | ̧̂ |
|
| Central Kurdish {Arab} (Iraq)
| ckb-Arab-IQ |
| Metadata |
|---|
| Tokenization: | c-460 |
|
|
| Chaldean Neo-Aramaic | cld |
| Metadata |
|---|
| Tokenization: | L-614 |
|
|
| Mandarin Chinese | cmn |
| Metadata |
|---|
| Tokenization: | L-600 |
|
| Mandarin Chinese (China)
| cmn-CN |
| Metadata |
|---|
| Tokenization: | c-601 |
|
|
| Hakha Chin, Haka Chin | cnh |
| Metadata |
|---|
| Tokenization: | L-615 |
|
| Hakha Chin, Haka Chin {Latn} (Myanmar)
| cnh-Latn-MM |
|
| Hakha Chin, Haka Chin {Mymr} (Myanmar)
| cnh-Mymr-MM |
|
|
| Asháninka | cni |
| Metadata |
|---|
| Letter: | áéÁÉÑñ | | Mark: | ́̃ |
|
|
| Montenegrin | cnr |
| Metadata |
|---|
| Tokenization: | L-405 |
|
| Montenegrin (Montenegro)
| cnr-ME |
| Metadata |
|---|
| Tokenization: | c-734 |
|
| Montenegrin {Cyrl} (Montenegro)
| cnr-Cyrl-ME |
| Metadata |
|---|
| Tokenization: | c-512 |
|
| Montenegrin {Latn} (Montenegro)
| cnr-Latn-ME |
| Metadata |
|---|
| Tokenization: | c-513 |
|
|
| Corsican | co |
| Metadata |
|---|
| Tokenization: | L-58 | | Punctuation: | ’ | | Letter: | àèìùòÀÈÌÙÒ | | Mark: | ̀ |
|
| Corsican (France)
| co-FR |
| Metadata |
|---|
| Tokenization: | c-468 |
|
| Corsican (Italy)
| co-IT |
|
|
| Colorado | cof |
|
|
| Caquinte | cot |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | óÓ | | Mark: | ́ |
|
|
| Chinese Pidgin English | cpi |
| Metadata |
|---|
| Tokenization: | L-724 |
|
|
| Pichis Ashéninka | cpu |
| Metadata |
|---|
| Letter: | ñáéÑÁÉ | | Mark: | ̃́ |
|
|
| Cree | cr |
| Metadata |
|---|
| Tokenization: | L-59 |
|
|
| Crimean Tatar, Crimean Turkish | crh |
|
| Crimean Tatar, Crimean Turkish {Cyrl}
| crh-Cyrl |
| Metadata |
|---|
| Tokenization: | L-547 |
|
| Crimean Tatar, Crimean Turkish {Latn}
| crh-Latn |
| Metadata |
|---|
| Tokenization: | L-548 |
|
|
| Sãotomense | cri |
| Metadata |
|---|
| Letter: | çóêéáâôºíÇÓÊÉÁÂÔÍ | | Mark: | ̧́̂ |
|
|
| Seselwa Creole French | crs |
| Metadata |
|---|
| Tokenization: | L-419 | | Punctuation: | ’ | | Letter: | íÍ | | Mark: | ́ |
|
| Seselwa Creole French (Seychelles)
| crs-SC |
| Metadata |
|---|
| Tokenization: | c-533 |
|
|
| Czech | cs |
| Metadata |
|---|
| Tokenization: | L-63 | | Punctuation: | ‐–…‘‚“„§ | | Letter: | áéíóúýÁÉÍÓÚÝčďěňřšťůžČĎĚŇŘŠŤŮŽ | | Mark: | ́̌̊ |
|
| Czech (Czech Republic)
| cs-CZ |
| Metadata |
|---|
| Tokenization: | c-64 |
|
|
| Chiltepec Chinantec | csa |
| Metadata |
|---|
| Punctuation: | † | | Letter: | öüïóáñäëéíúÖÜÏÓÁÑÄËÉÍÚ | | Mark: | ̷̱̍̎̈́̃ |
|
|
| Kashubian | csb |
| Metadata |
|---|
| Tokenization: | L-386 |
|
| Kashubian (Poland)
| csb-PL |
| Metadata |
|---|
| Tokenization: | c-494 |
|
|
| Swampy Cree | csw |
| Metadata |
|---|
| Punctuation: | ᙮ | | Letter: | ᐁᐢᐱᑕᑲᒥᐠᐊᑭᒋᐃᑗᐎᐣᓂᑯᓯᓇᐅᔑᒧᓀᐡᑐᑌᑎᐸᐗᐳᒪᒶᐌᔭᓄᑾᔦᒣᐤᓴᓶᔕᑴᐯᐟᑫᓱᓉᐺᑡᐨᔓᑺᓋᔗᔾᔀᑊᔡᒬᒼ |
|
|
| Tedim Chin | ctd |
|
|
| Old Slavonic | cu |
| Metadata |
|---|
| Tokenization: | L-70 | | Punctuation: | ꙾꙳–—‐ | | Letter: | абвгдеєжѕзиіїйклмнѻоѡѽѿпрстуфхцчшщъыьѣюѧѫѯѱѳѵѷАБВГДЕЄЖЅЗИІЇЙКЛМНѺОѠѼѾПРСТУФХЦЧШЩЪЫЬѢЮѦѪѮѰѲѴѶꙿꙁꙍꙋꙗꙀꙌꙊꙖⸯ | | Mark: | ҇҃ⷠⷡⷢⷣⷤⷥⷦⷧⷨⷩⷪⷬⷭⷯⷱⷴ꙽ |
|
| Old Slavonic (Russia)
| cu-RU |
|
|
| Chuvash | cv |
| Metadata |
|---|
| Tokenization: | L-56 |
|
| Chuvash (Russia)
| cv-RU |
| Metadata |
|---|
| Tokenization: | c-466 |
|
|
| Welsh | cy |
| Metadata |
|---|
| Tokenization: | L-71 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | áàâäéèêëíìîïóòôöúùûüýÿÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÝŵŷŴŶŸẃẁẅỳẂẀẄỲ | | Mark: | ́̀̂̈ |
|
| Welsh (United Kingdom)
| cy-GB |
| Metadata |
|---|
| Tokenization: | c-72 |
|
|
| Danish | da |
| Metadata |
|---|
| Tokenization: | L-65 | | Punctuation: | §‐–…‘’“”†′″ | | Letter: | æøåÆØÅ | | Mark: | ̊ |
|
| Danish (Denmark)
| da-DK |
| Metadata |
|---|
| Tokenization: | c-66 |
|
| Danish (Greenland)
| da-GL |
|
|
| Dagbani | dag |
| Metadata |
|---|
| Letter: | ƐƆƔƷŋŊɛɔɣʒ’ |
|
| Dagbani (Ghana)
| dag-GH |
|
|
| Taita, Dawida | dav |
|
| Taita, Dawida (Kenya)
| dav-KE |
|
|
| Dendi (Benin) | ddn |
| Metadata |
|---|
| Letter: | ãâõÃÂÕǎǒƆƐǍƉǑŋŊɔɛɖ | | Mark: | ̃̌̂ |
|
|
| German | de |
| Metadata |
|---|
| Tokenization: | L-73 | | Punctuation: | «»§‐–—…‘‚“„ | | Letter: | äößüÄÖÜ | | Mark: | ̈ |
|
| German (Austria)
| de-AT |
| Metadata |
|---|
| Tokenization: | c-74 |
|
| German (Belgium)
| de-BE |
| Metadata |
|---|
| Tokenization: | c-379 |
|
| German (Switzerland)
| de-CH |
| Metadata |
|---|
| Tokenization: | c-75 |
|
| German (Germany)
| de-DE |
| Metadata |
|---|
| Tokenization: | c-76 |
|
| German (Liechtenstein)
| de-LI |
| Metadata |
|---|
| Tokenization: | c-77 |
|
| German (Luxembourg)
| de-LU |
| Metadata |
|---|
| Tokenization: | c-78 |
|
| German (Netherlands)
| de-NL |
| Metadata |
|---|
| Tokenization: | c-380 |
|
|
| Southern Dagaare | dga |
| Metadata |
|---|
| Letter: | ãÃƐƆũŨɛɔ | | Mark: | ̃ |
|
|
| Dogri (individual language) | dgo |
| Metadata |
|---|
| Tokenization: | L-728 |
|
| Dogri (individual language) {Deva} (India)
| dgo-Deva-IN |
| Metadata |
|---|
| Tokenization: | c-776 |
|
|
| Dimli (individual language) | diq |
| Metadata |
|---|
| Tokenization: | L-569 |
|
| Dimli (individual language) (Turkey)
| diq-TR |
| Metadata |
|---|
| Tokenization: | c-570 |
|
|
| Zarma | dje |
| Metadata |
|---|
| Letter: | ãõÃÕƝŋšžŊŠŽẽẼɲ | | Mark: | ̃̌ |
|
| Zarma (Niger)
| dje-NE |
|
|
| Dass | dot |
| Metadata |
|---|
| Tokenization: | L-714 |
|
|
| Lower Sorbian | dsb |
| Metadata |
|---|
| Tokenization: | L-396 | | Punctuation: | «»§‐–—…‘’‚“„ | | Letter: | óÓčćěłńŕšśžźČĆĚŁŃŔŠŚŽŹ | | Mark: | ̌́ |
|
| Lower Sorbian (Germany)
| dsb-DE |
| Metadata |
|---|
| Tokenization: | c-502 |
|
|
| Kadazan Dusun, Central Dusun | dtp |
|
|
| Duala | dua |
| Metadata |
|---|
| Letter: | áéíóúÁÉÍÓÚƁƊƐƆŋūŊŪɓɗɛɔ | | Mark: | ́̄ |
|
| Duala (Cameroon)
| dua-CM |
|
|
| Drung | duu |
|
|
| Maldivian | dv |
| Metadata |
|---|
| Tokenization: | L-67 | | Punctuation: | ،؛ | | Letter: | ޑސމބރގއދޖލހޢނފކށވޙޤތޕޓޔޝޞޅޚޣޒޠޗޏޘޛޟޜޡޥޱ | | Mark: | ިެްަީުާޮޭޫޯ |
|
| Maldivian (India)
| dv-IN |
| Metadata |
|---|
| Tokenization: | c-716 |
|
| Maldivian (Maldives)
| dv-MV |
| Metadata |
|---|
| Tokenization: | c-68 |
|
|
| Jola-Fonyi | dyo |
| Metadata |
|---|
| Punctuation: | “”‰ | | Letter: | áéíñóúàÁÉÍÑÓÚÀŋŊ | | Mark: | ́̃̀ |
|
| Jola-Fonyi (Senegal)
| dyo-SN |
|
|
| Dyula | dyu |
| Metadata |
|---|
| Tokenization: | L-747 | | Punctuation: | ’‘ | | Letter: | úàìóáòùèíéÚÀÌÓÁÒÙÈÍÉƐƆƝŋŊɛɔɲ | | Mark: | ́̀ |
|
| Dyula {Arab} (Côte d'Ivoire)
| dyu-Arab-CI |
| Metadata |
|---|
| Tokenization: | c-802 |
|
| Dyula {Latn} (Côte d'Ivoire)
| dyu-Latn-CI |
| Metadata |
|---|
| Tokenization: | c-801 |
|
| Dyula {Nkoo} (Côte d'Ivoire)
| dyu-Nkoo-CI |
| Metadata |
|---|
| Tokenization: | c-800 |
|
|
| Dzongkha | dz |
| Metadata |
|---|
| Tokenization: | L-79 | | Punctuation: | ༼༽༄༅༆༈༉༊࿐࿑༒࿒࿓࿔༌།༎༏༐༑༔་§‐–—…‘’“”†‡ | | Letter: | ཀཁགངཅཆཇཉཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤསཧཨ | | Mark: | ིེོུྐྑྒྔྗྙྟྠྡྣྤྥྦྨྩྪྫྭྱྲླྵྶྷཱྕ | | Number: | ༡༢༣༤༥༦༧༨༩༠ |
|
| Dzongkha (Bhutan)
| dz-BT |
| Metadata |
|---|
| Tokenization: | c-470 |
|
|
| Embu, Kiembu | ebu |
| Metadata |
|---|
| Letter: | ĩũĨŨ | | Mark: | ̃ |
|
| Embu, Kiembu (Kenya)
| ebu-KE |
|
|
| Ewe | ee |
| Metadata |
|---|
| Tokenization: | L-80 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | áàãéèíìóòõúùÁÀÃÉÈÍÌÓÒÕÚÙƒƉƐƑƔƆƲĩŋũĨŊŨẽẼɖɛɣɔʋ | | Mark: | ́̀̃ |
|
| Ewe (Ghana)
| ee-GH |
| Metadata |
|---|
| Tokenization: | c-471 |
|
| Ewe (Togo)
| ee-TG |
|
|
| Standard Estonian | ekk |
| Metadata |
|---|
| Letter: | õäöüÕÄÖÜšžŠŽ | | Mark: | ̌̃̈ |
|
|
| Greek | el |
| Metadata |
|---|
| Tokenization: | L-81 | | Punctuation: | «»§‐–—… | | Letter: | ΆΈΉΊΌΎΏΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ | | Mark: | ́̈ |
|
| Greek (Cyprus)
| el-CY |
| Metadata |
|---|
| Tokenization: | c-381 |
|
| Greek (Greece)
| el-GR |
| Metadata |
|---|
| Tokenization: | c-82 |
|
| Greek {Latn} (Greece)
| el-Latn-GR |
| Metadata |
|---|
| Tokenization: | c-819 |
|
|
| Eastern Maninkakan | emk |
|
| Eastern Maninkakan {Latn} (Guinea)
| emk-Latn-GN |
|
|
| English | en |
| Metadata |
|---|
| Tokenization: | L-83 | | Punctuation: | §‐–—…‘’“”†‡′″ |
|
| English (United Arab Emirates)
| en-AE |
| Metadata |
|---|
| Tokenization: | c-658 |
|
| English (Antigua and Barbuda)
| en-AG |
|
| English (Anguilla)
| en-AI |
|
| English (Antarctica)
| en-AQ |
| Metadata |
|---|
| Tokenization: | c-659 |
|
| English (American Samoa)
| en-AS |
|
| English (Asia)
| en-ASIA |
| Metadata |
|---|
| Tokenization: | c-717 |
|
| English (Austria)
| en-AT |
| Metadata |
|---|
| Tokenization: | c-660 |
|
| English (Australia)
| en-AU |
| Metadata |
|---|
| Tokenization: | c-84 |
|
| English (Barbados)
| en-BB |
|
| English (Bangladesh)
| en-BD |
| Metadata |
|---|
| Tokenization: | c-363 |
|
| English (Belgium)
| en-BE |
| Metadata |
|---|
| Tokenization: | c-661 |
|
| English (Bulgaria)
| en-BG |
| Metadata |
|---|
| Tokenization: | c-662 |
|
| English (Bahrain)
| en-BH |
| Metadata |
|---|
| Tokenization: | c-663 |
|
| English (Burundi)
| en-BI |
|
| English (Bermuda)
| en-BM |
|
| English (Bahamas)
| en-BS |
|
| English (Botswana)
| en-BW |
|
| English (Belize)
| en-BZ |
| Metadata |
|---|
| Tokenization: | c-85 |
|
| English (Canada)
| en-CA |
| Metadata |
|---|
| Tokenization: | c-86 |
|
| English (Caribbean)
| en-CB |
| Metadata |
|---|
| Tokenization: | c-87 |
|
| English (Cocos (Keeling) Islands)
| en-CC |
|
| English (Switzerland)
| en-CH |
| Metadata |
|---|
| Tokenization: | c-675 |
|
| English (Cook Islands)
| en-CK |
|
| English (Cameroon)
| en-CM |
|
| English (China)
| en-CN |
| Metadata |
|---|
| Tokenization: | c-664 |
|
| English (Christmas Island)
| en-CX |
|
| English (Cyprus)
| en-CY |
| Metadata |
|---|
| Tokenization: | c-665 |
|
| English (Czech Republic)
| en-CZ |
| Metadata |
|---|
| Tokenization: | c-666 |
|
| English (Germany)
| en-DE |
| Metadata |
|---|
| Tokenization: | c-667 |
|
| English (Denmark)
| en-DK |
| Metadata |
|---|
| Tokenization: | c-668 |
|
| English (Estonia)
| en-EE |
| Metadata |
|---|
| Tokenization: | c-669 |
|
| English (Egypt)
| en-EG |
| Metadata |
|---|
| Tokenization: | c-670 |
|
| English (Eritrea)
| en-ER |
|
| English (Finland)
| en-FI |
| Metadata |
|---|
| Tokenization: | c-671 |
|
| English (Fiji)
| en-FJ |
|
| English (Falkland Islands (Malvinas))
| en-FK |
|
| English (Micronesia)
| en-FM |
|
| English (France)
| en-FR |
| Metadata |
|---|
| Tokenization: | c-672 |
|
| English (United Kingdom)
| en-GB |
| Metadata |
|---|
| Tokenization: | c-88 |
|
| English (Grenada)
| en-GD |
|
| English (Guernsey)
| en-GG |
|
| English (Ghana)
| en-GH |
|
| English (Gibraltar)
| en-GI |
|
| English (Gambia)
| en-GM |
|
| English (Greece)
| en-GR |
| Metadata |
|---|
| Tokenization: | c-673 |
|
| English (South Georgia and the South Sandwich Islands)
| en-GS |
|
| English (Guam)
| en-GU |
|
| English (Guyana)
| en-GY |
|
| English (Hong Kong)
| en-HK |
| Metadata |
|---|
| Tokenization: | c-362 |
|
| English (Hungary)
| en-HU |
| Metadata |
|---|
| Tokenization: | c-674 |
|
| English (Indonesia)
| en-ID |
| Metadata |
|---|
| Tokenization: | c-365 |
|
| English (Ireland)
| en-IE |
| Metadata |
|---|
| Tokenization: | c-89 |
|
| English (Israel)
| en-IL |
| Metadata |
|---|
| Tokenization: | c-676 |
|
| English (Isle of Man)
| en-IM |
|
| English (India)
| en-IN |
| Metadata |
|---|
| Tokenization: | c-364 |
|
| English (International)
| en-INT |
| Metadata |
|---|
| Tokenization: | c-366 |
|
| English (British Indian Ocean Territory)
| en-IO |
|
| English (Iceland)
| en-IS |
| Metadata |
|---|
| Tokenization: | c-677 |
|
| English (Italy)
| en-IT |
| Metadata |
|---|
| Tokenization: | c-678 |
|
| English (Jersey)
| en-JE |
|
| English (Jamaica)
| en-JM |
| Metadata |
|---|
| Tokenization: | c-90 |
|
| English (Jordan)
| en-JO |
| Metadata |
|---|
| Tokenization: | c-368 |
|
| English (Japan)
| en-JP |
| Metadata |
|---|
| Tokenization: | c-367 |
|
| English (Kenya)
| en-KE |
| Metadata |
|---|
| Tokenization: | c-759 |
|
| English (Cambodia)
| en-KH |
| Metadata |
|---|
| Tokenization: | c-679 |
|
| English (Kiribati)
| en-KI |
|
| English (Saint Kitts and Nevis)
| en-KN |
|
| English (Kuwait)
| en-KW |
| Metadata |
|---|
| Tokenization: | c-680 |
|
| English (Cayman Islands)
| en-KY |
|
| English (Laos)
| en-LA |
| Metadata |
|---|
| Tokenization: | c-681 |
|
| English (Lebanon)
| en-LB |
| Metadata |
|---|
| Tokenization: | c-650 |
|
| English (Saint Lucia)
| en-LC |
|
| English (Sri Lanka)
| en-LK |
| Metadata |
|---|
| Tokenization: | c-683 |
|
| English (Liberia)
| en-LR |
|
| English (Lesotho)
| en-LS |
|
| English (Lithuania)
| en-LT |
| Metadata |
|---|
| Tokenization: | c-684 |
|
| English (Luxembourg)
| en-LU |
| Metadata |
|---|
| Tokenization: | c-685 |
|
| English (Latvia)
| en-LV |
| Metadata |
|---|
| Tokenization: | c-686 |
|
| English (Morocco)
| en-MA |
| Metadata |
|---|
| Tokenization: | c-687 |
|
| English (Madagascar)
| en-MG |
|
| English (Marshall Islands)
| en-MH |
|
| English (Macau)
| en-MO |
|
| English (Northern Mariana Islands)
| en-MP |
|
| English (Montserrat)
| en-MS |
|
| English (Malta)
| en-MT |
| Metadata |
|---|
| Tokenization: | c-651 |
|
| English (Mauritius)
| en-MU |
|
| English (Malawi)
| en-MW |
|
| English (Malaysia)
| en-MY |
| Metadata |
|---|
| Tokenization: | c-369 |
|
| English (Namibia)
| en-NA |
|
| English (Neutral)
| en-NEUTRAL |
| Metadata |
|---|
| Tokenization: | c-718 |
|
| English (Norfolk Island)
| en-NF |
|
| English (Nigeria)
| en-NG |
| Metadata |
|---|
| Tokenization: | c-689 |
|
| English (Netherlands)
| en-NL |
| Metadata |
|---|
| Tokenization: | c-690 |
|
| English (Norway)
| en-NO |
| Metadata |
|---|
| Tokenization: | c-691 |
|
| English (Nauru)
| en-NR |
|
| English (Niue)
| en-NU |
|
| English (New Zealand)
| en-NZ |
| Metadata |
|---|
| Tokenization: | c-91 |
|
| English (Oman)
| en-OM |
| Metadata |
|---|
| Tokenization: | c-692 |
|
| English (Papua New Guinea)
| en-PG |
|
| English (Philippines)
| en-PH |
| Metadata |
|---|
| Tokenization: | c-92 |
|
| English (Pirate)
| en-PI |
| Metadata |
|---|
| Tokenization: | c-371 |
|
| English (Pakistan)
| en-PK |
| Metadata |
|---|
| Tokenization: | c-370 |
|
| English (Pitcairn)
| en-PN |
|
| English (Puerto Rico)
| en-PR |
| Metadata |
|---|
| Tokenization: | c-372 |
|
| English (Portugal)
| en-PT |
| Metadata |
|---|
| Tokenization: | c-693 |
|
| English (Palau)
| en-PW |
|
| English (Qatar)
| en-QA |
| Metadata |
|---|
| Tokenization: | c-694 |
|
| English (Romania)
| en-RO |
| Metadata |
|---|
| Tokenization: | c-695 |
|
| English (Rwanda)
| en-RW |
|
| English (Saudi Arabia)
| en-SA |
| Metadata |
|---|
| Tokenization: | c-652 |
|
| English (Solomon Islands)
| en-SB |
|
| English (Seychelles)
| en-SC |
|
| English (Sudan)
| en-SD |
|
| English (Sweden)
| en-SE |
| Metadata |
|---|
| Tokenization: | c-697 |
|
| English (Singapore)
| en-SG |
| Metadata |
|---|
| Tokenization: | c-373 |
|
| English (Saint Helena, Ascension and Tristan da Cunha)
| en-SH |
|
| English (Slovenia)
| en-SI |
| Metadata |
|---|
| Tokenization: | c-699 |
|
| English (Slovakia)
| en-SK |
| Metadata |
|---|
| Tokenization: | c-698 |
|
| English (Sierra Leone)
| en-SL |
|
| English (South Sudan)
| en-SS |
|
| English (Sint Maarten (Dutch part))
| en-SX |
|
| English (Swaziland)
| en-SZ |
|
| English (Turks and Caicos Islands)
| en-TC |
|
| English (Thailand)
| en-TH |
| Metadata |
|---|
| Tokenization: | c-700 |
|
| English (Tokelau)
| en-TK |
|
| English (Tonga)
| en-TO |
|
| English (Trinidad and Tobago)
| en-TT |
| Metadata |
|---|
| Tokenization: | c-93 |
|
| English (Tuvalu)
| en-TV |
|
| English (Taiwan)
| en-TW |
| Metadata |
|---|
| Tokenization: | c-701 |
|
| English (Tanzania, United Republic of)
| en-TZ |
|
| English (Upside Down)
| en-UD |
| Metadata |
|---|
| Tokenization: | c-374 |
|
| English (Uganda)
| en-UG |
|
| English (United States Minor Outlying Islands)
| en-UM |
|
| English (United States)
| en-US |
| Metadata |
|---|
| Tokenization: | c-94 |
|
| English (Uruguay)
| en-UY |
| Metadata |
|---|
| Tokenization: | c-702 |
|
| English (Saint Vincent and the Grenadines)
| en-VC |
|
| English (Virgin Islands, British)
| en-VG |
|
| English (Virgin Islands, U.S.)
| en-VI |
|
| English (Vietnam)
| en-VN |
| Metadata |
|---|
| Tokenization: | c-703 |
|
| English (Vanuatu)
| en-VU |
|
| English (Samoa)
| en-WS |
|
| English (South Africa)
| en-ZA |
| Metadata |
|---|
| Tokenization: | c-95 |
|
| English (Zambia)
| en-ZM |
|
| English (Zimbabwe)
| en-ZW |
| Metadata |
|---|
| Tokenization: | c-96 |
|
|
| Esperanto | eo |
| Metadata |
|---|
| Tokenization: | L-97 | | Punctuation: | ‐–—…‘’“” | | Letter: | ĉĝĥĵŝŭĈĜĤĴŜŬ | | Mark: | ̂̆ |
|
| Esperanto (International)
| eo-INT |
|
|
| Spanish | es |
| Metadata |
|---|
| Tokenization: | L-98 | | Punctuation: | ‐–—…‘’“”†‡′″¡¿«»§ | | Letter: | áéíïñóúüýÁÉÍÏÑÓÚÜÝ | | Mark: | ́̈̃ |
|
| Spanish (Andorra)
| es-AD |
| Metadata |
|---|
| Tokenization: | c-704 |
|
| Spanish (Argentina)
| es-AR |
| Metadata |
|---|
| Tokenization: | c-99 |
|
| Spanish (Bolivia)
| es-BO |
| Metadata |
|---|
| Tokenization: | c-100 |
|
| Spanish (Chile)
| es-CL |
| Metadata |
|---|
| Tokenization: | c-101 |
|
| Spanish (Colombia)
| es-CO |
| Metadata |
|---|
| Tokenization: | c-102 |
|
| Spanish (Costa Rica)
| es-CR |
| Metadata |
|---|
| Tokenization: | c-103 |
|
| Spanish (Cuba)
| es-CU |
| Metadata |
|---|
| Tokenization: | c-721 |
|
| Spanish (Dominican Republic)
| es-DO |
| Metadata |
|---|
| Tokenization: | c-104 |
|
| Spanish (Ecuador)
| es-EC |
| Metadata |
|---|
| Tokenization: | c-105 |
|
| Spanish (Spain)
| es-ES |
| Metadata |
|---|
| Tokenization: | c-106 |
|
| Spanish (Equatorial Guinea)
| es-GQ |
|
| Spanish (Guatemala)
| es-GT |
| Metadata |
|---|
| Tokenization: | c-107 |
|
| Spanish (Heard Island and McDonald Islands)
| es-HM |
|
| Spanish (Honduras)
| es-HN |
| Metadata |
|---|
| Tokenization: | c-108 |
|
| Spanish (International)
| es-INT |
| Metadata |
|---|
| Tokenization: | c-719 |
|
| Spanish (Latin America)
| es-LAT |
| Metadata |
|---|
| Tokenization: | c-422 |
|
| Spanish (Mexico)
| es-MX |
| Metadata |
|---|
| Tokenization: | c-109 |
|
| Spanish (Neutral)
| es-NEUTRAL |
| Metadata |
|---|
| Tokenization: | c-720 |
|
| Spanish (Nicaragua)
| es-NI |
| Metadata |
|---|
| Tokenization: | c-110 |
|
| Spanish (Panama)
| es-PA |
| Metadata |
|---|
| Tokenization: | c-111 |
|
| Spanish (Peru)
| es-PE |
| Metadata |
|---|
| Tokenization: | c-112 |
|
| Spanish (Philippines)
| es-PH |
|
| Spanish (Puerto Rico)
| es-PR |
| Metadata |
|---|
| Tokenization: | c-113 |
|
| Spanish (Paraguay)
| es-PY |
| Metadata |
|---|
| Tokenization: | c-114 |
|
| Spanish (El Salvador)
| es-SV |
| Metadata |
|---|
| Tokenization: | c-115 |
|
| Spanish (Universal)
| es-UN |
| Metadata |
|---|
| Tokenization: | c-427 |
|
| Spanish (United States)
| es-US |
| Metadata |
|---|
| Tokenization: | c-424 |
|
| Spanish (Uruguay)
| es-UY |
| Metadata |
|---|
| Tokenization: | c-116 |
|
| Spanish (Venezuela)
| es-VE |
| Metadata |
|---|
| Tokenization: | c-117 |
|
|
| Estonian | et |
| Metadata |
|---|
| Tokenization: | L-118 |
|
| Estonian (Estonia)
| et-EE |
| Metadata |
|---|
| Tokenization: | c-119 |
|
|
| Basque | eu |
| Metadata |
|---|
| Tokenization: | L-38 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | çñÇÑ | | Mark: | ̧̃ |
|
| Basque (Spain)
| eu-ES |
| Metadata |
|---|
| Tokenization: | c-39 |
|
|
| Even | eve |
| Metadata |
|---|
| Punctuation: | ‐ | | Letter: | стаьябэйилокчурмнхдеҥгөыцпвһюзѳшжъфщСТАЬЯБЭЙИЛОКЧУРМНХДЕҤГӨЫЦПВҺЮЗѲШЖЪФЩ | | Mark: | ̆ |
|
|
| Evenki | evn |
| Metadata |
|---|
| Punctuation: | – | | Letter: | упкатңилэбгдерӣынӯмвчзоюцяьйсёһъщжхфУПКАТҢИЛЭБГДЕРӢЫНӮМВЧЗОЮЦЯЬЙСЁҺЪЩЖХФ | | Mark: | ̄̆̈ |
|
|
| Ewondo | ewo |
| Metadata |
|---|
| Tokenization: | L-734 | | Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛǎǐǹǒǔǍƏƐǏǸǑƆǓěńŋĚŃŊəɛɔ | | Mark: | ́̀̂̌ |
|
| Ewondo (Cameroon)
| ewo-CM |
| Metadata |
|---|
| Tokenization: | c-784 |
|
|
| Persian | fa |
| Metadata |
|---|
| Tokenization: | L-120 |
|
| Persian (Iran)
| fa-IR |
| Metadata |
|---|
| Tokenization: | c-121 |
|
| Persian {Latn} (Iran)
| fa-Latn-IR |
| Metadata |
|---|
| Tokenization: | c-815 |
|
|
| Guinean Fang | fan |
|
| Guinean Fang (Equatorial Guinea)
| fan-GQ |
|
|
| Fanti | fat |
| Metadata |
|---|
| Tokenization: | L-616 | | Letter: | ãõÃÕƆƐɔɛ | | Mark: | ̃ |
|
|
| Fulah | ff |
| Metadata |
|---|
| Tokenization: | L-122 | | Letter: | ñÑƴƁƊƳŋŊɓɗ | | Mark: | ̃ |
|
| Fulah {Latn} (Senegal)
| ff-Latn-SN |
|
|
| Maasina Fulfulde | ffm |
| Metadata |
|---|
| Tokenization: | L-710 |
|
| Maasina Fulfulde {Latn} (Mali)
| ffm-Latn-ML |
| Metadata |
|---|
| Tokenization: | c-478 |
|
|
| Finnish | fi |
| Metadata |
|---|
| Tokenization: | L-123 | | Punctuation: | »§‐–…’” | | Letter: | åäöÅÄÖšžŠŽ | | Mark: | ̌̊̈ |
|
| Finnish (Finland)
| fi-FI |
| Metadata |
|---|
| Tokenization: | c-124 |
|
|
| Filipino, Pilipino | fil |
| Metadata |
|---|
| Tokenization: | L-375 | | Punctuation: | §‐–—…‘’“”′″ | | Letter: | ñÑ | | Mark: | ̃ |
|
| Filipino, Pilipino (Philippines)
| fil-PH |
| Metadata |
|---|
| Tokenization: | c-473 |
|
|
| Tornedalen Finnish | fit |
| Metadata |
|---|
| Tokenization: | L-402 |
|
|
| Fijian | fj |
| Metadata |
|---|
| Tokenization: | L-125 |
|
| Fijian (Fiji)
| fj-FJ |
| Metadata |
|---|
| Tokenization: | c-472 |
|
|
| Faroese | fo |
| Metadata |
|---|
| Tokenization: | L-126 | | Punctuation: | ́§‐–…‘’“”†′″ | | Letter: | áðíóúýæøÁÐÍÓÚÝÆØ | | Mark: | ́ |
|
| Faroese (Denmark)
| fo-DK |
|
| Faroese (Faroe Islands)
| fo-FO |
| Metadata |
|---|
| Tokenization: | c-127 |
|
|
| Fon | fon |
| Metadata |
|---|
| Letter: | óéòèáúàìùíÓÉÒÈÁÚÀÌÙÍǎǐǔƐƆƉǍǏǓěđĚĐɛɔɖ | | Mark: | ́̌̀ |
|
| Fon (Benin)
| fon-BJ |
|
|
| French | fr |
| Metadata |
|---|
| Tokenization: | L-128 | | Punctuation: | «»§‐–—…’“”†‡ | | Letter: | àâæçéèêëîïôùûüÿÀÂÆÇÉÈÊËÎÏÔÙÛÜœŒŸ | | Mark: | ̧̀̂́̈ |
|
| French (Sub-Saharan Africa)
| fr-202 |
|
| French (Belgium)
| fr-BE |
| Metadata |
|---|
| Tokenization: | c-129 |
|
| French (Burkina Faso)
| fr-BF |
|
| French (Burundi)
| fr-BI |
|
| French (Benin)
| fr-BJ |
|
| French (Saint Barthélemy)
| fr-BL |
|
| French (Canada)
| fr-CA |
| Metadata |
|---|
| Tokenization: | c-130 |
|
| French (Caribbean)
| fr-CB |
|
| French (Democratic Republic of the Congo)
| fr-CD |
| Metadata |
|---|
| Tokenization: | c-760 |
|
| French (Central African Republic)
| fr-CF |
|
| French (Congo)
| fr-CG |
|
| French (Switzerland)
| fr-CH |
| Metadata |
|---|
| Tokenization: | c-131 |
|
| French (Côte d'Ivoire)
| fr-CI |
|
| French (Cameroon)
| fr-CM |
|
| French (Djibouti)
| fr-DJ |
|
| French (Algeria)
| fr-DZ |
|
| French (France)
| fr-FR |
| Metadata |
|---|
| Tokenization: | c-132 |
|
| French (Gabon)
| fr-GA |
|
| French (French Guiana)
| fr-GF |
|
| French (Guinea)
| fr-GN |
|
| French (Guadeloupe)
| fr-GP |
|
| French (Haiti)
| fr-HT |
|
| French (International)
| fr-INT |
| Metadata |
|---|
| Tokenization: | c-829 |
|
| French (Comoros)
| fr-KM |
|
| French (Luxembourg)
| fr-LU |
| Metadata |
|---|
| Tokenization: | c-133 |
|
| French (Morocco)
| fr-MA |
| Metadata |
|---|
| Tokenization: | c-653 |
|
| French (Monaco)
| fr-MC |
| Metadata |
|---|
| Tokenization: | c-134 |
|
| French (Saint Martin)
| fr-MF |
|
| French (Madagascar)
| fr-MG |
|
| French (Maghreb)
| fr-MGB |
|
| French (Mali)
| fr-ML |
|
| French (Martinique)
| fr-MQ |
|
| French (Mauritania)
| fr-MR |
|
| French (Mauritius)
| fr-MU |
|
| French (New Caledonia)
| fr-NC |
|
| French (Niger)
| fr-NE |
|
| French (French Polynesia)
| fr-PF |
|
| French (Saint Pierre and Miquelon)
| fr-PM |
|
| French (Quebec)
| fr-QC |
| Metadata |
|---|
| Tokenization: | c-376 |
|
| French (Réunion)
| fr-RE |
|
| French (Rwanda)
| fr-RW |
|
| French (Seychelles)
| fr-SC |
|
| French (Senegal)
| fr-SN |
|
| French (Syria)
| fr-SY |
|
| French (Chad)
| fr-TD |
|
| French (Togo)
| fr-TG |
|
| French (Tunisia)
| fr-TN |
|
| French (Vanuatu)
| fr-VU |
|
| French (Wallis and Futuna)
| fr-WF |
|
| French (Mayotte)
| fr-YT |
|
|
| Arpitan, Francoprovençal | frp |
| Metadata |
|---|
| Tokenization: | L-351 |
|
|
| Adamawa Fulfulde | fub |
|
| Adamawa Fulfulde {Arab} (Cameroon)
| fub-Arab-CM |
|
| Adamawa Fulfulde {Latn} (Cameroon)
| fub-Latn-CM |
|
|
| Pulaar | fuc |
|
| Pulaar {Latn}
| fuc-Latn |
| Metadata |
|---|
| Tokenization: | L-477 |
|
| Pulaar {Arab} (Gambia)
| fuc-Arab-GM |
|
| Pulaar {Latn} (Gambia)
| fuc-Latn-GM |
|
| Pulaar {Arab} (Senegal)
| fuc-Arab-SN |
|
| Pulaar {Latn} (Senegal)
| fuc-Latn-SN |
|
|
| Pular | fuf |
|
| Pular {Adlm}
| fuf-Adlm |
| Metadata |
|---|
| Punctuation: | ޥ߰ޥޢف⁏؟ | | Letter: | ޤͰޤ˰ޤհޤӰޤ˰ޤɰޤΰޤ̰ޤ°ޤ0ޤȰޤưޤܰޤڰޤװޤհޤӰޤѰޤٰޤװޤưޤİޤɰޤǰޤðޤpޤϰޤͰޤְޤޤڰޤذޤ̰ޤʰޤǰޤŰޤذޤְޤŰޤðޤҰޤаޤ۰ޤٰޤݰޤ۰ޤѰޤϰޤʰޤȰޤİޤ°ޤޤҰޤаޤΰޤްޤܰޤ߰ޤݰޥ0ޤްޥpޤ߰ޥ°ޤ0ޥðޤpޥˢ | | Mark: | ޥʰޥưޥŰޥİޥȰޥɰޥǢ | | Number: | ޥаޥѰޥҰޥӰޥޥհޥְޥװޥذޥ٢ |
|
| Pular {Arab} (Guinea)
| fuf-Arab-GN |
| Metadata |
|---|
| Tokenization: | c-476 |
|
| Pular {Latn} (Guinea)
| fuf-Latn-GN |
| Metadata |
|---|
| Tokenization: | c-475 |
|
| Pular {Latn} (Nigeria)
| fuf-Latn-NG |
| Metadata |
|---|
| Tokenization: | c-762 |
|
|
| Western Niger Fula | fuh |
|
| Western Niger Fula {Arab} (Niger)
| fuh-Arab-NE |
|
| Western Niger Fula {Latn} (Niger)
| fuh-Latn-NE |
|
|
| Friulian | fur |
| Metadata |
|---|
| Tokenization: | L-377 | | Letter: | àâçèêìîòôùûÀÂÇÈÊÌÎÒÔÙÛ | | Mark: | ̧̀̂ |
|
| Friulian (Italy)
| fur-IT |
| Metadata |
|---|
| Tokenization: | c-474 |
|
|
| Nigerian Fulfulde | fuv |
| Metadata |
|---|
| Tokenization: | L-720 |
|
| Nigerian Fulfulde {Latn} (Nigeria)
| fuv-Latn-NG |
| Metadata |
|---|
| Tokenization: | c-756 |
|
|
| Fur | fvr |
|
|
| Western Frisian | fy |
| Metadata |
|---|
| Tokenization: | L-135 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | ûâêúôòëïáàäéèíóöüýÛÂÊÚÔÒËÏÁÀÄÉÈÍÓÖÜÝ | | Mark: | ̂́̀̈ |
|
| Western Frisian (Netherlands)
| fy-NL |
| Metadata |
|---|
| Tokenization: | c-561 |
|
|
| Irish | ga |
| Metadata |
|---|
| Tokenization: | L-136 | | Letter: | áéíóúÁÉÍÓÚ | | Mark: | ́ |
|
| Irish (Ireland)
| ga-IE |
| Metadata |
|---|
| Tokenization: | c-490 |
|
|
| Ga | gaa |
| Metadata |
|---|
| Tokenization: | L-378 | | Letter: | ãÃƆƐŋŊɔɛ | | Mark: | ̃ |
|
| Ga (Ghana)
| gaa-GH |
| Metadata |
|---|
| Tokenization: | c-479 |
|
|
| Gagauz | gag |
| Metadata |
|---|
| Punctuation: | — | | Letter: | üäêöçÜÄÊÖÇışţŞİŢ | | Mark: | ̧̇̈̂ |
|
| Gagauz (Moldova)
| gag-MD |
|
|
| Borana-Arsi-Guji Oromo | gax |
|
|
| Gaelic | gd |
| Metadata |
|---|
| Tokenization: | L-137 | | Letter: | ìàòèùÌÀÒÈÙ | | Mark: | ̀ |
|
| Gaelic (United Kingdom)
| gd-GB |
| Metadata |
|---|
| Tokenization: | c-532 |
|
| Gaelic (Ireland)
| gd-IE |
| Metadata |
|---|
| Tokenization: | c-755 |
|
|
| Gilbertese | gil |
|
|
| Gonja | gjn |
|
|
| Galician | gl |
| Metadata |
|---|
| Tokenization: | L-138 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | áéíñóúüªÁÉÍÑÓÚÜ | | Mark: | ́̃̈ |
|
| Galician (Spain)
| gl-ES |
| Metadata |
|---|
| Tokenization: | c-139 |
|
|
| Nanai | gld |
| Metadata |
|---|
| Punctuation: | – | | Letter: | найпрвослиебщдкцягьмзюуёчэӈтхӣӯъфжНАЙПРВОСЛИЕБЩДКЦЯГЬМЗЮУЁЧЭӇТХӢӮЪФЖ | | Mark: | ̄̆̈ |
|
|
| Guarani | gn |
| Metadata |
|---|
| Tokenization: | L-140 |
|
| Guarani (Paraguay)
| gn-PY |
| Metadata |
|---|
| Tokenization: | c-481 |
|
|
| Gronings | gos |
|
|
| Gothic | got |
| Metadata |
|---|
| Tokenization: | L-751 |
|
|
| Ancient Greek | grc |
| Metadata |
|---|
| Tokenization: | L-350 |
|
| Ancient Greek (Greece)
| grc-GR |
| Metadata |
|---|
| Tokenization: | c-771 |
|
|
| Swiss German, Alemannic, Alsatian | gsw |
| Metadata |
|---|
| Letter: | äöüÄÖÜ | | Mark: | ̈ |
|
| Swiss German, Alemannic, Alsatian (Switzerland)
| gsw-CH |
|
| Swiss German, Alemannic, Alsatian (France)
| gsw-FR |
|
| Swiss German, Alemannic, Alsatian (Liechtenstein)
| gsw-LI |
|
|
| Gujarati | gu |
| Metadata |
|---|
| Tokenization: | L-141 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | ૐઅઆઇઈઉઊઋૠઍએઐઑઓઔકખગઘઙચછજઝઞટઠડઢણતથદધનપફબભમયરલવશષસહળઽ | | Mark: | ઼ંઁઃાિીુૂૃૄૅેૈૉોૌ્ | | Number: | ૧૨૩૪૫૬૭૮૯૦ |
|
| Gujarati (India)
| gu-IN |
| Metadata |
|---|
| Tokenization: | c-142 |
|
|
| Wayuu | guc |
| Metadata |
|---|
| Letter: | üñÜÑ | | Mark: | ̈̃ |
|
|
| Paraguayan Guaraní | gug |
| Metadata |
|---|
| Letter: | óáñéãíúõèÓÁÑÉÃÍÚÕÈʼĩũĨŨẽẼ | | Mark: | ́̃̀ |
|
|
| Yanomamö | guu |
| Metadata |
|---|
| Letter: | ëãáõíËÃÁÕÍĩũĨŨẽẼ | | Mark: | ̈̃́ |
|
|
| Gusii, Ekegusii | guz |
| Metadata |
|---|
| Tokenization: | L-740 |
|
| Gusii, Ekegusii (Kenya)
| guz-KE |
| Metadata |
|---|
| Tokenization: | c-792 |
|
|
| Manx | gv |
| Metadata |
|---|
| Tokenization: | L-143 | | Punctuation: | ’ | | Letter: | çÇ | | Mark: | ̧ |
|
| Manx (United Kingdom)
| gv-GB |
| Metadata |
|---|
| Tokenization: | c-509 |
|
| Manx (Isle of Man)
| gv-IM |
|
|
| Guarayu | gyr |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | ëñäüöéïËÑÄÜÖÉÏ | | Mark: | ̈̃́ |
|
|
| Hausa | ha |
| Metadata |
|---|
| Tokenization: | L-144 | | Punctuation: | ‐’‘ | | Letter: | ƙƴƁƊƘƳɓɗʼ |
|
| Hausa {Arab}
| ha-Arab |
| Metadata |
|---|
| Punctuation: | ،؟‹›«» | | Letter: | أإابتثجحدرزسشطعغلموىٻڟکیۑࢻࢼࢽݣࣃࣄ | | Mark: | َُِّْٰٕٜٔ |
|
| Hausa {Latn} (Ghana)
| ha-Latn-GH |
|
| Hausa {Latn} (Niger)
| ha-Latn-NE |
|
| Hausa (Nigeria)
| ha-NG |
| Metadata |
|---|
| Tokenization: | c-722 |
|
| Hausa {Latn} (Nigeria)
| ha-Latn-NG |
| Metadata |
|---|
| Tokenization: | c-483 |
|
|
| Hakka Chinese | hak |
| Metadata |
|---|
| Tokenization: | L-617 |
|
|
| Hawaiian | haw |
| Metadata |
|---|
| Tokenization: | L-382 | | Punctuation: | ’‘“” | | Letter: | āēīōūĀĒĪŌŪʻ | | Mark: | ̄ |
|
| Hawaiian (United States)
| haw-US |
| Metadata |
|---|
| Tokenization: | c-484 |
|
|
| Serbo-Croatian | hbs |
| Metadata |
|---|
| Tokenization: | L-618 |
|
|
| Hebrew | he |
| Metadata |
|---|
| Tokenization: | L-145 | | Punctuation: | ׳״־‐–— | | Letter: | אבגדהוזחטיכךלמםנןסעפףצץקרשת |
|
| Hebrew (Israel)
| he-IL |
| Metadata |
|---|
| Tokenization: | c-146 |
|
|
| Hindi | hi |
| Metadata |
|---|
| Tokenization: | L-147 | | Punctuation: | ।॥॰‘’“”— | | Letter: | अआइईउऊऋएऐओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहक़ख़ग़ज़ड़ढ़फ़ | | Mark: | ँंः़ािीुूृेैोौ् | | Number: | १२३४५६७८९ |
|
| Hindi (India)
| hi-IN |
| Metadata |
|---|
| Tokenization: | c-148 |
|
| Hindi {Latn} (India)
| hi-Latn-IN |
| Metadata |
|---|
| Tokenization: | c-822 |
|
|
| Fiji Hindi | hif |
|
| Fiji Hindi (Fiji)
| hif-FJ |
|
|
| Hiligaynon | hil |
| Metadata |
|---|
| Tokenization: | L-383 |
|
| Hiligaynon (Philippines)
| hil-PH |
| Metadata |
|---|
| Tokenization: | c-485 |
|
|
| Matu Chin | hlt |
| Metadata |
|---|
| Tokenization: | L-619 |
|
|
| Hmong, Mong | hmn |
| Metadata |
|---|
| Tokenization: | L-384 |
|
| Hmong, Mong (United States)
| hmn-US |
| Metadata |
|---|
| Tokenization: | c-723 |
|
|
| Mina (Cameroon) | hna |
| Metadata |
|---|
| Letter: | éáìóòúíàèùÉÁÌÓÒÚÍÀÈÙǒǐǔǹƉƐƆǑǏǓǸŋŊɖɛɔ | | Mark: | ̀́̌ |
|
|
| Hani | hni |
|
|
| Hanunoo | hnn |
| Metadata |
|---|
| Punctuation: | ᜵᜶ | | Letter: | ᜩᜦᜣᜪᜧᜤᜰᜱᜫᜨᜥᜯᜭᜮᜬᜠᜡᜢ | | Mark: | ᜲᜳ᜴ |
|
| Hanunoo {Hano}
| hnn-Hano |
| Metadata |
|---|
| Punctuation: | ᜵᜶ | | Letter: | ᜩᜦᜣᜪᜧᜤᜰᜱᜫᜨᜥᜯᜭᜮᜬᜠᜡᜢ | | Mark: | ᜲᜳ᜴ |
|
|
| Caribbean Hindustani | hns |
| Metadata |
|---|
| Punctuation: | ‘’ | | Letter: | áêòíèàëÁÊÒÍÈÀË | | Mark: | ́̂̀̈ |
|
|
| Hiri Motu | ho |
| Metadata |
|---|
| Tokenization: | L-149 |
|
| Hiri Motu (Papua New Guinea)
| ho-PG |
| Metadata |
|---|
| Tokenization: | c-486 |
|
|
| Croatian | hr |
| Metadata |
|---|
| Tokenization: | L-60 | | Punctuation: | ‐–—…‘’‚“”„′″ | | Letter: | čćžđšČĆŽĐŠ | | Mark: | ̌́ |
|
| Croatian (Bosnia and Herzegovina)
| hr-BA |
| Metadata |
|---|
| Tokenization: | c-61 |
|
| Croatian (Croatia)
| hr-HR |
| Metadata |
|---|
| Tokenization: | c-62 |
|
|
| Upper Sorbian | hsb |
| Metadata |
|---|
| Tokenization: | L-428 | | Punctuation: | «»§‐–—…‘’‚“„ | | Letter: | čćźěłńřšžČĆŹĚŁŃŘŠŽóÓ | | Mark: | ̌́ |
|
| Upper Sorbian (Germany)
| hsb-DE |
| Metadata |
|---|
| Tokenization: | c-555 |
|
|
| Haitian | ht |
| Metadata |
|---|
| Tokenization: | L-150 | | Letter: | èéòÈÉÒ | | Mark: | ̀́ |
|
| Haitian (Haiti)
| ht-HT |
| Metadata |
|---|
| Tokenization: | c-482 |
|
|
| Hungarian | hu |
| Metadata |
|---|
| Tokenization: | L-151 | | Punctuation: | «»§–…’”„ | | Letter: | áéíóöúüÁÉÍÓÖÚÜőűŐŰ | | Mark: | ́̈̋ |
|
| Hungarian (Hungary)
| hu-HU |
| Metadata |
|---|
| Tokenization: | c-152 |
|
|
| Hupa | hup |
|
|
| Huastec | hus |
| Metadata |
|---|
| Letter: | íáúéóàÍÁÚÉÓÀ | | Mark: | °́̀ |
|
|
| Murui Huitoto | huu |
| Metadata |
|---|
| Letter: | úñáÚÑÁƗɨ | | Mark: | ́̃ |
|
|
| Armenian | hy |
| Metadata |
|---|
| Tokenization: | L-27 | | Punctuation: | ֊՝՜՞՛։․«» | | Letter: | աբգդեզէըթժիլխծկհձղճմյնշոչպջռսվտրցւփքևօֆԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՏՐՑՒՓՔՕՖ |
|
| Armenian (Armenia)
| hy-AM |
| Metadata |
|---|
| Tokenization: | c-28 |
|
| Armenian {Latn} (Armenia)
| hy-Latn-AM |
| Metadata |
|---|
| Tokenization: | c-812 |
|
|
| Herero | hz |
| Metadata |
|---|
| Tokenization: | L-153 |
|
|
| Interlingua | ia |
| Metadata |
|---|
| Tokenization: | L-154 |
|
| Interlingua (France)
| ia-FR |
|
| Interlingua (International)
| ia-INT |
|
|
| Iban | iba |
|
|
| Ibibio | ibb |
|
| Ibibio (Nigeria)
| ibb-NG |
|
|
| Indonesian | id |
| Metadata |
|---|
| Tokenization: | L-155 | | Punctuation: | ‐–—…‘’“” |
|
| Indonesian (Indonesia)
| id-ID |
| Metadata |
|---|
| Tokenization: | c-156 |
|
|
| Interlingue | ie |
| Metadata |
|---|
| Tokenization: | L-157 |
|
|
| Igbo | ig |
| Metadata |
|---|
| Tokenization: | L-158 | | Punctuation: | ‐ | | Letter: | ẹịṅọụẸỊṄỌỤ | | Mark: | ̣̇ |
|
| Igbo (Nigeria)
| ig-NG |
| Metadata |
|---|
| Tokenization: | c-487 |
|
|
| Nuosu | ii |
| Metadata |
|---|
| Tokenization: | L-159 | | Punctuation: | 《》。、,(): | | Letter: | ꀀꀁꀂꀃꀄꀅꀆꀇꀈꀉꀊꀋꀌꀍꀎꀏꀐꀑꀒꀓꀔꀕꀖꀗꀘꀙꀚꀛꀜꀝꀞꀟꀠꀡꀢꀣꀤꀥꀦꀧꀨꀩꀪꀫꀬꀭꀮꀯꀰꀱꀲꀳꀴꀵꀶꀷꀸꀹꀺꀻꀼꀽꀾꀿꁀꁁꁂꁃꁄꁅꁆꁇꁈꁉꁊꁋꁌꁍꁎꁏꁐꁑꁒꁓꁔꁕꁖꁗꁘꁙꁚꁛꁜꁝꁞꁟꁠꁡꁢꁣꁤꁥꁦꁧꁨꁩꁪꁫꁬꁭꁮꁯꁰꁱꁲꁳꁴꁵꁶꁷꁸꁹꁺꁻꁼꁽꁾꁿꂀꂁꂂꂃꂄꂅꂆꂇꂈꂉꂊꂋꂌꂍꂎꂏꂐꂑꂒꂓꂔꂕꂖꂗꂘꂙꂚꂛꂜꂝꂞꂟꂠꂡꂢꂣꂤꂥꂦꂧꂨꂩꂪꂫꂬꂭꂮꂯꂰꂱꂲꂳꂴꂵꂶꂷꂸꂹꂺꂻꂼꂽꂾꂿꃀꃁꃂꃃꃄꃅꃆꃇꃈꃉꃊꃋꃌꃍꃎꃏꃐꃑꃒꃓꃔꃕꃖꃗꃘꃙꃚꃛꃜꃝꃞꃟꃠꃡꃢꃣꃤꃥꃦꃧꃨꃩꃪꃫꃬꃭꃮꃯꃰꃱꃲꃳꃴꃵꃶꃷꃸꃹꃺꃻꃼꃽꃾꃿꄀꄁꄂꄃꄄꄅꄆꄇꄈꄉꄊꄋꄌꄍꄎꄏꄐꄑꄒꄓꄔꄕꄖꄗꄘꄙꄚꄛꄜꄝꄞꄟꄠꄡꄢꄣꄤꄥꄦꄧꄨꄩꄪꄫꄬꄭꄮꄯꄰꄱꄲꄳꄴꄵꄶꄷꄸꄹꄺꄻꄼꄽꄾꄿꅀꅁꅂꅃꅄꅅꅆꅇꅈꅉꅊꅋꅌꅍꅎꅏꅐꅑꅒꅓꅔꅕꅖꅗꅘꅙꅚꅛꅜꅝꅞꅟꅠꅡꅢꅣꅤꅥꅦꅧꅨꅩꅪꅫꅬꅭꅮꅯꅰꅱꅲꅳꅴꅵꅶꅷꅸꅹꅺꅻꅼꅽꅾꅿꆀꆁꆂꆃꆄꆅꆆꆇꆈꆉꆊꆋꆌꆍꆎꆏꆐꆑꆒꆓꆔꆕꆖꆗꆘꆙꆚꆛꆜꆝꆞꆟꆠꆡꆢꆣꆤꆥꆦꆧꆨꆩꆪꆫꆬꆭꆮꆯꆰꆱꆲꆳꆴꆵꆶꆷꆸꆹꆺꆻꆼꆽꆾꆿꇀꇁꇂꇃꇄꇅꇆꇇꇈꇉꇊꇋꇌꇍꇎꇏꇐꇑꇒꇓꇔꇕꇖꇗꇘꇙꇚꇛꇜꇝꇞꇟꇠꇡꇢꇣꇤꇥꇦꇧꇨꇩꇪꇫꇬꇭꇮꇯꇰꇱꇲꇳꇴꇵꇶꇷꇸꇹꇺꇻꇼꇽꇾꇿꈀꈁꈂꈃꈄꈅꈆꈇꈈꈉꈊꈋꈌꈍꈎꈏꈐꈑꈒꈓꈔꈕꈖꈗꈘꈙꈚꈛꈜꈝꈞꈟꈠꈡꈢꈣꈤꈥꈦꈧꈨꈩꈪꈫꈬꈭꈮꈯꈰꈱꈲꈳꈴꈵꈶꈷꈸꈹꈺꈻꈼꈽꈾꈿꉀꉁꉂꉃꉄꉅꉆꉇꉈꉉꉊꉋꉌꉍꉎꉏꉐꉑꉒꉓꉔꉕꉖꉗꉘꉙꉚꉛꉜꉝꉞꉟꉠꉡꉢꉣꉤꉥꉦꉧꉨꉩꉪꉫꉬꉭꉮꉯꉰꉱꉲꉳꉴꉵꉶꉷꉸꉹꉺꉻꉼꉽꉾꉿꊀꊁꊂꊃꊄꊅꊆꊇꊈꊉꊊꊋꊌꊍꊎꊏꊐꊑꊒꊓꊔꊕꊖꊗꊘꊙꊚꊛꊜꊝꊞꊟꊠꊡꊢꊣꊤꊥꊦꊧꊨꊩꊪꊫꊬꊭꊮꊯꊰꊱꊲꊳꊴꊵꊶꊷꊸꊹꊺꊻꊼꊽꊾꊿꋀꋁꋂꋃꋄꋅꋆꋇꋈꋉꋊꋋꋌꋍꋎꋏꋐꋑꋒꋓꋔꋕꋖꋗꋘꋙꋚꋛꋜꋝꋞꋟꋠꋡꋢꋣꋤꋥꋦꋧꋨꋩꋪꋫꋬꋭꋮꋯꋰꋱꋲꋳꋴꋵꋶꋷꋸꋹꋺꋻꋼꋽꋾꋿꌀꌁꌂꌃꌄꌅꌆꌇꌈꌉꌊꌋꌌꌍꌎꌏꌐꌑꌒꌓꌔꌕꌖꌗꌘꌙꌚꌛꌜꌝꌞꌟꌠꌡꌢꌣꌤꌥꌦꌧꌨꌩꌪꌫꌬꌭꌮꌯꌰꌱꌲꌳꌴꌵꌶꌷꌸꌹꌺꌻꌼꌽꌾꌿꍀꍁꍂꍃꍄꍅꍆꍇꍈꍉꍊꍋꍌꍍꍎꍏꍐꍑꍒꍓꍔꍕꍖꍗꍘꍙꍚꍛꍜꍝꍞꍟꍠꍡꍢꍣꍤꍥꍦꍧꍨꍩꍪꍫꍬꍭꍮꍯꍰꍱꍲꍳꍴꍵꍶꍷꍸꍹꍺꍻꍼꍽꍾꍿꎀꎁꎂꎃꎄꎅꎆꎇꎈꎉꎊꎋꎌꎍꎎꎏꎐꎑꎒꎓꎔꎕꎖꎗꎘꎙꎚꎛꎜꎝꎞꎟꎠꎡꎢꎣꎤꎥꎦꎧꎨꎩꎪꎫꎬꎭꎮꎯꎰꎱꎲꎳꎴꎵꎶꎷꎸꎹꎺꎻꎼꎽꎾꎿꏀꏁꏂꏃꏄꏅꏆꏇꏈꏉꏊꏋꏌꏍꏎꏏꏐꏑꏒꏓꏔꏕꏖꏗꏘꏙꏚꏛꏜꏝꏞꏟꏠꏡꏢꏣꏤꏥꏦꏧ |
|
| Nuosu (China)
| ii-CN |
| Metadata |
|---|
| Tokenization: | c-521 |
|
|
| Inupiaq | ik |
| Metadata |
|---|
| Tokenization: | L-160 |
|
| Inupiaq (United States)
| ik-US |
| Metadata |
|---|
| Tokenization: | c-834 |
|
|
| Eastern Canadian Inuktitut | ike |
| Metadata |
|---|
| Tokenization: | L-622 | | Letter: | ᐁᐃᐄᐅᐆᐊᐋᐯᐱᐲᐳᐴᐸᐹᑉᑌᑎᑏᑐᑑᑕᑖᑦᑫᑭᑮᑯᑰᑲᑳᒃᒉᒋᒌᒍᒎᒐᒑᒡᒣᒥᒦᒧᒨᒪᒫᒻᓀᓂᓃᓄᓅᓇᓈᓐᓓᓕᓖᓗᓘᓚᓛᓪᓭᓯᓰᓱᓲᓴᓵᔅᔦᔨᔩᔪᔫᔭᔮᔾᕃᕆᕇᕈᕉᕋᕌᕐᕓᕕᕖᕗᕘᕙᕚᕝᕼᕿᖀᖁᖂᖃᖄᖅᖏᖐᖑᖒᖓᖔᖕᖠᖡᖢᖣᖤᖥᖦᖯᙯᙰᙱᙲᙳᙴᙵᙶ |
|
|
| Inuinnaqtun, Western Canadian Inuktitut | ikt |
| Metadata |
|---|
| Tokenization: | L-623 |
|
|
| Iloko | ilo |
| Metadata |
|---|
| Tokenization: | L-385 |
|
| Iloko (Philippines)
| ilo-PH |
| Metadata |
|---|
| Tokenization: | c-488 |
|
|
| Ingush | inh |
|
|
| Ido | io |
| Metadata |
|---|
| Tokenization: | L-161 |
|
|
| Icelandic | is |
| Metadata |
|---|
| Tokenization: | L-162 | | Punctuation: | §‐–—…‘‚“„†‡′″ | | Letter: | áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ | | Mark: | ́̈ |
|
| Icelandic (Iceland)
| is-IS |
| Metadata |
|---|
| Tokenization: | c-163 |
|
|
| Italian | it |
| Metadata |
|---|
| Tokenization: | L-164 | | Punctuation: | «»—…’“” | | Letter: | àéèìóòùÀÉÈÌÓÒÙ | | Mark: | ̀́ |
|
| Italian (Switzerland)
| it-CH |
| Metadata |
|---|
| Tokenization: | c-165 |
|
| Italian (Italy)
| it-IT |
| Metadata |
|---|
| Tokenization: | c-166 |
|
| Italian (San Marino)
| it-SM |
|
|
| Inuktitut | iu |
| Metadata |
|---|
| Tokenization: | L-167 |
|
| Inuktitut {Cans}
| iu-Cans |
| Metadata |
|---|
| Tokenization: | L-624 |
|
| Inuktitut {Latn}
| iu-Latn |
|
| Inuktitut (Canada)
| iu-CA |
| Metadata |
|---|
| Tokenization: | c-489 |
|
| Inuktitut {Cans} (Canada)
| iu-Cans-CA |
| Metadata |
|---|
| Tokenization: | c-654 |
|
|
| Iu Mien | ium |
| Metadata |
|---|
| Tokenization: | L-621 |
|
|
| Japanese | ja |
| Metadata |
|---|
| Tokenization: | L-168 |
|
| Japanese (Japan)
| ja-JP |
| Metadata |
|---|
| Tokenization: | c-169 |
|
|
| Jamaican Creole English | jam |
| Metadata |
|---|
| Tokenization: | L-574 |
|
| Jamaican Creole English (Jamaica)
| jam-JM |
| Metadata |
|---|
| Tokenization: | c-575 |
|
|
| Lojban | jbo |
| Metadata |
|---|
| Tokenization: | L-392 |
|
|
| Ngomba | jgo |
| Metadata |
|---|
| Punctuation: | «»‹› | | Letter: | áâíîúûÁÂÍÎÚÛꞌꞋǎǐǹǔǍƐǏǸƆǓɄńŋŃŊḿẅḾẄɛɔʉ | | Mark: | ́̀̂̌̄̈ |
|
| Ngomba (Cameroon)
| jgo-CM |
|
|
| Shuar | jiv |
| Metadata |
|---|
| Letter: | áíúéÁÍÚÉ | | Mark: | ́ |
|
|
| Machame | jmc |
|
| Machame (Tanzania, United Republic of)
| jmc-TZ |
|
|
| Javanese | jv |
| Metadata |
|---|
| Tokenization: | L-170 | | Punctuation: | ‰ | | Letter: | ÂÅÈÉÊÌÒÙâåèéêìòù | | Mark: | ̀́̂̊ |
|
| Javanese {Java}
| jv-Java |
| Metadata |
|---|
| Punctuation: | ꧁꧂꧃꧄꧅꧆꧇꧈꧉꧊꧋꧌꧍ | | Letter: | ꦄꦆꦇꦈꦉꦊꦋꦌꦎꦏꦑꦒꦓꦔꦕꦖꦗꦘꦚꦛꦝꦟꦠꦡꦢꦤꦥꦦꦧꦨꦩꦪꦫꦭꦮꦱꦲꧏ | | Mark: | ꦀꦁꦂꦃ꦳ꦴꦶꦸꦺꦼꦽꦾꦿ꧀ | | Number: | ꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙ |
|
| Javanese (Indonesia)
| jv-ID |
| Metadata |
|---|
| Tokenization: | c-491 |
|
| Javanese {Latn} (Indonesia)
| jv-Latn-ID |
|
|
| Georgian | ka |
| Metadata |
|---|
| Tokenization: | L-171 | | Punctuation: | ჻«»§‐–—…‘‚“„†‡′″ | | Letter: | აბგდევზთიკლმნოპჟრსტუფქღყშჩცძწჭხჯჰ |
|
| Georgian (Georgia)
| ka-GE |
| Metadata |
|---|
| Tokenization: | c-172 |
|
|
| Kabyle | kab |
| Metadata |
|---|
| Punctuation: | ‰ | | Letter: | ǧƐǦƔčČḍḥṛṣṭẓḌḤṚṢṬẒɛɣ | | Mark: | ̣̌ |
|
| Kabyle {Latn} (Algeria)
| kab-Latn-DZ |
|
|
| Kachin, Jingpho | kac |
| Metadata |
|---|
| Tokenization: | L-744 |
|
| Kachin, Jingpho {Latn} (Myanmar)
| kac-Latn-MM |
| Metadata |
|---|
| Tokenization: | c-796 |
|
|
| Kamba (Kenya) | kam |
| Metadata |
|---|
| Tokenization: | L-738 | | Letter: | ĩũĨŨ | | Mark: | ̃ |
|
| Kamba (Kenya) (Kenya)
| kam-KE |
| Metadata |
|---|
| Tokenization: | c-790 |
|
|
| Karen languages | kar |
| Metadata |
|---|
| Tokenization: | L-625 |
|
|
| Kabardian | kbd |
| Metadata |
|---|
| Letter: | цӏыхуэфащмтеднйпсожлъкрзгьибяшвчіюЦӀЫХУЭФАЩМТЕДНЙПСОЖЛЪКРЗГЬИБЯШВЧІЮ | | Mark: | ̆ |
|
| Kabardian (Russia)
| kbd-RU |
|
|
| Kabiyè | kbp |
| Metadata |
|---|
| Letter: | ñÑƆƐƱƉƖƔŋŊɔɛʊɖɩɣ | | Mark: | ̃ |
|
|
| Makonde | kde |
|
| Makonde (Tanzania, United Republic of)
| kde-TZ |
|
|
| Tem | kdh |
| Metadata |
|---|
| Letter: | íáéúóÿÍÁÉÚÓƖƱƐƉƆńŋŃŸŊḿḾɩʊɛɖɔ | | Mark: | ́̈ |
|
|
| Kam | kdx |
|
|
| Kabuverdianu | kea |
| Metadata |
|---|
| Tokenization: | L-733 | | Punctuation: | ’ | | Letter: | ñçêéâíèáôóãºõúàòÑÇÊÉÂÍÈÁÔÓÃÕÚÀÒ | | Mark: | ̧̃̂́̀ |
|
| Kabuverdianu (Cabo Verde)
| kea-CV |
| Metadata |
|---|
| Tokenization: | c-781 |
|
|
| Kekchí | kek |
|
|
| Kongo | kg |
| Metadata |
|---|
| Tokenization: | L-173 |
|
| Kongo (Angola)
| kg-AO |
| Metadata |
|---|
| Tokenization: | c-782 |
|
| Kongo (Congo)
| kg-CG |
| Metadata |
|---|
| Tokenization: | c-724 |
|
|
| Khasi | kha |
| Metadata |
|---|
| Letter: | ïñÏÑ | | Mark: | ̈̃ |
|
|
| Lü | khb |
| Metadata |
|---|
| Letter: | ᦀᦁᦂᦃᦄᦅᦆᦇᦈᦉᦊᦋᦌᦍᦎᦏᦐᦑᦒᦓᦔᦕᦖᦗᦘᦙᦚᦛᦜᦝᦞᦟᦠᦡᦢᦣᦤᦥᦦᦧᦨᦩᦪᦫᦰᦱᦲᦳᦴᦵᦶᦷᦸᦹᦺᦻᦼᦽᦾᦿᧀᧁᧂᧃᧄᧅᧆᧇ | | Number: | ᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᧚ |
|
|
| Halh Mongolian | khk |
| Metadata |
|---|
| Punctuation: | ̈̆‐–—…‘’“”†‡′″§ | | Letter: | абвгдеёжзийклмноөпрстуүфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОӨПРСТУҮФХЦЧШЩЪЫЬЭЮЯ | | Mark: | ̈̆ |
|
| Halh Mongolian {Mong}
| khk-Mong |
| Metadata |
|---|
| Punctuation: | ᠊᠁᠂᠃᠄()〈〉《》〔〕?! | | Letter: | ᠢᠦᠤᠡᠧᠥᠣᠠᠫᠪᠲᠳᠴᠵᠬᠰᠱᠭᠨᠩᠮᠯᠶᠷᠸᠹᠺᠻᠼᠽᠾᠿᡀᡁᡂ | | Number: | ᠑᠒᠓᠔᠕᠖᠗᠘᠙ |
|
|
| Koyra Chiini Songhay | khq |
| Metadata |
|---|
| Tokenization: | L-737 | | Letter: | ãõÃÕƝŋšžŊŠŽẽẼɲ | | Mark: | ̃̌ |
|
| Koyra Chiini Songhay {Latn} (Mali)
| khq-Latn-ML |
| Metadata |
|---|
| Tokenization: | c-786 |
|
|
| Kikuyu | ki |
| Metadata |
|---|
| Tokenization: | L-174 | | Letter: | ĩũĨŨ | | Mark: | ̃ |
|
| Kikuyu (Kenya)
| ki-KE |
| Metadata |
|---|
| Tokenization: | c-830 |
|
|
| Kirmanjki (individual language) | kiu |
| Metadata |
|---|
| Tokenization: | L-567 |
|
| Kirmanjki (individual language) (Turkey)
| kiu-TR |
| Metadata |
|---|
| Tokenization: | c-568 |
|
|
| Kwanyama | kj |
| Metadata |
|---|
| Tokenization: | L-175 |
|
|
| Khakas | kjh |
| Metadata |
|---|
| Letter: | прайтиксізледјвоцяыгнмбңюьчуғхжҷэфщъПРАЙТИКСІЗЛЕДЈВОЦЯЫГНМБҢЮЬЧУҒХЖҶЭФЩЪ | | Mark: | ̆ |
|
|
| Kazakh | kk |
| Metadata |
|---|
| Tokenization: | L-176 | | Punctuation: | ‐–—…‘’“”«»§ | | Letter: | аәбвгғдеёжзийкқлмнңоөпрстуұүфхһцчшщъыіьэюяАӘБВГҒДЕЁЖЗИЙКҚЛМНҢОӨПРСТУҰҮФХҺЦЧШЩЪЫІЬЭЮЯ |
|
| Kazakh (Kazakhstan)
| kk-KZ |
|
| Kazakh {Cyrl} (Kazakhstan)
| kk-Cyrl-KZ |
| Metadata |
|---|
| Tokenization: | c-177 |
|
| Kazakh {Latn} (Kazakhstan)
| kk-Latn-KZ |
| Metadata |
|---|
| Tokenization: | c-789 |
|
|
| Khün | kkh |
| Metadata |
|---|
| Punctuation: | ᪨᪩᪪᪫ | | Letter: | ᨠᨡᨣᨤᨥᨦᨧᨨᨩᨪᨫᨬᨭᨮᨯᨰᨱᨲᨳᨴᨵᨶᨷᨸᨹᨺᨻᨼᨽᨾᨿᩀᩁᩃᩅᩆᩇᩈᩉᩊᩋᩌᩍᩎᩏᩐᩑᩒᩓᩔᪧ | | Mark: | ᩕᩖᩘᩙᩛᩜᩝᩞ᩠ᩡᩢᩣᩤᩥᩦᩧᩨᩩᩪᩫᩬᩭᩮᩯᩰᩱᩳᩴ᩵᩶᩺᩼ | | Number: | ᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉ |
|
|
| Kako | kkj |
| Metadata |
|---|
| Punctuation: | «»…‘‹›“” | | Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛnjƁƊƐNJƆŋŊɓɗɛɔ | | Mark: | ̧́̀̂ |
|
| Kako (Cameroon)
| kkj-CM |
|
|
| Greenlandic | kl |
| Metadata |
|---|
| Tokenization: | L-178 |
|
| Greenlandic (Greenland)
| kl-GL |
| Metadata |
|---|
| Tokenization: | c-725 |
|
|
| Kalenjin | kln |
| Metadata |
|---|
| Tokenization: | L-739 |
|
| Kalenjin (Kenya)
| kln-KE |
| Metadata |
|---|
| Tokenization: | c-791 |
|
|
| Khmer | km |
| Metadata |
|---|
| Tokenization: | L-179 | | Punctuation: | ៖។៕៙៚‘’“” | | Letter: | ឥឦឪឧឩឯឰឱឳឲឫឬឭឮកខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមយរឡលវសហអៗ | | Mark: | ៈាិីឹឺុូួើឿៀេែៃោៅំះ៉៊់៍័្ | | Number: | ១២៣៤៥៦៧៨៩ |
|
| Khmer (Cambodia)
| km-KH |
| Metadata |
|---|
| Tokenization: | c-495 |
|
|
| Kimbundu | kmb |
| Metadata |
|---|
| Tokenization: | L-742 | | Punctuation: | ’ | | Letter: | êâôÊÂÔ | | Mark: | ̂ |
|
| Kimbundu (Angola)
| kmb-AO |
| Metadata |
|---|
| Tokenization: | c-794 |
|
|
| Northern Kurdish | kmr |
| Metadata |
|---|
| Tokenization: | L-409 | | Letter: | ûîêçÛÎÊÇşŞ | | Mark: | ̧̂ |
|
| Northern Kurdish {Arab} (Iraq)
| kmr-Arab-IQ |
| Metadata |
|---|
| Tokenization: | c-519 |
|
| Northern Kurdish {Arab} (Iran)
| kmr-Arab-IR |
| Metadata |
|---|
| Tokenization: | c-610 |
|
| Northern Kurdish {Latn} (Syria)
| kmr-Latn-SY |
| Metadata |
|---|
| Tokenization: | c-783 |
|
| Northern Kurdish {Latn} (Turkey)
| kmr-Latn-TR |
| Metadata |
|---|
| Tokenization: | c-518 |
|
|
| Kannada | kn |
| Metadata |
|---|
| Tokenization: | L-180 | | Punctuation: | ‐–—…‘’“”′″ | | Letter: | ಅಆಇಈಉಊಋೠಌೡಎಏಐಒಓಔಕಖಗಘಙಚಛಜಝಞಟಠಡಢಣತಥದಧನಪಫಬಭಮಯರಱಲವಶಷಸಹಳಽ | | Mark: | ಼̃ಂಃಾಿೀುೂೃೄೆೇೈೊೋೌ್ೕೖ | | Number: | ೧೨೩೪೫೬೭೮೯ |
|
| Kannada (India)
| kn-IN |
| Metadata |
|---|
| Tokenization: | c-181 |
|
|
| Central Kanuri | knc |
|
|
| Koongo | kng |
|
|
| Konkani (individual language) | knn |
| Metadata |
|---|
| Letter: | ॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहळऽ | | Mark: | ़ंँःािीुूृॅेैॉोौ् | | Number: | १२३४५६७८९ |
|
|
| Korean | ko |
| Metadata |
|---|
| Tokenization: | L-182 |
|
| Korean (North Korea)
| ko-KP |
|
| Korean (Korea)
| ko-KR |
| Metadata |
|---|
| Tokenization: | c-183 |
|
|
| Komi-Permyak | koi |
| Metadata |
|---|
| Punctuation: | – | | Letter: | мортпавэзлӧнбыдсиьекцяюгйучішжёщъфхМОРТПАВЭЗЛӦНБЫДСИЬЕКЦЯЮГЙУЧІШЖЁЩЪФХ | | Mark: | ̈̆ |
|
|
| Konkani | kok |
| Metadata |
|---|
| Tokenization: | L-184 |
|
| Konkani (India)
| kok-IN |
| Metadata |
|---|
| Tokenization: | c-185 |
|
|
| Konzo | koo |
|
|
| Kosraean | kos |
| Metadata |
|---|
| Tokenization: | L-713 |
|
|
| Kpelle | kpe |
| Metadata |
|---|
| Tokenization: | L-626 |
|
|
| Kaonde | kqn |
|
|
| Kanuri | kr |
| Metadata |
|---|
| Tokenization: | L-186 |
|
| Kanuri {Arab}
| kr-Arab |
| Metadata |
|---|
| Tokenization: | L-492 |
|
| Kanuri {Latn}
| kr-Latn |
| Metadata |
|---|
| Tokenization: | L-191 |
|
|
| Krio | kri |
| Metadata |
|---|
| Tokenization: | L-627 | | Punctuation: | – | | Letter: | ƐƆŋŊɛɔ |
|
| Krio (Sierra Leone)
| kri-SL |
|
|
| Karelian | krl |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | äöÄÖčžšČŽŠ | | Mark: | ̈̌ |
|
|
| Kashmiri | ks |
| Metadata |
|---|
| Tokenization: | L-187 | | Punctuation: | ‰ | | Letter: | ؠءآأؤابتثجحخدذرزسشصضطظعغفقلمنوٲٹپچڈڑژکگںھہۄۆیۍے | | Mark: | ٓٔ | | Number: | ۱۲۳۴۵۶۷۸۹ |
|
| Kashmiri {Deva}
| ks-Deva |
| Metadata |
|---|
| Punctuation: | । | | Letter: | अआइईउऊऎएऐऒओऔकखगचछजटठडतथदनपफबमयरलवशसहॳॴॵॶॷ | | Mark: | ँंऺऻ़ािीुूॆेैॊोौ्ॏॖॗ |
|
| Kashmiri (India)
| ks-IN |
|
| Kashmiri {Arab} (India)
| ks-Arab-IN |
|
| Kashmiri {Deva} (India)
| ks-Deva-IN |
| Metadata |
|---|
| Tokenization: | c-493 |
|
| Kashmiri (Pakistan)
| ks-PK |
| Metadata |
|---|
| Tokenization: | c-387 |
|
|
| Shambala | ksb |
|
| Shambala (Tanzania, United Republic of)
| ksb-TZ |
|
|
| Bafia | ksf |
| Metadata |
|---|
| Letter: | áéíóúÁÉÍÓÚǝƎƐƆŋŊɛɔ | | Mark: | ́ |
|
| Bafia (Cameroon)
| ksf-CM |
|
|
| Kölsch | ksh |
| Metadata |
|---|
| Punctuation: | ‐–—…‘‚“„†‡§⸗ | | Letter: | ėœůĖŒŮåäæëößüÅÄÆËÖÜ | | Mark: | ̊̈̇ |
|
| Kölsch (Germany)
| ksh-DE |
|
|
| Kituba (Democratic Republic of Congo) | ktu |
|
|
| Kurdish | ku |
| Metadata |
|---|
| Tokenization: | L-188 |
|
| Kurdish {Arab}
| ku-Arab |
| Metadata |
|---|
| Tokenization: | L-628 |
|
| Kurdish (Iraq)
| ku-IQ |
| Metadata |
|---|
| Tokenization: | c-726 |
|
| Kurdish {Arab} (Iraq)
| ku-Arab-IQ |
|
| Kurdish {Arab} (Iran)
| ku-Arab-IR |
|
| Kurdish (Turkey)
| ku-TR |
| Metadata |
|---|
| Tokenization: | c-706 |
|
|
| Kunama | kun |
| Metadata |
|---|
| Tokenization: | L-629 |
|
|
| Komi | kv |
| Metadata |
|---|
| Tokenization: | L-189 |
|
| Komi (Russia)
| kv-RU |
|
|
| Cornish | kw |
| Metadata |
|---|
| Tokenization: | L-57 |
|
| Cornish (United Kingdom)
| kw-GB |
| Metadata |
|---|
| Tokenization: | c-467 |
|
|
| Awa-Cuaiquer | kwi |
| Metadata |
|---|
| Punctuation: | · | | Letter: | áñëóçâùéàêÁÑËÓÇÂÙÉÀÊ | | Mark: | ̧́̃̈̂̀ |
|
|
| Kyrgyz | ky |
| Metadata |
|---|
| Tokenization: | L-190 | | Punctuation: | ‐–—…‘‚“„«»§ | | Letter: | абгдеёжзийклмнӊоөпрстуүхчшъыэюяцңвьфАБГДЕЁЖЗИЙКЛМНӉОӨПРСТУҮХЧШЪЫЭЮЯЦҢВЬФ | | Mark: | ̈̆ |
|
| Kyrgyz (Kyrgyzstan)
| ky-KG |
| Metadata |
|---|
| Tokenization: | c-191 |
|
| Kyrgyz (Kazakhstan)
| ky-KZ |
| Metadata |
|---|
| Tokenization: | c-192 |
|
|
| Latin | la |
| Metadata |
|---|
| Tokenization: | L-193 |
|
| Latin (International)
| la-INT |
|
| Latin (Holy See)
| la-VA |
| Metadata |
|---|
| Tokenization: | c-770 |
|
|
| Ladino | lad |
| Metadata |
|---|
| Punctuation: | – | | Letter: | íÍ | | Mark: | ́ |
|
|
| Langi | lag |
| Metadata |
|---|
| Letter: | áéíóúÁÉÍÓÚƗɄɨʉ | | Mark: | ́ |
|
| Langi (Tanzania, United Republic of)
| lag-TZ |
|
|
| Luxembourgish | lb |
| Metadata |
|---|
| Tokenization: | L-194 | | Punctuation: | «»§‐–—…‘‚“„ | | Letter: | äéëêüöôàÄÉËÊÜÖÔÀ | | Mark: | ̈́̂̀ |
|
| Luxembourgish (Belgium)
| lb-BE |
| Metadata |
|---|
| Tokenization: | c-503 |
|
| Luxembourgish (Luxembourg)
| lb-LU |
| Metadata |
|---|
| Tokenization: | c-504 |
|
|
| Lingua Franca Nova | lfn |
|
|
| Luganda | lg |
| Metadata |
|---|
| Tokenization: | L-195 | | Letter: | ŋŊ |
|
| Luganda (Uganda)
| lg-UG |
| Metadata |
|---|
| Tokenization: | c-480 |
|
|
| Limburgish | li |
| Metadata |
|---|
| Tokenization: | L-196 |
|
| Limburgish (Netherlands)
| li-NL |
| Metadata |
|---|
| Tokenization: | c-769 |
|
|
| West-Central Limba | lia |
|
|
| Ligurian | lij |
| Metadata |
|---|
| Tokenization: | L-391 | | Punctuation: | ’ | | Letter: | çòæéùöôâîàêÇÒÆÉÙÖÔÂÎÀÊ | | Mark: | ̧̀́̈̂ |
|
|
| Lisu | lis |
| Metadata |
|---|
| Punctuation: | 《》…꓾꓿ | | Letter: | ꓐꓑꓒꓓꓔꓕꓖꓗꓘꓙꓚꓛꓜꓝꓞꓟꓠꓡꓢꓣꓤꓥꓦꓧꓨꓩꓪꓫꓬꓭꓮꓯꓰꓱꓲꓳꓴꓵꓶꓷꓸꓹꓺꓻꓼꓽʼˍ |
|
|
| Lakota | lkt |
| Metadata |
|---|
| Punctuation: | ́̌‐–—“” | | Letter: | ʼáéíóúÁÉÍÓÚǧȟǦȞŋčšžŊČŠŽ | | Mark: | ́̌ |
|
| Lakota (United States)
| lkt-US |
|
|
| Ladin | lld |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | ëéüêàèöìùîâôòóûËÉÜÊÀÈÖÌÙÎÂÔÒÓÛćĆ | | Mark: | ̈́̂̀ |
|
|
| Lombard | lmo |
| Metadata |
|---|
| Tokenization: | L-393 |
|
|
| Lingala | ln |
| Metadata |
|---|
| Tokenization: | L-197 | | Punctuation: | ’ | | Letter: | áâéêíîóôúÁÂÉÊÍÎÓÔÚǎǐǒǍƐǏǑƆěĚɛɔ | | Mark: | ́̂̌ |
|
| Lingala (Angola)
| ln-AO |
|
| Lingala (Democratic Republic of the Congo)
| ln-CD |
|
| Lingala {Latn} (Democratic Republic of the Congo)
| ln-Latn-CD |
| Metadata |
|---|
| Tokenization: | c-571 |
|
| Lingala (Central African Republic)
| ln-CF |
|
| Lingala (Congo)
| ln-CG |
|
| Lingala {Latn} (Congo)
| ln-Latn-CG |
| Metadata |
|---|
| Tokenization: | c-727 |
|
|
| Lamnso' | lns |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | áéùìòúíóàèÁÉÙÌÒÚÍÓÀÈƏŋŊə | | Mark: | ̀́ |
|
|
| Lao | lo |
| Metadata |
|---|
| Tokenization: | L-198 | | Letter: | ໆກຂຄງຈສຊຍດຕຖທນບປຜຝພຟມຢຣລວຫໜໝອຮຯະາຳຽເແໂໃໄ | | Mark: | ່້໊໋́໌ໍັິີຶືຸູົຼ |
|
| Lao (Laos)
| lo-LA |
| Metadata |
|---|
| Tokenization: | c-501 |
|
|
| Lobi | lob |
| Metadata |
|---|
| Letter: | àáäÀÁÄƲƖƆƐʋɩɔɛʔ | | Mark: | ̀́̈ |
|
|
| Otuho | lot |
|
|
| Lozi | loz |
|
|
| Northern Luri | lrc |
| Metadata |
|---|
| Punctuation: | ،٫٬؛؟‐…‹›«» | | Letter: | آأؤئابپتثجچحخدذرزژسشصضطظعغفڤقکگلمنھەوۉۊیؽي | | Mark: | ٙٛٓٔ |
|
| Northern Luri (Iraq)
| lrc-IQ |
|
| Northern Luri (Iran)
| lrc-IR |
|
|
| Lithuanian | lt |
| Metadata |
|---|
| Tokenization: | L-199 | | Punctuation: | ‐–—…“„ | | Letter: | éÉąčęėįšųūžĄČĘĖĮŠŲŪŽ | | Mark: | ̨̌̇̄́ |
|
| Lithuanian (Lithuania)
| lt-LT |
| Metadata |
|---|
| Tokenization: | c-200 |
|
|
| Latgalian | ltg |
|
|
| Luba-Katanga | lu |
| Metadata |
|---|
| Tokenization: | L-201 | | Letter: | áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙƐƆɛɔ | | Mark: | ́̀ |
|
| Luba-Katanga (Democratic Republic of the Congo)
| lu-CD |
|
|
| Luba-Lulua | lua |
|
|
| Luvale | lue |
|
|
| Lunda | lun |
|
|
| Luo (Kenya and Tanzania), Dholuo | luo |
| Metadata |
|---|
| Tokenization: | L-746 |
|
| Luo (Kenya and Tanzania), Dholuo {Latn} (Kenya)
| luo-Latn-KE |
| Metadata |
|---|
| Tokenization: | c-799 |
|
|
| Mizo, Lushai, Duhlian | lus |
| Metadata |
|---|
| Letter: | âêûîãÂÊÛÎà | | Mark: | ̂̃ |
|
| Mizo, Lushai, Duhlian {Beng} (India)
| lus-Beng-IN |
|
| Mizo, Lushai, Duhlian {Latn} (India)
| lus-Latn-IN |
|
|
| Luyia, Oluluyia | luy |
| Metadata |
|---|
| Tokenization: | L-397 |
|
| Luyia, Oluluyia (Kenya)
| luy-KE |
|
|
| Latvian | lv |
| Metadata |
|---|
| Tokenization: | L-202 |
|
| Latvian (Latvia)
| lv-LV |
| Metadata |
|---|
| Tokenization: | c-203 |
|
|
| Standard Latvian | lvs |
| Metadata |
|---|
| Punctuation: | §‐–—…‘’‚“”„†‡′″ | | Letter: | āčēģīķļņšūžĀČĒĢĪĶĻŅŠŪŽ | | Mark: | ̧̄̌ |
|
|
| Literary Chinese | lzh |
|
|
| San Jerónimo Tecóatl Mazatec | maa |
| Metadata |
|---|
| Tokenization: | L-583 |
|
| San Jerónimo Tecóatl Mazatec (Mexico)
| maa-MX |
| Metadata |
|---|
| Tokenization: | c-584 |
|
|
| Madurese | mad |
|
| Madurese {Java} (Indonesia)
| mad-Java-ID |
|
| Madurese {Latn} (Indonesia)
| mad-Latn-ID |
|
|
| Magahi | mag |
| Metadata |
|---|
| Punctuation: | । | | Letter: | मनवधकरलसयतषटउचबघणपगठदहभअएऔथओशईजखआडइछफढझञऐ | | Mark: | ािेंु्ोी़ूौृैँ |
|
|
| Maithili | mai |
| Metadata |
|---|
| Tokenization: | L-398 | | Punctuation: | ।– | | Letter: | सरवभमनधकघषणटदबएतआउलजपठगअछहऐयशओचथखफइढडङईञʼ | | Mark: | ा्ौिोंेँीृूुःै़ |
|
| Maithili (Nepal)
| mai-NP |
| Metadata |
|---|
| Tokenization: | c-505 |
|
|
| Jalapa De Díaz Mazatec | maj |
| Metadata |
|---|
| Tokenization: | L-585 |
|
| Jalapa De Díaz Mazatec (Mexico)
| maj-MX |
| Metadata |
|---|
| Tokenization: | c-586 |
|
|
| Mam | mam |
|
|
| Mandingo, Manding | man |
|
|
| Chiquihuitlán Mazatec | maq |
| Metadata |
|---|
| Tokenization: | L-587 |
|
| Chiquihuitlán Mazatec (Mexico)
| maq-MX |
| Metadata |
|---|
| Tokenization: | c-588 |
|
|
| Masai | mas |
| Metadata |
|---|
| Tokenization: | L-576 | | Letter: | áàâéèêíìîóòôúùûÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛƐƗƆɄāēīŋōūĀĒĪŊŌŪɛɨɔʉ | | Mark: | ́̀̂̄ |
|
| Masai (Kenya)
| mas-KE |
|
| Masai (Tanzania, United Republic of)
| mas-TZ |
|
|
| Huautla Mazatec | mau |
| Metadata |
|---|
| Tokenization: | L-589 |
|
| Huautla Mazatec (Mexico)
| mau-MX |
| Metadata |
|---|
| Tokenization: | c-590 |
|
|
| Central Mazahua | maz |
| Metadata |
|---|
| Letter: | ñÑ | | Mark: | ̸̱̃ |
|
|
| Sharanahua | mcd |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | úíóáÚÍÓÁ | | Mark: | ́ |
|
|
| Matsés | mcf |
|
|
| Mende (Sierra Leone) | men |
| Metadata |
|---|
| Punctuation: | –‐ | | Letter: | ƆƐŋŊɔɛ |
|
|
| Meru | mer |
| Metadata |
|---|
| Letter: | ĩũĨŨ | | Mark: | ̃ |
|
| Meru (Kenya)
| mer-KE |
|
|
| Morisyen | mfe |
| Metadata |
|---|
| Tokenization: | L-401 |
|
| Morisyen (Mauritius)
| mfe-MU |
| Metadata |
|---|
| Tokenization: | c-511 |
|
|
| Malagasy | mg |
| Metadata |
|---|
| Tokenization: | L-204 |
|
| Malagasy (Madagascar)
| mg-MG |
| Metadata |
|---|
| Tokenization: | c-506 |
|
|
| Makhuwa-Meetto | mgh |
|
| Makhuwa-Meetto (Mozambique)
| mgh-MZ |
|
|
| Meta' | mgo |
| Metadata |
|---|
| Punctuation: | ‘’“” | | Letter: | ʼàèìòùÀÈÌÒÙƏƆŋŊəɔ | | Mark: | ̀ |
|
| Meta' (Cameroon)
| mgo-CM |
|
|
| Marshallese | mh |
| Metadata |
|---|
| Tokenization: | L-205 |
|
| Marshallese (Marshall Islands)
| mh-MH |
| Metadata |
|---|
| Tokenization: | c-510 |
|
|
| Eastern Mari | mhr |
|
| Eastern Mari (Russia)
| mhr-RU |
|
|
| Maori | mi |
| Metadata |
|---|
| Tokenization: | L-206 | | Punctuation: | ‰ | | Letter: | ĀāĒēĪīŌōŪūïÏ | | Mark: | ̄̈ |
|
| Maori (New Zealand)
| mi-NZ |
| Metadata |
|---|
| Tokenization: | c-207 |
|
|
| Mi'kmaq, Micmac | mic |
|
|
| Mandaic | mid |
| Metadata |
|---|
| Punctuation: | ࡞ | | Letter: | ࡀࡁࡂࡃࡄࡅࡆࡇࡈࡉࡊࡋࡌࡍࡎࡏࡐࡑࡒࡓࡔࡕࡖࡗࡘ | | Mark: | ࡙࡚࡛ |
|
|
| Minangkabau | min |
| Metadata |
|---|
| Tokenization: | L-606 |
|
| Minangkabau {Arab} (Indonesia)
| min-Arab-ID |
|
| Minangkabau {Latn} (Indonesia)
| min-Latn-ID |
| Metadata |
|---|
| Tokenization: | c-607 |
|
|
| Mískito | miq |
| Metadata |
|---|
| Letter: | áâÁ | | Mark: | ́̂ |
|
|
| Macedonian | mk |
| Metadata |
|---|
| Tokenization: | L-208 | | Punctuation: | ‐–—…‘‚“„ | | Letter: | абвгдѓежзѕијклљмнњопрстќуфхцчџшАБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШ | | Mark: | ́ |
|
| Macedonian (Macedonia)
| mk-MK |
| Metadata |
|---|
| Tokenization: | c-209 |
|
|
| Malayalam | ml |
| Metadata |
|---|
| Tokenization: | L-210 | | Punctuation: | ‘’“” | | Letter: | അആഇഈഉഊഋൠഌൡഎഏഐഒഓഔകൿഖഗഘങചഛജഝഞടഠഡഢണൺതഥദധനൻപഫബഭമയരർലൽവശഷസഹളൾഴറ | | Mark: | ഃംാിീുൂൃെേൈൊോൌൗ് |
|
| Malayalam (India)
| ml-IN |
| Metadata |
|---|
| Tokenization: | c-507 |
|
|
| Mongolian | mn |
| Metadata |
|---|
| Tokenization: | L-211 |
|
| Mongolian {Mong} (China)
| mn-Mong-CN |
|
| Mongolian (Mongolia)
| mn-MN |
| Metadata |
|---|
| Tokenization: | c-728 |
|
| Mongolian {Cyrl} (Mongolia)
| mn-Cyrl-MN |
| Metadata |
|---|
| Tokenization: | c-566 |
|
| Mongolian {Mong} (Mongolia)
| mn-Mong-MN |
|
|
| Manipuri | mni |
| Metadata |
|---|
| Tokenization: | L-727 |
|
| Manipuri (India)
| mni-IN |
|
| Manipuri {Beng} (India)
| mni-Beng-IN |
| Metadata |
|---|
| Tokenization: | c-774 |
|
| Manipuri {Mtei} (India)
| mni-Mtei-IN |
| Metadata |
|---|
| Tokenization: | c-775 |
|
|
| Mandinka | mnk |
| Metadata |
|---|
| Tokenization: | L-630 |
|
|
| Mon | mnw |
| Metadata |
|---|
| Punctuation: | ၊။ | | Letter: | လကၚအခရမဟပဍစတသဂဒဇနဘဝဗဓထၜယညဆဏဖဿဥဋဉဌဠ | | Mark: | ိ်ောါၞုံွဲ္ဵၟဳြှူၠးဴီျ | | Number: | ၁၉၄၈၀၂၃၅၆၇ |
|
|
| Moldovan | mo |
| Metadata |
|---|
| Tokenization: | L-404 |
|
| Moldovan (Moldova)
| mo-MD |
| Metadata |
|---|
| Tokenization: | c-729 |
|
|
| Mohawk | moh |
| Metadata |
|---|
| Tokenization: | L-403 |
|
| Mohawk (Canada)
| moh-CA |
|
|
| Mossi | mos |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | ãõÃÕƖƱƐĩũœĨŨŒẽẼɩʊɛ | | Mark: | ̃ |
|
| Mossi (Burkina Faso)
| mos-BF |
|
|
| Marathi | mr |
| Metadata |
|---|
| Tokenization: | L-214 | | Punctuation: | ‐–—…‘’“”′″ | | Letter: | ऱॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहळऽ | | Mark: | ़ंँःािीुूृॅेैॉोौ् | | Number: | १२३४५६७८९० |
|
| Marathi (India)
| mr-IN |
| Metadata |
|---|
| Tokenization: | c-215 |
|
|
| Malay | ms |
| Metadata |
|---|
| Tokenization: | L-216 |
|
| Malay (Brunei Darussalam)
| ms-BN |
| Metadata |
|---|
| Tokenization: | c-217 |
|
| Malay (Malaysia)
| ms-MY |
| Metadata |
|---|
| Tokenization: | c-218 |
|
| Malay (Singapore)
| ms-SG |
| Metadata |
|---|
| Tokenization: | c-708 |
|
|
| Maltese | mt |
| Metadata |
|---|
| Tokenization: | L-219 | | Punctuation: | ‘’“” | | Letter: | àèìòùÀÈÌÒÙċġħżĊĠĦŻ | | Mark: | ̀̇ |
|
| Maltese (Malta)
| mt-MT |
| Metadata |
|---|
| Tokenization: | c-220 |
|
|
| Totontepec Mixe | mto |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | äüëöéÄÜËÖÉ | | Mark: | ̈́ |
|
|
| Mundang | mua |
| Metadata |
|---|
| Letter: | ãëõÃËÕǝƁƊƎĩŋĨŊṽṼɓɗ | | Mark: | ̃̈ |
|
| Mundang (Cameroon)
| mua-CM |
|
|
| Creek | mus |
|
|
| Marwari | mwr |
|
| Marwari (India)
| mwr-IN |
|
|
| Hmong Daw | mww |
|
|
| Mozarabic | mxi |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | àùèòÀÙÈÒ | | Mark: | ̀ |
|
|
| Jamiltepec Mixtec | mxt |
| Metadata |
|---|
| Tokenization: | L-631 |
|
|
| Burmese | my |
| Metadata |
|---|
| Tokenization: | L-50 | | Punctuation: | ၏၊။၍၌၎‘’“” | | Letter: | ကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဣဤဥဦဧဩဪဿ | | Mark: | ာါိီုူေဲံျြွှ့္်း | | Number: | ၁၉၄၈၀၂၃၅၆၇ |
|
| Burmese (Myanmar)
| my-MM |
| Metadata |
|---|
| Tokenization: | c-458 |
|
| Burmese {zwgy} (Myanmar)
| my-zwgy-MM |
| Metadata |
|---|
| Tokenization: | c-751 |
|
|
| Ixcatlán Mazatec | mzi |
| Metadata |
|---|
| Tokenization: | L-591 | | Punctuation: | ’ | | Letter: | áñíóéÁÑÍÓÉ | | Mark: | ́̃ |
|
| Ixcatlán Mazatec (Mexico)
| mzi-MX |
| Metadata |
|---|
| Tokenization: | c-592 |
|
|
| Mazanderani | mzn |
| Metadata |
|---|
| Punctuation: | ،٫٬؛؟‐…‹›«» | | Letter: | ءآأؤئابپةتثجچحخدذرزژسشصضطظعغفقکگلمنهویي | | Mark: | ًٌٍّٔٓ |
|
| Mazanderani (Iran)
| mzn-IR |
|
|
| Nauru | na |
| Metadata |
|---|
| Tokenization: | L-221 |
|
| Nauru (Nauru)
| na-NR |
| Metadata |
|---|
| Tokenization: | c-514 |
|
|
| Nahuatl languages | nah |
|
| Nahuatl languages (Mexico)
| nah-MX |
|
|
| Neapolitan | nap |
| Metadata |
|---|
| Tokenization: | L-750 |
|
| Neapolitan (Italy)
| nap-IT |
| Metadata |
|---|
| Tokenization: | c-810 |
|
|
| Khoekhoe, Nama (Namibia) | naq |
| Metadata |
|---|
| Letter: | ǀǁǂǃâîôûÂÎÔÛ | | Mark: | ̂ |
|
| Khoekhoe, Nama (Namibia) (Namibia)
| naq-NA |
|
|
| Norwegian Bokmål | nb |
| Metadata |
|---|
| Tokenization: | L-222 | | Punctuation: | «»§– | | Letter: | àéóòôæøåÀÉÓÒÔÆØÅ | | Mark: | ̀́̂̊ |
|
| Norwegian Bokmål (Norway)
| nb-NO |
| Metadata |
|---|
| Tokenization: | c-223 |
|
| Norwegian Bokmål (Svalbard and Jan Mayen)
| nb-SJ |
|
|
| Nyemba | nba |
|
|
| Central Huasteca Nahuatl | nch |
|
|
| North Ndebele | nd |
| Metadata |
|---|
| Tokenization: | L-224 |
|
| North Ndebele (Zimbabwe)
| nd-ZW |
| Metadata |
|---|
| Tokenization: | c-520 |
|
|
| Low German, Low Saxon | nds |
| Metadata |
|---|
| Tokenization: | L-395 | | Punctuation: | ’ | | Letter: | åäöüÅÄÖÜ | | Mark: | ̊̈ |
|
|
| Nepali | ne |
| Metadata |
|---|
| Tokenization: | L-225 |
|
| Nepali (India)
| ne-IN |
|
| Nepali (Nepal)
| ne-NP |
| Metadata |
|---|
| Tokenization: | c-516 |
|
|
| Ndonga | ng |
| Metadata |
|---|
| Tokenization: | L-226 |
|
| Ndonga (Namibia)
| ng-NA |
| Metadata |
|---|
| Tokenization: | c-515 |
|
|
| Guerrero Nahuatl | ngu |
|
|
| Nganasan | nio |
| Metadata |
|---|
| Punctuation: | ” | | Letter: | нерәзытбуоясикаӈҫүдйхлмпвгөъцьчэщжюНЕРӘЗЫТБУОЯСИКАӇҪҮДЙХЛМПВГӨЪЦЬЧЭЩЖЮ | | Mark: | ̆ |
|
|
| Niuean | niu |
|
|
| Bouna Kulango | nku |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | ƖƆƐƝƲŋŊɩɔɛɲʋ |
|
|
| Dutch | nl |
| Metadata |
|---|
| Tokenization: | L-227 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | áäéëíïóöúüÁÄÉËÍÏÓÖÚÜ | | Mark: | ́̈ |
|
| Dutch (Aruba)
| nl-AW |
|
| Dutch (Belgium)
| nl-BE |
| Metadata |
|---|
| Tokenization: | c-228 |
|
| Dutch (Bonaire, Sint Eustatius and Saba)
| nl-BQ |
|
| Dutch (Curaçao)
| nl-CW |
|
| Dutch (Germany)
| nl-DE |
| Metadata |
|---|
| Tokenization: | c-826 |
|
| Dutch (Netherlands)
| nl-NL |
| Metadata |
|---|
| Tokenization: | c-229 |
|
| Dutch (Suriname)
| nl-SR |
| Metadata |
|---|
| Tokenization: | c-430 |
|
| Dutch (Sint Maarten (Dutch part))
| nl-SX |
|
|
| Flemish, Vlaams | nld |
|
| Flemish, Vlaams (Belgium)
| nld-BE |
|
| Flemish, Vlaams (Netherlands)
| nld-NL |
|
|
| Kwasio | nmg |
| Metadata |
|---|
| Letter: | áâäéêíîïóôöúûÁÂÄÉÊÍÎÏÓÔÖÚÛǎǝǐǒǔǍƁƎƐǏǑƆǓāěēīńŋōŕūĀĚĒĪŃŊŌŔŪɓɛɔ | | Mark: | ́̂̌̄̈ |
|
| Kwasio (Cameroon)
| nmg-CM |
|
|
| Norwegian Nynorsk | nn |
| Metadata |
|---|
| Tokenization: | L-230 | | Punctuation: | ‰ | | Letter: | àéóòôæøåÀÉÓÒÔÆØÅ | | Mark: | ̀́̂̊ |
|
| Norwegian Nynorsk (Norway)
| nn-NO |
| Metadata |
|---|
| Tokenization: | c-231 |
|
|
| Ngiemboon | nnh |
| Metadata |
|---|
| Punctuation: | «»‘’ | | Letter: | ʼáàâéèêíìóòôúùûÿÁÀÂÉÈÊÍÌÓÒÔÚÙÛǎǒǔǍƐǑƆǓɄěńŋĚŃŊŸḿẅḾẄɛɔʉ | | Mark: | ́̀̂̌̈ |
|
| Ngiemboon (Cameroon)
| nnh-CM |
|
|
| Norwegian | no |
| Metadata |
|---|
| Tokenization: | L-232 |
|
| Norwegian (Norway)
| no-NO |
| Metadata |
|---|
| Tokenization: | c-730 |
|
|
| Northern Thai | nod |
| Metadata |
|---|
| Punctuation: | ᪨᪩᪪᪫ | | Letter: | ᨠᨡᨢᨣᨤᨥᨦᨧᨨᨩᨪᨫᨬᨭᨮᨯᨰᨱᨲᨳᨴᨵᨶᨷᨸᨹᨺᨻᨼᨽᨾᨿᩀᩁᩃᩅᩆᩇᩈᩉᩊᩋᩌᩍᩎᩏᩐᩑᩓᩔᪧ | | Mark: | ᩕᩖᩘᩙᩛᩝᩞ᩠ᩡᩢᩣᩤᩥᩦᩧᩨᩩᩪᩫᩬᩮᩯᩰᩱᩲᩳᩴ᩵᩶᩺᩻ | | Number: | ᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉ |
|
|
| Nomatsiguenga | not |
| Metadata |
|---|
| Letter: | ëíáóñËÍÁÓÑ | | Mark: | ̈́̃ |
|
|
| Nepali (individual language) | npi |
| Metadata |
|---|
| Punctuation: | । | | Letter: | ॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसहऽ | | Mark: | ़ँंःािीुूृॅेैॉोौ् | | Number: | १२३४५६७८९० |
|
|
| N'Ko, N’Ko | nqo |
| Metadata |
|---|
| Punctuation: | ߷߸߹﴾﴿،؛؟⸜⸝ | | Letter: | ߊߋߌߍߎߏߐߑߒߓߔߕߖߗߘߙߚߛߜߝߞߟߠߡߢߣߤߥߦߧߴߵߺ | | Mark: | ߲߽߫߬߭߮߯߰߱߳ | | Number: | ߀߁߂߃߄߅߆߇߈߉ |
|
| N'Ko, N’Ko (Guinea)
| nqo-GN |
|
|
| South Ndebele | nr |
| Metadata |
|---|
| Tokenization: | L-233 |
|
| South Ndebele (South Africa)
| nr-ZA |
| Metadata |
|---|
| Tokenization: | c-539 |
|
|
| Pedi, Northern Sotho, Sepedi | nso |
| Metadata |
|---|
| Tokenization: | L-410 | | Letter: | šŠ | | Mark: | ̌ |
|
| Pedi, Northern Sotho, Sepedi (Cameroon)
| nso-CM |
| Metadata |
|---|
| Tokenization: | c-765 |
|
| Pedi, Northern Sotho, Sepedi (South Africa)
| nso-ZA |
| Metadata |
|---|
| Tokenization: | c-235 |
|
|
| Nuer | nus |
| Metadata |
|---|
| Letter: | äëïöÄËÏÖƐƔƆŋŊɛɣɔ | | Mark: | ̱̈ |
|
| Nuer (South Sudan)
| nus-SS |
|
|
| Navajo | nv |
| Metadata |
|---|
| Tokenization: | L-236 | | Letter: | ʼéóáíÉÓÁÍǫǪąłįęĄŁĮĘ | | Mark: | ̨́ |
|
|
| Chewa | ny |
| Metadata |
|---|
| Tokenization: | L-55 |
|
| Chewa (Malawi)
| ny-MW |
| Metadata |
|---|
| Tokenization: | c-464 |
|
| Chewa (Zimbabwe)
| ny-ZW |
| Metadata |
|---|
| Tokenization: | c-465 |
|
|
| Nyamwezi | nym |
|
|
| Nyankole | nyn |
|
| Nyankole (Uganda)
| nyn-UG |
|
|
| Nzima | nzi |
|
|
| Orok | oaa |
| Metadata |
|---|
| Punctuation: | – | | Letter: | ƝūŪɲԩԨчипалнесдкробуӡгэӈмхтөвӯзЧИПАЛНЕСДКРОБУӠГЭӇМХТӨВӮЗ | | Mark: | ̄ |
|
|
| Occitan | oc |
| Metadata |
|---|
| Tokenization: | L-237 | | Punctuation: | «»’— | | Letter: | óèéçàïòìùúâêîëáôüûÓÈÉÇÀÏÒÌÙÚÂÊÎËÁÔÜÛ | | Mark: | ̧́̀̈̂ |
|
| Occitan (France)
| oc-FR |
| Metadata |
|---|
| Tokenization: | c-731 |
|
|
| Ojibwa | oj |
| Metadata |
|---|
| Tokenization: | L-238 |
|
|
| Northwestern Ojibwa | ojb |
| Metadata |
|---|
| Letter: | ᐯᒪᑎᓯᑦᑌᐸᑫᑕᑯᐎᓇᓐᒥᐌᑲᒃᔭᐊᓂᐃᔑᑭᔝᐤᐅᑾᐱᔦᑐᐗᒣᒋᐁᔅᓱᓀᓄᒧᓭᔥᐨᑡᔕᓴᓶᓉᐺᓪᑉᐼᑴᑄᒐᒬᔐᔗᑺᔡᒻᒡᑶ |
|
|
| Okiek | oki |
|
|
| Oromo | om |
| Metadata |
|---|
| Tokenization: | L-239 |
|
| Oromo (Ethiopia)
| om-ET |
| Metadata |
|---|
| Tokenization: | c-523 |
|
| Oromo (Kenya)
| om-KE |
|
|
| Oriya, Odia | or |
| Metadata |
|---|
| Tokenization: | L-240 |
|
| Oriya, Odia (India)
| or-IN |
| Metadata |
|---|
| Tokenization: | c-522 |
|
| Oriya, Odia {Latn} (India)
| or-Latn-IN |
| Metadata |
|---|
| Tokenization: | c-811 |
|
|
| Odia, Oriya (individual language) | ory |
| Metadata |
|---|
| Letter: | ଅଆଇଈଉଊଋଏଐଓଔକଖଗଘଙଚଛଜଝଞଟଠଡଢଣତଥଦଧନପଫବଭମଯୟରଲଳଵୱଶଷସହ | | Mark: | ଼ଁଂଃାିୀୁୂୃେୈୋୌ୍ୖୗ | | Number: | ୧୨୩୪୫୬୭୮୯ |
|
|
| Ossetian | os |
| Metadata |
|---|
| Tokenization: | L-241 |
|
| Ossetian (Georgia)
| os-GE |
|
| Ossetian (Russia)
| os-RU |
| Metadata |
|---|
| Tokenization: | c-524 |
|
|
| Osage | osa |
| Metadata |
|---|
| Letter: | 𐒰𐒱𐒲𐒳𐒴𐒵𐒶𐒷𐒸𐒹𐒺𐒻𐒼𐒽𐒾𐒿𐓀𐓁𐓂𐓃𐓄𐓅𐓆𐓇𐓈𐓉𐓊𐓋𐓌𐓍𐓎𐓏𐓐𐓑𐓒𐓓𐓘𐓙𐓚𐓛𐓜𐓝𐓞𐓟𐓠𐓡𐓢𐓣𐓤𐓥𐓦𐓧𐓨𐓩𐓪𐓫𐓬𐓭𐓮𐓯𐓰𐓱𐓲𐓳𐓴𐓵𐓶𐓷𐓸𐓹𐓺𐓻 | | Mark: | ̄́̋͘ |
|
|
| Ottoman Turkish | ota |
| Metadata |
|---|
| Tokenization: | L-715 |
|
|
| Mezquital Otomi | ote |
| Metadata |
|---|
| Letter: | öüäéñúíáèÖÜÄÉÑÚÍÁÈ | | Mark: | ̱̈́̃̀ |
|
|
| Querétaro Otomi | otq |
|
|
| Punjabi | pa |
| Metadata |
|---|
| Tokenization: | L-242 | | Punctuation: | ‐–—‘’“”′″। | | Letter: | ੴਉਊਓਅਆਐਔਇਈਏਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਲ਼ਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ | | Mark: | ੱੰ਼੍ਾਿੀੁੂੇੈੋੌਂ | | Number: | ੧੨੩੪੫੬੭੮੯ |
|
| Punjabi {Arab}
| pa-Arab |
| Metadata |
|---|
| Punctuation: | ‰ | | Letter: | ءآؤئابتثجحخدذرزسشصضطظعغفقلمنهويٹپچڈڑژکگںھہیے | | Mark: | ُٓٔ |
|
| Punjabi (India)
| pa-IN |
| Metadata |
|---|
| Tokenization: | c-243 |
|
| Punjabi (Pakistan)
| pa-PK |
| Metadata |
|---|
| Tokenization: | c-412 |
|
| Punjabi {Arab} (Pakistan)
| pa-Arab-PK |
| Metadata |
|---|
| Tokenization: | c-647 |
|
|
| Pangasinan | pag |
|
| Pangasinan (Philippines)
| pag-PH |
|
|
| Pampanga, Kapampangan | pam |
|
|
| Papiamento | pap |
| Metadata |
|---|
| Tokenization: | L-632 | | Punctuation: | ’ | | Letter: | ñÑ | | Mark: | ̃ |
|
| Papiamento (Caribbean)
| pap-CB |
|
|
| Palauan | pau |
| Metadata |
|---|
| Tokenization: | L-633 |
|
|
| Páez | pbb |
| Metadata |
|---|
| Letter: | üëäïáÜËÄÏÁ | | Mark: | ̈́ |
|
|
| Northern Pashto | pbu |
| Metadata |
|---|
| Punctuation: | ٫٬٪؉‰ | | Letter: | آاأءبپتټثجځچڅحخدډذرړزژږسشښصضطظعغفقکګگلمنڼهةوؤیيېۍئ | | Mark: | ًٌٍَُِّْٰٔٓ | | Number: | ۱۲۳۴۵۶۷۸۹ |
|
|
| Picard | pcd |
| Metadata |
|---|
| Tokenization: | L-634 | | Letter: | èåûîéôçÈÅÛÎÉÔÇ | | Mark: | ̧̀̊̂́ |
|
|
| Nigerian Pidgin | pcm |
| Metadata |
|---|
| Tokenization: | L-408 |
|
| Nigerian Pidgin (Nigeria)
| pcm-NG |
| Metadata |
|---|
| Tokenization: | c-517 |
|
|
| Iranian Persian | pes |
| Metadata |
|---|
| Punctuation: | ٫٬٪؉،؛؟‰‐…‹›«» | | Letter: | آاءأؤئبپتثجچحخدذرزژسشصضطظعغفقکگلمنوهةیإي | | Mark: | ًٌٍِّٕٔٓ | | Number: | ۱۲۳۴۵۶۷۸۹ |
|
|
| Pali | pi |
| Metadata |
|---|
| Tokenization: | L-244 |
|
|
| Pijin | pis |
| Metadata |
|---|
| Tokenization: | L-635 |
|
|
| Pintupi-Luritja | piu |
|
|
| Polish | pl |
| Metadata |
|---|
| Tokenization: | L-245 | | Punctuation: | «»§‐–—…”„†‡′″ | | Letter: | óÓąćęłńśźżĄĆĘŁŃŚŹŻ | | Mark: | ̨́̇ |
|
| Polish (Poland)
| pl-PL |
| Metadata |
|---|
| Tokenization: | c-246 |
|
|
| Plateau Malagasy | plt |
| Metadata |
|---|
| Letter: | àâéèêëìîïñôÀÂÉÈÊËÌÎÏÑÔ | | Mark: | ̀̂́̈̃ |
|
|
| Pam | pmn |
|
|
| Western Panjabi | pnb |
| Metadata |
|---|
| Punctuation: | ‐–—‘’“”′″ | | Letter: | ءآؤئابپتثٹجچحخدذڈرزڑژسشصضطظعغفقکگلمنںهھہویےي | | Mark: | ُٓٔ |
|
|
| Pohnpeian | pon |
| Metadata |
|---|
| Tokenization: | L-716 |
|
|
| Pipil, Nicarao | ppl |
| Metadata |
|---|
| Letter: | áéÁÉ | | Mark: | ́ |
|
|
| Prussian | prg |
| Metadata |
|---|
| Punctuation: | ‐–—…“„ | | Letter: | țȚāēģīķņōŗšūžĀĒĢĪĶŅŌŖŠŪŽḑḐ | | Mark: | ̧̦̄̌ |
|
| Prussian (International)
| prg-INT |
|
|
| Ashéninka Perené | prq |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | íÍ | | Mark: | ́ |
|
|
| Dari, Afghan Persian | prs |
| Metadata |
|---|
| Tokenization: | L-361 | | Punctuation: | ،‐ | | Letter: | اعلمیهجنحقوبشرصدسزآکئثتذضخپگظفغطأچژءي | | Mark: | ًٔٓ | | Number: | ۱۹۴۸۲۳۵۶۷۰ |
|
| Dari, Afghan Persian (Afghanistan)
| prs-AF |
| Metadata |
|---|
| Tokenization: | c-469 |
|
|
| Pashto | ps |
| Metadata |
|---|
| Tokenization: | L-247 |
|
| Pashto (Afghanistan)
| ps-AF |
| Metadata |
|---|
| Tokenization: | c-248 |
|
|
| Portuguese | pt |
| Metadata |
|---|
| Tokenization: | L-249 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | áàâãçéêíóòôõúºÁÀÂÃÇÉÊÍÓÒÔÕÚ | | Mark: | ̧́̀̂̃ |
|
| Portuguese (Africa)
| pt-002 |
|
| Portuguese (Angola)
| pt-AO |
| Metadata |
|---|
| Tokenization: | c-411 |
|
| Portuguese (Brazil)
| pt-BR |
| Metadata |
|---|
| Tokenization: | c-250 |
|
| Portuguese (Cabo Verde)
| pt-CV |
|
| Portuguese (Guinea-Bissau)
| pt-GW |
|
| Portuguese (Macau)
| pt-MO |
|
| Portuguese (Mozambique)
| pt-MZ |
| Metadata |
|---|
| Tokenization: | c-732 |
|
| Portuguese (Portugal)
| pt-PT |
| Metadata |
|---|
| Tokenization: | c-251 |
|
| Portuguese (Sao Tome and Principe)
| pt-ST |
|
| Portuguese (Timor-Leste)
| pt-TL |
|
|
| Hinglish | qhi |
|
| Hinglish (India)
| qhi-IN |
|
|
| Simple Hindi | qsh |
|
| Simple Hindi {Deva} (India)
| qsh-Deva-IN |
|
|
| Taiwanese Hokkien | qtg |
|
| Taiwanese Hokkien {Hant} (Taiwan)
| qtg-Hant-TW |
|
|
| Thoda English | qth |
|
| Thoda English {Deva} (India)
| qth-Deva-IN |
|
|
| Quechua | qu |
| Metadata |
|---|
| Tokenization: | L-252 |
|
| Quechua (Bolivia)
| qu-BO |
| Metadata |
|---|
| Tokenization: | c-253 |
|
| Quechua (Ecuador)
| qu-EC |
| Metadata |
|---|
| Tokenization: | c-254 |
|
| Quechua (Peru)
| qu-PE |
| Metadata |
|---|
| Tokenization: | c-255 |
|
|
| K'iche', Quiché | quc |
| Metadata |
|---|
| Tokenization: | L-388 |
|
| K'iche', Quiché (Guatemala)
| quc-GT |
| Metadata |
|---|
| Tokenization: | c-496 |
|
| K'iche', Quiché {Latn} (Guatemala)
| quc-Latn-GT |
|
| K'iche', Quiché (Peru)
| quc-PE |
| Metadata |
|---|
| Tokenization: | c-525 |
|
|
| Ayacucho Quechua | quy |
|
| Ayacucho Quechua (Peru)
| quy-PE |
|
|
| Cusco Quechua | quz |
|
| Cusco Quechua (Bolivia)
| quz-BO |
|
| Cusco Quechua (Ecuador)
| quz-EC |
|
| Cusco Quechua (Peru)
| quz-PE |
|
|
| Puno Quechua | qxp |
| Metadata |
|---|
| Punctuation: | ‰ | | Letter: | Ññʼ | | Mark: | ̃ |
|
|
| Quenya | qya |
| Metadata |
|---|
| Tokenization: | L-413 |
|
|
| Rarotongan, Cook Islands Maori | rar |
|
|
| Rohingya | rhg |
| Metadata |
|---|
| Tokenization: | L-754 |
|
| Rohingya {Rohg} (Myanmar)
| rhg-Rohg-MM |
| Metadata |
|---|
| Tokenization: | c-825 |
|
|
| Rakhine | rki |
| Metadata |
|---|
| Tokenization: | L-572 |
|
| Rakhine (Myanmar)
| rki-MM |
| Metadata |
|---|
| Tokenization: | c-573 |
|
|
| Romansh | rm |
| Metadata |
|---|
| Tokenization: | L-256 | | Letter: | àüöéèìòùÀÜÖÉÈÌÒÙ | | Mark: | ̀̈́ |
|
| Romansh (Switzerland)
| rm-CH |
| Metadata |
|---|
| Tokenization: | c-526 |
|
|
| Balkan Romani | rmn |
| Metadata |
|---|
| Letter: | àõùèìòâÀÕÙÈÌÒÂƟśěćŕăąňűźőģůščžŚĚĆŔĂĄŇŰŹŐĢŮŠČŽɵ | | Mark: | ̨̧̀́̌̃̆̋̂̊ |
|
|
| Rundi | rn |
| Metadata |
|---|
| Tokenization: | L-257 |
|
| Rundi (Burundi)
| rn-BI |
| Metadata |
|---|
| Tokenization: | c-498 |
|
|
| Romanian | ro |
| Metadata |
|---|
| Tokenization: | L-258 | | Punctuation: | «»‐–—…‘“”„ | | Letter: | âîÂÎșțȘȚăĂ | | Mark: | ̦̆̂ |
|
| Romanian (Moldova)
| ro-MD |
| Metadata |
|---|
| Tokenization: | c-260 |
|
| Romanian (Romania)
| ro-RO |
| Metadata |
|---|
| Tokenization: | c-259 |
|
|
| Rombo | rof |
|
| Rombo (Tanzania, United Republic of)
| rof-TZ |
|
|
| Romany | rom |
| Metadata |
|---|
| Tokenization: | L-414 |
|
|
| Russian | ru |
| Metadata |
|---|
| Tokenization: | L-261 | | Punctuation: | ‐–—…‘‚“„«»§ | | Letter: | всеобщаядклрципчнтзгшюйьмуыхъжэфёВСЕОБЩАЯДКЛРЦИПЧНТЗГШЮЙЬМУЫХЪЖЭФЁ | | Mark: | ̆̈ |
|
| Russian (Belarus)
| ru-BY |
|
| Russian (Estonia)
| ru-EE |
| Metadata |
|---|
| Tokenization: | c-655 |
|
| Russian (Israel)
| ru-IL |
| Metadata |
|---|
| Tokenization: | c-656 |
|
| Russian (Kyrgyzstan)
| ru-KG |
|
| Russian (Kazakhstan)
| ru-KZ |
|
| Russian (Latvia)
| ru-LV |
| Metadata |
|---|
| Tokenization: | c-657 |
|
| Russian (Moldova)
| ru-MD |
| Metadata |
|---|
| Tokenization: | c-263 |
|
| Russian (Russia)
| ru-RU |
| Metadata |
|---|
| Tokenization: | c-262 |
|
| Russian {Latn} (Russia)
| ru-Latn-RU |
| Metadata |
|---|
| Tokenization: | c-817 |
|
| Russian (Ukraine)
| ru-UA |
| Metadata |
|---|
| Tokenization: | c-415 |
|
| Russian (Uzbekistan)
| ru-UZ |
| Metadata |
|---|
| Tokenization: | c-824 |
|
|
| Rusyn | rue |
| Metadata |
|---|
| Tokenization: | L-416 |
|
| Rusyn (Ukraine)
| rue-UA |
| Metadata |
|---|
| Tokenization: | c-527 |
|
|
| Macedo-Romanian, Aromanian, Arumanian | rup |
| Metadata |
|---|
| Letter: | ãâÃÂ | | Mark: | ̃̂ |
|
|
| Kinyarwanda | rw |
| Metadata |
|---|
| Tokenization: | L-264 |
|
| Kinyarwanda (Rwanda)
| rw-RW |
| Metadata |
|---|
| Tokenization: | c-497 |
|
|
| Rwa | rwk |
|
| Rwa (Tanzania, United Republic of)
| rwk-TZ |
|
|
| Sanskrit | sa |
| Metadata |
|---|
| Tokenization: | L-265 | | Punctuation: | । | | Letter: | मनवधकरणजगतअभघषयपचशसएछबदटडहइआञउठथलढऽ | | Mark: | ािंो्ूेुौैीृॄ़ |
|
| Sanskrit (India)
| sa-IN |
| Metadata |
|---|
| Tokenization: | c-266 |
|
|
| Yakut | sah |
| Metadata |
|---|
| Letter: | абгҕдьийклмнҥоөпрстуүхһчыэецязювщъжфАБГҔДЬИЙКЛМНҤОӨПРСТУҮХҺЧЫЭЕЦЯЗЮВЩЪЖФ | | Mark: | ̆ |
|
| Yakut (Russia)
| sah-RU |
|
|
| Samburu | saq |
|
| Samburu (Kenya)
| saq-KE |
|
|
| Santali | sat |
| Metadata |
|---|
| Tokenization: | L-417 |
|
| Santali (India)
| sat-IN |
| Metadata |
|---|
| Tokenization: | c-529 |
|
|
| Sangu (Tanzania) | sbp |
|
| Sangu (Tanzania) (Tanzania, United Republic of)
| sbp-TZ |
|
|
| Sardinian | sc |
| Metadata |
|---|
| Tokenization: | L-267 |
|
| Sardinian (Italy)
| sc-IT |
| Metadata |
|---|
| Tokenization: | c-530 |
|
|
| Sicilian | scn |
|
| Sicilian (Italy)
| scn-IT |
|
|
| Scots | sco |
| Metadata |
|---|
| Tokenization: | L-418 |
|
| Scots (United Kingdom)
| sco-GB |
| Metadata |
|---|
| Tokenization: | c-531 |
|
|
| Sindhi | sd |
| Metadata |
|---|
| Tokenization: | L-268 | | Punctuation: | ‰ | | Letter: | آابٻپڀتثٺٽٿجھڃڄچڇحخدذڊڌڍڏرزڙسشصضطظعغفڦقکڪگڱڳلمنڻهوي | | Mark: | ٓ |
|
| Sindhi {Deva} (India)
| sd-Deva-IN |
| Metadata |
|---|
| Tokenization: | c-772 |
|
| Sindhi (Pakistan)
| sd-PK |
| Metadata |
|---|
| Tokenization: | c-733 |
|
| Sindhi {Arab} (Pakistan)
| sd-Arab-PK |
| Metadata |
|---|
| Tokenization: | c-535 |
|
| Sindhi {Deva} (Pakistan)
| sd-Deva-PK |
| Metadata |
|---|
| Tokenization: | c-605 |
|
|
| Southern Kurdish | sdh |
| Metadata |
|---|
| Tokenization: | L-499 |
|
| Southern Kurdish {Arab} (Iran)
| sdh-Arab-IR |
| Metadata |
|---|
| Tokenization: | c-500 |
|
|
| Northern Sami | se |
| Metadata |
|---|
| Tokenization: | L-269 | | Letter: | áÁčđŋšŧžČĐŊŠŦŽ | | Mark: | ́̌ |
|
| Northern Sami (Finland)
| se-FI |
| Metadata |
|---|
| Tokenization: | c-270 |
|
| Northern Sami (Norway)
| se-NO |
| Metadata |
|---|
| Tokenization: | c-271 |
|
| Northern Sami (Sweden)
| se-SE |
| Metadata |
|---|
| Tokenization: | c-272 |
|
|
| Sena | seh |
| Metadata |
|---|
| Letter: | áàâãçéêíóòôõúÁÀÂÃÇÉÊÍÓÒÔÕÚ | | Mark: | ̧́̀̂̃ |
|
| Sena (Mozambique)
| seh-MZ |
|
|
| Koyraboro Senni Songhai | ses |
| Metadata |
|---|
| Letter: | ãõÃÕƝŋšžŊŠŽẽẼɲ | | Mark: | ̃̌ |
|
| Koyraboro Senni Songhai (Mali)
| ses-ML |
|
|
| Secoya | sey |
| Metadata |
|---|
| Letter: | ëñàéËÑÀÉ | | Mark: | ̱̈̃̀́ |
|
|
| Sango | sg |
| Metadata |
|---|
| Tokenization: | L-273 | | Letter: | âäêëîïôöùûüÂÄÊËÎÏÔÖÙÛÜ | | Mark: | ̂̈̀ |
|
| Sango (Central African Republic)
| sg-CF |
| Metadata |
|---|
| Tokenization: | c-528 |
|
|
| Tachelhit | shi |
| Metadata |
|---|
| Letter: | ⴰⴱⴳⵯⴷⴹⴻⴼⴽⵀⵃⵄⵅⵇⵉⵊⵍⵎⵏⵓⵔⵕⵖⵙⵚⵛⵜⵟⵡⵢⵣⵥ |
|
| Tachelhit {Latn}
| shi-Latn |
| Metadata |
|---|
| Letter: | ḍḥṛṣṭḌḤṚṢṬƐƔɛɣʷ | | Mark: | ̣ |
|
| Tachelhit {Latn} (Morocco)
| shi-Latn-MA |
|
| Tachelhit {Tfng} (Morocco)
| shi-Tfng-MA |
|
|
| Shilluk | shk |
| Metadata |
|---|
| Letter: | ÀÁÄÈÉËÌÍÏÓÖØÙÚàáäèéëìíïóöøùú | | Mark: | ́̈̀ |
|
|
| Shan | shn |
| Metadata |
|---|
| Tokenization: | L-743 | | Punctuation: | ။၊ | | Letter: | လၵပၼၽဝငသဢတမၸၾႁယၶၺထရ | | Mark: | ိ်ႈုၢႇွႆူးဵီႊႅႃႉေႂႄြ |
|
| Shan {Mymr} (Myanmar)
| shn-Mymr-MM |
| Metadata |
|---|
| Tokenization: | c-795 |
|
|
| Shipibo-Conibo | shp |
| Metadata |
|---|
| Punctuation: | ¿ | | Letter: | íáóéñúÍÁÓÉÑÚ | | Mark: | ́̃ |
|
|
| Sinhala | si |
| Metadata |
|---|
| Tokenization: | L-274 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | අආඇඈඉඊඋඌඍඑඒඓඔඕඖකඛගඝඞඟචඡජඣඥටඨඩඪණඬතථදධනඳපඵබභමඹයරලවශෂසහළෆ | | Mark: | ංඃ්ාැෑිීුූෘෙේෛොෝෞෟ |
|
| Sinhala (Sri Lanka)
| si-LK |
| Metadata |
|---|
| Tokenization: | c-536 |
|
|
| Sidama | sid |
|
| Sidama {Latn} (Ethiopia)
| sid-Latn-ET |
|
|
| Epena | sja |
|
|
| Sindarin | sjn |
| Metadata |
|---|
| Tokenization: | L-717 |
|
|
| Slovak | sk |
| Metadata |
|---|
| Tokenization: | L-275 | | Punctuation: | ‐–…‘‚“„§ | | Letter: | čďĺľňŕšťžűČĎĹĽŇŔŠŤŽŰáäéíóôúýÁÄÉÍÓÔÚÝ | | Mark: | ́̈̌̂̋ |
|
| Slovak (Slovakia)
| sk-SK |
| Metadata |
|---|
| Tokenization: | c-276 |
|
|
| Saraiki, Seraiki | skr |
| Metadata |
|---|
| Punctuation: | ۔، | | Letter: | انسیحقودعلمشرپہڱھےکڄئتڻزںگڈفظجچبڑصڋخڔٹطآذضغةثٻي | | Mark: | ُٔٓ | | Number: | ۱۲۳۴۵۶۷۸۹۰ |
|
|
| Slovenian | sl |
| Metadata |
|---|
| Tokenization: | L-277 | | Letter: | čšžČŠŽ | | Mark: | ̌ |
|
| Slovenian (Slovenia)
| sl-SI |
| Metadata |
|---|
| Tokenization: | c-278 |
|
|
| Samoan | sm |
| Metadata |
|---|
| Tokenization: | L-279 |
|
| Samoan (Samoa)
| sm-WS |
| Metadata |
|---|
| Tokenization: | c-735 |
|
|
| Southern Sami | sma |
| Metadata |
|---|
| Tokenization: | L-637 |
|
| Southern Sami (Norway)
| sma-NO |
|
| Southern Sami (Sweden)
| sma-SE |
|
|
| Lule Sami | smj |
| Metadata |
|---|
| Tokenization: | L-638 |
|
| Lule Sami (Norway)
| smj-NO |
|
| Lule Sami (Sweden)
| smj-SE |
|
|
| Inari Sami | smn |
| Metadata |
|---|
| Tokenization: | L-639 | | Letter: | âäáÂÄÁčđŋšžČĐŊŠŽ | | Mark: | ̂̌̈́ |
|
| Inari Sami (Finland)
| smn-FI |
|
|
| Skolt Sami | sms |
| Metadata |
|---|
| Tokenization: | L-640 |
|
| Skolt Sami (Finland)
| sms-FI |
|
|
| Shona | sn |
| Metadata |
|---|
| Tokenization: | L-280 |
|
| Shona (Zimbabwe)
| sn-ZW |
| Metadata |
|---|
| Tokenization: | c-534 |
|
| Shona {Latn} (Zimbabwe)
| sn-Latn-ZW |
|
|
| Soninke | snk |
| Metadata |
|---|
| Tokenization: | L-641 | | Letter: | ñÑŋŊ | | Mark: | ̃ |
|
| Soninke {Latn} (Mali)
| snk-Latn-ML |
|
|
| Siona | snn |
| Metadata |
|---|
| Letter: | ëñíäéËÑÍÄÉ | | Mark: | ̱̈̃́ |
|
|
| Somali | so |
| Metadata |
|---|
| Tokenization: | L-281 |
|
| Somali (Djibouti)
| so-DJ |
| Metadata |
|---|
| Tokenization: | c-752 |
|
| Somali (Ethiopia)
| so-ET |
| Metadata |
|---|
| Tokenization: | c-753 |
|
| Somali (Kenya)
| so-KE |
| Metadata |
|---|
| Tokenization: | c-754 |
|
| Somali (Somalia)
| so-SO |
| Metadata |
|---|
| Tokenization: | c-537 |
|
|
| Songhai languages | son |
| Metadata |
|---|
| Tokenization: | L-420 |
|
|
| Sabaot | spy |
|
|
| Albanian | sq |
|
| Albanian (Albania)
| sq-AL |
|
| Albanian (Macedonia)
| sq-MK |
|
|
| Serbian | sr |
| Metadata |
|---|
| Tokenization: | L-282 | | Punctuation: | ‐–…‘‚“„ | | Letter: | абвгдђежзијклљмнњопрстћуфхцчџшАБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ |
|
| Serbian {Cyrl}
| sr-Cyrl |
| Metadata |
|---|
| Tokenization: | L-287 |
|
| Serbian {Latn}
| sr-Latn |
| Metadata |
|---|
| Tokenization: | L-283 | | Punctuation: | ‐–…‘‚“„ | | Letter: | čćžđšČĆŽĐŠ | | Mark: | ̌́ |
|
| Serbian {Cyrl} (Bosnia and Herzegovina)
| sr-Cyrl-BA |
| Metadata |
|---|
| Tokenization: | c-285 |
|
| Serbian {Latn} (Bosnia and Herzegovina)
| sr-Latn-BA |
| Metadata |
|---|
| Tokenization: | c-284 |
|
| Serbian {Cyrl} (Montenegro)
| sr-Cyrl-ME |
| Metadata |
|---|
| Tokenization: | c-289 |
|
| Serbian {Latn} (Montenegro)
| sr-Latn-ME |
| Metadata |
|---|
| Tokenization: | c-288 |
|
| Serbian {Cyrl} (Serbia)
| sr-Cyrl-RS |
| Metadata |
|---|
| Tokenization: | c-290 |
|
| Serbian {Latn} (Serbia)
| sr-Latn-RS |
| Metadata |
|---|
| Tokenization: | c-286 |
|
|
| Logudorese Sardinian | src |
| Metadata |
|---|
| Letter: | òìàèùÒÌÀÈÙ | | Mark: | ̀ |
|
|
| Serer | srr |
| Metadata |
|---|
| Letter: | ñÑƭƴƊƁƬƳŋćŊĆṕṔɗɓ | | Mark: | ̃́ |
|
|
| Swati | ss |
| Metadata |
|---|
| Tokenization: | L-291 |
|
| Swati (Swaziland)
| ss-SZ |
|
| Swati (South Africa)
| ss-ZA |
|
|
| Saho | ssy |
|
| Saho (Eritrea)
| ssy-ER |
|
|
| Southern Sotho | st |
| Metadata |
|---|
| Tokenization: | L-292 |
|
| Southern Sotho (Lesotho)
| st-LS |
|
| Southern Sotho (South Africa)
| st-ZA |
| Metadata |
|---|
| Tokenization: | c-540 |
|
|
| Siberian Tatar | sty |
| Metadata |
|---|
| Tokenization: | L-602 |
|
| Siberian Tatar (Russia)
| sty-RU |
| Metadata |
|---|
| Tokenization: | c-549 |
|
|
| Sundanese | su |
| Metadata |
|---|
| Tokenization: | L-293 |
|
| Sundanese {Sund}
| su-Sund |
| Metadata |
|---|
| Letter: | ᮊᮋᮌᮍᮎᮏᮐᮑᮒᮓᮔᮕᮖᮗᮘᮙᮚᮛᮜᮝᮞᮟᮠᮮᮯᮃᮄᮅᮆᮇᮈᮉ | | Mark: | ᮡᮢᮣᮀᮁᮂᮤᮥᮦᮧᮨᮩ᮪ | | Number: | ᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹ |
|
| Sundanese {Latn} (Indonesia)
| su-Latn-ID |
| Metadata |
|---|
| Tokenization: | c-541 |
|
| Sundanese {Sund} (Indonesia)
| su-Sund-ID |
| Metadata |
|---|
| Tokenization: | c-542 |
|
|
| Sukuma | suk |
|
|
| Susu | sus |
|
|
| Swedish | sv |
| Metadata |
|---|
| Tokenization: | L-294 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | àéåäöÀÉÅÄÖ | | Mark: | ̀́̊̈ |
|
| Swedish (Åland Islands)
| sv-AX |
|
| Swedish (Finland)
| sv-FI |
| Metadata |
|---|
| Tokenization: | c-295 |
|
| Swedish (Sweden)
| sv-SE |
| Metadata |
|---|
| Tokenization: | c-296 |
|
|
| Swahili | sw |
| Metadata |
|---|
| Tokenization: | L-297 |
|
| Swahili (Democratic Republic of the Congo)
| sw-CD |
|
| Swahili (Democratic Republic of the Congo; Kinshasa)
| sw-CD-KN |
|
| Swahili (Kenya)
| sw-KE |
| Metadata |
|---|
| Tokenization: | c-298 |
|
| Swahili (Somalia)
| sw-SO |
| Metadata |
|---|
| Tokenization: | c-736 |
|
| Swahili (Tanzania, United Republic of)
| sw-TZ |
| Metadata |
|---|
| Tokenization: | c-737 |
|
| Swahili (Uganda)
| sw-UG |
| Metadata |
|---|
| Tokenization: | c-738 |
|
|
| Maore Comorian | swb |
| Metadata |
|---|
| Letter: | ãÃƁƊĩĨẽẼɓɗ | | Mark: | ̃ |
|
|
| Swahili (individual language), Kiswahili | swh |
|
|
| Sutu | sx |
| Metadata |
|---|
| Tokenization: | L-642 |
|
|
| Classical Syriac | syc |
| Metadata |
|---|
| Punctuation: | ،؛.؟܀܁܂܃܄܅܆܇܈܉܊܋܌܍ | | Letter: | ܐܝܘܦܒܬܛܕܟܓܩܥܣܤܨܫܙܚܗܡܢܪܠـ | | Mark: | ّܼܸܹܻܾܷܱܴ݂̥̣݄̤݈̱̭̮ܿܲܵܺܽܶܰܳ݁̊݀̇݃̈݇̄݉݊ |
|
|
| Syriac | syr |
| Metadata |
|---|
| Tokenization: | L-299 |
|
| Syriac (Syria)
| syr-SY |
| Metadata |
|---|
| Tokenization: | c-300 |
|
| Syriac (Turkey)
| syr-TR |
| Metadata |
|---|
| Tokenization: | c-739 |
|
|
| Silesian | szl |
| Metadata |
|---|
| Tokenization: | L-611 |
|
| Silesian (Poland)
| szl-PL |
| Metadata |
|---|
| Tokenization: | c-761 |
|
|
| Tamil | ta |
| Metadata |
|---|
| Tokenization: | L-301 | | Punctuation: | “”‘’ | | Letter: | ஃஅஆஇஈஉஊஎஏஐஒஓஔகஙசஜஞடணதநனபமயரறலளழவஶஷஸஹ | | Mark: | ாிீுூெேைொோௌ்ௗ |
|
| Tamil (India)
| ta-IN |
| Metadata |
|---|
| Tokenization: | c-302 |
|
| Tamil {Latn} (India)
| ta-Latn-IN |
| Metadata |
|---|
| Tokenization: | c-814 |
|
| Tamil (Sri Lanka)
| ta-LK |
| Metadata |
|---|
| Tokenization: | c-581 |
|
| Tamil (Malaysia)
| ta-MY |
|
| Tamil (Singapore)
| ta-SG |
|
|
| Tamasheq | taq |
| Metadata |
|---|
| Tokenization: | L-749 |
|
| Tamasheq {Latn} (Mali)
| taq-Latn-ML |
| Metadata |
|---|
| Tokenization: | c-805 |
|
| Tamasheq {Tfng} (Mali)
| taq-Tfng-ML |
| Metadata |
|---|
| Tokenization: | c-804 |
|
|
| Atayal | tay |
| Metadata |
|---|
| Tokenization: | L-353 |
|
| Atayal (Taiwan)
| tay-TW |
| Metadata |
|---|
| Tokenization: | c-447 |
|
|
| Tagbanwa | tbw |
| Metadata |
|---|
| Punctuation: | ᜵᜶ | | Letter: | ᝩᝦᝣᝪᝧᝤᝰᝫᝨᝥᝯᝮᝬᝠᝡᝢ | | Mark: | ᝲᝳ |
|
| Tagbanwa {Tagb}
| tbw-Tagb |
| Metadata |
|---|
| Punctuation: | ᜵᜶ | | Letter: | ᝩᝦᝣᝪᝧᝤᝰᝫᝨᝥᝯᝮᝬᝠᝡᝢ | | Mark: | ᝲᝳ |
|
|
| Ditammari | tbz |
| Metadata |
|---|
| Letter: | úàóãìùÚÀÓÃÌÙƉƐƆũŋĩŨŊĨɖɛɔ | | Mark: | ̃́̀ |
|
|
| Ticuna | tca |
| Metadata |
|---|
| Letter: | üéãñõúáíóÜÉÃÑÕÚÁÍÓĩũĨŨẽṯḏṉẼṮḎṈ | | Mark: | ̱̃́̈͟ |
|
|
| Tai Nüa | tdd |
| Metadata |
|---|
| Letter: | ᥐᥑᥒᥓᥔᥕᥖᥗᥘᥙᥚᥛᥜᥝᥞᥟᥠᥡᥢᥣᥤᥥᥦᥧᥨᥩᥪᥫᥬᥭᥰᥱᥲᥳᥴ |
|
|
| Tetun Dili | tdt |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | áíúóÁÍÚÓ | | Mark: | ́ |
|
|
| Telugu | te |
| Metadata |
|---|
| Tokenization: | L-303 | | Punctuation: | ‘’“” | | Letter: | అఆఇఈఉఊఋఎఏఐఒఓఔకఖగఘఙచఛజఝఞటఠడఢణతథదధనపఫబభమయరఱలళవశషసహ | | Mark: | ంఃాిీుూృెేైొోౌ్ౖ |
|
| Telugu (India)
| te-IN |
| Metadata |
|---|
| Tokenization: | c-304 |
|
| Telugu {Latn} (India)
| te-Latn-IN |
| Metadata |
|---|
| Tokenization: | c-813 |
|
|
| Timne | tem |
| Metadata |
|---|
| Punctuation: | ‐ | | Letter: | ɅƆƏƐŋŊʌɔəɛ |
|
|
| Teso | teo |
| Metadata |
|---|
| Tokenization: | L-579 |
|
| Teso (Kenya)
| teo-KE |
|
| Teso (Uganda)
| teo-UG |
| Metadata |
|---|
| Tokenization: | c-580 |
|
|
| Tetum | tet |
| Metadata |
|---|
| Tokenization: | L-643 |
|
| Tetum (Indonesia)
| tet-ID |
| Metadata |
|---|
| Tokenization: | c-740 |
|
| Tetum (Timor-Leste)
| tet-TL |
| Metadata |
|---|
| Tokenization: | c-741 |
|
|
| Tajik | tg |
| Metadata |
|---|
| Tokenization: | L-305 | | Punctuation: | ‰ | | Letter: | эъломияуҳқбашрпегфтднзкхсвӣёҷчғюӯйжьЭЪЛОМИЯУҲҚБАШРПЕГФТДНЗКХСВӢЁҶЧҒЮӮЙЖЬ | | Mark: | ̄̈̆ |
|
| Tajik (Tajikistan)
| tg-TJ |
| Metadata |
|---|
| Tokenization: | c-544 |
|
| Tajik {Cyrl} (Tajikistan)
| tg-Cyrl-TJ |
|
|
| Thai | th |
| Metadata |
|---|
| Tokenization: | L-306 | | Punctuation: | ‐–—‘’“”…′″๏๚๛ | | Letter: | ฯๆกขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮะาๅำเแโใไ | | Mark: | ์็่้๊๋ัิีึืุู | | Number: | ๑๒๓๔๕๖๗๘๙ |
|
| Thai (Thailand)
| th-TH |
| Metadata |
|---|
| Tokenization: | c-307 |
|
|
| Tigrinya | ti |
| Metadata |
|---|
| Tokenization: | L-308 | | Punctuation: | ፣፡’ | | Letter: | ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖሗመሙሚማሜምሞሟሠሡሢሣሤሥሦሧረሩሪራሬርሮሯሰሱሲሳሴስሶሷሸሹሺሻሼሽሾሿቀቁቂቃቄቅቆቈቊቋቌቍቐቑቒቓቔቕቖቘቚቛቜቝበቡቢባቤብቦቧቨቩቪቫቬቭቮቯተቱቲታቴትቶቷቸቹቺቻቼችቾቿኀኁኂኃኄኅኆኈኊኋኌኍነኑኒናኔንኖኗኘኙኚኛኜኝኞኟአኡኢኣኤእኦኧከኩኪካኬክኮኰኲኳኴኵኸኹኺኻኼኽኾዀዂዃዄዅወዉዊዋዌውዎዐዑዒዓዔዕዖዘዙዚዛዜዝዞዟዠዡዢዣዤዥዦዧየዩዪያዬይዮደዱዲዳዴድዶዷጀጁጂጃጄጅጆጇገጉጊጋጌግጎጐጒጓጔጕጠጡጢጣጤጥጦጧጨጩጪጫጬጭጮጯጰጱጲጳጴጵጶጷጸጹጺጻጼጽጾጿፀፁፂፃፄፅፆፇፈፉፊፋፌፍፎፏፐፑፒፓፔፕፖፗ | | Mark: | ፟ |
|
| Tigrinya (Eritrea)
| ti-ER |
| Metadata |
|---|
| Tokenization: | c-827 |
|
| Tigrinya (Ethiopia)
| ti-ET |
| Metadata |
|---|
| Tokenization: | c-551 |
|
|
| Tigre | tig |
|
| Tigre (Eritrea)
| tig-ER |
|
|
| Tiv | tiv |
|
|
| Turkmen | tk |
| Metadata |
|---|
| Tokenization: | L-309 | | Punctuation: | §–—…“”‐‰ | | Letter: | çäöüýÇÄÖÜÝžňşŽŇŞ | | Mark: | ̧̈̌́ |
|
| Turkmen {Cyrl}
| tk-Cyrl |
| Metadata |
|---|
| Punctuation: | ‐– | | Letter: | адмхуклрынңәиецясгшбптчвзэоҗйөүъюжфёАДМХУКЛРЫНҢӘИЕЦЯСГШБПТЧВЗЭОҖЙӨҮЪЮЖФЁ | | Mark: | ̆̈ |
|
| Turkmen (Turkmenistan)
| tk-TM |
| Metadata |
|---|
| Tokenization: | c-554 |
|
|
| Tagalog | tl |
| Metadata |
|---|
| Tokenization: | L-310 |
|
| Tagalog (Philippines)
| tl-PH |
| Metadata |
|---|
| Tokenization: | c-311 |
|
|
| Klingon | tlh |
| Metadata |
|---|
| Tokenization: | L-390 |
|
|
| Talysh | tly |
| Metadata |
|---|
| Letter: | çÇƏığşİĞŞə | | Mark: | ̧̇̆ |
|
|
| Tswana | tn |
| Metadata |
|---|
| Tokenization: | L-312 | | Punctuation: | ·‐ | | Letter: | šŠ | | Mark: | ̌ |
|
| Tswana (Botswana)
| tn-BW |
| Metadata |
|---|
| Tokenization: | c-742 |
|
| Tswana (South Africa)
| tn-ZA |
| Metadata |
|---|
| Tokenization: | c-313 |
|
|
| Tonga | to |
| Metadata |
|---|
| Tokenization: | L-314 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | āēīōūĀĒĪŌŪáéíóúÁÉÍÓÚʻ | | Mark: | ́̄ |
|
| Tonga (Tonga)
| to-TO |
| Metadata |
|---|
| Tokenization: | c-743 |
|
|
| Toba | tob |
| Metadata |
|---|
| Tokenization: | L-756 | | Letter: | ỹỸíÍ | | Mark: | ̃́ |
|
| Toba (Argentina)
| tob-AR |
| Metadata |
|---|
| Tokenization: | c-831 |
|
|
| Tonga (Zambia) | toi |
|
|
| Tojolabal | toj |
|
|
| Papantla Totonac | top |
|
|
| Tok Pisin | tpi |
| Metadata |
|---|
| Tokenization: | L-644 |
|
| Tok Pisin (Papua New Guinea)
| tpi-PG |
|
|
| Turkish | tr |
| Metadata |
|---|
| Tokenization: | L-315 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | çöüâÇÖÜÂğışĞŞİ | | Mark: | ̧̇̆̈̂ |
|
| Turkish (Cyprus)
| tr-CY |
| Metadata |
|---|
| Tokenization: | c-426 |
|
| Turkish (Turkey)
| tr-TR |
| Metadata |
|---|
| Tokenization: | c-316 |
|
|
| Turoyo, Surayt | tru |
| Metadata |
|---|
| Punctuation: | ،؛؟܆܇ | | Letter: | ܐܝܘܦܒܬܛܕܟܓܩܫܔܣܨܙܚܥܗܡܢܪܠ | | Mark: | ܷܱ̰݂݆ܶܳܰ݁݅ |
|
|
| Tsonga | ts |
| Metadata |
|---|
| Tokenization: | L-317 | | Punctuation: | ’ | | Letter: | ìàçãòèùíéáúÌÀÇÃÒÈÙÍÉÁÚ | | Mark: | ̧̀̃́ |
|
| Tsonga (South Africa)
| ts-ZA |
| Metadata |
|---|
| Tokenization: | c-552 |
|
| Tsonga (Zimbabwe)
| ts-ZW |
| Metadata |
|---|
| Tokenization: | c-553 |
|
|
| Tausug | tsg |
| Metadata |
|---|
| Tokenization: | L-736 |
|
| Tausug {Arab} (Philippines)
| tsg-Arab-PH |
| Metadata |
|---|
| Tokenization: | c-787 |
|
| Tausug {Latn} (Philippines)
| tsg-Latn-PH |
| Metadata |
|---|
| Tokenization: | c-788 |
|
|
| Purepecha | tsz |
| Metadata |
|---|
| Letter: | áïéíÁÏÉÍⱭƲŋŊɑʋ | | Mark: | ́̈ |
|
|
| Tatar | tt |
| Metadata |
|---|
| Tokenization: | L-318 | | Letter: | кешхоуларынңгмидцясбәтьвзпөъһҗчүйфюэжКЕШХОУЛАРЫНҢГМИДЦЯСБӘТЬВЗПӨЪҺҖЧҮЙФЮЭЖёщЁЩ | | Mark: | ̈̆ |
|
| Tatar (Russia)
| tt-RU |
| Metadata |
|---|
| Tokenization: | c-744 |
|
| Tatar {Cyrl} (Russia)
| tt-Cyrl-RU |
| Metadata |
|---|
| Tokenization: | c-319 |
|
|
| Tumbuka | tum |
|
| Tumbuka {Latn}
| tum-Latn |
|
| Tumbuka {Mwng}
| tum-Mwng |
|
|
| Tuvalu | tvl |
|
|
| Twi | tw |
| Metadata |
|---|
| Tokenization: | L-320 |
|
| Twi (Ghana)
| tw-GH |
|
|
| Tasawaq | twq |
| Metadata |
|---|
| Letter: | ɲẽẼŋšžŊŠŽƝãõÃÕ | | Mark: | ̃̌ |
|
| Tasawaq (Niger)
| twq-NE |
|
|
| Tahitian | ty |
| Metadata |
|---|
| Tokenization: | L-321 | | Letter: | āūōēīĀŪŌĒĪ | | Mark: | ̄ |
|
| Tahitian (French Polynesia)
| ty-PF |
| Metadata |
|---|
| Tokenization: | c-543 |
|
|
| Tuvinian | tyv |
| Metadata |
|---|
| Letter: | кижнңэргелбүтуазычдьсмяоюцхпшөйвъфёКИЖНҢЭРГЕЛБҮТУАЗЫЧДЬСМЯОЮЦХПШӨЙВЪФЁ | | Mark: | ̆̈ |
|
|
| Tzeltal | tzh |
|
|
| Central Atlas Tamazight | tzm |
| Metadata |
|---|
| Letter: | ɛɣḍḥṛṣṭẓỵḌḤṚṢṬẒỴƐƔâéçÂÉÇʷ | | Mark: | ̧̣̂́ |
|
| Central Atlas Tamazight {Tfng}
| tzm-Tfng |
| Metadata |
|---|
| Tokenization: | L-709 |
|
| Central Atlas Tamazight {Latn} (Algeria)
| tzm-Latn-DZ |
|
| Central Atlas Tamazight {Arab} (Morocco)
| tzm-Arab-MA |
|
| Central Atlas Tamazight {Latn} (Morocco)
| tzm-Latn-MA |
|
| Central Atlas Tamazight {Tfng} (Morocco)
| tzm-Tfng-MA |
| Metadata |
|---|
| Tokenization: | c-545 |
|
|
| Tzotzil | tzo |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | óáéíúÓÁÉÍÚ | | Mark: | ́ |
|
|
| Udmurt | udm |
|
|
| Uyghur | ug |
| Metadata |
|---|
| Tokenization: | L-322 | | Punctuation: | ،؛ | | Letter: | ئاەبپتجچخدرزژسشغفقكگڭلمنھوۇۆۈۋېىي | | Mark: | ٔ |
|
| Uyghur {Latn}
| ug-Latn |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | öéüÖÉÜ | | Mark: | ̈́ |
|
| Uyghur (China)
| ug-CN |
| Metadata |
|---|
| Tokenization: | c-745 |
|
| Uyghur {Arab} (China)
| ug-Arab-CN |
| Metadata |
|---|
| Tokenization: | c-556 |
|
| Uyghur {Cyrl} (Kazakhstan)
| ug-Cyrl-KZ |
| Metadata |
|---|
| Tokenization: | c-557 |
|
|
| Ukrainian | uk |
| Metadata |
|---|
| Tokenization: | L-323 | | Punctuation: | –’“„‐«»§ | | Letter: | абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯʼ | | Mark: | ̈̆ |
|
| Ukrainian (Ukraine)
| uk-UA |
| Metadata |
|---|
| Tokenization: | c-324 |
|
| Ukrainian {Latn} (Ukraine)
| uk-Latn-UA |
| Metadata |
|---|
| Tokenization: | c-818 |
|
|
| Umbundu | umb |
| Metadata |
|---|
| Tokenization: | L-741 | | Punctuation: | ’ | | Letter: | ñêãîõâÑÊÃÎÕ | | Mark: | ̃̂ |
|
| Umbundu (Angola)
| umb-AO |
| Metadata |
|---|
| Tokenization: | c-793 |
|
|
| Undetermined | und |
| Metadata |
|---|
| Tokenization: | L-604 |
|
|
| Urdu | ur |
| Metadata |
|---|
| Tokenization: | L-325 | | Punctuation: | ،؍٫٬؛؟۔”“٪ | | Letter: | اآبپتٹثجچحخدڈذرڑزژسشصضطظعغفقکگلمنںوؤہۂھءیئےهي | | Mark: | ًَُِّٰٔٓ | | Number: | ۱۲۳۴۵۶۷۸۹ |
|
| Urdu (India)
| ur-IN |
| Metadata |
|---|
| Tokenization: | c-746 |
|
| Urdu (Pakistan)
| ur-PK |
| Metadata |
|---|
| Tokenization: | c-326 |
|
| Urdu {Latn} (Pakistan)
| ur-Latn-PK |
| Metadata |
|---|
| Tokenization: | c-823 |
|
|
| Urarina | ura |
| Metadata |
|---|
| Letter: | úóíÚÓÍ | | Mark: | ́ |
|
|
| Uzbek | uz |
| Metadata |
|---|
| Tokenization: | L-327 |
|
| Uzbek (Afghanistan)
| uz-AF |
| Metadata |
|---|
| Tokenization: | c-747 |
|
| Uzbek {Cyrl} (Uzbekistan)
| uz-Cyrl-UZ |
| Metadata |
|---|
| Tokenization: | c-329 |
|
| Uzbek {Latn} (Uzbekistan)
| uz-Latn-UZ |
| Metadata |
|---|
| Tokenization: | c-328 |
|
|
| Northern Uzbek | uzn |
| Metadata |
|---|
| Punctuation: | ‐–—…‘’“”„′″«»§ | | Letter: | ʻʼ |
|
| Northern Uzbek {Arab}
| uzn-Arab |
| Metadata |
|---|
| Punctuation: | ؉٪٫٬ | | Letter: | ءآأؤئابةتثجحخدذرزسشصضطظعغفقلمنهويپچژکگۇۉی | | Mark: | ًٌٍَُِّْٰٓٔ | | Number: | ۱۲۳۴۵۶۷۸۹ |
|
| Northern Uzbek {Cyrl}
| uzn-Cyrl |
| Metadata |
|---|
| Punctuation: | ‐– | | Letter: | инсоҳуқлармждекцяйбшгтўвэъпчзёфхюғИНСОҲУҚЛАРМЖДЕКЦЯЙБШГТЎВЭЪПЧЗЁФХЮҒ | | Mark: | ̆̈ |
|
|
| Vai | vai |
| Metadata |
|---|
| Letter: | ꔀꔁꔂꔃꔄꔅꔆꔇꔈꔉꔊꔋꔌꔍꔎꔏꔐꔑꔒꔓꔔꔕꔖꔗꔘꔙꔚꔛꔜꔝꔞꔟꔠꔡꔢꔣꔤꔥꔦꔧꔨꔩꔪꔫꔬꔭꔮꔯꔰꔱꔲꔳꔴꔵꔶꔷꔸꔹꔺꔻꔼꔽꔾꔿꕀꕁꕂꕃꕄꕅꕆꕇꕈꕉꕊꕋꕌꕍꕎꕏꕐꕑꕒꕓꕔꕕꕖꕗꕘꕙꕚꕛꕜꕝꕞꕟꕠꕡꕢꕣꕤꕥꕦꕧꕨꕩꕪꕫꕬꕭꕮꕯꕰꕱꕲꕳꕴꕵꕶꕷꕸꕹꕺꕻꕼꕽꕾꕿꖀꖁꖂꖃꖄꖅꖆꖇꖈꖉꖊꖋꖌꖍꖎꖏꖐꖑꖒꖓꖔꖕꖖꖗꖘꖙꖚꖛꖜꖝꖞꖟꖠꖡꖢꖣꖤꖥꖦꖧꖨꖩꖪꖫꖬꖭꖮꖯꖰꖱꖲꖳꖴꖵꖶꖷꖸꖹꖺꖻꖼꖽꖾꖿꗀꗁꗂꗃꗄꗅꗆꗇꗈꗉꗊꗋꗌꗍꗎꗏꗐꗑꗒꗓꗔꗕꗖꗗꗘꗙꗚꗛꗜꗝꗞꗟꗠꗡꗢꗣꗤꗥꗦꗧꗨꗩꗪꗫꗬꗭꗮꗯꗰꗱꗲꗳꗴꗵꗶꗷꗸꗹꗺꗻꗼꗽꗾꗿꘀꘁꘂꘃꘄꘅꘆꘇꘈꘉꘊꘋꘌ |
|
| Vai {Latn}
| vai-Latn |
| Metadata |
|---|
| Letter: | áãéíóõúÁÃÉÍÓÕÚƁƊƐƆĩŋũĨŊŨẽẼɓɗɛɔ | | Mark: | ́̃ |
|
| Vai {Vaii} (Liberia)
| vai-Vaii-LR |
|
|
| Venda | ve |
| Metadata |
|---|
| Tokenization: | L-330 | | Punctuation: | “” | | Letter: | ṱḽḓṅṋṰḼḒṄṊ | | Mark: | ̭̇ |
|
| Venda (South Africa)
| ve-ZA |
| Metadata |
|---|
| Tokenization: | c-559 |
|
|
| Venetian | vec |
| Metadata |
|---|
| Punctuation: | ’— | | Letter: | óàòèùéìçÓÀÒÈÙÉÌÇƚȽđĐ | | Mark: | ̧́̀ |
|
| Venetian (Italy)
| vec-IT |
|
|
| Veps | vep |
| Metadata |
|---|
| Punctuation: | ’ | | Letter: | üäöÜÄÖšžčŠŽČ | | Mark: | ̈̌ |
|
|
| Vietnamese | vi |
| Metadata |
|---|
| Tokenization: | L-331 | | Punctuation: | §‐–—…‘’“”†‡′″ | | Letter: | àãáâèéêìíòõóôùúýÀÃÁÂÈÉÊÌÍÒÕÓÔÙÚÝơưƠƯăđĩũĂĐĨŨảạằẳẵắặầẩẫấậẻẽẹềểễếệỉịỏọồổỗốộờởỡớợủụừửữứựỳỷỹỵẢẠẰẲẴẮẶẦẨẪẤẬẺẼẸỀỂỄẾỆỈỊỎỌỒỔỖỐỘỜỞỠỚỢỦỤỪỬỮỨỰỲỶỸỴ | | Mark: | ̛̣̀̉̃́̆̂ |
|
| Vietnamese (Vietnam)
| vi-VN |
| Metadata |
|---|
| Tokenization: | c-332 |
|
|
| Soyaltepec Mazatec | vmp |
| Metadata |
|---|
| Tokenization: | L-593 |
|
| Soyaltepec Mazatec (Mexico)
| vmp-MX |
| Metadata |
|---|
| Tokenization: | c-594 |
|
|
| Makhuwa | vmw |
| Metadata |
|---|
| Punctuation: | ’… | | Letter: | çõãÇÕà | | Mark: | ̧̃ |
|
|
| Ayautla Mazatec | vmy |
| Metadata |
|---|
| Tokenization: | L-595 |
|
| Ayautla Mazatec (Mexico)
| vmy-MX |
| Metadata |
|---|
| Tokenization: | c-596 |
|
|
| Mazatlán Mazatec | vmz |
| Metadata |
|---|
| Tokenization: | L-597 |
|
| Mazatlán Mazatec (Mexico)
| vmz-MX |
| Metadata |
|---|
| Tokenization: | c-598 |
|
|
| Volapük | vo |
| Metadata |
|---|
| Tokenization: | L-333 | | Punctuation: | «»§‐–—…‘’“” | | Letter: | äöüÄÖÜ | | Mark: | ̈ |
|
| Volapük (International)
| vo-INT |
|
|
| Võro | vro |
| Metadata |
|---|
| Tokenization: | L-752 |
|
| Võro (Estonia)
| vro-EE |
| Metadata |
|---|
| Tokenization: | c-820 |
|
|
| Vunjo | vun |
|
| Vunjo (Tanzania, United Republic of)
| vun-TZ |
|
|
| Walloon | wa |
| Metadata |
|---|
| Tokenization: | L-334 | | Letter: | éåèûîôâêçàÉÅÈÛÎÔÂÊÇÀ | | Mark: | ̧́̊̀̂ |
|
| Walloon (Belgium)
| wa-BE |
| Metadata |
|---|
| Tokenization: | c-560 |
|
|
| Walser | wae |
| Metadata |
|---|
| Letter: | áäãéíóöõúüÁÄÃÉÍÓÖÕÚÜčšũČŠŨ | | Mark: | ́̈̃̌ |
|
| Walser (Switzerland)
| wae-CH |
|
|
| Wolaytta, Wolaitta | wal |
|
| Wolaytta, Wolaitta (Ethiopia)
| wal-ET |
|
|
| Waray (Philippines) | war |
| Metadata |
|---|
| Tokenization: | L-608 |
|
| Waray (Philippines) {Latn} (Philippines)
| war-Latn-PH |
| Metadata |
|---|
| Tokenization: | c-609 |
|
|
| Sorbian languages | wen |
| Metadata |
|---|
| Tokenization: | L-636 |
|
|
| Cameroon Pidgin | wes |
| Metadata |
|---|
| Tokenization: | L-735 |
|
| Cameroon Pidgin (Cameroon)
| wes-CM |
| Metadata |
|---|
| Tokenization: | c-785 |
|
|
| Wolof | wo |
| Metadata |
|---|
| Tokenization: | L-335 | | Punctuation: | ‰ | | Letter: | ëñàéóËÑÀÉÓŋŊ | | Mark: | ̈̃̀́ |
|
| Wolof (Gambia)
| wo-GM |
| Metadata |
|---|
| Tokenization: | c-563 |
|
| Wolof (Senegal)
| wo-SN |
| Metadata |
|---|
| Tokenization: | c-562 |
|
|
| Waama | wwa |
| Metadata |
|---|
| Letter: | ãìàùèÃÌÀÙÈǹƆƐǸũŋŨŊɔɛ | | Mark: | ̃̀ |
|
|
| Xhosa | xh |
| Metadata |
|---|
| Tokenization: | L-336 |
|
| Xhosa (South Africa)
| xh-ZA |
| Metadata |
|---|
| Tokenization: | c-337 |
|
|
| Kangri | xnr |
| Metadata |
|---|
| Tokenization: | L-729 |
|
| Kangri {Deva} (India)
| xnr-Deva-IN |
| Metadata |
|---|
| Tokenization: | c-777 |
|
|
| Soga | xog |
|
| Soga (Uganda)
| xog-UG |
|
|
| Liberia Kpelle | xpe |
| Metadata |
|---|
| Letter: | ƐƁƆƝƏĝŋĜŊɛɓɔɲə | | Mark: | ̂ |
|
|
| Kasem | xsm |
|
|
| Yagua | yad |
| Metadata |
|---|
| Letter: | ñíéáÑÍÉÁ | | Mark: | ̃́ |
|
|
| Yao | yao |
|
| Yao (Malawi)
| yao-MW |
|
|
| Yapese | yap |
| Metadata |
|---|
| Tokenization: | L-718 | | Punctuation: | ‐ | | Letter: | ʼ |
|
|
| Yangben | yav |
| Metadata |
|---|
| Letter: | áàâéèíìîóòôúùûÁÀÂÉÈÍÌÎÓÒÔÚÙÛǎǒǔǍƐǑƆǓāīŋōūĀĪŊŌŪɛɔ | | Mark: | ́̀̂̌̄ |
|
| Yangben (Cameroon)
| yav-CM |
|
|
| Eastern Yiddish | ydd |
| Metadata |
|---|
| Punctuation: | ׳״־‐–— | | Letter: | אבגדזשהויחטײכךלמםנןסעפףצץקרתװױ | | Mark: | ִַָּֿׂ |
|
|
| Yiddish | yi |
| Metadata |
|---|
| Tokenization: | L-338 |
|
| Yiddish (Germany)
| yi-DE |
| Metadata |
|---|
| Tokenization: | c-767 |
|
| Yiddish (Israel)
| yi-IL |
| Metadata |
|---|
| Tokenization: | c-564 |
|
| Yiddish (International)
| yi-INT |
|
| Yiddish (United States)
| yi-US |
| Metadata |
|---|
| Tokenization: | c-748 |
|
|
| Northern Yukaghir | ykg |
| Metadata |
|---|
| Letter: | эльистачйкөдҥнбпрумогецяҕхжѳқзвфыющЭЛЬИСТАЧЙКӨДҤНБПРУМОГЕЦЯҔХЖѲҚЗВФЫЮЩ | | Mark: | ̆ |
|
|
| Maay Maay | ymm |
| Metadata |
|---|
| Tokenization: | L-645 |
|
| Maay Maay (Somalia)
| ymm-SO |
|
|
| Yoruba | yo |
| Metadata |
|---|
| Tokenization: | L-339 | | Punctuation: | ‐ | | Letter: | áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙńŃẹọṣẸỌṢ | | Mark: | ̩̣́̀̄ |
|
| Yoruba (Benin)
| yo-BJ |
|
| Yoruba (Nigeria)
| yo-NG |
| Metadata |
|---|
| Tokenization: | c-565 |
|
|
| Yucateco, Yucatec Maya | yua |
| Metadata |
|---|
| Punctuation: | ‐ | | Letter: | ʼóíáúéÓÍÁÚÉ | | Mark: | ́ |
|
|
| Yue Chinese, Cantonese | yue |
| Metadata |
|---|
| Tokenization: | L-722 |
|
| Yue Chinese, Cantonese (China)
| yue-CN |
|
| Yue Chinese, Cantonese (Hong Kong)
| yue-HK |
| Metadata |
|---|
| Tokenization: | c-758 |
|
|
| Zhuang | za |
| Metadata |
|---|
| Tokenization: | L-340 |
|
| Zhuang (China)
| za-CN |
|
|
| Miahuatlán Zapotec | zam |
| Metadata |
|---|
| Letter: | óáñíÓÁÑÍʼ | | Mark: | ́̃ |
|
|
| Ngazidja Comorian | zdj |
|
|
| Standard Moroccan Tamazight | zgh |
| Metadata |
|---|
| Letter: | ⴰⵍⵖⵓⵎⴹⵏⵉⵣⵔⴼⴳⴷⵊⴱⵜⵡⴽⵢⵙⵀⵛⵥⵇⵯⴻⵕⵟⵃⵄⵅⵚ |
|
| Standard Moroccan Tamazight {Tfng} (Morocco)
| zgh-Tfng-MA |
| Metadata |
|---|
| Tokenization: | c-546 |
|
|
| Chinese | zh |
| Metadata |
|---|
| Tokenization: | L-343 |
|
| Chinese (China)
| zh-CN |
| Metadata |
|---|
| Tokenization: | c-344 |
|
| Chinese {Hans} (China)
| zh-Hans-CN |
|
| Chinese (Hong Kong)
| zh-HK |
| Metadata |
|---|
| Tokenization: | c-345 |
|
| Chinese {Hans} (Hong Kong)
| zh-Hans-HK |
|
| Chinese (Macau)
| zh-MO |
| Metadata |
|---|
| Tokenization: | c-346 |
|
| Chinese {Hans} (Macau)
| zh-Hans-MO |
|
| Chinese (Malaysia)
| zh-MY |
| Metadata |
|---|
| Tokenization: | c-431 |
|
| Chinese (Singapore)
| zh-SG |
| Metadata |
|---|
| Tokenization: | c-347 |
|
| Chinese (Taiwan)
| zh-TW |
| Metadata |
|---|
| Tokenization: | c-348 |
|
|
| Malay (individual language) | zlm |
|
| Malay (individual language) {Arab}
| zlm-Arab |
| Metadata |
|---|
| Punctuation: | ، | | Letter: | ڤراشتهنحقسيمأجڬدبوڽڠعفكلچخظصزطۏؤئذ | | Mark: | ٔ | | Number: | ٢ |
|
|
| Zou | zom |
| Metadata |
|---|
| Tokenization: | L-646 |
|
|
| Záparo | zro |
|
|
| Güilá Zapotec | ztu |
| Metadata |
|---|
| Letter: | ëíéËÍÉ | | Mark: | ̈́ |
|
|
| Zulu | zu |
| Metadata |
|---|
| Tokenization: | L-341 |
|
| Zulu (South Africa)
| zu-ZA |
| Metadata |
|---|
| Tokenization: | c-342 |
|
|
| No linguistic content, Not applicable | zxx |
| Metadata |
|---|
| Tokenization: | L-603 |
|
|
| Yongbei Zhuang | zyb |
|
|
| Zaza, Dimili, Dimli, Kirdki, Kirmanjki, Zazaki | zza |
| Metadata |
|---|
| Tokenization: | L-725 |
|
| Zaza, Dimili, Dimli, Kirdki, Kirmanjki, Zazaki (Turkey)
| zza-TR |
| Metadata |
|---|
| Tokenization: | c-768 |
|