|Languages||Urdu, Balti, infrequent use in Burushaski, others|
|Unicode range||U+0600 to U+06FF, U+FE70 to U+FEFF|
The Urdu alphabet (Urdu: اردو تہجی, romanized: urdū tahajjī, or اردو حروفِ تہجی, urdū harūf-e-tahajjī), also known as Shahmukhi, is the right-to-left alphabet used for the Urdu language. It is a modification of the Persian alphabet, which is itself derived from the Arabic alphabet. The Urdu alphabet has up to 39 or 40 distinct letters, with no letter case, and is typically written in the calligraphic Nastaʻlīq script, whereas Arabic is more commonly written in the Naskh style.
Usually, bare transliterations of Urdu into the Latin alphabet (called Roman Urdu) omit many phonemic elements that have no equivalent in English or other languages commonly written in the Latin script.
The standard Urdu script is a modified version of the Perso-Arabic script and has its origins in 13th-century Iran. It is closely related to the development of the Nastaʻlīq style of Perso-Arabic script.
Despite the invention of the Urdu typewriter in 1911, Urdu newspapers continued to publish printed copies of text handwritten by calligraphers, known as katibs or khush-navees, until the late 1980s. The Pakistani national newspaper Daily Jang was the first Urdu newspaper to use computer-based Nastaʻlīq composition. There are efforts under way to develop more sophisticated and user-friendly Urdu support on computers and the Internet. Nowadays, nearly all Urdu newspapers, magazines, journals, and periodicals are composed on computers with Urdu software programs.
Outside the Indian subcontinent, the Urdu script is also used by Pakistan's large diaspora, including in the United Kingdom, the United Arab Emirates, the United States, Canada, Saudi Arabia and other places.
Urdu is written in the Nastaʻlīq style (Persian: نستعلیق, Nastaʻlīq). The Nastaʻlīq calligraphic style began as a Persian blend of the Naskh and Taʻlīq scripts. After the Mughal conquest, Nastaʻlīq became the preferred writing style for Urdu. It is the dominant style in Pakistan, and many Urdu writers elsewhere in the world use it. Nastaʻlīq is more cursive and flowing than Naskh.
In the Arabic alphabet and many alphabets derived from it, each letter is regarded as having two or three general forms, depending on its position in the word (though Arabic calligraphy can add a great deal of complexity). The Nastaʻlīq style in which Urdu is written, however, uses more than three general forms for many letters, even in simple, non-decorative documents.
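In Unicode, each Urdu letter is stored as a single code point, and choosing a positional shape is left to the font's shaping engine; Nastaʻlīq fonts typically supply their many extra contextual shapes through OpenType rules rather than through separate code points. The legacy Arabic Presentation Forms-B block (U+FE70 to U+FEFF) encodes only four basic positional shapes per letter. The Python sketch below, using only the standard `unicodedata` module, lists the four presentation forms of bē and shows that NFKC normalization folds them back to the one stored letter, U+0628. `fold_presentation_forms` is a hypothetical helper name.

```python
import unicodedata

# The four positional presentation forms of bē (U+0628) in Arabic
# Presentation Forms-B: isolated, final, initial, medial.
BE_FORMS = "\uFE8F\uFE90\uFE91\uFE92"


def fold_presentation_forms(text: str) -> str:
    """Replace presentation-form code points with their base letters.

    Each form has a compatibility decomposition (<isolated>, <final>,
    <initial> or <medial>) to the base letter, which NFKC applies.
    """
    return unicodedata.normalize("NFKC", text)


for form in BE_FORMS:
    print(f"U+{ord(form):04X}  {unicodedata.name(form)}")

folded = fold_presentation_forms(BE_FORMS)
print(folded == "\u0628" * 4)  # every shape maps to the same stored letter
```

Text extracted from some older PDFs may contain presentation-form code points, so folding them before indexing or searching can help such text match ordinary Urdu input.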
The Urdu script is an abjad derived from the modern Persian script, which is itself derived from the Arabic script. As an abjad, the Urdu script shows only consonants and long vowels; short vowels must be inferred from the consonants' relation to each other. This kind of script suits Semitic languages such as Arabic and Hebrew, in which consonantal roots carry the core meaning of words. Urdu, an Indo-European language, lacks this root structure, so reading it demands more memorisation. The number of letters in the Urdu alphabet is somewhat ambiguous and debated.
|Letter [A]||Name||IPA||Transliteration||Common romanization||Sound: description or sound in English||Unicode||Order (1)||Order (2)||Order (3)|
|ا||alif||/ɑː/, /ə/, /ʔ/||ā, -||ā, -||a as in bath (UK English, Received Pronunciation) [D]||U+0627||1||1||1|
|ب||bē||/b/||b||b||b as in ball||U+0628||2||2||2|
|پ||pē||/p/||p||p||p as in pigeon||U+067E||3||3||3|
|ت||tē||/t̪/||t||t||dental t (as used in Spanish and Flemish)||U+062A||4||4||4|
|ٹ||ṭē||/ʈ/||ṭ||t||t as in karta (Swedish)||U+0679||5||5||5|
|ث||s̱ē||/s/||s̱||s||c as in cinema||U+062B||6||6||6|
|ج||jīm||/d͡ʒ/||j||j||j as in jug||U+062C||7||7||7|
|چ||cē||/t͡ʃ/||c||ch||ch as in chimney||U+0686||8||8||8|
|ح||baṛī ḥē||/ɦ/||ḥ||h||h as in happy||U+062D||9||9||9|
|خ||k͟hē||/x/||k͟h||kh||No full equivalent in English; similar to the guttural kh in Khundak||U+062E||10||10||10|
|د||dāl||/d̪/||d||d||No full equivalent in English; similar to the soft d in dream||U+062F||11||11||11|
|ڈ||ḍāl||/ɖ/||ḍ||d||d as in dream||U+0688||12||12||12|
|ذ||ẕāl||/z/||ẕ||z||z as in zebra||U+0630||13||13||13|
|ر||rē||/r/||r||r||r as in razor||U+0631||14||14||14|
|ڑ||ṛē||/ɽ/||ṛ||r||No full equivalent in English; similar to the hard dh in Raigadh||U+0691||15||15||15|
|ز||zē||/z/||z||z||z as in zebra||U+0632||16||16||16|
|ژ||zhē||/ʒ/||zh||zh||si as in version||U+0698||17||17||17|
|س||sīn||/s/||s||s||s as in sea||U+0633||18||18||18|
|ش||shīn||/ʃ/||sh||sh||sh as in shine||U+0634||19||19||19|
|ص||ṣwād||/s/||ṣ||s||s as in swear||U+0635||20||20||20|
|ض||ẓwād||/z/||ẓ||z||z as in gazette||U+0636||21||21||21|
|ط||t̤oʼē||/t̪/||t̤||t||No full equivalent in English; similar to the ta in Talia||U+0637||22||22||22|
|ظ||z̤oʼē||/z/||z̤||z||hard z as in zoo||U+0638||23||23||23|
|ع||ʻain||/ɑː/, /oː/, /eː/, /ʔ/, /ʕ/, ∅|| || ||No full equivalent in English; similar to a harsh, guttural a in apple||U+0639||24||24||24|
|غ||g͟hain||/ɣ/||g͟h||gh||No full equivalent in English; similar to the guttural gh in Ghalib||U+063A||25||25||25|
|ف||fē||/f/||f||f||f as in flower||U+0641||26||26||26|
|ق||qāf||/q/||q||q||Not used in English. In Arabic it is the first letter of Qatar and the last letter of Iraq. Sometimes said to resemble the call of a crow.||U+0642||27||27||27|
|ک||kāf||/k/||k||k||k as in kite||U+06A9||28||28||28|
|گ||gāf||/ɡ/||g||g||g as in grass||U+06AF||29||29||29|
|ل||lām||/l/||l||l||l as in lemon||U+0644||30||30||30|
|م||mīm||/m/||m||m||m as in Mike||U+0645||31||31||31|
|ن||nūn||/n/||n||n||n as in noon||U+0646||32||32||32|
|ں||nūn g͟hunnā||/◌̃/||ṉ||n||Nasal vowel. Not used in English, but used in French.||U+06BA|| || || |
|و||wāʼo||/ʋ/, /uː/, /ʊ/, /oː/, /ɔː/||w, ū, u, o, au||w, u, o, au||w as in wallet||U+0648||33||33||34|
|ہ||gol hē||/ɦ/, /ɑː/, /eː/||h, ā, e||h, a, e||h as in hot||U+06C1|| || || |
|ھ||do-cashmī hē||/ʰ/ or /ʱ/||h||h|| ||U+06BE|| || || |
|ی||choṭī yē||/j/, /iː/, /ɑː/||y, ī, á||y, ī, á||y as in yellow or ee as in feel||U+06CC||36||35||38|
|ے||baṛī yē||/ɛː/, /eː/||ai, e||ai, e||a as in cat or ay as in day||U+06D2||37||35b||39|
|ء / ئ||hamzā|| ||ʼ, -, yi||ʼ, -, yi||y sound (as in yak) for the first; a-i (a slurred a sound) for the second||U+0626||35a||37|| |
Tāʼ marbūṭah is also sometimes considered the 40th letter of the Urdu alphabet, though it is rarely used except in certain loanwords from Arabic. It is regarded as a form of tāʼ, the Arabic counterpart of Urdu tē, but it is not pronounced as such; when it is replaced with an Urdu letter in naturalised loanwords, it is usually replaced with gol hē.
|Letter [A]||Name (Roman Urdu or English)||Unicode|
|آ||alif maddah (alef with madda above)||U+0622|
|ء||hamza on the line||U+0621|
|ئ||yē hamza / alif hamza (yeh with hamza above)||U+0626|
|ۓ||baṛī yē hamza (yeh barree with hamza above)||U+06D3|
|ؤ||waw with hamza above||U+0624|
|ۂ||heh goal with hamza above||U+06C2, or U+06C1 + U+0654|
|ۃ||teh marbuta goal||U+06C3|
|ة||teh marbuta||U+0629|
Hamza can be difficult to recognise in Urdu handwriting and in fonts designed to replicate it, because it closely resembles the two dots above featured in ت tē and ق qāf; in Arabic and geometric fonts it is more distinct and closely resembles the Western form of the numeral 2 (two).
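Several letters in the hamza table above have both a precomposed code point and an equivalent base-plus-mark sequence; the table lists U+06C1 + U+0654 as an alternative encoding of heh goal with hamza above, for example. Unicode canonical normalization converts between the two spellings: NFC turns the sequence into the precomposed letter and NFD splits it again. A minimal Python check with the standard `unicodedata` module, assuming the decomposition mappings of current Unicode versions (`PRECOMPOSED` is an illustrative name):

```python
import unicodedata

# (base letter, combining mark) -> precomposed letter
PRECOMPOSED = {
    ("\u0627", "\u0653"): "\u0622",  # alif + madda above -> alif maddah
    ("\u0648", "\u0654"): "\u0624",  # wāʼo + hamza above -> waw with hamza above
    ("\u06C1", "\u0654"): "\u06C2",  # gol hē + hamza above -> heh goal with hamza above
    ("\u06D2", "\u0654"): "\u06D3",  # baṛī yē + hamza above -> yeh barree with hamza above
}

for (base, mark), letter in PRECOMPOSED.items():
    composed = unicodedata.normalize("NFC", base + mark)
    decomposed = unicodedata.normalize("NFD", letter)
    print(
        f"U+{ord(base):04X} + U+{ord(mark):04X} -> U+{ord(composed):04X}",
        composed == letter,
        decomposed == base + mark,
    )
```

Normalizing stored text to a single form (commonly NFC) before comparison keeps the two spellings from being treated as different words.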
|14||mh||[mʱ]||(alternative of )|
|15||nh||[nʱ]||(though arguably just a consonant cluster)|
Urdu adds further letters to the Persian base to represent sounds not present in Persian, just as Persian added letters to the Arabic base for sounds not present in Arabic. The added letters include ٹ to represent /ʈ/, ڈ to represent /ɖ/, ڑ to represent /ɽ/, ں to represent nasalization (/◌̃/), and ے to represent /ɛː/ or /eː/. A separate do-chashmi-he letter, ھ, denotes /ʰ/ or /ʱ/; it is used mainly in Urdu's many aspirate digraphs.
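For programmers, each letter Urdu adds to the Persian base is an ordinary character in the Unicode Arabic block. This short Python sketch uses the standard `unicodedata` module to print each added letter's code point and official Unicode name next to the sound the paragraph above assigns it; `URDU_ADDITIONS` is an illustrative name, not part of any standard.

```python
import unicodedata

# Letters Urdu adds to the Persian base, with the sound the text gives.
URDU_ADDITIONS = {
    "\u0679": "/ʈ/",           # ṭē
    "\u0688": "/ɖ/",           # ḍāl
    "\u0691": "/ɽ/",           # ṛē
    "\u06BA": "nasalization",  # nūn ghunna
    "\u06D2": "/ɛː/ or /eː/",  # baṛī yē
    "\u06BE": "/ʰ/ or /ʱ/",    # do-chashmi he
}

for letter, sound in URDU_ADDITIONS.items():
    print(f"{letter}  U+{ord(letter):04X}  {unicodedata.name(letter):<32}  {sound}")
```

The Unicode names (for example ARABIC LETTER HEH DOACHASHMEE for ھ) are what search tools and character pickers usually display, so they are worth recognising alongside the Urdu names.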
Old Hindustani used four dots over three Arabic letters, those for t, d and r (ت, د, ر), to represent retroflex consonants. In handwriting, those dots were often written as a small vertical line attached to a small triangle, and this shape eventually became identical to a small letter ط. (It is commonly and erroneously assumed that ط itself was used to indicate retroflex consonants because it is an emphatic alveolar consonant that Arabic scribes thought approximated the Hindustani retroflexes. In modern Urdu, ط, called t̤oʼē, is always pronounced as a dental, not a retroflex.)
The Urdu language has 10 vowels and 10 nasalized vowels. Each vowel letter has four forms depending on its position: initial, middle, final and isolated. As in its parent Arabic alphabet, Urdu vowels are represented using a combination of digraphs and diacritics. Alif, wāʼo, ye, he and their variants are used to represent vowels.
Urdu has no standalone vowel letters. Short vowels (a, i, u) are represented by optional diacritics (zabar, zer, pesh) on the preceding consonant, or on a placeholder consonant (alif, ʻain or hamzah) if the syllable begins with the vowel. Long vowels are represented by the consonants alif, ʻain, ye and wāʼo used as matres lectionis, with disambiguating diacritics, some of which are optional (zabar, zer, pesh) and some of which are not (madd, hamzah). Urdu does not have short vowels at the end of words. This is a table of Urdu vowels:
Alif is the first letter of the Urdu alphabet, and it is used exclusively as a vowel. At the beginning of a word, alif can represent any of the short vowels: اب ab, اسم ism, اردو Urdū. For a long ā at the beginning of a word, alif madd is used: آپ āp; a plain alif is used in the middle and at the end: بھاگنا bhāgnā.
Wo is used to render the vowels "ū", "o", "u" and "au" ([uː], [oː], [ʊ] and [ɔː] respectively), and it is also used to render the labiodental approximant [ʋ]. Only when preceded by the consonant k͟hē (خ) can wo render the "u" ([ʊ]) sound, as in خود (k͟hud, "myself"), or go unpronounced, as in خواب (k͟hāb, "dream").
Ye is divided into two variants: choṭī ye ("little ye") and baṛī ye ("big ye").
Choṭī ye (ی) is written in all forms exactly as in Persian. It is used for the long vowel "ī" and the consonant "y".
Baṛī ye (ے) is used to render the vowels "e" and "ai" (/eː/ and /ɛː/ respectively). Baṛī ye is distinguishable in writing from choṭī ye only when it comes at the end of a word or ligature. Additionally, unlike choṭī ye, baṛī ye is never used to begin a word or ligature.
|Letter's name||Final Form||Middle Form||Initial Form||Isolated Form|
|choṭī ye||ـی||ـیـ||یـ||ی|
|baṛī ye||ـے||ـیـ (written like choṭī ye)||(never word-initial)||ے|
He is divided into two variants: gol he ("round he") and do-cashmī he ("two-eyed he").
Gol he (ہ) is written round and zigzagged, and can represent the "h" (/ɦ/) sound anywhere in a word. At the end of a word, it can also render the long "a" or "e" vowels (/ɑː/ or /eː/), which slightly alters its form (on modern digital writing systems, this final form is achieved by writing two he's consecutively).
Do-cashmī he (ھ) is written as in the Arabic Naskh style (as a loop) and is used to form the aspirate consonants and to write Arabic words.
|Letter's name||Final Form||Middle Form||Initial Form||Isolated Form|
|gol he||ـہ||ـہـ||ہـ||ہ|
|do-cashmī he||ـھ||ـھـ||ھـ||ھ|
ʻAin in initial and final position is silent and takes on the sound of the preceding or following vowel.
Vowel nasalization is represented by nūn ghunna (ں) written after the non-nasalized vowel. In medial position, nūn ghunna is written like an ordinary nūn (ن) and is distinguished by a diacritic called maghnoona or ulta jazm, a superscript V symbol above the letter.
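Because explicit nasalization takes one of two written forms, nūn ghunna at the end of a word or nūn followed by maghnoona (U+0658) inside it, a program can detect explicitly marked nasal vowels by looking for those two patterns. A minimal Python sketch; `is_marked_nasal` is a hypothetical helper, and it assumes the text actually writes maghnoona, which ordinary Urdu text usually omits, so unmarked medial nasalization goes undetected.

```python
NUN = "\u0646"         # nūn
NUN_GHUNNA = "\u06BA"  # nūn ghunna (ں)
MAGHNOONA = "\u0658"   # ARABIC MARK NOON GHUNNA, the maghnoona diacritic


def is_marked_nasal(word: str) -> bool:
    """Return True if the word explicitly marks a nasal vowel.

    Word-final nasalization is written with nūn ghunna; word-internal
    nasalization is written as nūn followed by maghnoona.
    """
    return word.endswith(NUN_GHUNNA) or (NUN + MAGHNOONA) in word


print(is_marked_nasal("\u06C1\u0627\u06BA"))  # ہاں (hāṉ, "yes"): True
print(is_marked_nasal("\u06C1\u0627"))        # no nasal mark: False
```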
Urdu uses a subset of the Arabic diacritics, following Persian conventions, and uses the Persian names of the diacritics instead of the Arabic names. The most common diacritics are zabar (Arabic fatḥah), zer (Arabic kasrah) and pesh (Arabic ḍammah), which clarify the pronunciation of vowels, as shown above. Jazam (Arabic sukūn) indicates a consonant cluster, and tashdīd (Arabic shaddah) indicates gemination, although tashdīd is never used for verbs, which require double consonants to be spelled out separately. Other diacritics include khaṛī zabar (Arabic dagger alif) and do zabar (Arabic fatḥatān), which are found in some common Arabic loanwords. Other Arabic diacritics are also used occasionally, though very rarely, in loanwords from Arabic. Zer-e-izafat and hamzah-e-izafat are described in the next section.
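Since zabar, zer, pesh and the other diacritics above are usually omitted in ordinary text, search and comparison code often removes them so that vocalized and unvocalized spellings match. A minimal Python sketch: the set of marks to strip is an assumption based on the diacritics named in this section, and it deliberately keeps madd (U+0653) and hamzah above (U+0654), which the vowel description treats as non-optional. `strip_diacritics` is a hypothetical name.

```python
# Diacritics named in this section, by code point (Arabic name in comment).
STRIPPABLE = {
    "\u064E",  # zabar (fatḥah)
    "\u0650",  # zer (kasrah)
    "\u064F",  # pesh (ḍammah)
    "\u0652",  # jazam (sukūn)
    "\u0651",  # tashdīd (shaddah)
    "\u0670",  # khaṛī zabar (superscript or "dagger" alif)
    "\u064B",  # do zabar (fatḥatān)
}


def strip_diacritics(text: str) -> str:
    """Drop optional vowel and gemination marks, keeping everything else."""
    return "".join(ch for ch in text if ch not in STRIPPABLE)


vocalized = "\u0627\u064F\u0631\u062F\u064F\u0648"  # اُردُو with two pesh marks
print(strip_diacritics(vocalized) == "\u0627\u0631\u062F\u0648")  # True
```

Applying the same function to both the stored text and the query is what makes the two spellings compare equal; stripping only one side would not help.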
Other than common diacritics, Urdu also has special diacritics, which are often found only in dictionaries for the clarification of irregular pronunciation. These diacritics include kasrah-e-majhool, fathah-e-majhool, dammah-e-majhool, maghnoona, ulta jazam, alif-e-wavi and some other very rare diacritics. Among these, only maghnoona is used commonly in dictionaries and has a Unicode representation at U+0658. Other diacritics are only rarely written in printed form, mainly in some advanced dictionaries.
Izafat is a syntactic construction of two nouns in which the first component is the determined noun and the second is its determiner. This construction was borrowed from Persian. A short vowel "i" connects the two words, and in pronunciation it attaches to the first word. If the first word ends in a consonant or an ʿain (ع), the connector may be written as a zer at the end of the first word, but it is usually not written at all. If the first word ends in gol he (ہ) or ye (ی or ے), a hamzah (ء) is written above the last letter (ۂ, ئ or ۓ). If the first word ends in a long vowel (ا or و), a variant of baṛī ye (ے) with hamzah on top (ۓ, obtained by adding ء to ے) is added at the end of the first word.
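The spelling rules above amount to a choice based on the first word's final letter. The Python sketch below encodes them; `izafat` and its `write_zer` flag are hypothetical names. The result is NFC-normalized so that gol he plus hamza becomes the precomposed U+06C2, and the function assumes a non-empty first word and covers only the cases this paragraph describes.

```python
import unicodedata

ZER = "\u0650"          # zer, the optional izafat vowel
HAMZA_ABOVE = "\u0654"  # combining hamza above
BARI_YE_HAMZA = "\u06D3"  # baṛī ye with hamza above (ۓ)

GOL_HE, CHOTI_YE, BARI_YE = "\u06C1", "\u06CC", "\u06D2"
ALIF, WAO = "\u0627", "\u0648"


def izafat(first: str, second: str, write_zer: bool = False) -> str:
    """Join two nouns with the izafat connector, following the rules above."""
    last = first[-1]
    if last in (GOL_HE, CHOTI_YE, BARI_YE):
        # Ends in gol he or ye: hamza is written above the last letter.
        head = first + HAMZA_ABOVE
    elif last in (ALIF, WAO):
        # Ends in a long vowel: baṛī ye with hamza is appended.
        head = first + BARI_YE_HAMZA
    else:
        # Ends in a consonant or ʿain: zer is optional and usually omitted.
        head = first + ZER if write_zer else first
    return unicodedata.normalize("NFC", head + " " + second)


print(izafat("\u0645\u0644\u06A9\u06C1", "\u062F\u0646\u06CC\u0627"))  # malka-ye dunyā
print(izafat("\u0635\u062F\u0627", "\u0628\u0644\u0646\u062F"))        # sadā-ye buland
```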
|Urdu||Romanization||Translation|
|شیرِ پنجاب||sher-e Panjāb||the lion of Punjab|
|ملکۂ دنیا||malka-ye dunyā||the queen of the world|
|ولیٔ کامل||walī-ye kāmil||perfect saint|
|مۓ عشق||mai-ye ishq||the wine of love|
|روۓ زمین||rū-ye zamīn||the surface of the Earth|
|صداۓ بلند||sadā-ye buland||a high voice|
In the early days of computers, Urdu was not properly represented on any code page. One of the earliest code pages to represent Urdu was IBM Code Page 868, which dates back to 1990. Other early code pages that represented the Urdu alphabet were Windows-1256 and MacArabic, both of which date back to the mid-1990s. In Unicode, Urdu is represented inside the Arabic block. Another code page for Urdu, used in India, is the Perso-Arabic Script Code for Information Interchange. In Pakistan, the 8-bit code page developed by the National Language Authority is called Urdu Zabta Takhti (اردو ضابطہ تختی), or UZT; it represents Urdu in its most complete form, including some of its specialized diacritics, though UZT is not designed to coexist with the Latin alphabet.
|Urdu||Arabic (visually similar)|
|ی (U+06CC)||ى (U+0649)|
|ک (U+06A9)||ك (U+0643)|
Like other writing systems derived from the Arabic script, Urdu uses the 0600-06FF Unicode range. Certain glyphs in this range appear visually similar (or identical when presented in particular fonts) even though the underlying encoding differs, which causes problems for information storage and retrieval. For example, the University of Chicago's electronic copy of John Shakespear's "A Dictionary, Hindustani, and English" includes the word 'بهارت' (bhārat, "India"), spelled with U+0647. Searching for the string "بھارت" (with U+06BE) returns no results, whereas querying with the string "بهارت" (with U+0647), which looks identical in many fonts, returns the correct entry. This is because the Urdu letter do chashmi he (U+06BE), used to form aspirate digraphs in Urdu, is visually identical in its medial form to the Arabic letter hāʼ (U+0647; phonetic value /h/). In Urdu, the /h/ phoneme is represented by the character U+06C1, called gol he (round he) or chhoti he (small he).
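One way to mitigate this is to normalize both the indexed text and the query to Urdu code points. The Python sketch below is a heuristic, not a standard algorithm; `normalize_urdu` and the `ASPIRABLE` consonant set are assumptions for illustration. It maps the Arabic look-alikes from the table above to their Urdu counterparts and resolves Arabic hāʼ (U+0647) by context: after a consonant that commonly forms aspirate digraphs it becomes do chashmi he (U+06BE), otherwise gol he (U+06C1). The rule will mislabel a gol he that happens to follow one of those consonants.

```python
# Arabic code points that look like Urdu letters, mapped to the Urdu ones.
ARABIC_TO_URDU = {
    "\u0643": "\u06A9",  # Arabic kāf -> Urdu kāf
    "\u0649": "\u06CC",  # Arabic alif maqṣūrah -> Urdu choṭī ye
}

# Assumed set of consonants that take do chashmi he in aspirate digraphs:
# b, p, t, ṭ, j, c, d, ḍ, r, ṛ, k, g, l, m, n.
ASPIRABLE = set(
    "\u0628\u067E\u062A\u0679\u062C\u0686\u062F\u0688"
    "\u0631\u0691\u06A9\u06AF\u0644\u0645\u0646"
)

ARABIC_HEH = "\u0647"
DO_CHASHMI_HE = "\u06BE"
GOL_HE = "\u06C1"


def normalize_urdu(text: str) -> str:
    """Rewrite Arabic look-alike code points as their Urdu counterparts."""
    out = []
    for ch in text:
        ch = ARABIC_TO_URDU.get(ch, ch)
        if ch == ARABIC_HEH:
            # Heuristic: aspirate digraph after an aspirable consonant.
            ch = DO_CHASHMI_HE if out and out[-1] in ASPIRABLE else GOL_HE
        out.append(ch)
    return "".join(out)


dictionary_form = "\u0628\u0647\u0627\u0631\u062A"  # bhārat typed with U+0647
query = "\u0628\u06BE\u0627\u0631\u062A"            # bhārat typed with U+06BE
print(normalize_urdu(dictionary_form) == normalize_urdu(query))  # True
```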
In 2003, the Center for Research in Urdu Language Processing (CRULP), a research organisation affiliated with Pakistan's National University of Computer and Emerging Sciences, produced a proposal for mapping the 1-byte UZT encoding of Urdu characters to the Unicode standard. This proposal suggests a preferred Unicode glyph for each character in the Urdu alphabet.
Most Urdu publications are composed with dedicated Urdu software, the most widespread of which is the InPage desktop publishing package. Microsoft has included Urdu language support in all new versions of Windows, and both Windows Vista and Microsoft Office 2007 are available in Urdu through Language Interface Pack support. Most Linux desktop distributions also allow easy installation of Urdu support and translations. Apple added an Urdu keyboard for its mobile devices in the iOS 8 update in September 2014.
There are several romanization standards for writing Urdu in the Latin alphabet, though they are not very popular because most fail to represent the Urdu language properly. Instead of standard romanization schemes, people on the Internet, on mobile phones and in the media often use a non-standard romanization that mimics English orthography. The problem with this kind of romanization is that it can be read only by native speakers, and even they read it with great difficulty. Among standardized romanization schemes, the most accurate is ALA-LC romanization, which is also supported by the National Language Authority. Other romanization schemes are often rejected either because they cannot represent Urdu sounds properly or because they disregard Urdu orthography, favoring pronunciation over spelling.
The National Language Authority of Pakistan has developed a number of systems with specific notations to signify non-English sounds, but these can only be properly read by someone already familiar with the loan letters.
Roman Urdu also holds significance among the Christians of Pakistan and North India. Urdu was the dominant native language among Christians of Karachi and Lahore in present-day Pakistan, and of Madhya Pradesh, Uttar Pradesh and Rajasthan in India, during the early parts of the 19th and 20th centuries, and it is still used by Christians in these places. Pakistani and Indian Christians often used the Roman script for writing Urdu, so Roman Urdu remained a common way of writing among Christians in these areas up to the 1960s. The Bible Society of India publishes Roman Urdū Bibles, which sold well into the 1960s and are still published today. Church songbooks in Roman Urdu are also common. However, the use of Roman Urdu is declining with the wider use of Hindi and English in these states.
|Letter||Letter name||Urdu word||Roman Urdu||IPA||English translation||Examples of other uses|
|ے||baṛī ye||بڑی||baṛī||/bəɽiː/||big / elder||large intestine|
|ی||choṭī ye||چھوٹی||choṭī||/t͡ʃʰoːʈiː/||small / minor / junior||small intestine|
|ہ||gol he||گول||gol||/ɡoːl/||round / spherical / vague / silly / obese||gol gappay (panipuri)|
|ھ||do-cashmī he||دو||do|| ||2 / two||bicameralism|
| || ||چشم||chashm||/t͡ʃəʃm/||the eye / hope / expectation||eye|
|ں||nūn-e ghunnah||غنہ||ghunnah / g͟hunnah|| ||nasal sound or twang|| |
|آ||alif maddah|| ||maddah|| ||Arabic:|| |
|ؤ||vāv-e mahmūz||مہموز||mahmūz||/mæhmuːz/||defective / improper|| |
|ا ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ء ی ے||harūf-e tahajjī (حروفِ تہجی, alphabet)||تہجی||tahajjī|| ||sequence|| |
| || ||حروف||harūf||/hʊruːf/||letters (plural; often referred to as "alphabets" in informal Pakistani English)|| |
| || ||حرف||harf||/hərf/||"letter of the alphabet" / handwriting / statement / blame / stigma|| |