In Italian there is no phonemic distinction between long and short vowels, but vowels in stressed open syllables, unless word-final, are long at the end of the intonational phrase (including isolated words) or when emphasized. Adjacent identical vowels found at morpheme boundaries are not resyllabified, but pronounced separately ("quickly rearticulated"), and they might be reduced to a single short vowel in rapid speech.
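The length rule can be illustrated with a minimal Python sketch. It assumes a word is already given as a list of syllables in broad transcription together with the index of its stressed syllable. The function name `lengthen_stressed_vowel`, the `phrase_final` flag, and the vowel set are illustrative assumptions, and falling diphthongs are not handled.

```python
# Vowel symbols assumed for this sketch (broad transcription).
VOWELS = set("aeiouɛɔ")


def lengthen_stressed_vowel(syllables, stress_index, phrase_final=True):
    """Mark the stressed vowel as long (ː) when its syllable is open and not word-final.

    `phrase_final` stands in for "at the end of the intonational phrase
    or when emphasized"; outside that context nothing is lengthened.
    """
    result = list(syllables)
    stressed = syllables[stress_index]
    is_open = stressed[-1] in VOWELS                 # syllable ends in a vowel
    is_word_final = stress_index == len(syllables) - 1
    if phrase_final and is_open and not is_word_final:
        result[stress_index] = stressed + "ː"
    return result


print(lengthen_stressed_vowel(["ka", "sa"], 0))      # stressed open syllable
print(lengthen_stressed_vowel(["fat", "to"], 0))     # stressed closed syllable
print(lengthen_stressed_vowel(["tʃit", "ta"], 1))    # word-final stressed vowel
```

Under these assumptions only the first call adds a length mark, since the other two words have a closed stressed syllable and a word-final stressed vowel respectively.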
Although Italian contrasts close-mid (/e, o/) and open-mid (/ɛ, ɔ/) vowels in stressed syllables, this distinction is neutralised in unstressed position, where only the close-mid vowels occur. The height of these vowels in unstressed position is context-sensitive; they are somewhat lowered ([e̞, o̞]) in the vicinity of more open vowels. The distinction between close-mid and open-mid vowels is lost entirely in a few Southern varieties of Regional Italian, especially in Northern Sicily (e.g. Palermo), where they are realized as open-mid [ɛ, ɔ], as well as in some Northern varieties (in particular in Piedmont), where they are realized as mid [e̞, o̞].
Word-final stressed /ɔ/ is found in a small number of words: però, ciò, paltò. However, as a productive morpheme, it marks the first person singular of all future tense verbs (e.g. dormirò 'I will sleep') and the third person singular preterite of first conjugation verbs (parlò 's/he spoke', but credé 's/he believed', dormì 's/he slept'). Word-final unstressed /u/ is rare, found in onomatopoeic terms (babau), loanwords (guru), and place or family names derived from the Sardinian language (Gennargentu, Porcu).
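The neutralisation of the mid-vowel contrast in unstressed position can be sketched in Python as follows. The representation (a list of syllables plus a stress index) and the name `neutralise_unstressed` are assumptions made for illustration. The sketch covers only the phonemic merger, not the context-sensitive lowering or the regional mergers described above.

```python
# Open-mid vowels and the close-mid vowels they merge with when unstressed.
OPEN_TO_CLOSE = {"ɛ": "e", "ɔ": "o"}


def neutralise_unstressed(syllables, stress_index):
    """Replace open-mid vowels with close-mid ones in every unstressed syllable."""
    out = []
    for i, syllable in enumerate(syllables):
        if i != stress_index:
            syllable = "".join(OPEN_TO_CLOSE.get(ch, ch) for ch in syllable)
        out.append(syllable)
    return out


# The stressed syllable keeps its open-mid vowel.
print(neutralise_unstressed(["tɛr", "ra"], 0))
# A hypothetical underlying open-mid vowel in an unstressed syllable is raised.
print(neutralise_unstressed(["pɔr", "ta", "re"], 1))
```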
When the last phoneme of a word is an unstressed vowel and the first phoneme of the following word is any vowel, the former vowel tends to become non-syllabic. This phenomenon is called synalepha and should be taken into account when counting syllables, e.g. in poetry.
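As a rough illustration of how synalepha affects a syllable count, the Python sketch below merges one syllable at each qualifying word boundary. It assumes each word is supplied already syllabified, with the index of its stressed syllable (or `None` for an unstressed word), and it treats spelling as if it were phonemic, so a silent initial h, for instance, is not handled. The function name and data layout are hypothetical.

```python
VOWELS = set("aeiouàèéìíòóùú")


def count_syllables_with_synalepha(words):
    """Count syllables in a line, merging across boundaries where synalepha applies.

    `words` is a list of (syllables, stress_index) pairs; stress_index is
    None for words that carry no stress of their own.
    """
    total = sum(len(syllables) for syllables, _ in words)
    for (syllables, stress), (next_syllables, _) in zip(words, words[1:]):
        final_is_unstressed = stress != len(syllables) - 1
        ends_in_vowel = syllables[-1][-1].lower() in VOWELS
        next_starts_with_vowel = next_syllables[0][0].lower() in VOWELS
        if final_is_unstressed and ends_in_vowel and next_starts_with_vowel:
            total -= 1  # the two vowels share one metrical position
    return total


# "mi ritrovai per una selva oscura": 12 syllables word by word,
# 11 once "selva oscura" is joined by synalepha.
line = [
    (["mi"], None),
    (["ri", "tro", "vai"], 2),
    (["per"], None),
    (["u", "na"], 0),
    (["sel", "va"], 0),
    (["o", "scu", "ra"], 1),
]
print(count_syllables_with_synalepha(line))  # 11
```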
In addition to monophthongs, Italian has diphthongs, but these are both phonemically and phonetically simply combinations of the other vowels, with some being very common (e.g. /ai, au/), others being rarer (e.g. /ɛi/) and some never occurring within (native) Italian words (e.g. /ou/). None of these diphthongs is, however, considered to have distinct phonemic status, because their constituents do not behave differently than they would in isolation (and all occur in isolation), unlike the diphthongs in some languages like English and German. Grammatical tradition makes a distinction between 'falling' and 'rising' diphthongs; however, since rising diphthongs are composed of one semiconsonantal sound [j] or [w] and one vowel sound, they are not actually diphthongs. The practice of referring to them as 'diphthongs' has been criticised by phoneticians like Luciano Canepari.
As an onset, the cluster /s/ + voiceless consonant is inherently unstable. Phonetically, word-internal s+C normally syllabifies as [s.C]: [ˈrɔs.po] rospo 'toad', [tras.ˈteː.ve.re] Trastevere (neighborhood of Rome). Phonetic syllabification of the cluster also occurs at word boundaries if a vowel precedes it without pause, e.g. [las.ˈtɔː.rja] la storia 'the history', implying the same syllable break at the structural level, /sˈtɔrja/, thus always latent due to the extrasyllabic /s/, but unrealized phonetically unless a vowel precedes. A competing analysis accepts that while the syllabification /s.C/ is accurate historically, modern retreat of i-prosthesis before word initial /s/+C (e.g. erstwhile con isforzo 'with effort' has generally given way to con sforzo) suggests that the structure is now underdetermined, with occurrence of /s.C/ or /.sC/ variable "according to the context and the idiosyncratic behaviour of the speakers."
The last combination is however rare and one of the approximants is often vocalised, e.g. quieto /ˈkwjɛto, kwiˈɛto/, continuiamo /((kontiˈnwjamo)), kontinuˈjamo, kontinwiˈamo/.
The nucleus is the only mandatory part of a syllable (for instance, a 'to, at' is a word) and must be a vowel or a diphthong. In a falling diphthong the most common second elements are /i̯/ or /u̯/ but other combinations such as idea /iˈdɛa̯/, trae /ˈtrae̯/ may also be interpreted as diphthongs. Combinations of /j w/ with vowels are often labelled diphthongs, allowing for combinations of /j w/ with falling diphthongs to be called triphthongs. One view holds that it is more accurate to label /j w/ as consonants and /jV wV/ as consonant-vowel sequences rather than rising diphthongs. In that interpretation, Italian has only falling diphthongs (phonemically at least, cf. Synaeresis) and no triphthongs.
There are also restrictions in the types of syllables that permit consonants in the syllable coda. Krämer (2009) explains that neither geminates, nor coda consonants with "rising sonority" can follow falling diphthongs. However, "rising diphthongs" (or sequences of an approximant and a following vowel) may precede clusters with falling sonority, particularly those that stem historically from an obstruent+liquid onset. For example:
Word-initial consonants are geminated after certain vowel-final words in the same prosodic unit. There are two types of triggers of initial gemination: some unstressed particles, prepositions, and other monosyllabic words, and any oxytonic polysyllabic word. As an example of the first type, casa ('house') is pronounced [ˈkaːsa] but a casa ('homeward') is pronounced [akˈkaːsa]. This is not a purely phonological process, as no gemination is cued by the la in la casa 'the house' [laˈkaːsa], and there is nothing detectable in the structure of the preposition a to account for the gemination. This type normally originates in language history: modern a, for example, derives from Latin AD, and today's geminate in [akˈkaːsa] is a continuation of what was once a simple assimilation. Gemination cued by final stressed vowels, however, is transparently phonological. Final stressed vowels are short by nature; if a consonant follows a short stressed vowel, the syllable must be closed, so the consonant following the final stressed vowel is lengthened: parlò portoghese [parˈlɔpportoˈɡeːze] 's/he spoke Portuguese' vs. parla portoghese [ˈparlaportoˈɡeːze] 's/he speaks Portuguese'.
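The two trigger types can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a full model. Words are given as syllable lists in broad transcription with a stress index (`None` for unstressed words), and all words are taken to belong to one prosodic unit. The lexical trigger set `TRIGGER_PARTICLES` contains only particles that geminate in this article's own examples and is deliberately incomplete. All names are hypothetical.

```python
VOWELS = set("aeiouɛɔ")

# Lexical triggers: an illustrative, incomplete set (a, e, che in broad transcription).
TRIGGER_PARTICLES = {"a", "e", "ke"}


def is_trigger(syllables, stress_index):
    """True for a listed particle or for a polysyllable stressed on its final syllable."""
    word = "".join(syllables)
    oxytone_polysyllable = len(syllables) > 1 and stress_index == len(syllables) - 1
    return oxytone_polysyllable or word in TRIGGER_PARTICLES


def apply_gemination(words):
    """Join the words of one prosodic unit, doubling consonants after triggers."""
    out = []
    geminate_next = False
    for syllables, stress_index in words:
        word = "".join(syllables)
        if geminate_next and word[0] not in VOWELS:
            word = word[0] + word  # double the initial consonant
        out.append(word)
        geminate_next = is_trigger(syllables, stress_index)
    return "".join(out)


print(apply_gemination([(["par", "lɔ"], 1), (["por", "to", "ɡe", "ze"], 2)]))
print(apply_gemination([(["par", "la"], 0), (["por", "to", "ɡe", "ze"], 2)]))
print(apply_gemination([(["a"], None), (["ka", "sa"], 0)]))
print(apply_gemination([(["la"], None), (["ka", "sa"], 0)]))
```

The lexical type needs an explicit list because, as noted above, nothing in the shape of a particle such as a predicts its behaviour, whereas the oxytone type can be computed from stress position alone.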
In standard Italian, syntactic gemination occurs mainly in the following two cases:
In Northern Italy and Sardinia, speakers use it inconsistently because the feature is not present in the dialectal substratum and is not usually shown in the written language unless a new word is produced by the fusion of the two: chi sa → chissà ('who knows', in the sense of 'goodness knows').
The above IPA symbols and description refer to standard Italian, based on a somewhat idealized version of the Tuscan-derived national language. As is common in many cultures, this single version of the language was pushed as neutral, proper, and eventually superior, leading to some stigmatization of other accents. Television news anchors and other high-profile figures had to put aside their regional Italian when in the public sphere. However, in more recent years the enforcement of this standard has fallen out of favor in Italy, and news reporters, actors, and the like are now freer to speak in their native regional variety of Italian, which appeals to the Italian population's linguistic diversity. That diversity is still not fully represented, and accents from the South are arguably less popular, except in shows set in the South and in comedy, a field in which Naples, Sicily and the South in general have always been present. Though Tuscan still provides the basis for the standard variety, the loosened restrictions have led to its being seen for what it is: just one dialect among many, with its own regional peculiarities and qualities, many of which are shared with Umbria, Southern Marche and Northern Lazio.
The various Tuscan, Corsican and Central Italian dialects are, to some extent, the closest ones to Standard Italian in terms of linguistic features, since the latter is based on a somewhat polished form of Florentine.
Very little research has been done on the earliest stages of phonological development in Italian. This article primarily describes phonological development after the first year of life. See the main article on phonological development for a description of first year stages. Many of the earliest stages are thought to be universal to all infants.
Most consonants are word-initial: They are the stops /p/, /b/, /t/, and /k/ and the nasal /m/. A preference for a front place of articulation is present.
More phones now appear in intervocalic contexts. The additions to the phonetic inventory are the voiced stop /d/, the nasal /n/, the voiceless affricate /tʃ/, and the liquid /l/.
The fricatives /f/, /v/, and /s/ are added, primarily at the intervocalic position.
Approximately equal numbers of phones are now produced in word-initial and intervocalic position. Additions to the phonetic inventory are the voiced stop /ɡ/ and the consonant cluster /kw/. While the word-initial inventory now tends to have all the phones of the adult targets (adult production of the child's words), the intervocalic inventory tends to still be missing four consonants or consonant clusters of the adult targets: /f/, /d/, /r/, and /st/.
Stops are the most common manner of articulation at all stages and are produced more often than they are present in the target words at around 18 months. Gradually this frequency decreases to almost target-like frequency by around 27 months. The opposite process happens with fricatives, affricates, laterals and trills. Initially, these phonemes are produced significantly less often than they occur in the target words, and their production continues to increase to target-like frequency. Alveolars and bilabials are the two most common places of articulation, with alveolar production steadily increasing after the first stage and bilabial production gently decreasing. Labiodental and postalveolar production increases throughout development, while velar production decreases.
Babbling becomes distinct from previous, less structured vocal play. Initially, syllable structure is limited to CVCV, called reduplicated babbling. At this stage, children's vocalizations have a weak relation to adult Italian and the Italian lexicon.
The most-used syllable type changes as children age, and the distribution of syllables takes on increasingly Italian characteristics. This ability significantly increases between the ages of 11 and 12 months, 12 and 13 months, and 13 and 14 months. Consonant clusters are still absent. Children's first ten words appear around month 12, and take CVCV format (e.g. mama 'mother', papa 'father').
Reduplicated babbling is replaced by variegated babbling, producing syllable structures such as C1VC2V (e.g. cane 'dog', topo 'mouse'). Production of trisyllabic words begins (e.g. pecora 'sheep', matita 'pencil'). Consonant clusters are now present (e.g. bimba 'female child', venti 'twenty'). Ambient language plays an increasingly significant role as children begin to solidify early syllable structure. Syllable combinations that are infrequent in the Italian lexicon, such as velar-labial sequences (e.g. capra 'goat' or gamba 'leg') are infrequently produced correctly by children, and are often subject to consonant harmony.
In Italian, stress is lexical, meaning it is word-specific and partly unpredictable. Penultimate stress (primary stress on the second-to-last syllable) is also generally preferred. This goal, acting simultaneously with the child's initial inability to produce polysyllabic words, often results in weak-syllable deletion. The primary environment for weak-syllable deletion in polysyllabic words is word-initial, as deleting word-final or word-medial syllables would interfere with the penultimate stress pattern heard in ambient language.
Children develop syllabic segmentation awareness earlier than phonemic segmentation awareness. In earlier stages, syllables are perceived as a separate phonetic unit, while phonemes are perceived as assimilated units by coarticulation in spoken language. By first grade, Italian children are nearing full development of segmentation awareness on both syllables and phonemes. Compared to children whose mother tongue exhibits closed syllable structure (CVC, CCVC, CVCC, etc.), Italian-speaking children develop this segmentation awareness earlier, possibly due to the open syllable structure of Italian (CVCV, CVCVCV, etc.). Rigidity in Italian (shallow orthography and open syllable structure) makes it easier for Italian-speaking children to be aware of those segments.
Provided here is a rendition of the Bible, Luke 2, 1-7, as read by a native Italian speaker from Milan. As a northerner, his pronunciation lacks syntactic doubling ([ˈfu ˈfatto] instead of [ˈfu fˈfatto]) and intervocalic [s] ([ˈkaːza] instead of [ˈkaːsa]). The speaker realises /r/ as [ʋ] in some positions.
2:1 In quei giorni, un decreto di Cesare Augusto ordinava che si facesse un censimento di tutta la terra.
2 Questo primo censimento fu fatto quando Quirino era governatore della Siria.
3 Tutti andavano a farsi registrare, ciascuno nella propria città.
4 Anche Giuseppe, che era della casa e della famiglia di Davide, dalla città di Nazaret e dalla Galilea si recò in Giudea nella città di Davide, chiamata Betlemme,
5 per farsi registrare insieme a Maria, sua sposa, che era incinta.
6 Proprio mentre si trovavano lì, venne il tempo per lei di partorire.
7 Mise al mondo il suo primogenito, lo avvolse in fasce e lo depose in una mangiatoia, poiché non c'era posto per loro nella locanda.
The differences in pronunciation are underlined in the following transcriptions; the velar [ŋ] is an allophone of /n/ and the long vowels are allophones of the short vowels, but are shown for clarity.
A rough transcription of the audio sample is:
2:1 [iŋ ˈkwɛi ˈdʒorni un deˈkreːto di ˈtʃeːzare auˈɡusto ordiˈnaːva ke si faˈtʃɛsse un tʃensiˈmento di ˈtutta la ˈtɛrra
2 ˈkwɛsto ˈpriːmo tʃensiˈmento fu ˈfatto ˈkwando kwiˈriːno ˈeːra ɡovernaˈtoːre dɛlla ˈsiːrja
3 ˈtutti anˈdaːvano a ˈfarsi redʒiˈstraːre tʃaˈskuːno nɛlla ˈprɔːprja tʃitˈta
4 ˈaŋke dʒuˈzɛppe ke ˈeːra dɛlla ˈkaːza e dɛlla faˈmiʎʎa di ˈdaːvide dalla tʃitˈta di ˈnaddzaret e dalla ɡaliˈleːa si reˈkɔ in dʒuˈdeːa nɛlla tʃitˈta di ˈdaːvide kjaˈmaːta beˈtlɛmme
5 per ˈfarsi redʒiˈstraːre inˈsjeːme a maˈriːa swa ˈspoːza ke ˈeːra inˈtʃinta
6 ˈprɔːprjo ˈmentre si troˈvaːvano ˈli ˈvɛnne il ˈtempo per ˈlɛi di partoˈriːre
7 ˈmiːze al ˈmondo il swo primoˈdʒeːnito, lo avˈvɔlse iɱ ˈfaːʃe e lo deˈpoːze in ˈuːna mandʒaˈtɔːja poiˈke non ˈtʃeːra ˈpɔsto per ˈloːro nɛlla loˈkanda]
The Standard Italian pronunciation of the text is:
2:1 [iŋ ˈkwei ˈdʒorni un deˈkreːto di ˈtʃeːzare auˈɡusto ordiˈnaːva ke ssi faˈtʃesse un tʃensiˈmento di ˈtutta la ˈtɛrra
2 ˈkwesto ˈpriːmo tʃensiˈmento fu fˈfatto ˈkwando kwiˈriːno ˈɛːra ɡovernaˈtoːre della ˈsiːrja
3 ˈtutti anˈdaːvano a fˈfarsi redʒiˈstraːre tʃaˈskuːno nella ˈprɔːprja tʃitˈta
4 ˈaŋke dʒuˈzɛppe ke ˈɛːra della ˈkaːsa e ddella faˈmiʎʎa di ˈdaːvide dalla tʃitˈta ddi ˈnaddzaret e ddalla ɡaliˈlɛːa si reˈkɔ in dʒuˈdɛːa nella tʃitˈta ddi ˈdaːvide kjaˈmaːta beˈtlɛmme
5 per ˈfarsi redʒiˈstraːre inˈsjɛːme a mmaˈriːa ˈsuːa ˈspɔːza ke ˈɛːra inˈtʃinta
6 ˈprɔːprjo ˈmentre si troˈvaːvano ˈli ˈvenne il ˈtɛmpo per ˈlɛi di partoˈriːre
7 ˈmiːse al ˈmondo il ˈsuːo primoˈdʒɛːnito, lo avˈvɔlse iɱ ˈfaʃʃe e llo deˈpoːse in ˈuːna mandʒaˈtoːja poiˈke nnon ˈtʃɛːra ˈposto per ˈloːro nella loˈkanda]