Chapter 7. The Source Filter Model of Speech Production

Table of Contents

7.1. The Source
7.2. The Filter
7.3. Combining Source and Filter
7.4. Source Filter Synthesis
7.5. Source Filter Analysis
7.6. Exercises
7.7. Further Reading

We are now equipped to begin talking about an important model of speech production: the source filter model. This model is at the heart of many speech analysis methods and also drives thinking in speech perception research.

The view put forward in the source filter model is that speech sounds are produced by the action of a filter, the vocal tract, on a sound source, either the glottis or some other constriction within the vocal tract. Fundamental to the model is the assumption that source and filter are independent: the properties of the filter can be modified without changing the properties of the source, and vice versa. This assumption may not be strictly true in all cases, but in practical terms it provides us with a very useful and largely accurate model of speech production.

7.1. The Source

There are two acoustic sources in speech production corresponding to voiced and unvoiced speech sounds.

7.1.1. Voiced Speech

The source in voiced speech is the vibration of the vocal folds in response to airflow from the lungs. This vibration is periodic and, if it could be examined independently of the properties of the vocal tract (which change its spectral shape), would be seen to consist of a series of broad spikes. Figures 3.4 and 3.5 from Harrington and Cassidy (reproduced in Figure 7.1) show the waveform and spectrum of the glottal source.

Figure 7.1. A glottal source waveform and its spectrum

The spectrum of the glottal source is made up of a number of frequency spikes corresponding to the harmonics of the fundamental frequency of vibration of the vocal folds. The spectrum decreases in amplitude with increasing frequency at a rate of around -12 dB per octave; that is, for each doubling in frequency, the amplitude of the spectrum decreases by around 12 dB.
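The per-octave slope can be turned into a small calculation. The sketch below assumes an idealised source whose level falls by exactly 12 dB for every doubling in frequency; the function name source_level_db and the 100 Hz reference are illustrative choices, not part of the original text.

```python
import math


def source_level_db(freq_hz, ref_hz, slope_db_per_octave=-12.0):
    """Level at freq_hz in dB relative to the level at ref_hz.

    Assumes an idealised constant slope per octave (an assumption made
    for illustration; real glottal spectra only approximate this).
    """
    octaves = math.log2(freq_hz / ref_hz)  # number of doublings above ref_hz
    return slope_db_per_octave * octaves


# One, two and three octaves above a 100 Hz reference.
for freq in (200, 400, 800):
    print(freq, "Hz:", source_level_db(freq, 100), "dB")
```

Under these assumptions the components at 200, 400 and 800 Hz sit 12, 24 and 36 dB below the component at 100 Hz.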

The effect of increasing the frequency of vibration of the vocal folds is to widen the gap between the frequency spikes in the glottal source spectrum, since these spikes occur at multiples of the base or fundamental frequency. The overall shape of the spectrum (the -12 dB per octave attenuation) remains unchanged.
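As a rough illustration of how the spacing of the spikes follows the fundamental, the sketch below lists the harmonic frequencies up to a chosen limit. The helper harmonic_frequencies and the 500 Hz limit are hypothetical, introduced only for this example.

```python
def harmonic_frequencies(f0_hz, max_hz):
    """Return the harmonics of f0_hz (integer multiples) up to max_hz."""
    n_harmonics = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, n_harmonics + 1)]


# A lower and a higher fundamental, compared over the same frequency range.
print(harmonic_frequencies(100, 500))  # [100, 200, 300, 400, 500]
print(harmonic_frequencies(200, 500))  # [200, 400]
```

Doubling the fundamental from 100 Hz to 200 Hz doubles the gap between neighbouring spikes, so fewer harmonics fall within the same frequency range.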

The average male voice has a source frequency of around 100 Hz. Female and child voices are typically higher in pitch: around 200 Hz for an average female voice and 200-300 Hz for children.

7.1.2. Unvoiced Speech

In unvoiced speech the sound source is not a regular vibration but rather vibrations are caused by turbulent airflow due to a constriction in the vocal tract. The various positions at which the vocal tract can constrict have been discussed in the earlier part of the course (the places of articulation for fricative sounds).

The sound created via a constriction is described as a noise source. It contains no dominating periodic component and has a relatively flat spectrum, meaning that all frequency components are represented roughly equally (in fact, for some sounds the noise spectrum may slope down at around 6 dB per octave). Looking at the time waveform of a noise source, we see only a random pattern of movement around the zero axis. The spectrum looks similarly irregular, with random peaks and troughs, but the overall trend is for it to be flat across the frequency range.
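The flat overall trend of a noise spectrum can be checked numerically. The sketch below is a minimal illustration using only the Python standard library: it assumes Gaussian white noise as a stand-in for the turbulent source, uses a slow but simple discrete Fourier transform, and compares the average level in the lower and upper halves of the spectrum. The function names, the seed and the signal length of 512 samples are all illustrative assumptions.

```python
import cmath
import math
import random


def white_noise(n_samples, seed=0):
    """Gaussian white noise, used here as an idealised noise source."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def power_spectrum(signal):
    """Naive DFT power for bins 1 .. N/2 - 1 (slow, but dependency-free)."""
    n = len(signal)
    power = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        power.append(abs(acc) ** 2 / n)
    return power


def band_level_db(power, lo, hi):
    """Mean power in bins lo .. hi - 1, expressed in dB."""
    mean_power = sum(power[lo:hi]) / (hi - lo)
    return 10 * math.log10(mean_power)


noise = white_noise(512)
power = power_spectrum(noise)
half = len(power) // 2

low_db = band_level_db(power, 0, half)             # lower half of the spectrum
high_db = band_level_db(power, half, len(power))   # upper half of the spectrum
print("low band:", round(low_db, 2), "dB; high band:", round(high_db, 2), "dB")
```

Individual bins vary widely, but the two band averages should come out close to each other, which is what a flat overall trend means. A noise source with a downward slope would instead show a lower level in the upper band.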