5.3. Exercises

You should again do some exercises from the 'Practical Exercises...' worksheets from Harrington and Cassidy. Continue with section 3.5 on windowing signals, and then look at the first part of section 4 on speech signal parameters, which gives examples of calculating autocorrelation coefficients for a sample signal. Leave the remaining exercises in section 4 until later.

5.3.1. Work to submit

This worksheet is intended to explore the use of autocorrelation for pitch tracking of speech signals. You will use the autocorrelation routines in XlispStat/Emu to look at various speech signals and assess the ease of automatic pitch tracking.

XlispStat/Emu has an autocorrelation routine which can be accessed by sending the :autocorr message to a signal object. The autocorrelation is by default performed on the entire signal, so if you want to analyse just a window from the signal you need to window the signal separately with the :window message. For example:

(def w (send stamp :window :start 2000 :width 1024 :type 'rectangular))
(def w-ac (send w :autocorr))
(send w-ac :plot)    
      
You can plot the autocorrelation by sending the :plot message as above. By default the plot also marks the four largest peaks, as determined by a fairly crude peak-picking algorithm.
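To make the computation concrete, here is a minimal standalone sketch in Python rather than XlispStat. It is not the Emu implementation: the names autocorr and pick_peaks, the 44100 Hz sampling rate, and the synthetic 100 Hz tone standing in for a voiced stretch of speech are all assumptions made for illustration, and the peak picker is simply one crude way of finding local maxima.

```python
import math

FS = 44100   # assumed sampling rate in Hz (the worksheet does not state it)
F0 = 100.0   # pitch of the synthetic test tone in Hz (illustrative only)

def autocorr(x):
    """Unnormalised autocorrelation r[k] = sum over n of x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]

def pick_peaks(r, count=4):
    """Lags of the `count` largest local maxima of r, ignoring lag 0."""
    lags = [k for k in range(1, len(r) - 1) if r[k - 1] < r[k] >= r[k + 1]]
    return sorted(lags, key=lambda k: r[k], reverse=True)[:count]

# Rectangular window: 1024 unweighted samples starting at sample 2000.
start, width = 2000, 1024
window = [math.sin(2 * math.pi * F0 * n / FS)
          for n in range(start, start + width)]

r = autocorr(window)
peaks = pick_peaks(r)
estimate = FS / peaks[0]   # lag in samples -> frequency in Hz

print("peak lags:", peaks)
print("pitch estimate: %.1f Hz" % estimate)
```

The lag of the largest peak (other than lag 0) approximates the pitch period in samples, so dividing the sampling rate by that lag gives a pitch estimate. With so few periods inside the window, the peak may sit a few samples away from the true period of 441 samples, so the estimate is close to, but not exactly, 100 Hz.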

You can use the window-demo function to preview the result of applying autocorrelation to different parts of a signal as follows:

(window-demo stamp :transform :autocorr :width 1024 :type 'rectangular)
      
A 1024-point window corresponds to just over two cycles of a 100 Hz signal, so it should be long enough if the pitch is around that value. A rectangular window is appropriate here because we don't care about the frequency response of the window; in fact, it is probably best to preserve the amplitude of the signal at the edges.
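The arithmetic behind that window-length claim can be checked with a few lines of Python. The sampling rate is not stated in this worksheet; 44100 Hz is assumed here only because it is consistent with "just over two cycles", so substitute the actual rate of your signal.

```python
FS = 44100      # assumed sampling rate in Hz (not stated in the worksheet)
WIDTH = 1024    # window width in samples
PITCH = 100.0   # expected pitch in Hz

duration_ms = 1000.0 * WIDTH / FS   # window length in milliseconds
cycles = WIDTH * PITCH / FS         # number of pitch periods inside the window
lowest_pitch = 2.0 * FS / WIDTH     # pitch at which exactly two periods fit

print("window length: %.1f ms" % duration_ms)
print("cycles of %.0f Hz in window: %.2f" % (PITCH, cycles))
print("lowest pitch with two full periods: %.1f Hz" % lowest_pitch)
```

Under this assumption the window lasts about 23.2 ms and holds about 2.3 periods of a 100 Hz signal. Pitches much below roughly 86 Hz would leave fewer than two periods in the window, which makes the first autocorrelation peak unreliable.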

Answer the following questions:

  • For the stamp signal, estimate the pitch of the signal using autocorrelation at regular intervals along its length.

  • Why can't you estimate the pitch in an unvoiced section like the initial /s/? Suggest a method of differentiating between voiced and unvoiced speech using only an autocorrelation analysis.