Speech Measures

Sp SPL (dB)

This version of the SPL measurement (see above) is optimized for speech samples. Instead of a simple RMS measurement, an RMS with 'F' (fast) time weighting, i.e. exponential averaging with a 125 ms time constant, is calculated. Only the values of this time-weighted RMS that fall above an SNR threshold are averaged to calculate Sp SPL.
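
As a rough illustration of this procedure, the following Python sketch applies exponential time weighting with a 125 ms time constant to the squared signal, then averages only the levels that exceed a threshold. The function names, the dB reference (full scale = 1.0), the definition of the threshold as a noise floor plus an SNR margin, and the choice to average in the power domain are all assumptions made for illustration; the text above does not specify them.

```python
import math


def fast_weighted_levels(samples, sample_rate, tau=0.125):
    """Return the exponentially time-weighted RMS level in dB for every sample.

    tau is the time constant in seconds (0.125 s for 'F' weighting).
    Levels are relative to a full-scale amplitude of 1.0 (an assumption).
    """
    # One-pole smoothing coefficient for the chosen time constant.
    alpha = 1.0 - math.exp(-1.0 / (sample_rate * tau))
    mean_square = 0.0
    levels = []
    for x in samples:
        mean_square += alpha * (x * x - mean_square)
        # Clamp to avoid log10(0) on digital silence.
        levels.append(10.0 * math.log10(max(mean_square, 1e-20)))
    return levels


def speech_level(levels_db, noise_floor_db, snr_threshold_db):
    """Power-average the levels that exceed noise floor + SNR threshold.

    Returns None when no level passes the threshold.
    """
    kept = [lv for lv in levels_db if lv > noise_floor_db + snr_threshold_db]
    if not kept:
        return None
    mean_power = sum(10.0 ** (lv / 10.0) for lv in kept) / len(kept)
    return 10.0 * math.log10(mean_power)
```

Averaging in the power domain rather than averaging the dB values directly is one possible design choice; the two give different results when the level fluctuates strongly.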

SFo (Hz)

Speaking fundamental frequency [1]. This is the frequency in Hertz of glottal pulses or vocal fold vibrations. It is a time-weighted average over all voiced portions of the sound segment.

SFR (dB)

Spectral flatness of the residue [2]. SFR expressed in dB varies from -infinity for completely periodic signals to 0 for completely aperiodic signals (white noise).

The "residue" is the inverse-filtered signal (glottal waveform) as determined by linear prediction. The SFR is the ratio of the geometric mean (over frequency) of the spectral power of the residue to the arithmetic mean of that power, expressed in decibels.
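
The ratio described above can be sketched in a few lines of Python. This example assumes the power spectrum of the residue has already been computed (the linear-prediction inverse filtering is not shown), and the function name is hypothetical.

```python
import math


def spectral_flatness_db(power):
    """Geometric mean over arithmetic mean of a power spectrum, in dB.

    `power` is a sequence of non-negative spectral power values.
    """
    if any(p == 0.0 for p in power):
        # A zero bin makes the geometric mean zero, so the ratio is -infinity dB.
        return float("-inf")
    n = len(power)
    log_geometric_mean = sum(math.log(p) for p in power) / n
    arithmetic_mean = sum(power) / n
    # 10*log10(geometric / arithmetic), evaluated with logs for numerical stability.
    return 10.0 * (log_geometric_mean - math.log(arithmetic_mean)) / math.log(10.0)
```

A perfectly flat spectrum yields 0 dB, and any departure from flatness yields a negative value, consistent with the range stated above.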

SFR (%)

Spectral flatness of the residue [2]. SFR expressed in percent varies from 0 for completely periodic signals to 100 for completely aperiodic signals (white noise).

PA

Pitch amplitude [3]. PA varies from 1 for completely periodic signals to 0 for completely aperiodic signals (white noise).

The "residue" is the inverse-filtered signal (glottal waveform) as determined by linear prediction. The PA is the maximum amplitude of the normalized autocorrelation of the residue.
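
A minimal Python sketch of this definition follows. It assumes the residue is already available, and it searches only a caller-supplied lag range, since the zero-lag value of a normalized autocorrelation is always 1. The lag range, the function name, and the particular normalization (by the energies of the two overlapping segments) are assumptions, not details given in the text.

```python
import math


def pitch_amplitude(residue, min_lag, max_lag):
    """Largest normalized autocorrelation value for lags in [min_lag, max_lag].

    Lags are in samples. Returns 0.0 if no lag has a defined correlation.
    """
    n = len(residue)
    best = 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        head = residue[: n - lag]   # x[0 .. n-lag-1]
        tail = residue[lag:]        # x[lag .. n-1]
        numerator = sum(a * b for a, b in zip(head, tail))
        denominator = math.sqrt(
            sum(a * a for a in head) * sum(b * b for b in tail)
        )
        if denominator > 0.0:
            best = max(best, numerator / denominator)
    return best
```

For a speech residue, the lag range would typically be chosen to cover the plausible pitch periods, for example `sample_rate / 500` to `sample_rate / 50` samples for a 50 to 500 Hz search (illustrative values only).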

Tilt (dB/oct)

Spectral tilt [4]. This value estimates the slope of the long-term average spectrum (LTAS) in decibels per octave. The spectral tilt is computed from the ratio of the power in two frequency bands. The tilt is considered positive when there is more power in the low band than in the high band. Low values of spectral tilt signify a flatter spectrum with more high-frequency noise.
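
The text does not give the band edges or the exact conversion from a band power ratio to dB/oct, so the following Python sketch is only one plausible reading. It assumes the mean power in each band is compared, and that the level difference is divided by the number of octaves between the geometric centers of the two bands. The function names and the band edges in the test values are hypothetical.

```python
import math


def band_mean_power(freqs_hz, power, lo_hz, hi_hz):
    """Mean spectral power of the bins whose frequency lies in [lo_hz, hi_hz]."""
    values = [p for f, p in zip(freqs_hz, power) if lo_hz <= f <= hi_hz]
    if not values:
        raise ValueError("no spectral bins inside the requested band")
    return sum(values) / len(values)


def spectral_tilt_db_per_octave(freqs_hz, power, low_band, high_band):
    """Tilt estimate from two bands, positive when the low band is stronger.

    low_band and high_band are (lower_edge_hz, upper_edge_hz) tuples.
    """
    p_low = band_mean_power(freqs_hz, power, low_band[0], low_band[1])
    p_high = band_mean_power(freqs_hz, power, high_band[0], high_band[1])
    level_difference_db = 10.0 * math.log10(p_low / p_high)
    # Octave distance between the geometric centers of the two bands (assumed).
    center_low = math.sqrt(low_band[0] * low_band[1])
    center_high = math.sqrt(high_band[0] * high_band[1])
    octaves = math.log2(center_high / center_low)
    return level_difference_db / octaves
```

With this sign convention, a spectrum whose power falls as 1/f² gives a tilt of about +6 dB/oct, and a flat spectrum gives 0 dB/oct.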

CPP (dB)

Cepstral peak prominence [5]. This is the difference in decibels between the highest peak in the cepstrum and the background as defined by a linear fit to the cepstral coefficients (on a logarithmic scale). CPP varies from positive values for completely periodic signals to 0 for completely aperiodic signals (white noise).
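
The peak-minus-baseline step can be sketched in Python as follows. The example assumes the cepstrum has already been computed and converted to dB, fits the line by ordinary least squares over all supplied points, and looks for the peak only inside a caller-supplied quefrency range. The function name, the search range, and the fitting range are assumptions for illustration.

```python
def cepstral_peak_prominence(quefrency, cepstrum_db, search_min, search_max):
    """Height in dB of the largest cepstral peak above a fitted straight line.

    quefrency and cepstrum_db are equal-length sequences; the peak is
    searched for search_min <= quefrency <= search_max.
    """
    n = len(quefrency)
    mean_q = sum(quefrency) / n
    mean_c = sum(cepstrum_db) / n
    # Ordinary least-squares line through all (quefrency, dB) points.
    sxx = sum((q - mean_q) ** 2 for q in quefrency)
    sxy = sum((q - mean_q) * (c - mean_c) for q, c in zip(quefrency, cepstrum_db))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_q
    # Highest cepstral value inside the search range, with its quefrency.
    candidates = [
        (c, q) for q, c in zip(quefrency, cepstrum_db) if search_min <= q <= search_max
    ]
    peak_db, peak_q = max(candidates)
    return peak_db - (slope * peak_q + intercept)
```

Restricting the search range to quefrencies that correspond to plausible pitch periods keeps the low-quefrency region, which is dominated by the spectral envelope, from being mistaken for the pitch peak.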

SD SFo (Hz)

Standard deviation of the SFo. This quantity is a time-weighted standard deviation over all voiced portions of the sound segment.
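
The time-weighted average and standard deviation used for SFo and the SD measures can be sketched as a duration-weighted mean and spread over the frames that qualify (voiced frames here). The sketch below assumes each frame carries its own duration as the weight and uses the population (not sample) form of the standard deviation; both choices, and the function name, are assumptions.

```python
import math


def time_weighted_mean_sd(values, durations, included):
    """Duration-weighted mean and standard deviation over the included frames.

    values:    per-frame measurements (e.g. SFo in Hz)
    durations: per-frame durations in seconds, used as weights
    included:  per-frame booleans (e.g. True where the frame is voiced)
    Returns (mean, sd), or None if no frame is included.
    """
    pairs = [
        (v, d) for v, d, keep in zip(values, durations, included) if keep and d > 0
    ]
    total = sum(d for _, d in pairs)
    if total == 0:
        return None
    mean = sum(v * d for v, d in pairs) / total
    variance = sum(d * (v - mean) ** 2 for v, d in pairs) / total
    return mean, math.sqrt(variance)
```

For example, two voiced frames of 100 Hz (3 s) and 200 Hz (1 s) give a time-weighted mean of 125 Hz, whereas an unweighted mean would give 150 Hz.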

SD SFR (dB)

Standard deviation of the SFR expressed in dB. This quantity is a time-weighted standard deviation over all voiced portions of the sound segment.

SD SFR (%)

Standard deviation of the SFR expressed in percent. This quantity is a time-weighted standard deviation over all voiced portions of the sound segment.

SD PA

Standard deviation of the PA. This quantity is a time-weighted standard deviation over all voiced portions of the sound segment.

SD Tilt (dB/oct)

Standard deviation of the Tilt. This quantity is a time-weighted standard deviation over all portions of the sound segment for which CPP > 0 dB.

SD CPP (dB)

Standard deviation of the CPP. This quantity is a time-weighted standard deviation over all portions of the sound segment for which CPP > 0 dB.

[1] Goy, H., Fernandes, D. N., Pichora-Fuller, M. K., & van Lieshout, P. (2013). Normative voice data for younger and older adults. Journal of Voice, 27(5), 545-555.

[2] Markel, J. D., & Gray, Jr., A. H. (1976). Linear prediction of speech. Berlin: Springer-Verlag.

[3] Parsa, V., & Jamieson, D. G. (2000). Identification of pathological voices using glottal noise measures. Journal of Speech, Language, and Hearing Research, 43, 469-485.

[4] Parsa, V., & Jamieson, D. G. (2001). Acoustic discrimination of pathological voice: Sustained vowels versus continuous speech. Journal of Speech, Language, and Hearing Research, 44, 327-339.

[5] Hillenbrand, J., & Houde, R. A. (1996). Acoustic correlates of breathy vocal quality: Dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39, 311-321.

[6] Baken, R. J., & Orlikoff, R. F. (2000). Clinical measurement of speech and voice. San Diego: Singular Publishing Group, Inc.