Voice Measures

SPL (dB)

Sound pressure level: the ratio of RMS sound pressure to a reference pressure, expressed in dB. SPL is reported only if a dB-Scale calibration has previously been performed. It is measured with flat frequency weighting, which is most closely equivalent to the 'C' weighting on sound level meters.
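As an illustration of the definition above, the following minimal sketch computes SPL from samples that are assumed to be already calibrated in pascals. The 20 µPa reference is the conventional value for airborne sound and is an assumption here, as is the function name; the text does not state which reference the software uses.

```python
import math

# Assumed reference pressure: 20 micropascals (conventional for sound in air).
P_REF_PA = 20e-6


def spl_db(pressures_pa, p_ref=P_REF_PA):
    """Return SPL in dB for a sequence of calibrated pressure samples (Pa)."""
    # RMS pressure over the whole segment.
    rms = math.sqrt(sum(p * p for p in pressures_pa) / len(pressures_pa))
    # Pressure is an amplitude quantity, so the dB factor is 20.
    return 20.0 * math.log10(rms / p_ref)
```

An RMS pressure of 1 Pa corresponds to roughly 94 dB under this reference.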

SPL Min (dB)

Minimum sound pressure level. This is the lowest SPL measured over all voiced portions of the sound segment.

Fo (Hz)

Fundamental frequency [1]. This is the frequency in Hertz of glottal pulses or vocal fold vibrations. It is a time-weighted average over all voiced portions of the sound segment.

Fo Min (Hz)

Minimum fundamental frequency. This is the minimum value of Fo in Hertz over all voiced portions of the sound segment.

Fo Max (Hz)

Maximum fundamental frequency. This is the maximum value of Fo in Hertz over all voiced portions of the sound segment.

Fo Range (ST)

Fundamental frequency range. This is the difference in semitones between the minimum and maximum values of Fo.
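The conversion from a frequency ratio to semitones can be sketched as follows. There are 12 semitones per octave (a doubling of frequency); the function name is illustrative only.

```python
import math


def fo_range_semitones(fo_min_hz, fo_max_hz):
    """Return the distance in semitones between two frequencies in Hz."""
    # One octave (a 2:1 frequency ratio) spans 12 semitones.
    return 12.0 * math.log2(fo_max_hz / fo_min_hz)
```

For example, a voice ranging from 110 Hz to 220 Hz spans one octave, or 12 semitones.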

DUV (%)

Degree unvoiced. This is the fraction of the segment that is unvoiced, expressed as a percentage. Unvoiced means that no harmonic component was detectable in the signal.
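A minimal sketch of this percentage, assuming the segment has been divided into equal-length analysis frames and each frame has already been labelled voiced or unvoiced. The frame-based representation and the function name are assumptions made for illustration.

```python
def degree_unvoiced_percent(voiced_flags):
    """Return the percentage of frames marked unvoiced.

    voiced_flags: sequence of booleans, True where a frame is voiced.
    """
    unvoiced = sum(1 for is_voiced in voiced_flags if not is_voiced)
    return 100.0 * unvoiced / len(voiced_flags)
```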

DVB (%)

Degree of voice breaks. This is the fraction of the segment where there is a voice break, expressed as a percentage. A voice break is an unusually long interval occurring between glottal pulses.

MPT (s)

Maximum phonation time in seconds. This is the duration of the longest distinct phonation in the entire sound segment.

HNR (dB)

Harmonics-to-noise ratio [1]. This is the ratio of energy in the harmonic component of a signal to the energy in the noise (non-harmonic) component, expressed in dB.

NHR

Noise-to-harmonics ratio [1]. This is the ratio of energy in the noise (non-harmonic) component of a signal to the energy in the harmonic component, expressed as a fraction between 0 and 1.
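HNR and NHR, as defined here, are built from the same two energies, so one can be converted to the other. The sketch below assumes both measures are computed from the same harmonic and noise energy estimates, which the text does not confirm; the function names are illustrative.

```python
import math


def hnr_db(harmonic_energy, noise_energy):
    """Harmonics-to-noise ratio in dB (energy ratio, so the factor is 10)."""
    return 10.0 * math.log10(harmonic_energy / noise_energy)


def nhr(harmonic_energy, noise_energy):
    """Noise-to-harmonics ratio as a plain fraction."""
    return noise_energy / harmonic_energy


def nhr_from_hnr_db(hnr):
    """Convert an HNR in dB to the equivalent NHR fraction."""
    return 10.0 ** (-hnr / 10.0)
```

Under this assumption, an HNR of 20 dB corresponds to an NHR of 0.01.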

SFR (dB)

Spectral flatness of the residue [2]. SFR expressed in dB varies from -infinity for completely periodic signals to 0 for completely aperiodic signals (white noise).

The "residue" is the inverse filtered signal (glottal waveform) as determined by linear prediction. The SFR is the ratio of the geometric mean (over frequency) of the spectral power of the residue to the arithmetic mean of the power, expressed in decibels.
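The ratio of geometric to arithmetic mean can be sketched as below. The input is assumed to be the power spectrum of the residue, with strictly positive values; obtaining the residue by linear prediction is outside the scope of this sketch, and the function name is illustrative.

```python
import math


def sfr_db(power_spectrum):
    """Spectral flatness in dB from a sequence of positive power values."""
    n = len(power_spectrum)
    # Geometric mean computed in the log domain to avoid overflow.
    geometric_mean = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    arithmetic_mean = sum(power_spectrum) / n
    # The geometric mean never exceeds the arithmetic mean, so the result is <= 0.
    return 10.0 * math.log10(geometric_mean / arithmetic_mean)
```

A perfectly flat spectrum gives 0 dB; the more the power is concentrated in a few frequencies, the more negative the value.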

SFR (%)

Spectral flatness of the residue [2]. SFR expressed in percent varies from 0 for completely periodic signals to 100 for completely aperiodic signals (white noise).

PA

Pitch amplitude [3]. PA varies from 1 for completely periodic signals to 0 for completely aperiodic signals (white noise).

The "residue" is the inverse filtered signal (glottal waveform) as determined by linear prediction. The PA is the maximum amplitude of the normalized autocorrelation of the residue.
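A minimal sketch of the autocorrelation step, assuming the residue is already available as a list of samples. The lag search range (min_lag to max_lag, in samples) and the normalization by the lag-0 energy are assumptions; the text does not specify either.

```python
def pitch_amplitude(residue, min_lag, max_lag):
    """Return the peak of the normalized autocorrelation over a lag range."""
    energy = sum(x * x for x in residue)  # autocorrelation at lag 0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalized by the lag-0 value.
        r = sum(residue[i] * residue[i + lag] for i in range(len(residue) - lag))
        best = max(best, r / energy)
    return best
```

With this normalization a finite periodic signal yields a value slightly below 1, because fewer sample pairs overlap at non-zero lags.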

Tilt (dB/oct)

Spectral tilt [4]. This value estimates the slope of the long-term averaged spectrum (LTAS) in decibels per octave. The spectral tilt is computed from the ratio of power in two frequency bands. The tilt is considered positive when there is more power in the low band than in the high band. Low values of spectral tilt signify a flatter spectrum with more high-frequency noise.

CPP (dB)

Cepstral peak prominence [5]. This is the difference in decibels between the highest peak in the cepstrum and the background as defined by a linear fit to the cepstral coefficients (on a logarithmic scale). CPP varies from positive values for completely periodic signals to 0 for completely aperiodic signals (white noise).

J Abs (µs)

Local absolute jitter [6]. This is the period-to-period variation in the timing of glottal pulses, expressed in microseconds (µs).

J Loc (%)

Local jitter [6]. This is the ratio of J Abs to the average period, expressed as a percentage.
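The two local jitter measures can be sketched as follows, assuming the common definition of absolute jitter as the mean absolute difference between consecutive glottal periods. That definition and the function names are assumptions; the exact averaging used by the software is not stated here.

```python
def jitter_abs_us(periods_s):
    """Mean absolute difference between consecutive periods, in microseconds."""
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return 1e6 * sum(diffs) / len(diffs)


def jitter_local_percent(periods_s):
    """Absolute jitter divided by the mean period, as a percentage."""
    mean_period_us = 1e6 * sum(periods_s) / len(periods_s)
    return 100.0 * jitter_abs_us(periods_s) / mean_period_us
```

For periods alternating between 10 ms and 11 ms, this gives an absolute jitter of 1000 µs and a local jitter of about 9.5%.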

J PPQ (%)

Jitter based on the Pitch Period Perturbation Quotient method [6], expressed as a percentage.

J RAP (%)

Jitter based on the Relative Average Perturbation method [6], expressed as a percentage.
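Both quotients compare each period with a short moving average of its neighbours. The sketch below assumes the widely used forms, a 3-point window for RAP and a 5-point window for PPQ; the window lengths and function names are assumptions, since the text does not give them.

```python
def perturbation_quotient(values, window):
    """Mean deviation from a centred moving average, relative to the overall
    mean, as a percentage. `window` must be odd."""
    half = window // 2
    deviations = []
    for i in range(half, len(values) - half):
        local_mean = sum(values[i - half:i + half + 1]) / window
        deviations.append(abs(values[i] - local_mean))
    mean_deviation = sum(deviations) / len(deviations)
    return 100.0 * mean_deviation / (sum(values) / len(values))


def jitter_rap_percent(periods):
    """RAP: 3-point perturbation quotient of the glottal periods (assumed)."""
    return perturbation_quotient(periods, 3)


def jitter_ppq_percent(periods):
    """PPQ: 5-point perturbation quotient of the glottal periods (assumed)."""
    return perturbation_quotient(periods, 5)
```

Because the moving average smooths out slow drifts in pitch, these quotients respond mainly to cycle-to-cycle irregularity rather than to intonation.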

Sh Loc (dB)

Shimmer [6]. This is the period-to-period variation of the pulse amplitude, expressed in dB.

Sh Loc (%)

Shimmer [6]. This is the relative period-to-period variation of the pulse amplitude, expressed as a percentage.
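The two local shimmer measures can be sketched as below, assuming the common definitions: the mean absolute level difference between consecutive pulse amplitudes for the dB form, and the mean absolute amplitude difference divided by the mean amplitude for the percentage form. These definitions and the function names are assumptions; amplitudes must be positive.

```python
import math


def shimmer_local_db(amplitudes):
    """Mean absolute level difference between consecutive pulses, in dB."""
    steps = [abs(20.0 * math.log10(b / a))
             for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)


def shimmer_local_percent(amplitudes):
    """Mean absolute amplitude difference relative to the mean amplitude."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    mean_amplitude = sum(amplitudes) / len(amplitudes)
    return 100.0 * (sum(diffs) / len(diffs)) / mean_amplitude
```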

APQ11 (%)

Shimmer based on the 11-point Amplitude Perturbation Quotient method [6], expressed as a percentage.

APQ5 (%)

Shimmer based on the 5-point Amplitude Perturbation Quotient method [6], expressed as a percentage.

APQ3 (%)

Shimmer based on the 3-point Amplitude Perturbation Quotient method [6], expressed as a percentage.

DSI

Dysphonia Severity Index [7]. This is a composite measure based on local jitter, MPT, maximum Fo, and minimum SPL. It was designed to correlate well with perceptual measures of dysphonia.
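The weighted combination published by Wuyts et al. [7] can be sketched as follows. The coefficients are those of the published regression equation; whether the software applies them unchanged is an assumption, and the function and parameter names are illustrative.

```python
def dysphonia_severity_index(mpt_s, fo_max_hz, spl_min_db, jitter_percent):
    """DSI from maximum phonation time (s), highest Fo (Hz), lowest
    intensity (dB), and jitter (%), using the coefficients from [7]."""
    return (0.13 * mpt_s
            + 0.0053 * fo_max_hz
            - 0.26 * spl_min_db
            - 1.18 * jitter_percent
            + 12.4)
```

Longer phonation times and higher maximum pitch raise the index, while a louder softest phonation and more jitter lower it; in [7], higher values indicate better vocal quality.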

[1] Goy, H., Fernandes, D. N., Pichora-Fuller, M. K., & van Lieshout, P. (2013). Normative voice data for younger and older adults. Journal of Voice, 27(5), 545-555.

[2] Markel, J. D., & Gray, Jr., A. H. (1976). Linear prediction of speech. Berlin: Springer-Verlag.

[3] Parsa, V., & Jamieson, D. G. (2000). Identification of pathological voices using glottal noise measures. Journal of Speech, Language, and Hearing Research, 43, 469-485.

[4] Parsa, V., & Jamieson, D. G. (2001). Acoustic discrimination of pathological voice: Sustained vowels versus continuous speech. Journal of Speech, Language, and Hearing Research, 44, 327-339.

[5] Hillenbrand, J., & Houde, R. A. (1996). Acoustic correlates of breathy vocal quality: Dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39, 311-321.

[6] Baken, R. J., & Orlikoff, R. F. (2000). Clinical measurement of speech and voice. San Diego: Singular Publishing Group, Inc.

[7] Wuyts, F. L., De Bodt, M. S., Molenberghs, G., Remacle, M., et al. (2000). The Dysphonia Severity Index: An objective measure of vocal quality based on a multiparameter approach. Journal of Speech, Language, and Hearing Research, 43(3), 796-809.