OBJECTIVE ACOUSTIC VOICE ANALYSIS: WHERE ARE WE NOW? | Zdrastveno savetovalište u oblasti poremećaja komunikacije

Submitted by drvulevu on Wed, 02/05/2025 - 11:37

Introduction

More than 50 years have passed since the first attempts at acoustic analysis of voice and speech were introduced into clinical practice (1). A new science of communication disorders care was established. At the same time, there was parallel activity in state security services and forensic investigation, as well as in speech synthesis (2,3). Those were days of innovative approaches to the secret of communication, when entirely new methods entered practice as a great contribution to the assessment of voice, speech and language. It seemed that we finally had an electrophysiological registration of the voice comparable to the electrocardiogram in cardiology. Those were days of exciting expansion. It is easy to imagine the enthusiasm behind giving the name CASPER to one of the first computer-based analysis programs in 1993; the name was in fact an abbreviation of Computer Assisted Speech Evaluation and Rehabilitation. All those who had dedicated their lives to the secret of voice enjoyed the opportunity to use a computer voice laboratory. Generations of phoniatricians, voice and speech pathologists and voice coaches had dreamed of such technological support in their daily work with patients. Who can describe the excitement of installing a new program in the office? For most of us that was the most important moment in our careers. Those were pioneer days indeed. Time stood still; day and night we were in our laboratories, comparing data from the literature with everyday practice. All members of our teams, together with our patients, were collecting and analyzing the results of multidimensional voice and speech analysis.

After 10 years of using objective acoustic analysis in various fields of voice assessment, the first signs of skepticism could be seen in some papers. Not infrequently one could find the conclusion that the technique of computerized acoustic analysis of voice signals was not sufficiently sensitive for general clinical usefulness (4). For example, jitter values could lie entirely within the normal range in the presence of vocal fry, and tremor values in amplitude had different implications from those in frequency.
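To make the perturbation measures mentioned here concrete, the following minimal sketch computes local jitter and local shimmer as the mean absolute difference between consecutive cycles, relative to the mean value, in percent. This is one common definition among several; commercial programs may calculate these parameters differently. The function name and the cycle values are hypothetical and serve only as illustration.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean, in percent.

    Applied to cycle periods this gives a local jitter value; applied to
    cycle peak amplitudes it gives a local shimmer value.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are needed")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical data for four consecutive glottal cycles (not patient data).
periods_ms = [10.0, 10.2, 9.8, 10.0]   # cycle durations in milliseconds
amplitudes = [1.0, 0.9, 1.1, 1.0]      # peak amplitudes, arbitrary units

jitter = local_perturbation_percent(periods_ms)
shimmer = local_perturbation_percent(amplitudes)
print(f"jitter  = {jitter:.2f} %")
print(f"shimmer = {shimmer:.2f} %")
```

Both values depend entirely on how reliably the individual cycles were detected, which is one reason the same recording can yield different figures in different programs.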

In the mid-1980s, the relevant methods for acoustic analysis of voice from a clinical point of view were: mean speaking frequency and fundamental frequency in singing; frequency range of phonation; pitch perturbations; intensity range of phonation and phonetogram; cycle-to-cycle amplitude variations; sound spectrography (Visible Speech); and the Long-Time-Average Spectrum (5). Security institutions and speech synthesis laboratories were closely involved in these investigations as well (6). All of the collected data were summarized nearly 30 years after electroglottography was discovered (7).

After another 10 years of wide use of computer voice analysis, evaluation papers were still appearing. Twenty years after the introduction of computer voice labs, one review covered seven systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 5500 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze, and three recently introduced systems: the Sensimetrics SpeechStation, the Kay Elemetrics Computerized Speech Lab (CSL), and the LSI Speech Workstation. In addition to the capability and performance summaries, that article offered suggestions for continued development of speech analysis systems, particularly in data exchange, journaling, display features, spectral analysis, and fundamental frequency analysis (8). Progress had nevertheless been made in interdisciplinary research cooperation, with impressive growth in our knowledge of vocal function, but there was still a need to develop new physiologically based management approaches (9). Comparison of results between teams was difficult because of the sensitivity of the methods of acquisition and calculation (10).

Finally, in the last decade, a review of several meta-analyses suggests that we are still at the beginning. One finds that, although acoustic measures are routinely utilized in clinical voice examinations, the results of a meta-analysis suggest that caution is warranted regarding the concurrent validity and thus the clinical utility of many of these measures (11). The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language articles published between January 1930 and April 2009 that included key words pertaining to objective and subjective voice measures, voice disorders, and diagnostic accuracy. One hundred articles met the search criteria. The majority of studies investigated acoustic measures (60%) and focused on how well a test method identified the presence or absence of a voice disorder (78%). Only 17 of the articles were judged to contain adequate evidence for the measures studied to be formally considered for inclusion in clinical voice assessment.

The results provide evidence for selected acoustic, laryngeal imaging-based, auditory-perceptual, functional, and aerodynamic measures to be used as effective components in a clinical voice evaluation. However, there was still clearly a pressing need for further high-quality research to produce sufficient evidence on which to recommend a comprehensive set of methods for a standard clinical voice evaluation (12).

Discussion

What is it all about? Is it possible that we do not know how to use this technological advance in the best way? Is it possible that we do not know how to manage the limitations of this technology?

Is the solution to produce ever newer generations of software? Which programs are statistically comparable to the "gold standard", such as the Multidimensional Voice Program (MDVP, KayPentax, NJ, USA) or the Tiger Electronics (Seattle, WA) Dr Speech software? Another important question was how to compare results with other teams, especially given that voice labs differ. In those first steps of developing voice labs, everybody bought whichever programs were acceptable to them for one reason or another.

Moreover, the modern industry offensive, together with new generations of voice experts, has introduced new ideas. The main proposition is that voice teams all over the world must unify their computer laboratories, in the name of better understanding and better comparability of results. That is a good idea, but is it realistic? Many of us had only one chance to buy such a laboratory, so what are we to do about that? The point is to find the best way of using the software that we already have.

Some authors have compared the results of acoustic voice analysis as calculated by two different analysis systems, Doctor Speech (DRS; Tiger Electronics, Neu-Anspach, Germany) and Computerized Speech Lab (CSL; Kay Elemetrics Corporation, Lincoln Park, NJ), and concluded that DRS and CSL are not comparable in absolute figures, but that their judgment against normative data is identical.

One study compares one of the most complete instruments available for the acoustic analysis of voice, MDVP, developed by Kay Elemetrics (1994, Model 4305) and probably one of the most standardized systems currently on the market, with a novel commercial tool that has recently become available: WPCVox (2006).

The worst results in terms of correlation were obtained for the noise parameters: NHR, VTI and HNR. No statistical evidence was found that the two systems measure the same aspects of the phenomena with these parameters (13).

In the beginning, all of us were proud of the 30-40 parameters that we could analyze; later, mathematically constructed formulas using only a few chosen parameters were offered. Is the solution to extract some of the registered voice parameters and use only a few of them in new mathematical formulas (14)? These authors state that voice research so far has not led to the construction of, or a consensus about, a sensitive measure that unambiguously quantifies vocal quality. As a result of the vague relationship among pathology, vocal quality, and measurements, the clinician is frequently confronted with contradictory data when assessing an individual patient's voice. Moreover, acoustic, aerodynamic, and voice-range measurements of a patient's voice are often within normal limits. Only values that deviate markedly from normal may be conclusive for clinical purposes (14).
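One widely cited example of such a reduced formula is the Dysphonia Severity Index (DSI) of Wuyts et al. (2000), listed in the literature below. The sketch that follows combines four measurements into a single score using the coefficients as they are commonly quoted from that paper; this is an assumption to be checked against the original publication before any clinical use. The function name, parameter names and input values are hypothetical.

```python
def dysphonia_severity_index(mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """Weighted combination of four voice measurements into one score.

    mpt_s       maximum phonation time, in seconds
    f0_high_hz  highest attainable fundamental frequency, in Hz
    i_low_db    lowest attainable intensity, in dB
    jitter_pct  jitter, in percent

    Coefficients are as commonly quoted for the DSI (assumption; verify
    against Wuyts et al., 2000).
    """
    return (0.13 * mpt_s
            + 0.0053 * f0_high_hz
            - 0.26 * i_low_db
            - 1.18 * jitter_pct
            + 12.4)


# Hypothetical measurements, for illustration only.
dsi = dysphonia_severity_index(mpt_s=20.0, f0_high_hz=800.0,
                               i_low_db=50.0, jitter_pct=0.5)
print(f"DSI = {dsi:.2f}")
```

In the usual description of the index, higher scores correspond to better vocal quality and strongly negative scores to severe dysphonia. The point of the design is that a handful of measurements, each of which may individually fall within normal limits, is reduced to one number that can be followed over time.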

Why is it that most studies evaluating the effects of intervention investigate only the results of a particular therapy for groups of patients, whereas the results for individual patients (intrasubject results) are not investigated? A possible explanation for this lack of data is that perceptual measurements scored on categorical scales are often used, and consequently calculations cannot be made. Why is it that, although evaluating the effects of intervention is of growing importance in today's health care because of the need for evidence-based intervention, there are not yet well-accepted standardized instruments for assessing the effects of intervention for voice disorders? When evaluating the effects of intervention, there are two different aspects to take into account: the differences between groups of patients (intersubject differences) and the difference within one patient before and after intervention (intrasubject differences). However, in daily clinical practice it is most practical to use the same objective measurement for all voice disorders (15). Why is it far from certain that different voice assessment tools correlate similarly with quality of life, which shows the necessity of further evaluation of the methods applied for voice assessment (16)?

In most recent papers from all over the world, one conclusion remains the same: an objective, quantifiable measure of voice quality is clearly needed to establish correlation between treatments and outcomes (17, 18).

Furthermore, the voice can change during the day or during the week and, owing to the large n-number, no standardization was possible (19).

Is the final solution to improve measurement devices? Currently, there is an increasing demand for robust measures of voice quality. However, comprehensive, systematic and routine measurement of acoustic voice parameters for diagnostic or voice screening purposes, or following treatment, is only possible in hospitals with voice laboratory facilities. Therefore, future investigation should concentrate on the utility of a large variety of voice signal feature types in classifying voices into healthy and different pathological classes, using sophisticated contemporary methods of automated voice analysis (20).

How are we to overcome the fact that MDVP extracts up to 33 acoustic variables from each voice analysis and compares them graphically or numerically with a built-in normative database, while the normative data were derived solely from adults? Moreover, these equipment-based tools require costly and specialized instrumentation, an experienced operator, cooperative patients, and interpretation of complicated graphs and mathematical formulas. It is apparent that a pediatric database must be developed if acoustic measures are to be applied to the identification of pediatric vocal pathologic abnormalities (21).

Maybe the answer lies in changing the way voice is assessed, as the example of the phonetogram shows (established about 1950, standardized in 1981, and renamed Voice Range Profile, VRP, in 1992). Factors such as adjusting methods, variability in task instructions, coaching provided to the patient, the number of repeated trials and examiner experience may affect the determination of vocal limits (22). How are we to deal with irregular voice signals? Most texts on the subject conclude that future research will resolve this problem (23). It is no surprise to find the conclusion that computerized examination sacrifices precision in favor of speed (24). Is there any alternative to standard measurement tasks such as sustained phonation? The sustained vowel is considered the most common "language-independent" voice material used in clinical voice assessment. However, one major limitation of the sustained vowel is that it is an artificial task, insufficiently representative of daily speech and voice use patterns. The controversy between sustained vowels and continuous speech remains unresolved (25). Authors in the last decade say that the validity and reliability of the acoustic measures currently used in the clinic to objectively assess voice quality are inherently limited by their reliance on accurate determination of fundamental frequency, and that these measures have been further restricted to the analysis of sustained vowels (26).

What is the best way to use computer analysis (27)?

Does the answer lie in a synthesizer package that offers the opportunity to test many hypotheses about the acoustic basis of voice quality perception, using an experimental rather than a correlational approach (28)?

In our own daily practice we have built our own database of voice samples, an original standard text passage and a set of original test sentences. Practice itself showed us some ways of adapting to the conditions of examination. As for standardizing ambient noise, practice has shown that the solution lies in the instrument you have: what you really have to do is create the same conditions for all of your patients. As for repeated tests, you can repeat them on different programs, if you are lucky enough to have more than one (for example, Real Analysis, Vocal Assessment and Phonetogram). As for irregular signals, you can record the irregularity itself as a first-grade parameter in voice analysis; when, for the purposes of computer analysis, you have to cut out part of the sample because of its irregularity, that is a second-grade parameter of hoarseness; and, finally, the regular signal is a third-grade parameter that can be displayed numerically.

Over time, we could see all the advances but also all the limitations of this modern technology. In the meantime, more and more sophisticated programs came on the market, as well as more texts about the most representative parameters.

You have to deal with the challenge that the software's normative database, which is the main basis for comparing patients' results with the normal voice, was built under other linguistic and cultural conditions that may not correspond to your own.

Another important question also arose: who is the expert suited to using computer analysis? Is it the physician, the ENT specialist or the phoniatrician, or is it the voice and speech pathologist? Is this activity reserved for medical staff, or can it be done by non-medical members of the team? Is it enough to have a voice lab without the other parts of voice and speech examination, especially ENT and phoniatric examination, endovideolaryngostroboscopy, subjective acoustic analysis and self-perception of the voice? Is the hospital the only institution for a voice lab? Would it be better to bring it to every single office outside hospitals? What is the goal, if not better and prompter diagnostics and better health systems?

The basic question in cognitive neuroscience remains unsolved. What is happening in the brain, where and when: that question needs an interdisciplinary approach. The combined work of phoniatricians, neuropsychologists, neurologists, and speech and language pathologists could give some kind of answer.

The secret of the phenomena of voice, speech and language will probably never be discovered. It is clear that phonation is the product of the interaction of numerous factors, including intact psychological status and an intact voice box, supported by the nervous, neuroendocrine and neurovegetative systems, and synchronized with the visual, auditory, proprioceptive and deep-sensibility feedback systems. This complex mechanism is coordinated by the central nervous system.

The anatomical basis of the processes of interpersonal communication is integrated in the audio-vocal system, although in truth the material component is of lesser importance.

“Intellectum dat qui auditum”: the input of acoustic waves is transformed into neuroelectric waves in the ear. Auditory input passes through the corresponding nerve pathway, which is a two-way road with an efferent component as well. The output of the audio-vocal system is also well known, with its afferent component from the neuroreceptors in the larynx.

The most important, central part of this system, based on the integration of parallel and sequential brain processing, is mostly unexplained. This is further complicated by the fact that the map of the cortex comprises not only the neuronal network but equally the neurotransmitters with their receptors. Multimodal cortical areas of the frontal, parietal and temporal lobes are the site of analysis and synthesis of sensorimotor cognitive data. Their connections with limbic and paralimbic areas provide emotional evaluation of the stimulus and modification of affect. The limbic system is the transducer of information about the emotional meaning of external stimuli, and the anterior cortex modulates the corresponding motor activity.

Seen through modern technology such as electron microscopy, the audio-vocal system is a pathway whose input lies in the mitochondria of the inner and outer cochlear hair cells with their stereocilia. Their depolarisation releases the neurotransmitter glutamate, which triggers action potentials in neurons. The middle part of this pathway is the synaptic cleft, and the output ends in the specific mitochondria of the neuromuscular junctions of the vocal folds.

Is it a coincidence that the oldest information of the human race is stored in mitochondrial DNA?

Never neglect the importance of prosody examinations. The word prosody comes from Latin prosodia and Greek προσῳδία, meaning accentuation and intonation, the nonverbal aspects of communication. We distinguish linguistic and emotional prosody. Brain lateralisation underlies the organisation of neuropsychological functions.

The clinical importance of affect in communication was recognized in 1915, but a systematic approach to the dysprosodic syndromes dates only from 1947. The first authors pointed out that patients with a right hemispheric deficit (RHD) do not distinguish between a sad and a happy voice, although they can understand the meaning of what was said. It was soon shown that recognition of linguistic prosody is also impaired in RHD. EEG parameters confirmed the hypothesis of a connection between RHD and failure of prosodic expression.

In the early 1980s, a specific neuropsychological test for the identification of aprosody was introduced. Only functional NMR imaging conducted simultaneously with voice and speech test signals can be the main direction of investigation.

Conclusion

The best computer program for objective voice assessment is the one that you already have; this has been the only truth from the very beginning (29). The best way to recognize all the advantages and limitations of modern technology is to work in a multidisciplinary team. Only a Communication Disorders Care Center, with close cooperation among experts in the neurosciences, speech and language pathologists, ENT specialists and phoniatricians, can offer a real opportunity to deal with the secret of voice, speech and language, as well as hearing.

The best way to use computer analysis is for measuring clinical progress (30). As one of the greatest authorities in this field has said from his visionary point of view: "As part of our diagnostic process, we also quantify voice function using objective measures, perceptual measures, aerodynamic assessment, and quality-of-life instruments. These adjuncts have not merely improved our diagnostic abilities but also have provided us with techniques to measure treatment outcomes" (31).

It seems to us that the answer is very simple. The answer is in our patients. Almost all of these programs have a "clinical progress tracking" option. If you have clear proof from clinical progress tracking, at the first examination, during the course of therapy and after therapy, and if there is improvement in voice and speech quality, this is the database for publishing and for comparing with colleagues from all over the world. If our clinical tracking shows that the patient's voice is better than before therapy, that is enough; these results we can compare with other voice labs, and there is no need to buy new programs or to unify laboratories all over the world.
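The idea of tracking one patient against his or her own baseline can be sketched in a few lines. The example below is only an illustration under stated assumptions: the record structure, the choice of three measures, the direction in which each measure is taken to improve, and all numbers are hypothetical and do not come from any particular program's progress-tracking module.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    stage: str          # e.g. "first examination", "after therapy"
    jitter_pct: float   # jitter, percent
    shimmer_pct: float  # shimmer, percent
    mpt_s: float        # maximum phonation time, seconds


# Assumed direction of improvement: -1 means lower is better, +1 higher is better.
BETTER = {"jitter_pct": -1, "shimmer_pct": -1, "mpt_s": +1}


def progress(baseline, follow_up):
    """Return {measure: (change, improved)} for one patient, same lab, same program."""
    report = {}
    for name, direction in BETTER.items():
        change = getattr(follow_up, name) - getattr(baseline, name)
        report[name] = (change, change * direction > 0)
    return report


# Hypothetical record of a single patient measured three times.
visits = [
    Visit("first examination", 1.8, 6.0, 9.0),
    Visit("during therapy", 1.2, 4.5, 12.0),
    Visit("after therapy", 0.7, 3.1, 16.0),
]

final = progress(visits[0], visits[-1])
for name, (change, improved) in final.items():
    print(f"{name}: change {change:+.1f}, improved: {improved}")
```

Because every comparison is made within the same patient, recorded under the same conditions with the same software, the absolute differences between analysis systems do not enter into the result.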

Let us put the patient first. Let the patient see his or her own voice through visible speech animations. This is the best way of vocal therapy. And one more thing: let us not forget the main challenge, how to measure prosody in voice and speech, the only human feature that cannot be synthesized. It is an interesting situation in modern science that the cognitive neuroscience of language is not as prominent as the role language plays in human life would warrant. Different sciences, such as psycholinguistics, neurolinguistics, neurobiology and neurophysiology, have similar aims, but there is not much communication among them. The imperative for future investigations is the closest possible collaboration between experts from different fields, using different methods at the same time. Priority should be given to testing conversational speech, expression as well as perception. The main question is the ability of the undamaged part of the brain to activate the remaining communication ability.

At the beginning of the evaluation of a voice disorder, do not forget that a precise history is of the greatest importance, because laryngeal pathology may be only a sign of a disorder elsewhere. An interdisciplinary approach includes examination of neurological, endocrine, pulmonary and digestive factors.

After 100 years (32), the opportunity for cooperation between phoniatrics and neuropsychology is still unused. Many phoniatric disorders need neuropsychological evaluation, and vice versa. Only interdisciplinary cooperation will give answers in the fields of objective voice analysis, speech synthesis (33), text-to-speech conversion and human-machine communication, and even in recognition of the person, not only the voice.

There is an urgent need to establish a Communication Disorders Care Centre.

Literature

Iwata S., von Leden H. Voice prints in laryngeal disease. Arch Otolaryngol, 1970; 91(4): 346-51
Mamoux J.P. Identification of the human voice. Med Leg Dommage Corpor, 1971; 4(1): 35-8
Schweisheimer W. Identification of the human voice using voice spectrography. Med Klin, 1972; 67(47): 1571-3
Cox N.B., Morrison M.D. Acoustic analysis of voice for computerized laryngeal pathology assessment. J Otolaryngol, 1983; 12(5): 295-301
Dejonckere P.H. Acoustic analysis of voice production. Production trial from a clinical perspective. Acta Otorhinolaryngol Belg, 1986; 40(2): 377-85
Koenig B.E. Spectrographic voice identification: a forensic survey. J Acoust Soc Am, 1986; 79(6): 2088-90
Baken R.J. Clinical Measurements of Speech and Voice. Taylor and Francis Ltd, London, 1987.
Read C., Buder E.H., Kent R.D. Speech analysis systems: an evaluation. J Speech Hear Res, 1992; 35(2): 314-32
Stemple J.C. Voice research: so what. A clearer view of voice production, 25 years of progress; the speaking voice. J Voice, 1993; 7(4): 293-300
Giovanni A., Revis J., Triglia J.M. Objective Aerodynamic and Acoustic Measurement of Voice Improvement After Phonosurgery. Laryngoscope, 1999; 109(4): 656-60
Maryn Y., Roy N., De Bodt M., et al. Acoustic measurement of overall voice quality: a meta-analysis. J Acoust Soc Am, 2009; 126(5): 2619-34
Roy N., Barkmeier-Kraemer J., Eadie T., et al. Evidence-based clinical voice assessment: a systematic review. Am J Speech Lang Pathol, 2013; 22(2): 212-26
Smits J., Ceuppens P., De Bodt M.S. A comparative study of acoustic voice measurements by means of Dr. Speech and Computerized Speech Lab. J Voice, 2005; 19(2): 187-96
Godino-Llorente J.I., Osma-Ruiz V., Saenz-Lechon N., et al. Acoustic analysis of voice using WPCVox: a comparative study with Multi Dimensional Voice Program. Eur Arch Otorhinolaryngol, 2008; 265: 465-476
Wuyts F.L., Molenberghs G., Remacle M., et al. The Dysphonia Severity Index: An Objective Measure of Vocal Quality Based on a Multiparameter Approach. J Speech Lang Hear Res, 2000; 43(3): 796-809
Hakkesteegt M.M., Brocaar P.M., Wieringa H.M. The Applicability of the Dysphonia Severity Index and the Voice Handicap Index in Evaluating Effects of Voice Therapy and Phonosurgery. J Voice, 2010; 24(2): 199-205
Schneider S., Plank Ch., Eysholdt U., et al. Voice Function and Voice-Related Quality of Life in the Elderly. Gerontology, 2011; 57: 109-114
Ali D., Hossein A., Mehdi B. Objective Voice Analysis of Iranian Speakers with Normal Voices. J Voice, 2010; 24(2): 161-167
Maryn Y., De Bodt M., Barsties B., et al. The value of the Acoustic Voice Quality Index as a measure of dysphonia severity in subjects speaking different languages. Eur Arch Otorhinolaryngol, 2014; 271: 1609-19
Echternach M., Nusseck M., Dippold S., et al. Fundamental frequency, sound pressure level and vocal dose of a vocal loading test in comparison to a real teaching situation. Eur Arch Otorhinolaryngol, 2014; 271: 3263-8
Uloza V., Padervinskis E., Vegiene A., et al. Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening. Eur Arch Otorhinolaryngol, 2015; 272: 3391-3399
Campisi P., Tewfik T.L., Manoukian J.J., et al. Computer-Assisted Voice Analysis. Establishing a Pediatric Database. Arch Otolaryngol Head Neck Surg, 2002; 128(2): 156-160
D'Alatri L., Marchese M.R. The speech range profile (SRP): an easy and useful tool to assess vocal limits. Acta Otorhinolaryngologica Italica, 2014; 34: 253-258
Sprecher A., Olszewski A., Jiang J.J. Updating signal typing in voice: Addition of type 4 signals. J Acoust Soc Am, 2010; 127(6): 2710-16
Parsa V., Jamieson D.G. Acoustic Discrimination of Pathological Voice: Sustained Vowels Versus Continuous Speech. J Speech Lang Hear Res, 2001; 44: 327-39
Montojo J., Garmendia G., Cobeta I. Comparison of the Results Obtained Through Manual and Automatic Phonetogram. Acta Otorrinolaringol Esp, 2006; 57(7): 313-8
Mehta D.D., Hillman R.E. Voice assessment: Updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods. Curr Opin Otolaryngol Head Neck Surg, 2008; 16(3): 211-5
Kreiman J., Antonanzas-Barroso N., Gerratt B.R. Integrated Software for Analysis and Synthesis of Voice Quality. Behav Res Methods, 2010; 42(4): 1030-41
Yanagihara N. Significance of harmonic changes and noise components in hoarseness. J Speech Hear Res, 1969; 10: 531-41
Speyer R., Wieneke G.H., Dejonckere P.H. Documentation of progress in voice therapy: Perceptual, acoustic, and laryngostroboscopic findings pretherapy and posttherapy. J Voice, 2004; 18(3): 325-40
Sataloff R.T. Laryngology: State of the Art. Laryngoscope, 2003; 113(9): 1477-1478
Hughlings Jackson J. On affections of speech from disease of the brain. Brain, 1915; 38: 106-174
Murray I.R., Arnott J.L. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J Acoust Soc Am, 1993; 93: 1097-1108

NOTE: Written in 2019.