1. Introduction
Over the past several decades, straw phonation has emerged as a novel method to improve vocal economy. Straw phonation involves the use of a tube held between the lips through which the subject phonates and was originally developed in the 1960s using glass straws [1]. A wide range of studies have been conducted to measure vocal quality before and after straw phonation for patients with muscle tension dysphonia [2], vocal fatigue [3], vocal fold paralysis [4], vocal fold nodules [5], and lesions [6], as well as singers with high vocal demand [7,8]. Existing research on straw phonation has developed two main theories explaining increased ease of phonation: an acoustic-aerodynamic interaction and a mechano-acoustic interaction [9]. The former deals with the shape of the glottal flow pulse, while the latter focuses on the material properties of the vocal tract wall, especially the effect of inertance on supraglottal pressure. However, both theories consider the importance of impedance, which creates a build-up of pressure after the glottis to facilitate oscillation of the vocal folds with decreased subglottal pressure [10].
Impedance is quantified as a complex number, whose real part is resistance and imaginary part is reactance. When displayed in vector form, the magnitude of impedance measures the ratio of pressure to flow velocity while the angle of the vector describes the phase difference between the pressure and velocity. Because impedance is useful for describing the effects of the vocal tract on the pressure and velocity of the sound wave as it travels from vocal folds to mouth, it is an important metric for measuring the effects of straw phonation and whether it improves the vocal quality of the subject.
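As a small illustration of the vector description above, the following Python sketch converts a resistance and a reactance into a magnitude and a phase angle. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def impedance_polar(resistance, reactance):
    """Return (magnitude, phase in degrees) of the impedance Z = R + jX."""
    z = complex(resistance, reactance)
    return abs(z), math.degrees(cmath.phase(z))


# Hypothetical values: resistance 3 and reactance 4, in arbitrary units.
magnitude, phase_deg = impedance_polar(3.0, 4.0)
print(f"|Z| = {magnitude:.2f}, phase = {phase_deg:.2f} degrees")
```

A positive phase angle here means the reactance is positive, that is, inertive (inductor-like) in the circuit analogy.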
A robust, physically based model for calculating impedance was developed by Sondhi et al. in 1987 [11] and later refined and expounded upon by Story et al. in 2000 [12]. This model represents the vocal tract as a series of cylindrical tubes, called “tubelets”, each with its own length and cross-sectional area. As sound travels through the vocal tract, the characteristics are assumed to be constant within each tubelet, so each tubelet can be described by a 2 × 2 matrix which acts on a 2-dimensional column vector composed of pressure and flow velocity. This iterated map outputs values for pressure and velocity at the end of each tubelet, taking into account the tubelet wall interactions with the sound wave, such as mass compliance, thermal conductivity, and viscosity. Since the tubelets are connected, the output of one is the input for the next tubelet in the line, until the sound wave reaches the mouth (or straw) opening and expands into the surrounding atmosphere. Given the successive multiplication of many matrices, this method of impedance calculation will be henceforth referred to as the “chain matrix” approach. However, as made apparent below, the equations for the 2 × 2 matrix for each tubelet are complicated and not intuitive in their derivation. This makes the computations difficult to understand fully and hinders extending the model to other related vocal tract inquiries. Therefore, developing an alternative model and method to calculate impedance may be valuable for straw phonation research.
The transmission line circuit analog of the vocal tract is one candidate for an alternative method to calculate impedance. Electrical transmission lines transmit a current from source to load based on the properties of the line. Acoustic-mechanical systems have many similarities to electrical circuits, which makes them convenient to analyze given the large body of knowledge about electrical phenomena in circuits. The oscillation of the vocal folds produces a periodic waveform similar to alternating current in a circuit. Table 1 provides a direct comparison between the properties of a vocal tract and electrical phenomena in a circuit.
Many previous acoustic and aerodynamic research papers have used the electrical analogy, but none so far have used it to study impedance [13,14,15]. One important note concerning transmission lines is the distinction between lossless and lossy lines. A lossless line does not dissipate the energy of the current as it travels along the line, and thus only contains values for inductance and capacitance. The vocal tract is not, however, a lossless line, because the interaction of the sound waves with the tube walls causes them to lose energy. Moreover, the vocal tract, once discretized into tubelets, represents a stepped transmission line, with a different value of impedance for each line segment corresponding to each tubelet. In Figure 1, below, an abridged diagram of the electrical circuit model of the vocal tract is shown.
Several of these circuit elements are connected at nodes at the ends of the elements. Since impedance from inductance and resistance is directly proportional to the value of the inductance or resistance, while impedance from capacitance and shunt conductance is inversely proportional, the inductors and resistors are shown in series while the capacitor and shunt conductor are in parallel to account for this difference.
Although more advanced aerodynamic simulations and computational fluid dynamics (CFD) tools have been introduced to the field of human phonation studies in recent years, the proposed transmission line circuit analogy still possesses great value to this field. CFD often requires intensive calculations which take significant time and computing resources for each simulation, which makes it potentially impractical for analyzing many different vocal tract, straw, and frequency configurations. In contrast, a successful transmission line model provides an opportunity to significantly simplify the calculations by creating a Thevenin equivalent circuit for each phoneme. Additionally, while CFD can provide more detailed information about the aerodynamic interaction between the vocal tract wall and the air inside the vocal tract, this information, such as vorticity, turbulence, and nasal interaction, is not necessarily relevant to the study of impedance, which is purely concerned with the ratio of pressure to volume velocity at the glottis. Therefore, the simplicity of the transmission line method enables a direct focus on the desired quantity of impedance.
The objective of this study is to test the validity of the transmission line circuit analogy as a new method of calculating impedance, to use this method to find the optimal configuration of vocal tract shape and straw geometry for straw phonation, and to determine whether maximal impedance corresponds to maximal power transfer.
2. Materials and Methods
To compare the validity of the electrical circuit method for impedance computation to the existing chain matrix method, data for 18 vocal tract shapes were collected from previous research performed by Story et al. [16]. Each of these shapes discretized the vocal tract into tubelets of equal length (0.396825 cm), so the number of tubelets varied between shapes, ranging from 40 to 46. To prevent undefined results from the calculations, tubelets listed with “0.00” cross-sectional area were increased to 0.01 cm². Since the largest tubelet area was about 9 cm², an increase of 0.01 cm² for the tubelets with no cross-sectional area is small in magnitude and should not produce biased results. No other measurements from the glottal area data were altered. The formulae provided in Story et al. are copied below:
where
Hz,
Hz,
rad/s,
rad/s. For the electrical circuit model, formulae for inductance, capacitance, resistance, and shunt conductance of the tubelet equivalent circuits were taken from Flanagan (1972) [17] and are listed below.
where $\omega$ is the angular frequency in radians per second, defined as $\omega = 2\pi f$ for a frequency $f$ in Hz; $A$ is the cross-sectional area of the tubelet; and $S$ is the circumference of the tubelet, which can be calculated from the geometric relation to the area such that $S = 2\sqrt{\pi A}$. It is important to note that there is no term for the length of the tubelet in any of the equations, which will be important for calculating impedance further on. Definitions for the other variables and their standard values used in these calculations are listed in Table 2. Since this model includes values for resistance and shunt conductance, it is lossy, as opposed to some other models for artificial speech synthesis [18]. Moreover, the Flanagan model was also chosen for its simplicity, since it calculates the total amount of each electrical value for each tubelet instead of using multiple electrical components [19,20,21].
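A minimal Python sketch of the per-unit-length electrical elements of one tubelet is given here. It assumes the forms commonly attributed to Flanagan for a hard-walled tube (inductance $\rho/A$, capacitance $A/\rho c^2$, a viscous resistance, and a heat-conduction shunt conductance); both these expressions and the constants are assumptions that should be checked against the source and Table 2 before use.

```python
import math

# Assumed physical constants in CGS units (stand-ins for Table 2).
RHO = 0.00114        # air density, g/cm^3
C_SOUND = 35000.0    # speed of sound, cm/s
MU = 1.86e-4         # viscosity coefficient, dyne*s/cm^2
LAMBDA = 0.055e-3    # heat conduction coefficient, cal/(cm*s*degC)
C_P = 0.24           # specific heat at constant pressure, cal/(g*degC)
ETA = 1.4            # adiabatic constant


def tubelet_elements(area_cm2, freq_hz):
    """Return per-unit-length (L, C, R, G) for a hard-walled tubelet.

    The expressions are assumed textbook forms, not taken from this paper.
    """
    omega = 2.0 * math.pi * freq_hz
    circumference = 2.0 * math.sqrt(math.pi * area_cm2)   # S = 2*sqrt(pi*A)
    inductance = RHO / area_cm2
    capacitance = area_cm2 / (RHO * C_SOUND ** 2)
    resistance = (circumference / area_cm2 ** 2) * math.sqrt(omega * RHO * MU / 2.0)
    conductance = (circumference * (ETA - 1.0) / (RHO * C_SOUND ** 2)
                   * math.sqrt(LAMBDA * omega / (2.0 * C_P * RHO)))
    return inductance, capacitance, resistance, conductance


# Hypothetical tubelet: 1 cm^2 cross-sectional area at 100 Hz.
L_t, C_t, R_t, G_t = tubelet_elements(1.0, 100.0)
print(L_t, C_t, R_t, G_t)
```

Under these assumed forms, the product of inductance and capacitance equals $1/c^2$ regardless of area, and both loss terms grow with the square root of frequency.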
Similar to Story et al., impedance can be calculated from a series of 2 × 2 matrices where the elements of each matrix are
where $Z_0$ is the characteristic impedance, $Y_0$ is the characteristic admittance, and $\gamma$ is the propagation constant (note that this $\gamma$ is different from the variable used in the Story et al. equations) of that tubelet such that [22]
The equations above highlight the novelty of the transmission line model. While CFD can only find pressure and volume velocity as real numbers which actually occur in the vocal tract, the transmission line yields complex values which indicate not only the realized aerodynamic values but also the stored energy to be expended later. Moreover, the propagation constant, a complex number of the form $\gamma = \alpha + j\beta$, where $\alpha$ is the attenuation constant and $\beta$ is the phase constant, provides further information about the process of phonation which CFD cannot. The attenuation constant describes the reduction of the amplitude of the wave as it travels down the line, while the phase constant describes the change in phase per unit length of the line.
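For readers who want to experiment, the sketch below computes the propagation constant and characteristic impedance from per-unit-length line elements using the standard transmission line relations $\gamma = \sqrt{(R + j\omega L)(G + j\omega C)}$ and $Z_0 = \sqrt{(R + j\omega L)/(G + j\omega C)}$. These are textbook forms assumed here for illustration, and the element values are hypothetical.

```python
import cmath
import math


def line_parameters(inductance, capacitance, resistance, conductance, freq_hz):
    """Return (gamma, z0, y0) for a uniform line from per-unit-length elements."""
    omega = 2.0 * math.pi * freq_hz
    series_z = complex(resistance, omega * inductance)    # R + j*omega*L
    shunt_y = complex(conductance, omega * capacitance)   # G + j*omega*C
    gamma = cmath.sqrt(series_z * shunt_y)                # alpha + j*beta
    z0 = cmath.sqrt(series_z / shunt_y)
    return gamma, z0, 1.0 / z0


# Hypothetical per-unit-length values in CGS acoustic units.
gamma, z0, y0 = line_parameters(1.14e-3, 7.16e-7, 1.0e-2, 1.0e-7, 200.0)
alpha, beta = gamma.real, gamma.imag
print(f"alpha = {alpha:.3e} per cm, beta = {beta:.3e} rad/cm, |Z0| = {abs(z0):.2f}")
```

Setting the resistance and conductance to zero recovers the lossless case, in which the attenuation constant vanishes and the characteristic impedance is purely real.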
As with Story et al., the electrical circuit matrices are left-multiplied to effectively iterate through the vocal tract backwards, from the mouth (or straw) opening to the glottis. The effect of the transmission line on the impedance of each tubelet in the direction of the wave propagation is given by
where
$l$ is the length along the transmission line element. Since each tubelet has the same length and the tubelets are connected in series, so that the input impedance of one tubelet is the load impedance of the next tubelet closer to the glottis, the input impedance of tubelet element
$n$ is
Radiation impedance is the impedance created by the atmosphere as sound exits the mouth (or straw). In this way, radiation impedance acts as the load to which the sound wave is transmitted. Most research, including Story et al. and Flanagan, agrees that radiation impedance is best modeled as a vibrating piston in an infinite baffle. The vibrating piston produces a sinusoidal pressure wave, like the oscillation of the vocal folds, which is reflected off the baffle. However, since the walls of the baffle are infinitely far away from the emission of the sound, the reflection vanishes, and the equation reduces to
where $R$ and $L$ are the effective resistance and inductance of the equivalent circuit for radiation impedance, respectively. Since the end of the acoustic transmission line is open, there is no compression of the air or loss due to heat conduction with the tubelet walls, and as such there is no term for capacitance or shunt conductance. For more information on the formulae for calculating radiation impedance, the reader is directed to the paper from Story et al.
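A hedged Python sketch of such a radiation load follows. It assumes the commonly used piston-in-baffle approximation in which the effective resistance and inductance act in parallel, with normalized values $128/(9\pi^2)$ and $8a/(3\pi c)$ for an opening of radius $a$, scaled by $\rho c/A$. These expressions, the constants, and the function names are assumptions for illustration and should be checked against Story et al. and Flanagan.

```python
import math

RHO = 0.00114       # assumed air density, g/cm^3
C_SOUND = 35000.0   # assumed speed of sound, cm/s


def radiation_elements(area_cm2):
    """Return assumed effective (R, L) of the radiation load for an opening."""
    radius = math.sqrt(area_cm2 / math.pi)
    scale = RHO * C_SOUND / area_cm2
    resistance = scale * 128.0 / (9.0 * math.pi ** 2)
    inductance = scale * 8.0 * radius / (3.0 * math.pi * C_SOUND)
    return resistance, inductance


def radiation_impedance(area_cm2, freq_hz):
    """Parallel combination of R and j*omega*L at the mouth or straw opening."""
    omega = 2.0 * math.pi * freq_hz
    resistance, inductance = radiation_elements(area_cm2)
    return (1j * omega * resistance * inductance) / (resistance + 1j * omega * inductance)


# Hypothetical 1 cm^2 straw opening at 200 Hz.
z_rad = radiation_impedance(1.0, 200.0)
print(z_rad.real, z_rad.imag)
```

In this parallel form the load is almost purely inertive at low frequency and approaches the pure resistance at high frequency.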
Importantly, radiation impedance is the ratio of the pressure to the volume velocity at the mouth or straw opening, which depends only on the frequency and geometry of the opening. By substituting this ratio into the final equation for the input impedance, an expression can be found relating the input impedance to the radiation impedance and the product of tubelet matrices. Therefore, the input impedance does not depend on the initial conditions of subglottal pressure or volume velocity. As a result, the use of the chain matrix and transmission line models requires significantly fewer simulations and calculations than comparable CFD.
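The backward recursion from the load to the glottis can be sketched as follows. The sketch assumes the standard transmission line relation $Z_{in} = Z_0 (Z_L + Z_0 \tanh \gamma l)/(Z_0 + Z_L \tanh \gamma l)$ for a section of length $l$ terminated by a load $Z_L$; the section values and function names are hypothetical.

```python
import cmath


def input_impedance(z0, gamma, length, z_load):
    """Input impedance of one uniform line section terminated by z_load."""
    t = cmath.tanh(gamma * length)
    return z0 * (z_load + z0 * t) / (z0 + z_load * t)


def stepped_line_input_impedance(sections, z_load):
    """Walk from the load back to the source.

    `sections` is ordered from glottis to lips; each entry is a tuple
    (z0, gamma, length) for one tubelet.
    """
    z = z_load
    for z0, gamma, length in reversed(sections):
        z = input_impedance(z0, gamma, length, z)
    return z


# Hypothetical two-section line with a hypothetical radiation load.
sections = [(80.0 + 1.0j, 0.002 + 0.036j, 0.396825),
            (20.0 + 0.5j, 0.001 + 0.036j, 0.396825)]
z_in = stepped_line_input_impedance(sections, 0.5 + 2.0j)
print(abs(z_in))
```

Only the load and the section parameters enter the calculation, which reflects the point made above: the input impedance is independent of the subglottal pressure and volume velocity.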
Finally, the average power dissipated in a load in a reactive circuit is
with $V$ being the voltage at the source, $R$ being resistance, and $X$ being reactance. Since impedance is defined as the quotient of voltage over current, the current at the source can be normalized to 1 so that the equation for average power dissipated becomes
Unless otherwise noted, impedance is measured on a log (base 10) scale, with data collected between 0 and 1000 Hz, sampled in 10 Hz increments. Since the purpose of this study was to investigate and compare the effects of phonation with and without a constricted vocal tract extension, straw dimensions were limited to lengths of 15 and 30 cm and a cross-sectional area of 1 cm², as the exact value of impedance is not necessarily of importance, but rather the change produced by the straw. To ensure that the straw configurations have clinical significance, the length of the straw was capped at 30 cm (about 1 ft), as opposed to Story et al., who tested straws up to 100 cm (1 m), because lengths beyond that become cumbersome to use. The cross-sectional area of 1 cm² was chosen as a narrow, yet still feasible, tube size compared to the normal vocal tract cross-sectional areas, to show the effects of constricting the area in addition to the length extension.
3. Results
The first goal of this paper was to determine whether the electrical circuit analogy of the vocal tract as a transmission line is a viable alternative to the existing standard model from Story. While the two models are essentially equivalent for vocal tracts solely described as hard-walled straws, they diverge for realistic soft-walled vocal tracts. Example graphs of impedance, resistance, and reactance are given for both models based on a 17.5 cm long, 1 cm² area straw in Figure 2.
Several notable features of impedance are apparent from these graphs. First and foremost, both impedance calculation methods generate remarkably similar plots for each of the values considered, so much so that the overlapping graphs appear to show only one curve. The plots only appear to separate at local extrema, both minima and maxima. The relative values at each of the extrema are not necessarily of importance because of the different units used between the two calculation methods; rather, the shape of the general plot and the location of the local extrema are better indicators of the agreement between the two methods.
The peak value for impedance occurs at resonance, and more specifically at the fundamental frequency F0. This frequency also coincides with the maximum resistance and, somewhat counterintuitively, with the point where the reactance curve crosses the horizontal axis. As explained by Story et al., this is plausible because resonance occurs when there is a sharp increase in the amount of sound transmitted, indicating an ease in phonation. While the energy being used for phonation can be in the form of either resistance or reactance, reactance is the storage of energy, meaning that there is less energy to be transmitted to the atmosphere. Therefore, it is quite reasonable to expect that resonance occurs at this frequency.
However, the models produce disparate results when computing the input impedance of actual vocal tract shapes. Table 3 lists the maximum impedance values along with the frequency at which each of these impedance values occurred for all 18 vocal tract shapes, calculated by both models, listed in order of decreasing value of maximum impedance calculated by the chain matrix method.
The discrepancies between the two models are expected, however. From the outset, Flanagan indicates that the values for electric parameters were derived for a hard-walled tube with losses proportional to $I^2R$ and $E^2G$. Thus, the losses due to the elasticity of the vocal tract wall itself are unaccounted for by Flanagan.
As such, henceforth, only the chain matrix method will be used to determine the best configuration of phoneme and straw. Since increased impedance is generally considered beneficial, the four vocal tract shapes with the highest maximum impedance as listed in Table 3 (/t/, /i/, /u/, and /n/, respectively) are plotted in Figure 3. Each phoneme impedance curve is calculated with no straw, a 15 cm straw, and a 30 cm straw as described above in the methods. Only the range of frequencies between 0 and 300 Hz is considered here because the most important information to gather from the impedance curve relates to the fundamental frequency, which is found in this range for each of the phonemes. Note that the maximum impedance value listed in Table 3 may not be achieved in this range of frequencies.
The plots in Figure 3 each demonstrate that adding a straw to the phonation of a given phoneme alters the impedance levels and the location of the fundamental frequency F0. As the length of the straw is increased from 0 cm to 15 cm to 30 cm, F0 decreases monotonically, although the size of the change does not appear to be related to the length of the straw. However, the change in the impedance value at F0 is not the same for all of the phonemes considered. Beyond F0, the highly nonlinear nature of phonation and of the aerodynamic interaction with the vocal tract wall appears to overwhelm any effect that the introduction of the straw may have. Since, as discussed below, the fundamental frequency is of greater importance than the other frequencies, this may not be a worthwhile argument against the effectiveness of straw phonation.
Furthermore, another goal of this paper is to gain more knowledge of acoustic power and its relation to impedance, which was calculated from the electrical equation for power dissipated. An example graph of power is shown in Figure 4, with the same tubelet configuration of a 17.5 cm long, 1 cm² area straw as in Figure 2.
It is clear that spikes in the power graph correspond to the same frequencies at which local extrema of the impedance graph exist, both minima and maxima. The case of the power peak near the impedance peak makes sense: because the numerator of the power equation is of cubic order, while the denominator is only quadratic, maximizing resistance and minimizing reactance will overall maximize power. This is precisely the case where impedance is at a maximum, as described already. However, the peak near the minimum of impedance results from the properties of radiation impedance. While radiation resistance is not directly correlated to the frequency (as there is the same power of $\omega$ in both the numerator and denominator), radiation reactance changes inversely with $\omega$, so resistance is relatively unchanged while reactance goes to zero.
4. Discussion
This study has demonstrated that the electrical circuit model of the vocal tract is not yet fully equipped to model the complex dynamics of human phonation. However, there is great value in developing a workable electrically based model, so future research should focus on the required alterations to the Flanagan model. Some suggestions are described in detail below.
After the derivation of the electrical formulae for smooth, hard-walled tubes, Flanagan estimates the effect of energy lost due to cavity wall vibration of a soft vocal tract wall with non-infinite surface impedance. However, this discussion of the cavity wall admittance only changes the calculation of the imaginary portion of the propagation constant. As defined in Equation (4), the propagation constant is a complex value, with the real portion being the attenuation constant, which describes how the amplitude of the wave is damped as it travels along the line, and the imaginary portion being the phase constant, which describes the change in phase per unit length of transmission line. Since cavity wall vibration provides another path for energy to be dissipated and stored, such vibration should have an effect on both parts of the propagation constant. For a better understanding of the interaction between phonation and vocal tract wall expansion, an analysis based on Hooke’s Law may be fruitful.
The Young’s modulus of a material describes the amount of stress required to cause a certain amount of strain, or elongation, of that material. On a stress–strain curve, the Young’s modulus represents the slope of the linear portion, where there is only elastic deformation. For vocal tract tissue, the Young’s modulus, also called the modulus of elasticity, is on the order of
–
Pascals [23,24]. Essentially, the vocal tract tissue acts as a spring in the circumferential direction which expands and contracts based on the dynamics of the spring force provided by the tissue elasticity in opposition to the air pressure in the tract [25]. Additionally, this spring-like motion is damped due to the internal friction between the layers of tissue in the vocal tract wall. While this approach may add some realism to the existing LCRG transmission line model, an alternative electrical model could benefit from the use of transformers.
A transformer is an electrical device which consists of two sets of wires wrapped around a shared metal core. The alternating current in one wire, called the primary, causes a changing magnetic flux in the core which, according to Faraday’s Law, causes an induced current in the secondary loop. The ratio of the number of turns in each loop is equal to the ratio of voltages in each wire. Therefore, the transformer could help account for the discontinuous nature of the pressure and volume velocity along the vocal tract after it is discretized into tubelets.
Beyond the changes required to account for the soft-walled vocal tract, other changes could be valuable to extend the model beyond a healthy vocal tract for straw phonation. Three worthwhile cases are presented. First, continuous positive airway pressure (CPAP) has been studied before as an alternative to straw phonation, but recent results have been inconclusive [26,27]. The introduced oppositional airflow requires greater subglottal pressure to accelerate the mass of air in the vocal tract, changing the relative inductance. Second, inflammation of the larynx is a common vocal disorder, and simulating the effects of this change to the vocal tract wall may aid in providing treatment. Anatomical and biological changes to the vocal tract wall affect the viscous friction and heat conduction with the vocal tract wall, and thus resistance and shunt conductance, respectively. Third and finally, water resistance therapy is similar in intent to straw phonation by providing extra resistance to the patient, along with introducing nonlinear feedback to massage the vocal folds [28,29,30]. In addition to extending the vocal tract with a straw, the end of the straw is submerged in water, requiring the speaker to push air through the water to produce bubbles. This submerged portion would have an altered inductance due to the mass of the water being moved, as well as an altered capacitance due to the incompressibility of the water. More research into electrical analogs for these effects could be very valuable for improving patient outcomes.
Besides the vocal tract, the larger scheme of sound production and reception could be incorporated into the transmission line model. The human ear transmits auditory information from the tympanic membrane to the cochlea where the signal is converted to electrical nerve impulses [31], so the ear can also be modeled as a transmission line [32,33]. By matching the impedance of the source and the load, the entire phonation process could be optimized to ensure that the vocal quality produced from speaking can be correctly received.
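The impedance matching idea can be illustrated with the standard circuit result that a source with internal impedance $Z_s$ delivers maximum average power to a load equal to the complex conjugate of $Z_s$. The sketch below assumes an RMS source amplitude and uses hypothetical impedance values; it illustrates the electrical principle only and is not a model of the ear or vocal tract.

```python
def load_power(v_source_rms, z_source, z_load):
    """Average power dissipated in z_load when driven through z_source."""
    current = v_source_rms / (z_source + z_load)
    return abs(current) ** 2 * z_load.real


# Hypothetical source impedance and three candidate loads.
z_source = complex(3.0, 4.0)
candidates = {
    "conjugate match": z_source.conjugate(),
    "identical impedance": z_source,
    "equal magnitude, resistive": complex(5.0, 0.0),
}
for name, z_load in candidates.items():
    print(f"{name}: {load_power(1.0, z_source, z_load):.4f}")
```

With the conjugate load, the reactances cancel and the delivered power reduces to the source voltage squared divided by four times the source resistance.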
Previous research by Titze [34,35] has shown that phonation threshold pressure (PTP) is proportional to F0. Thus, a valuable mechanism by which straw phonation could ease phonation is to reduce the fundamental frequency, meaning that phonemes which already have a high impedance at F0 possibly present the best opportunity for straw phonation. If straws can reduce F0 for these sounds, then PTP could be significantly lowered while still maintaining a high impedance level useful for other theories of straw phonation.
Alternatively, an explanation for the benefits of straw phonation is the ability for patients to more easily achieve phonation and thus gain the sensory information of resonance with less exertion. Phonation involves complex neuro-muscular interactions which reinforce each other based on the sensory experience of producing sound. Since relatively little research has been done on the lasting effect of straw phonation after the patient stops using the straw [36], an advantageous configuration for straw phonation could maximize impedance while keeping the fundamental frequency of resonance as close as possible to that without the straw. In this way, the acoustic-mechanical interaction of the vocal tract and glottis has maximum potential to reduce PTP while the patient is able to experience the sensory information of phonating at that specific frequency, as would be required without a straw. Given that this theory contradicts the work by Titze about the relationship between F0 and PTP, more research is needed to explain the underlying mechanism of straw phonation.
Moreover, future research in multi-step straw phonation, where the straw is slowly reduced in size until it is entirely removed, may be worthwhile in order to train the participant to phonate effectively without the straw. This is in line with the ultimate goal of straw phonation, which is to increase vocal quality and reduce the phonation effort of speech after the therapy.