by gg3 » Fri 16 Mar 2007, 10:06:51
MD, you're mistaken; it's a lot simpler than that.
I'm speaking here as a telephone systems engineer with over 20 years in the industry.
When you think the call center people are "speaking faster and with less clarity," and when you can't understand them, here is what is happening, and I can prove this beyond any reasonable doubt.
Your telephone call is being shipped overseas via a packet-switched network: voice over Internet Protocol, also known as VOIP or IP-telephony.
When implemented correctly, it's barely distinguishable from a regular circuit-switched telephone call on a landline.
When implemented poorly or to pinch pennies, the sound quality is about the same as a crappy cellphone connection... or worse, in subtle ways you are not ordinarily aware of but which affect the quality of the conversation.
The two variables here are the compression algorithm and the latency factor.
The compression algorithm does two things. One, it determines how frequently the voice signal is sampled in order to create values represented in digital form. Two, it determines the width of the "dynamic range," or the allowable variation in loudness.
The sampling frequency determines intelligibility. Lower sampling rates produce the impression of a rapid burbling undertone, which can be confused with speaking too fast to be understood. The visual analogy for this is a movie that was filmed and played back at a slow frame rate: it appears jerky, thus it appears as if the actors' gestures and movements are unnaturally rapid. On the other hand, a high sampling rate (by analogy, fast movie frame rate) creates a much smoother sound, and allows the natural rhythms of speech (or gestures on the film stage) to be heard (or seen).
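To put a rough number on the sampling rate: a sampled signal can only carry frequencies up to half its sampling rate (the Nyquist limit). The sketch below assumes two rates commonly used for telephone speech, 8,000 and 16,000 samples per second; those figures and the function name are illustrative assumptions, not something measured on any particular call center link.

```python
def highest_frequency_hz(sample_rate_hz):
    """Nyquist limit: the highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2.0

# Assumed example rates, in samples per second.
narrowband = highest_frequency_hz(8000)    # classic telephone-grade sampling
wideband = highest_frequency_hz(16000)     # "wideband" voice sampling

print(f"8 kHz sampling carries speech up to {narrowband:.0f} Hz")
print(f"16 kHz sampling carries speech up to {wideband:.0f} Hz")
```

Anything in the voice above that limit is simply not transmitted, which is one reason a lower sampling rate costs intelligibility.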
The dynamic range determines the degree of clarity of the person's voice when there is background noise (such as a room full of other people talking on the phone). You may have noticed, when speaking with someone who is calling from a cellphone, that background noises at their end are often loud enough to obscure the speech. For example, a car goes by and sounds like a louder truck, or a fire engine goes by and the siren is so loud it practically hurts. A crappy compression algorithm decreases the dynamic range, reducing the variations in sound volume, and thereby raising the level of background noise relative to the voice. Thus it appears as if the other person is not speaking clearly.
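A minimal sketch of what squeezing the dynamic range does to the gap between voice and background noise. It models a simple static compressor curve, not any specific codec; the threshold, ratio, and loudness levels are made-up illustrative values.

```python
def compress(level_db, threshold_db=-40.0, ratio=4.0):
    """Static compressor curve: above the threshold, loudness rises only 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# Hypothetical input levels in decibels (illustrative only).
speech_db = -10.0   # the agent's voice
noise_db = -30.0    # the room full of other agents

gap_before = speech_db - noise_db                       # voice stands 20 dB above the noise
gap_after = compress(speech_db) - compress(noise_db)    # voice stands only 5 dB above the noise

print(f"voice-to-noise gap before compression: {gap_before:.1f} dB")
print(f"voice-to-noise gap after compression:  {gap_after:.1f} dB")
```

With these assumed settings the voice goes from 20 dB above the room noise to 5 dB above it, which is the "background noise gets louder" effect described above.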
By analogy (caution, vulgar analogy), if you are recording a drummer in a music studio, and you compress the signal to the maximum limit (where everything comes out with equal volume), then when the drummer farts it will sound like a drum roll, and on the other hand, a legitimate drum roll may come out sounding like a fart.
The impact of both of the above factors includes the loss or masking of nonverbal speech cues that convey the other person's mood and attitude. Thus, a voice that is trying to convey a helpful attitude may come out sounding emotionally flat, as if the person speaking does not care about their job or the person they are speaking with.
Last but not least, latency. This is the delay that occurs over the transmission medium, based on the amount of time you want to allow for the equipment at each end to "reassemble" the digital speech packets. Remember, in a packet switched network, the packets may be taking different routes to their destination, and they may arrive in an order that is different from the order in which they were sent. Latency occurs when you wait to be sure you have the packets assembled in the order you need in order to convert them to speech.
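The reassembly step can be sketched as a small reordering buffer. This is a toy model under stated assumptions, not how any particular phone system implements it: packets are represented only by their sequence numbers, the function name and `depth` parameter are hypothetical, and the 20 ms of speech per packet is an assumed typical value.

```python
import heapq

def reorder(arrivals, depth):
    """Hold up to `depth` packets, always releasing the lowest sequence number first."""
    heap = []
    played = []
    for seq in arrivals:
        heapq.heappush(heap, seq)
        if len(heap) > depth:
            played.append(heapq.heappop(heap))
    while heap:                      # drain whatever is left at the end of the call
        played.append(heapq.heappop(heap))
    return played

arrivals = [1, 3, 2, 5, 4, 6]        # packets arriving out of order
no_buffer = reorder(arrivals, depth=0)
buffered = reorder(arrivals, depth=2)

PACKET_MS = 20                        # assumed speech per packet
added_delay_ms = 2 * PACKET_MS        # waiting for 2 packets costs about 40 ms

print(no_buffer)   # played in arrival order, so speech is scrambled
print(buffered)    # played in the order it was spoken
```

The trade-off is visible in the two calls: a deeper buffer puts the speech back in order, but every packet you hold adds its duration to the latency.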
A normal circuit-switched analog landline telephone conversation has a latency of about 5 to 10 milliseconds, which is about as low as reasonably practical, and is not audible. In other words, you speak to me and I hear your words 5 to 10 milliseconds later, i.e. about 1/200 to 1/100 of a second later. This is also about the same as the latency of a live conversation where two people are sitting between 5 and 10 feet apart.
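The comparison with two people sitting a few feet apart can be checked with a quick calculation. The sketch assumes sound travels at roughly 343 meters per second in room-temperature air; the constant and function names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 C
FEET_TO_M = 0.3048

def acoustic_delay_ms(distance_ft):
    """Time for sound to cross the given distance in air, in milliseconds."""
    return distance_ft * FEET_TO_M / SPEED_OF_SOUND_M_S * 1000.0

for feet in (5, 10):
    print(f"{feet} ft apart: about {acoustic_delay_ms(feet):.1f} ms")
```

Under that assumption the delay comes out between roughly 4 and 9 milliseconds, in line with the 5 to 10 millisecond landline figure.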
A reasonably decent IP-telephony connection might have a latency of about 50 milliseconds. Note that this is 5 to 10 times the latency of an analog landline or a high-quality IP-telephony path. This means that I hear your words about 1/20 of a second after you speak. This is still not too bad because most nonverbal rhythmic cues occur just slightly more slowly, so 50 msec gives you high enough resolution to detect them in conversation.
A crappy IP-telephony connection can have a latency anywhere from 100 to 300 milliseconds. What this means is, I hear your words from 1/10 to 1/3 of a second after you speak. However, the flow of normal conversation depends on nonverbal cues that include the pauses between each person speaking, and these pauses can be quite short, in the range of 50 to 150 milliseconds.
Thus, when latency is high, people talk over each other because they are deprived of the nonverbal cues of normal pauses. If you are not aware of what is happening, it will appear that the other person is constantly interrupting you or talking over your own words, and each person will become frustrated with the other person, thinking that the other person is being rude as hell.
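The talk-over effect can be put into rough numbers. The sketch below assumes the delay is the same in both directions, so the silence you hear before the other person's reply is their actual pause plus two trips across the line. The 100 ms pause and the three latency figures are taken from the ranges above; the names are hypothetical.

```python
def perceived_gap_ms(pause_ms, one_way_latency_ms):
    """Silence heard by the first speaker: the reply pause plus a round trip of delay."""
    return pause_ms + 2 * one_way_latency_ms

PAUSE_MS = 100   # a normal conversational pause, within the 50 to 150 ms range

connections = {
    "analog landline": 10,
    "decent IP-telephony": 50,
    "crappy IP-telephony": 300,
}

for name, latency_ms in connections.items():
    gap = perceived_gap_ms(PAUSE_MS, latency_ms)
    print(f"{name}: a {PAUSE_MS} ms pause is heard as {gap} ms of silence")
```

On the landline the pause is barely stretched. On the worst connection a 100 ms pause turns into 700 ms of dead air, long enough that the first speaker assumes nobody is answering and starts talking again just as the reply arrives.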
And that, my friends, is why you think "those people" are being arseholes.
It's not "those people."
It's the people who are pinching the pennies on their telephone systems and carrier circuits. And those people usually have no clue about the effect of their miserly ways because they do not know how the miserliness translates into the miserable sound quality that causes people to become frustrated, fed-up, stressed-out, and all the rest of it. Or if they do know, then they really are the arseholes in the picture.
---
Having said all that, I suppose I can engage in an attempt at tasteful advertising:-) My company does office PBX and telecommuter systems for businesses and nonprofit organizations, up to 500 telephones or up to 120 telecommuters on a single system. We do networked PBX of up to five such systems, for example for companies with multiple branches. We do complete communications utility design & implementation for sustainable communities and ecovillages. And when we do VOIP, we do it the right way. We are available in Northern California from Santa Cruz through Sonora, and Mendocino and Humboldt counties. If you're interested, send me a private message.
Last edited by gg3 on Fri 16 Mar 2007, 10:14:16, edited 1 time in total.