In order for you to understand Voice over IP (VoIP), it would be a great idea if you understood the TCP/IP protocol suite. For those of you who didn't read the previous lesson, I humbly recommend that you consider clicking on TCP/IP Essentials. Heck, even if you were here, it wouldn't hurt to review it before we get started.
Now that we're all together again, let's examine what is perhaps the wackiest idea yet for voice networking. In some ways it's not as wacky as Voice over Frame Relay (VoFR), which we covered in a previous lesson, but it's wacky, all right.
Let's Review Some Basics
You'll recall that the TCP/IP protocol suite was conceived and developed as a means of gluing together disparate data networks, independent of host computers, hardware and operating systems, transmission media, and data link technologies. TCP/IP was originally developed for the ARPANET, a network that linked together government agencies and institutions of higher learning with a small group of supercomputers used for various advanced research and development projects.
It was a time-share application that involved interactive data communications between asynchronous terminal devices and host computers. Since connectivity based on either circuit switching or dedicated leased lines was way too inefficient and expensive, TCP/IP took a packet-switched approach.
As we discussed in a previous lesson, Circuits, Packets, Frames and Cells, packet networks are highly shared data networks that always involve some degree of variability and unpredictability in terms of levels of latency (i.e., delay), jitter (i.e., variability in latency) and loss. Some applications can tolerate considerable levels of such problems, since there is time to adjust and recover, perhaps through retransmission.
E-mail is a good example, as is a file transfer, perhaps associated with a database backup. Some applications don't tolerate much, if any, of this sort of thing. Realtime voice and video are good examples. It appears to be quite clear that voice over an IP-based packet network doesn't make sense. So let's call it quits for the day. I'll see you at the next class.
Whoa! Not so fast! VoIP can be made to work, and to work quite well.
Voice The Conventional Way
Before we get into the specifics of VoIP, let's review the basics of voice communications as it is handled in the conventional PSTN (Public Switched Telephone Network). You will recall that voice is analog in its native form and that the PSTN also was entirely analog for the first 75 years or so.
In networks worldwide, the analog PSTN provided for each voice conversation to be carried in a 4-kHz channel. (Note: Hz is an abbreviation for Hertz, a unit of frequency equal to one cycle, or one complete waveform, per second. In a voice application, the signal starts out as an audio compression wave, which then is converted into an electrical wave. All electromagnetic energy travels in waveform.)
In fundamental terms, that means a channel runs between 0 kHz and 4 kHz. In a multichannel analog carrier system, one channel might run at 0-4 kHz, the next at 4-8 kHz, the next at 8-12 kHz, and so on. Therefore, each voice-grade channel supports a range of frequencies that is 4 kHz wide. That's not enough for perfect voice transmission (we are capable of creating audio well above 4 kHz), but it's good enough.
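The channel stacking described above can be sketched in a few lines. This is only an illustration of the simplified plan in the text, in which channels sit side by side starting at 0 Hz; the function name `channel_band` and the zero-based channel numbering are assumptions made for the example, not part of any carrier standard.

```python
CHANNEL_WIDTH_HZ = 4_000  # width of one voice-grade channel, per the text


def channel_band(index: int) -> tuple[int, int]:
    """Return the (low, high) band edges in Hz for a zero-based channel index."""
    low = index * CHANNEL_WIDTH_HZ
    return low, low + CHANNEL_WIDTH_HZ


# Print the first three channels of the simplified plan.
for i in range(3):
    low, high = channel_band(i)
    print(f"channel {i}: {low // 1000}-{high // 1000} kHz")
```

Each channel simply starts where the previous one ends, so every conversation gets its own 4-kHz slice of the carrier.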
Further, each channel supports a range of signal amplitude (i.e., signal strength) that relates to a volume level. The amplitude level also is limited, so your loudest screams can't quite be heard over the network, but that's probably just as well. Again, it's not enough for perfect voice transmission, but it's good enough. It's known as toll quality voice.
In the years after WWII (World War II, for you youngsters), the networks began the transition from analog to digital technology, with the first digital T-carrier systems entering service in the early 1960s. Digital offers a lot of advantages, including greater bandwidth, better error performance, and enhanced management and control.
Virtually all contemporary switches of all types are digital in nature, and so is a lot of terminal equipment. Most transmission facilities also are digital, with the notable exception of most copper local loops serving residential and small business applications. That makes the contemporary WAN virtually 100% digital, from edge-to-edge, at least in developed countries.
To support voice in its native analog form over a digital network, the analog signal has to be coded (i.e., converted) into a digital format at some point after leaving your lips and prior to entering the WAN. On the receiving end, the digital signal has to be decoded (i.e., reconverted) back into an analog format in order to be intelligible to the human ear.
Those conversion processes are accomplished by a matching pair of codecs (coder/decoders), with the traditional method being PCM (Pulse Code Modulation), standardized by the ITU-T as G.711. PCM is based on the Nyquist Theorem, developed by Harry Nyquist of Bell Telephone Laboratories in 1928.
The theorem (paraphrased) states that, in order to convert analog voice to a digital format, send it over a digital circuit, and reproduce high-quality analog voice at the receiving end, one must sample the amplitude of the analog waveform at a rate of at least twice the highest frequency on the line.
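To see why the "twice the highest frequency" rule matters, here is a minimal sketch, assuming the 4-kHz voice channel described earlier and therefore a sampling rate of 8,000 samples per second. It samples two pure cosine tones, one at 3 kHz (inside the channel) and one at 5 kHz (outside it). The helper name `sample_tone` and the choice of test tones are assumptions made purely for illustration.

```python
import math

SAMPLE_RATE_HZ = 8_000  # twice the 4 kHz channel width described in the text


def sample_tone(freq_hz: float, count: int) -> list[float]:
    """Sample the amplitude of a cosine tone at SAMPLE_RATE_HZ."""
    return [
        math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(count)
    ]


in_band = sample_tone(3_000, 16)   # below the 4 kHz limit
too_high = sample_tone(5_000, 16)  # above the 4 kHz limit

# Compare the two sample sequences, allowing for floating-point rounding.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(in_band, too_high)
)
print(indistinguishable)
```

At this sampling rate the 5-kHz tone yields the same samples as the 3-kHz tone, so the receiver has no way to tell them apart. That is why anything above half the sampling rate has to be filtered out before sampling.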
If one samples at twice the highest frequency on the line, and that highest frequency is 4,000 Hz, one samples, therefore, at a rate of 4,000 x 2 = 8,000 times a second. (It's necessary to sample only the amplitude. The frequency will take care of itself at that rate.) Each sample is coded as eight bits, so if you do the math, you see that 8,000 samples per second x 8 bits per sample = 64,000 bits per second, or 64 Kbps. That's a voice-grade digital channel.
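If you'd rather let a computer do the math, the arithmetic above can be written out as a small sketch. The function name `pcm_bit_rate` is an assumption for the example; the numbers come straight from the text.

```python
def pcm_bit_rate(highest_freq_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a channel sampled at twice its highest frequency."""
    sample_rate = 2 * highest_freq_hz  # samples per second
    return sample_rate * bits_per_sample


rate = pcm_bit_rate(highest_freq_hz=4_000, bits_per_sample=8)
print(rate)  # 64000 bits per second, i.e. 64 Kbps
```

Change either input and the channel rate changes with it, which is a handy way to see where the 64 Kbps figure comes from.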
PCM further specifies that the sampling process take place at precise intervals of 125 µs (microseconds, or millionths of a second), which is exactly 1/8000th of a second. Each sample is coded into an eight-bit digital value. The resulting eight-bit bytes are interleaved by multiplexers and sent across channelized digital circuits (e.g., T-carrier) to be directed and redirected by circuit switches. From there they are sent across circuits (e.g., SONET) that interconnect the switches in the network core, and they are ultimately decoded on the receiving end of the transmission.
The decoded signal, now in analog form once again, is only an approximation of the original analog signal, but it's thoroughly understandable to the human ear. It's not quite that simple, of course. Timing is critical. The network must be in a position to accept, switch, transport, and deliver every voice byte precisely every 125 µs. That means that latency (i.e., delay) must be minimal and jitter (i.e., variability in delay) must be virtually zero.
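The byte interleaving performed by the multiplexers can be sketched as follows. This is a simplified illustration, not a model of a real T-carrier frame: it takes one byte from each channel in turn and ignores the framing and signaling bits that real systems add. The function names `interleave` and `deinterleave` are assumptions for the example.

```python
def interleave(channels: list[bytes]) -> bytes:
    """Byte-interleave equal-length channels: one byte per channel per frame."""
    if len({len(channel) for channel in channels}) > 1:
        raise ValueError("all channels must carry the same number of bytes")
    return bytes(byte for frame in zip(*channels) for byte in frame)


def deinterleave(stream: bytes, channel_count: int) -> list[bytes]:
    """Recover each channel by taking every channel_count-th byte."""
    return [stream[i::channel_count] for i in range(channel_count)]


# Three conversations, each contributing three voice bytes.
line = interleave([b"AAA", b"BBB", b"CCC"])
print(line)                   # b'ABCABCABC'
print(deinterleave(line, 3))  # [b'AAA', b'BBB', b'CCC']
```

Because every channel gets exactly one byte in every frame, and a frame goes out every 125 µs, each conversation keeps its steady 8,000 bytes per second no matter what the other channels are doing.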
Notice that we didn't mention loss, because there is none. Those requirements translate into a network based on circuit switching and channelized T-carrier or E-carrier and/or SONET. Taken together, this approach ensures that, once the call is set up, the associated bandwidth is committed for the entire duration of the call, absolutely and without question.
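The latency and jitter requirements described above can be made concrete with a small sketch. The send and arrival times below are made-up numbers chosen purely for illustration, and the spread between the smallest and largest delay is used here as a crude stand-in for jitter rather than any standardized measure. The function name `delay_stats` is likewise an assumption.

```python
def delay_stats(sent_us: list[int], received_us: list[int]) -> tuple[int, int]:
    """Return (lowest delay, spread between lowest and highest delay) in µs."""
    delays = [received - sent for sent, received in zip(sent_us, received_us)]
    return min(delays), max(delays) - min(delays)


# One voice byte leaves every 125 µs.
sent = [0, 125, 250, 375]

# Hypothetical circuit-switched path: every byte is delayed by the same amount.
circuit_arrivals = [500, 625, 750, 875]

# Hypothetical packet path: the delay varies from byte to byte.
packet_arrivals = [500, 700, 760, 1100]

print(delay_stats(sent, circuit_arrivals))  # (500, 0)
print(delay_stats(sent, packet_arrivals))   # (500, 225)
```

Both paths have the same best-case delay, but only the circuit-switched one delivers every byte with zero variation, which is exactly what the conventional approach guarantees.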