
High Density IP Voice Conferencing with Mixed Narrowband and Wideband Channels

The leap from traditional voice conferencing to IP-enabled voice conferencing brings with it a number of technical challenges that need addressing. We’ll start by describing the state of the art in traditional conferencing algorithms. Next, we’ll discuss the challenges that must be overcome in taking the leap to IP conferencing. Then, we’ll piece together an IP conferencing system on a DSP chip. We’ll conclude by listing the capabilities of a few Texas Instruments DSP chips in terms of conferencing channels and features that each can support.

Traditional High-Density Conferencing

A conferencing algorithm adds the active conference input signals together to form a composite signal. Before sending the composite signal back to each conference party, that party’s transmission is removed from the composite signal to avoid the perception of echo.
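The summation just described is often called a "mix-minus". The following is a minimal sketch of the idea only, assuming every party delivers one equal-length frame of linear PCM samples per interval; the function name `mix_minus` is hypothetical and not taken from any particular product.

```python
def mix_minus(frames):
    """Return one output frame per party: the sum of all inputs minus that party's own.

    frames: list of equal-length lists of linear PCM samples, one list per party.
    """
    if not frames:
        return []
    n = len(frames[0])
    # Composite signal: sample-by-sample sum over all parties.
    total = [sum(frame[i] for frame in frames) for i in range(n)]
    # Remove each party's own contribution so they do not hear themselves.
    return [[total[i] - frame[i] for i in range(n)] for frame in frames]


frames = [[1, 2], [10, 20], [100, 200]]
outputs = mix_minus(frames)
# outputs[0] == [110, 220]: party 0 hears parties 1 and 2 only
```

Computing the composite once and subtracting per party keeps the cost linear in the number of parties, rather than building a separate sum for each listener.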

As the number of conference participants increases, we run into a few more issues. For example, each participant presumably has some level of background noise. The noise level may be low, as in an office environment; high, as for a person on a cell phone while driving; or anywhere in between. If we added all the input signals blindly, the noise would accumulate as the number of participants increased. Furthermore, when using fixed-point arithmetic, the summation of many signals, including speech signals, can cause overflow or clipping, a very undesirable condition.

A conference algorithm can use several techniques to combat these issues. For example, it can add only the signals of a few “dominant” speakers to the conference at any given time. This reduces the number of signals being added. Furthermore, noise suppression can be employed on all input channels, so that significant background noise that would otherwise bleed into the conference sum is reduced. Automatic Level Control can be employed to combat overflow and clipping as well as to compensate for the different amounts of loss seen in each party’s input signal.
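Two of these techniques, dominant-speaker selection and protection against fixed-point overflow, can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used in any specific product: speakers are ranked by raw frame energy (a real system would typically combine voice activity detection with smoothing over time), overflow is handled by simple saturation to 16 bits rather than by level control, and all names are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def frame_energy(frame):
    """Sum of squared samples; a crude loudness measure (assumption)."""
    return sum(s * s for s in frame)


def select_dominant(frames, max_speakers=3):
    """Return the indices of the max_speakers highest-energy frames."""
    ranked = sorted(range(len(frames)),
                    key=lambda i: frame_energy(frames[i]),
                    reverse=True)
    return sorted(ranked[:max_speakers])


def saturate(x):
    """Clamp to the 16-bit range instead of letting the sum wrap around."""
    return max(INT16_MIN, min(INT16_MAX, x))


def mix_dominant(frames, max_speakers=3):
    """Sum only the dominant speakers, then remove each party's own signal."""
    active = select_dominant(frames, max_speakers)
    n = len(frames[0])
    total = [sum(frames[k][i] for k in active) for i in range(n)]
    outputs = []
    for k, frame in enumerate(frames):
        # Only parties that were included in the sum need their signal removed.
        own = frame if k in active else [0] * n
        outputs.append([saturate(total[i] - own[i]) for i in range(n)])
    return active, outputs


frames = [[30000, 30000], [20000, -20000], [5, 5], [100, 100]]
active, outputs = mix_dominant(frames, max_speakers=2)
# active == [0, 1]; the listeners on channels 2 and 3 receive a saturated sum
```

Saturation avoids wrap-around but still distorts the signal, which is why the text pairs summation with level control rather than relying on clamping alone.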

IP Conferencing – Challenges

A traditional voice conferencing system bridges together multiple traditional telephone channels, which occupy 3.3 kHz of audio bandwidth and are sampled at 8 kHz. VoIP channels can be either narrowband or wideband. Wideband channels have an audio bandwidth of 7 kHz and a sampling rate of 16 kHz. Both narrowband and wideband channels can be sent in either uncompressed or compressed format over a VoIP link.

Challenge #1: Compression

Table 1 compares some of the more commonly used narrowband and wideband compression standards. It also includes the processor utilization of each algorithm on a Texas Instruments C64x DSP.


Table 1. Compression standards compared
Narrowband: G.711, G.729A
Wideband: G.722, G.722.1, G.722.2 (AMR-WB)

Challenge #2: Adding Wideband Channels

In describing the state of the art in narrowband conferencing algorithms, we mentioned a number of signal processing algorithms that are used. We can perform some of these algorithms, such as noise reduction and voice activity detection, on the individual channels at their native sampling rates. But before we sum a mixture of 8 kHz and 16 kHz sampled data streams, we need to convert them to a common sampling rate. The two logical choices are 8 kHz and 16 kHz. If we use 8 kHz, the 16 kHz channels must be passed through a 2:1 decimation filter. If we choose the 16 kHz sampling rate, the 8 kHz channels must be passed through a 1:2 interpolation filter.

The advantage of using an 8 kHz common sampling rate is the reduction in the signal processing load, because portions of the conferencing algorithm will operate on half the number of samples. The disadvantage is that we lose the audio quality benefit afforded by wideband audio channels. The pros and cons are reversed when using a 16 kHz common sampling rate.
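The 2:1 decimation and 1:2 interpolation steps can be sketched in a few lines. The filter here is an assumption chosen for brevity: a 31-tap Hamming-windowed sinc low-pass with its cutoff at one quarter of the 16 kHz rate (4 kHz). A production DSP implementation would more likely use an optimized polyphase or half-band design; the function names are hypothetical.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def fir(signal, taps):
    """Direct-form FIR filter with zero initial state; output length equals input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def decimate_2to1(wideband):
    """16 kHz -> 8 kHz: low-pass below 4 kHz, then keep every second sample."""
    return fir(wideband, lowpass_taps())[::2]


def interpolate_1to2(narrowband):
    """8 kHz -> 16 kHz: insert a zero after each sample, low-pass, scale by 2."""
    stuffed = []
    for s in narrowband:
        stuffed.extend((s, 0.0))
    return [2.0 * y for y in fir(stuffed, lowpass_taps())]


down = decimate_2to1([1.0] * 200)    # 100 samples at the lower rate
up = interpolate_1to2([1.0] * 100)   # 200 samples at the higher rate
```

Note that interpolation does not restore any content above 4 kHz; it only places the narrowband signal on the 16 kHz sample grid so that it can be summed with wideband channels.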

Challenge #3: VoIP Packet Loss

IP networks are not designed to carry real-time traffic. The end-to-end delay across a VoIP network varies from one packet to the next, and sometimes packets arrive too late to be decoded in real-time. Late packets are no better than lost packets when dealing with VoIP. Furthermore, packets arriving out of order must be resequenced.

Techniques to deal with these issues already exist in VoIP systems. A jitter buffer compensates for variations in packet delay. RTP sequence numbers allow the receiver to reorder out-of-sequence packets. Packet loss concealment attempts to smooth over lost packets by looking at the recent signal history and filling in the missing pieces.
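As an illustration of how a jitter buffer and sequence-number reordering fit together, here is a minimal sketch. It makes several simplifying assumptions that a real implementation would not: a fixed prefill depth instead of adaptive sizing, no handling of 16-bit sequence-number wraparound, and a `None` return both while prefilling and when a packet is missing (the point at which packet loss concealment would run). The class name `JitterBuffer` and its methods are hypothetical.

```python
import heapq


class JitterBuffer:
    def __init__(self, depth=3):
        self.depth = depth      # packets to accumulate before playout starts
        self.heap = []          # min-heap ordered by sequence number
        self.next_seq = None    # next sequence number due for playout

    def push(self, seq, payload):
        """Store an arriving packet; reject it if its playout time has passed."""
        if self.next_seq is not None and seq < self.next_seq:
            return False        # a late packet is no better than a lost one
        heapq.heappush(self.heap, (seq, payload))
        return True

    def pop(self):
        """Called once per frame interval; returns a payload or None."""
        if self.next_seq is None:
            if len(self.heap) < self.depth:
                return None     # still prefilling
            self.next_seq = self.heap[0][0]
        # Drop stale entries, such as duplicates of packets already played.
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)
        payload = None
        if self.heap and self.heap[0][0] == self.next_seq:
            payload = heapq.heappop(self.heap)[1]
        self.next_seq += 1      # advance even on loss so playout stays on schedule
        return payload


jb = JitterBuffer(depth=2)
jb.push(1, "a")
first = jb.pop()                # None: only one packet buffered so far
jb.push(3, "c")                 # arrives out of order
jb.push(2, "b")
played = [jb.pop() for _ in range(4)]
late = jb.push(4, "d")          # slot 4 has already been played out as a loss
```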

The reason we mention these issues is that in high-density conferencing systems, these effects are magnified. For example, if we have a 100-channel conference call and each channel has an average packet loss rate of 1%, it is likely (roughly 63% of the time, if losses are independent) that in any given frame one or more channels will experience a lost packet. Since all channels except the offending channel will hear the effects, it is much like having a very high packet loss rate on a single channel.
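The likelihood in the 100-channel example can be estimated directly. Assuming packet losses are independent across channels (real networks often show correlated, bursty loss, so this is only a rough model), the probability that at least one of N channels loses its packet in a given frame is 1 - (1 - p)^N. The function name below is hypothetical.

```python
def prob_any_loss(num_channels, loss_rate):
    """Probability that at least one channel loses a packet in a given frame,
    assuming independent losses with the same rate on every channel."""
    return 1.0 - (1.0 - loss_rate) ** num_channels


p100 = prob_any_loss(100, 0.01)   # about 0.634
p2 = prob_any_loss(2, 0.01)       # about 0.02 for a two-party call
```

Under this model, a loss rate that is barely noticeable on a two-party call affects well over half of all frames in a 100-party conference.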

Challenge #4: VoIP Echo Cancellation

Continuing the same line of reasoning with respect to echo cancellation, the situation gets worse. When echo exists on a VoIP channel, it is worse than on a TDM channel because the round-trip delay is longer due to the latency through the IP network. People are more sensitive to echo as the round-trip delay increases.

Using the previous example where there are 100 parties in a VoIP conference call, assume that one party has uncancelled echo on his/her line. When any of the other 99 people speaks, all 100 parties will hear an echo. In a typical two-party call, the solution is to hang up and dial again. In a 100-party conference call, you first need to identify the offending party by determining whose speech does not cause echo. That person must then hang up and dial back in, hopefully this time on an echo-free circuit.

Running into this problem is far more likely in a 100-party conference call than in a two-party call. It is therefore much more important to ensure that a VoIP-enabled echo canceller is used in an IP conferencing system.

Piecing Together a VoIP Conferencing System on a DSP Chip

Figure 1 (below) is a block diagram of a mixed narrowband/wideband VoIP conference system on a chip. The packet interface block handles RTP, jitter buffering, and both narrowband and wideband speech encode and decode functions. The packet echo canceller cancels echo that may be present on the opposite side of the packet network for narrowband channels only. It is assumed that wideband channels use four-wire interfaces at the far end and therefore have no hybrid echo.

Using a similar argument, there is a line echo canceller connected to the narrowband (8 kHz) TDM interface, but not to the wideband (16 kHz) TDM interface.

The sampling rate converters perform sampling rate conversion with appropriate filtering on narrowband and wideband signals which include both PCM and packet channels. Note that it might not be necessary to perform both conversions because the conferencing algorithm can run at either sampling rate. But if tone detection/generation is to be performed, it tends to be done at the narrowband sampling rate so additional rate conversion may still be necessary.

The elastic store is a buffering mechanism that compensates for possible varying frame sizes in the packet channels.
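One way to picture the elastic store is as a sample FIFO that accepts frames of whatever size each packet channel delivers and hands out fixed-size blocks to the conference module. The sketch below assumes that interpretation, with a 10 ms block (80 samples at 8 kHz) chosen purely for illustration; the class name `ElasticStore` and its methods are hypothetical.

```python
from collections import deque


class ElasticStore:
    def __init__(self):
        self.samples = deque()

    def write(self, frame):
        """Append a decoded frame of any length."""
        self.samples.extend(frame)

    def read(self, block_size):
        """Return exactly block_size samples, or None if not enough are buffered."""
        if len(self.samples) < block_size:
            return None
        return [self.samples.popleft() for _ in range(block_size)]


store = ElasticStore()
store.write(list(range(240)))                   # one 30 ms frame at 8 kHz
blocks = [store.read(80) for _ in range(4)]     # conference reads 10 ms blocks
# the first three reads succeed; the fourth returns None (underrun)
```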

Finally, the conference module performs the actual conferencing, including voice activity detection, noise suppression, dominant speaker identification, AGC, and conference summation.

Conferencing Channel Densities

Each conferencing application may have unique requirements, making a programmable DSP an ideal chip on which to host the conferencing solution. Some applications may not need the tone processing; others may use external echo cancellers. With a programmable DSP, features can be added or removed, so we use only as much of the DSP’s resources (MIPS and memory) as the required feature set needs. This allows us to use the appropriate (smallest, least expensive, lowest power, etc.) DSP. Stated differently, by removing the unneeded features, we can squeeze more conferences and more ports into a single DSP.

Table 2 lists a few sample conference configurations along with the achievable port density when running on different TI DSPs.

Summary

Achieving good voice quality in large conference calls is a challenge even when the problem is confined to the traditional narrowband, TDM-based world. When VoIP channels and wideband channels are added to the mix, it is imperative that the right algorithms are used and that they are integrated properly. Doing so efficiently on a programmable DSP is not only cost-effective, but can also get your product to market in a timely manner.

For a turnkey solution, please contact sales. Tel: 1-800-340-2066 x121

or email us.

Adaptive Digital Technologies, Inc.