HD SAEC | Stereophonic AEC Software

HD SAEC: High Definition Stereo Acoustic Echo Cancellation Software

Full-band stereo acoustic echo cancellation under a wide dynamic range of audio levels.

Designed for next-gen products: Adaptive Digital’s HD SAEC™ is a full-band, high-definition (HD), multi-mic-capable, full-duplex stereo acoustic echo canceller (SAEC) that includes noise reduction (NR), anti-howling, adaptive filtering, nonlinear processing, and double-talk detection.

Stereo acoustic echo cancellation is considerably harder to achieve than mono echo cancellation because the two loudspeaker signals are strongly correlated, which leads to non-unique filter solutions and convergence problems.

Availability – Fullband Stereo HD AEC

Platforms

Arm® Devices – Armv7-M Cortex-M7 @ 48 kHz / Armv8-A / Armv9-A | Armv8-M Cortex-M33/M35, Armv8.1-M Cortex-M85

Intel Core i7 Mobile – 6600U – Skylake

ADT HD SAEC is available on the platforms listed above. Other configurations are available upon request.

Features List

True full-duplex operation under a wide dynamic range of audio levels, even when the microphone input signal is weak
Programmable sampling rate, supporting narrowband (8 kHz), wideband (16 kHz), super-wideband (32 kHz), and full-band (44.1, 48 kHz)
Improved adaptive nonlinear processor
Handles echo tails of 500 msec and longer, with true full-duplex cancellation
Spectrally representative comfort noise generator
Automatically adjusts for unknown bulk (buffering/audio driver) delay
Able to handle strong echo (speaker to microphone gains up to 20 dB)

Anti-Howling
Instantly adjusts to user-controlled speaker gain changes
Handles external user-controlled volume changes
Parameters are user configurable
Improved fast convergence and reconvergence
No divergence during double-talk
Integrated Automatic Gain Control (AGC)
Improves speech recognition performance in an echoic environment.
Integrated Next Gen Noise Reduction (NR)
Integrated Transmit Equalization
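
To put the tail-length and sampling-rate figures above in perspective, the short sketch below counts how many samples an echo tail spans at each supported rate. The function name is illustrative and not part of the HD SAEC API, and the sketch assumes the echo path is modelled with one coefficient per sample, which may not reflect how the product is implemented.

```python
def tail_length_in_samples(tail_ms: float, sample_rate_hz: int) -> int:
    """Number of samples spanned by an echo tail of tail_ms milliseconds."""
    return round(tail_ms * sample_rate_hz / 1000)


# Supported sampling rates taken from the features list.
for rate_hz in (8000, 16000, 32000, 44100, 48000):
    samples = tail_length_in_samples(500, rate_hz)
    print(f"{rate_hz} Hz: a 500 msec tail spans {samples} samples")
```

Under this assumption the amount of echo-path history to model grows linearly with both tail length and sampling rate, so full-band operation with long tails is the most demanding case.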

Specifications


ARM® Devices

Note: HD SAEC Cortex-M4 MIPS generated with 0 wait state FLASH.
Specifications measured on TI Tiva C series ARM Cortex-M4 based MCU.

SAEC ARM Cortex-M7

CPU Utilization & Memory Requirements
All memory usage is given in bytes.

Platform	Sampling Rate	Tail Length (msec)	MIPS* per Mic	Per Channel Memory
Cortex-M7	48 kHz	32	257	65k
		64	271	65k
		128	310	96k
		256	383	96k

* with Anti-howling

SAEC ARM Cortex-M33/M35 – Estimate

CPU Utilization & Memory Requirements
All memory usage is given in bytes.

Platform	Sampling Rate	Tail Length (msec)	MIPS* per Mic	Per Channel Memory
Cortex-M33/35	48 kHz	32	360	65k
		64	380	65k
		128	434	96k
		256	536	96k

* with Anti-howling

Intel x86

SAEC – Intel Core i7 Mobile, Skylake-U

CPU UTILIZATION
MCPS (millions of cycles per second)

Platform	Tail Length | Frame Size	Sampling Rate	Mono/Stereo	MCPS Per Mic
Intel Core i7-6600U @ 2.60 GHz	256 ms	16 kHz	mono	23
			stereo	31
Intel Core i7-6600U @ 2.60 GHz	256 ms	32 kHz	mono	45
			stereo	63
Intel Core i7-6600U @ 2.60 GHz	256 ms	48 kHz	mono	50
			stereo	85
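
As a rough guide to what the MCPS figures mean in practice, the sketch below converts them into a percentage of one core. It assumes the core runs at its nominal 2.60 GHz clock (no turbo or throttling) and that all of the work lands on a single core; the names are illustrative only.

```python
CLOCK_MHZ = 2600.0  # nominal 2.60 GHz clock from the table (assumed constant)

# MCPS per mic from the table above, keyed by (sampling rate in kHz, mode).
MCPS_PER_MIC = {
    (16, "mono"): 23, (16, "stereo"): 31,
    (32, "mono"): 45, (32, "stereo"): 63,
    (48, "mono"): 50, (48, "stereo"): 85,
}


def core_load_percent(mcps: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Share of one core's cycles consumed, as a percentage."""
    return 100.0 * mcps / clock_mhz


for (rate_khz, mode), mcps in MCPS_PER_MIC.items():
    load = core_load_percent(mcps)
    print(f"{rate_khz} kHz {mode}: {load:.2f}% of one core per mic")
```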

Stereophonic (Stereo) acoustic echo cancelling (SAEC)

Teleconferencing systems require the use of acoustic echo cancellers (AECs) to reduce echoes that result from coupling between the loudspeaker and microphone. In order to provide a more realistic conversational experience, two-channel audio is necessary. In the case of stereophonic audio transmission, the acoustic echo cancellation problem is more difficult to solve because of the necessity to uniquely identify two acoustic paths.

This disturbance, caused by echo, increases in severity with the propagation delay of the channel.

In applications such as teleconferencing and hands-free telephony, stereophonic systems offer a stronger sense of telepresence than monaural systems: listeners can localize conference participants, even in meetings where multiple parties are conversing at the same time.

A high-end stereo teleconferencing system provides a more “natural” listening experience between conferees than monophonic systems. The stereophonic (Stereo) acoustic echo canceller (SAEC) software suppresses the echo returned to the transmission room to enable undisturbed communication between the rooms. Participants can more easily discern who is talking at the other end by means of the spatial aspect of the audio output. In such hands-free systems, stereophonic acoustic echo cancellers are absolutely necessary for full-duplex communication.

Stereophonic acoustic echo cancellation is similar to monophonic (mono) AEC in that the echo to be cancelled is due to speaker-to-microphone acoustic coupling. In the SAEC case, each speaker acoustically couples to each microphone in the system. On the surface, the problem of cancelling multiple echoes seems relatively uncomplicated: a system containing N speakers and M microphones needs N × M acoustic echo cancellers (AECs), so the full-duplex stereophonic case requires four (4) AECs. The actual echo cancellation requirements are not so straightforward.
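
The N × M count can be checked by listing every speaker-to-microphone path. This is an illustrative sketch only; the speaker and microphone labels are made up for the example.

```python
from itertools import product


def echo_paths(speakers, microphones):
    """Every (speaker, microphone) pair is one acoustic path needing an AEC."""
    return list(product(speakers, microphones))


paths = echo_paths(["left speaker", "right speaker"], ["left mic", "right mic"])
for speaker, mic in paths:
    print(f"{speaker} -> {mic}")
print(f"AECs required: {len(paths)}")
```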

The challenge with SAEC is well documented. The audio from each speaker is highly correlated with (similar to) the audio from the other speakers in the system, since all speaker outputs are assumed to come from a common remote room. This high correlation means the adaptive filters have no unique solution to converge to. The individual adaptive filters can lock onto the same speaker signals, and the overall effect is slow convergence and high misalignment.
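
A toy example makes the non-uniqueness concrete. The sketch below is deliberately simplified and is not the HD SAEC algorithm: it assumes single-tap echo paths and two loudspeaker signals that are scaled copies of one far-end talker, and all values are invented for illustration. A wrong pair of filter coefficients cancels the echo perfectly until the far-end talker moves, at which point the echo returns.

```python
import random

random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(1000)]  # one far-end talker


def speaker_signals(gain_left, gain_right):
    """Both loudspeakers play scaled copies of the same talker."""
    return [gain_left * s for s in source], [gain_right * s for s in source]


def residual_energy(w_left, w_right, h_left, h_right, x_left, x_right):
    """Energy of (true echo - estimated echo) for single-tap echo paths."""
    return sum(
        ((h_left - w_left) * xl + (h_right - w_right) * xr) ** 2
        for xl, xr in zip(x_left, x_right)
    )


H_LEFT, H_RIGHT = 0.8, 0.3  # true echo paths (assumed values)
W_LEFT, W_RIGHT = 0.5, 0.9  # wrong estimate that still satisfies
                            # 1.0 * W_LEFT + 0.5 * W_RIGHT == 1.0 * H_LEFT + 0.5 * H_RIGHT

# Talker position A: loudspeaker gains 1.0 and 0.5.
x_left, x_right = speaker_signals(1.0, 0.5)
before_move = residual_energy(W_LEFT, W_RIGHT, H_LEFT, H_RIGHT, x_left, x_right)

# Talker moves to position B: gains swap; the near-end room is unchanged.
x_left, x_right = speaker_signals(0.5, 1.0)
after_move = residual_energy(W_LEFT, W_RIGHT, H_LEFT, H_RIGHT, x_left, x_right)

print(f"residual echo energy before the talker moves: {before_move:.3e}")
print(f"residual echo energy after the talker moves:  {after_move:.3e}")
```

Only the true echo paths keep the residual at zero for every far-end talker position, which is why misalignment matters even when the echo currently sounds cancelled.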

The usual solution to this problem is to perform a non-linear operation on each incoming signal. This has the effect of de-correlating the speaker signals, which in turn allows the adaptive filters to converge more quickly and with less misalignment. The downside is that the non-linear operators can cause audible distortion for listeners in the near-end room, since the speaker signals are now distorted.
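
One non-linear operator often described in the SAEC literature is a half-wave rectifier: a scaled positive half-wave is added to one channel and a scaled negative half-wave to the other. The sketch below illustrates that textbook technique only; it is not a description of the operator used in HD SAEC, and the value of alpha and the test signals are assumptions chosen for the example.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def add_half_wave(signal, alpha, positive):
    """Return x + alpha * max(x, 0) if positive, else x + alpha * min(x, 0)."""
    if positive:
        return [x + alpha * max(x, 0.0) for x in signal]
    return [x + alpha * min(x, 0.0) for x in signal]


random.seed(1)
talker = [random.gauss(0.0, 1.0) for _ in range(20000)]
left = list(talker)                # both channels carry the same talker,
right = [0.5 * s for s in talker]  # differing only by a gain

ALPHA = 0.5  # distortion amount (assumed; larger values are more audible)
left_nl = add_half_wave(left, ALPHA, positive=True)
right_nl = add_half_wave(right, ALPHA, positive=False)

corr_before = correlation(left, right)
corr_after = correlation(left_nl, right_nl)
print(f"correlation before: {corr_before:.4f}")
print(f"correlation after:  {corr_after:.4f}")
```

Even a modest drop in correlation means the two channels are no longer exact scaled copies of each other, which gives the adaptive filters a unique solution to converge toward. The trade-off is the distortion added to the loudspeaker signals.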

Psychoacoustic non-linear operators can be a good choice, since the human ear hears sound on the non-linear Bark Scale. Essentially, the ear is a non-linear listening device. By distorting the signal in a psychoacoustic fashion, one mathematically de-correlates the signal in a way that, if done correctly, the ear is less likely to perceive as distortion.

—————————————

Psychoacoustics is the scientific study of sound perception. As soon as sound passes through the ears, it stops being a physical phenomenon and becomes a matter of perception. What we hear is almost as a rule different from the sound that is actually present, due to the peculiarities and limitations of our hearing.

The Bark Scale is a nonlinear audio frequency scale.
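
To show how non-linear the scale is, the sketch below uses one commonly cited closed-form approximation of the Bark Scale (due to Zwicker and Terhardt). Other approximations exist, and nothing here implies that HD SAEC uses this particular formula; the function name is illustrative.

```python
import math


def hz_to_bark(frequency_hz: float) -> float:
    """Approximate Bark value for a frequency in Hz (Zwicker-Terhardt form)."""
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


for frequency_hz in (100, 500, 1000, 2000, 5000, 10000, 20000):
    print(f"{frequency_hz:>6} Hz -> {hz_to_bark(frequency_hz):5.2f} Bark")
```

The step from 1 kHz to 2 kHz covers several Bark, while the same 1 kHz step from 10 kHz to 11 kHz covers less than one, which reflects the ear's coarser frequency resolution at high frequencies.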