Buy or Sell Software IPs at IPSupermarket

Friday, January 9, 2009

Understanding Various Speech Codecs

A speech codec is a method of compressing and decompressing audio files or streams that contain speech. The word "codec" stands for coder/decoder.

Various kinds of speech codecs are available. Because they are built on different algorithms, they have different specifications and suit different applications. Most of them comply with industry standards such as those of the ITU.

The various software speech codecs are:

  • G.711
  • G.722
  • G.723 & G.723.1
  • G.726
  • G.728
  • G.729
  • AMR, AMR-WB, AMR-NB

These speech codecs differ from each other technically in several respects, including the compression technology or algorithm, supported platforms, bandwidth and data rates.

One can easily look up and compare the various speech codecs on Wikipedia, but it often remains unclear which codec is appropriate where. The answer depends on the application, and understanding the pros and cons of some of these codecs gives better insight.
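As a first rough filter, the bit rates quoted in the sections below can be put into a small lookup table. The following is a minimal sketch, not a selection tool: the table name, the function name and the idea of filtering only on the lowest codec bit rate are illustrative assumptions, and the G.722 entry assumes its three modes of 48, 56 and 64 kbit/s.

```python
# Codec payload bit rates in kbit/s, taken from the sections of this article.
# Packet overhead is NOT included here (assumption for simplicity).
CODEC_RATES_KBPS = {
    "G.711": [64],
    "G.722": [48, 56, 64],
    "G.726": [16, 24, 32, 40],
    "G.728": [16],
    "G.729": [8],
}

def codecs_within(budget_kbps):
    """Return the codecs whose lowest bit rate fits the given budget."""
    return sorted(
        name
        for name, rates in CODEC_RATES_KBPS.items()
        if min(rates) <= budget_kbps
    )

if __name__ == "__main__":
    for budget in (8, 16, 64):
        print(budget, "kbit/s:", codecs_within(budget))
```

Bit rate is only one criterion; delay, quality, licensing and processing load, discussed per codec below, matter just as much.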


G.711

Overview

G.711 is the ITU-T standard for pulse code modulation (PCM) of voice frequencies on a 64 kbit/s channel. G.711 uses a sampling rate of 8,000 samples per second. Non-uniform quantization with 8 bits is used to represent each sample, resulting in a bit rate of 8,000 x 8 = 64 kbit/s.

Two standard companding algorithms are used: (1) the µ-law algorithm and (2) the A-law algorithm.
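To illustrate what non-uniform quantization means, here is a minimal sketch of µ-law companding using the textbook continuous formula with µ = 255. It is an assumption-laden illustration: the real G.711 encoder uses a segmented, table-like approximation of this curve, and the function names below are hypothetical.

```python
import math

MU = 255  # companding constant of the mu-law variant

def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] to [-1, 1], giving small amplitudes more resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to one of 256 codes (0..255)."""
    clipped = max(-1.0, min(1.0, x))
    y = mu_law_compress(clipped)
    return int(round((y + 1.0) / 2.0 * 255))

def decode_8bit(code: int) -> float:
    """Map a code back to the compressed domain and expand it."""
    y = code / 255 * 2.0 - 1.0
    return mu_law_expand(y)

if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5):
        code = encode_8bit(sample)
        print(sample, "->", code, "->", decode_8bit(code))
```

Because the compression is logarithmic, quiet samples are reproduced with a much smaller absolute error than loud ones, which suits speech.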

Pros

  • Designed to deliver precise transmission of speech
  • Very low processing overheads

Cons

  • Poor network efficiency
  • The base codec lacks interpolation of missing packets
  • With packet overheads included it uses more than 64 kbps, so at least 128 kbps of bandwidth in each direction is commonly recommended
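The overhead mentioned in the last point can be estimated with simple arithmetic. The sketch below assumes the stream is carried over RTP/UDP/IPv4 (12 + 8 + 20 header bytes) with 20 ms of audio per packet; both are assumptions for illustration, and link-layer headers are ignored.

```python
# Header sizes in bytes for RTP, UDP and IPv4 (assumed transport stack).
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20

def ip_bandwidth_bps(codec_bps, packet_ms, header_bytes=RTP_UDP_IPV4_HEADER_BYTES):
    """Bandwidth at the IP layer for one direction of a voice stream."""
    payload_bytes = codec_bps * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second

if __name__ == "__main__":
    # G.711 at 64 kbit/s with 20 ms and 10 ms packets.
    print(ip_bandwidth_bps(64000, 20))
    print(ip_bandwidth_bps(64000, 10))
```

Shorter packets lower the delay but raise the overhead, since the fixed headers are sent more often.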

Other Version

G.711.1 is an extension of G.711 that allows the addition of narrowband and/or wideband (16,000 samples/s) enhancements, leading to data rates of 64, 80 or 96 kbit/s.


G.722

Overview

G.722 is an ITU-T standard wideband speech codec operating at 48, 56 and 64 kbit/s. The codec is based on split-band (sub-band) ADPCM.

Pros

  • It is useful in fixed network voice over IP applications, where the required bandwidth is typically not prohibitive
  • It also offers a significant improvement in speech quality over older narrowband codecs such as G.711

Cons

  • It is not optimal for broadcast remotes

Other Version

G.722.1 is an ITU-T standard audio codec used for high-quality speech. It is a transform-based compressor that is optimized for both speech and music. The computational complexity is quite low, and the end-to-end algorithmic delay is 40 ms.

G.722.2 is also referred to as AMR-WB. It is a speech coding standard developed after AMR using the same kind of technology, ACELP. See the AMR section for further details.


G.723 & G.723.1

G.723 is completely different from G.723.1.

G.723 Overview:

G.723 is an ITU standard for speech codecs that uses the ADPCM method and provides good quality audio at 24 and 40 Kbps.

Note: The G.723 codec was mainly used for digital circuit multiplication equipment (DCME) applications and was later folded into G.726. See the G.726 section.

G.723.1 Overview:

G.723.1 is a speech codec that compresses voice audio in 30 ms frames. An algorithmic look-ahead of 7.5 ms duration means that total algorithmic delay is 37.5 ms.
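The delay figure above is simply the frame length plus the look-ahead. The sketch below reproduces it and also estimates the size of one coded frame. The bit rates of 5.3 and 6.3 kbit/s used here are the rates commonly cited for G.723.1; they are not stated in this article and are included as an assumption, as are the function names.

```python
import math

def algorithmic_delay_ms(frame_ms, lookahead_ms):
    """Delay before the encoder can emit a frame: one full frame plus look-ahead."""
    return frame_ms + lookahead_ms

def frame_bytes(bitrate_bps, frame_ms):
    """Whole bytes needed to carry one coded frame."""
    return math.ceil(bitrate_bps * frame_ms / 1000 / 8)

if __name__ == "__main__":
    print("delay:", algorithmic_delay_ms(30, 7.5), "ms")
    for rate in (5300, 6300):  # assumed G.723.1 bit rates
        print(rate, "bit/s ->", frame_bytes(rate, 30), "bytes per 30 ms frame")
```

This is algorithmic delay only; processing, packetization and network delay add to it in a real system.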

Pros

  • Very high compression whilst maintaining high quality audio.
  • Allows simultaneous encode & decode in software (on fast computers)
  • G.723.1 is very effective for the audio portion of videoconferencing/telephony over the public telephone network (POTS).

Cons

  • Requires a lot of processor power.
  • Not well-suited to music or sound effects
  • Lower quality than many other codecs at similar data rates

G.726

Overview

G.726 is an ADPCM speech codec for the transmission of voice at rates of 16, 24, 32, and 40 kbit/s. G.721 and G.723 were folded into G.726.

Pros

  • At its most commonly used rate of 32 kbit/s it needs half the bit rate of the G.711 codec, which increases the usable network capacity by 100%
  • Widely used on international trunks in the phone network.
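The capacity claim in the first point follows from plain arithmetic. The sketch below assumes the usual 8,000 samples per second of narrowband telephony, so each G.726 rate corresponds to a whole number of bits per sample; the names are illustrative.

```python
SAMPLE_RATE_HZ = 8000  # assumed narrowband telephony sampling rate
G726_RATES_BPS = (16000, 24000, 32000, 40000)

def bits_per_sample(rate_bps):
    """ADPCM code word size implied by a given bit rate."""
    return rate_bps // SAMPLE_RATE_HZ

def streams_per_channel(channel_bps, codec_bps):
    """How many voice streams fit into one channel, ignoring any overhead."""
    return channel_bps // codec_bps

if __name__ == "__main__":
    for rate in G726_RATES_BPS:
        print(rate, "bit/s:", bits_per_sample(rate), "bits/sample,",
              streams_per_channel(64000, rate), "streams per 64 kbit/s channel")
```

At 32 kbit/s two streams fit where G.711 carries one, which is the 100% capacity gain quoted above.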

Cons

  • Not well-suited to music or sound effects

G.728

Overview

G.728 uses Low-Delay Code Excited Linear Prediction (LD-CELP) compression technology at 16 kbps
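The "low delay" in LD-CELP comes from coding very short blocks of samples. As commonly described, G.728 codes vectors of 5 samples (at 8,000 samples per second) with a 10-bit codebook index each; these two figures are not given in this article and are treated as assumptions in the sketch below, which only checks that they are consistent with 16 kbps.

```python
SAMPLE_RATE_HZ = 8000     # assumed narrowband sampling rate
SAMPLES_PER_VECTOR = 5    # assumed G.728 vector length
BITS_PER_VECTOR = 10      # assumed codebook index size

def vector_duration_ms(samples=SAMPLES_PER_VECTOR, rate_hz=SAMPLE_RATE_HZ):
    """Time span of one coded vector, i.e. the buffering delay it causes."""
    return samples * 1000 / rate_hz

def bit_rate_bps(bits=BITS_PER_VECTOR, samples=SAMPLES_PER_VECTOR, rate_hz=SAMPLE_RATE_HZ):
    """Bit rate implied by sending `bits` for every `samples` input samples."""
    return bits * rate_hz // samples

if __name__ == "__main__":
    print("vector duration:", vector_duration_ms(), "ms")
    print("bit rate:", bit_rate_bps(), "bit/s")
```

With only 10 bits per vector there is little room to spare, which ties in with the limited error protection listed under Cons.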

Pros

  • G.728 rates as “toll quality”, so its voice quality is very good compared with earlier low-bit-rate speech codecs.
  • G.728 is a low-delay speech coder and hence suits applications including satellite, cellular, and video conferencing systems

Cons

  • Few bits are available for error protection

G.729

Overview

The G.729 speech codec uses an audio data compression algorithm that compresses speech at 8 kbps; its extensions add further bit rates, ranging from 6.4 kbps (Annex D) to 11.8 kbps (Annex E).

Pros

  • Low delay: speech is compressed in frames as short as 10 milliseconds
  • Because of its low bit rate of around 8 kbps, it is mostly used in Voice over IP (VoIP) applications where bandwidth is limited

Cons

  • Because the codec is tuned for speech, music or tones such as DTMF or fax tones cannot be transported reliably with it
  • Speech quality decreases marginally.
  • License required for use

Other Version

G.729A and G.729B use the Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) compression algorithm. G.729A is a reduced-complexity version, and the reduction in complexity may result in a small decrease in voice quality; G.729B adds silence compression. G.729A is suitable for VoIP or similar applications using multimedia, voice, and/or data.


AMR

Overview

Adaptive Multi-Rate (AMR) is an audio data compression scheme optimized for speech coding. AMR was adopted as the standard speech codec by 3GPP

Pros

  • Superior sound quality due to wider speech bandwidth (this applies to the wideband variant, AMR-WB)

Cons

  • The disadvantage is, of course, the delay it introduces in the voice path.

Other Version

AMR-WB (Adaptive Multi-Rate Wideband) is a speech coding standard developed after AMR using the same kind of technology, ACELP.

AMR-NB (Adaptive Multi-Rate Narrowband) is a speech codec employed in low-bitrate applications such as mobile phones. It is a form of ACELP.

To commercialize these speech codecs, a couple of portals are available where one can promote and procure them. Portals such as design-reuse, chipestimates and IPsupermarket.com allow you to buy, sell or license various speech codecs.
