A speech codec is a method of compressing and decompressing audio that contains speech, whether stored in a file or streamed. "Codec" stands for coder/decoder.
Many speech codecs are available. Because they are built on different algorithms, they have different specifications and suit different applications. Most of them comply with industry standards such as those published by the ITU.
The software speech codecs covered here are:
- G.711 & G.711.1
- G.722, G.722.1 & G.722.2
- G.723 & G.723.1
- G.726
- G.728
- G.729, G.729A & G.729B
- AMR, AMR-WB, AMR-NB
These speech codecs differ technically in several respects, including compression algorithm, supported platforms, bandwidth, and data rates.
It is easy to look up and compare speech codecs on Wikipedia, but it can still be unclear which codec is appropriate where. The answer depends on the application, and understanding the pros and cons of some of these codecs gives better insight.
G.711 is pulse code modulation (PCM) of voice frequencies on a 64 kbit/s channel. It samples at 8,000 samples per second and represents each sample with 8 bits using non-uniform quantization, resulting in a 64 kbit/s bit rate.
Two standard companding algorithms are used: (1) the µ-law algorithm and (2) the A-law algorithm.
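As a rough illustration of what non-uniform quantization means here, the sketch below applies the continuous µ-law and A-law companding curves to a sample normalized to [-1, 1] and then quantizes the result to 8 bits. It is not the bit-exact segmented encoder defined in G.711; the function names and the simple uniform quantizer are assumptions made for this example.

```python
import math

MU = 255.0   # µ-law parameter used by G.711
A = 87.6     # A-law parameter used by G.711


def mu_law_compress(x: float) -> float:
    """Continuous µ-law curve: maps a sample in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(x: float) -> float:
    """Continuous A-law curve: linear near zero, logarithmic above 1/A."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value in [-1, 1] to a signed 8-bit code.

    Hypothetical helper: real G.711 uses a segment/step bit layout instead.
    """
    return max(-128, min(127, round(y * 127)))


if __name__ == "__main__":
    # Quiet samples are boosted before quantization, loud ones are compressed.
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample,
              round(mu_law_compress(sample), 3),
              round(a_law_compress(sample), 3),
              quantize_8bit(mu_law_compress(sample)))
```

Both curves spend more of the 8-bit code space on quiet samples, which is where speech carries most of its detail.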
- Designed to deliver precise transmission of speech
- Very low processing overheads
- Poor network efficiency
- No built-in interpolation (concealment) of missing packets
- With packet overheads it needs more than 64 kbit/s, so budgeting up to 128 kbit/s of bandwidth in each direction is a conservative allowance
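The overhead mentioned in the list above can be estimated with simple arithmetic. The sketch below assumes the audio is carried in RTP over UDP over IPv4 (12 + 8 + 20 = 40 header bytes per packet) and ignores link-layer framing; the function name and the packetization intervals are illustrative assumptions, not figures from the text.

```python
def on_wire_bitrate(codec_bps: int, packet_ms: int, header_bytes: int = 40) -> int:
    """Approximate one-way bit rate in bit/s, including per-packet headers.

    header_bytes defaults to RTP (12) + UDP (8) + IPv4 (20), an assumption.
    """
    payload_bytes = codec_bps * packet_ms // 8000   # codec bytes per packet
    packet_bits = (payload_bytes + header_bytes) * 8
    return packet_bits * 1000 // packet_ms          # packets per second applied


if __name__ == "__main__":
    for interval_ms in (10, 20, 30):
        print(interval_ms, "ms packets:", on_wire_bitrate(64_000, interval_ms), "bit/s")
```

Under these assumptions, shorter packets reduce delay but raise the share of bandwidth spent on headers.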
G.711.1 is an extension of G.711 that allows the addition of narrowband and/or wideband (16,000 samples/s) enhancement layers, leading to data rates of 64, 80, or 96 kbit/s.
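The three data rates follow from stacking optional 16 kbit/s enhancement layers on the 64 kbit/s G.711 core. The sketch below shows that sum; the layer and mode labels (L0 to L2, R1 to R3) are used as I recall them from the recommendation and should be treated as assumptions.

```python
# Bit rate contributed by each layer, in bit/s.
LAYER_BPS = {
    "L0": 64_000,  # G.711-compatible core
    "L1": 16_000,  # narrowband enhancement
    "L2": 16_000,  # wideband enhancement
}

# Which layers each mode carries (labels are assumptions for this sketch).
MODES = {
    "R1": ("L0",),
    "R2a": ("L0", "L1"),
    "R2b": ("L0", "L2"),
    "R3": ("L0", "L1", "L2"),
}


def mode_bitrate(mode: str) -> int:
    """Total bit rate of a mode as the sum of its layers."""
    return sum(LAYER_BPS[layer] for layer in MODES[mode])


if __name__ == "__main__":
    for name in MODES:
        print(name, mode_bitrate(name) // 1000, "kbit/s")
```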
G.722 is an ITU standard wideband speech codec operating at 48 to 64 kbit/s. It is based on split-band ADPCM.
- It is useful in fixed network voice over IP applications, where the required bandwidth is typically not prohibitive
- It also offers a significant improvement in speech quality over older narrowband codecs such as G.711
- It is not optimal for broadcast remotes
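The 48 to 64 kbit/s range comes from how split-band ADPCM allocates bits. The 16 kHz input is split into a lower and an upper sub-band, each sampled at 8 kHz; the upper band gets 2 bits per sample and the lower band 6, 5, or 4 bits depending on the mode. The sketch below only reproduces that arithmetic, and its names are chosen for illustration.

```python
SUBBAND_RATE_HZ = 8_000   # each sub-band is sampled at 8 kHz
UPPER_BAND_BITS = 2       # bits per upper-band sample in every mode

# Bits per lower-band sample in modes 1, 2, and 3.
LOWER_BAND_BITS = {1: 6, 2: 5, 3: 4}


def g722_bitrate(mode: int) -> int:
    """Audio bit rate in bit/s for a given G.722 mode."""
    return SUBBAND_RATE_HZ * (LOWER_BAND_BITS[mode] + UPPER_BAND_BITS)


if __name__ == "__main__":
    for mode in (1, 2, 3):
        print("mode", mode, g722_bitrate(mode) // 1000, "kbit/s")
```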
G.722.1 is an ITU-T standard audio codec used for high-quality speech. It is a transform-based compressor optimized for both speech and music. Its computational complexity is quite low, and its end-to-end algorithmic delay is 40 ms.
G.722.2 is also known as AMR-WB. It is a speech coding standard developed after AMR and uses similar ACELP technology. See AMR-WB for further details.
G.723 & G.723.1
G.723 is completely different from G.723.1.
G.723 is an ITU standard speech codec that uses the ADPCM method and provides good-quality audio at 24 and 40 kbit/s.
Note: G.723 was mainly used in digital circuit multiplication equipment (DCME) applications and was later folded into G.726. See G.726.
G.723.1 is a speech codec that compresses voice audio in 30 ms frames at 5.3 or 6.3 kbit/s. With an algorithmic look-ahead of 7.5 ms, the total algorithmic delay is 37.5 ms.
- Very high compression whilst maintaining high quality audio.
- Allows simultaneous encode & decode in software (on fast computers)
- Effective for the audio portion of videoconferencing and telephony over public telephone (POTS) lines.
- Requires a lot of processor power.
- Not well-suited to music or sound effects
- Lower quality than many other codecs at similar data rates
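The delay figure and the "very high compression" point can both be checked with a few lines of arithmetic. The sketch below adds the frame length to the look-ahead and computes the frame size at the two G.723.1 rates of 5.3 and 6.3 kbit/s; the helper names are made up for this example.

```python
FRAME_MS = 30.0        # frame length
LOOKAHEAD_MS = 7.5     # algorithmic look-ahead
RATES_BPS = (5_300, 6_300)


def algorithmic_delay_ms() -> float:
    """Frame length plus look-ahead."""
    return FRAME_MS + LOOKAHEAD_MS


def frame_bits(rate_bps: int) -> int:
    """Bits produced for one 30 ms frame."""
    return rate_bps * 30 // 1000


def frame_bytes(rate_bps: int) -> int:
    """Frame size rounded up to whole bytes."""
    return -(-frame_bits(rate_bps) // 8)


if __name__ == "__main__":
    print("algorithmic delay:", algorithmic_delay_ms(), "ms")
    for rate in RATES_BPS:
        print(rate, "bit/s:", frame_bits(rate), "bits,", frame_bytes(rate), "bytes per frame")
```

For comparison, 30 ms of G.711 audio at 64 kbit/s occupies 240 bytes.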
G.726 is an ADPCM speech codec for the transmission of voice at rates of 16, 24, 32, and 40 kbit/s. G.721 and G.723 were folded into G.726.
- Most commonly used at 32 kbit/s, which is half the rate of the G.711 codec and so increases the usable network capacity by 100%
- Widely used on international trunks in the phone network.
- Not well-suited to music or sound effects
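The capacity claim is straightforward to verify. At 8,000 samples per second, the four G.726 rates correspond to 2, 3, 4, and 5 bits per sample, and the capacity gain over 64 kbit/s G.711 follows from the ratio of the rates. The function names below are illustrative only.

```python
SAMPLE_RATE_HZ = 8_000
G711_BPS = 64_000
G726_RATES_BPS = (16_000, 24_000, 32_000, 40_000)


def bits_per_sample(rate_bps: int) -> int:
    """ADPCM code width implied by the bit rate at 8 kHz sampling."""
    return rate_bps // SAMPLE_RATE_HZ


def capacity_gain_percent(rate_bps: int) -> float:
    """Extra calls that fit in the bandwidth of G.711, as a percentage."""
    return (G711_BPS / rate_bps - 1) * 100


if __name__ == "__main__":
    for rate in G726_RATES_BPS:
        print(rate // 1000, "kbit/s:",
              bits_per_sample(rate), "bits/sample,",
              round(capacity_gain_percent(rate), 1), "% more capacity")
```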
G.728 uses Low-Delay Code Excited Linear Prediction (LD-CELP) compression at 16 kbit/s.
- G.728 rates as “toll quality”, so voice quality is good compared with earlier low-bit-rate speech codecs.
- G.728 is a low-delay speech coder, which suits it to applications including satellite, cellular, and video conferencing systems
- Few bits are available for error protection
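The low delay and the scarcity of spare bits both follow from how LD-CELP packs its output: the coder works on vectors of 5 samples at 8 kHz and sends one 10-bit codebook index per vector. The sketch below reproduces that arithmetic; the constant and function names are chosen for illustration.

```python
SAMPLE_RATE_HZ = 8_000
VECTOR_SAMPLES = 5    # samples coded together as one vector
INDEX_BITS = 10       # codebook index bits sent per vector


def g728_bitrate() -> int:
    """Bit rate in bit/s: vectors per second times bits per vector."""
    return SAMPLE_RATE_HZ // VECTOR_SAMPLES * INDEX_BITS


def vector_duration_ms() -> float:
    """Duration of the audio covered by one vector."""
    return VECTOR_SAMPLES * 1000 / SAMPLE_RATE_HZ


if __name__ == "__main__":
    print(g728_bitrate(), "bit/s,", vector_duration_ms(), "ms per vector")
```

With only 10 bits per vector, every bit carries codebook information, which leaves little room for error protection.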
The G.729 speech codec uses an audio data compression algorithm and compresses speech at 8 kbit/s, with extensions that operate at 6.4 and 11.8 kbit/s.
- Low delay: speech is compressed in frames as short as 10 milliseconds. Because the codec is tuned to speech, music and tones such as DTMF or fax tones cannot be transported reliably
- At around 8 kbit/s, it is mostly used in Voice over IP (VoIP) applications for its low bandwidth requirement
- Speech quality decreases marginally.
- License required for use
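To put the low bandwidth requirement in concrete terms, the sketch below computes how much data one 10 ms G.729 frame occupies and how large the codec payload of a packet would be. The 20 ms packetization interval is an assumption for illustration, and the helper names are hypothetical.

```python
FRAME_MS = 10  # G.729 frame length


def frame_bits(rate_bps: int) -> int:
    """Bits produced for one 10 ms frame."""
    return rate_bps * FRAME_MS // 1000


def payload_bytes(rate_bps: int, packet_ms: int = 20) -> int:
    """Codec bytes per packet, with each frame rounded up to whole bytes."""
    frames_per_packet = packet_ms // FRAME_MS
    bytes_per_frame = -(-frame_bits(rate_bps) // 8)
    return frames_per_packet * bytes_per_frame


if __name__ == "__main__":
    for rate in (6_400, 8_000):
        print(rate, "bit/s:", frame_bits(rate), "bits per frame,",
              payload_bytes(rate), "payload bytes per 20 ms packet")
```

The same 20 ms of G.711 audio would need 160 payload bytes.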
G.729A/G.729B use the Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) compression algorithm. G.729A is a reduced-complexity variant, and the reduction in complexity may result in a small decrease in voice quality. G.729A is suitable for VoIP or similar applications using multimedia, voice, and/or data.
Adaptive Multi-Rate (AMR) is an audio data compression scheme optimized for speech coding. AMR was adopted as the standard speech codec by 3GPP.
- Superior sound quality due to wider speech bandwidth
- The disadvantage is, of course, the delay it introduces in the voice path.
AMR-WB (Adaptive Multi-Rate Wideband) is a speech coding standard developed after AMR, using similar ACELP technology.
AMR-NB (Adaptive Multi-Rate Narrowband) is a speech codec employed in low-bitrate applications like mobile phones. It is a form of ACELP.
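"Multi-rate" means the encoder can switch among a fixed set of bit rates from one 20 ms frame to the next. The sketch below lists the standard AMR-NB and AMR-WB mode rates and derives the number of speech bits per frame for each; the table and function names are chosen for this example.

```python
FRAME_MS = 20  # both AMR-NB and AMR-WB use 20 ms frames

# Mode bit rates in bit/s.
AMR_NB_MODES_BPS = (4_750, 5_150, 5_900, 6_700, 7_400, 7_950, 10_200, 12_200)
AMR_WB_MODES_BPS = (6_600, 8_850, 12_650, 14_250, 15_850,
                    18_250, 19_850, 23_050, 23_850)


def bits_per_frame(rate_bps: int) -> int:
    """Speech bits carried in one 20 ms frame at the given mode rate."""
    return rate_bps * FRAME_MS // 1000


if __name__ == "__main__":
    for label, modes in (("AMR-NB", AMR_NB_MODES_BPS), ("AMR-WB", AMR_WB_MODES_BPS)):
        for rate in modes:
            print(label, rate / 1000, "kbit/s:", bits_per_frame(rate), "bits per frame")
```

Lower modes leave more of the channel for error protection, which is why a handset drops to them when radio conditions worsen.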
To commercialize these speech codecs, a couple of portals are available where one can promote and procure them. Portals such as design-reuse, chipestimates, and IPsupermarket.com allow you to buy, sell, or license various speech codecs.