G.729 is an algorithm for coding speech at 8 kbit/s using the Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP) model. It operates on speech frames of 80 samples at a sampling rate of 8000 samples per second, so each frame corresponds to 10 ms of speech.
Each frame first passes through a voice activity detection (VAD) block that classifies the frame as "speech" or "silent". This decision is signaled to both the encoder and the decoder via the bit-stream, activating either the "Active Voice" block or the "Non-Active Voice" block on each side.
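The dispatch described above can be sketched as follows. This is only an illustration of the control flow: the energy-threshold detector is a stand-in and is not the VAD algorithm defined by the Recommendation, and the names `toy_vad`, `select_block`, and `ENERGY_THRESHOLD` are hypothetical.

```python
from typing import Sequence

FRAME_SAMPLES = 80          # frame length stated in the text
ENERGY_THRESHOLD = 1.0e4    # hypothetical value, chosen only for this sketch


def toy_vad(frame: Sequence[int]) -> bool:
    """Return True for "speech", False for "silent".

    Stand-in detector based on mean squared amplitude; the real VAD
    is specified elsewhere and is considerably more elaborate.
    """
    if len(frame) != FRAME_SAMPLES:
        raise ValueError("expected a frame of 80 samples")
    energy = sum(s * s for s in frame) / len(frame)
    return energy > ENERGY_THRESHOLD


def select_block(frame: Sequence[int]) -> str:
    """Pick the processing block that the VAD decision activates."""
    return "Active Voice" if toy_vad(frame) else "Non-Active Voice"
```

For example, an all-zero frame selects the "Non-Active Voice" block, while a frame of constant amplitude 1000 selects the "Active Voice" block.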
The "Active Voice" block is invoked for "speech" frames and corresponds to the G.729A vocoder. In this case, the bit-stream produced by encoding a frame with G.729A is 80 bits long.
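As a quick consistency check, the frame size, sampling rate, and bits per active frame given in the text reproduce both the 10 ms frame duration and the 8 kbit/s rate. The variable names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000        # samples per second
SAMPLES_PER_FRAME = 80       # samples in one frame
BITS_PER_ACTIVE_FRAME = 80   # bits produced for one "speech" frame

# Duration of one frame in milliseconds.
frame_duration_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ

# Number of frames per second, then bits per second for active speech.
frames_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_FRAME
bit_rate_bps = BITS_PER_ACTIVE_FRAME * frames_per_second

print(frame_duration_ms)  # 10.0
print(bit_rate_bps)       # 8000
```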
The "Non-Active Voice" block is invoked for "silent" frames and implements discontinuous transmission (DTX) in the encoder and comfort noise generation (CNG) in the decoder. The bit-stream produced by encoding a "silent" frame is called a Silence Insertion Descriptor (SID) and is 15 or 16 bits long, depending on the transmission mode (bit or octet).
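Because the two frame kinds have different lengths, a receiver could tell them apart from the payload size alone. The sketch below assumes this approach; the `FrameType` enum and `classify_frame` function are hypothetical names, and treating a zero-length payload as "nothing transmitted" during DTX is an assumption not stated in the text above.

```python
from enum import Enum


class FrameType(Enum):
    ACTIVE_VOICE = "active voice"
    SID = "sid"
    NOT_TRANSMITTED = "not transmitted"


ACTIVE_VOICE_BITS = 80   # G.729A frame length from the text
SID_BITS = {15, 16}      # bit mode or octet mode, from the text


def classify_frame(num_bits: int) -> FrameType:
    """Map a received payload length in bits to a frame type."""
    if num_bits == ACTIVE_VOICE_BITS:
        return FrameType.ACTIVE_VOICE
    if num_bits in SID_BITS:
        return FrameType.SID
    if num_bits == 0:
        # Assumption: DTX sent nothing for this frame interval.
        return FrameType.NOT_TRANSMITTED
    raise ValueError(f"unexpected frame length: {num_bits} bits")
```

A decoder built this way would route `ACTIVE_VOICE` frames to the G.729A decoder and the other two types to comfort noise generation.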
Annex B of ITU-T Recommendation G.729 contains a complete description of the VAD algorithm and of the DTX and CNG blocks.