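2.1 Concepts
Sampling rate
Bit-rate
Quality (variable)
Complexity (variable)
Variable bit-rate
Average bit-rate
Voice activity detection
Discontinuous transmission
Perceptual enhancement
Algorithmic delay
2.2 Codec
2.3 Preprocessor
2.4 Adaptive jitter buffer
2.5 Echo cancellation
2.6 Resampling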
This section describes Speex and its features in more detail.
2.1 Concepts
Before introducing all the Speex features, here are some concepts
in speech coding that help to better understand the rest of the
manual. Although some are general concepts in speech/audio
processing, others are specific to Speex.
Sampling rate
The sampling rate, expressed in Hertz (Hz), is the number of samples
taken from a signal per second. For a sampling rate of Fs kHz, the
highest frequency that can be represented is equal to Fs/2 kHz
(Fs/2 is known as the Nyquist frequency). This is a fundamental
property in signal processing and is described by the sampling
theorem. Speex is mainly designed for three different sampling
rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred
to as narrowband, wideband and ultra-wideband.
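The Fs/2 relation above can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary `SPEEX_BANDS` and the helper `nyquist_hz` are hypothetical names introduced here, not part of any Speex API.

```python
# Sampling rates named in the text, in Hz (names are illustrative only).
SPEEX_BANDS = {
    "narrowband": 8000,
    "wideband": 16000,
    "ultra-wideband": 32000,
}


def nyquist_hz(sampling_rate_hz):
    """Return the highest representable frequency, Fs/2, in Hz."""
    return sampling_rate_hz / 2


if __name__ == "__main__":
    for name, fs in SPEEX_BANDS.items():
        print(f"{name}: Fs = {fs} Hz, Nyquist = {nyquist_hz(fs):.0f} Hz")
```

For the narrowband case this gives 4 kHz, which is why 8 kHz sampling is usually considered sufficient for telephone-quality speech.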
Bit-rate
When encoding a speech signal, the bit-rate is defined as the
number of bits per unit of time required to encode the speech. It
is measured in bits per second (bps), or more commonly kilobits per
second (kbps). It is important to make the distinction between
kilobits per second (kbps) and kilobytes per second (kBps).
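To make the kbps/kBps distinction concrete, here is a small sketch. The helper names and the 8 kbps example rate are assumptions chosen for illustration, and 1 kilobit is taken as 1000 bits.

```python
BITS_PER_BYTE = 8


def kbps_to_kBps(kilobits_per_second):
    """Convert kilobits per second to kilobytes per second."""
    return kilobits_per_second / BITS_PER_BYTE


def bytes_for_duration(kilobits_per_second, seconds):
    """Bytes needed to hold `seconds` of audio at the given bit-rate.

    Assumes 1 kbit = 1000 bits.
    """
    return kilobits_per_second * 1000 * seconds / BITS_PER_BYTE


if __name__ == "__main__":
    rate_kbps = 8.0  # hypothetical example bit-rate
    print(kbps_to_kBps(rate_kbps))            # kilobytes per second
    print(bytes_for_duration(rate_kbps, 60))  # bytes for one minute
```

Confusing the two units gives an error of a factor of eight, which matters when sizing a network link or a storage budget.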
Quality (variable)
Speex is a lossy codec, which means that it achieves compression at
the expense of fidelity of the input speech signal. Unlike some
other speech codecs, it is possible to control the tradeoff made
between quality and bit-rate. The Speex encoding process is
controlled most of the time by a quality parameter that ranges from
0 to 10. In constant bit-rate (CBR) operation, the quality
parameter is an integer, while for variable bit-rate (VBR), the
parameter is a float.
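The rule for the quality parameter (0 to 10, integer for CBR, float for VBR) can be sketched as a small validation helper. `check_quality` is a hypothetical function written for this illustration; it does not call or mirror the actual Speex library.

```python
def check_quality(quality, vbr=False):
    """Validate a quality setting as described in the text.

    CBR: integer in 0..10.  VBR: float in 0..10 (ints are converted).
    This is an illustrative helper, not a Speex API.
    """
    if isinstance(quality, bool):
        raise TypeError("quality must be a number, not a bool")
    if vbr:
        if not isinstance(quality, (int, float)):
            raise TypeError("VBR quality must be a number")
        value = float(quality)
    else:
        if not isinstance(quality, int):
            raise TypeError("CBR quality must be an integer")
        value = quality
    if not 0 <= value <= 10:
        raise ValueError("quality must be between 0 and 10")
    return value


if __name__ == "__main__":
    print(check_quality(8))              # CBR setting
    print(check_quality(7.5, vbr=True))  # VBR setting
```

Rejecting a fractional value in CBR mode early avoids silently truncating it to a different quality level.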
Complexity (variable)
With Speex, it is possible to vary the complexity allowed for the
encoder. This is done by controlling how the search is performed
with an integer ranging from 1 to 10, in a way that’s similar to the
-1 to -9 options of the gzip and bzip2 compression utilities.
(Blogger's note: for bzip2, these options set the compression block
size, from 100k to 900k.) For normal use, the noise level at
complexity 1 is between 1 and 2 dB higher than at complexity 10,
but the CPU requirements for complexity 10 are about 5 times higher
than for complexity 1. In practice, the best trade-off is between
complexity 2 and 4, though higher settings are often useful when
encoding non-speech sounds like DTMF tones.
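To get a feel for what "1 to 2 dB higher" means, the decibel figures can be converted to linear ratios. This sketch assumes the figures refer to noise power, so the standard relation ratio = 10^(dB/10) applies; the helper name is hypothetical.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a linear power ratio."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    for db in (1.0, 2.0):
        ratio = db_to_power_ratio(db)
        print(f"{db} dB higher noise -> about {ratio:.2f}x the noise power")
```

Under that assumption, complexity 1 carries roughly 1.26 to 1.58 times the noise power of complexity 10, in exchange for about one fifth of the CPU cost quoted in the text.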
Variable bit-rate (VBR)
Variable bit-rate (VBR) allows a codec to change its bit-rate
dynamically to adapt to the “difficulty” of the audio being
encoded. In the example of Speex, sounds like vowels and
high-energy transients require a higher bit-rate to achieve good
quality, while fricatives (e.g. s and f sounds) can be coded
adequately with fewer bits. For this reason, VBR can achieve a lower
bit-rate for the same quality, or a better quality for a certain
bit-rate. Despite its advantages, VBR has two main drawbacks: first,
by only specifying quality, there’s no guarantee about the final
average bit-rate. Second, for some real-time applications like voice
over IP (VoIP), what counts is the maximum bit-rate, which must be
low enough for the communication channel.
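The two drawbacks can be illustrated with a toy calculation over per-frame bit counts. Everything here is hypothetical: the frame sizes are made up, the helper names are not Speex functions, and a rate of 50 frames per second (20 ms frames) is assumed only for the sake of the example.

```python
FRAMES_PER_SECOND = 50  # assumed frame rate (20 ms frames), illustration only


def bitrate_stats(bits_per_frame, frames_per_second=FRAMES_PER_SECOND):
    """Return (average_bps, peak_bps) for a list of per-frame bit counts."""
    if not bits_per_frame:
        raise ValueError("need at least one frame")
    average_bps = sum(bits_per_frame) * frames_per_second / len(bits_per_frame)
    peak_bps = max(bits_per_frame) * frames_per_second
    return average_bps, peak_bps


def fits_channel(bits_per_frame, channel_bps,
                 frames_per_second=FRAMES_PER_SECOND):
    """True if even the largest frame stays within the channel capacity."""
    _, peak_bps = bitrate_stats(bits_per_frame, frames_per_second)
    return peak_bps <= channel_bps


if __name__ == "__main__":
    # Made-up VBR output: larger frames for vowels and transients,
    # smaller frames for fricatives.
    frames = [300, 300, 100, 100, 300, 400]
    average, peak = bitrate_stats(frames)
    print(f"average = {average:.0f} bps, peak = {peak:.0f} bps")
    print("fits a 16 kbps channel:", fits_channel(frames, 16000))
```

In this made-up stream the average is well under 16 kbps, yet the largest frame exceeds it, which is the situation the second drawback describes for real-time channels.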