| Commit message (Collapse) | Author | Age |
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
|
|
|
|
| |
Also rename the enum values to be consistent with other DCT permutations.
|
| |
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|
|
|
|
| |
This makes the init files match the structure of the dsputil split.
|
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
|
| |
|
| |
|
| |
|
|
|
|
| |
The remaining dsputil bits are encoding-specific anyway.
|
| |
|
| |
|
|
|
|
|
|
| |
Sample-Id: OPFLAG_B_Qualcomm_1.bit, OPFLAG_C_Qualcomm_1.bit
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
| |
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
| |
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
|
|
|
|
| |
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
|
|
|
|
|
| |
The only thing the demuxer needs is the sample rate to set the timebase,
which can be simply read with AV_RB32.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This instruction is deprecated on ARMv8, and it is serializing on
some ARMv7 cores as well [1].
[1] http://article.gmane.org/gmane.linux.ports.arm.kernel/339293
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
For implicit signaling cases (as possible for Spectral Band Replication
and Parametric Stereo Tools), the decoder must decode the first frame to
correctly identify the stream configuration (as called from
avformat_find_stream_info). The mechanism for this is built-in and only
requires adding CODEC_CAP_CHANNEL_CONF to the libfdk-aacdec AVCodec
struct.
Signed-off-by: Omer Osman <omer.osman@iis.fraunhofer.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
| |
All other files containing purely inline assembly are treated the same way.
|
|
|
|
|
| |
This avoids a link failure with MMX disabled as the init functions
are referenced unconditionally.
|
|
|
|
|
| |
This avoids a link failure with MMX disabled as now code and
initialization are compiled under the same condition.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The default error concealment method if none is set via
aacDecoder_SetParam(AAC_CONCEAL_METHOD) is set in
CConcealment_InitCommonData within the fdk-aac library
and is set to Energy Interpolation. This method requires one frame
delay to the output. To reduce the default decoder output delay and
avoid missing the last frame in file based decoding, use Noise
Substitution as the default concealment method.
Signed-off-by: Omer Osman <omer.osman@iis.fraunhofer.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|
|
|
|
|
|
| |
Per the specification, "The red and green components of a pixel
define the meta Huffman code used in a particular block of the ARGB
image."
|