| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
|
|
|
|
|
|
|
|
|
| |
Based on a patch from Christophe Gisquet.
Unrolling of the m == 0 case avoids a possible use of the uninitilized
value sum when s->predictor_history is not set. I failed to find a
sample for it. It also reduced the cycle count from 220 to 150 on
sandy bridge, x86_64 linux, gcc 4.8.2 compared to his patch.
|
|
|
|
|
|
|
| |
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
| |
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
|
|
|
|
|
| |
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
|
|
|
|
|
|
|
| |
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
|
|
|
|
|
|
| |
When downmixing 2.1 to 2-channel, if the 2.0 portion is Lt/Rt, sum-difference or dual mono, the actual output will be the same (with the LFE either mixed-in or discarded).
Also, when downmixing an arbitrary layout to 2-channel, if the bitstream contains custom downmix coefficients targeting Lt/Rt, then the output will be Lt/Rt rather than regular Stereo.
|
|
|
|
|
| |
Signed-off-by: Tim Walker <tdskywalker@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
| |
Based on a patch by Michael Niedermayer.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
| |
This supplements the deprecated request_channels-based control of XCh decoding.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
|
|
| |
The check for (prim_channels > 2) before calling dca_downmix made these
cases unreachable, but now 2.1 layouts will go through the downmix code.
Having dual mono, Lt/Rt and sum-difference layouts print errors when
regular Stereo doesn't seems pointless.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
| |
Embedded downmix coefficients can use this.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
| |
As per ETSI TS 102 114 V1.4.1 specification.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
| |
It was based on an old, seemingly incorrect specification, so default
coefficients were always used anyway.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
| |
Easier to read, modify, and avoids relying on an outdated table.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
| |
The 7-bit codes previously used are absent from the ETSI 102 114 V1.4.1 spec.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
| |
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
|
|
|
|
|
|
| |
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
| |
This does most of the work formerly carried out by
the static function qmf_32_subbands() in dcadec.c.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
| |
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
|
|
|
|
|
| |
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
|
|
|
|
| |
This makes the DCA parser and decoder independent.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
It will be useful in the upcoming transition to refcounted AVFrames.
|
|
|
|
|
|
|
|
|
|
| |
When the extra rear channel is present but unused, the
s->channel_order_tab[] value for that channel is -1. The QMF can be
skipped for the extra channel, and doing so avoids an out-of-array read
on s->samples_chanptr[].
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
|
|
|
|
| |
Also reorder some other #include when applicable.
|
| |
|
|
|
|
|
| |
The output AVFrame buffer only has data for the downmix channels.
Fixes a segfault when decoding dca with request_channels == 2.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
This will allow adding dca.c with tables used from other files.
|