| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
The only systems which benefit from ff_lfe_fir0_float_sse are truely
ancient 32bit x86s as all other systems use at least the SSE2 versions
(this includes all x64 cpus (which is why this code is restricted
to x86-32)).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
| |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\
| |
| |
| |
| |
| |
| | |
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| |
| | |
They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| |
| |
| |
| | |
This fixes compilation failures with --disable-fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
~10% faster than the SSE version.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
Fixes compilation failures with "--disable-{avx,fma3} --disable-optimizations"
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
alternatively the call could be put under #if or the #if
over the function removed
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3':
x86/synth_filter: add synth_filter_fma3
x86/synth_filter: add synth_filter_avx
x86/synth_filter: add synth_filter_sse
Conflicts:
libavcodec/x86/dcadsp.asm
libavcodec/x86/dcadsp_init.c
See: 64672098361361cd15d37e36f747ab44de5b80ca
See: 68c3ed936a76c3ff7738f602fa90237ac7e3ce08
See: 7fd64e3e36f79204c0eda7cacce6884c14ddc1fb
See: aa1f38015cb0d04a5c50a8957dd7aba79f0d8882
See: dfd865e51b890d9be394804bccddf55198f4a251
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx
Also switch some instructions to a three operand format to avoid
assembly errors with Yasm 1.1.0 or older.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| |
| |
| |
| | |
Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '3bfdee00cd92ff07c364d4901c4aefda32780756':
x86: dcadsp: Fix linking with yasm and optimizations disabled
Conflicts:
libavcodec/x86/dcadsp_init.c
See: 206167a295a5c28cec3c38f7308835b0b7e0618f
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
Should fix compilation failures with --disable-yasm on some compilers
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '4cb6964244fd6c099383d8b7e99731e72cc844b9':
dcadec: simplify decoding of VQ high frequencies
Conflicts:
configure
libavcodec/dcadec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '08e3ea60ff4059341b74be04a428a38f7c3630b0':
x86: synth filter float: implement SSE2 version
Conflicts:
libavcodec/x86/dcadsp.asm
libavcodec/x86/dcadsp_init.c
See: 2cdbcc004837ce092a14f326f24d97a29512a2c3
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|/
|
|
|
|
|
| |
* commit '5b59a9fc6152169599561f04b4f66370edda5c9c':
x86: dcadsp: implement int8x8_fmul_int32
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|