path: root/libavcodec/x86/dcadsp.asm
Commit message (Author, Age)
* avcodec/x86/dcadsp: Remove obsolete SSE function (Andreas Rheinhardt, 2022-06-22)
|     The only systems which benefit from ff_lfe_fir0_float_sse are truly ancient 32-bit x86s, as all other systems use at least the SSE2 versions (this includes all x64 CPUs, which is why this code is restricted to x86-32).
|     Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 (James Almer, 2016-07-05)
|     About 10% faster.
|     Signed-off-by: James Almer <jamrial@gmail.com>
* x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} (James Almer, 2016-02-22)
|     Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|     Signed-off-by: James Almer <jamrial@gmail.com>
* x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} (James Almer, 2016-02-06)
|     Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
|     Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
|     Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/dca: remove old decoder (foo86, 2016-01-31)
|     Remove all files and functions which are not going to be reused, and temporarily disable all functions and FATE tests which will be.
* avcodec/synth_filter: split off remaining code from dcadec files (James Almer, 2016-01-25)
|     Signed-off-by: James Almer <jamrial@gmail.com>
* Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be' (Hendrik Leppkes, 2016-01-02)
|\
| |     * commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
| |       dca: remove unused decode_hf function and quant_d tables
| |     Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
| * dca: remove unused decode_hf function and quant_d tables (Alexandra Hájková, 2015-12-24)
| |     They were superseded with their integer equivalents.
| |     Rename integer decode_hf to decode_hf.
| * x86inc: Drop SECTION_TEXT macro (Henrik Gramner, 2015-08-11)
| |     The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86: dcadsp: Avoid SSE2 instructions in SSE functions (Henrik Gramner, 2015-08-11)
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86/synth_filter: remove the fma3 version ifdefs (James Almer, 2014-04-13)
| |     This fixes compilation failures with --disable-fma3
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
* | x86inc: Drop SECTION_TEXT macro (Henrik Gramner, 2015-08-04)
| |     The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
* | avcodec/x86: add missing colon to labels (James Almer, 2015-07-26)
| |     Silences warnings with Nasm
| |     Signed-off-by: James Almer <jamrial@gmail.com>
* | dcadsp: fix SSE code to not use SSE2 instructions. (Hendrik Leppkes, 2014-04-06)
| |     movq from SSE register to memory is an SSE2 instruction. Instead, use SSE movlps, which does the same thing.
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/dcadsp: add ff_dca_lfe_fir0_fma3 (James Almer, 2014-04-05)
| |     ~10% faster than the SSE version.
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/synth_filter: compile avx and fma3 functions unconditionally (James Almer, 2014-04-05)
| |     Fixes compilation failures with "--disable-{avx,fma3} --disable-optimizations"
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3' (Michael Niedermayer, 2014-04-04)
|\|
| |     * commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3':
| |       x86/synth_filter: add synth_filter_fma3
| |       x86/synth_filter: add synth_filter_avx
| |       x86/synth_filter: add synth_filter_sse
| |     Conflicts:
| |       libavcodec/x86/dcadsp.asm
| |       libavcodec/x86/dcadsp_init.c
| |     See: 64672098361361cd15d37e36f747ab44de5b80ca
| |     See: 68c3ed936a76c3ff7738f602fa90237ac7e3ce08
| |     See: 7fd64e3e36f79204c0eda7cacce6884c14ddc1fb
| |     See: aa1f38015cb0d04a5c50a8957dd7aba79f0d8882
| |     See: dfd865e51b890d9be394804bccddf55198f4a251
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86/synth_filter: add synth_filter_fma3 (James Almer, 2014-04-04)
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86/synth_filter: add synth_filter_avx (James Almer, 2014-04-04)
| |     Sandy Bridge Win64:
| |       180 cycles in ff_synth_filter_inner_sse2
| |       150 cycles in ff_synth_filter_inner_avx
| |     Also switch some instructions to a three operand format to avoid assembly errors with Yasm 1.1.0 or older.
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86/synth_filter: add synth_filter_sse (James Almer, 2014-04-04)
| |     Build only on x86_32 targets.
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
* | x86/synth_filter: remove the main loop when it's not needed (Christophe Gisquet, 2014-04-04)
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/synth_filter: improve FMA version (James Almer, 2014-03-17)
| |     Replace mulps+subps with fnmaddps, resulting in two fewer instructions inside the inner loops.
| |     About 1% faster FMA3 performance.
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/synth_filter: add synth_filter_fma3 (James Almer, 2014-03-05)
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/synth_filter: Revert the switch to float ops with SSE2 (James Almer, 2014-03-02)
| |     This reverts the changes that commits 64672098361361cd15d37e36f747ab44de5b80ca and 68c3ed936a76c3ff7738f602fa90237ac7e3ce08 made to the SSE2 version, which generated a hit of about 5 cycles.
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/synth_filter: add synth_filter_avx (James Almer, 2014-03-02)
| |     Sandy Bridge Win64:
| |       180 cycles on ff_synth_filter_inner_sse2
| |       150 cycles on ff_synth_filter_inner_avx
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/synth_filter: add synth_filter_sse (James Almer, 2014-03-01)
| |     Build only on x86_32 targets.
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '4cb6964244fd6c099383d8b7e99731e72cc844b9' (Michael Niedermayer, 2014-02-28)
|\|
| |     * commit '4cb6964244fd6c099383d8b7e99731e72cc844b9':
| |       dcadec: simplify decoding of VQ high frequencies
| |     Conflicts:
| |       configure
| |       libavcodec/dcadec.c
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * dcadec: simplify decoding of VQ high frequencies (Christophe Gisquet, 2014-02-28)
| |     The vector dequantization has a test in a loop preventing effective SIMD implementation. By moving it out of the loop, this loop can be DSPized.
| |     Therefore, modify the current DSP implementation. In particular, the DSP implementation no longer has to handle null loop sizes.
| |     The decode_hf implementations have the following timings:
| |     For x86 Arrandale:
| |                 C   SSE  SSE2  SSE4
| |       win32:  260   162   119   104
| |       win64:  242   N/A    89    72
| |     The arm NEON optimizations follow in a later patch as external asm. The now unused check for the y modifier in arm inline asm is removed from configure.
* | Merge commit '08e3ea60ff4059341b74be04a428a38f7c3630b0' (Michael Niedermayer, 2014-02-28)
|\|
| |     * commit '08e3ea60ff4059341b74be04a428a38f7c3630b0':
| |       x86: synth filter float: implement SSE2 version
| |     Conflicts:
| |       libavcodec/x86/dcadsp.asm
| |       libavcodec/x86/dcadsp_init.c
| |     See: 2cdbcc004837ce092a14f326f24d97a29512a2c3
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: synth filter float: implement SSE2 version (Christophe Gisquet, 2014-02-28)
| |     Timings for Arrandale:
| |                  C   SSE
| |       win32:  2108   334
| |       win64:  1152   322
| |     Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with the jmp destination being aligned.
| |     Unrolling for ARCH_X86_64 is a 20 cycles gain.
| |     Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* | x86: synth filter float: implement SSE2 version (Christophe Gisquet, 2014-02-28)
| |     Timings for Arrandale:
| |                  C   SSE
| |       win32:  2108   334
| |       win64:  1152   322
| |     Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with the jmp destination being aligned.
| |     Unrolling for ARCH_X86_64 is a 20 cycles gain.
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit 'ad507d7907457e678900bac132122ba7be4644cb' (Michael Niedermayer, 2014-02-28)
|\|
| |     * commit 'ad507d7907457e678900bac132122ba7be4644cb':
| |       x86: dcadsp: implement SSE lfe_dir
| |     Conflicts:
| |       libavcodec/x86/dcadsp.asm
| |     See: 169243112c1e310d90c030fb258092f6d2e46117
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: dcadsp: implement SSE lfe_dir (Christophe Gisquet, 2014-02-28)
| |     Results for Arrandale/Windows:
| |       32: 1670 -> 316
| |       64:  728 -> 298
| |     Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* | x86: dcadsp: implement SSE lfe_dir (Christophe Gisquet, 2014-02-28)
| |     Results for Arrandale/Windows:
| |       32: 1670 -> 316
| |       64:  728 -> 298
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '5b59a9fc6152169599561f04b4f66370edda5c9c' (Michael Niedermayer, 2014-02-08)
|/
|     * commit '5b59a9fc6152169599561f04b4f66370edda5c9c':
|       x86: dcadsp: implement int8x8_fmul_int32
|     Merged-by: Michael Niedermayer <michaelni@gmx.at>
* x86: dcadsp: implement int8x8_fmul_int32 (Christophe Gisquet, 2014-02-07)
      For the callable function (as opposed to the inline one):
                 C   SSE  SSE2  SSE4
        Win32:  47    42    29    26
        Win64:  30    33    25    23
      The SSE version is neither compiled nor set for ARCH_X86_64, as the inlinable function takes over.
      Signed-off-by: Janne Grunau <janne-libav@jannau.net>