path: root/libavcodec/x86
Commit message (Author, Age)
* x86: hevc: remove a parameter to WP internals (Christophe Gisquet, 2015-02-14)
| The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to get the value in bytes).
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
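As a rough illustration of the stride remark above (not code from the commit): if the intermediate buffer stores 16-bit samples in rows of MAX_PB_SIZE entries, its row stride in bytes is a constant, so a callee can derive it instead of taking it as a parameter. The value 64 for MAX_PB_SIZE and both helper names are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the commit message only names the constant. */
#define MAX_PB_SIZE 64

/* Row stride of the intermediate buffer in bytes: MAX_PB_SIZE 16-bit samples. */
static size_t internal_stride_bytes(void)
{
    return MAX_PB_SIZE * sizeof(int16_t);
}

/* Hypothetical helper: address of row y, computed from the byte stride. */
static const int16_t *internal_row(const int16_t *buf, int y)
{
    return (const int16_t *)((const uint8_t *)buf +
                             (size_t)y * internal_stride_bytes());
}
```

Because the stride is fixed, passing it as an argument would only cost a register without adding information.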
* x86/hevc_mc: optimize AVX2 mc functions (James Almer, 2015-02-12)
| Before
| 40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
| After
| 37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
| Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32 (James Almer, 2015-02-12)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: make sao_band_filter work on x86_32 (James Almer, 2015-02-09)
| Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: hevc_mc: remove lea in EPEL_LOAD (Christophe Gisquet, 2015-02-08)
| The second parameter to the macro is always an immediate address, so no lea is needed.
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: fewer gpr autoloads for _v filters (Christophe Gisquet, 2015-02-08)
| In that case, it's just to load my, but mx/r3src is not used.
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/vp9dsp: fix clobbering of xmm6 on IDCT sse2 functions (James Almer, 2015-02-08)
| Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: lavc/hevc_mc: fix comments (Christophe Gisquet, 2015-02-07)
| The width parameter is now completely at the back, and actually never used. This helps understanding the actual parameter list.
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc: share more constant through defines (Christophe Gisquet, 2015-02-07)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* lavc/lossless_audiodsp: revert various commits (Christophe Gisquet, 2015-02-07)
| Their intent was to make the DSP work with wmalossless pro. The latter was fixed to work with the DSP.
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc: share more constants (Christophe Gisquet, 2015-02-06)
| Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: use aligned loads (Mickaël Raulet, 2015-02-06)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/lossless_audiodsp: fix compilation with --disable-yasm (James Almer, 2015-02-06)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: fix loading of RIP address (James Almer, 2015-02-06)
| pb_eo must be handled as a rip relative address for MSVC64, so an intermediate register is needed.
| Should fix link failures.
| Suggested by Hendrik Leppkes and Christophe Gisquet.
| Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
| Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc: use CLIPW macro when possible (Mickaël Raulet, 2015-02-06)
| Conflicts:
|     libavcodec/x86/hevc_mc.asm
| Reviewed-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: use epel_hv 16-wide function (Christophe Gisquet, 2015-02-06)
| The epel_hv functions were still relying on only epel_hv 8-wide being the maximum width instantiated.
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: add AVX2 optimizations (Pierre Edouard Lepere, 2015-02-06)
| before
| 33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
| 38138 decicycles in luma_bi_2, 523427 runs, 861 skips
| 13490 decicycles in luma_uni, 516138 runs, 8150 skips
| after
| 20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
| 24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
| 10397 decicycles in luma_uni, 515715 runs, 8573 skips
| Conflicts:
|     libavcodec/x86/hevc_mc.asm
|     libavcodec/x86/hevcdsp_init.c
| Reviewed-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar" (Michael Niedermayer, 2015-02-06)
| This reverts commit 3b4ffba3af968ae702e3a44f6b5f53445efc7363.
| Unbreaks the SSSE3 code on mingw32
| Conflicts:
|     libavcodec/x86/lossless_audiodsp.asm
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code (Michael Niedermayer, 2015-02-06)
| This is simpler and more robust, and fixes mismatched XMM save/restore
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
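A minimal sketch of what selecting a fallback in C on `order & 8` could look like, as opposed to branching between routines inside assembly. This is not the code from the commit: all function names are hypothetical, both "implementations" are plain-C stand-ins for assembly routines, and the assumption that the wide routine needs `order` to be a multiple of 16 is mine.

```c
#include <stdint.h>

/* Reference semantics: dot product of v1 and v2, then v1 += mul * v3. */
static int32_t madd_ref(int16_t *v1, const int16_t *v2,
                        const int16_t *v3, int order, int mul)
{
    int32_t res = 0;
    for (int i = 0; i < order; i++) {
        res   += v1[i] * v2[i];
        v1[i] += mul * v3[i];
    }
    return res;
}

/* Stand-in for a routine that processes 8 elements per iteration. */
static int32_t madd_narrow(int16_t *v1, const int16_t *v2,
                           const int16_t *v3, int order, int mul)
{
    return madd_ref(v1, v2, v3, order, mul);
}

/* Stand-in for a routine assumed to require order % 16 == 0. */
static int32_t madd_wide(int16_t *v1, const int16_t *v2,
                         const int16_t *v3, int order, int mul)
{
    return madd_ref(v1, v2, v3, order, mul);
}

/* The choice is made in C, so neither routine jumps into the other. */
static int32_t madd_dispatch(int16_t *v1, const int16_t *v2,
                             const int16_t *v3, int order, int mul)
{
    if (order & 8)
        return madd_narrow(v1, v2, v3, order, mul);
    return madd_wide(v1, v2, v3, order, mul);
}
```

Keeping the branch in C means each routine runs its own prologue and epilogue in full, which is one plausible way to avoid the register save/restore mismatch the message mentions.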
* avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar (Michael Niedermayer, 2015-02-06)
| This is needed as the mmx code is used as fallback from the ssse3 code
| Suggested-by: jamrial
| Tested-by: wm4
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2} (James Almer, 2015-02-05)
| Original x86 intrinsics code by Pierre-Edouard Lepere.
| Yasm port, refactoring and optimizations by James Almer.
| Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| Width 32
| 342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
| 29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
| 13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
| Width 64
| 581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
| 59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
| 28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2} (James Almer, 2015-02-05)
| Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
| Refactoring and optimizations by James Almer.
| Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| Width 32
| 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
| 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
| 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
| Width 64
| 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
| 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
| 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2 (James Almer, 2015-02-02)
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2 (James Almer, 2015-02-01)
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: hevc/sao: aligned source buffers (Christophe Gisquet, 2015-02-01)
| Useful for at least band filter, for which:
| - Band filter call only:
|              32      64
|   Before: 16556   54015
|   After:  16497   52355
| - Whole case:
|              32      64
|   Before: 37031  103008
|   After:  32045   93952
* x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2} (James Almer, 2015-02-01)
| Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
| 10/12bit yasm ports, refactoring and optimizations by James Almer
| Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| width 32
| 40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
| 8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
| 7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
| 4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
| width 64
| 136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
| 28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
| 26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
| 14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
| Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/sbrdsp: Use different mem moves (Christophe Gisquet, 2015-01-25)
| Before
| 2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips
| After
| 2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3} (James Almer, 2015-01-25)
| 2 to 2.5 times faster.
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/flacdsp: remove unneeded ifdeffery (James Almer, 2015-01-05)
| x86inc can translate r*m into a register or stack on its own
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: add SSE2/AVX pack_8ch functions (James Almer, 2014-12-30)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* vp9/x86: add myself to copyright holders for loopfilter assembly. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_16_h work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_48/84/88_h work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_44_h work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_16_v work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_48/84_v work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_88_v work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make filter_44_v work on 32-bit. (Ronald S. Bultje, 2014-12-27)
|
* vp8/x86: save one register in SIGN_ADD/SUB. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: store unpacked intermediates for filter6/14 on stack. (Ronald S. Bultje, 2014-12-27)
| filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
* vp8/x86: move variable assigned inside macro branch. (Ronald S. Bultje, 2014-12-27)
| The value is not used outside the branch.
* vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning. (Ronald S. Bultje, 2014-12-27)
|
* vp8/x86: remove unused register from ABSSUB_CMP macro. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: slightly simplify 44/48/84/88 h stores. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: make cglobal statement more conservative in register allocation. (Ronald S. Bultje, 2014-12-27)
|
* vp9/x86: save one register in loopfilter surface coverage. (Ronald S. Bultje, 2014-12-27)
|
* x86/vp9: remove duplicate function prototypes (James Almer, 2014-12-23)
| Fixes "redundant redeclaration" warnings.
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm (James Almer, 2014-12-20)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/constants: fix alignment of pw_255 (James Almer, 2014-12-19)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* vp9/x86: intra prediction sse2/32bit support. (Ronald S. Bultje, 2014-12-19)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>