path: root/libavcodec/x86
Commit message [Author, Age]
* x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8 [James Almer, 2015-03-01]
|   Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b' [Michael Niedermayer, 2015-03-01]
|\
| |   * commit '71f1ad37d858b810b71a4af1c25771beaa50b27b':
| |     lavc: do not compile fmtconvert unconditionally
| |   Conflicts:
| |     configure
| |     libavcodec/ppc/Makefile
| |     libavcodec/x86/Makefile
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * lavc: do not compile fmtconvert unconditionally [Anton Khirnov, 2015-02-28]
| |   Only ac3dec and dcadec use it.
* | Merge commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca' [Michael Niedermayer, 2015-02-28]
|\|
| |   * commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca':
| |     fmtconvert: drop unused functions
| |   Conflicts:
| |     libavcodec/arm/fmtconvert_vfp_armv6.S
| |     libavcodec/x86/fmtconvert.asm
| |     libavcodec/x86/fmtconvert_init.c
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * fmtconvert: drop unused functions [Anton Khirnov, 2015-02-28]
| |
| * hevc_deblock: Fix compilation with nasm [Carl Eugen Hoyos, 2015-02-22]
| |   CC: libav-stable@libav.org
| |   Bug-Id: 795
| |   Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* | avcodec/v210dec: Add ff prefix to v210_x86_init() [Michael Niedermayer, 2015-02-27]
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avcodec/snow: mark dwt init as av_cold [Michael Niedermayer, 2015-02-27]
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avcodec/x86/mlpdsp_init: Simplify mlp_filter_channel_x86() [Michael Niedermayer, 2015-02-21]
| |   Based on patch by Francisco Blas Izquierdo Riera
| |   Commit message partly taken from carl
| |
| |   Fixes a compilation error in mlpdsp_init.c with -fstack-check and some gcc
| |   compilers (I reproduced the issue with gcc 4.7.3) by simplifying the code.
| |
| |   See also https://bugs.gentoo.org/show_bug.cgi?id=471756
| |
| |   $ make libavcodec/x86/mlpdsp_init.o
| |   libavcodec/x86/mlpdsp_init.c: In function ‘mlp_filter_channel_x86’:
| |   libavcodec/x86/mlpdsp_init.c:142:5: error: can’t find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
| |   libavcodec/x86/mlpdsp_init.c:142:5: error: ‘asm’ operand has impossible constraints
| |
| |   4551 -> 4509 decicycles
| |
| |   Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: hevc_mc: fewer xmm regs used in epel h/v [Christophe Gisquet, 2015-02-17]
| |   11 xmm regs seem only required for avx2.
| |   Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: hevc_mc: save 1 gpr in epel filter loading [Christophe Gisquet, 2015-02-16]
| |   The 3*stride value stored in r3src can be loaded much later, so use
| |   r3src instead of a dedicated gpr when possible.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/g722dsp: add ff_g722_apply_qmf_sse2 [James Almer, 2015-02-16]
| |   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86: hevc: remove a parameter to WP internals [Christophe Gisquet, 2015-02-14]
| |   The second stride is always the internal buffer one, MAX_PB_SIZE
| |   (times 2 to get the value in bytes).
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/hevc_mc: optimize AVX2 mc functions [James Almer, 2015-02-12]
| |   Before
| |     40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
| |   After
| |     37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32 [James Almer, 2015-02-12]
| |   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevc_sao: make sao_band_filter work on x86_32 [James Almer, 2015-02-09]
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86: hevc_mc: remove lea in EPEL_LOAD [Christophe Gisquet, 2015-02-08]
| |   The second parameter to the macro is always an immediate address, so no
| |   lea is needed.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: hevc_mc: fewer gpr autoloads for _v filters [Christophe Gisquet, 2015-02-08]
| |   In that case, it's just to load my, but mx/r3src is not used.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/vp9dsp: fix clobbering of xmm6 on IDCT sse2 functions [James Almer, 2015-02-08]
| |   Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86: lavc/hevc_mc: fix comments [Christophe Gisquet, 2015-02-07]
| |   The width parameter is now completely at the back, and actually never
| |   used. This helps understanding the actual parameter list.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: lavc: share more constant through defines [Christophe Gisquet, 2015-02-07]
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | lavc/lossless_audiodsp: revert various commits [Christophe Gisquet, 2015-02-07]
| |   Their intent was to make the DSP work with wmalossless pro. The latter
| |   was fixed to work with the DSP.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: lavc: share more constants [Christophe Gisquet, 2015-02-06]
| |   Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/hevc_mc: use aligned loads [Mickaël Raulet, 2015-02-06]
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/lossless_audiodsp: fix compilation with --disable-yasm [James Almer, 2015-02-06]
| |   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevc_sao: fix loading of RIP address [James Almer, 2015-02-06]
| |   pb_eo must be handled as a rip relative address for MSVC64, so an
| |   intermediate register is needed.
| |   Should fix link failures.
| |   Suggested by Hendrik Leppkes and Christophe Gisquet.
| |   Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevc: use CLIPW macro when possible [Mickaël Raulet, 2015-02-06]
| |   Conflicts:
| |     libavcodec/x86/hevc_mc.asm
| |   Reviewed-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: hevc_mc: use epel_hv 16-wide function [Christophe Gisquet, 2015-02-06]
| |   The epel_hv functions still relied on epel_hv 8-wide being the maximum
| |   width instantiated.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: hevc_mc: add AVX2 optimizations [Pierre Edouard Lepere, 2015-02-06]
| |   before
| |     33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
| |     38138 decicycles in luma_bi_2, 523427 runs, 861 skips
| |     13490 decicycles in luma_uni, 516138 runs, 8150 skips
| |   after
| |     20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
| |     24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
| |     10397 decicycles in luma_uni, 515715 runs, 8573 skips
| |   Conflicts:
| |     libavcodec/x86/hevc_mc.asm
| |     libavcodec/x86/hevcdsp_init.c
| |   Reviewed-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar" [Michael Niedermayer, 2015-02-06]
| |   This reverts commit 3b4ffba3af968ae702e3a44f6b5f53445efc7363.
| |   Unbreaks the SSSE3 code on mingw32
| |   Conflicts:
| |     libavcodec/x86/lossless_audiodsp.asm
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code [Michael Niedermayer, 2015-02-06]
| |   This is simpler and more robust, and fixes mismatched XMM save/restore.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar [Michael Niedermayer, 2015-02-06]
| |   This is needed as the mmx code is used as fallback from the ssse3 code
| |   Suggested-by: jamrial
| |   Tested-by: wm4
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2} [James Almer, 2015-02-05]
| |   Original x86 intrinsics code by Pierre-Edouard Lepere.
| |   Yasm port, refactoring and optimizations by James Almer.
| |   Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| |   Width 32
| |     342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
| |     29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
| |     13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
| |   Width 64
| |     581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
| |     59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
| |     28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2} [James Almer, 2015-02-05]
| |   Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
| |   Refactoring and optimizations by James Almer.
| |   Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| |   Width 32
| |     158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
| |     5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
| |     2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
| |   Width 64
| |     705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
| |     19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
| |     10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2 [James Almer, 2015-02-02]
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2 [James Almer, 2015-02-01]
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86: hevc/sao: aligned source buffers [Christophe Gisquet, 2015-02-01]
| |   Useful for at least band filter, for which:
| |   - Band filter call only:
| |              32     64
| |     Before:  16556  54015
| |     After:   16497  52355
| |   - Whole case:
| |              32     64
| |     Before:  37031  103008
| |     After:   32045  93952
* | x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2} [James Almer, 2015-02-01]
| |   Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
| |   10/12bit yasm ports, refactoring and optimizations by James Almer
| |   Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| |   width 32
| |     40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
| |     8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
| |     7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
| |     4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
| |   width 64
| |     136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
| |     28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
| |     26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
| |     14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/sbrdsp: Use different mem moves [Christophe Gisquet, 2015-01-25]
| |   Before
| |     2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips
| |   After
| |     2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3} [James Almer, 2015-01-25]
| |   2 to 2.5 times faster.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/flacdsp: remove unneeded ifdeffery [James Almer, 2015-01-05]
| |   x86inc can translate r*m into a register or stack on its own
| |   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/swr: add SSE2/AVX pack_8ch functions [James Almer, 2014-12-30]
| |   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| |   Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | vp9/x86: add myself to copyright holders for loopfilter assembly. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_16_h work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_48/84/88_h work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_44_h work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_16_v work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_48/84_v work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_88_v work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |
* | vp9/x86: make filter_44_v work on 32-bit. [Ronald S. Bultje, 2014-12-27]
| |