path: root/libavcodec/x86/hevc_mc.asm
Commit message | Author | Age
* x86inc: Drop SECTION_TEXT macro | Henrik Gramner | 2015-08-04
  The .text section is already 16-byte aligned by default on all supported platforms, so `SECTION_TEXT` isn't any different from `SECTION .text`.
* avcodec/x86: add missing colon to labels | James Almer | 2015-07-26
  Silences warnings with Nasm.
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86: hevc_mc: fewer xmm regs used in epel h/v | Christophe Gisquet | 2015-02-17
  11 xmm regs seem only required for avx2.
  Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: save 1 gpr in epel filter loading | Christophe Gisquet | 2015-02-16
  The 3*stride value stored in r3src can be loaded much later, so use r3src instead of a dedicated gpr when possible.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc: remove a parameter to WP internals | Christophe Gisquet | 2015-02-14
  The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to get the value in bytes).
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: optimize AVX2 mc functions | James Almer | 2015-02-12
  Before
  40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
  After
  37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
  Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86: hevc_mc: remove lea in EPEL_LOAD | Christophe Gisquet | 2015-02-08
  The second parameter to the macro is always an immediate address, so no lea is needed.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: fewer gpr autoloads for _v filters | Christophe Gisquet | 2015-02-08
  In that case, it's just to load my, but mx/r3src is not used.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc/hevc_mc: fix comments | Christophe Gisquet | 2015-02-07
  The width parameter is now completely at the back, and actually never used. This helps in understanding the actual parameter list.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc: share more constant through defines | Christophe Gisquet | 2015-02-07
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc: share more constants | Christophe Gisquet | 2015-02-06
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: use aligned loads | Mickaël Raulet | 2015-02-06
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc: use CLIPW macro when possible | Mickaël Raulet | 2015-02-06
  Conflicts:
      libavcodec/x86/hevc_mc.asm
  Reviewed-by: James Almer <jamrial@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: add AVX2 optimizations | Pierre Edouard Lepere | 2015-02-06
  before
  33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
  38138 decicycles in luma_bi_2, 523427 runs, 861 skips
  13490 decicycles in luma_uni, 516138 runs, 8150 skips
  after
  20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
  24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
  10397 decicycles in luma_uni, 515715 runs, 8573 skips
  Conflicts:
      libavcodec/x86/hevc_mc.asm
      libavcodec/x86/hevcdsp_init.c
  Reviewed-by: James Almer <jamrial@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/x86/hevc_mc: fix sse register counts | Michael Niedermayer | 2014-12-11
  These fix failures of --enable-xmm-clobber-test.
  It would be better to change the code to use fewer registers, but until someone does, the used register count must not be too small.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER | Michael Niedermayer | 2014-12-10
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc: get rid off packusdw for ssse3 compatibility | Mickaël Raulet | 2014-10-04
  cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2
  Fixes out of array access
  Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit
  Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: correct unneeded use of SSE4 code | Christophe Gisquet | 2014-08-24
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevcdsp: use compilation-time-fixed constant | Christophe Gisquet | 2014-08-22
  The stride for some buffers is known.
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* hevcdsp: remove more instances of compile-time-fixed parameters | Christophe Gisquet | 2014-08-22
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* hevcdsp: remove compilation-time-fixed parameter | Christophe Gisquet | 2014-08-22
  The dststride parameter is always MAX_PB_SIZE.
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: assume 2nd source stride is 64 | Christophe Gisquet | 2014-08-22
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12} | James Almer | 2014-08-04
  Signed-off-by: James Almer <jamrial@gmail.com>
  Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: remove an unnecessary pxor | James Almer | 2014-08-04
  Signed-off-by: James Almer <jamrial@gmail.com>
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: fix register count usage | Christophe Gisquet | 2014-07-29
  A macro was using a fixed register, causing too many GPRs to be declared as used.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: load less data in epel filters | Christophe Gisquet | 2014-07-27
  Before:
  5679 decicycles in epel_bi, 2059976 runs, 37176 skips
  3468 decicycles in epel_uni, 1040886 runs, 7690 skips
  After:
  5323 decicycles in epel_bi, 2059493 runs, 37659 skips
  3262 decicycles in epel_uni, 1040871 runs, 7705 skips
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: replace one lea by add | Christophe Gisquet | 2014-07-27
  Should have been in 036f11bdb565.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: replace simple leas by adds | Christophe Gisquet | 2014-07-26
  lea is detrimental for those simple cases. No impact overall to the change though.
  Before:
  15017 decicycles in q, 1016152 runs, 32424 skips
  15382 decicycles in q_bi, 1013673 runs, 34903 skips
  3713 decicycles in e, 2074534 runs, 22618 skips
  3901 decicycles in e_bi, 2065509 runs, 31643 skips
  7852 decicycles in q_uni, 520165 runs, 4123 skips
  2398 decicycles in e_uni, 1043339 runs, 5237 skips
  After:
  14898 decicycles in q, 1016295 runs, 32281 skips
  15119 decicycles in q_bi, 1015392 runs, 33184 skips
  3682 decicycles in e, 2073224 runs, 23928 skips
  3720 decicycles in e_bi, 2065043 runs, 32109 skips
  7643 decicycles in q_uni, 520280 runs, 4008 skips
  2363 decicycles in e_uni, 1043780 runs, 4796 skips
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc: add 12bits support for MC | Mickaël Raulet | 2014-07-26
  cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: remove unneeded shift | Christophe Gisquet | 2014-06-01
  The immediate value may be 0.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: better register allocation | Christophe Gisquet | 2014-05-28
  The xmm reg count was incorrect, and manually loading the gprs furthermore allows noticeably reducing the number needed.
  The modified functions are used in weighted prediction, so only a few samples like WP_* exhibit a change. For this one and Win64 (some widths removed because of too few occurrences):
  WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w
            16    32
  before: 2194  3872
  after:  2119  3767
  WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w
            16    32    64
  before: 2819  4960  9396
  after:  2617  4788  9150
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: specify coefficients registers | Christophe Gisquet | 2014-05-18
  By default, macro EPEL_FILTER loads the coefficients unconditionally into m14/m15. This forces an unneeded higher register count. Reduce that count by making them parameters of EPEL_FILTER.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* hevcdsp: correctly indicate that hevc_put_hevc_bi_epel_h uses 9 GPRs | Hendrik Leppkes | 2014-05-12
  Fixes FATE on Windows.
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* HEVC : added assembly MC functions | plepere | 2014-05-06
  pretty print x86
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
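
Several entries in this log quote before/after decicycle counts without stating the relative gain. As a rough aid for reading them, the sketch below computes the percentage reduction for the figures quoted in the 2015-02-06 "add AVX2 optimizations" commit. The helper name `reduction_percent` is hypothetical and not part of FFmpeg; only the decicycle counts come from the log, and the run/skip counts are ignored.

```python
# Decicycle counts copied from the "add AVX2 optimizations" commit message
# (lower is faster). Run and skip counts are deliberately left out.
before = {"luma_bi_1": 33304, "luma_bi_2": 38138, "luma_uni": 13490}
after = {"luma_bi_1": 20185, "luma_bi_2": 24620, "luma_uni": 10397}


def reduction_percent(old: int, new: int) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return 100.0 * (old - new) / old


for name in before:
    gain = reduction_percent(before[name], after[name])
    print(f"{name}: {gain:.1f}% fewer decicycles")
```

The same helper can be applied to any other before/after pair in the log, for example the epel or qpel figures from the 2014-07 commits.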