summaryrefslogtreecommitdiff
path: root/libavcodec/x86/hevc_sao.asm
Commit message (Collapse)AuthorAge
* avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64James Almer2018-01-11
| | | | | | | | | | AVX-512 support has been introduced, and even if no functions currently use zmm registers (able to load as much as 64 bytes of consecutive data per instruction), they will be added eventually. Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: move 10/12bit functions into a separate fileJames Almer2015-09-30
| | | | | Tested-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
* x86inc: Drop SECTION_TEXT macroHenrik Gramner2015-08-04
| | | | | The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
* Merge commit '059a934806d61f7af9ab3fd9f74994b838ea5eba'Michael Niedermayer2015-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit '059a934806d61f7af9ab3fd9f74994b838ea5eba': lavc: Consistently prefix input buffer defines Conflicts: doc/examples/decoding_encoding.c libavcodec/4xm.c libavcodec/aac_adtstoasc_bsf.c libavcodec/aacdec.c libavcodec/aacenc.c libavcodec/ac3dec.h libavcodec/asvenc.c libavcodec/avcodec.h libavcodec/avpacket.c libavcodec/dvdec.c libavcodec/ffv1enc.c libavcodec/g2meet.c libavcodec/gif.c libavcodec/h264.c libavcodec/h264_mp4toannexb_bsf.c libavcodec/huffyuvdec.c libavcodec/huffyuvenc.c libavcodec/jpeglsenc.c libavcodec/libxvid.c libavcodec/mdec.c libavcodec/motionpixels.c libavcodec/mpeg4videodec.c libavcodec/mpegvideo.c libavcodec/noise_bsf.c libavcodec/nuv.c libavcodec/nvenc.c libavcodec/options.c libavcodec/parser.c libavcodec/pngenc.c libavcodec/proresenc_kostya.c libavcodec/qsvdec.c libavcodec/svq1enc.c libavcodec/tiffenc.c libavcodec/truemotion2.c libavcodec/utils.c libavcodec/utvideoenc.c libavcodec/vc1dec.c libavcodec/wmalosslessdec.c libavformat/adxdec.c libavformat/aiffdec.c libavformat/apc.c libavformat/apetag.c libavformat/avidec.c libavformat/bink.c libavformat/cafdec.c libavformat/flvdec.c libavformat/id3v2.c libavformat/isom.c libavformat/matroskadec.c libavformat/mov.c libavformat/mpc.c libavformat/mpc8.c libavformat/mpegts.c libavformat/mvi.c libavformat/mxfdec.c libavformat/mxg.c libavformat/nutdec.c libavformat/oggdec.c libavformat/oggparsecelt.c libavformat/oggparseflac.c libavformat/oggparseopus.c libavformat/oggparsespeex.c libavformat/omadec.c libavformat/rawdec.c libavformat/riffdec.c libavformat/rl2.c libavformat/rmdec.c libavformat/rtpdec_latm.c libavformat/rtpdec_mpeg4.c libavformat/rtpdec_qdm2.c libavformat/rtpdec_svq3.c libavformat/sierravmd.c libavformat/smacker.c libavformat/smush.c libavformat/spdifenc.c libavformat/takdec.c libavformat/tta.c libavformat/utils.c libavformat/vqf.c libavformat/westwood_vqa.c libavformat/xmv.c libavformat/xwma.c libavformat/yop.c Merged-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/x86: add missing colon to labelsJames Almer2015-07-26
| | | | | | Silences warnings with Nasm Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8James Almer2015-03-01
| | | | | | Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com> Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32James Almer2015-02-12
| | | | | | Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevc_sao: make sao_band_filter work on x86_32James Almer2015-02-09
| | | | | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86: lavc: share more constantsChristophe Gisquet2015-02-06
| | | | | Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_sao: fix loading of RIP addressJames Almer2015-02-06
| | | | | | | | | | | pb_eo must be handled as a rip relative address for MSVC64, so an intermediate register is needed. Should fix link failures. Suggested by Hendrik Leppkes and Christophe Gisquet. Tested-By: Hendrik Leppkes <h.leppkes@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}James Almer2015-02-05
| | | | | | | | | | | | | | | | | | | Original x86 intrinsics code by Pierre-Edouard Lepere. Yasm port, refactoring and optimizations by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips 29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips 13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips Width 64 581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips 59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips 28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}James Almer2015-02-05
| | | | | | | | | | | | | | | | | | | Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere. Refactoring and optimizations by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips Width 64 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2James Almer2015-02-02
| | | | Signed-off-by: James Almer <jamrial@gmail.com>
* x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2James Almer2015-02-01
| | | | Signed-off-by: James Almer <jamrial@gmail.com>
* x86: hevc/sao: aligned source buffersChristophe Gisquet2015-02-01
| | | | | | | | | | | | Usefull for at least band filter, for which: - Band filter call only: 32 64 Before: 16556 54015 After: 16497 52355 - Whole case: 32 64 Before: 37031 103008 After: 32045 93952
* x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}James Almer2015-02-01
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere. 10/12bit yasm ports, refactoring and optimizations by James Almer Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U width 32 40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips 8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips 7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips 4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips width 64 136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips 28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips 26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips 14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>