path: root/libavcodec/hevcdsp.h
Commit message (Author, Age)
* lavc/aarch64: port HEVC SIMD idct NEON (Reimar Döffinger, 2021-02-18)
| Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts", running on Apple M1.
|
| Signed-off-by: Josh Dekker <josh@itanimul.li>
* lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump (Anton Khirnov, 2021-01-01)
| They are not properly namespaced and not intended for public use.
* Merge commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7' (James Almer, 2017-10-24)
|\
| | * commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7':
| |   hevc: Add NEON 4x4 and 8x8 IDCT
| |
| | [15:12:59] <@ubitux> hevc_idct_4x4_8_c: 389.1
| | [15:13:00] <@ubitux> hevc_idct_4x4_8_neon: 126.6
| | [15:13:02] <@ubitux> our ^
| | [15:13:06] <@ubitux> hevc_idct_4x4_8_c: 389.3
| | [15:13:08] <@ubitux> hevc_idct_4x4_8_neon: 107.8
| | [15:13:10] <@ubitux> hevc_idct_4x4_10_c: 418.6
| | [15:13:12] <@ubitux> hevc_idct_4x4_10_neon: 108.1
| | [15:13:14] <@ubitux> libav ^
| | [15:13:30] <@ubitux> so yeah, we can probably trash our versions here
| |
| | Merged-by: James Almer <jamrial@gmail.com>
| * hevc: Add NEON 4x4 and 8x8 IDCT (Alexandra Hájková, 2017-03-27)
| | Optimized by Martin Storsjö <martin@martin.st>.
| |
| | The speedup vs C code is around 3.2-4.4x.
| |
| | Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit 'b0e6b3f4777910d61083976aa9fc78a1e0731aae' (Clément Bœsch, 2017-04-17)
|\|
| | * commit 'b0e6b3f4777910d61083976aa9fc78a1e0731aae':
| |   hevc: ppc: Add HEVC 4x4 IDCT for PowerPC
| |
| | Merged-by: Clément Bœsch <u@pkh.me>
| * hevc: ppc: Add HEVC 4x4 IDCT for PowerPC (Alexandra Hajkova, 2016-12-12)
| | Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | Merge commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b' (Clément Bœsch, 2017-03-24)
|\|
| | * commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b':
| |   hevc: x86: Add add_residual() SIMD optimizations
| |
| | See a6af4bf64dae46356a5f91537a1c8c5f86456b37
| |
| | This merge is only cosmetics (renames, space shuffling, etc).
| |
| | The functional changes in the ASM are *not* merged:
| | - unrolling with %rep is kept
| | - ADD_RES_MMX_4_8 is left untouched: this needs investigation
| |
| | Merged-by: Clément Bœsch <u@pkh.me>
| * hevc: x86: Add add_residual() SIMD optimizations (Pierre Edouard Lepere, 2016-10-22)
| | Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>, extended by James Almer <jamrial@gmail.com>.
| |
| | Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
| * hevc: Add coefficient limiting to speed up IDCT (Mickaël Raulet, 2016-07-18)
| | Integrated to libav by Josh de Kock <josh@itanimul.li>.
| |
| | Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
| * hevc: Add DC IDCT (Mickaël Raulet, 2016-07-18)
| | Integrated to Libav by Josh de Kock <josh@itanimul.li>.
| |
| | Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
* | Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' (Clément Bœsch, 2017-01-31)
|\|
| | * commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d':
| |   hevc: Separate adding residual to prediction from IDCT
| |
| | This commit should be a noop but isn't because of the following renames:
| | - transform_add → add_residual
| | - transform_skip → dequant
| | - idct_4x4_luma → transform_4x4_luma
| |
| | Merged-by: Clément Bœsch <cboesch@gopro.com>
| * hevc: Separate adding residual to prediction from IDCT (Alexandra Hájková, 2016-07-18)
| | Based on patch 250430bf28118cf843df887e8c8b345f1c60c82d by Mickaël Raulet <mraulet@insa-rennes.fr>, integrated to Libav by Josh de Kock <josh@itanimul.li>.
| |
| | Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
| * hevcdsp: add x86 SIMD for MC (Anton Khirnov, 2015-12-05)
| |
| * hevcdsp: split the pred functions by width (Anton Khirnov, 2015-12-05)
| | This should allow for more efficient SIMD.
| * hevcdsp: split the epel functions by width (Anton Khirnov, 2015-12-05)
| | This should allow for more efficient SIMD.
| * hevcdsp: split the qpel functions by width instead of by the subpixel fraction (Anton Khirnov, 2015-12-05)
| | This should allow for more efficient SIMD.
| |
| | Keep the C versions as they are now, to allow the compiler to inline the interpolation coefficients.
| * hevcdsp: fix a function name (Anton Khirnov, 2015-08-21)
| | put_weighted_pred_avg should be put_unweighted_pred_avg, there is no weighting there.
* | Merge commit '059a934806d61f7af9ab3fd9f74994b838ea5eba' (Michael Niedermayer, 2015-07-27)
|\|
| | * commit '059a934806d61f7af9ab3fd9f74994b838ea5eba':
| |   lavc: Consistently prefix input buffer defines
| |
| | Conflicts:
| |     doc/examples/decoding_encoding.c
| |     libavcodec/4xm.c
| |     libavcodec/aac_adtstoasc_bsf.c
| |     libavcodec/aacdec.c
| |     libavcodec/aacenc.c
| |     libavcodec/ac3dec.h
| |     libavcodec/asvenc.c
| |     libavcodec/avcodec.h
| |     libavcodec/avpacket.c
| |     libavcodec/dvdec.c
| |     libavcodec/ffv1enc.c
| |     libavcodec/g2meet.c
| |     libavcodec/gif.c
| |     libavcodec/h264.c
| |     libavcodec/h264_mp4toannexb_bsf.c
| |     libavcodec/huffyuvdec.c
| |     libavcodec/huffyuvenc.c
| |     libavcodec/jpeglsenc.c
| |     libavcodec/libxvid.c
| |     libavcodec/mdec.c
| |     libavcodec/motionpixels.c
| |     libavcodec/mpeg4videodec.c
| |     libavcodec/mpegvideo.c
| |     libavcodec/noise_bsf.c
| |     libavcodec/nuv.c
| |     libavcodec/nvenc.c
| |     libavcodec/options.c
| |     libavcodec/parser.c
| |     libavcodec/pngenc.c
| |     libavcodec/proresenc_kostya.c
| |     libavcodec/qsvdec.c
| |     libavcodec/svq1enc.c
| |     libavcodec/tiffenc.c
| |     libavcodec/truemotion2.c
| |     libavcodec/utils.c
| |     libavcodec/utvideoenc.c
| |     libavcodec/vc1dec.c
| |     libavcodec/wmalosslessdec.c
| |     libavformat/adxdec.c
| |     libavformat/aiffdec.c
| |     libavformat/apc.c
| |     libavformat/apetag.c
| |     libavformat/avidec.c
| |     libavformat/bink.c
| |     libavformat/cafdec.c
| |     libavformat/flvdec.c
| |     libavformat/id3v2.c
| |     libavformat/isom.c
| |     libavformat/matroskadec.c
| |     libavformat/mov.c
| |     libavformat/mpc.c
| |     libavformat/mpc8.c
| |     libavformat/mpegts.c
| |     libavformat/mvi.c
| |     libavformat/mxfdec.c
| |     libavformat/mxg.c
| |     libavformat/nutdec.c
| |     libavformat/oggdec.c
| |     libavformat/oggparsecelt.c
| |     libavformat/oggparseflac.c
| |     libavformat/oggparseopus.c
| |     libavformat/oggparsespeex.c
| |     libavformat/omadec.c
| |     libavformat/rawdec.c
| |     libavformat/riffdec.c
| |     libavformat/rl2.c
| |     libavformat/rmdec.c
| |     libavformat/rtpdec_latm.c
| |     libavformat/rtpdec_mpeg4.c
| |     libavformat/rtpdec_qdm2.c
| |     libavformat/rtpdec_svq3.c
| |     libavformat/sierravmd.c
| |     libavformat/smacker.c
| |     libavformat/smush.c
| |     libavformat/spdifenc.c
| |     libavformat/takdec.c
| |     libavformat/tta.c
| |     libavformat/utils.c
| |     libavformat/vqf.c
| |     libavformat/westwood_vqa.c
| |     libavformat/xmv.c
| |     libavformat/xwma.c
| |     libavformat/yop.c
| |
| | Merged-by: Michael Niedermayer <michael@niedermayer.cc>
* | avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for HEVC horizontal and vertical mc functions (Shivraj Patil, 2015-04-17)
| | Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
| | Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevcdsp: ARM NEON optimized deblocking filter (Seppo Tomperi, 2015-02-05)
| | cherry picked from commit 1b9ee47d2f43b0a029a9468233626102eb1473b8
| |
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2} (James Almer, 2015-02-05)
| | Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
| | Refactoring and optimizations by James Almer.
| |
| | Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| |
| | Width 32
| | 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
| | 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
| | 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
| |
| | Width 64
| | 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
| | 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
| | 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
| |
| | Signed-off-by: James Almer <jamrial@gmail.com>
* | hevcdsp: remove compilation-time-fixed parameter from sao_edge_filter (James Almer, 2015-02-05)
| | The stride_src parameter is always 2 * MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE.
| |
| | Signed-off-by: James Almer <jamrial@gmail.com>
* | hevcdsp: replace the SAOParams struct parameter from sao_edge_filter (James Almer, 2015-02-04)
| | As with sao_band_filter, pass instead the two variables from the struct needed in the function. This simplifies writing asm optimized versions.
| |
| | Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
| | Signed-off-by: James Almer <jamrial@gmail.com>
* | hevcdsp: simplified sao_edge_filter (Seppo Tomperi, 2015-02-04)
| | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| | Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
* | hevcdsp: separated sao edge filter and pixel restore funcs (Seppo Tomperi, 2015-02-04)
| | Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| | Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
* | x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2} (James Almer, 2015-02-01)
| | Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
| | 10/12bit yasm ports, refactoring and optimizations by James Almer
| |
| | Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
| |
| | width 32
| | 40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
| | 8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
| | 7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
| | 4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
| |
| | width 64
| | 136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
| | 28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
| | 26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
| | 14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
| |
| | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| | Signed-off-by: James Almer <jamrial@gmail.com>
* | hevcdsp: replace the SAOParams struct parameter from sao_band_filter (James Almer, 2015-02-01)
| | Pass instead the two variables from the struct needed in the function. This simplifies writing asm optimized versions of the function
| |
| | Signed-off-by: James Almer <jamrial@gmail.com>
* | hevcdsp: remove unused parameter from sao_band_filter (James Almer, 2015-02-01)
| | Signed-off-by: James Almer <jamrial@gmail.com>
* | hevcdsp: remove more instances of compile-time-fixed parameters (Christophe Gisquet, 2014-08-22)
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevcdsp: remove compilation-time-fixed parameter (Christophe Gisquet, 2014-08-22)
| | The dststride parameter is always MAX_PB_SIZE.
| |
| | Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc: move MAX_PB_SIZE declaration (Christophe Gisquet, 2014-08-22)
| | Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc_deblock: change tc type (Christophe Gisquet, 2014-08-06)
| | The x86 asm expects int32_t so use that type.
| |
| | Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d' (Michael Niedermayer, 2014-07-27)
|\|
| | * commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d':
| |   hevc: SSE2 and SSSE3 loop filters
| |
| | Conflicts:
| |     libavcodec/hevcdsp.c
| |     libavcodec/hevcdsp.h
| |     libavcodec/x86/Makefile
| |     libavcodec/x86/hevc_deblock.asm
| |     libavcodec/x86/hevcdsp_init.c
| |
| | See: de7b89fd43f850d77cf24ad6ae50185dfe391e91 and several others
| | Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * hevc: SSE2 and SSSE3 loop filters (Pierre Edouard Lepere, 2014-07-26)
| | Additional contributions by James Almer <jamrial@gmail.com>, Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and Anton Khirnov <anton@khirnov.net>
| |
| | Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * hevcdsp: remove an unneeded variable in the loop filter (Anton Khirnov, 2014-07-26)
| | beta0 and beta1 will always be the same
* | x86/hevc_idct: replace old and unused idct functions (James Almer, 2014-07-26)
| | Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
| |
| | Benchmarks on an Intel Core i5-4200U:
| |
| | idct8x8_dc
| |          SSE2  MMXEXT  C
| | cycles   22    26      57
| |
| | idct16x16_dc
| |          AVX2  SSE2  C
| | cycles   27    32    249
| |
| | idct32x32_dc
| |          AVX2  SSE2  C
| | cycles   62    126   1375
| |
| | Signed-off-by: James Almer <jamrial@gmail.com>
| | Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevcdsp: change types of SAO parameters (Christophe Gisquet, 2014-07-23)
| | From openhevc
| |
| | Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevcdsp: remove an unneeded variable in the loop filter (Anton Khirnov, 2014-07-22)
| | beta0 and beta1 will always be the same within a CU
| |
| | Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
| | cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc/sao: optimze sao implementation (Mickaël Raulet, 2014-07-18)
| | - adding one extra pixel all around the frame
| | - do not copy when SAO is not applied
| |
| | 5% improvement
| |
| | cherry picked from commit 10fc29fc19a12c4d8168fbe1a954b76386db12d0
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc/rext: add support for Range extension tools (Mickaël Raulet, 2014-07-15)
| | SPS features/flags:
| | - transform_skip_rotation_enabled_flag
| | - transform_skip_context_enabled_flag
| | - implicit_rdpcm_enabled_flag
| | - explicit_rdpcm_enabled_flag
| | - intra_smoothing_disabled_flag
| | - persistent_rice_adaptation_enabled_flag
| |
| | PPS features/flags:
| | - log2_max_transform_skip_block_size
| | - cross_component_prediction_enabled_flag
| | - chroma_qp_offset_list_enabled_flag
| | - diff_cu_chroma_qp_offset_depth
| | - chroma_qp_offset_list_len_minus1
| | - cb_qp_offset_list
| | - cr_qp_offset_list
| | - log2_sao_offset_scale_luma
| | - log2_sao_offset_scale_chroma
| |
| | (cherry picked from commit 005294c5b939a23099871c6130c8a7cc331f73ee)
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc/rext: basic infrastructure for supporting range extension (Mickaël Raulet, 2014-07-15)
| | - support for 4:2:2 and 4:4:4 up to 12 bits
| | - add a new profile for range extension
| |
| | (cherry picked from commit d3c067fa65bbc871758d28aa07f54123430ca346)
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc: separate residu and prediction (needed for Range Extension) (Mickaël Raulet, 2014-07-15)
| | (cherry picked from commit 6b3856ef57d66f2e59ee61fd2eb5f83b6d0d7d4a)
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc: simplify SAO computation, delay from one row its computation (Mickaël Raulet, 2014-07-15)
| | (cherry picked from commit f2c5f647cec786df26f442a85e6d685a131a50c9)
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avcodec/hevc: new idct + asm (plepere, 2014-06-17)
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | HEVC : added assembly MC functions (plepere, 2014-05-06)
| | pretty print x86
| |
| | Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | hevc: C code update for new motion compensation (Mickaël Raulet, 2014-05-06)
| | pretty print C
| |
| | Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2013-12-22)
|\|
| | * qatar/master:
| |   hevc: move DSP declarations from hevc.h into hevcdsp.h
| |
| | Conflicts:
| |     libavcodec/hevc.h
| |     libavcodec/hevcdsp.c
| |     libavcodec/hevcdsp.h
| |
| | See: c8dd048ab8cff815c9f4b16a62db0b74df011f0a
| | Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * hevc: move DSP declarations from hevc.h into hevcdsp.h (Guillaume Martres, 2013-12-22)
| | Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* lavc: add a HEVC decoder. (Guillaume Martres, 2013-10-15)
  Initially written by Guillaume Martres <smarter@ubuntu.com> as a GSoC project. Further contributions by the OpenHEVC project and other developers, namely:

  Mickaël Raulet <mraulet@insa-rennes.fr>
  Seppo Tomperi <seppo.tomperi@vtt.fi>
  Gildas Cocherel <gildas.cocherel@laposte.net>
  Khaled Jerbi <khaled_jerbi@yahoo.fr>
  Wassim Hamidouche <wassim.hamidouche@insa-rennes.fr>
  Vittorio Giovara <vittorio.giovara@gmail.com>
  Jan Ekström <jeebjp@gmail.com>
  Anton Khirnov <anton@khirnov.net>
  Martin Storsjö <martin@martin.st>
  Luca Barbato <lu_zero@gentoo.org>
  Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>

  Signed-off-by: Anton Khirnov <anton@khirnov.net>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>