| Commit message (Collapse) | Author | Age |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7':
hevc: Add NEON 4x4 and 8x8 IDCT
[15:12:59] <@ubitux> hevc_idct_4x4_8_c: 389.1
[15:13:00] <@ubitux> hevc_idct_4x4_8_neon: 126.6
[15:13:02] <@ubitux> our ^
[15:13:06] <@ubitux> hevc_idct_4x4_8_c: 389.3
[15:13:08] <@ubitux> hevc_idct_4x4_8_neon: 107.8
[15:13:10] <@ubitux> hevc_idct_4x4_10_c: 418.6
[15:13:12] <@ubitux> hevc_idct_4x4_10_neon: 108.1
[15:13:14] <@ubitux> libav ^
[15:13:30] <@ubitux> so yeah, we can probably trash our versions here
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Optimized by Martin Storsjö <martin@martin.st>.
The speedup vs C code is around 3.2-4.4x.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'b0e6b3f4777910d61083976aa9fc78a1e0731aae':
hevc: ppc: Add HEVC 4x4 IDCT for PowerPC
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| | |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b':
hevc: x86: Add add_residual() SIMD optimizations
See a6af4bf64dae46356a5f91537a1c8c5f86456b37
This merge is only cosmetics (renames, space shuffling, etc).
The functionnal changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| |
| | |
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
|
| |
| |
| |
| |
| |
| | |
Integrated to libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
|
| |
| |
| |
| |
| |
| | |
Integrated to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d':
hevc: Separate adding residual to prediction from IDCT
This commit should be a noop but isn't because of the following renames:
- transform_add → add_residual
- transform_skip → dequant
- idct_4x4_luma → transform_4x4_luma
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Based on patch 250430bf28118cf843df887e8c8b345f1c60c82d
by Mickaël Raulet <mraulet@insa-rennes.fr>, integrated
to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
|
| | |
|
| |
| |
| |
| | |
This should allow for more efficient SIMD.
|
| |
| |
| |
| | |
This should allow for more efficient SIMD.
|
| |
| |
| |
| |
| |
| |
| | |
This should allow for more efficient SIMD.
Keep the C versions as they are now, to allow the compiler to inline the
interpolation coefficients.
|
| |
| |
| |
| |
| | |
put_weighted_pred_avg should be put_unweighted_pred_avg, there is no
weighting there.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '059a934806d61f7af9ab3fd9f74994b838ea5eba':
lavc: Consistently prefix input buffer defines
Conflicts:
doc/examples/decoding_encoding.c
libavcodec/4xm.c
libavcodec/aac_adtstoasc_bsf.c
libavcodec/aacdec.c
libavcodec/aacenc.c
libavcodec/ac3dec.h
libavcodec/asvenc.c
libavcodec/avcodec.h
libavcodec/avpacket.c
libavcodec/dvdec.c
libavcodec/ffv1enc.c
libavcodec/g2meet.c
libavcodec/gif.c
libavcodec/h264.c
libavcodec/h264_mp4toannexb_bsf.c
libavcodec/huffyuvdec.c
libavcodec/huffyuvenc.c
libavcodec/jpeglsenc.c
libavcodec/libxvid.c
libavcodec/mdec.c
libavcodec/motionpixels.c
libavcodec/mpeg4videodec.c
libavcodec/mpegvideo.c
libavcodec/noise_bsf.c
libavcodec/nuv.c
libavcodec/nvenc.c
libavcodec/options.c
libavcodec/parser.c
libavcodec/pngenc.c
libavcodec/proresenc_kostya.c
libavcodec/qsvdec.c
libavcodec/svq1enc.c
libavcodec/tiffenc.c
libavcodec/truemotion2.c
libavcodec/utils.c
libavcodec/utvideoenc.c
libavcodec/vc1dec.c
libavcodec/wmalosslessdec.c
libavformat/adxdec.c
libavformat/aiffdec.c
libavformat/apc.c
libavformat/apetag.c
libavformat/avidec.c
libavformat/bink.c
libavformat/cafdec.c
libavformat/flvdec.c
libavformat/id3v2.c
libavformat/isom.c
libavformat/matroskadec.c
libavformat/mov.c
libavformat/mpc.c
libavformat/mpc8.c
libavformat/mpegts.c
libavformat/mvi.c
libavformat/mxfdec.c
libavformat/mxg.c
libavformat/nutdec.c
libavformat/oggdec.c
libavformat/oggparsecelt.c
libavformat/oggparseflac.c
libavformat/oggparseopus.c
libavformat/oggparsespeex.c
libavformat/omadec.c
libavformat/rawdec.c
libavformat/riffdec.c
libavformat/rl2.c
libavformat/rmdec.c
libavformat/rtpdec_latm.c
libavformat/rtpdec_mpeg4.c
libavformat/rtpdec_qdm2.c
libavformat/rtpdec_svq3.c
libavformat/sierravmd.c
libavformat/smacker.c
libavformat/smush.c
libavformat/spdifenc.c
libavformat/takdec.c
libavformat/tta.c
libavformat/utils.c
libavformat/vqf.c
libavformat/westwood_vqa.c
libavformat/xmv.c
libavformat/xwma.c
libavformat/yop.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| |
| |
| | |
vertical mc functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
cherry picked from commit 1b9ee47d2f43b0a029a9468233626102eb1473b8
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| | |
The stride_src parameter is always 2 * MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE.
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
As with sao_band_filter, pass instead the two variables from the struct needed in the function.
This simplifies writing asm optimized versions.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| | |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
|
| |
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
Pass instead the two variables from the struct needed in the function.
This simplifies writing asm optimized versions of the function
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
The dststride parameter is always MAX_PB_SIZE.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
The x86 asm expects int32_t so use that type.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d':
hevc: SSE2 and SSSE3 loop filters
Conflicts:
libavcodec/hevcdsp.c
libavcodec/hevcdsp.h
libavcodec/x86/Makefile
libavcodec/x86/hevc_deblock.asm
libavcodec/x86/hevcdsp_init.c
See: de7b89fd43f850d77cf24ad6ae50185dfe391e91 and several others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| | |
beta0 and beta1 will always be the same
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:
idct8x8_dc
SSE2 MMXEXT C
cycles 22 26 57
idct16x16_dc
AVX2 SSE2 C
cycles 27 32 249
idct32x32_dc
AVX2 SSE2 C
cycles 62 126 1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
From openhevc
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
beta0 and beta1 will always be the same within a CU
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- adding one extra pixel all around the frame
- do not copy when SAO is not applied
5% improvement
cherry picked from commit 10fc29fc19a12c4d8168fbe1a954b76386db12d0
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
SPS features/flags:
- transform_skip_rotation_enabled_flag
- transform_skip_context_enabled_flag
- implicit_rdpcm_enabled_flag
- explicit_rdpcm_enabled_flag
- intra_smoothing_disabled_flag
- persistent_rice_adaptation_enabled_flag
PPS features/flags:
- log2_max_transform_skip_block_size
- cross_component_prediction_enabled_flag
- chroma_qp_offset_list_enabled_flag
- diff_cu_chroma_qp_offset_depth
- chroma_qp_offset_list_len_minus1
- cb_qp_offset_list
- cr_qp_offset_list
- log2_sao_offset_scale_luma
- log2_sao_offset_scale_chroma
(cherry picked from commit 005294c5b939a23099871c6130c8a7cc331f73ee)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| | |
- support for 4:2:2 and 4:4:4 up to 12 bits
- add a new profile for range extension
(cherry picked from commit d3c067fa65bbc871758d28aa07f54123430ca346)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
(cherry picked from commit 6b3856ef57d66f2e59ee61fd2eb5f83b6d0d7d4a)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
(cherry picked from commit f2c5f647cec786df26f442a85e6d685a131a50c9)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
pretty print x86
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
pretty print C
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
hevc: move DSP declarations from hevc.h into hevcdsp.h
Conflicts:
libavcodec/hevc.h
libavcodec/hevcdsp.c
libavcodec/hevcdsp.h
See: c8dd048ab8cff815c9f4b16a62db0b74df011f0a
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
| |
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
Initially written by Guillaume Martres <smarter@ubuntu.com> as a GSoC
project. Further contributions by the OpenHEVC project and other
developers, namely:
Mickaël Raulet <mraulet@insa-rennes.fr>
Seppo Tomperi <seppo.tomperi@vtt.fi>
Gildas Cocherel <gildas.cocherel@laposte.net>
Khaled Jerbi <khaled_jerbi@yahoo.fr>
Wassim Hamidouche <wassim.hamidouche@insa-rennes.fr>
Vittorio Giovara <vittorio.giovara@gmail.com>
Jan Ekström <jeebjp@gmail.com>
Anton Khirnov <anton@khirnov.net>
Martin Storsjö <martin@martin.st>
Luca Barbato <lu_zero@gentoo.org>
Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|