libav.git - [no description]

	Commit message	Author	Age
*	x86: hevc: remove a parameter to WP internals	Christophe Gisquet	2015-02-14
	The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to get the value in bytes).
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32	James Almer	2015-02-12
	Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
	Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/hevc_sao: make sao_band_filter work on x86_32	James Almer	2015-02-09
	Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86: hevc_mc: use epel_hv 16-wide function	Christophe Gisquet	2015-02-06
	The epel_hv functions were still relying on only epel_hv 8-wide being the maximum width instantiated.
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86: hevc_mc: add AVX2 optimizations	Pierre Edouard Lepere	2015-02-06
	before
	33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
	38138 decicycles in luma_bi_2, 523427 runs, 861 skips
	13490 decicycles in luma_uni, 516138 runs, 8150 skips
	after
	20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
	24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
	10397 decicycles in luma_uni, 515715 runs, 8573 skips
	Conflicts:
	libavcodec/x86/hevc_mc.asm
	libavcodec/x86/hevcdsp_init.c
	Reviewed-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}	James Almer	2015-02-05
	Original x86 intrinsics code by Pierre-Edouard Lepere.
	Yasm port, refactoring and optimizations by James Almer.
	Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
	Width 32
	342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
	29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
	13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
	Width 64
	581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
	59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
	28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}	James Almer	2015-02-05
	Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
	Refactoring and optimizations by James Almer.
	Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
	Width 32
	158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
	5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
	2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
	Width 64
	705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
	19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
	10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}	James Almer	2015-02-01
	Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
	10/12bit yasm ports, refactoring and optimizations by James Almer
	Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
	width 32
	40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
	8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
	7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
	4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
	width 64
	136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
	28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
	26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
	14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
	Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2	James Almer	2014-09-04
	~20% faster than AVX.
	Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86: hevc_mc: split differently calls	Christophe Gisquet	2014-08-24
	In some cases, 2 or 3 calls are performed to functions for unusual widths. Instead, perform 2 calls for different widths to split the workload.
	The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't be processed that way without modifications: some calls use unaligned buffers, and having branches to handle this was resulting in no micro-benchmark benefit.
	For block_w == 12 (around 1% of the pixels of the sequence):
	Before:
	12758 decicycles in epel_uni, 4093 runs, 3 skips
	19389 decicycles in qpel_uni, 8187 runs, 5 skips
	22699 decicycles in epel_bi, 32743 runs, 25 skips
	34736 decicycles in qpel_bi, 32733 runs, 35 skips
	After:
	11929 decicycles in epel_uni, 4096 runs, 0 skips
	18131 decicycles in qpel_uni, 8184 runs, 8 skips
	20065 decicycles in epel_bi, 32750 runs, 18 skips
	31458 decicycles in qpel_bi, 32753 runs, 15 skips
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	hevcdsp: remove more instances of compile-time-fixed parameters	Christophe Gisquet	2014-08-22
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	hevcdsp: remove compilation-time-fixed parameter	Christophe Gisquet	2014-08-22
	The dststride parameter is always MAX_PB_SIZE.
	Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8	James Almer	2014-08-21
	* Reduced xmm register count to 7 (As such they are now enabled for x86_32).
	* Removed four movdqa (affects the sse2 version only).
	* pxor is now used to clear m0 only once.
	~5% faster.
	Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/hecv_res_add: add ff_hevc_transform_add{8,16,32}_8_avx	James Almer	2014-08-20
	~15% faster than sse2
	Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
	Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86: hevc: adding transform_add	Pierre Edouard Lepere	2014-08-20
	Reviewed-by: James Almer <jamrial@gmail.com>
	Approved-by: Ronald S. Bultje
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc_deblock: add add ff_hevc_[hv]_loop_filter_luma_{8, 10, 12}_avx	James Almer	2014-07-29
	~5% faster than SSSE3
	Signed-off-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc_idct: add 12bit idct_dc	James Almer	2014-07-27
	Signed-off-by: James Almer <jamrial@gmail.com>
	Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevcdsp_init: make license header consistent	Michael Niedermayer	2014-07-27
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	Merge commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d'	Michael Niedermayer	2014-07-27
	* commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d':
	hevc: SSE2 and SSSE3 loop filters
	Conflicts:
	libavcodec/hevcdsp.c
	libavcodec/hevcdsp.h
	libavcodec/x86/Makefile
	libavcodec/x86/hevc_deblock.asm
	libavcodec/x86/hevcdsp_init.c
	See: de7b89fd43f850d77cf24ad6ae50185dfe391e91 and several others
	Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	hevc: SSE2 and SSSE3 loop filters	Pierre Edouard Lepere	2014-07-26
	Additional contributions by James Almer <jamrial@gmail.com>, Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and Anton Khirnov <anton@khirnov.net>
	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	x86/hevc_idct: replace old and unused idct functions	James Almer	2014-07-26
	Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
	Benchmarks on an Intel Core i5-4200U:
	idct8x8_dc    SSE2  MMXEXT  C
	cycles        22    26      57
	idct16x16_dc  AVX2  SSE2    C
	cycles        27    32      249
	idct32x32_dc  AVX2  SSE2    C
	cycles        62    126     1375
	Signed-off-by: James Almer <jamrial@gmail.com>
	Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc: add 12bits support for MC	Mickaël Raulet	2014-07-26
	cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc: add 12bits support for deblocking filter	Mickaël Raulet	2014-07-26
	cherry picked from commit 97d46afe320c7d61d7b9525e5f5588355cde4bb0
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86: hevcdsp: align	Christophe Gisquet	2014-07-23
	Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevcdsp_init: Fix "warning: assignment from incompatible pointer type"	Michael Niedermayer	2014-07-22
*	x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10}_sse2	James Almer	2014-07-13
	Signed-off-by: James Almer <jamrial@gmail.com>
	Reviewed-by: Kieran Kunhya <kierank@obe.tv>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevc: add avx2 dc idct	plepere	2014-06-25
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/hevc: new idct + asm	plepere	2014-06-17
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86: hevcdsp_init: fix macro usage	Christophe Gisquet	2014-06-01
	The macro was not using the parameter but unconditionally using sse4.
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevc: added DBF assembly functions	plepere	2014-05-16
	Reviewed-by: James Almer <jamrial@gmail.com>
	Reviewed-by: Ronald S. Bultje
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevcdsp_init: fix build failure with --disable-mmx	Michael Niedermayer	2014-05-09
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	hvcodec/x86/hevcdsp: make macros more modular to support functions that are not sse4	plepere	2014-05-09
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevcdsp_init: fix SSE4 checks	Michael Niedermayer	2014-05-06
	Found-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avcodec/x86/hevcdsp_init: fix build on 32bit	Michael Niedermayer	2014-05-06
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	HEVC : added assembly MC functions	plepere	2014-05-06
	pretty print x86
	Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>