libav.git - [no description]

	Commit message	Author	Age
*	avutil/x86util: add macro for loading a 128-bit constant in an xmm or in each part of an ymm in order to simplify avx2 asm func	Martin Vignali	2017-12-02
*	Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'	James Almer	2017-10-21
\|\	* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2': x86util: Port all macros to cpuflags See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77 Merged-by: James Almer <jamrial@gmail.com>
\| *	x86util: Port all macros to cpuflags	Diego Biurrun	2017-03-14
\| \|	Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2 macro name, drop pointless check for MMX support, we always assume MMX is available in our SIMD code, fix spelling.
* \|	Add macros to x86util.asm.	Ivan Kalvachev	2017-08-18
\| \|	Improved version of VBROADCASTSS that works like the avx2 instruction. Emulation of vpbroadcastd. Horizontal sum HSUMPS that places the result in all elements. Emulation of blendvps and pblendvb. Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
* \|	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}	James Almer	2017-06-18
\| \|	About 2x faster than the c version.
* \|	avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same	James Almer	2017-03-21
\| \|	Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'	Clément Bœsch	2017-03-20
\|\\|	* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793': x86util: Document SBUTTERFLY macro Merged-by: Clément Bœsch <u@pkh.me>
\| *	x86util: Document SBUTTERFLY macro	Alexandra Hájková	2016-09-19
\| \|	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
\| *	x86util: Extend SPLATW for avx2	James Almer	2016-07-18
\| \|	Integration to Libav by Josh de Kock <josh@itanimul.li>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
\| *	v210enc: Add SIMD optimised 8-bit and 10-bit encoders	Kieran Kunhya	2014-12-05
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* \|	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter	James Darnley	2017-02-18
\| \|	x86-64 only
\| \|	Yorkfield:
\| \|	- sse2: ~2.17x (434 vs. 200 cycles)
\| \|	Nehalem:
\| \|	- sse2: ~2.94x (409 vs. 139 cycles)
\| \|	Skylake:
\| \|	- sse2: ~3.10x (370 vs. 119 cycles)
\| \|	- avx: ~3.29x (370 vs. 112 cycles)
* \|	x86util: import MOVHL macro	James Darnley	2017-02-18
\| \|	Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.
\| \|	x86: Avoid some bypass delays and false dependencies
\| \|	A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
* \|	avcodec/x86: deduplicate PASS8ROWS macro	James Darnley	2017-02-18
\| \|
* \|	vp9: add 16x16 idct avx2 (8-bit).	Ronald S. Bultje	2016-07-11
\| \|	checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:
\| \|	nop: 24.6
\| \|	vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
\| \|	vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
\| \|	vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
\| \|	vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
\| \|	vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
\| \|	vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
\| \|	vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
\| \|	vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
\| \|	vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
\| \|	vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
\| \|	vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
\| \|	vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
\| \|	vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
\| \|	vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
\| \|	vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
\| \|	vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
\| \|	vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
\| \|	vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
\| \|	vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
\| \|	vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
\| \|	vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
\| \|	vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
\| \|	vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
\| \|	vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
\| \|	vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
* \|	x86/showcqt: use three operand format for some instructions	James Almer	2016-06-08
\| \|	Fixes failures with yasm 1.1.0 and older Signed-off-by: James Almer <jamrial@gmail.com>
* \|	avutil/x86util: move haddps sse emulation from showcqt	James Almer	2016-06-08
\| \|	Signed-off-by: James Almer <jamrial@gmail.com>
* \|	x86: port PSIGNW to cpuflags	James Almer	2015-09-11
\| \|	Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	x86: move XOP emulation code back to x86inc	James Almer	2015-08-03
\| \|	Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	x86/swr: add SSE2/AVX pack_8ch functions	James Almer	2014-12-30
\| \|	Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	v210enc: Add SIMD optimised 8-bit and 10-bit encoders	Kieran Kunhya	2014-11-26
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/hevc_deblock: improve 8bit transpose store macros	James Almer	2014-08-03
\| \|	Up to four instructions less depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/hevc_idct: replace old and unused idct functions	James Almer	2014-07-26
\| \|	Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
\| \|	Benchmarks on an Intel Core i5-4200U:
\| \|	idct8x8_dc    SSE2  MMXEXT  C
\| \|	cycles        22    26      57
\| \|	idct16x16_dc  AVX2  SSE2    C
\| \|	cycles        27    32      249
\| \|	idct32x32_dc  AVX2  SSE2    C
\| \|	cycles        62    126     1375
\| \|	Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86util: add and use RSHIFT/LSHIFT macros	Christophe Gisquet	2014-06-15
\| \|	Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: hpeldsp: better factorization	Christophe Gisquet	2014-05-29
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}	James Almer	2014-05-28
\| \|	Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: move horizontal add macros to x86util	James Almer	2014-04-17
\| \|	Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: Move XOP emulation to x86util	James Almer	2014-02-24
\| \|	We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497'	Michael Niedermayer	2013-10-14
\|\\|	* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497': x86inc: FMA3/4 Support Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86inc: FMA3/4 Support	Jason Garrett-Glaser	2013-10-14
\| \|	Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* \|	Merge commit '206895708ea2b464755d340e44501daf9a07c310'	Michael Niedermayer	2013-10-14
\|\\|	* commit '206895708ea2b464755d340e44501daf9a07c310': x86inc: Remove our FMA4 support Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86inc: Remove our FMA4 support	Derek Buitenhuis	2013-10-14
\| \|	This is so we can sync to x264's version of FMA4 support. This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* \|	Merge commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d'	Michael Niedermayer	2013-01-19
\|\\|	* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d': x86inc: Add cvisible macro for C functions with public prefix Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86inc: Add cvisible macro for C functions with public prefix	Diego Biurrun	2013-01-18
\| \|	This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>
* \|	Merge commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b'	Michael Niedermayer	2013-01-19
\|\\|	* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b': x86inc: Rename "program_name" to "private_prefix" configure: Run SHFLAGS through ldflags_filter() Conflicts: configure Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86inc: Rename "program_name" to "private_prefix"	Diego Biurrun	2013-01-18
\| \|	The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>
* \|	Merge commit 'dae1d507af94261bafd3b11549884e5d1eca590e'	Michael Niedermayer	2013-01-16
\|\\|	* commit 'dae1d507af94261bafd3b11549884e5d1eca590e': x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags vf_fps: add final flushed frames to the dropped frame count rv34_parser: Adjust #if for disabling individual parsers Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags	Diego Biurrun	2013-01-15
\| \|
* \|	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer	2013-01-15
\|\\|	* qatar/master: x86: ABSB2: port to cpuflags Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: ABSB2: port to cpuflags	Diego Biurrun	2013-01-15
\| \|
* \|	Merge commit '094a7405e5d8463d7d167d893e04934ec1a84ecd'	Michael Niedermayer	2013-01-15
\|\\|	* commit '094a7405e5d8463d7d167d893e04934ec1a84ecd': x86: ABSB: port to cpuflags sdp: Include SRTP crypto params if using the srtp protocol Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: ABSB: port to cpuflags	Diego Biurrun	2013-01-15
\| \|
* \|	Merge commit 'd8c772de53d29afb1bada88afa859fce8489c668'	Michael Niedermayer	2013-01-15
\|\\|	* commit 'd8c772de53d29afb1bada88afa859fce8489c668': nutdec: Always return a value from nut_read_timestamp() configure: Make warnings from -Wreturn-type fatal errors x86: ABS2: port to cpuflags vdpau: Remove av_unused attribute from function declaration h264: fix ff_generate_sliding_window_mmcos() prototype. Conflicts: configure libavformat/nutdec.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: ABS2: port to cpuflags	Diego Biurrun	2013-01-14
\| \|
* \|	Merge commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0'	Michael Niedermayer	2013-01-07
\|\\|	* commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0': x86: ABS1: port to cpuflags v210x: cosmetics, reformat Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: ABS1: port to cpuflags	Diego Biurrun	2013-01-06
\| \|
* \|	Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652'	Michael Niedermayer	2012-12-06
\|\\|	* commit '9d5c62ba5b586c80af508b5914934b1c439f6652': lavu/opt: do not filter out the initial sign character except for flags eval: treat dB as decibels instead of decibytes float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Conflicts: libavutil/eval.c tests/ref/fate/eval Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	float_dsp: add vector_dmul_scalar() to multiply a vector of doubles	Justin Ruggles	2012-12-05
\| \|	Include x86-optimized versions for SSE2 and AVX.
* \|	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer	2012-11-19
\|\\|	* qatar/master: x86: h264_intrapred: Fix C function names in comments x86: SPLATD: port to cpuflags Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: SPLATD: port to cpuflags	Diego Biurrun	2012-11-18
\| \|
* \|	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer	2012-11-14
\|\\|	* qatar/master: x86: mmx2 ---> mmxext in asm constructs Conflicts: libavcodec/x86/h264_chromamc_10bit.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264dsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>