libav.git - [no description]

	Commit message	Author	Age
*	aacenc: add SIMD optimizations for abs_pow34 and quantization	Rostislav Pehlivanov	2016-10-18
\|	Performance improvements:
\|	quant_bands:
\|	with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
\|	without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
\|	Around 42% for the function
\|	Twoloop coder:
\|	abs_pow34: with/without: 7.82s/8.17s
\|	Around 4% for the entire encoder
\|	Both: with/without: 7.15s/8.17s
\|	Around 12% for the entire encoder
\|	Fast coder:
\|	abs_pow34: with/without: 3.40s/3.77s
\|	Around 10% for the entire encoder
\|	Both: with/without: 3.02s/3.77s
\|	Around 20% faster for the entire encoder
\|	Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
\|	Tested-by: Michael Niedermayer <michael@niedermayer.cc>
\|	Reviewed-by: James Almer <jamrial@gmail.com>
*	avcodec: fix arguments on xmm/neon clobber test wrappers	James Almer	2016-10-02
\| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
*	avcodec: add missing xmm/neon clobber test wrappers for the new encode API	James Almer	2016-10-01
\| \| \| \| \|	Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/h264_weight: use appropriate register size for weight parameters	Hendrik Leppkes	2016-09-23
\| \| \| \| \| \| \|	Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>
*	avcodec/h264: Use ptrdiff_t for (bi)weight functions	Michael Niedermayer	2016-09-23
\| \| \| \|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	avcodec/ttadsp: cosmetics	James Almer	2016-08-06
\| \| \| \| \| \| \|	Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}	James Almer	2016-08-02
\| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
*	Merge commit '9df889a5f116c1ee78c2f239e0ba599c492431aa'	Clément Bœsch	2016-07-29
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	* commit '9df889a5f116c1ee78c2f239e0ba599c492431aa': h264: rename h264.[ch] to h264dec.[ch] Merged-by: Clément Bœsch <u@pkh.me>
\| *	h264: rename h264.[ch] to h264dec.[ch]	Anton Khirnov	2016-06-21
\| \| \| \| \| \| \| \|	This is more consistent with the naming of other decoders.
* \|	vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters.	Ronald S. Bultje	2016-07-26
\| \| \| \| \| \| \| \| \| \|	Each takes about 0.1% of runtime in my profiles, and they had no SIMD so far (we only had SIMD for the npx=16 double-block versions).
* \|	vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters.	Ronald S. Bultje	2016-07-26
\| \| \| \| \| \| \| \| \| \|	Each takes about 0.5% of runtime in my profiles, and they had no SIMD so far (we only had SIMD for the npx=16 double-block versions).
* \|	vp9: add 32x32 idct AVX2 implementation.	Ronald S. Bultje	2016-07-26
\| \|	About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups.
\| \|	Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):
\| \|	nop: 16.5
\| \|	vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
\| \|	vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
\| \|	vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
\| \|	vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
\| \|	vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
\| \|	vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
\| \|	vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
\| \|	vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
\| \|	vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
\| \|	vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
\| \|	vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
\| \|	vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
\| \|	vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
\| \|	vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
\| \|	vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
\| \|	vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
\| \|	vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
\| \|	vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
\| \|	vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
\| \|	vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
\| \|	vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
\| \|	vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
\| \|	vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
\| \|	vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
\| \|	vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
\| \|	vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
\| \|	vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
\| \|	vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
\| \|	vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
\| \|	vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
* \|	x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32	James Almer	2016-07-20
\| \| \| \| \| \| \| \| \| \|	Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	diracdsp_init: add missing ARCH_X86_64 check	Rostislav Pehlivanov	2016-07-12
\| \| \| \| \| \| \| \| \| \| \| \|	That SIMD is still x86_64 only for now. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
* \|	diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped	Rostislav Pehlivanov	2016-07-11
\| \| \| \| \| \| \| \|	Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
* \|	diracdsp: add dequantization SIMD	Rostislav Pehlivanov	2016-07-11
\| \| \| \| \| \| \| \| \| \| \| \|	Currently unused, to be used in the following commits. Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
* \|	vp9: add 16x16 idct avx2 (8-bit).	Ronald S. Bultje	2016-07-11
\| \|	checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:
\| \|	nop: 24.6
\| \|	vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
\| \|	vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
\| \|	vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
\| \|	vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
\| \|	vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
\| \|	vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
\| \|	vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
\| \|	vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
\| \|	vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
\| \|	vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
\| \|	vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
\| \|	vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
\| \|	vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
\| \|	vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
\| \|	vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
\| \|	vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
\| \|	vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
\| \|	vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
\| \|	vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
\| \|	vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
\| \|	vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
\| \|	vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
\| \|	vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
\| \|	vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
\| \|	vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
* \|	Merge commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c'	Clément Bœsch	2016-07-09
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c': x86: Add missing movsxd for the int stride parameter Merged-by: Clément Bœsch <u@pkh.me>
\| *	x86: Add missing movsxd for the int stride parameter	Martin Storsjö	2016-06-17
\| \| \| \| \| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
\| *	asm: FF_-prefix internal macros used in inline assembly	Diego Biurrun	2016-05-28
\| \| \| \| \| \| \| \| \| \|	These macros conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.
* \|	x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32	James Almer	2016-07-05
\| \| \| \| \| \| \| \| \| \| \| \|	About 10% faster. Signed-off-by: James Almer <jamrial@gmail.com>
* \|	avcodec: add missing xmm/neon clobber test wrappers for the new decode API	James Almer	2016-07-03
\| \| \| \| \| \| \| \| \| \|	Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	asm: FF_-prefix internal macros used in inline assembly	Matthieu Bouron	2016-06-27
\| \| \| \| \| \| \| \|	See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
* \|	Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50'	Hendrik Leppkes	2016-06-26
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50': Drop unnecessary libavutil/x86/asm.h #includes Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
\| *	Drop unnecessary libavutil/x86/asm.h #includes	Diego Biurrun	2016-05-28
\| \|
* \|	Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196'	Clément Bœsch	2016-06-22
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196': tests: Move all test programs to a subdirectory Merged-by: Clément Bœsch <clement@stupeflix.com>
\| *	tests: Move all test programs to a subdirectory	Diego Biurrun	2016-05-13
\| \|
* \|	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	Clément Bœsch	2016-06-21
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb': cosmetics: Fix spelling mistakes Merged-by: Clément Bœsch <u@pkh.me>
\| *	cosmetics: Fix spelling mistakes	Vittorio Giovara	2016-05-04
\| \| \| \| \| \| \| \|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
\| *	build: miscellaneous cosmetics	Diego Biurrun	2016-04-07
\| \| \| \| \| \| \| \| \| \| \| \|	Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.
\| *	fft: Split MDCT bits off from FFT	Diego Biurrun	2016-03-01
\| \|
* \|	x86/aacpsdsp: optimize add_squares loop	James Almer	2016-06-14
\| \| \| \| \| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
* \|	x86/aacdec: use HADDPS macro	James Almer	2016-06-08
\| \| \| \| \| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
* \|	x86: lossless audio: SSE4 madd 32bits	Christophe Gisquet	2016-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The only user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version does not make sense. Timings: 68 -> 49 cycles Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* \|	Merge commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622'	Derek Buitenhuis	2016-04-12
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622': fft: x86: cosmetics: Drop silly comments, add comment, whitespace Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
\| *	fft: x86: cosmetics: Drop silly comments, add comment, whitespace	Diego Biurrun	2016-02-26
\| \|
\| *	x86: hevc: Fix linking with both yasm and optimizations disabled	Diego Biurrun	2016-02-23
\| \| \| \| \| \| \| \| \| \|	Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.
* \|	avcodec/fft: Add revtab32 for FFTs with more than 65536 samples	Michael Niedermayer	2016-03-04
\| \| \| \| \| \| \| \| \| \| \| \|	x86 optimizations are used only for the cases they support (<=65536 samples) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* \|	avcodec: Extend fft to size 2^17	Michael Niedermayer	2016-03-04
\| \| \| \| \| \| \| \| \| \| \| \|	Asked-for-by: durandal_1707 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* \|	x86/vc1dsp: Split the file into MC and loopfilter	Timothy Gu	2016-02-29
\| \|
* \|	Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c'	Derek Buitenhuis	2016-02-24
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c': build: Add vc1dsp component for more fine-grained dependencies Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
\| *	build: Add vc1dsp component for more fine-grained dependencies	Diego Biurrun	2016-02-19
\| \|
* \|	x86/dcadec: add ff_lfe_fir1_float_{sse3,avx}	James Almer	2016-02-22
\| \| \| \| \| \| \| \| \| \|	Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* \|	Merge commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45'	Derek Buitenhuis	2016-02-16
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45': v210: Use separate sample_factors Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
\| *	v210: Use separate sample_factors	Luca Barbato	2016-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 10-bit and 8-bit functions can now be implemented to process different numbers of samples. While at it, simplify the code a little.
\| *	v210: Add avx2 version of the 10-bit line encoder	James Darnley	2016-02-01
\| \| \| \| \| \| \| \| \| \| \| \|	Around 25% faster than the ssse3 version. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
\| *	v210: Add avx2 version of the 8-bit line encoder	James Darnley	2016-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Around 35% faster than the avx version. Signed-off-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* \|	Merge commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a'	Derek Buitenhuis	2016-02-16
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a': v210: x86: Add the correct guards around the asm code Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
\| *	v210: x86: Add the correct guards around the asm code	Luca Barbato	2016-01-26
\| \| \| \| \| \| \| \|	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
\| *	x86inc: Add debug symbols indicating sizes of compiled functions	Geza Lore	2016-01-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata, e.g. this fixes `gdb`, but not `perf`. Currently only implemented for ELF. Signed-off-by: Anton Khirnov <anton@khirnov.net>