libav.git

	Commit message	Author	Date
*	dca: remove unused decode_hf function and quant_d tables	Alexandra Hájková	2015-12-24
	They were superseded by their integer equivalents. Rename the integer decode_hf to decode_hf.
*	x86: check for AV_CPU_FLAG_AVXSLOW where useful	James Almer	2015-05-31
	Signed-off-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
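Checking AV_CPU_FLAG_AVXSLOW usually means an AVX version is selected only when AVX is present and not flagged as slow. The sketch below shows that pattern as a runtime dispatch in C. The flag names and values are hypothetical stand-ins for the AV_CPU_FLAG_* bits (libav defines the real ones in libavutil/cpu.h), and the functions are placeholders that only report their own name.

```c
/* Hypothetical stand-ins for AV_CPU_FLAG_* bits; not the real values. */
enum {
    CPU_SSE2    = 1 << 0,
    CPU_AVX     = 1 << 1,
    CPU_AVXSLOW = 1 << 2, /* AVX usable, but 256-bit operations are slow */
};

typedef const char *(*synth_fn)(void);

/* Placeholder "implementations" that just identify themselves. */
static const char *synth_c(void)    { return "c"; }
static const char *synth_sse2(void) { return "sse2"; }
static const char *synth_avx(void)  { return "avx"; }

/* Later checks override earlier ones, so the best usable version wins.
 * The AVX version is skipped when the CPU reports AVX as slow. */
static synth_fn select_synth(int flags)
{
    synth_fn fn = synth_c;
    if (flags & CPU_SSE2)
        fn = synth_sse2;
    if ((flags & CPU_AVX) && !(flags & CPU_AVXSLOW))
        fn = synth_avx;
    return fn;
}
```

With these flags, a CPU reporting CPU_SSE2 | CPU_AVX | CPU_AVXSLOW keeps the SSE2 version, while one reporting CPU_SSE2 | CPU_AVX gets the AVX version.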
*	x86/synth_filter: remove the fma3 version ifdefs	James Almer	2014-04-13
	This fixes compilation failures with --disable-fma3.
	Signed-off-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	x86/synth_filter: add synth_filter_fma3	James Almer	2014-04-04
	Signed-off-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	x86/synth_filter: add synth_filter_avx	James Almer	2014-04-04
	Sandy Bridge, Win64:
	180 cycles in ff_synth_filter_inner_sse2
	150 cycles in ff_synth_filter_inner_avx
	Also switch some instructions to a three-operand format to avoid assembly errors with Yasm 1.1.0 or older.
	Signed-off-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	x86/synth_filter: add synth_filter_sse	James Almer	2014-04-04
	Build only on x86_32 targets.
	Signed-off-by: James Almer <jamrial@gmail.com>
	Signed-off-by: Anton Khirnov <anton@khirnov.net>
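The log only says the SSE version is built on x86_32 targets. One plausible reason, which is an assumption and not stated in the commit, is that every x86-64 CPU supports SSE2, so on 64-bit builds the SSE2 version is always available. A sketch of that compile-time gating follows; it approximates libav's configure-generated ARCH_X86_32 with the compiler-predefined macros `__i386__` (GCC/Clang) and `_M_IX86` (MSVC), and `pick_synth` is a hypothetical name.

```c
/* Approximation of ARCH_X86_32 using compiler-predefined macros. */
#if defined(__i386__) || defined(_M_IX86)
#define SKETCH_X86_32 1
#else
#define SKETCH_X86_32 0
#endif

/* Returns the name of the version that would be selected. The SSE
 * branch only exists in 32-bit builds; SSE2 overrides it when present. */
static const char *pick_synth(int has_sse, int has_sse2)
{
    const char *name = "c";
#if SKETCH_X86_32
    if (has_sse)
        name = "sse";
#else
    (void)has_sse; /* SSE-only version is not compiled here */
#endif
    if (has_sse2)
        name = "sse2";
    return name;
}
```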
*	x86: dcadsp: Fix linking with yasm and optimizations disabled	Diego Biurrun	2014-03-05
	Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.
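One way to keep such references out of the build is to compile the referencing code out with the preprocessor instead of relying on a constant-false `if`, since dropping the latter generally depends on dead-code elimination that may not run with optimizations disabled. This is a sketch under that assumption, not the actual libav change: HAVE_EXTERNAL_ASM is a hypothetical build switch, and mul_inner_sse2 is a hypothetical asm symbol.

```c
#include <stddef.h>

/* Hypothetical configure switch: 1 when the external asm objects are
 * built and linked. Defaults to 0 so this sketch links on its own. */
#ifndef HAVE_EXTERNAL_ASM
#define HAVE_EXTERNAL_ASM 0
#endif

typedef void (*mul_fn)(float *dst, const float *src, float k, size_t n);

/* Portable reference version, always available. */
static void mul_c(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

#if HAVE_EXTERNAL_ASM
/* Defined only in the (hypothetical) asm object. */
void mul_inner_sse2(float *dst, const float *src, float k, size_t n);

/* Wrapper referencing the asm symbol: compiled out by the preprocessor
 * so no reference survives, regardless of optimization level. */
static void mul_sse2(float *dst, const float *src, float k, size_t n)
{
    mul_inner_sse2(dst, src, k, n);
}
#endif

static mul_fn select_mul(int cpu_has_sse2)
{
    mul_fn fn = mul_c;
#if HAVE_EXTERNAL_ASM
    if (cpu_has_sse2)
        fn = mul_sse2;
#else
    (void)cpu_has_sse2;
#endif
    return fn;
}
```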
*	dcadec: simplify decoding of VQ high frequencies	Christophe Gisquet	2014-02-28
	The vector dequantization has a test inside a loop that prevents an effective SIMD implementation. Moving the test out of the loop allows the loop to be DSPized, so the current DSP implementation is modified accordingly. In particular, the DSP implementation no longer has to handle zero-length loops.
	The decode_hf implementations have the following timings on x86 Arrandale:
	         C   SSE  SSE2  SSE4
	win32: 260   162   119   104
	win64: 242   N/A    89    72
	The ARM NEON optimizations follow in a later patch as external asm.
	The now-unused configure check for the y modifier in ARM inline asm is removed.
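The commit does not say exactly which test was moved, so the following is only a generic C sketch of the idea, with hypothetical names and a simplified layout (8 int8 samples per selected vector, one float scale per subband). The range test lives in the caller and runs once; the kernel assumes a non-empty range, which is why it can use a `do`/`while` loop with no early exit and a branch-free body that maps onto SIMD int8-to-float conversion and multiplication.

```c
#include <stdint.h>

/* Kernel: caller guarantees start < end, so no zero-length handling.
 * Each subband l picks codebook vector idx[l] and scales its 8 samples. */
static void vq_dequant_kernel(float dst[][8], const int8_t (*vecs)[8],
                              const int *idx, const float *scale,
                              int start, int end)
{
    int l = start;
    do {
        const int8_t *v = vecs[idx[l]]; /* selected codebook vector */
        for (int i = 0; i < 8; i++)
            dst[l][i] = v[i] * scale[l];
    } while (++l < end);
}

/* Caller: the range test is hoisted out of the loop and done once. */
static void vq_dequant(float dst[][8], const int8_t (*vecs)[8],
                       const int *idx, const float *scale,
                       int start, int end)
{
    if (start < end)
        vq_dequant_kernel(dst, vecs, idx, scale, start, end);
}
```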
*	x86: synth filter float: implement SSE2 version	Christophe Gisquet	2014-02-28
	Timings for Arrandale:
	          C   SSE
	win32: 2108   334
	win64: 1152   322
	Factorizing the inner loop with a call/jmp costs more than 15 cycles, even with the jmp destination aligned.
	Unrolling for ARCH_X86_64 gains 20 cycles.
	Signed-off-by: Janne Grunau <janne-libav@jannau.net>
*	x86: dcadsp: implement SSE lfe_dir	Christophe Gisquet	2014-02-28
	Results for Arrandale/Windows:
	win32: 1670 -> 316
	win64:  728 -> 298
	Signed-off-by: Janne Grunau <janne-libav@jannau.net>
*	x86: dcadsp: implement int8x8_fmul_int32	Christophe Gisquet	2014-02-07
	Timings for the callable function (as opposed to the inline one):
	        C   SSE  SSE2  SSE4
	Win32: 47    42    29    26
	Win64: 30    33    25    23
	The SSE version is neither compiled nor selected for ARCH_X86_64, as the inlinable function takes over there.
	Signed-off-by: Janne Grunau <janne-libav@jannau.net>
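For reference, the operation the name int8x8_fmul_int32 suggests (eight int8 samples multiplied by an integer scale, written out as floats) can be sketched in plain C. The 1/16 factor, i.e. treating the scale as having 4 fractional bits, is an assumption here, as is the name int8x8_fmul_int32_ref; an SSE version would presumably widen the int8 lanes to int32, convert them to float, and multiply by a broadcast scale.

```c
#include <stdint.h>

/* Reference sketch: dst[i] = src[i] * (scale / 16) for 8 lanes.
 * The 1/16 fixed-point factor is an assumption. */
static void int8x8_fmul_int32_ref(float *dst, const int8_t *src, int scale)
{
    const float fscale = scale / 16.0f;
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] * fscale;
}
```

With a scale of 32 the effective factor is 2.0, so an input of {1, -2, 3, -4, 5, -6, 7, -8} becomes {2, -4, 6, -8, 10, -12, 14, -16}.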