|\
* commit '2c299d4165cd9653153e12270971c2368551b79e':
  x86: sbrdsp: implement SSE2 qmf_pre_shuffle

Conflicts:
    libavcodec/x86/sbrdsp.asm
    libavcodec/x86/sbrdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
From 253 to 51 cycles on Arrandale and Win64.
44 cycles on SandyBridge.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
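For context, a hedged scalar sketch of what qmf_pre_shuffle computes, written from my reading of the C reference in sbrdsp (details may differ): the upper half is filled by interleaving a sign-flipped reversed run with a forward run, which is what the SSE2 version maps onto reversed loads, an xorps sign mask and unpacks.

    /* Hedged sketch only; indexing per my reading of the scalar reference. */
    static void qmf_pre_shuffle(float *z)
    {
        z[64] = z[0];
        z[65] = z[1];
        for (int k = 1; k < 32; k++) {
            z[64 + 2 * k]     = -z[64 - k]; /* reversed, sign-flipped */
            z[64 + 2 * k + 1] =  z[k + 1];  /* forward copy           */
        }
    }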
|
MSVC complains about the 32-bit addressing, while MinGW/GCC does not.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|\|
* commit '4a7af92cc80ced8498626401ed21f25ffe6740c8':
  sbrdsp: Unroll and use integer operations
  sbrdsp: Unroll sbr_autocorrelate_c
  x86: sbrdsp: Implement SSE2 qmf_deint_bfly

Conflicts:
    libavcodec/sbrdsp.c
    libavcodec/x86/sbrdsp.asm
    libavcodec/x86/sbrdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Sandybridge: 47 cycles.
Having a loop counter is a 7-cycle gain.
Unrolling is another 7-cycle gain.
Working in reverse scan saves another 6 cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
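The butterfly being vectorized, as a hedged scalar sketch (the actual sbrdsp.c code may differ in detail): "reverse scan" refers to walking src1 backwards, which the SSE2 version handles with a reversing shuffle instead of scalar indexing.

    /* Hedged sketch of the qmf_deint_bfly butterfly. */
    static void qmf_deint_bfly(float *v, const float *src0, const float *src1)
    {
        for (int i = 0; i < 64; i++) {
            v[i]       = (src0[i] - src1[63 - i]) * 0.5f; /* src1 reversed */
            v[127 - i] = (src0[i] + src1[63 - i]) * 0.5f;
        }
    }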
|
This should fix building with MSVC until someone can change the code
so that it works with MSVC.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
From 233 to 105 cycles on Arrandale and Win64.
Replacing the multiplication by s_m[m] with a pand and a pxor using
appropriate vectors is slower. Unrolling is a 15-cycle win.
An SSE version was 4 cycles slower.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
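For reference, the pand/pxor idea the log says was tried and rejected: when a factor is known to be 0.0 or ±1.0, a multiply can be replaced by a bit mask selecting the value plus an XOR flipping the sign. A hedged SSE2 intrinsics sketch (mask construction omitted; names are illustrative, not from the commit):

    #include <emmintrin.h>

    /* keep_mask: all-ones lanes where the factor is nonzero, else zero.
     * sign_mask: 0x80000000 in lanes where the factor is -1, else zero. */
    static __m128 mul_by_0_pm1(__m128 s, __m128i keep_mask, __m128i sign_mask)
    {
        __m128i v = _mm_and_si128(_mm_castps_si128(s), keep_mask); /* s or 0 */
        v = _mm_xor_si128(v, sign_mask);                     /* flip the sign */
        return _mm_castsi128_ps(v);
    }

On this workload, per the message above, the straightforward multiply won.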
|
From 253 to 51 cycles on Arrandale and Win64.
44 cycles on SandyBridge.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
From 312 to 89/68 (SSE/SSE2) cycles on Arrandale and Win64.
Sandybridge: 68/47 cycles.
Having a loop counter is a 7-cycle gain.
Unrolling is another 7-cycle gain.
Working in reverse scan saves another 6 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|\|
* qatar/master:
  x86: sbrdsp: Implement SSE neg_odd_64

Conflicts:
    libavcodec/x86/sbrdsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Timing on Arrandale:
             C   SSE
    Win32:  57    44
    Win64:  47    38
Unrolling and not storing the mask both save some cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
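The whole function reduces to flipping the IEEE sign bit of every odd-indexed float, so one xorps per vector replaces scalar negations. A hedged intrinsics sketch of the same idea (the committed version is hand-written asm and, per the note above, keeps the mask in a register rather than reloading it):

    #include <xmmintrin.h>

    static void neg_odd_64(float *x) /* x assumed 16-byte aligned */
    {
        /* -0.0f in the odd lanes: only the sign bit is set there */
        const __m128 sign_odd = _mm_set_ps(-0.0f, 0.0f, -0.0f, 0.0f);
        for (int i = 0; i < 64; i += 4)
            _mm_store_ps(x + i, _mm_xor_ps(_mm_load_ps(x + i), sign_odd));
    }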
|
Timing on Arrandale:
             C   SSE
    Win32:  57    44
    Win64:  47    38
Unrolling and not storing the mask both save some cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|\|
* commit '4f50646697606df39317b93c2a427603b77636ee':
  x86: sbrdsp: Implement SSE qmf_post_shuffle

Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
From 255 to 174 cycles on Arrandale / Win64. Unrolling yields no gain.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
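A hedged scalar sketch of the shuffle being vectorized (indexing per my reading of the sbrdsp C reference; may differ in detail): one half is read forward, the other in reverse with the sign flipped, which in SSE becomes a reversing shufps, an xorps sign mask, and interleaving unpacks.

    /* Hedged sketch of qmf_post_shuffle. */
    static void qmf_post_shuffle(float W[32][2], const float *z)
    {
        for (int k = 0; k < 32; k++) {
            W[k][0] = -z[63 - k]; /* reversed, sign-flipped */
            W[k][1] =  z[k];      /* forward                */
        }
    }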
|\|
* commit '44a0036d10579ed91e48df24859e54b08a582742':
  x86: sbrdsp: Implement SSE sum64x5

Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
From 698 to 174 cycles on Arrandale. Unrolling is a 6-cycle gain.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
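The function sums five 64-float stripes of the buffer in place, so it vectorizes to four addps per group of four outputs. A hedged intrinsics sketch of that shape (the committed code is asm and additionally unrolled, per the 6-cycle note):

    #include <xmmintrin.h>

    static void sum64x5(float *z) /* z assumed 16-byte aligned */
    {
        for (int k = 0; k < 64; k += 4) {
            __m128 a = _mm_load_ps(z + k);
            a = _mm_add_ps(a, _mm_load_ps(z + k + 64));
            a = _mm_add_ps(a, _mm_load_ps(z + k + 128));
            a = _mm_add_ps(a, _mm_load_ps(z + k + 192));
            a = _mm_add_ps(a, _mm_load_ps(z + k + 256));
            _mm_store_ps(z + k, a);
        }
    }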
|/
Core i7 (Sandy Bridge): 135 to 107 cycles.
Core i5 (Arrandale):    162 to 142 cycles (thanks to Christophe Gisquet
for testing).
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
Start and end indices are multiples of 2, therefore guaranteeing aligned
access. This also allows generating 4 floats per loop iteration, keeping
the alignment all along.
Timing:
- 32 bits: 326c -> 172c
- 64 bits: 323c -> 156c
Signed-off-by: Diego Biurrun <diego@biurrun.de>
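The alignment argument in miniature: with even start/end indices, two (re,im) float pairs always fill exactly one aligned 16-byte vector, so every iteration can use aligned loads and stores. A deliberately simplified sketch (the real kernel applies a short complex filter, not a plain scale; the function name is illustrative):

    #include <xmmintrin.h>

    /* Processes 2 complex samples = 4 floats per iteration, staying aligned
     * because start and end are even and x is 16-byte aligned. */
    static void scale_pairs(float (*x)[2], float bw, int start, int end)
    {
        const __m128 vbw = _mm_set1_ps(bw);
        for (int i = start; i < end; i += 2)
            _mm_store_ps(x[i], _mm_mul_ps(_mm_load_ps(x[i]), vbw));
    }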
|
This is more consistent with the way we handle C #includes and
it simplifies the build system.
|
This is necessary to allow refactoring some x86util macros with cpuflags.
|
All the more required since the users are pure SSE functions.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
Prevents a sign flip in the counter, and a subsequent crash caused by
overreads/overwrites.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
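An illustration of the bug class only; the function, names, and fix below are hypothetical and not taken from this commit. A length computed in a signed type can overflow and flip negative, after which a mis-phrased loop bound never stops the copy:

    /* Hypothetical illustration, not the actual patch. */
    void copy_range(float *dst, const float *src, int start, int end)
    {
        int len = end - start;         /* can overflow and flip negative  */
        for (int i = 0; i != len; i++) /* never reached if len < 0        */
            dst[i] = src[i];           /* overreads/overwrites follow     */
    }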
|
Since the values are floats, using float operations makes sense: it
improves performance on some CPUs and makes the code SSE-compatible
instead of requiring SSE2.
Based on suggestion by Jason.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
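The same bit-twiddling expressed in the float domain: andps/xorps/andnps operate on the identical bit patterns as their p-prefixed integer cousins, but need only SSE1 and avoid the int/float domain-crossing penalty some CPUs charge. A small hedged example:

    #include <xmmintrin.h>

    /* |v| for four packed floats: andnps with -0.0f clears the sign bits,
     * where an integer-mask approach would have required SSE2. */
    static __m128 abs_ps(__m128 v)
    {
        return _mm_andnot_ps(_mm_set1_ps(-0.0f), v);
    }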
|
movq from an SSE register _to_ memory is an SSE2 instruction.
Use the SSE instruction movlps instead; it does the same thing.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
Unrolling the main loop to process, instead of 4 elements:
- 8: a minor gain of 2 cycles (not worth the extra object size)
- 2: a loss of 8 cycles
Assigning STEP to a register is a loss. The output address (Y) is almost
always unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
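The loop shape those measurements describe, as a hedged intrinsics sketch (names are illustrative and the real kernel is asm with strided input): 4 elements per iteration, with unaligned stores because the output address is almost never aligned.

    #include <xmmintrin.h>

    static void mul4_loop(float *y, const float *x, const float *g, int n)
    {
        for (int m = 0; m < n; m += 4) { /* the chosen 4-wide unroll */
            __m128 v = _mm_mul_ps(_mm_loadu_ps(x + m), _mm_loadu_ps(g + m));
            _mm_storeu_ps(y + m, v);     /* unaligned output store   */
        }
    }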
|
The 32-bit targets were compiled with -mfpmath=sse for a proper reference.
sbr_sum_square  C  /32 bits: 82c (unrolled) / 102c
                C  /64 bits: 69c (unrolled) / 82c
                SSE/32 bits: 42c
                SSE/64 bits: 31c
Using SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations per loop iteration costs 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
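A hedged intrinsics sketch of the measured shape: 8 multiply-accumulates per iteration into a vector accumulator, with a manual horizontal sum at the end (per the note above, SSE4.1 dpps for that final sum was slower):

    #include <xmmintrin.h>

    /* n is the number of floats (2 per complex sample), assumed a
     * multiple of 8; x assumed 16-byte aligned. Sketch only. */
    static float sum_square(const float *x, int n)
    {
        __m128 acc = _mm_setzero_ps();
        for (int i = 0; i < n; i += 8) { /* 8 operations per iteration */
            __m128 a = _mm_load_ps(x + i);
            __m128 b = _mm_load_ps(x + i + 4);
            acc = _mm_add_ps(acc, _mm_mul_ps(a, a));
            acc = _mm_add_ps(acc, _mm_mul_ps(b, b));
        }
        acc = _mm_add_ps(acc, _mm_movehl_ps(acc, acc));      /* 4 -> 2 lanes */
        acc = _mm_add_ss(acc, _mm_shuffle_ps(acc, acc, 1));  /* 2 -> 1 lane  */
        float s;
        _mm_store_ss(&s, acc);
        return s;
    }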
|