libav.git - [no description]

	Commit message	Author	Age
*	Reinstate proper FFmpeg license for all files.	Thilo Borgmann	2013-08-30
\|
*	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer	2013-07-18
\|\	* qatar/master:
\| \|	  Consistently use "cpu_flags" as variable/parameter name for CPU flags
\| \|	Conflicts:
\| \|		libavcodec/x86/dsputil_init.c
\| \|		libavcodec/x86/h264dsp_init.c
\| \|		libavcodec/x86/hpeldsp_init.c
\| \|		libavcodec/x86/motion_est.c
\| \|		libavcodec/x86/mpegvideo.c
\| \|		libavcodec/x86/proresdsp_init.c
\| \|	Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	Consistently use "cpu_flags" as variable/parameter name for CPU flags	Diego Biurrun	2013-07-18
\| \|
\| *	x86: sbrdsp: implement SSE2 qmf_pre_shuffle	Christophe Gisquet	2013-05-10
\| \|	From 253 to 51 cycles on Arrandale and Win64.
\| \|	44 cycles on SandyBridge.
\| \|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
\| *	x86: sbrdsp: Implement SSE2 qmf_deint_bfly	Christophe Gisquet	2013-05-03
\| \|	Sandybridge: 47 cycles
\| \|	Having a loop counter is a 7 cycle gain.
\| \|	Unrolling is another 7 cycle gain.
\| \|	Working in reverse scan is another 6 cycles.
\| \|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
* \|	x86: sbrdsp: force PIC addressing for Win64	Christophe Gisquet	2013-05-08
\| \|	MSVC complains about the 32bits addressing, while mingw/gcc does not.
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	avcodec/x86/sbrdsp_init: disable using the noise code in x86_64 MSVC, Try #2	Michael Niedermayer	2013-04-24
\| \|	This should fix building with MSVC until someone can change the code so it works with MSVC
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	avcodec/x86/sbrdsp_init: disable using the noise code in x86_64 MSVC	Michael Niedermayer	2013-04-23
\| \|	This should fix building with MSVC until someone can change the code so it works with MSVC
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: sbrdsp: implement SSE2 hf_apply_noise	Christophe Gisquet	2013-04-19
\| \|	233 to 105 cycles on Arrandale and Win64.
\| \|	Replacing the multiplication by s_m[m] by a pand and a pxor with appropriate vectors is slower.
\| \|	Unrolling is a 15 cycles win.
\| \|	A SSE version was 4 cycles slower.
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: sbrdsp: implement SSE2 qmf_pre_shuffle	Christophe Gisquet	2013-04-10
\| \|	From 253 to 51 cycles on Arrandale and Win64.
\| \|	44 cycles on SandyBridge.
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: sbrdsp: implement SSE qmf_deint_bfly	Christophe Gisquet	2013-04-08
\|/	From 312 to 89/68 (sse/sse2) cycles on Arrandale and Win64.
\|	Sandybridge: 68/47 cycles.
\|	Having a loop counter is a 7 cycle gain.
\|	Unrolling is another 7 cycle gain.
\|	Working in reverse scan is another 6 cycles.
\|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86: sbrdsp: Implement SSE neg_odd_64	Christophe Gisquet	2013-04-05
\|	Timing on Arrandale:
\|	          C    SSE
\|	Win32:   57     44
\|	Win64:   47     38
\|	Unrolling and not storing mask both save some cycles.
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	Add av_cold attributes to arch-specific init functions	Diego Biurrun	2013-02-05
\|
*	x86: sbrdsp: Implement SSE qmf_post_shuffle	Christophe Gisquet	2013-01-06
\|	255 to 174 cycles on Arrandale / Win64.
\|	Unrolling yields no gain.
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	x86: sbrdsp: Implement SSE sum64x5	Christophe Gisquet	2013-01-06
\|	698 to 174 cycles on Arrandale.
\|	Unrolling is a 6 cycles gain.
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	SBR DSP x86: implement SSE sbr_hf_gen	Christophe Gisquet	2012-12-07
\|	Start and end index are multiple of 2, therefore guaranteeing aligned access.
\|	Also, this allows to generate 4 floats per loop, keeping the alignment all along.
\|	Timing:
\|	- 32 bits: 326c -> 172c
\|	- 64 bits: 323c -> 156c
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	x86: Replace checks for CPU extensions and flags by convenience macros	Diego Biurrun	2012-09-08
\|	This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.
*	SBR DSP x86: implement SSE sbr_hf_g_filt	Christophe GISQUET	2012-02-23
\|	Unrolling the main loop to process, instead of 4 elements:
\|	- 8: minor gain of 2 cycles (not worth the extra object size)
\|	- 2: loss of 8 cycles.
\|	Assigning STEP to a register is a loss.
\|	Output address (Y) is almost always unaligned.
\|	Timings:
\|	- C (32/64 bits): 117/109 cycles
\|	- SSE: 57 cycles
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	SBR DSP x86: implement SSE sbr_sum_square_sse	Christophe GISQUET	2012-02-23
	The 32bits targets have been compiled with -mfpmath=sse for proper reference.
	sbr_sum_square
	C  /32bits: 82c (unrolled)/102c
	C  /64bits: 69c (unrolled)/82c
	SSE/32bits: 42c
	SSE/64bits: 31c
	Use of SSE4.1 dpps to perform the final sum is slower.
	Not unrolling to perform 8 operations in a loop yields 10 more cycles.
	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
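
Several of the messages above compare unrolled and non-unrolled loops for a sum of squares over complex samples. As a rough illustration only, the scalar C sketch below shows what such unrolling can look like. It is not the code from these commits: the function names sum_square_plain and sum_square_unrolled, the flat interleaved (re, im) layout, and the assumption that the pair count is even are all hypothetical choices made for this example.

```c
#include <stddef.h>

/* Plain version: one accumulator, one complex sample (re, im) per iteration.
 * x holds n interleaved pairs: re0, im0, re1, im1, ... (assumed layout). */
static float sum_square_plain(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[2 * i] * x[2 * i] + x[2 * i + 1] * x[2 * i + 1];
    return sum;
}

/* Unrolled version: two complex samples per iteration, each feeding its own
 * accumulator so the two additions do not depend on each other.
 * Assumes n is even; a trailing odd sample would be skipped. */
static float sum_square_unrolled(const float *x, size_t n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    for (size_t i = 0; i + 1 < n; i += 2) {
        sum0 += x[2 * i]     * x[2 * i]     + x[2 * i + 1] * x[2 * i + 1];
        sum1 += x[2 * i + 2] * x[2 * i + 2] + x[2 * i + 3] * x[2 * i + 3];
    }
    return sum0 + sum1;
}
```

Both functions compute the same quantity. With floating-point input the two can differ in the last bits because the additions are grouped differently, and whether the unrolled form is actually faster depends on the compiler and CPU, so it needs measuring as the commit messages do.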