libav.git - [no description]

	Commit message	Author	Age
*	snowdsp: explicitly state instruction size.	Ronald S. Bultje	2012-05-02
\| \| \| \|	Fixes a compile error with clang at -O0.
*	dsputil x86: revert a test back to its previous value	Christophe GISQUET	2012-04-28
\| \| \| \| \| \|	Commit 356ee8d caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	rv34dsp x86: implement MMX2 inverse transform	Christophe Gisquet	2012-04-28
\| \| \| \| \| \|	141 cycles down to 51. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	h264: new assembly version of get_cabac for x86_64 with PIC	Roland Scheidegger	2012-04-28
\| \| \| \| \| \| \| \| \| \|	This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	h264: use one table instead of several for cabac functions	Roland Scheidegger	2012-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \|	The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
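The single-table layout described in this commit can be sketched as follows. This is a minimal illustration only: the names `demo_tables`, `demo_shift`, `demo_range`, `demo_lookup` and the offsets are hypothetical and are not the identifiers or sizes used in cabac_functions.h.

```c
#include <stdint.h>

/* Hypothetical offsets of two logical tables inside one combined array. */
#define DEMO_SHIFT_OFFSET 0
#define DEMO_RANGE_OFFSET 4

/* One combined table: position-independent code only has to form the
 * address of a single symbol. */
static const uint8_t demo_tables[4 + 4] = {
    3, 2, 1, 0,      /* "shift" entries */
    10, 20, 30, 40,  /* "range" entries */
};

/* The old names survive as constant pointers at fixed offsets, so C code
 * that indexes them does not need to change. */
static const uint8_t *const demo_shift = demo_tables + DEMO_SHIFT_OFFSET;
static const uint8_t *const demo_range = demo_tables + DEMO_RANGE_OFFSET;

/* Example consumer that still uses the old per-table names. */
static int demo_lookup(int i)
{
    return demo_shift[i] + demo_range[i];
}
```

Because the pointers are static constants, a compiler can fold each access into the base symbol plus an immediate offset, which is the effect the commit message describes.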
*	h264: (trivial) remove unneeded macro argument in x86/cabac.h	Roland Scheidegger	2012-04-28
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	Remove lowres video decoding	Mans Rullgard	2012-04-21
\| \| \| \| \| \| \|	This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	avcodec: remove AVCodecContext.dsp_mask	Mans Rullgard	2012-04-21
\| \| \| \| \| \| \| \|	This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	h264: use proper PROLOGUE statement for a function using 8 registers.	Ronald S. Bultje	2012-04-16
\| \| \| \|	Fixes crashes when using biweight on win64.
*	dsputil: fix optimized emu_edge function on Win64.	Ronald S. Bultje	2012-04-13
\| \| \| \| \| \| \| \|	Recent register allocation changes (x86inc.asm update) changed the register order and thus opcodes for the inner loops. One of them became >128bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.
*	ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE	Justin Ruggles	2012-04-12
\| \| \| \| \| \|	Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	h264: fix 10bit biweight functions after recent x86inc.asm fixes.	Ronald S. Bultje	2012-04-12
\| \| \| \| \|	This should have been updated in the x86inc.asm update, but was accidentally forgotten.
*	build: Consistently handle conditional compilation for all optimization OBJS.	Diego Biurrun	2012-04-12
\|
*	x86inc improvements for 64-bit	Henrik Gramner	2012-04-11
\| \| \| \| \| \| \| \| \| \| \| \|	Add support for all x86-64 registers; prefer caller-saved registers over callee-saved on WIN64; support up to 15 function arguments. Also (by Ronald S. Bultje): fix up our asm to work with the new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
*	rv40dsp x86: use only one register, for both increment and loop counter	Christophe GISQUET	2012-04-10
\| \| \| \| \| \|	Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	rv40dsp: implement prescaled versions for biweight.	Christophe GISQUET	2012-04-10
\| \| \| \| \| \| \| \| \| \|	Quite often, the original weights are multiple of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
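The prescaling idea can be sketched in C. The log does not give the weighting formula, so the per-term `>> 9`, the `0x10` rounding constant and the final `>> 5` below are assumptions for illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed general form: each weighted term is shifted down by 9 bits
 * before the two terms are summed and rounded. */
static uint8_t biweight_generic(uint8_t a, uint8_t b, int w1, int w2)
{
    return (uint8_t)((((w2 * a) >> 9) + ((w1 * b) >> 9) + 0x10) >> 5);
}

/* Prescaled form: the caller passes w1 / 512 and w2 / 512, computed once
 * per frame, so the per-term shifts disappear. */
static uint8_t biweight_prescaled(uint8_t a, uint8_t b, int pw1, int pw2)
{
    return (uint8_t)((pw2 * a + pw1 * b + 0x10) >> 5);
}
```

When a weight is a non-negative multiple of 512, `(w * p) >> 9` equals `(w >> 9) * p` exactly, so under these assumptions both functions return the same value for such weights.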
*	dsputil x86: use SSE float instruction instead of SSE2 integer equivalent	Christophe GISQUET	2012-04-04
\| \| \| \| \| \|	All the more required since the users are pure SSE functions. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype	Christophe GISQUET	2012-04-04
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	vp8dsp x86: perform rounding shift with a single instruction	Christophe GISQUET	2012-04-04
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	cabac: add overread protection to BRANCHLESS_GET_CABAC().	Ronald S. Bultje	2012-03-28
\| \| \| \|	Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
*	cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().	Ronald S. Bultje	2012-03-28
\|
*	cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().	Ronald S. Bultje	2012-03-28
\|
*	cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().	Ronald S. Bultje	2012-03-28
\|
*	h264: add overread protection to get_cabac_bypass_sign_x86().	Ronald S. Bultje	2012-03-28
\|
*	h264: reindent get_cabac_bypass_sign_x86().	Ronald S. Bultje	2012-03-28
\|
*	h264: use struct offsets in get_cabac_bypass_sign_x86().	Ronald S. Bultje	2012-03-28
\|
*	build: prettyprinting cosmetics	Diego Biurrun	2012-03-26
\|
*	x86: dsputil: prettyprint gcc inline asm	Diego Biurrun	2012-03-25
\|
*	x86: K&R prettyprinting cosmetics for dsputil_mmx.c	Diego Biurrun	2012-03-25
\|
*	x86: conditionally compile H.264 QPEL optimizations	Diego Biurrun	2012-03-25
\|
*	dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks.	Diego Biurrun	2012-03-25
\| \| \| \|	This makes them safe to use in non-fully braced if-blocks and similar.
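A minimal sketch of why the wrapper matters. `SET_PAIR` and `sum_after_set` are hypothetical names and are not taken from dsputil_mmx.c.

```c
/* Hypothetical multi-statement macro, wrapped so that it expands to
 * exactly one statement. */
#define SET_PAIR(a, b, v) \
    do {                  \
        (a) = (v);        \
        (b) = (v);        \
    } while (0)

/* Without the wrapper, only the first assignment would be governed by the
 * unbraced `if`, and the `else` that follows would fail to compile. */
static int sum_after_set(int cond)
{
    int x = 0, y = 0;

    if (cond)
        SET_PAIR(x, y, 1);
    else
        SET_PAIR(x, y, 2);

    return x + y;
}
```

The sketch leaves the trailing semicolon to the call site, so each use of the macro reads like an ordinary function call.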
*	aacsbr: handle m_max values smaller than 4.	Ronald S. Bultje	2012-03-23
\| \| \| \| \| \| \| \|	Prevents a signflip in the counter, and a subsequent crash because of overreads/overwrites. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org
*	vp8: convert mbedge loopfilter x86 assembly to use named arguments.	Ronald S. Bultje	2012-03-10
\|
*	vp8: convert inner loopfilter x86 assembly to use named arguments.	Ronald S. Bultje	2012-03-10
\|
*	sbrdsp.asm: convert all instructions to float/SSE ones.	Reimar Döffinger	2012-03-07
\| \| \| \| \| \| \| \| \| \| \|	Since the values are floats, using the float operations makes sense, improves performance on some CPUs and makes the code SSE compatible instead of needing SSE2. Based on suggestion by Jason. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	dsputil: remove shift parameter from scalarproduct_int16	Christophe GISQUET	2012-03-07
\| \| \| \| \| \| \| \| \|	There is only one caller, which does not need the shifting. Other use cases are situations where different roundings would be needed. The x86 and neon versions are modified accordingly. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
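For reference, a dot product without the shift parameter might look like the sketch below. The name `scalarproduct_int16_sketch` and the exact signature are assumptions; the log does not show the real prototype.

```c
#include <stdint.h>

/* Sketch: plain dot product of two int16_t vectors of length `order`.
 * No per-product shift is applied. Assumes the accumulated sum fits in
 * 32 bits for the inputs used. */
static int32_t scalarproduct_int16_sketch(const int16_t *v1,
                                          const int16_t *v2, int order)
{
    int32_t res = 0;

    for (int i = 0; i < order; i++)
        res += v1[i] * v2[i];

    return res;
}
```

A caller that needs a particular rounding can apply it once to the returned sum, rather than having a fixed shift applied to every product.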
*	x86: Remove duplicated AVG_3DNOW_OP / AVG_MMX2_OP macros from h264_qpel_mmx.c.	Diego Biurrun	2012-03-07
\|
*	SBR DSP: fix SSE code to not use SSE2 instructions.	Reimar Döffinger	2012-03-06
\| \| \| \| \| \| \| \|	movq from SSE register _to_ memory is an SSE2 instruction. Use the SSE movlps function instead that does the same thing. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	x86: clean up ff_dsputil_init_mmx()	Mans Rullgard	2012-03-05
\| \| \| \| \| \| \| \|	This splits ff_dsputil_init_mmx() into multiple functions, one for each MMX/SSE level, somewhat simplifying the nested conditions. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	vp8: convert simple loopfilter x86 assembly to use named arguments.	Ronald S. Bultje	2012-03-03
\|
*	vp8: convert idct x86 assembly to use named arguments.	Ronald S. Bultje	2012-03-03
\|
*	vp8: convert mc x86 assembly to use named arguments.	Ronald S. Bultje	2012-03-03
\|
*	vp8: convert loopfilter x86 assembly to use cpuflags().	Ronald S. Bultje	2012-03-03
\|
*	vp8: convert idct/mc x86 assembly to use cpuflags().	Ronald S. Bultje	2012-03-03
\|
*	h264: change underread for 10bit QPEL to overread.	Ronald S. Bultje	2012-03-02
\| \| \| \| \|	This prevents us from reading before the start of the buffer, and thus prevents crashes resulting from this behaviour. Fixes bug 237.
*	vp8: disable mmx functions with sse/sse2 counterparts on x86-64.	Ronald S. Bultje	2012-03-02
\| \| \| \| \|	x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2 functions will never be used in practice.
*	vp8: change int stride to ptrdiff_t stride.	Ronald S. Bultje	2012-03-02
\| \| \| \| \|	On 64bit platforms with 32bit int, this means we won't have to sign-extend the integer anymore.
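A small sketch of a pointer-sized stride in use. `sum_first_column` is a hypothetical helper, not a function from the VP8 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the first byte of each of `rows` rows.
 * `stride` is the distance in bytes between rows and may be negative for
 * bottom-up layouts. Because it is already pointer-sized, it can be used
 * in address arithmetic directly; a 32-bit int would first have to be
 * sign-extended to 64 bits on such targets. */
static unsigned sum_first_column(const uint8_t *src, ptrdiff_t stride,
                                 int rows)
{
    unsigned sum = 0;

    for (int y = 0; y < rows; y++)
        sum += src[y * stride];

    return sum;
}
```

The same function handles both top-down and bottom-up images: pass a pointer to the first row with a positive stride, or a pointer to the last row with a negative one.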
*	h264: fix mmxext chroma deblock to use correct TC values.	Ronald S. Bultje	2012-02-27
\|
*	SBR DSP x86: implement SSE sbr_hf_g_filt	Christophe GISQUET	2012-02-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unrolling the main loop to process 8 elements instead of 4 gives a minor gain of 2 cycles (not worth the extra object size); processing 2 elements is a loss of 8 cycles. Assigning STEP to a register is a loss. Output address (Y) is almost always unaligned. Timings: C (32/64 bits): 117/109 cycles; SSE: 57 cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	SBR DSP x86: implement SSE sbr_sum_square_sse	Christophe GISQUET	2012-02-23
\| \| \| \| \| \| \| \| \| \| \| \| \|	The 32bits targets have been compiled with -mfpmath=sse for proper reference. Timings for sbr_sum_square: C, 32bits: 82c (unrolled) / 102c; C, 64bits: 69c (unrolled) / 82c; SSE, 32bits: 42c; SSE, 64bits: 31c. Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
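As a point of reference for the timings above, a scalar sum of squares over complex samples could be written as below. The log does not show the C function, so the layout (pairs of real and imaginary floats) and the name `sum_square_sketch` are assumptions.

```c
/* Sketch: sum of squared real and imaginary parts over `n` complex
 * samples stored as float pairs. Two independent accumulators are used,
 * a simple form of unrolling that shortens the dependency chain. */
static float sum_square_sketch(const float (*x)[2], int n)
{
    float sum_re = 0.0f, sum_im = 0.0f;

    for (int i = 0; i < n; i++) {
        sum_re += x[i][0] * x[i][0];
        sum_im += x[i][1] * x[i][1];
    }

    return sum_re + sum_im;
}
```

Splitting the accumulation changes the order of floating-point additions, so results may differ in the last bits from a single-accumulator loop.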