libav.git - [no description]

	Commit message	Author	Age
...
*	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros	Loren Merritt	2012-07-05
\| \| \| \|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	x86: h264_intrapred: port to cpuflag macros	Diego Biurrun	2012-07-05
\|
*	vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too	Martin Storsjö	2012-07-05
\| \| \| \| \| \|	This was missed in the previous commit in 70a1c800. Signed-off-by: Martin Storsjö <martin@martin.st>
*	vp8: loopfilter >=sse2 functions need aligned stack on x86-32.	Martin Storsjö	2012-07-04
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	dsputilenc: group yasm and inline asm function pointer assignment.	Ronald S. Bultje	2012-07-04
\|
*	dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section.	Ronald S. Bultje	2012-06-30
\|
*	x86: fmtconvert: add special asm for float_to_int16_interleave_misc_*	Ronald S. Bultje	2012-06-30
\| \| \| \| \| \|	This gets rid of a variable-length array and a for loop in C code. Signed-off-by: Martin Storsjö <martin@martin.st>
*	x86: vc1: fix and enable optimised loop filter	Mans Rullgard	2012-06-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem is that the ssse3 psign instruction does the wrong thing here. Commit ea60dfe incorrectly removed a macro emulating this instruction for pre-ssse3 code. However, the emulation is incorrect, and the code relies on the behaviour of the macro. Specifically, the psign sets destination elements to zero where the corresponding source element is zero, whereas the emulation only negates destination elements where the source is negative. Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus, which is why the original VC-1 code had an additional right shift when using it. Since the psign instruction cannot be used here, skip all the macro hell and use the working instruction sequence directly. None of this was noticed due to a stray return statement in ff_vc1dsp_init_mmx() which meant that only the mmx version of the loop filter was ever used (before being removed in ea60dfe). Signed-off-by: Mans Rullgard <mans@mansr.com>
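The difference between psign and the emulation described above can be sketched per element in plain C. This is only an illustrative scalar model, not the code from the commit: the function names psignw_ref and psignw_emul are hypothetical, and the emulation is modelled purely from the wording of the message (negate where the source is negative, otherwise leave the destination alone).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of what psignw does to each 16-bit element:
 * negate dst where src < 0, zero dst where src == 0, keep dst where src > 0.
 * (The -32768 corner case is ignored in this sketch.) */
static void psignw_ref(int16_t *dst, const int16_t *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0)
            dst[i] = (int16_t)-dst[i];
        else if (src[i] == 0)
            dst[i] = 0;
    }
}

/* Hypothetical model of the emulation as the message describes it:
 * it only negates where src is negative, so dst survives where src == 0. */
static void psignw_emul(int16_t *dst, const int16_t *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0)
            dst[i] = (int16_t)-dst[i];
    }
}
```

The two models agree for negative and positive source elements and differ only where the source element is zero, which is the case the message says the VC-1 code relies on.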
*	x86: fft: replace call to memcpy by a loop	Christophe Gisquet	2012-06-27
\| \| \| \| \| \| \| \| \|	The function call was a mess to handle, and memcpy cannot make the assumptions we do in the new code. Tested on an IMC sample: 430c -> 370c. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: fft: elf64: fix PIC build	Mans Rullgard	2012-06-25
\| \| \| \| \| \| \|	In a 64-bit PIC build, external functions must be called through the PLT. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: fft: win64: fix stack alignment for memcpy() call	Mans Rullgard	2012-06-25
\|
*	x86: fft: convert sse inline asm to yasm	Mans Rullgard	2012-06-25
\|
*	x86: place some inline asm under #if HAVE_INLINE_ASM	Ronald S. Bultje	2012-06-25
\| \| \| \|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	h264: use asm cabac reader under a generic condition	Mans Rullgard	2012-06-23
\| \| \| \| \| \| \| \|	This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for architectures other than x86. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: Only use optimizations with cmov if the CPU supports the instruction	Diego Biurrun	2012-06-23
\|
*	x86: remove unused inline asm macros from dsputil_mmx.h	Mans Rullgard	2012-06-23
\| \| \| \|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: move some inline asm macros to the only places they are used	Mans Rullgard	2012-06-23
\| \| \| \|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	cosmetics: do not use full path for local headers	Diego Biurrun	2012-06-22
\|
*	dwt: remove variable-length arrays	Ronald S. Bultje	2012-06-17
\| \| \| \|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	Add a float DSP framework to libavutil	Justin Ruggles	2012-06-08
\| \| \| \|	Move vector_fmul() from DSPContext to AVFloatDSPContext.
*	x86: use new schema for ASM macros	Vitor Sessak	2012-05-29
\| \| \| \|	Signed-off-by: Janne Grunau <janne-libav@jannau.net>
*	x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code.	Justin Ruggles	2012-05-22
\| \| \| \| \| \|	This is needed for older versions of yasm/nasm that do not support AVX. Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	Convert vector_fmul range of functions to YASM and add AVX versions	Kieran Kunhya	2012-05-21
\| \| \| \|	Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
*	x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.	Michael Kostylev	2012-05-15
\|
*	ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16	Justin Ruggles	2012-05-15
\| \| \| \| \| \|	Simplifies the code by using cpuflags and a new macro. Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only function.
*	x86: use more standard construct for setting ASM functions in FFT code	Vitor Sessak	2012-05-14
\| \| \| \|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions.	Michael Kostylev	2012-05-12
\|
*	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC	Christophe Gisquet	2012-05-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code mostly inspired by vp8's MC, however:
	- its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy
	- that same coefficient redundancy allows better code for non-SSSE3 versions
	Benchmark (rounded to tens of units):
	        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
	C        445   358    985    1785    1559     3280
	MMX*     219   271    478     714     929     1443
	SSE2     131   158    294     425     515      892
	SSSE3    120   122    248     387     390      763
	End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	snowdsp: explicitly state instruction size.	Ronald S. Bultje	2012-05-02
\| \| \| \|	Fixes a compile error with clang at -O0.
*	dsputil x86: revert a test back to its previous value	Christophe GISQUET	2012-04-28
\| \| \| \| \| \|	Commit 356ee8d caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	rv34dsp x86: implement MMX2 inverse transform	Christophe Gisquet	2012-04-28
\| \| \| \| \| \|	141 cycles down to 51. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	h264: new assembly version of get_cabac for x86_64 with PIC	Roland Scheidegger	2012-04-28
\| \| \| \| \| \| \| \| \| \|	This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	h264: use one table instead of several for cabac functions	Roland Scheidegger	2012-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \|	The reason is that this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the C code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-PIC and better code (confirmed) with PIC. The assembly uses the new table but still won't work for the PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
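The "one table, old names kept as pointers" layout described above can be sketched as follows. This is a minimal illustration, not the actual cabac tables: combined_tables, tab_a, tab_b, lookup_b, the offsets and the contents are all hypothetical, chosen only to show how static const pointers into a single array let existing C code keep indexing by the old names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: two formerly separate tables packed into one array. */
enum {
    TAB_A_OFFSET = 0,
    TAB_A_SIZE   = 4,
    TAB_B_OFFSET = TAB_A_OFFSET + TAB_A_SIZE,
    TAB_B_SIZE   = 3
};

static const uint8_t combined_tables[TAB_A_SIZE + TAB_B_SIZE] = {
    10, 11, 12, 13, /* former table A */
    20, 21, 22,     /* former table B */
};

/* The old names survive as static const pointers at fixed offsets, so the
 * compiler can treat them as the base symbol plus an immediate offset. */
static const uint8_t *const tab_a = combined_tables + TAB_A_OFFSET;
static const uint8_t *const tab_b = combined_tables + TAB_B_OFFSET;

/* Existing callers keep indexing by the old name. */
static uint8_t lookup_b(int i)
{
    assert(i >= 0 && i < TAB_B_SIZE);
    return tab_b[i];
}
```

With a single base symbol, position-independent code needs only one address computation for all tables, which is the benefit the message attributes to this change.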
*	h264: (trivial) remove unneeded macro argument in x86/cabac.h	Roland Scheidegger	2012-04-28
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	Remove lowres video decoding	Mans Rullgard	2012-04-21
\| \| \| \| \| \| \|	This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	avcodec: remove AVCodecContext.dsp_mask	Mans Rullgard	2012-04-21
\| \| \| \| \| \| \| \|	This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	h264: use proper PROLOGUE statement for a function using 8 registers.	Ronald S. Bultje	2012-04-16
\| \| \| \|	Fixes crashes when using biweight on win64.
*	dsputil: fix optimized emu_edge function on Win64.	Ronald S. Bultje	2012-04-13
\| \| \| \| \| \| \| \|	Recent register allocation changes (x86inc.asm update) changed the register order and thus opcodes for the inner loops. One of them became >128bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.
*	ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE	Justin Ruggles	2012-04-12
\| \| \| \| \| \|	Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	h264: fix 10bit biweight functions after recent x86inc.asm fixes.	Ronald S. Bultje	2012-04-12
\| \| \| \| \|	This should have been updated in the x86inc.asm update, but was accidentally forgotten.
*	build: Consistently handle conditional compilation for all optimization OBJS.	Diego Biurrun	2012-04-12
\|
*	x86inc improvements for 64-bit	Henrik Gramner	2012-04-11
\| \| \| \| \| \| \| \| \| \| \| \|	- Add support for all x86-64 registers
	- Prefer caller-saved register over callee-saved on WIN64
	- Support up to 15 function arguments
	Also (by Ronald S. Bultje): Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
*	rv40dsp x86: use only one register, for both increment and loop counter	Christophe GISQUET	2012-04-10
\| \| \| \| \| \|	Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	rv40dsp: implement prescaled versions for biweight.	Christophe GISQUET	2012-04-10
\| \| \| \| \| \| \| \| \| \|	Quite often, the original weights are multiples of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
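The arithmetic behind the prescaling can be sketched in a few lines of C. This is not the RV40 weighting formula, only the identity it relies on: when a non-negative weight w is a multiple of 512, (w * px) >> 9 equals (w / 512) * px, so the shift can be folded into the weight once. The names weight_shifted, prescale_weight and weight_prescaled are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-call variant: multiply, then shift right by 9 (divide by 512). */
static int weight_shifted(int w, uint8_t px)
{
    return (w * px) >> 9;
}

/* Done once per frame: only valid when w is a non-negative multiple of 512. */
static int prescale_weight(int w)
{
    assert(w >= 0 && w % 512 == 0);
    return w / 512;
}

/* Per-call variant with a prescaled weight: no shift needed. */
static int weight_prescaled(int w_pre, uint8_t px)
{
    return w_pre * px;
}
```

For weights that are not multiples of 512 the two variants can differ because of the truncation in the shift, so a caller would need to keep the shifted path for those.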
*	dsputil x86: use SSE float instruction instead of SSE2 integer equivalent	Christophe GISQUET	2012-04-04
\| \| \| \| \| \|	All the more required since the users are pure SSE functions. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype	Christophe GISQUET	2012-04-04
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	vp8dsp x86: perform rounding shift with a single instruction	Christophe GISQUET	2012-04-04
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	cabac: add overread protection to BRANCHLESS_GET_CABAC().	Ronald S. Bultje	2012-03-28
\| \| \| \|	Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
*	cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().	Ronald S. Bultje	2012-03-28
\|
*	cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().	Ronald S. Bultje	2012-03-28
\|