libav.git - [no description]

	Commit message	Author	Age
...
*	dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.	Ronald S. Bultje	2012-08-03
\|	This makes add_hfyu_left_prediction_sse4() handle sources that are not 16-byte aligned in its own function rather than by proxying the call to add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs) does not restore it, thus leading to XMM clobbering and RSP being off. Fixes bug 342.
*	x86: Use consistent 3dnowext function and macro name suffixes	Diego Biurrun	2012-08-03
\|	Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is the more common name for the CPU flag, as reported e.g. by the Linux kernel, unifies this.
*	x86: proresdsp: improve SIGNEXTEND macro comments	Diego Biurrun	2012-08-02
\|
*	x86: h264dsp: K&R formatting cosmetics	Diego Biurrun	2012-08-02
\|
*	x86: fft: fix imdct_half() for AVX	Ronald S. Bultje	2012-08-02
\|	Some calculations were changed in b6a3849 to use mmsize, which is not correct for the AVX version: it uses INIT_YMM and therefore has mmsize == 32. Fixes bug 341. Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
*	x86: remove libmpeg2 mmx(ext) idct functions	Mans Rullgard	2012-08-02
\|	These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64.	Ronald S. Bultje	2012-07-31
\|	64-bit CPUs always have SSE available, so there is no need to compile in the 3dnow functions. This results in smaller binaries.
*	x86/dsputilenc: bury inline asm under HAVE_INLINE_ASM.	Ronald S. Bultje	2012-07-31
\|
*	x86: h264dsp: Remove unused variable ff_pb_3_1	Diego Biurrun	2012-08-01
\|
*	x86: h264dsp: Adjust YASM #ifdefs	Diego Biurrun	2012-07-31
\|	This fixes compilation with YASM disabled.
*	h264: convert loop filter strength dsp function to yasm.	Ronald S. Bultje	2012-07-30
\|	This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.
*	h264_idct_10bit: port x86 assembly to cpuflags.	Ronald S. Bultje	2012-07-28
\|
*	fft: rename "z" to "zc" to prevent name collision.	Ronald S. Bultje	2012-07-28
\|	Without this, cglobal will expand "z" to "zh" to access the high byte in a register's word, which causes a name collision with the ZH(x) macro further up in this file.
*	vp3: don't compile mmx IDCT functions on x86-64.	Ronald S. Bultje	2012-07-27
\|	64-bit CPUs always have SSE2, and an SSE2 version exists, so the MMX version will never be used.
*	h264_loopfilter: port x86 simd to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	h264_chromamc_10bit: port x86 simd to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	vp3: port x86 SIMD to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	rv34: port x86 SIMD to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	vp56: only compile MMX SIMD on x86-32.	Ronald S. Bultje	2012-07-27
\|	All x86-64 CPUs have SSE2, so the MMX version will never be used. This leads to smaller binaries.
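The reasoning in this commit (and in the similar vp3 and fft commits) relies on the DSP init function assigning function pointers in increasing order of capability, so a later SSE2 assignment always overrides an earlier MMX one. A minimal sketch of that pattern, assuming hypothetical names throughout (SKETCH_CPU_*, SKETCH_ARCH_X86_32, SketchDSPContext, filter_*); libav's real init code uses its own flags and configure-generated ARCH_* macros:

```c
#include <assert.h>

/* Hypothetical CPU-flag bits, standing in for libav's real CPU flags. */
#define SKETCH_CPU_MMX  (1 << 0)
#define SKETCH_CPU_SSE2 (1 << 1)

/* Hypothetical stand-in for a configure-generated ARCH_X86_32 macro. */
#if defined(__i386__) || defined(_M_IX86)
#define SKETCH_ARCH_X86_32 1
#else
#define SKETCH_ARCH_X86_32 0
#endif

typedef const char *(*sketch_filter_fn)(void);

/* Placeholder "implementations" that only report which version was chosen. */
static const char *filter_c(void)    { return "c"; }
#if SKETCH_ARCH_X86_32
static const char *filter_mmx(void)  { return "mmx"; }
#endif
static const char *filter_sse2(void) { return "sse2"; }

typedef struct SketchDSPContext {
    sketch_filter_fn filter;
} SketchDSPContext;

/* Later assignments override earlier ones, so the best supported version wins. */
static void sketch_dsp_init(SketchDSPContext *c, int cpu_flags)
{
    assert(c);
    c->filter = filter_c;
#if SKETCH_ARCH_X86_32
    /* Only 32-bit x86 CPUs can lack SSE2, so only there is MMX worth building. */
    if (cpu_flags & SKETCH_CPU_MMX)
        c->filter = filter_mmx;
#endif
    if (cpu_flags & SKETCH_CPU_SSE2)
        c->filter = filter_sse2;
}
```

On an x86-64 build, filter_mmx is never compiled at all, which is where the binary-size saving comes from; a CPU reporting both MMX and SSE2 ends up with filter_sse2 on either architecture.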
*	vp56: port x86 simd to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	proresdsp: port x86 assembly to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	mpegaudio: bury inline asm under HAVE_INLINE_ASM.	Ronald S. Bultje	2012-07-26
\|
*	x86inc: automatically insert vzeroupper for YMM functions.	Ronald S. Bultje	2012-07-26
\|
*	vp3: don't use calls to inline asm in yasm code.	Ronald S. Bultje	2012-07-25
\|	Mixing yasm and inline asm is a bad idea, since if either yasm or inline asm is not supported by your toolchain, all of the asm stops working. It is thus better to use one or the other exclusively. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
*	x86/dsputil: put inline asm under HAVE_INLINE_ASM.	Ronald S. Bultje	2012-07-25
\|	This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
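Guarding inline asm this way means a compiler without gcc-style inline assembly only ever sees the portable fallback. A minimal sketch, assuming a hypothetical helper sketch_bswap32(); in libav, HAVE_INLINE_ASM comes from configure, and the sketch merely defaults it to 0 when undefined:

```c
#include <assert.h>
#include <stdint.h>

/* Normally set by the build system; default to 0 so this sketch builds anywhere. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t sketch_bswap32(uint32_t x)
{
#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
    /* gcc-style inline asm, compiled only when the toolchain supports it. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Portable C fallback used whenever inline asm is unavailable. */
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

Both paths turn 0x11223344 into 0x44332211, so callers do not need to know which one was compiled.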
*	dsputil_mmx: fix incorrect assembly code	Yang Wang	2012-07-25
\|	In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t", and so forth, have problems. From the above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant-propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x, and the instruction becomes "movq 810000(%edi), %mm1". That is, it will load from an offset of 810000 instead of 8, which causes a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: the error is exposed when building with the Intel C++ Compiler with IPO+PGO optimization enabled; it crashed when decoding an MJPEG video. Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
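The failure mode comes down to plain text pasting: in the template "movq 8%3, %%mm1", the digit 8 is glued in front of whatever text the compiler picks for operand 3. The sketch below models only that string substitution (paste_offset() is hypothetical and is not how gcc or icc is implemented internally):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Model of asm template expansion: the literal offset text is pasted
 * directly in front of the operand text chosen by the compiler. */
static void paste_offset(const char *offset, const char *operand,
                         char *out, size_t out_size)
{
    assert(out && strlen(offset) + strlen(operand) < out_size);
    snprintf(out, out_size, "%s%s", offset, operand);
}
```

With p held in a register the operand text is "(%edi)" and the result is "8(%edi)", as intended; after constant propagation the operand text is "10000(%edi)" and the result is "810000(%edi)".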
*	dsputil: x86: add SHUFFLE_MASK_W macro	Jason Garrett-Glaser	2012-07-22
\|	Simplifies pshufb masks that operate on words.
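pshufb selects individual bytes, so shuffling 16-bit words requires writing each word index w as the byte pair (2*w, 2*w + 1). A scalar sketch of that expansion and of pshufb itself, assuming hypothetical names shuffle_mask_w() and pshufb_model(); the convention that a negative index zeroes the word is an assumption of this sketch, built on the documented pshufb rule that a mask byte with bit 7 set produces zero:

```c
#include <assert.h>
#include <stdint.h>

/* Expand eight word indices into sixteen pshufb byte indices.
 * Negative indices (assumed convention) become 0x80, i.e. "write zero". */
static void shuffle_mask_w(const int words[8], uint8_t mask[16])
{
    for (int i = 0; i < 8; i++) {
        if (words[i] < 0) {
            mask[2 * i] = mask[2 * i + 1] = 0x80;
        } else {
            assert(words[i] < 8);
            mask[2 * i]     = (uint8_t)(2 * words[i]);
            mask[2 * i + 1] = (uint8_t)(2 * words[i] + 1);
        }
    }
}

/* Scalar model of 128-bit pshufb: each output byte is src[mask & 15],
 * or zero when bit 7 of the mask byte is set. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 15];
}
```

For example, the word order {7, 6, 5, 4, 3, 2, 1, 0} expands to the byte mask {14, 15, 12, 13, ..., 0, 1}, which reverses the eight words of a register.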
*	x86: dsputil: drop some unused CPU flag debug code	Diego Biurrun	2012-07-19
\|
*	vp3: move idct and loop filter pointers to new vp3dsp context	Mans Rullgard	2012-07-18
\|	This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	build: add CONFIG_VP3DSP, reduce repetition in OBJS lists	Mans Rullgard	2012-07-18
\|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: h264_intrapred: Don't add the 'd' suffix to the SPLATB_REG macro	Martin Storsjö	2012-07-06
\|	The SPLATB_REG macro already adds the 'd' suffix internally. This fixes building on Win64, which has been broken since 878e66902. This worked for unix, where r2 happened to be rdx in this case, which with the first suffix rdxd was mapped to eax, and eaxd is defined back to eax. On Win64, however, r2 happened to be R8 in this case, and R8d maps to R8D just fine, but there's no mapping for R8Dd to anything. Signed-off-by: Martin Storsjö <martin@martin.st>
*	x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros	Diego Biurrun	2012-07-05
\|
*	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros	Loren Merritt	2012-07-05
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	x86: h264_intrapred: port to cpuflag macros	Diego Biurrun	2012-07-05
\|
*	vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too	Martin Storsjö	2012-07-05
\|	This was missed in the previous commit, 70a1c800. Signed-off-by: Martin Storsjö <martin@martin.st>
*	vp8: loopfilter >=sse2 functions need aligned stack on x86-32.	Martin Storsjö	2012-07-04
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	dsputilenc: group yasm and inline asm function pointer assignment.	Ronald S. Bultje	2012-07-04
\|
*	dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section.	Ronald S. Bultje	2012-06-30
\|
*	x86: fmtconvert: add special asm for float_to_int16_interleave_misc_*	Ronald S. Bultje	2012-06-30
\|	This gets rid of a variable-length array and a for loop in C code. Signed-off-by: Martin Storsjö <martin@martin.st>
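For reference, the operation being accelerated converts several planar float channels, already scaled to the int16 range, into one interleaved int16_t buffer. A plain C sketch of that computation, assuming hypothetical names (float_to_int16_sketch, float_to_int16_interleave_sketch); for simplicity it rounds half away from zero, whereas libav's C code rounds with lrintf(), so ties may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the int16_t range, then round half away from zero
 * (a simplification; lrintf() would round ties to even). */
static int16_t float_to_int16_sketch(float f)
{
    if (f >= 32767.0f)
        return 32767;
    if (f <= -32768.0f)
        return -32768;
    return (int16_t)(f >= 0.0f ? (long)(f + 0.5f) : (long)(f - 0.5f));
}

/* Interleave `channels` planar arrays of `len` samples:
 * dst[i * channels + c] = convert(src[c][i]). */
static void float_to_int16_interleave_sketch(int16_t *dst, const float **src,
                                             long len, int channels)
{
    assert(dst && src && channels > 0 && len >= 0);
    for (long i = 0; i < len; i++)
        for (int c = 0; c < channels; c++)
            dst[i * channels + c] = float_to_int16_sketch(src[c][i]);
}
```

An arbitrary channel count such as 3 is the "misc" case that does not map onto a fixed mono or stereo layout.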
*	x86: vc1: fix and enable optimised loop filter	Mans Rullgard	2012-06-30
\|	The problem is that the ssse3 psign instruction does the wrong thing here. Commit ea60dfe incorrectly removed a macro emulating this instruction for pre-ssse3 code. However, the emulation is incorrect, and the code relies on the behaviour of the macro. Specifically, psign sets destination elements to zero where the corresponding source element is zero, whereas the emulation only negates destination elements where the source is negative. Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus, which is why the original VC-1 code had an additional right shift when using it. Since the psign instruction cannot be used here, skip all the macro hell and use the working instruction sequence directly. None of this was noticed due to a stray return statement in ff_vc1dsp_init_mmx(), which meant that only the mmx version of the loop filter was ever used (before being removed in ea60dfe). Signed-off-by: Mans Rullgard <mans@mansr.com>
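The difference between psign and the emulation can be made concrete with a per-lane scalar model. The function names below are hypothetical; psignw_model() follows the documented psignw rule (negate, zero, or keep depending on the sign of the source lane), and psignw_emulation_model() follows the emulation as described in this commit:

```c
#include <assert.h>
#include <stdint.h>

/* psignw: src < 0 -> -dst, src == 0 -> 0, src > 0 -> dst.
 * Negating INT16_MIN wraps back to INT16_MIN, as in hardware. */
static void psignw_model(int16_t dst[8], const int16_t src[8])
{
    for (int i = 0; i < 8; i++) {
        if (src[i] < 0)
            dst[i] = dst[i] == INT16_MIN ? INT16_MIN : (int16_t)-dst[i];
        else if (src[i] == 0)
            dst[i] = 0;
    }
}

/* Emulation as described: only negates where src < 0 and leaves dst
 * untouched where src == 0. */
static void psignw_emulation_model(int16_t dst[8], const int16_t src[8])
{
    for (int i = 0; i < 8; i++)
        if (src[i] < 0)
            dst[i] = dst[i] == INT16_MIN ? INT16_MIN : (int16_t)-dst[i];
}
```

The two only disagree in lanes where the source is zero: psignw writes 0 there, while the emulation keeps the destination value, which is the behaviour the VC-1 code depended on.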
*	x86: fft: replace call to memcpy by a loop	Christophe Gisquet	2012-06-27
\|	The function call was a mess to handle, and memcpy cannot make the assumptions we do in the new code. Tested on an IMC sample: 430c -> 370c. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: fft: elf64: fix PIC build	Mans Rullgard	2012-06-25
\|	In a 64-bit PIC build, external functions must be called through the PLT. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: fft: win64: fix stack alignment for memcpy() call	Mans Rullgard	2012-06-25
\|
*	x86: fft: convert sse inline asm to yasm	Mans Rullgard	2012-06-25
\|
*	x86: place some inline asm under #if HAVE_INLINE_ASM	Ronald S. Bultje	2012-06-25
\|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	h264: use asm cabac reader under a generic condition	Mans Rullgard	2012-06-23
\|	This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for other architectures than x86. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: Only use optimizations with cmov if the CPU supports the instruction	Diego Biurrun	2012-06-23
\|
*	x86: remove unused inline asm macros from dsputil_mmx.h	Mans Rullgard	2012-06-23
\|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: move some inline asm macros to the only places they are used	Mans Rullgard	2012-06-23
\|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	cosmetics: do not use full path for local headers	Diego Biurrun	2012-06-22
\|