libav.git - [no description]

	Commit message	Author	Age
*	vp8: convert loopfilter x86 assembly to use cpuflags().	Ronald S. Bultje	2012-03-03
\|
*	vp8: convert idct/mc x86 assembly to use cpuflags().	Ronald S. Bultje	2012-03-03
\|
*	h264: change underread for 10bit QPEL to overread.	Ronald S. Bultje	2012-03-02
\|	This prevents us from reading before the start of the buffer, and thus prevents crashes resulting from this behaviour.
\|	Fixes bug 237.
*	vp8: disable mmx functions with sse/sse2 counterparts on x86-64.	Ronald S. Bultje	2012-03-02
\|	x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2 functions will never be used in practice.
*	vp8: change int stride to ptrdiff_t stride.	Ronald S. Bultje	2012-03-02
\|	On 64bit platforms with 32bit int, this means we won't have to sign-extend the integer anymore.
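The point about sign extension can be illustrated in plain C. This is a minimal sketch, not code from the vp8 decoder: `sum_block` is a hypothetical helper, and the only thing it is meant to show is that a `ptrdiff_t` stride is already pointer-sized, so the row offset (including a negative one) is computed at full width without first widening a 32-bit `int`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from libav): sums a w x h block of 8-bit pixels.
 * stride is the byte distance between rows and may be negative. */
static unsigned sum_block(const uint8_t *src, ptrdiff_t stride, int w, int h)
{
    unsigned sum = 0;
    for (int y = 0; y < h; y++) {
        /* y is converted to ptrdiff_t for the multiplication, so the row
         * offset is computed at pointer width rather than as a 32-bit int. */
        const uint8_t *row = src + y * stride;
        for (int x = 0; x < w; x++)
            sum += row[x];
    }
    return sum;
}
```

Calling it with a positive stride from the top row, or with a negative stride from the bottom row, visits the same pixels.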
*	h264: fix mmxext chroma deblock to use correct TC values.	Ronald S. Bultje	2012-02-27
\|
*	SBR DSP x86: implement SSE sbr_hf_g_filt	Christophe GISQUET	2012-02-23
\|	Unrolling the main loop to process, instead of 4 elements:
\|	- 8: minor gain of 2 cycles (not worth the extra object size)
\|	- 2: loss of 8 cycles.
\|
\|	Assigning STEP to a register is a loss.
\|	Output address (Y) is almost always unaligned.
\|
\|	Timings:
\|	- C (32/64 bits): 117/109 cycles
\|	- SSE: 57 cycles
\|
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	SBR DSP x86: implement SSE sbr_sum_square_sse	Christophe GISQUET	2012-02-23
\|	The 32bits targets have been compiled with -mfpmath=sse for proper reference.
\|
\|	sbr_sum_square C /32bits: 82c (unrolled)/102c
\|	               C /64bits: 69c (unrolled)/82c
\|	               SSE/32bits: 42c
\|	               SSE/64bits: 31c
\|
\|	Use of SSE4.1 dpps to perform the final sum is slower.
\|	Not unrolling to perform 8 operations in a loop yields 10 more cycles.
\|
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
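For orientation, a scalar baseline of the kind the timings compare against might look like the sketch below. It assumes the routine sums the squared real and imaginary parts of `n` complex values stored as `(re, im)` float pairs. The name `sum_square_sketch` and the exact signature are illustrative assumptions, not the library's API.

```c
/* Hypothetical scalar reference (assumed layout: x[i][0] = re, x[i][1] = im).
 * Returns the sum of re^2 + im^2 over n complex values. */
static float sum_square_sketch(float (*x)[2], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}
```

A SIMD version would accumulate several partial sums in one register and add them together only once after the loop, which is the "final sum" step the message refers to.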
*	rv34: change most "int stride" into "ptrdiff_t stride".	Ronald S. Bultje	2012-02-20
\|	This prevents having to sign-extend on 64-bit systems with 32-bit ints, such as x86-64. Also fixes crashes on systems where we don't do it and arguments are not in registers, such as Win64 for all weight functions.
*	h264: don't use redzone in loopfilter on win64.	Ronald S. Bultje	2012-02-19
\|	Red zone usage is not allowed in the Win64 ABI.
*	mpegaudio: replace memcpy by SIMD code	Christophe GISQUET	2012-02-15
\|	By replacing memcpy with an unrolled loop using the alignment knowledge it has, some speedup can be obtained.
\|
\|	Before (gcc 4.6.1): ~400 cycles
\|	After: ~370 cycles
\|
\|	Overall, around 2% speed increase when decoding a 2400s mp3 to f32le.
\|
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	mpegvideo: Add ff_ prefix to nonstatic functions	Martin Storsjö	2012-02-15
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	dsputil: Add ff_ prefix to inv_zigzag_direct16	Martin Storsjö	2012-02-15
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	dsputil: Add ff_ prefix to the dsputil_init functions	Martin Storsjö	2012-02-15
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	ac3dsp: do not use pshufb in ac3_extract_exponents_ssse3()	Justin Ruggles	2012-02-09
\|	We need to do unsigned saturation in order to cover the corner case when the absolute coefficient value is 16777215 (the maximum value).
\|	Fixes Bug #216
*	cosmetics: Delete empty lines at end of file.	Diego Biurrun	2012-02-09
\|
*	h264: manually save/restore XMM registers for functions using INIT_MMX.	Ronald S. Bultje	2012-02-08
\|	On Win64, these registers are callee-save, so not saving/restoring them correctly is a violation of ABI and can lead to crashes or corrupt data.
*	pngdsp: swap argument inversion.	Ronald S. Bultje	2012-02-07
\|
*	h264: mark h264_idct_add8_10 with number of XMM registers.	Michael Kostylev	2012-02-07
\|	This fixes XMM register clobber problems on Win64.
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	win64: add a XMM clobber test configure option.	Ronald S. Bultje	2012-02-02
\|	This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof.
\|	Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
*	Fix a typo in the x86 asm version of ff_vector_clip_int32()	Justin Ruggles	2012-02-01
\|	Specifies the correct number of xmm registers used so that they can be saved and restored on Win64 if necessary.
*	rv40: x86 SIMD for biweight	Christophe Gisquet	2012-01-30
\|	Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely).
\|
\|	*_TIMER report for the 16x16 and 8x8 cases:
\|	C:
\|	9015 decicycles in 16, 524257 runs, 31 skips
\|	2656 decicycles in 8, 524271 runs, 17 skips
\|	MMX:
\|	4156 decicycles in 16, 262090 runs, 54 skips
\|	1206 decicycles in 8, 262131 runs, 13 skips
\|	MMX on fast-path:
\|	2760 decicycles in 16, 524222 runs, 66 skips
\|	995 decicycles in 8, 524252 runs, 36 skips
\|	SSE2:
\|	2163 decicycles in 16, 262131 runs, 13 skips
\|	832 decicycles in 8, 262137 runs, 7 skips
\|	SSE2 with fast path:
\|	1783 decicycles in 16, 524276 runs, 12 skips
\|	711 decicycles in 8, 524283 runs, 5 skips
\|	SSSE3:
\|	2117 decicycles in 16, 262136 runs, 8 skips
\|	814 decicycles in 8, 262143 runs, 1 skips
\|	SSSE3 with fast path:
\|	1315 decicycles in 16, 524285 runs, 3 skips
\|	578 decicycles in 8, 524286 runs, 2 skips
\|
\|	This means around a 4% speedup for some sequences.
\|
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	x86: Give RV40 init file a more suitable name.	Diego Biurrun	2012-01-30
\|
*	x86: Place mm_flags variable declaration below the appropriate #ifdef.	Diego Biurrun	2012-01-30
\|	This fixes some unused variable warnings with YASM disabled.
*	x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf	Christophe Gisquet	2012-01-30
\|	While pshufb allows emulating bswap on XMM registers for SSSE3, more shuffling is needed for SSE2. Alignment is critical, so specific codepaths are provided for this case.
\|
\|	For the huffyuv sequence "angels_480-huffyuvcompress.avi":
\|	C (using bswap instruction): ~ 55k cycles
\|	SSE2: ~ 40k cycles
\|	SSSE3 using unaligned loads: ~ 35k cycles
\|	SSSE3 using aligned loads: ~ 30k cycles
\|
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
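As a reference for what the SIMD versions have to reproduce, here is a minimal portable sketch of a buffer byte swap: each 32-bit word has its four bytes reversed. The names `bswap32_sketch` and `bswap_buf_sketch` are hypothetical, and the shift-and-mask swap stands in for the `bswap` instruction mentioned in the timings. It is not the library's implementation.

```c
#include <stdint.h>

/* Reverses the byte order of one 32-bit word using shifts and masks. */
static uint32_t bswap32_sketch(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical scalar reference: byte-swaps w 32-bit words from src to dst. */
static void bswap_buf_sketch(uint32_t *dst, const uint32_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = bswap32_sketch(src[i]);
}
```

An SSSE3 version can do the same for four words at once with a single byte shuffle, whereas SSE2 has no byte shuffle and needs a longer sequence, which matches the "more shuffling" remark above.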
*	png: add support for bpp>4 to paeth x86 SIMD code.	Ronald S. Bultje	2012-01-29
\|	This fixes playback of e.g. RGB48 (bpp=6) content on x86 CPUs.
\|	Fixes bug 214.
*	png: add SSE2 version for add_bytes_l2.	Ronald S. Bultje	2012-01-29
\|
*	png: convert DSP functions to yasm.	Ronald S. Bultje	2012-01-29
\|
*	png: add missing #if HAVE_SSSE3 around function pointer assignment.	Ronald S. Bultje	2012-01-29
\|
*	imdct36: mark SSE functions as using all 16 XMM registers.	Ronald S. Bultje	2012-01-29
\|	On x86-64, it indeed uses all 16 registers (and on x86-32, this gets clipped to 8). Not marking it properly causes callers of this function to fail randomly because of XMM register clobbering.
*	png: move DSP functions to their own DSP context.	Ronald S. Bultje	2012-01-29
\|
*	config.asm: change %ifdef directives to %if directives.	Ronald S. Bultje	2012-01-27
\|	This allows combining multiple conditionals in a single statement.
*	dsputil: use vertical component for drawing bottom edge.	Ronald S. Bultje	2012-01-25
\|	Current code only writes 8 pixels of vertical edge for YUV422, which causes MC artifacts when subsequent frames use data from that edge.
*	rv34: 1-pass inter MB reconstruction	Christophe GISQUET	2012-01-16
\|	Implement 1-pass inverse transform and reconstruction for inter blocks.
*	rv34: Intra 16x16 handling	Christophe GISQUET	2012-01-16
\|	Extract processing of intra 16x16 blocks from intra macroblock processing. Also implement a function performing inverse transform and block reconstruction for DC-only blocks in 1 pass instead of 2.
*	rv34: DC-only inverse transform	Christophe GISQUET	2012-01-12
\|	When decoding coefficients, detect whether the block is DC-only, and take advantage of this knowledge to perform DC-only inverse transform.
\|
\|	This is achieved by:
\|	- first, changing the 108x4 element modulo_three_table into a 108 element table (kind of base4), and accessing each value using mask and shifts.
\|	- then, checking low bits for 0 (as they represent the presence of higher frequency coefficients)
\|
\|	Also provide x86 SIMD code for the DC-only inverse transform.
\|
\|	Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
*	fft: init functions with INIT_XMM/YMM.	Henrik Gramner	2012-01-11
\|	This is required to handle clobbering of XMM registers on Win64 correctly. Fixes FFT and all tests depending on FFT on Win64.
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
\|	Signed-off-by: Janne Grunau <janne-libav@jannau.net>
*	mpegaudiodec: optimized iMDCT transform	Vitor Sessak	2012-01-08
\|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	x86: Fix constraints for decode_significance*_x86	Martin Storsjö	2011-12-27
\|	Originally, prior to 8742a4ff8, the caller code was compiled within this condition:
\|
\|	ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
\|
\|	Since HAVE_7REGS is defined as
\|
\|	(ARCH_X86_64 \|\| (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
\|
\|	the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal to HAVE_7REGS (for 32 bit at least).
\|
\|	The correct simplification of the original condition thus is HAVE_7REGS, not HAVE_EBX_AVAILABLE.
\|
\|	This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0 and HAVE_EBX_AVAILABLE = 1.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
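The boolean argument in this message can be checked by brute force. The sketch below models the configure macros as plain 0/1 function arguments, which is an assumption made purely for illustration. It leaves out `ARCH_X86` and `BROKEN_RELOCATIONS`, which are common to both forms of the guard, and fixes `ARCH_X86_64` to 0 to cover the 32-bit case only. All function names are hypothetical.

```c
/* Stand-in for HAVE_7REGS as defined in the commit message. */
static int have_7regs(int x86_64, int ebx, int ebp)
{
    return x86_64 || (ebx && ebp);
}

/* The subcondition under discussion: HAVE_7REGS && HAVE_EBX_AVAILABLE. */
static int original_guard(int x86_64, int ebx, int ebp)
{
    return have_7regs(x86_64, ebx, ebp) && ebx;
}

/* Counts the 32-bit configurations in which a candidate simplification
 * disagrees with the original guard. use_7regs != 0 tests HAVE_7REGS,
 * otherwise HAVE_EBX_AVAILABLE alone is tested. */
static int count_mismatches_32bit(int use_7regs)
{
    int mismatches = 0;
    for (int ebx = 0; ebx <= 1; ebx++) {
        for (int ebp = 0; ebp <= 1; ebp++) {
            int candidate = use_7regs ? have_7regs(0, ebx, ebp) : ebx;
            if (candidate != original_guard(0, ebx, ebp))
                mismatches++;
        }
    }
    return mismatches;
}
```

With these definitions, `count_mismatches_32bit(1)` finds no disagreement, while `count_mismatches_32bit(0)` finds exactly one, at `ebx = 1, ebp = 0`, which is the failing configuration the message names.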
*	x86: Tighten register constraints for decode_significance*_x86.	Diego Biurrun	2011-12-21
\|	On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register is not available, but required to assemble the functions. This reverts commit 8742a4f to a simplified version of the original constraints.
*	x86: conditionally compile dnxhd encoder optimizations	Diego Biurrun	2011-12-19
\|
*	build: conditionally compile x86 H.264 chroma optimizations	Diego Biurrun	2011-12-14
\|
*	x86: Require 7 registers for the cabac asm	Martin Storsjö	2011-12-12
\|	The change in 599b4c6ef didn't turn out to work properly on i386 on OS X, where it broke building with PIC enabled.
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	x86: cabac: replace explicit memory references with "m" operands	Mans Rullgard	2011-12-11
\|	This replaces the explicit offset(reg) memory references with "m" operands for the same locations. As a result, one fewer register operand is needed for these inline asm statements.
\|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	Fix a bunch of common typos.	Diego Biurrun	2011-12-11
\|
*	dsputil: use cpuflags in x86 emu_edge_core	Justin Ruggles	2011-11-22
\|	avoids passing around the extra argument among all the macros it uses
*	dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()	Justin Ruggles	2011-11-22
\|	This allows emulated_edge_mc_sse() and gmc_sse() to be used under AV_CPU_FLAG_SSE.
*	twinvq: add SSE/AVX optimized sum/difference stereo interleaving	Justin Ruggles	2011-11-11
\|
*	Remove redundant filename self-references inside files.	Diego Biurrun	2011-11-08
\|	Filenames are brittle across renames and add no useful information.
*	x86: drop pointless ARCH_X86 #ifdef from files in x86 subdirectory	Diego Biurrun	2011-11-08
\|