libav.git

	Commit message	Author	Age
*	Revert r24931, it broke Win32 and some BSD compiles (yay fate).	Ronald S. Bultje	2010-08-25
	Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing	Ronald S. Bultje	2010-08-25
	to the VP6 fate failures on Win64. Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP6: fix vp6_filter_diag4_mmx/sse on 64-bit	Måns Rullgård	2010-08-25
	The stride can be negative and must be sign extended before being used in pointer arithmetic. Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should	Ronald S. Bultje	2010-08-25
	help in fixing the Win64 fate failures. Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Move vp6_filter_diag4() from DSPContext to VP56DSPContext.	Ronald S. Bultje	2010-08-25
	Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Remove global mm_flags variable	Måns Rullgård	2010-08-24
	Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Mark xmm registers as clobbered in simple loopfilter. Should fix the last	Ronald S. Bultje	2010-08-24
	two VP8-related fate failures on Win64. Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits".	Alex Converse	2010-08-23
	It generates smaller cleaner code. Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures).	Ronald S. Bultje	2010-08-23
	Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Convert ff_imdct_half_sse() to yasm.	Alex Converse	2010-08-22
	This is to avoid split asm sections that attempt to preserve some registers between sections. Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP5/6/8: ~7% faster arithmetic decoding	Jason Garrett-Glaser	2010-08-12
	Grab from the bitstream in 16-bit chunks instead of 8-bit chunks. TODO: grab in 32-bit chunks on 64-bit systems. Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Split h264dsp and h264pred in configure.	Jason Garrett-Glaser	2010-08-07
	Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Add file missing in r24702	Jason Garrett-Glaser	2010-08-05
	Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	H.264: SSE2/SSSE3 weighted prediction asm	Eli Friedman	2010-08-05
	Patch by Eli Friedman <eli.friedman at gmail dot com> Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Move cavs dsp functions to their own struct	Måns Rullgård	2010-08-03
	Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP5/6/8: add one inline missed in r24677	Jason Garrett-Glaser	2010-08-03
	Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: move zeroing of luma DC block into the WHT	Jason Garrett-Glaser	2010-08-02
	Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use word-writing instead of dword-writing (with two cached but otherwise	Ronald S. Bultje	2010-07-31
	unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Remove x86/mmx.h. It is not used anymore and has been deprecated for years.	Vitor Sessak	2010-07-31
	Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Convert deinterlacing MMX code to YASM	Vitor Sessak	2010-07-31
	Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix compilation in x86_64. I broke it with r24580.	Vitor Sessak	2010-07-29
	Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Translate libmpeg2 MMX IDCT to plain asm	Vitor Sessak	2010-07-29
	Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster.	Ronald S. Bultje	2010-07-26
	Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: Much faster SSE2 MC	Jason Garrett-Glaser	2010-07-26
	5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Enable no-loop memory/register saving for ssse3/sse4 also.	Ronald S. Bultje	2010-07-26
	Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Save a register (or regsize of stackspace for x86-32) for the no-loop	Ronald S. Bultje	2010-07-26
	mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this	Ronald S. Bultje	2010-07-26
	construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Split pextrw macro-spaghetti into several opt-specific macros, this will make	Ronald S. Bultje	2010-07-26
	future new optimizations (imagine a sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix obvious bug in assignment. Somehow, the test vectors don't test this...	Ronald S. Bultje	2010-07-25
	Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this	Ronald S. Bultje	2010-07-24
	splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Inline asm for VP56 arith coder	Eli Friedman	2010-07-23
	This is a lot more reliable to get cmov rather than trying to trick gcc into generating it, useful since it's 2% faster overall. Patch by Eli Friedman <eli.friedman at gmail> Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: optimize DC-only chroma case in the same way as luma.	Jason Garrett-Glaser	2010-07-23
	Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8 asm: cosmetics (spacing)	Jason Garrett-Glaser	2010-07-23
	Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: 30% faster idct_mb	Jason Garrett-Glaser	2010-07-23
	Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks? Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: clear DCT blocks in iDCT instead of using clear_blocks.	Jason Garrett-Glaser	2010-07-23
	~0.3% faster overall. Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on	Ronald S. Bultje	2010-07-22
	CPUs supporting it. Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix and enable horizontal >=SSE2 mbedge loopfilter.	Ronald S. Bultje	2010-07-22
	Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	relicense h264 deblock sse2 to lgpl	Loren Merritt	2010-07-22
	Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	sync yasm macros from x264	Loren Merritt	2010-07-21
	Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Eliminate one instruction in VP8 dc_add_sse4	Jason Garrett-Glaser	2010-07-21
	Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Various VP8 x86 deblocking speedups	Jason Garrett-Glaser	2010-07-21
	SSSE3 versions, improve SSE2 versions a bit. SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them. Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Make mmx VP8 WHT faster	Jason Garrett-Glaser	2010-07-21
	Avoid pextrw, since it's slow on many older CPUs. Now it doesn't require mmxext either. Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Add header declarations for mmx/sse constants missing them	David Conrad	2010-07-21
	Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c	David Conrad	2010-07-21
	Should fix compilation with icc and should help prevent any future duplicates. Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)	Ronald S. Bultje	2010-07-20
	and chroma (width=8). Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.	Ronald S. Bultje	2010-07-20
	Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's	Ronald S. Bultje	2010-07-19
	wrong with it tomorrow or so, then re-submit. Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than	Ronald S. Bultje	2010-07-19
	regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag, FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that have been checked specifically on such CPUs and are actually faster than their MMX counterparts. In addition, use this flag to enable particular VP8 and LPC SSE2 functions that are faster than their MMX counterparts. Based on a patch by Loren Merritt <lorenm AT u washington edu>. Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.	Ronald S. Bultje	2010-07-19
	Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Be more efficient with registers or stack memory. Saves 8/16 bytes stack	Ronald S. Bultje	2010-07-19
	for x86-32, or 2 MM registers on x86-64. Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
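
Note on r24926 (VP6: fix vp6_filter_diag4_mmx/sse on 64-bit): that commit says a stride can be negative and must be sign extended before it is used in pointer arithmetic. The C sketch below only illustrates that general point and is not the code from the commit. The helper names next_row, sign_extended_offset and zero_extended_offset are hypothetical and do not exist in libav.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from libav): 64-bit offset obtained when a
 * 32-bit stride is sign-extended. A stride of -8 stays -8. */
static int64_t sign_extended_offset(int32_t stride)
{
    return (int64_t)stride;
}

/* Hypothetical helper (not from libav): 64-bit offset obtained when the
 * upper 32 bits are simply cleared, which is what happens if the 32-bit
 * value is used as a 64-bit register without sign extension.
 * A stride of -8 becomes a huge positive offset. */
static uint64_t zero_extended_offset(int32_t stride)
{
    return (uint64_t)(uint32_t)stride;
}

/* Hypothetical helper (not from libav): move to the next row of a picture.
 * Converting the stride to ptrdiff_t sign-extends it, so a negative stride
 * steps backwards on 64-bit targets as well. */
static const uint8_t *next_row(const uint8_t *row, int32_t stride)
{
    assert(row != NULL);
    return row + (ptrdiff_t)stride;
}
```

In C the conversion to ptrdiff_t is done by the compiler. In hand-written 64-bit assembly the same widening has to be requested explicitly, which appears to be what the commit message describes.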