libav.git - [no description]

	Commit message	Author	Age
*	VP8: move zeroing of luma DC block into the WHT	Jason Garrett-Glaser	2010-08-02
	Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
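For readers unfamiliar with the transform, the idea of folding the zeroing into the WHT can be sketched in plain C. This is only an illustration: the function name `luma_dc_wht_sketch`, the argument layout, and the butterfly ordering are assumptions modeled on the VP8 inverse Walsh-Hadamard transform, not code taken from this commit.

```c
#include <stdint.h>

/* Hypothetical sketch: inverse WHT over the 16 luma DC coefficients.
 * Each result is written to coefficient 0 of one of the 16 4x4 luma
 * blocks, and the dc[] input is cleared inside the transform itself,
 * so the caller needs no separate zeroing step.
 * Assumes >> on negative ints is an arithmetic shift. */
void luma_dc_wht_sketch(int16_t block[4][4][16], int16_t dc[16])
{
    int i, t0, t1, t2, t3;

    /* First pass: butterflies down each column. */
    for (i = 0; i < 4; i++) {
        t0 = dc[0 * 4 + i] + dc[3 * 4 + i];
        t1 = dc[1 * 4 + i] + dc[2 * 4 + i];
        t2 = dc[1 * 4 + i] - dc[2 * 4 + i];
        t3 = dc[0 * 4 + i] - dc[3 * 4 + i];
        dc[0 * 4 + i] = t0 + t1;
        dc[1 * 4 + i] = t3 + t2;
        dc[2 * 4 + i] = t0 - t1;
        dc[3 * 4 + i] = t3 - t2;
    }

    /* Second pass: butterflies along each row, with rounding (+3, >>3).
     * The row of dc[] is zeroed once its values have been read. */
    for (i = 0; i < 4; i++) {
        t0 = dc[i * 4 + 0] + dc[i * 4 + 3] + 3;
        t1 = dc[i * 4 + 1] + dc[i * 4 + 2];
        t2 = dc[i * 4 + 1] - dc[i * 4 + 2];
        t3 = dc[i * 4 + 0] - dc[i * 4 + 3] + 3;
        dc[i * 4 + 0] = dc[i * 4 + 1] = dc[i * 4 + 2] = dc[i * 4 + 3] = 0;

        block[i][0][0] = (int16_t)((t0 + t1) >> 3);
        block[i][1][0] = (int16_t)((t3 + t2) >> 3);
        block[i][2][0] = (int16_t)((t0 - t1) >> 3);
        block[i][3][0] = (int16_t)((t3 - t2) >> 3);
    }
}
```

Clearing the input where it is consumed means an asm version of the transform can do the stores while the data is already in registers.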
*	Use word-writing instead of dword-writing (with two cached but otherwise	Ronald S. Bultje	2010-07-31
	unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster.	Ronald S. Bultje	2010-07-26
	Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: Much faster SSE2 MC	Jason Garrett-Glaser	2010-07-26
	5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Enable no-loop memory/register saving for ssse3/sse4 also.	Ronald S. Bultje	2010-07-26
	Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Save a register (or regsize of stackspace for x86-32) for the no-loop	Ronald S. Bultje	2010-07-26
	mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this	Ronald S. Bultje	2010-07-26
	construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Split pextrw macro-spaghetti into several opt-specific macros, this will make	Ronald S. Bultje	2010-07-26
	future new optimizations (imagine a sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix obvious bug in assignment. Somehow, the test vectors don't test this...	Ronald S. Bultje	2010-07-25
	Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this	Ronald S. Bultje	2010-07-24
	splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: optimize DC-only chroma case in the same way as luma.	Jason Garrett-Glaser	2010-07-23
	Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8 asm: cosmetics (spacing)	Jason Garrett-Glaser	2010-07-23
	Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8: 30% faster idct_mb	Jason Garrett-Glaser	2010-07-23
	Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks? Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
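The DC-only shortcut mentioned in this message can be sketched in C as follows. When a 4x4 block has only a DC coefficient, the inverse DCT reduces to adding one rounded constant to every pixel, and a row of four such blocks can be handled in one call. The names `idct_dc_add_sketch`, `idct_dc_add4_sketch`, and `clip_u8` are hypothetical, and the layout (four horizontally adjacent blocks, 16 coefficients each) is an assumption for illustration rather than the committed code.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical sketch: DC-only inverse DCT for one 4x4 block.
 * The single coefficient is rounded, cleared, and added to all
 * 16 destination pixels with clipping. */
void idct_dc_add_sketch(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;
    block[0] = 0;

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}

/* Hypothetical sketch: four horizontally adjacent DC-only blocks,
 * covering one 16-pixel-wide, 4-pixel-high strip. */
void idct_dc_add4_sketch(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add_sketch(dst + 4 * i, block[i], stride);
}
```

A SIMD version would presumably broadcast each block's DC value across a register and process whole rows at once; the scalar loop above only shows the arithmetic.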
*	VP8: clear DCT blocks in iDCT instead of using clear_blocks.	Jason Garrett-Glaser	2010-07-23
	~0.3% faster overall. Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on	Ronald S. Bultje	2010-07-22
	CPUs supporting it. Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix and enable horizontal >=SSE2 mbedge loopfilter.	Ronald S. Bultje	2010-07-22
	Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Eliminate one instruction in VP8 dc_add_sse4	Jason Garrett-Glaser	2010-07-21
	Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Various VP8 x86 deblocking speedups	Jason Garrett-Glaser	2010-07-21
	SSSE3 versions, improve SSE2 versions a bit. SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them. Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Make mmx VP8 WHT faster	Jason Garrett-Glaser	2010-07-21
	Avoid pextrw, since it's slow on many older CPUs. Now it doesn't require mmxext either. Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)	Ronald S. Bultje	2010-07-20
	and chroma (width=8). Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.	Ronald S. Bultje	2010-07-20
	Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's	Ronald S. Bultje	2010-07-19
	wrong with it tomorrow or so, then re-submit. Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.	Ronald S. Bultje	2010-07-19
	Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Be more efficient with registers or stack memory. Saves 8/16 bytes stack	Ronald S. Bultje	2010-07-19
	for x86-32, or 2 MM registers on x86-64. Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Change function prototypes for width=8 inner and mbedge loopfilter functions	Ronald S. Bultje	2010-07-19
	so that it does both U and V planes at the same time. This will have speed advantages when using SSE2 (or higher) optimizations, since we can do both the U and V rows together in a single xmm register. This also renames filter16 to filter16y and filter8 to filter8uv so that it's more obvious what each function is used for. Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Attempt to fix x86-64 testsuite on fate.	Ronald S. Bultje	2010-07-16
	Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Remove duplicate define.	Ronald S. Bultje	2010-07-16
	Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Revert 24270, it contained some stuff that shouldn't have been in there.	Ronald S. Bultje	2010-07-16
	Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Remove duplicate define.	Ronald S. Bultje	2010-07-16
	Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Give x86 r%d registers names, this will simplify implementation of the chroma	Ronald S. Bultje	2010-07-16
	inner loopfilter, and it also allows us to save one register on x86-64/sse2. Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Change return statement, the REP_RET is a mistake since the else case (x86-64,	Ronald S. Bultje	2010-07-16
	sse2) doesn't actually loop, so REP_RET isn't necessary. Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.	Ronald S. Bultje	2010-07-15
	Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).	Ronald S. Bultje	2010-07-03
	Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	SSSE3 versions of vp8 width4 bilinear MC functions	Jason Garrett-Glaser	2010-07-03
	Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	SSSE3 versions of width4 VP8 6-tap MC functions	Jason Garrett-Glaser	2010-07-02
	Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a non-bitexactness bug. Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>. Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Use add instead of lshift in mmxext vp8 idct	Jason Garrett-Glaser	2010-06-29
	Originally committed as revision 23891 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Remove unused macros (duplicates from the now-LGPL x86util.asm).	Ronald S. Bultje	2010-06-29
	Originally committed as revision 23890 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	MMX idct_add for VP8.	Ronald S. Bultje	2010-06-29
	Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Add mmxext version of VP8 DC Hadamard transform	Jason Garrett-Glaser	2010-06-29
	Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Fix VP8 bilinear mc on x86_64	Jason Garrett-Glaser	2010-06-28
	Originally committed as revision 23872 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Add x86 asm functions for VP8 put_pixels	Jason Garrett-Glaser	2010-06-28
	Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC	Jason Garrett-Glaser	2010-06-28
	Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk
*	First shot at VP8 optimizations:	Jason Garrett-Glaser	2010-06-27
	- MMXEXT, SSE2 and SSSE3 MC functions
	- MMX and SSE4 IDCT dc_add functions
	Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.
	Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk