| Commit message | Author | Age |
|
|
|
| |
Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
to the VP6 fate failures on Win64.
Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
The stride can be negative and must be sign extended before being
used in pointer arithmetic.
Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
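
The sign-extension issue described in this message can be illustrated in plain C. This is a minimal sketch, not the committed change (which this log does not show); `pixel_at` and `zero_extended` are hypothetical names used only for illustration. The point is that a negative 32-bit stride must be widened as a signed value before it is added to a 64-bit pointer, otherwise it turns into a large positive offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of pixel (x, y) in an image whose rows are
 * `stride` bytes apart. The stride is widened to ptrdiff_t (a signed,
 * pointer-sized type), so a negative value stays negative. */
static const uint8_t *pixel_at(const uint8_t *base, int stride, int x, int y)
{
    assert(base != NULL);
    ptrdiff_t s = (ptrdiff_t)stride;      /* sign extension: -16 stays -16 */
    return base + (ptrdiff_t)y * s + x;
}

/* What goes wrong without sign extension: zero-extending the same 32 bits
 * gives a huge positive offset on a 64-bit target. */
static uint64_t zero_extended(int stride)
{
    return (uint64_t)(uint32_t)stride;
}
```

With a bottom-up image, `pixel_at(last_row, -16, 3, 1)` steps one row back toward the start of the buffer, whereas the zero-extended value of the same stride is `0xFFFFFFF0`.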
|
|
|
|
|
|
| |
help in fixing the Win64 fate failures.
Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
two VP8-related fate failures on Win64.
Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
It generates smaller, cleaner code.
Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
This is to avoid split asm sections that attempt to preserve some
registers between sections.
Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.
Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
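
The idea of refilling in 16-bit chunks can be sketched with a generic MSB-first bit reader. This is an assumption-laden illustration, not the decoder's actual code: the `BitReader` type and the `br_*` functions are hypothetical, and the real change may live in a different kind of reader. Each refill loads two bytes at once, so the refill branch is taken about half as often as with byte-wise loading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; all names are illustrative. */
typedef struct {
    const uint8_t *buf, *end;
    uint32_t cache;   /* unread bits, left-aligned (MSB first) */
    int      bits;    /* number of valid bits in cache */
} BitReader;

static void br_init(BitReader *br, const uint8_t *buf, size_t size)
{
    br->buf   = buf;
    br->end   = buf + size;
    br->cache = 0;
    br->bits  = 0;
}

/* Load 16 bits at once. Only called when fewer than 16 bits are cached,
 * so the new chunk always fits. Past the end of the input, zeros are
 * shifted in instead. */
static void br_refill16(BitReader *br)
{
    uint32_t chunk = 0;
    if (br->end - br->buf >= 2) {
        chunk = (uint32_t)br->buf[0] << 8 | br->buf[1];
        br->buf += 2;
    } else if (br->buf < br->end) {
        chunk = (uint32_t)br->buf[0] << 8;
        br->buf += 1;
    }
    br->cache |= chunk << (16 - br->bits);
    br->bits  += 16;
}

/* Read n bits, 1 <= n <= 16. */
static unsigned br_get(BitReader *br, int n)
{
    assert(n >= 1 && n <= 16);
    if (br->bits < n)
        br_refill16(br);
    unsigned v = br->cache >> (32 - n);
    br->cache <<= n;
    br->bits   -= n;
    return v;
}
```

The TODO about 32-bit chunks would correspond to widening `cache` to 64 bits and loading four bytes per refill.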
|
|
|
|
|
|
|
|
|
| |
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.
Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
Patch by Eli Friedman <eli.friedman at gmail dot com>
Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.
Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
|
| |
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds it up even more
(e.g. 25% faster on Core i7).
Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.
Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.
Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
construct was always enabled, even for <ssse3 versions).
Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
| |
future new optimizations (imagine an SSE5) much easier. Also fix a bug where
we used the direction (%2) rather than the optimization (%1) to enable this, which
means it was never actually used.
Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
| |
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
also use the ssse3 codepath without needing an explicit sse4 codepath.
Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
| |
This is a much more reliable way to get cmov than trying to trick gcc into
generating it, which is useful since it's 2% faster overall.
Patch by Eli Friedman <eli.friedman at gmail>
Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.
Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
|
| |
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
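
A scalar C sketch of what a 4-at-a-time DC-only function computes is given below. It is not the MMX/SSE2 code this message refers to; the function names are hypothetical, and the `(dc + 4) >> 3` rounding is assumed from the usual VP8 inverse-transform scaling. When a block has only a DC coefficient, the inverse transform reduces to adding one constant to all 16 pixels, so a row of four such blocks can be handled in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* DC-only inverse transform for one 4x4 block: every pixel gets the same
 * offset. The rounding (+4, >>3) is an assumption; the right shift of a
 * negative value relies on the arithmetic shift that common compilers use. */
static void idct_dc_add(uint8_t *dst, int16_t dc_coeff, ptrdiff_t stride)
{
    assert(dst != NULL);
    int dc = (dc_coeff + 4) >> 3;
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
}

/* Four horizontally adjacent DC-only blocks (a 16x4 pixel strip).
 * A SIMD version can process the whole strip row by row in wide registers;
 * this scalar version simply loops over the blocks. */
static void idct_dc_add4(uint8_t *dst, const int16_t dc[4], ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add(dst + 4 * i, dc[i], stride);
}
```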
|
|
|
|
|
|
| |
~0.3% faster overall.
Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
CPUs supporting it.
Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.
Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.
Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
Should fix compilation with icc and should help prevent any future duplicates.
Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
and chroma (width=8).
Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
wrong with it tomorrow or so, then re-submit.
Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
regular MMX code. An example of this is the Core1 CPU. Instead, set a new flag,
FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
have been checked specifically on such CPUs and are actually faster than
their MMX counterparts.
In addition, use this flag to enable particular VP8 and LPC SSE2 functions
that are faster than their MMX counterparts.
Based on a patch by Loren Merritt <lorenm AT u washington edu>.
Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
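
The dispatch pattern this message describes might look roughly as follows. This is a sketch based on one reading of the (truncated) message: the flag names and values below are hypothetical stand-ins rather than FFmpeg's real definitions, and `pick_impl` is an illustrative helper. The assumption is that the plain SSE2 flag is left unset on CPUs where SSE2 is generally slower than MMX, and a separate "slow" flag lets individually benchmarked functions opt in anyway.

```c
#include <assert.h>

/* Hypothetical CPU capability flags; the values are illustrative only. */
enum {
    CPU_MMX      = 1 << 0,
    CPU_SSE2     = 1 << 1,  /* SSE2 present and generally fast */
    CPU_SSE2SLOW = 1 << 2   /* SSE2 present, but usually slower than MMX */
};

typedef enum { IMPL_C, IMPL_MMX, IMPL_SSE2 } Impl;

/* Choose an implementation for one DSP function.
 * `sse2_wins_on_slow_cpus` is nonzero only for functions that have been
 * measured to beat their MMX counterparts even on "slow SSE2" CPUs. */
static Impl pick_impl(unsigned flags, int sse2_wins_on_slow_cpus)
{
    assert(!((flags & CPU_SSE2) && (flags & CPU_SSE2SLOW)));
    if (flags & CPU_SSE2)
        return IMPL_SSE2;
    if ((flags & CPU_SSE2SLOW) && sse2_wins_on_slow_cpus)
        return IMPL_SSE2;
    if (flags & CPU_MMX)
        return IMPL_MMX;
    return IMPL_C;
}
```

The default is the conservative one: on a "slow SSE2" CPU a function falls back to MMX unless it has been explicitly marked as verified.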
|
|
|
|
| |
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
for x86-32, or 2 MM registers on x86-64.
Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
|