| Commit message (Collapse) | Author | Age |
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2
functions will never be used in practice.
|
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
| |
This allows using it in swscale also.
|
|
|
|
| |
This allows using it in libswscale/ also.
|
|
|
|
|
|
|
| |
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
|
|
|
|
| |
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
|
| |
This increases compatibilty with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.
Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
two VP8-related fate failures on Win64.
Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.
Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
|
| |
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).
Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.
Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.
Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
construct was always enabled, even for <ssse3 versions).
Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
| |
future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...
Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
| |
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.
Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.
Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
|
| |
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
~0.3% faster overall.
Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
CPUs supporting it.
Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.
Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
| |
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.
Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
and chroma (width=8).
Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
wrong with it tomorrow or so, then re-submit.
Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
for x86-32, or 2 MM registers on x86-64.
Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
|
|
|
| |
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.
This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.
Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
inner loopfilter, and it also allows us to save one register on x86-64/sse2.
Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
| |
sse2) doesn't actually loop, so REP_RET isn't necessary.
Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
|