| Commit message (Collapse) | Author | Age |
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
| |
This allows using it in swscale also.
|
|
|
|
| |
This allows using it in libswscale/ also.
|
| |
|
|
|
|
| |
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
|
|
|
|
| |
Needs some ARM/PPC asm modifications.
|
|
|
|
| |
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
|
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
|
|
|
|
|
|
|
| |
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
|
|
|
|
| |
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
|
|
|
| |
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
| |
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
| |
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.
See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
|
|
|
|
|
|
| |
code directly also and remove loop setup. 20% faster in function, 0.8% overall.
See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.
Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
|