path: root/libavcodec/x86
Commit message | Author | Age
* Inline asm for VP56 arith coder | Eli Friedman | 2010-07-23
  This gets cmov far more reliably than trying to trick gcc into generating it, which is useful since it's 2% faster overall. Patch by Eli Friedman <eli.friedman at gmail>.
  Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8: optimize DC-only chroma case in the same way as luma. | Jason Garrett-Glaser | 2010-07-23
  Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct.
  Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8 asm: cosmetics (spacing) | Jason Garrett-Glaser | 2010-07-23
  Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8: 30% faster idct_mb | Jason Garrett-Glaser | 2010-07-23
  Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks?
  Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
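The DC-only shortcut described in this entry can be illustrated with a minimal scalar sketch. It is not the MMX/SSE2 code from the commit. It assumes the usual VP8 DC-only rounding of (dc + 4) >> 3 and an arithmetic right shift for negative values, and the names clip_u8, idct_dc_add and idct_dc_add4y are hypothetical. The coefficient is cleared inside the transform, in the spirit of the "clear DCT blocks in iDCT" entry below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* DC-only inverse transform for one 4x4 block: every pixel receives the
 * same rounded offset, so no full IDCT is needed. Assumes >> on a negative
 * int is an arithmetic shift, as on common x86 compilers. */
static void idct_dc_add(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;

    block[0] = 0; /* clear the coefficient here instead of in a later pass */
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}

/* Four horizontally adjacent DC-only blocks handled in one call
 * (hypothetical scalar stand-in for a 4-at-a-time SIMD routine). */
static void idct_dc_add4y(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride)
{
    assert(stride >= 16); /* the four blocks span 16 pixels per row */
    for (int i = 0; i < 4; i++)
        idct_dc_add(dst + 4 * i, block[i], stride);
}
```

A SIMD version would presumably broadcast the four offsets into one register and process a whole 16-pixel row per iteration, which is where a batched entry point pays off.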
* VP8: clear DCT blocks in iDCT instead of using clear_blocks. | Jason Garrett-Glaser | 2010-07-23
  ~0.3% faster overall.
  Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use pextrw for SSE4 mbedge filter result writing, speedup 5-10 cycles on CPUs supporting it. | Ronald S. Bultje | 2010-07-22
  Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix and enable horizontal >=SSE2 mbedge loopfilter. | Ronald S. Bultje | 2010-07-22
  Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
* relicense h264 deblock sse2 to lgpl | Loren Merritt | 2010-07-22
  Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
* sync yasm macros from x264 | Loren Merritt | 2010-07-21
  Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Eliminate one instruction in VP8 dc_add_sse4 | Jason Garrett-Glaser | 2010-07-21
  Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Various VP8 x86 deblocking speedups | Jason Garrett-Glaser | 2010-07-21
  Add SSSE3 versions and improve the SSE2 versions a bit. SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.
  Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Make mmx VP8 WHT faster | Jason Garrett-Glaser | 2010-07-21
  Avoid pextrw, since it's slow on many older CPUs. Now it doesn't require mmxext either.
  Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add header declarations for mmx/sse constants missing them | David Conrad | 2010-07-21
  Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c | David Conrad | 2010-07-21
  Should fix compilation with icc and should help prevent any future duplicates.
  Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16) and chroma (width=8). | Ronald S. Bultje | 2010-07-20
  Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder. | Ronald S. Bultje | 2010-07-20
  Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's wrong with it tomorrow or so, then re-submit. | Ronald S. Bultje | 2010-07-19
  Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than regular MMX code. | Ronald S. Bultje | 2010-07-19
  One example is the Core1 CPU. Instead, set a new flag, FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that have been checked specifically on such CPUs and are actually faster than their MMX counterparts. In addition, use this flag to enable particular VP8 and LPC SSE2 functions that are faster than their MMX counterparts. Based on a patch by Loren Merritt <lorenm AT u washington edu>.
  Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
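The flag scheme described in this entry can be sketched as a small dispatch function. This is only an illustration of the idea: the flag names, their values, and pick_impl are hypothetical and are not the FF_MM_* definitions from the tree. The sketch assumes, as the entry implies, that a CPU reports either the regular SSE2 flag or the SLOW variant, never both.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; not the real FF_MM_* values. */
enum {
    CPU_MMX      = 1 << 0,
    CPU_SSE2     = 1 << 1, /* SSE2 present and generally faster than MMX */
    CPU_SSE2SLOW = 1 << 2, /* SSE2 present but generally not faster      */
};

enum impl { IMPL_C, IMPL_MMX, IMPL_SSE2 };

/* Choose an implementation for one function.
 * verified_on_slow_cpus is nonzero only if this function's SSE2 version
 * was measured to beat MMX on CPUs that report CPU_SSE2SLOW. */
static enum impl pick_impl(int cpu_flags, int verified_on_slow_cpus)
{
    /* Assumption of this sketch: the two SSE2 flags are mutually exclusive. */
    assert(!((cpu_flags & CPU_SSE2) && (cpu_flags & CPU_SSE2SLOW)));

    if (cpu_flags & CPU_SSE2)
        return IMPL_SSE2;
    if ((cpu_flags & CPU_SSE2SLOW) && verified_on_slow_cpus)
        return IMPL_SSE2;
    if (cpu_flags & CPU_MMX)
        return IMPL_MMX;
    return IMPL_C;
}
```

With this layout, code that only tests the regular flag skips SSE2 on slow CPUs by default, and each function that benefits opts in explicitly.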
* Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions. | Ronald S. Bultje | 2010-07-19
  Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Be more efficient with registers or stack memory. | Ronald S. Bultje | 2010-07-19
  Saves 8/16 bytes of stack on x86-32, or 2 MM registers on x86-64.
  Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Change function prototypes for width=8 inner and mbedge loopfilter functions so that they do both U and V planes at the same time. | Ronald S. Bultje | 2010-07-19
  This will have speed advantages when using SSE2 (or higher) optimizations, since we can do both the U and V rows together in a single xmm register. This also renames filter16 to filter16y and filter8 to filter8uv so that it's more obvious what each function is used for.
  Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
* more credits to D. J. Bernstein for fft | Loren Merritt | 2010-07-18
  Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Attempt to fix x86-64 testsuite on fate. | Ronald S. Bultje | 2010-07-16
  Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove duplicate define. | Ronald S. Bultje | 2010-07-16
  Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Revert 24270, it contained some stuff that shouldn't have been in there. | Ronald S. Bultje | 2010-07-16
  Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove duplicate define. | Ronald S. Bultje | 2010-07-16
  Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Give x86 r%d registers names, this will simplify implementation of the chroma inner loopfilter, and it also allows us to save one register on x86-64/sse2. | Ronald S. Bultje | 2010-07-16
  Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Change return statement, the REP_RET is a mistake since the else case (x86-64, sse2) doesn't actually loop, so REP_RET isn't necessary. | Ronald S. Bultje | 2010-07-16
  Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations. | Ronald S. Bultje | 2010-07-15
  Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
* MMX/SSE VC1 loop filter | David Conrad | 2010-07-11
  Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Make ff_pw_4 128 bits | David Conrad | 2010-07-11
  Originally committed as revision 24207 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move SSE optimized 32-point DCT to its own file. Should fix breakage with YASM disabled. | Vitor Sessak | 2010-07-06
  Originally committed as revision 24078 to svn://svn.ffmpeg.org/ffmpeg/trunk
* SSE optimized 32-point DCT | Vitor Sessak | 2010-07-06
  Originally committed as revision 24077 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros). | Ronald S. Bultje | 2010-07-03
  Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
* SSSE3 versions of vp8 width4 bilinear MC functions | Jason Garrett-Glaser | 2010-07-03
  Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
* SSSE3 versions of width4 VP8 6-tap MC functions | Jason Garrett-Glaser | 2010-07-02
  Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a non-bitexactness bug. Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.
  Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix 100L in vp8dsp asm init | Jason Garrett-Glaser | 2010-07-01
  Originally committed as revision 23946 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix h264/vp8 intra pred on Athlon XP | Jason Garrett-Glaser | 2010-07-01
  Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction?
  Originally committed as revision 23927 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix grammar errors in documentation | Måns Rullgård | 2010-06-30
  Originally committed as revision 23904 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use add instead of lshift in mmxext vp8 idct | Jason Garrett-Glaser | 2010-06-29
  Originally committed as revision 23891 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove unused macros (duplicates from the now-LGPL x86util.asm). | Ronald S. Bultje | 2010-06-29
  Originally committed as revision 23890 to svn://svn.ffmpeg.org/ffmpeg/trunk
* MMX idct_add for VP8. | Ronald S. Bultje | 2010-06-29
  Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add missing mm_support call to ff_h264_pred_init_x86. | Jason Garrett-Glaser | 2010-06-29
  I'm not sure if this is supposed to be here, but it can't hurt.
  Originally committed as revision 23885 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add mmxext version of VP8 DC Hadamard transform | Jason Garrett-Glaser | 2010-06-29
  Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Make x86util.asm LGPL so we can use it in LGPL asm | Jason Garrett-Glaser | 2010-06-29
  Strip out most x264-specific stuff (not used anywhere in ffmpeg).
  Originally committed as revision 23877 to svn://svn.ffmpeg.org/ffmpeg/trunk
* MMXEXT version of vp8 4x4 vertical pred | Jason Garrett-Glaser | 2010-06-29
  Originally committed as revision 23876 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8 | Jason Garrett-Glaser | 2010-06-28
  Originally committed as revision 23875 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add missing comment header for predict_4x4_dc_mmxext | Jason Garrett-Glaser | 2010-06-28
  Originally committed as revision 23874 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix some intra pred MMX functions that used MMXEXT instructions | Jason Garrett-Glaser | 2010-06-28
  Also add predict_4x4_dc MMXEXT function for vp8/h264.
  Originally committed as revision 23873 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix VP8 bilinear mc on x86_64 | Jason Garrett-Glaser | 2010-06-28
  Originally committed as revision 23872 to svn://svn.ffmpeg.org/ffmpeg/trunk