path: root/libavcodec/x86/h264dsp_mmx.c
Commit message | Author | Age
* H264: change weight/biweight functions to take a height argument. | Ronald S. Bultje | 2011-10-21
  Neon parts by Mans Rullgard <mans@mansr.com>.
* Support for lossless and inter H264 4:2:2. | Ronald S. Bultje | 2011-10-21
* h264: 4:2:2 intra decoding support | Baptiste Coudurier | 2011-10-21
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* H.264: add filter_mb_fast support for >8-bit decoding | Jason Garrett-Glaser | 2011-07-11
  Much faster high bit depth deblocking.
* h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. | Daniel Kang | 2011-06-21
  Mainly ported from 8-bit H.264 weight/biweight.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* h264/10bit: add HAVE_ALIGNED_STACK checks. | Daniel Kang | 2011-05-31
  Fixes regression in 836f47d34b49e8ba9883e738a42f154130421caa in ICC-10.x, since ICC<=11.0 doesn't align stack upon function calls.
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* Update 8-bit H.264 IDCT function names to reflect bit-depth. | Daniel Kang | 2011-05-31
  Signed-off-by: Ronald S. Bultje <rbultje@google.com>
* Add IDCT functions for 10-bit H.264. | Daniel Kang | 2011-05-31
  Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
  Signed-off-by: Ronald S. Bultje <rbultje@google.com>
* h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64. | Gil Pedersen | 2011-05-16
  This fixes linking errors due to undefined symbols on x86_64 OS X.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* 10-bit H.264 x86 chroma v loopfilter asm | Jason Garrett-Glaser | 2011-05-11
  Also delete some unused deblock asm macros.
* Port x86 10-bit H.264 deblock asm from x264 | Jason Garrett-Glaser | 2011-05-10
* Update x86 H.264 deblock asm | Jason Garrett-Glaser | 2011-05-10
  Includes AVX versions from x264.
* h264dsp_mmx: place bracket outside #if/#endif block. | Ronald S. Bultje | 2011-05-10
  Should fix compile on systems missing yasm/nasm.
* Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. | Oskar Arvidsson | 2011-05-10
  This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c).
  Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size.
  Preparatory patch for high bit depth h264 decoding support.
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* Replace FFmpeg with Libav in licence headers | Mans Rullgard | 2011-03-19
  Signed-off-by: Mans Rullgard <mans@mansr.com>
* H.264: split luma dc idct out and implement MMX/SSE2 versions | Jason Garrett-Glaser | 2011-01-14
  About 2.5x the speed.
  NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed.
  Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move static inline function to a macro, so that constant propagation in | Ronald S. Bultje | 2010-09-29
  inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE breakage after r25254.
  Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Merge b_idx and edge variables, and optimize the ASM to directly load variables | Ronald S. Bultje | 2010-09-29
  from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. This saves several lea calls and together saves about 10 cycles in h264_loop_filter_strength_mmx2().
  Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove mv_mask variable. Replace the related pand -1/0 instructions by either | Ronald S. Bultje | 2010-09-29
  a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction.
  Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove d_idx as a variable, and instead load it as a constant in the asm. | Ronald S. Bultje | 2010-09-29
  This has no measurable speed effect because the surrounding code doesn't take advantage of this yet.
  Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid | Ronald S. Bultje | 2010-09-29
  of the d_idx variable and therefore allows for future optimizations. No speed difference by this commit itself.
  Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows | Ronald S. Bultje | 2010-09-29
  inlining various constants within the loop code. 20 cycles faster on cathedral sample.
  Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove unused variable. | Ronald S. Bultje | 2010-09-24
  Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
* x86: disable SSE functions using stack when stack is not aligned | Måns Rullgård | 2010-09-21
  This fixes crashes with ICC 10.1.
  Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
* x86: remove hack disabling sse2 h264 loop filter with 32-bit icc | Måns Rullgård | 2010-09-18
  Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from | Ronald S. Bultje | 2010-09-14
  h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping.
  Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
  Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
* LGPL SSE2 H.264 iDCT | Jason Garrett-Glaser | 2010-09-10
  This leaves no more GPL-only H.264 decoding asm code. Approved by Loren.
  Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move mm_support() from libavcodec to libavutil, make it a public | Stefano Sabatini | 2010-09-08
  function and rename it to av_get_cpu_flags().
  Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_ | Stefano Sabatini | 2010-09-04
  symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.
  Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Port latest x264 deblock asm (before they moved to using NV12 as internal | Ronald S. Bultje | 2010-09-03
  format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly.
  Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square | Ronald S. Bultje | 2010-09-01
  biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC.
  Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, | Ronald S. Bultje | 2010-09-01
  still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own.
  Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Split intra prediction initialization (i.e. assigning of function pointers) | Ronald S. Bultje | 2010-08-30
  into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c).
  Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 | Ronald S. Bultje | 2010-08-30
  fate failures on Win64.
  Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx() | Ronald S. Bultje | 2010-08-30
  functions.
  Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove global mm_flags variable | Måns Rullgård | 2010-08-24
  Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Split h264dsp and h264pred in configure. | Jason Garrett-Glaser | 2010-08-07
  Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself.
  Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
* H.264: SSE2/SSSE3 weighted prediction asm | Eli Friedman | 2010-08-05
  Patch by Eli Friedman <eli.friedman at gmail dot com>
  Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix h264/vp8 intra pred on Athlon XP | Jason Garrett-Glaser | 2010-07-01
  Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction?
  Originally committed as revision 23927 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add missing mm_support call to ff_h264_pred_init_x86. | Jason Garrett-Glaser | 2010-06-29
  I'm not sure if this is supposed to be here, but it can't hurt.
  Originally committed as revision 23885 to svn://svn.ffmpeg.org/ffmpeg/trunk
* MMXEXT version of vp8 4x4 vertical pred | Jason Garrett-Glaser | 2010-06-29
  Originally committed as revision 23876 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8 | Jason Garrett-Glaser | 2010-06-28
  Originally committed as revision 23875 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix some intra pred MMX functions that used MMXEXT instructions | Jason Garrett-Glaser | 2010-06-28
  Also add predict_4x4_dc MMXEXT function for vp8/h264.
  Originally committed as revision 23873 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Change MMXEXT to MMX2, MMXEXT is deprecated | Baptiste Coudurier | 2010-06-28
  Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix x86 build with h264dsp disabled | Måns Rullgård | 2010-06-28
  Originally committed as revision 23844 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Cosmetics: Fix indentation. | Carl Eugen Hoyos | 2010-06-25
  Originally committed as revision 23785 to svn://svn.ffmpeg.org/ffmpeg/trunk
* 16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264 | Jason Garrett-Glaser | 2010-06-25
  Originally committed as revision 23783 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Replace more "m" constraints with MANGLE to fix compilation issues | Reimar Döffinger | 2010-05-10
  with x86_32 gcc 4.4.4 and -fPIC.
  Originally committed as revision 23082 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Convert two "m" constraints to MANGLE to fix compilation with some compilers. | Reimar Döffinger | 2010-04-01
  Originally committed as revision 22760 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove DECLARE_ALIGNED_{8,16} macros | Måns Rullgård | 2010-03-06
  These macros are redundant. All uses are replaced with the generic DECLARE_ALIGNED macro instead.
  Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk