path: root/libavcodec/x86
Commit message (Author, Age)
* In yadif filter, declare asm constants directly to avoid dependency on libavcodec (Baptiste Coudurier, 2010-12-06)
  Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk
* 10l, add ff_pw_1 to dsputil_mmx for yadif sse2 (Baptiste Coudurier, 2010-12-04)
  Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use SECTION .text for yasm code. (avcoder, 2010-12-01)
  Patch by avcoder, ffmpeg gmail
  Originally committed as revision 25859 to svn://svn.ffmpeg.org/ffmpeg/trunk
* dnxhd_mmx: prefer xmm registers below xmm6 when they are available (Ramiro Polla, 2010-11-02)
  Originally committed as revision 25634 to svn://svn.ffmpeg.org/ffmpeg/trunk
* dsputil: Use explicit movzbl instead of movzx (İsmail Dönmez, 2010-11-01)
  This fixes compilation with the latest clang trunk version.
  Patch by İsmail Dönmez, ismail at namtrac dot org
  Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk
* lpc_mmx: add xmm registers to clobber list (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25620 to svn://svn.ffmpeg.org/ffmpeg/trunk
* lpc_mmx: merge some asm blocks (Ramiro Polla, 2010-10-31)
  These blocks depended on the compiler keeping xmm registers untouched between them.
  Originally committed as revision 25619 to svn://svn.ffmpeg.org/ffmpeg/trunk
* sad16_sse2: merge 2 asm blocks (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25617 to svn://svn.ffmpeg.org/ffmpeg/trunk
* xmm_clobbers: list xmm registers first in clobber list (Ramiro Polla, 2010-10-31)
  suncc does not like the leading commas inside the macro, but it has no problem with trailing commas.
  Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk
* idct_sse2_xvid: only mark xmm>=8 as clobbered on x86_64 (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25614 to svn://svn.ffmpeg.org/ffmpeg/trunk
* motion_est_mmx: prefer xmm registers below xmm6 when they are available (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25612 to svn://svn.ffmpeg.org/ffmpeg/trunk
* dsputil_mmx: add xmm registers to clobber list (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25611 to svn://svn.ffmpeg.org/ffmpeg/trunk
* cosmetics: split long line (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25610 to svn://svn.ffmpeg.org/ffmpeg/trunk
* fdct_mmx: add xmm registers to clobber list (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25609 to svn://svn.ffmpeg.org/ffmpeg/trunk
* idct_sse2_xvid: add xmm registers to clobber list (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25608 to svn://svn.ffmpeg.org/ffmpeg/trunk
* mpegvideo_mmx: add xmm registers to clobber list (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25607 to svn://svn.ffmpeg.org/ffmpeg/trunk
* dsputil_mmx: prefer xmm registers below xmm6 when they are available (Ramiro Polla, 2010-10-31)
  Originally committed as revision 25606 to svn://svn.ffmpeg.org/ffmpeg/trunk
* h264dsp: add xmm registers to clobber list (Ramiro Polla, 2010-10-30)
  Originally committed as revision 25604 to svn://svn.ffmpeg.org/ffmpeg/trunk
* indent (Ramiro Polla, 2010-10-28)
  Originally committed as revision 25598 to svn://svn.ffmpeg.org/ffmpeg/trunk
* h264dsp: merge some more asm blocks (Ramiro Polla, 2010-10-28)
  Originally committed as revision 25597 to svn://svn.ffmpeg.org/ffmpeg/trunk
* dct32: mark xmm registers in clobber list in ff_dct32_float_sse() (Ramiro Polla, 2010-10-25)
  Originally committed as revision 25569 to svn://svn.ffmpeg.org/ffmpeg/trunk
* h264dsp: merge some asm blocks (Ramiro Polla, 2010-10-25)
  Some code was initializing some xmm registers in one asm block and using them in the following block, assuming they wouldn't be changed in between blocks.
  Originally committed as revision 25568 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add d modifier to asm argument to fix nasm compilation. (Reimar Döffinger, 2010-10-07)
  Originally committed as revision 25397 to svn://svn.ffmpeg.org/ffmpeg/trunk
* fft: mark xmm registers as clobbered in ff_imdct_calc_sse (Ramiro Polla, 2010-10-06)
  Originally committed as revision 25363 to svn://svn.ffmpeg.org/ffmpeg/trunk
* MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra prediction (plus some with different rounding for svq3/rv40). (Ronald S. Bultje, 2010-10-05)
  Speedup (for SSSE3) is about 6-fold; 3.6% faster overall with cathedral sample.
  Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
* snowdsp: Explicitly state the operand sizes (İsmail Dönmez, 2010-10-04)
  Fixes compilation with clang's builtin assembler.
  Patch by İsmail Dönmez, ismail at namtrac dot org
  Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move static inline function to a macro, so that constant propagation in inline asm works for gcc-3.x also (hopefully). (Ronald S. Bultje, 2010-09-29)
  Should fix gcc-3.x FATE breakage after r25254.
  Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use sse2 variant of put_pixels16() for no_rnd also. (Eli Friedman, 2010-09-29)
  Provides a minor speed increase to e.g. vc1, snow and mpeg decoding.
  Patch by Eli Friedman <eli dot friedman gmail com>.
  Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Merge b_idx and edge variables, and optimize the ASM to directly load variables from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. (Ronald S. Bultje, 2010-09-29)
  This saves several lea instructions and together saves about 10 cycles in h264_loop_filter_strength_mmx2().
  Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove mv_mask variable. (Ronald S. Bultje, 2010-09-29)
  Replace the related pand -1/0 instructions with either a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction.
  Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove d_idx as a variable, and instead load it as a constant in the asm. (Ronald S. Bultje, 2010-09-29)
  This has no measurable speed effect because the surrounding code doesn't take advantage of this yet.
  Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid of the d_idx variable and therefore allows for future optimizations. (Ronald S. Bultje, 2010-09-29)
  No speed difference by this commit itself.
  Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows inlining various constants within the loop code. (Ronald S. Bultje, 2010-09-29)
  20 cycles faster on cathedral sample.
  Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add d suffix to movd target register to make it work with nasm. (Reimar Döffinger, 2010-09-26)
  Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Split and then simplify address generation macro. (Reimar Döffinger, 2010-09-26)
  Allows nasm to work for this code.
  Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove unused variable. (Ronald S. Bultje, 2010-09-24)
  Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unroll loop in h264_idct_add16intra_sse2(). (Ronald S. Bultje, 2010-09-24)
  Basically identical to r25171, this inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.
  See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
  Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unroll loop in h264_idct_add8_sse2(). (Ronald S. Bultje, 2010-09-24)
  This means we can inline scan8[] in the code directly also and remove loop setup. 20% faster in function, 0.8% overall.
  See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
  Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
* x86: disable SSE functions using stack when stack is not aligned (Måns Rullgård, 2010-09-21)
  This fixes crashes with ICC 10.1.
  Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
* x86: remove hack disabling sse2 h264 loop filter with 32-bit icc (Måns Rullgård, 2010-09-18)
  Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Don't access upper 32 bits of a 32-bit int on 64-bit systems. (Ronald S. Bultje, 2010-09-17)
  Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Properly add HAVE_YASM around yasmified symbols. (Ronald S. Bultje, 2010-09-17)
  Should fix compile error on configurations using --disable-yasm.
  Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm, which will hopefully solve the Win64/FATE failures caused by these functions. (Ronald S. Bultje, 2010-09-17)
  Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move sse16_sse2() from inline asm to yasm. (Ronald S. Bultje, 2010-09-17)
  It is one of the functions causing Win64/FATE issues.
  Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). (Ronald S. Bultje, 2010-09-14)
  Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping.
  Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster.
  Speed gain overall is ~0.5-1.0%.
  Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
* LGPL SSE2 H.264 iDCT (Jason Garrett-Glaser, 2010-09-10)
  This leaves no more GPL-only H.264 decoding asm code. Approved by Loren.
  Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). (Stefano Sabatini, 2010-09-08)
  Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use "d" suffix for general-purpose registers used with movd. (Reimar Döffinger, 2010-09-05)
  This increases compatibility with nasm and is also more consistent, e.g. with h264_intrapred.asm and h264_chromamc.asm that already do it that way.
  Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_ symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h. (Stefano Sabatini, 2010-09-04)
  Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Port latest x264 deblock asm (before they moved to using NV12 as internal format), LGPL'ed with permission from Jason and Loren. (Ronald S. Bultje, 2010-09-03)
  This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly.
  Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk