path: root/libavcodec/x86/h264_idct.asm
Commit message | Author | Age
* x86: Make function prototype comments in assembly code consistent | Diego Biurrun | 2014-03-13
|   This helps grepping for functions, among other things.
* x86: h264_idct: Update comments to match 8/10-bit depth optimization split | Diego Biurrun | 2013-10-07
|
* x86: h264_idct: Remove incorrect comment | Diego Biurrun | 2013-08-21
|
* h264: Integrate clear_blocks calls with IDCT | Ronald S. Bultje | 2013-04-10
|   The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster.
|   Signed-off-by: Martin Storsjö <martin@martin.st>
* Drop DCTELEM typedef | Diego Biurrun | 2013-01-22
|   It does not help as an abstraction and adds dsputil dependencies.
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* x86: h264_idct: port to cpuflags | Diego Biurrun | 2012-11-28
|
* x86: mmx2 ---> mmxext in asm constructs | Diego Biurrun | 2012-11-14
|
* x86: MMX2 ---> MMXEXT in macro names | Diego Biurrun | 2012-10-31
|
* x86: yasm: Use complete source path for macro helper %includes | Diego Biurrun | 2012-10-31
|   This is more consistent with the way we handle C #includes and it simplifies the build system.
* x86: include x86inc.asm in x86util.asm | Diego Biurrun | 2012-10-31
|   This is necessary to allow refactoring some x86util macros with cpuflags.
* x86: add colons after labels | Mans Rullgard | 2012-08-07
|   nasm prints a warning if the colon is missing.
|   Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2 | Diego Biurrun | 2012-08-05
|
* x86inc improvements for 64-bit | Henrik Gramner | 2012-04-11
|   Add support for all x86-64 registers
|   Prefer caller-saved register over callee-saved on WIN64
|   Support up to 15 function arguments
|   Also (by Ronald S. Bultje)
|   Fix up our asm to work with new x86inc.asm.
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|   Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* h264: manually save/restore XMM registers for functions using INIT_MMX. | Ronald S. Bultje | 2012-02-08
|   On Win64, these registers are callee-save, so not saving/restoring them correctly is a violation of ABI and can lead to crashes or corrupt data.
* config.asm: change %ifdef directives to %if directives. | Ronald S. Bultje | 2012-01-27
|   This allows combining multiple conditionals in a single statement.
* Fix NASM include directive | Dave Yeo | 2011-08-15
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* Move x86util.asm from libavcodec/ to libavutil/. | Ronald S. Bultje | 2011-08-12
|   This allows using it in swscale also.
* Move x86inc.asm to libavutil/. | Ronald S. Bultje | 2011-08-12
|   This allows using it in libswscale/ also.
* H.264: tweak some other x86 asm for Atom | Jason Garrett-Glaser | 2011-07-29
|
* 4:4:4 H.264 decoding support | Jason Garrett-Glaser | 2011-06-13
|   Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
* Roll back 4:4:4 H.264 for now | Jason Garrett-Glaser | 2011-06-13
|   Needs some ARM/PPC asm modifications.
* 4:4:4 H.264 decoding support | Jason Garrett-Glaser | 2011-06-13
|   Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
* Update 8-bit H.264 IDCT function names to reflect bit-depth. | Daniel Kang | 2011-05-31
|   Signed-off-by: Ronald S. Bultje <rbultje@google.com>
* Modify x86util.asm to ease transitioning to 10-bit H.264 assembly. | Daniel Kang | 2011-05-17
|   Arguments for variable size instructions are added to many macros, along with other various changes.
|   The x86util.asm code was ported from x264.
|   Signed-off-by: Diego Biurrun <diego@biurrun.de>
* Fix FSF address copy paste error in some license headers. | Diego Biurrun | 2011-05-14
|
* Replace FFmpeg with Libav in licence headers | Mans Rullgard | 2011-03-19
|   Signed-off-by: Mans Rullgard <mans@mansr.com>
* H.264: split luma dc idct out and implement MMX/SSE2 versions | Jason Garrett-Glaser | 2011-01-14
|   About 2.5x the speed.
|   NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed.
|   Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add d suffix to movd target register to make it work with nasm. | Reimar Döffinger | 2010-09-26
|   Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unroll loop in h264_idct_add16intra_sse2(). | Ronald S. Bultje | 2010-09-24
|   Basically identical to r25171, this inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.
|   See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
|   Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Unroll loop in h264_idct_add8_sse2(). | Ronald S. Bultje | 2010-09-24
|   This means we can inline scan8[] in the code directly also and remove loop setup. 20% faster in function, 0.8% overall.
|   See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
|   Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). | Ronald S. Bultje | 2010-09-14
|   Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping.
|   Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
|   Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk