summaryrefslogtreecommitdiff
path: root/libavcodec/x86/h264_idct_10bit.asm
Commit message (Collapse)AuthorAge
* avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functionsJames Darnley2016-11-30
| | | | | | | | | | | | | | | Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)
* Merge commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c'Clément Bœsch2016-07-09
|\ | | | | | | | | | | | | * commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c': x86: Add missing movsxd for the int stride parameter Merged-by: Clément Bœsch <u@pkh.me>
| * x86: Add missing movsxd for the int stride parameterMartin Storsjö2016-06-17
| | | | | | | | Signed-off-by: Martin Storsjö <martin@martin.st>
* | vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.Ronald S. Bultje2015-10-03
| |
* | x86: lavc: share more constant through definesChristophe Gisquet2015-02-07
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '55519926ef855c671d084ccc151056de9e3d3a77'Michael Niedermayer2014-03-14
|\| | | | | | | | | | | | | | | | | | | * commit '55519926ef855c671d084ccc151056de9e3d3a77': x86: Make function prototype comments in assembly code consistent Conflicts: libavcodec/x86/sbrdsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Make function prototype comments in assembly code consistentDiego Biurrun2014-03-13
| | | | | | | | This helps grepping for functions, among other things.
* | Merge commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0'Michael Niedermayer2014-03-14
|\| | | | | | | | | | | | | * commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0': x86: h264_idct_10_bit: Use proper type in function prototype comments Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: h264_idct_10_bit: Use proper type in function prototype commentsDiego Biurrun2014-03-13
| |
| * h264: Integrate clear_blocks calls with IDCTRonald S. Bultje2013-04-10
| | | | | | | | | | | | | | | | | | The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Martin Storsjö <martin@martin.st>
* | Reinstate proper FFmpeg license for all files.Thilo Borgmann2013-08-30
| |
* | h264: integrate clear_blocks calls with IDCT.Ronald S. Bultje2013-02-19
| | | | | | | | | | | | | | | | | | The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-11-14
|\| | | | | | | | | | | | | | | | | | | | | | | * qatar/master: x86: mmx2 ---> mmxext in asm constructs Conflicts: libavcodec/x86/h264_chromamc_10bit.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264dsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: mmx2 ---> mmxext in asm constructsDiego Biurrun2012-11-14
| |
| * build: Drop AVX assembly ifdefsDiego Biurrun2012-11-11
| | | | | | | | An assembler able to cope with AVX instructions is now required.
| * x86: yasm: Use complete source path for macro helper %includesDiego Biurrun2012-10-31
| | | | | | | | | | This is more consistent with the way we handle C #includes and it simplifies the build system.
* | Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'Michael Niedermayer2012-10-31
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit '6860b4081d046558c44b1b42f22022ea341a2a73': x86: include x86inc.asm in x86util.asm cng: Reindent some incorrectly indented lines cngdec: Allow flushing the decoder cngdec: Make the dbov variable have the right unit cngdec: Fix the memset size to cover the full array cngdec: Update the LPC coefficients after averaging the reflection coefficients configure: fix print_config() with broke awks Conflicts: libavcodec/x86/ac3dsp.asm libavcodec/x86/dct32.asm libavcodec/x86/deinterlace.asm libavcodec/x86/dsputil.asm libavcodec/x86/dsputilenc.asm libavcodec/x86/fft.asm libavcodec/x86/fmtconvert.asm libavcodec/x86/h264_chromamc.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264_idct.asm libavcodec/x86/h264_idct_10bit.asm libavcodec/x86/h264_intrapred.asm libavcodec/x86/h264_intrapred_10bit.asm libavcodec/x86/h264_weight.asm libavcodec/x86/vc1dsp.asm libavcodec/x86/vp3dsp.asm libavcodec/x86/vp56dsp.asm libavcodec/x86/vp8dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: include x86inc.asm in x86util.asmDiego Biurrun2012-10-31
| | | | | | | | This is necessary to allow refactoring some x86util macros with cpuflags.
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-08-31
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: MSS1 and MSS2: set final pixel format after common stuff has been initialised MSS2 decoder configure: handle --disable-asm before check_deps x86: Split inline and external assembly #ifdefs configure: x86: Separate inline from standalone assembler capabilities pktdumper: Use a custom define instead of PATH_MAX for buffers pktdumper: Use av_strlcpy instead of strncpy pktdumper: Use sizeof(variable) instead of the direct buffer length Conflicts: Changelog configure libavcodec/allcodecs.c libavcodec/avcodec.h libavcodec/codec_desc.c libavcodec/dct-test.c libavcodec/imgconvert.c libavcodec/mss12.c libavcodec/version.h libavfilter/x86/gradfun.c libswscale/x86/yuv2rgb.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Split inline and external assembly #ifdefsDiego Biurrun2012-08-31
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-08-07
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: x86: fix build with nasm 2.08 x86: use nop cpu directives only if supported x86: fix rNmp macros with nasm build: add trailing / to yasm/nasm -I flags x86: use 32-bit source registers with movd instruction x86: add colons after labels Conflicts: Makefile libavutil/x86/x86inc.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: add colons after labelsMans Rullgard2012-08-07
| | | | | | | | | | | | nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-07-29
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: (35 commits) h264_idct_10bit: port x86 assembly to cpuflags. x86inc: clip num_args to 7 on x86-32. x86inc: sync to latest version from x264. fft: rename "z" to "zc" to prevent name collision. wv: return meaningful error codes. wv: return AVERROR_EOF on EOF, not EIO. mp3dec: forward errors for av_get_packet(). mp3dec: remove a pointless local variable. mp3dec: remove commented out cruft. lavfi: bump minor to mark stabilizing the ABI. FATE: add tests for yadif. FATE: add a test for delogo video filter. FATE: add a test for amix audio filter. audiogen: allow specifying random seed as a commandline parameter. vc1dec: Override invalid macroblock quantizer vc1: avoid reading beyond the last line in vc1_draw_sprites() vc1dec: check that coded slice positions and interlacing match. vc1dec: Do not ignore ff_vc1_parse_frame_header_adv return value configure: Move parts that should not be user-selectable to CONFIG_EXTRA lavf: remove commented out cruft in avformat_find_stream_info() ... Conflicts: Makefile configure libavcodec/vc1dec.c libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264dsp_mmx.c libavfilter/version.h libavformat/mp3dec.c libavformat/utils.c libavformat/wv.c libavutil/x86/x86inc.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * h264_idct_10bit: port x86 assembly to cpuflags.Ronald S. Bultje2012-07-28
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-13
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: libxvid: remove disabled code qdm2: make a table static const qdm2: simplify bitstream reader setup for some subpacket types qdm2: use get_bits_left() build: Consistently handle conditional compilation for all optimization OBJS. avpacket, bfi, bgmc, rawenc: K&R prettyprinting cosmetics msrle: convert MS RLE decoding function to bytestream2. x86inc improvements for 64-bit Conflicts: common.mak libavcodec/avpacket.c libavcodec/bfi.c libavcodec/msrledec.c libavcodec/qdm2.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc improvements for 64-bitHenrik Gramner2012-04-11
| | | | | | | | | | | | | | | | | | | | | | | | Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-02-08
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: swscale: make yuv2yuv1 use named registers. h264: mark h264_idct_add8_10 with number of XMM registers. swscale: fix V plane memory location in bilinear/unscaled RGB/YUYV case. vp8: always update next_framep[] before returning from decode_frame(). avconv: estimate next_dts from framerate if it is set. avconv: better next_dts usage. avconv: rename InputStream.pts to last_dts. avconv: reduce overloading for InputStream.pts. avconv: rename InputStream.next_pts to next_dts. avconv: rework -t handling for encoding. avconv: set encoder timebase for subtitles. pva-demux test: add -vn swscale: K&R formatting cosmetics for SPARC code apedec: allow the user to set the maximum number of output samples per call apedec: do not unnecessarily zero output samples for mono frames apedec: allocate a single flat buffer for decoded samples apedec: use sizeof(field) instead of sizeof(type) swscale: split C output functions into separate file. swscale: Split C input functions into separate file. bytestream: Add bytestream2 writing API. The avconv changes are due to massive regressions and bugs not merged yet. Conflicts: ffmpeg.c libavcodec/vp8.c libswscale/swscale.c libswscale/x86/swscale_template.c tests/fate/demux.mak tests/ref/lavf/asf tests/ref/lavf/avi tests/ref/lavf/mkv tests/ref/lavf/mpg tests/ref/lavf/nut tests/ref/lavf/ogg tests/ref/lavf/rm tests/ref/lavf/ts tests/ref/seek/lavf_avi tests/ref/seek/lavf_mkv tests/ref/seek/lavf_rm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * h264: mark h264_idct_add8_10 with number of XMM registers.Michael Kostylev2012-02-07
| | | | | | | | | | | | This fixes XMM register clobber problems on Win64. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-01-28
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: (71 commits) movenc: Allow writing to a non-seekable output if using empty moov movenc: Support adding isml (smooth streaming live) metadata libavcodec: Don't crash in avcodec_encode_audio if time_base isn't set sunrast: Document the different Sun Raster file format types. sunrast: Add a check for experimental type. libspeexenc: use AVSampleFormat instead of deprecated/removed SampleFormat lavf: remove disabled FF_API_SET_PTS_INFO cruft lavf: remove disabled FF_API_OLD_INTERRUPT_CB cruft lavf: remove disabled FF_API_REORDER_PRIVATE cruft lavf: remove disabled FF_API_SEEK_PUBLIC cruft lavf: remove disabled FF_API_STREAM_COPY cruft lavf: remove disabled FF_API_PRELOAD cruft lavf: remove disabled FF_API_NEW_STREAM cruft lavf: remove disabled FF_API_RTSP_URL_OPTIONS cruft lavf: remove disabled FF_API_MUXRATE cruft lavf: remove disabled FF_API_FILESIZE cruft lavf: remove disabled FF_API_TIMESTAMP cruft lavf: remove disabled FF_API_LOOP_OUTPUT cruft lavf: remove disabled FF_API_LOOP_INPUT cruft lavf: remove disabled FF_API_AVSTREAM_QUALITY cruft ... Conflicts: doc/APIchanges libavcodec/8bps.c libavcodec/avcodec.h libavcodec/libx264.c libavcodec/mjpegbdec.c libavcodec/options.c libavcodec/sunrast.c libavcodec/utils.c libavcodec/version.h libavcodec/x86/h264_deblock.asm libavdevice/libdc1394.c libavdevice/v4l2.c libavformat/avformat.h libavformat/avio.c libavformat/avio.h libavformat/aviobuf.c libavformat/dv.c libavformat/mov.c libavformat/utils.c libavformat/version.h libavformat/wtv.c libavutil/Makefile libavutil/file.c libswscale/x86/input.asm libswscale/x86/swscale_mmx.c libswscale/x86/swscale_template.c tests/ref/lavf/ffm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * config.asm: change %ifdef directives to %if directives.Ronald S. Bultje2012-01-27
| | | | | | | | This allows combining multiple conditionals in a single statement.
* | Move x264asm to libavutil.Kieran Kunhya2011-10-19
|/ | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Fix NASM include directiveDave Yeo2011-08-15
| | | | Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* Move x86util.asm from libavcodec/ to libavutil/.Ronald S. Bultje2011-08-12
| | | | This allows using it in swscale also.
* Move x86inc.asm to libavutil/.Ronald S. Bultje2011-08-12
| | | | This allows using it in libswscale/ also.
* 4:4:4 H.264 decoding supportJason Garrett-Glaser2011-06-13
| | | | Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
* Roll back 4:4:4 H.264 for nowJason Garrett-Glaser2011-06-13
| | | | Needs some ARM/PPC asm modifications.
* 4:4:4 H.264 decoding supportJason Garrett-Glaser2011-06-13
| | | | Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
* Cosmetic changes to h264_idct_10bit.asm.Loren Merritt2011-06-02
| | | | | | Removes redundant dword tags and whitespace changes. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* 2x faster h264_idct_add8_10.Loren Merritt2011-06-02
| | | | Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* Add IDCT functions for 10-bit H.264.Daniel Kang2011-05-31
Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author. Signed-off-by: Ronald S. Bultje <rbultje@google.com>