path: root/libavfilter/x86
Commit message (Author, Age)
* x86/showcqt: use three operand format for some instructions (James Almer, 2016-06-08)
| Fixes failures with yasm 1.1.0 and older
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/showcqt: add missing preprocessor checks (James Almer, 2016-06-08)
| Old yasm/nasm versions don't support some of these
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86util: move haddps sse emulation from showcqt (James Almer, 2016-06-08)
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/avf_showcqt: cqt_calc optimization on x86 (Muhammad Faiz, 2016-06-08)
| on x86_64:
|         time    PSNR
| plain   3.303   inf
| SSE     1.649   107.087535
| SSE3    1.632   107.087535
| AVX     1.409   106.986771
| FMA3    1.265   107.108437
|
| on x86_32 (PSNR compared to x86_64 plain):
|         time    PSNR
| plain   7.225   103.951979
| SSE     1.827   105.859282
| SSE3    1.819   105.859282
| AVX     1.533   105.997661
| FMA3    1.384   105.885377
|
| FMA4 test is not available
|
| Reviewed-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
* avfilter/vf_blend: fix incorrect Y variable when threading is used (Paul B Mahol, 2016-05-23)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* vf_colorspace: use enums for bpp/subsampling array indices. (Ronald S. Bultje, 2016-05-10)
| Also add some documentation for each function to colorspacedsp.h.
* vf_colorspace: add const to yuv_stride[] argument in DSP functions. (Ronald S. Bultje, 2016-05-10)
|
* vf_colorspace: x86-64 SIMD (SSE2) optimizations. (Ronald S. Bultje, 2016-04-12)
|
* avfilter/vf_bwdif: Add yadif base information to copyright header (Thomas Mundt, 2016-03-16)
| Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_bwdif: add x86 SIMD (Thomas Mundt, 2016-03-13)
| Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
* x86/vf_blend: Add SSE2 optimization for divide (Timothy Gu, 2016-02-28)
| 4.5x faster than C float version with autovectorization
| 10 x faster than C int version
| 25 x faster than C float version without autovectorization
* vf_blend: Reduce number of arguments for kernel function (Timothy Gu, 2016-02-14)
|
* x86/vf_blend: Add SSE2 optimization for screen (Timothy Gu, 2016-02-10)
| 10x faster than C.
|
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
* x86/vf_blend: Move multiplying to a macro (Timothy Gu, 2016-02-10)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
* vf_blend: Add SSE2 optimization for multiply (Timothy Gu, 2016-02-08)
| 5 times faster than C, 3 times overall.
* x86/vf_w3fdif: 32-bit compatibility for w3fdif_simple_high (Hendrik Leppkes, 2016-01-08)
|
* x86/vf_stereo3d: remove a few unnecessary movas (James Almer, 2016-01-03)
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_stereo3d: make ff_anaglyph_sse4 work on x86_32 (James Almer, 2015-12-28)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_stereo3d: optimize register usage (James Almer, 2015-12-28)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_blend: add sse2 versions of blend_difference and blend_negation (James Almer, 2015-12-24)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_blend: make all functions work on x86_32 (James Almer, 2015-12-24)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_blend: simplify using macros (James Almer, 2015-12-24)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_maskedmerge: make ff_maskedmerge8_sse2 work on x86_32 (James Almer, 2015-12-24)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_maskedmerge: Clear upper part of width (Michael Niedermayer, 2015-12-23)
| Fixes crash
| Fixes: Ticket5055
|
| Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avfilter/x86/vf_maskedmerge: move %define out of .nextrow (Paul B Mahol, 2015-12-10)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* x86/vf_w3fdif: use aligned loads in w3fdif_complex_high (James Almer, 2015-10-27)
| Found-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_w3fdif: use aligned loads in w3fdif_simple_high (James Almer, 2015-10-11)
| Found-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_w3fdif: simplify w3fdif_simple_high (James Almer, 2015-10-11)
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_w3fdif: move pxor outside the loop in w3fdif_complex_low (James Almer, 2015-10-11)
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_w3fdif: add colons after labels (Paul B Mahol, 2015-10-10)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_w3fdif: add x86 SIMD (Paul B Mahol, 2015-10-10)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* doc: fix spelling errors (Andreas Cadhalpun, 2015-10-09)
| Reviewed-by: Lou Logan <lou@lrcd.com>
| Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
* avfilter/x86/vf_blend.asm: hardmix: do same with two pxor instructions less (Paul B Mahol, 2015-10-07)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/x86/vf_blend.asm: 11th register is used, update functions (Paul B Mahol, 2015-10-07)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/x86/vf_blend.asm: add hardmix and phoenix sse2 SIMD (Paul B Mahol, 2015-10-07)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs (Paul B Mahol, 2015-10-06)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_blend: Fix argument types, fix segfault in asm (Michael Niedermayer, 2015-10-03)
| Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avfilter/vf_blend: add x86 SIMD for some modes (Paul B Mahol, 2015-10-03)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input (Paul B Mahol, 2015-10-02)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/x86/vf_psnr.asm: fix typo (Paul B Mahol, 2015-10-01)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* Replace all remaining occurances of step/depth_minus1 and offset_plus1 (Hendrik Leppkes, 2015-09-08)
|
* options: mark av_get_{int,double,q} as deprecated. (Ronald S. Bultje, 2015-08-18)
| Convert last users to av_opt_get_*() counterparts.
* x86inc: Drop SECTION_TEXT macro (Henrik Gramner, 2015-08-04)
| The .text section is already 16-byte aligned by default on all supported
| platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
* x86/vf_interlace: add missing colon to labels (James Almer, 2015-07-26)
| Silences warnings with Nasm
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_ssim: add ff_ssim_4x4_line_xop (James Almer, 2015-07-20)
| ~20% faster than ssse3.
| Also enabled for x86_32
|
| Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_ssim: fix some instruction comments (James Almer, 2015-07-20)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_psnr.asm: split one line of license text into two (Paul B Mahol, 2015-07-14)
| Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions (James Darnley, 2015-07-14)
| Speed of all modes increased by a factor between 7.4 and 19.8 largely
| depending on whether bytes are unpacked into words. Modes 2, 3, and 4
| have been sped-up by a factor of 43 (thanks quick sort!)
|
| All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14,
| 19, 20, 21, and 22 are available on x86 due to the number of SIMD
| registers used.
|
| With a contribution from James Almer <jamrial@gmail.com>
* vf_psnr: sse2 optimizations for sum-squared-error. (Ronald S. Bultje, 2015-07-14)
| The internal line accumulator for 16bit can overflow, so I changed that
| from int to uint64_t in the C code. The matching assembly looks a little
| weird but output looks correct.
|
| (avx2 should be trivial to add later.)
|
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Reviewed-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* vf_ssim: x86 simd for ssim_4x4xN and ssim_endN. (Ronald S. Bultje, 2015-07-14)
| Both are 2-2.5x faster than their C counterpart.
|
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Reviewed-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>