path: root/libavfilter/x86/Makefile
Commit message (Author, Age)
* avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation (Mark Reid, 2021-10-10)
|   I spotted an interesting pattern that I hadn't seen before, which makes the
|   implementation faster. The bit-shifting table I was using before is no
|   longer needed, and I was able to remove quite a few lines. I also added use
|   of FMA in the AVX2 version.
|
|   f32 1920x1080 1 thread with prelut
|   c impl
|   1434012700 UNITS in lut3d->interp,       1 runs,      0 skips
|   1434035335 UNITS in lut3d->interp,       2 runs,      0 skips
|   1423615347 UNITS in lut3d->interp,       4 runs,      0 skips
|   1426268863 UNITS in lut3d->interp,       8 runs,      0 skips
|   sse2
|   905484420 UNITS in lut3d->interp,       1 runs,      0 skips
|   905659010 UNITS in lut3d->interp,       2 runs,      0 skips
|   915167140 UNITS in lut3d->interp,       4 runs,      0 skips
|   915834222 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx
|   574794860 UNITS in lut3d->interp,       1 runs,      0 skips
|   581035090 UNITS in lut3d->interp,       2 runs,      0 skips
|   584116720 UNITS in lut3d->interp,       4 runs,      0 skips
|   581460290 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx2
|   301698880 UNITS in lut3d->interp,       1 runs,      0 skips
|   301982880 UNITS in lut3d->interp,       2 runs,      0 skips
|   306962430 UNITS in lut3d->interp,       4 runs,      0 skips
|   305472025 UNITS in lut3d->interp,       8 runs,      0 skips
|
|   gbrap16 1920x1080 1 thread with prelut
|   c impl
|   1480894840 UNITS in lut3d->interp,       1 runs,      0 skips
|   1502922990 UNITS in lut3d->interp,       2 runs,      0 skips
|   1496114307 UNITS in lut3d->interp,       4 runs,      0 skips
|   1492554551 UNITS in lut3d->interp,       8 runs,      0 skips
|   sse2
|   980777180 UNITS in lut3d->interp,       1 runs,      0 skips
|   986121520 UNITS in lut3d->interp,       2 runs,      0 skips
|   986489840 UNITS in lut3d->interp,       4 runs,      0 skips
|   998832248 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx
|   622212360 UNITS in lut3d->interp,       1 runs,      0 skips
|   622981160 UNITS in lut3d->interp,       2 runs,      0 skips
|   645396315 UNITS in lut3d->interp,       4 runs,      0 skips
|   641057075 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx2
|   321336400 UNITS in lut3d->interp,       1 runs,      0 skips
|   321268920 UNITS in lut3d->interp,       2 runs,      0 skips
|   323459895 UNITS in lut3d->interp,       4 runs,      0 skips
|   324949967 UNITS in lut3d->interp,       8 runs,      0 skips
* avfilter/vf_maskedclamp: add x86 SIMD (Paul B Mahol, 2019-10-23)
|
* avfilter/vf_transpose: add x86 SIMD (Paul B Mahol, 2019-10-21)
|
* avfilter/vf_adadenoise: add x86 SIMD (Paul B Mahol, 2019-10-17)
|
* avfilter/vf_eq: fix compilation with x86 asm disabled (James Almer, 2019-09-26)
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_eq: Change inline assembly into nasm code (Ting Fu, 2019-09-26)
|   Signed-off-by: Ting Fu <ting.fu@intel.com>
* avfilter/vf_v360: x86 SIMD for interpolations (Paul B Mahol, 2019-09-06)
|
* avfilter/vf_convolution: add x86 SIMD for filter_3x3() (Ruiling Song, 2019-08-07)
|   Tested using a simple command (apply edge enhance):
|   ./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
|     -vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
|     -an -vframes 1000 -f null /dev/null
|
|   The fps increases from 151 to 270 on my local machine.
|
|   Signed-off-by: Ruiling Song <ruiling.song@intel.com>
* avfilter/vf_gblur: add x86 SIMD optimizations (Ruiling Song, 2019-06-12)
|   The horizontal pass gets ~2x performance with the patch under a single thread.
|
|   Tested overall performance using these commands (avx2 enabled):
|   ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
|   ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
|
|   For single thread, the fps improves from 43 to 60, about 40%.
|   For multi-thread, the fps improves from 110 to 130, about 20%.
|
|   Signed-off-by: Ruiling Song <ruiling.song@intel.com>
* avfilter: add anlmdn filter x86 SIMD optimizations (Paul B Mahol, 2019-01-10)
|
* avfilter/vf_framerate: factorize SAD functions which compute SAD for a whole frame (Marton Balint, 2018-11-11)
|   Also add SIMD which works on lines because it is faster than calculating it
|   on 8x8 blocks using pixelutils.
|
|   Signed-off-by: Marton Balint <cus@passwd.hu>
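The line-based SAD (sum of absolute differences) mentioned in the commit above can be pictured with a plain C sketch. This is an illustration only: the function names `sad_line_8bit` and `sad_frame_8bit` are hypothetical and are not FFmpeg's actual functions, and the real SIMD code processes many pixels per instruction rather than one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: SAD of one line of 8-bit samples. */
static uint64_t sad_line_8bit(const uint8_t *a, const uint8_t *b, int width)
{
    uint64_t sum = 0;
    for (int x = 0; x < width; x++)
        sum += (uint64_t)abs((int)a[x] - (int)b[x]);
    return sum;
}

/* Hypothetical helper: SAD of a whole frame, accumulated line by line
 * instead of over 8x8 blocks. Strides are in bytes. */
static uint64_t sad_frame_8bit(const uint8_t *a, ptrdiff_t stride_a,
                               const uint8_t *b, ptrdiff_t stride_b,
                               int width, int height)
{
    uint64_t sum = 0;
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        sum += sad_line_8bit(a, b, width);
        a += stride_a;  /* advance to the next line of each frame */
        b += stride_b;
    }
    return sum;
}
```

Working on whole lines keeps the inner loop a single contiguous pass, which is the shape a SIMD routine handles well.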
* avfilter/vf_overlay: add x86 SIMD (Paul B Mahol, 2018-05-02)
|   Specifically for yuv444, yuv422, yuv420 format when main stream has no
|   alpha, and alpha is straight.
|
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_interlace: remove duplicate code with same functionality (Vasile Toncu, 2018-04-23)
|
* avfilter/vf_framerate: add SIMD functions for frame blending (Marton Balint, 2018-01-28)
|   Blend function speedups on x86_64 Core i5 4460:
|
|   ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none
|
|   C:     447548411 decicycles in Blend, 2048 runs, 0 skips
|   SSSE3: 130020087 decicycles in Blend, 2048 runs, 0 skips
|   AVX2:  128508221 decicycles in Blend, 2048 runs, 0 skips
|
|   ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none
|
|   C:     228932745 decicycles in Blend, 2048 runs, 0 skips
|   SSE4:  123357781 decicycles in Blend, 2048 runs, 0 skips
|   AVX2:  121215353 decicycles in Blend, 2048 runs, 0 skips
|
|   Signed-off-by: Marton Balint <cus@passwd.hu>
* avfilter: add hflip x86 SIMD (Paul B Mahol, 2017-12-04)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_threshold: add x86 SIMD (Paul B Mahol, 2017-12-02)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter: add limiter filter (Paul B Mahol, 2017-07-08)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* build: Generalize yasm/nasm-related variable names (Diego Biurrun, 2017-06-21)
|   None of them are specific to the YASM assembler.
|
|   (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter: add arbitrary audio FIR filter (Paul B Mahol, 2017-05-09)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/avf_showcqt: cqt_calc optimization on x86 (Muhammad Faiz, 2016-06-08)
|   on x86_64:
|           time    PSNR
|   plain   3.303   inf
|   SSE     1.649   107.087535
|   SSE3    1.632   107.087535
|   AVX     1.409   106.986771
|   FMA3    1.265   107.108437
|
|   on x86_32 (PSNR compared to x86_64 plain):
|           time    PSNR
|   plain   7.225   103.951979
|   SSE     1.827   105.859282
|   SSE3    1.819   105.859282
|   AVX     1.533   105.997661
|   FMA3    1.384   105.885377
|
|   FMA4 test is not available
|
|   Reviewed-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
* vf_colorspace: x86-64 SIMD (SSE2) optimizations. (Ronald S. Bultje, 2016-04-12)
|
* avfilter/vf_bwdif: add x86 SIMD (Thomas Mundt, 2016-03-13)
|   Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
* avfilter/vf_w3fdif: add x86 SIMD (Paul B Mahol, 2015-10-10)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs (Paul B Mahol, 2015-10-06)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_blend: add x86 SIMD for some modes (Paul B Mahol, 2015-10-03)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input (Paul B Mahol, 2015-10-02)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions (James Darnley, 2015-07-14)
|   Speed of all modes increased by a factor between 7.4 and 19.8 largely
|   depending on whether bytes are unpacked into words. Modes 2, 3, and 4
|   have been sped-up by a factor of 43 (thanks quick sort!)
|
|   All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14,
|   19, 20, 21, and 22 are available on x86 due to the number of SIMD
|   registers used.
|
|   With a contribution from James Almer <jamrial@gmail.com>
* vf_psnr: sse2 optimizations for sum-squared-error. (Ronald S. Bultje, 2015-07-14)
|   The internal line accumulator for 16bit can overflow, so I changed that
|   from int to uint64_t in the C code. The matching assembly looks a little
|   weird but output looks correct.
|
|   (avx2 should be trivial to add later.)
|
|   Reviewed-by: Paul B Mahol <onemda@gmail.com>
|   Reviewed-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
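The overflow described in the commit above is easy to see in a small C sketch. This is an assumption-laden illustration rather than the filter's actual code: `sse_line_16bit` is a hypothetical name. A single squared difference of 16-bit samples can reach 65535 * 65535 = 4294836225, which already exceeds the range of a 32-bit signed int, so the per-line sum needs a 64-bit accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared errors over one line of 16-bit
 * samples, accumulated in 64 bits so that it cannot overflow for any
 * realistic line length. */
static uint64_t sse_line_16bit(const uint16_t *a, const uint16_t *b, size_t n)
{
    uint64_t sum = 0;
    assert(a && b);
    for (size_t i = 0; i < n; i++) {
        /* Widen before subtracting and squaring. */
        int64_t d = (int64_t)a[i] - (int64_t)b[i];
        sum += (uint64_t)(d * d);
    }
    return sum;
}
```

With two maximally different samples the result is 2 * 4294836225 = 8589672450, which no longer fits in 32 bits.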
* vf_ssim: x86 simd for ssim_4x4xN and ssim_endN. (Ronald S. Bultje, 2015-07-14)
|   Both are 2-2.5x faster than their C counterpart.
|
|   Reviewed-by: Paul B Mahol <onemda@gmail.com>
|   Reviewed-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avfilter: Port mp=eq/eq2 to lavfi (Arwa Arif, 2015-01-26)
|   Code adapted from James Darnley's port
|
|   Some fixes from Paul B Mahol <onemda@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/vf_pp7: port dctB_mmx to yasm (James Almer, 2015-01-09)
|   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* lavfi: port mp=pp7 to libavfilter (Arwa Arif, 2015-01-09)
|   The only difference with mp=pp7 is that default mode is "medium", as
|   stated in the MPlayer docs, rather than "hard".
|
|   Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
* x86/vf_fspp: port inline asm to yasm (James Almer, 2014-12-26)
|   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* lavfi: port mp=fspp to a native libavfilter filter (Arwa Arif, 2014-12-24)
|   Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
* avfilter/tinterlace: add Support for ff_lowpass_line_avx() & ff_lowpass_line_sse2() (Michael Niedermayer, 2014-11-15)
|   Based-on: 2e1704059ae8625beda2ffde847ad22c5ba416dc by Kieran Kunhya
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Merge commit '2e1704059ae8625beda2ffde847ad22c5ba416dc' (Michael Niedermayer, 2014-11-15)
|\
| |   * commit '2e1704059ae8625beda2ffde847ad22c5ba416dc':
| |     vf_interlace: Add SIMD for lowpass filter
| |
| |   Conflicts:
| |       libavfilter/vf_interlace.c
| |       libavfilter/x86/Makefile
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * vf_interlace: Add SIMD for lowpass filter (Kieran Kunhya, 2014-11-15)
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* | x86/vf_noise: move asm code to a separate file (James Almer, 2014-10-17)
| |   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | avfilter/vf_idet: MMX/MMXEXT/SSE2 implementation of idet's filter_line() (skal, 2014-09-04)
| |   integration by Neil Birkbeck, with help from Vitor Sessak.
| |   core SSE2 loop by Skal (pascal.massimino@gmail.com)
| |
| |   Reviewed-by: Clément Bœsch <u@pkh.me>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Revert "Revert "vf_yadif: move x86 init code to x86/yadif.c"" (Robert Krüger, 2014-01-14)
| |   This reverts commit 975110a85ef8e794fdc041455ff41b0ad30bc01e.
| |
| |   Signed-off-by: Robert Krüger <krueger@lesspain.de>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Revert "vf_yadif: move x86 init code to x86/yadif.c" (Michael Niedermayer, 2013-12-01)
| |   This reverts commit a87b17f3283aada762820f1b797eeb7a2dff6c61.
| |
| |   This reduces the amount of non LGPL code, making a relicensing to
| |   LGPL easier
| |
| |   Conflicts:
| |       libavfilter/vf_yadif.c
| |       libavfilter/x86/yadif.c
| |       libavfilter/x86/yadif_template.c
| |       libavfilter/yadif.h
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '0e730494160d973400aed8d2addd1f58a0ec883e' (Michael Niedermayer, 2013-10-24)
|\|
| |   * commit '0e730494160d973400aed8d2addd1f58a0ec883e':
| |     avfilter: x86: Port gradfun filter optimizations to yasm
| |
| |   Conflicts:
| |       libavfilter/x86/vf_gradfun_init.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * avfilter: x86: Port gradfun filter optimizations to yasm (Daniel Kang, 2013-10-23)
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | avfilter: port pullup filter from libmpcodecs (Paul B Mahol, 2013-09-17)
| |   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* | lavfi: add spp filter. (Clément Bœsch, 2013-06-14)
| |
* | yadif: x86 assembly for 9 to 14-bit samples (James Darnley, 2013-03-16)
| |   These smaller samples do not need to be unpacked to double words
| |   allowing the code to process more pixels every iteration (still 2 in MMX
| |   but 6 in SSE2). It also avoids emulating the missing double word
| |   instructions on older instruction sets.
| |
| |   Like with the previous code for 16-bit samples this has been tested on
| |   an Athlon64 and a Core2Quad.
| |
| |   Athlon64:
| |   1809275 decicycles in C,     32718 runs, 50 skips
| |    911675 decicycles in mmx,   32727 runs, 41 skips, 2.0x faster
| |    495284 decicycles in sse2,  32747 runs, 21 skips, 3.7x faster
| |
| |   Core2Quad:
| |    921363 decicycles in C,     32756 runs, 12 skips
| |    486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
| |    293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
| |    284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | yadif: x86 assembly for 16-bit samples (James Darnley, 2013-03-16)
|/
|   This is a fairly dumb copy of the assembly for 8-bit samples but it works
|   and produces identical output to the C version. The options have been
|   tested on an Athlon64 and a Core2Quad.
|
|   Athlon64:
|   1810385 decicycles in C,     32726 runs, 42 skips
|   1080744 decicycles in mmx,   32744 runs, 24 skips, 1.7x faster
|    818315 decicycles in sse2,  32735 runs, 33 skips, 2.2x faster
|
|   Core2Quad:
|    924025 decicycles in C,     32750 runs, 18 skips
|    623995 decicycles in mmx,   32767 runs,  1 skips, 1.5x faster
|    406223 decicycles in sse2,  32764 runs,  4 skips, 2.3x faster
|    387842 decicycles in ssse3, 32767 runs,  1 skips, 2.4x faster
|    307726 decicycles in sse4,  32763 runs,  5 skips, 3.0x faster
|
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avfilter: x86: consistent filenames for filter optimizations (Diego Biurrun, 2013-02-04)
|
* vf_hqdn3d: x86: Add proper arch optimization initialization (Diego Biurrun, 2013-02-01)
|
* yadif: Port inline assembly to yasm (Daniel Kang, 2013-01-09)
|   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>