path: root/libavfilter/x86
Commit message | Author | Age
* avfilter/vf_nlmeans: add x86 SIMD | Paul B Mahol | 2021-11-11
|
* x86/vf_lut3d: use three operand form for some instructions | James Almer | 2021-10-14
|   Fixes compilation with old yasm.
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_lut3d: fix building with --disable-optimizations | Mark Reid | 2021-10-13
|
* avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation | Mark Reid | 2021-10-10
|   I spotted an interesting pattern that I didn't see before that leads to
|   the implementation being faster. The bit shifting table I was using before
|   is no longer needed, and I was able to remove quite a few lines.
|   I also added use of FMA in the AVX2 version.
|
|   f32 1920x1080 1 thread with prelut
|   c impl
|   1434012700 UNITS in lut3d->interp,       1 runs,      0 skips
|   1434035335 UNITS in lut3d->interp,       2 runs,      0 skips
|   1423615347 UNITS in lut3d->interp,       4 runs,      0 skips
|   1426268863 UNITS in lut3d->interp,       8 runs,      0 skips
|   sse2
|   905484420 UNITS in lut3d->interp,       1 runs,      0 skips
|   905659010 UNITS in lut3d->interp,       2 runs,      0 skips
|   915167140 UNITS in lut3d->interp,       4 runs,      0 skips
|   915834222 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx
|   574794860 UNITS in lut3d->interp,       1 runs,      0 skips
|   581035090 UNITS in lut3d->interp,       2 runs,      0 skips
|   584116720 UNITS in lut3d->interp,       4 runs,      0 skips
|   581460290 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx2
|   301698880 UNITS in lut3d->interp,       1 runs,      0 skips
|   301982880 UNITS in lut3d->interp,       2 runs,      0 skips
|   306962430 UNITS in lut3d->interp,       4 runs,      0 skips
|   305472025 UNITS in lut3d->interp,       8 runs,      0 skips
|
|   gbrap16 1920x1080 1 thread with prelut
|   c impl
|   1480894840 UNITS in lut3d->interp,       1 runs,      0 skips
|   1502922990 UNITS in lut3d->interp,       2 runs,      0 skips
|   1496114307 UNITS in lut3d->interp,       4 runs,      0 skips
|   1492554551 UNITS in lut3d->interp,       8 runs,      0 skips
|   sse2
|   980777180 UNITS in lut3d->interp,       1 runs,      0 skips
|   986121520 UNITS in lut3d->interp,       2 runs,      0 skips
|   986489840 UNITS in lut3d->interp,       4 runs,      0 skips
|   998832248 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx
|   622212360 UNITS in lut3d->interp,       1 runs,      0 skips
|   622981160 UNITS in lut3d->interp,       2 runs,      0 skips
|   645396315 UNITS in lut3d->interp,       4 runs,      0 skips
|   641057075 UNITS in lut3d->interp,       8 runs,      0 skips
|   avx2
|   321336400 UNITS in lut3d->interp,       1 runs,      0 skips
|   321268920 UNITS in lut3d->interp,       2 runs,      0 skips
|   323459895 UNITS in lut3d->interp,       4 runs,      0 skips
|   324949967 UNITS in lut3d->interp,       8 runs,      0 skips
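For readers unfamiliar with the operation being timed above, here is a minimal scalar sketch of textbook tetrahedral interpolation in C. It is not the code from this commit: the `Lut3D` struct, the single-channel layout, and the function names are assumptions made for illustration, and the sketch does not reproduce the SIMD pattern the commit message refers to.

```c
#include <stddef.h>

/* Hypothetical single-channel cubic LUT with size^3 entries, indexed [r][g][b]. */
typedef struct {
    int size;
    const float *data;
} Lut3D;

static float lut_at(const Lut3D *lut, int r, int g, int b)
{
    return lut->data[((size_t)r * lut->size + g) * lut->size + b];
}

static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Tetrahedral interpolation of one sample; r, g, b are expected in [0, 1]. */
float interp_tetrahedral(const Lut3D *lut, float r, float g, float b)
{
    const int max = lut->size - 1;
    const float sr = clamp01(r) * max, sg = clamp01(g) * max, sb = clamp01(b) * max;
    const int r0 = (int)sr, g0 = (int)sg, b0 = (int)sb;
    const int r1 = r0 < max ? r0 + 1 : max;
    const int g1 = g0 < max ? g0 + 1 : max;
    const int b1 = b0 < max ? b0 + 1 : max;
    const float dr = sr - r0, dg = sg - g0, db = sb - b0;
    const float c000 = lut_at(lut, r0, g0, b0);
    const float c111 = lut_at(lut, r1, g1, b1);

    /* The ordering of the fractions picks one of six tetrahedra in the cell. */
    if (dr > dg) {
        if (dg > db)
            return (1 - dr) * c000 + (dr - dg) * lut_at(lut, r1, g0, b0)
                 + (dg - db) * lut_at(lut, r1, g1, b0) + db * c111;
        if (dr > db)
            return (1 - dr) * c000 + (dr - db) * lut_at(lut, r1, g0, b0)
                 + (db - dg) * lut_at(lut, r1, g0, b1) + dg * c111;
        return (1 - db) * c000 + (db - dr) * lut_at(lut, r0, g0, b1)
             + (dr - dg) * lut_at(lut, r1, g0, b1) + dg * c111;
    }
    if (db > dg)
        return (1 - db) * c000 + (db - dg) * lut_at(lut, r0, g0, b1)
             + (dg - dr) * lut_at(lut, r0, g1, b1) + dr * c111;
    if (db > dr)
        return (1 - dg) * c000 + (dg - db) * lut_at(lut, r0, g1, b0)
             + (db - dr) * lut_at(lut, r0, g1, b1) + dr * c111;
    return (1 - dg) * c000 + (dg - dr) * lut_at(lut, r0, g1, b0)
         + (dr - db) * lut_at(lut, r1, g1, b0) + db * c111;
}
```

Each branch blends only four of the eight cell corners, which is why the method is cheaper than trilinear interpolation. A LUT that stores a linear function of (r, g, b) should be reproduced exactly, which makes a convenient sanity check.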
* avfilter/x86/vf_blend: unify indentation format | Wu Jianhua | 2021-10-03
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: correct the order of loop step | Wu Jianhua | 2021-09-18
|   If the width of the processed block minus 1 is a multiple of the aligned
|   number, the instruction jle .bscale_scalar skips the Optimized Loop Step,
|   which leads to incorrect sampling when more than one step is specified.
|
|   Move the Optimized Loop Step after .bscale_scalar to ensure the loop step
|   is enabled.
|
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: fixed the fate-test failed on MacOS | Wu Jianhua | 2021-09-18
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512() | Wu Jianhua | 2021-08-29
|   We introduced ff_horiz_slice_avx2/512(), implemented with a new algorithm.
|   In a nutshell, the new algorithm does three things: it gathers data from
|   8/16 rows, blurs the data, and scatters the data back to the image buffer.
|   We use a customized 8x8/16x16 transpose to avoid the huge overhead of
|   gather and scatter instructions; the transpose depends on a newly added
|   temporary buffer called localbuf.
|
|   Performance data:
|   ff_horiz_slice_avx2(old): 109.89
|   ff_horiz_slice_avx2(new): 666.67
|   ff_horiz_slice_avx512: 1000
|
|   Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
|   Co-authored-by: Jin Jun <jun.i.jin@intel.com>
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
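The gather, blur, scatter idea described above can be sketched in plain C as follows. This is an illustration only, not the assembly from the commit: `ROWS`, `blur_row`, and `blur_rows_transposed` are hypothetical names, and a single forward pass of a first-order recursive filter stands in for the real blur.

```c
#include <stddef.h>

#define ROWS 8 /* rows handled together; 8 floats fill one 256-bit register */

/* Stand-in for the blur: a first-order recursive filter run along one row. */
void blur_row(float *row, int width, float nu)
{
    for (int x = 1; x < width; x++)
        row[x] += nu * row[x - 1];
}

/* Same filter applied to ROWS rows at once through a transposed scratch
 * buffer. localbuf must hold width * ROWS floats. */
void blur_rows_transposed(float *image, int width, float nu, float *localbuf)
{
    /* Gather: column x of all rows becomes ROWS contiguous floats. */
    for (int r = 0; r < ROWS; r++)
        for (int x = 0; x < width; x++)
            localbuf[(size_t)x * ROWS + r] = image[(size_t)r * width + x];

    /* Blur: the inner loop reads and writes contiguous memory, so a
     * vectorized version could handle it with one register per column. */
    for (int x = 1; x < width; x++)
        for (int r = 0; r < ROWS; r++)
            localbuf[(size_t)x * ROWS + r] +=
                nu * localbuf[(size_t)(x - 1) * ROWS + r];

    /* Scatter: transpose back into the row-major image. */
    for (int r = 0; r < ROWS; r++)
        for (int x = 0; x < width; x++)
            image[(size_t)r * width + x] = localbuf[(size_t)x * ROWS + r];
}
```

The recursion along x is serial within a row, so the parallelism has to come from processing several rows in lockstep. Transposing into the scratch buffer is what turns those rows into contiguous vector lanes.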
* libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512() | Wu Jianhua | 2021-08-29
|   The new vertical slice with AVX2/512 acceleration can significantly
|   improve the performance of Gaussian Filter 2D.
|
|   Performance data:
|   ff_verti_slice_c: 32.57
|   ff_verti_slice_avx2: 476.19
|   ff_verti_slice_avx512: 833.33
|
|   Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
|   Co-authored-by: Jin Jun <jun.i.jin@intel.com>
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512() | Wu Jianhua | 2021-08-29
|   Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
|   Co-authored-by: Jin Jun <jun.i.jin@intel.com>
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* avfilter/avf_showcqt: switch to TX FFT from avutil | Paul B Mahol | 2021-07-27
|
* Remove unnecessary mem.h inclusions | Andreas Rheinhardt | 2021-07-22
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* x86/vf_gblur: fix reg name in UNIX64 prologue | James Almer | 2021-02-17
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_gblur: fix postscale_slice prologue | James Almer | 2021-02-17
|   The x86_32 ABI does not pass float arguments directly in xmm regs, and
|   the Win64 ABI uses only the first four regs for this purpose.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_gblur: add postscale SIMD | Paul B Mahol | 2021-02-16
|
* avfilter/vf_convolution: add 16-column operation for filter_column() | Paul B Mahol | 2021-02-13
|   Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>
* avfilter/vf_atadenoise: add sigma options | Paul B Mahol | 2021-01-22
|
* avfilter/vf_v360: add mitchell interpolation | Paul B Mahol | 2020-10-04
|
* avfilter/x86/vf_convolution_init: there is asm only for 8bit depth | Paul B Mahol | 2020-09-15
|
* Revert "avfilter/yadif: simplify the code for better readability" | Limin Wang | 2020-08-27
|   This reverts commit 2a9b934675b9e2d3850b46f8a618c19b03f02551.
* avfilter/yadif: simplify the code for better readability | Limin Wang | 2020-08-26
|   Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
* x86/vf_blend: fix warnings about trailing empty parameters | James Almer | 2020-07-12
|   Finishes fixing ticket #8771
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_v360_init: add missing cases | Paul B Mahol | 2020-04-02
|
* avfilter/vf_v360: add SIMD for lagrange9 interpolation | Paul B Mahol | 2020-04-02
|
* vf_ssim: Fix loading doubles to float registers on i386 | Martin Storsjö | 2020-02-05
|   This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb
|   on i386 after breaking in fcc0424c933742c8fc852371e985d16b6eb4bfe9.
|
|   Signed-off-by: Martin Storsjö <martin@martin.st>
* avfilter/vf_ssim: improve precision | Paul B Mahol | 2020-02-04
|   Use doubles for accumulating floats.
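Why a double accumulator helps when summing floats can be shown with a small, generic C sketch. The function names `sum_f32` and `sum_f64` are hypothetical and this is not the vf_ssim code; it only illustrates the rounding effect the commit message alludes to.

```c
#include <stddef.h>

/* Float accumulator: small terms can vanish next to a large running total. */
float sum_f32(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Double accumulator: each float input is widened before it is added,
 * so far fewer low-order bits are lost along the way. */
double sum_f64(const float *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

For an input such as {1e8, 1, 1, 1, 1, -1e8}, the double version returns 4, while the float version loses the four small terms on targets that evaluate float arithmetic in single precision (the usual case with SSE or NEON; x87 builds may keep extra precision).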
* avfilter/vf_v360: change remaps to int16_t type | Paul B Mahol | 2020-01-19
|
* avfilter/x86/vf_interlace: always use unaligned movs | Marton Balint | 2019-12-15
|   Fixes crashes in command lines such as:
|   ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none
|
|   Related to ticket #6491.
|
|   Signed-off-by: Marton Balint <cus@passwd.hu>
* avfilter/vf_maskedclamp: add x86 SIMD | Paul B Mahol | 2019-10-23
|
* x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32 | James Almer | 2019-10-22
|   Reviewed-by: Paul B Mahol <onemda@gmail.com>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_transpose: fix cpuflags check | James Almer | 2019-10-21
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_transpose: add x86 SIMD | Paul B Mahol | 2019-10-21
|
* avfilter/x86/vf_atadenoise: fix comment | Paul B Mahol | 2019-10-21
|
* avfilter/x86/vf_atadenoise: add SIMD for serial too | Paul B Mahol | 2019-10-17
|
* avfilter/vf_atadenoise: add option to use additional algorithm | Paul B Mahol | 2019-10-17
|
* avfilter/vf_adadenoise: add x86 SIMD | Paul B Mahol | 2019-10-17
|
* avfilter/vf_gblur: fix heap-buffer overflow | Paul B Mahol | 2019-10-16
|   Fixes #8282
* avcodec/filter: Remove extra '; ' outside of functions | Andreas Rheinhardt | 2019-10-07
|   They are not allowed outside of functions. Fixes the warning
|   "ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]"
|   when compiling with GCC and -pedantic.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avfilter/vf_eq: fix compilation with x86 asm disabled | James Almer | 2019-09-26
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_eq: add SSE2 version | Ting Fu | 2019-09-26
|   Signed-off-by: Ting Fu <ting.fu@intel.com>
* avfilter/x86/vf_eq: Change inline assembly into nasm code | Ting Fu | 2019-09-26
|   Signed-off-by: Ting Fu <ting.fu@intel.com>
* avfilter/x86/vf_360: add most of >8 depth asm | Paul B Mahol | 2019-09-16
|
* x86/vf_v360: use a faster horizontal add in remap4_8bit_line_avx2 | James Almer | 2019-09-06
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_v360: make remap{1,2}_8bit_line_avx2 work on x86_32 | James Almer | 2019-09-06
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_v360: x86 SIMD for interpolations | Paul B Mahol | 2019-09-06
|
* avfilter/vf_convolution: add x86 SIMD for filter_3x3() | Ruiling Song | 2019-08-07
|   Tested using a simple command (apply edge enhance):
|   ./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
|     -vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
|     -an -vframes 1000 -f null /dev/null
|
|   The fps increases from 151 to 270 on my local machine.
|
|   Signed-off-by: Ruiling Song <ruiling.song@intel.com>
* avfilter/vf_gblur: add missing preprocessor check | James Almer | 2019-06-12
|   Fixes compilation on x86_32
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_gblur: add x86 SIMD optimizations | Ruiling Song | 2019-06-12
|   The horizontal pass gets ~2x performance with the patch under single thread.
|
|   Tested overall performance using the commands (avx2 enabled):
|   ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
|   ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
|   For single thread, the fps improves from 43 to 60, about 40%.
|   For multi-thread, the fps improves from 110 to 130, about 20%.
|
|   Signed-off-by: Ruiling Song <ruiling.song@intel.com>
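As a quick check of the percentages quoted above, the relative gain is the fps difference divided by the baseline fps. The arithmetic below uses only the figures from the commit message:

```latex
\[
\frac{60 - 43}{43} = \frac{17}{43} \approx 0.395 \;(\approx 40\%),
\qquad
\frac{130 - 110}{110} = \frac{20}{110} \approx 0.182 \;(\approx 18\%,\ \text{reported as about } 20\%).
\]
```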
* avfilter: add anlmdn filter x86 SIMD optimizations | Paul B Mahol | 2019-01-10
|
* x86/af_afir: use three operand form forat some instructions | James Almer | 2019-01-03
|   Fixes compilation with old yasm versions.
|   Signed-off-by: James Almer <jamrial@gmail.com>