libav.git - [no description]

	Commit message	Author	Age
*	avfilter/vf_nlmeans: add x86 SIMD	Paul B Mahol	2021-11-11
*	x86/vf_lut3d: use three operand form for some instructions	James Almer	2021-10-14
	Fixes compilation with old yasm.
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/vf_lut3d: fix building with --disable-optimizations	Mark Reid	2021-10-13
*	avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation	Mark Reid	2021-10-10
	I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and I was able to remove quite a few lines. I also added use of FMA in the AVX2 version.
	f32 1920x1080 1 thread with prelut
	c impl
	1434012700 UNITS in lut3d->interp, 1 runs, 0 skips
	1434035335 UNITS in lut3d->interp, 2 runs, 0 skips
	1423615347 UNITS in lut3d->interp, 4 runs, 0 skips
	1426268863 UNITS in lut3d->interp, 8 runs, 0 skips
	sse2
	905484420 UNITS in lut3d->interp, 1 runs, 0 skips
	905659010 UNITS in lut3d->interp, 2 runs, 0 skips
	915167140 UNITS in lut3d->interp, 4 runs, 0 skips
	915834222 UNITS in lut3d->interp, 8 runs, 0 skips
	avx
	574794860 UNITS in lut3d->interp, 1 runs, 0 skips
	581035090 UNITS in lut3d->interp, 2 runs, 0 skips
	584116720 UNITS in lut3d->interp, 4 runs, 0 skips
	581460290 UNITS in lut3d->interp, 8 runs, 0 skips
	avx2
	301698880 UNITS in lut3d->interp, 1 runs, 0 skips
	301982880 UNITS in lut3d->interp, 2 runs, 0 skips
	306962430 UNITS in lut3d->interp, 4 runs, 0 skips
	305472025 UNITS in lut3d->interp, 8 runs, 0 skips
	gbrap16 1920x1080 1 thread with prelut
	c impl
	1480894840 UNITS in lut3d->interp, 1 runs, 0 skips
	1502922990 UNITS in lut3d->interp, 2 runs, 0 skips
	1496114307 UNITS in lut3d->interp, 4 runs, 0 skips
	1492554551 UNITS in lut3d->interp, 8 runs, 0 skips
	sse2
	980777180 UNITS in lut3d->interp, 1 runs, 0 skips
	986121520 UNITS in lut3d->interp, 2 runs, 0 skips
	986489840 UNITS in lut3d->interp, 4 runs, 0 skips
	998832248 UNITS in lut3d->interp, 8 runs, 0 skips
	avx
	622212360 UNITS in lut3d->interp, 1 runs, 0 skips
	622981160 UNITS in lut3d->interp, 2 runs, 0 skips
	645396315 UNITS in lut3d->interp, 4 runs, 0 skips
	641057075 UNITS in lut3d->interp, 8 runs, 0 skips
	avx2
	321336400 UNITS in lut3d->interp, 1 runs, 0 skips
	321268920 UNITS in lut3d->interp, 2 runs, 0 skips
	323459895 UNITS in lut3d->interp, 4 runs, 0 skips
	324949967 UNITS in lut3d->interp, 8 runs, 0 skips
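The raw UNITS counts in the lut3d commit message are easier to compare as speedups over the C implementation. The short sketch below does that arithmetic for the single-run f32 figures; the dictionary and the `speedups` helper are illustrative names, not part of the commit. Under that reading the SIMD paths come out at roughly 1.6x (sse2), 2.5x (avx) and 4.8x (avx2).

```python
# Single-run counts ("UNITS in lut3d->interp, 1 runs") for f32 1920x1080,
# copied from the commit message. Lower is faster.
f32_units = {
    "c": 1434012700,
    "sse2": 905484420,
    "avx": 574794860,
    "avx2": 301698880,
}


def speedups(units, baseline="c"):
    """Return baseline_count / count for every implementation."""
    base = units[baseline]
    return {name: base / value for name, value in units.items()}


if __name__ == "__main__":
    for name, factor in speedups(f32_units).items():
        print(f"{name:5s} {factor:.2f}x")
```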
*	avfilter/x86/vf_blend: unify indentation format	Wu Jianhua	2021-10-03
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: correct the order of loop step	Wu Jianhua	2021-09-18
	The problem occurred when the width of the processed block minus 1 is a multiple of the aligned number: the instruction jle .bscale_scalar would then skip the Optimized Loop Step, leading to incorrect sampling when more than one step is specified. Move the Optimized Loop Step after .bscale_scalar to ensure the loop step is enabled.
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: fixed the fate-test failed on MacOS	Wu Jianhua	2021-09-18
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512()	Wu Jianhua	2021-08-29
	We introduced ff_horiz_slice_avx2/512(), implemented with a new algorithm. In a nutshell, the new algorithm does three things: gathering data from 8/16 rows, blurring the data, and scattering the data back to the image buffer. We use a customized 8x8/16x16 transpose to avoid the huge overhead of gather and scatter instructions; the transpose depends on a newly added temporary buffer called localbuf.
	Performance data:
	ff_horiz_slice_avx2(old): 109.89
	ff_horiz_slice_avx2(new): 666.67
	ff_horiz_slice_avx512: 1000
	Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
	Co-authored-by: Jin Jun <jun.i.jin@intel.com>
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
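The gather, blur, scatter idea from the ff_horiz_slice commit can be sketched in scalar Python. This is only an illustration under stated assumptions: the first-order forward/backward recursion in `blur_row` stands in for the real blur, `lanes` plays the role of the 8 or 16 SIMD lanes, and all function names are hypothetical rather than taken from vf_gblur. The point is that transposing a block of rows into `localbuf` lets one step of the recursion update every row of the block at once.

```python
def blur_row(row, nu):
    """Reference: simple forward then backward recursive pass over one row."""
    out = list(row)
    for x in range(1, len(out)):
        out[x] += nu * out[x - 1]
    for x in range(len(out) - 1, 0, -1):
        out[x - 1] += nu * out[x]
    return out


def blur_rows_blocked(image, nu, lanes=8):
    """Blur all rows of a non-empty image, `lanes` rows at a time."""
    height, width = len(image), len(image[0])
    result = [None] * height
    for y0 in range(0, height, lanes):
        block = image[y0:y0 + lanes]
        n = len(block)
        # Gather + transpose: localbuf[x][lane] is pixel x of row y0 + lane,
        # so one "vector" holds the same column position from every row.
        localbuf = [list(col) for col in zip(*block)]
        # Blur: each step touches a whole vector (the inner loop models SIMD lanes).
        for x in range(1, width):
            prev, cur = localbuf[x - 1], localbuf[x]
            for lane in range(n):
                cur[lane] += nu * prev[lane]
        for x in range(width - 1, 0, -1):
            cur, prev = localbuf[x], localbuf[x - 1]
            for lane in range(n):
                prev[lane] += nu * cur[lane]
        # Transpose back + scatter into the output rows.
        for lane, row in enumerate(zip(*localbuf)):
            result[y0 + lane] = list(row)
    return result
```

Because each pixel goes through the same operations in the same order, the blocked version should return exactly what `blur_row` returns row by row; only the memory access pattern changes.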
*	libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512()	Wu Jianhua	2021-08-29
	The new vertical slice with AVX2/512 acceleration can significantly improve the performance of Gaussian Filter 2D.
	Performance data:
	ff_verti_slice_c: 32.57
	ff_verti_slice_avx2: 476.19
	ff_verti_slice_avx512: 833.33
	Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
	Co-authored-by: Jin Jun <jun.i.jin@intel.com>
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512()	Wu Jianhua	2021-08-29
	Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
	Co-authored-by: Jin Jun <jun.i.jin@intel.com>
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	avfilter/avf_showcqt: switch to TX FFT from avutil	Paul B Mahol	2021-07-27
*	Remove unnecessary mem.h inclusions	Andreas Rheinhardt	2021-07-22
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	x86/vf_gblur: fix reg name in UNIX64 prologue	James Almer	2021-02-17
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/vf_gblur: fix postscale_slice prologue	James Almer	2021-02-17
	x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64 ABI uses only the first four regs for this purpose.
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/x86/vf_gblur: add postscale SIMD	Paul B Mahol	2021-02-16
*	avfilter/vf_convolution: add 16-column operation for filter_column()	Paul B Mahol	2021-02-13
	Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>
*	avfilter/vf_atadenoise: add sigma options	Paul B Mahol	2021-01-22
*	avfilter/vf_v360: add mitchell interpolation	Paul B Mahol	2020-10-04
*	avfilter/x86/vf_convolution_init: there is asm only for 8bit depth	Paul B Mahol	2020-09-15
*	Revert "avfilter/yadif: simplify the code for better readability"	Limin Wang	2020-08-27
	This reverts commit 2a9b934675b9e2d3850b46f8a618c19b03f02551.
*	avfilter/yadif: simplify the code for better readability	Limin Wang	2020-08-26
	Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
*	x86/vf_blend: fix warnings about trailing empty parameters	James Almer	2020-07-12
	Finishes fixing ticket #8771
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/x86/vf_v360_init: add missing cases	Paul B Mahol	2020-04-02
*	avfilter/vf_v360: add SIMD for lagrange9 interpolation	Paul B Mahol	2020-04-02
*	vf_ssim: Fix loading doubles to float registers on i386	Martin Storsjö	2020-02-05
	This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb on i386 after breaking in fcc0424c933742c8fc852371e985d16b6eb4bfe9.
	Signed-off-by: Martin Storsjö <martin@martin.st>
*	avfilter/vf_ssim: improve precision	Paul B Mahol	2020-02-04
	Use doubles for accumulating floats.
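Why accumulating floats in a double helps can be shown with a small experiment. The sketch below is not taken from vf_ssim: `to_f32`, `sum_f32` and `sum_f64` are hypothetical helpers, and float32 arithmetic is emulated by rounding each intermediate result through `struct`. It sums the same float32 inputs once with a float32 accumulator and once with a double accumulator.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate in single precision: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_f64(values):
    """Accumulate the same float32 inputs in double precision."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return acc


if __name__ == "__main__":
    values = [0.1] * 100_000
    reference = to_f32(0.1) * len(values)
    print("float32 accumulator error:", abs(sum_f32(values) - reference))
    print("double accumulator error: ", abs(sum_f64(values) - reference))
```

With a float32 accumulator, every addition rounds to 24 bits of mantissa and the rounding errors build up as the running sum grows; the double accumulator keeps the error far smaller for the same inputs.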
*	avfilter/vf_v360: change remaps to int16_t type	Paul B Mahol	2020-01-19
*	avfilter/x86/vf_interlace: always use unaligned movs	Marton Balint	2019-12-15
	Fixes crashes in command lines such as:
	ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none
	Related to ticket #6491.
	Signed-off-by: Marton Balint <cus@passwd.hu>
*	avfilter/vf_maskedclamp: add x86 SIMD	Paul B Mahol	2019-10-23
*	x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32	James Almer	2019-10-22
	Reviewed-by: Paul B Mahol <onemda@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/vf_transpose: fix cpuflags check	James Almer	2019-10-21
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/vf_transpose: add x86 SIMD	Paul B Mahol	2019-10-21
*	avfilter/x86/vf_atadenoise: fix comment	Paul B Mahol	2019-10-21
*	avfilter/x86/vf_atadenoise: add SIMD for serial too	Paul B Mahol	2019-10-17
*	avfilter/vf_atadenoise: add option to use additional algorithm	Paul B Mahol	2019-10-17
*	avfilter/vf_atadenoise: add x86 SIMD	Paul B Mahol	2019-10-17
*	avfilter/vf_gblur: fix heap-buffer overflow	Paul B Mahol	2019-10-16
	Fixes #8282
*	avcodec/filter: Remove extra '; ' outside of functions	Andreas Rheinhardt	2019-10-07
	They are not allowed outside of functions. Fixes the warning "ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]" when compiling with GCC and -pedantic.
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
*	avfilter/vf_eq: fix compilation with x86 asm disabled	James Almer	2019-09-26
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/x86/vf_eq: add SSE2 version	Ting Fu	2019-09-26
	Signed-off-by: Ting Fu <ting.fu@intel.com>
*	avfilter/x86/vf_eq: Change inline assembly into nasm code	Ting Fu	2019-09-26
	Signed-off-by: Ting Fu <ting.fu@intel.com>
*	avfilter/x86/vf_360: add most of >8 depth asm	Paul B Mahol	2019-09-16
*	x86/vf_v360: use a faster horizontal add in remap4_8bit_line_avx2	James Almer	2019-09-06
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/vf_v360: make remap{1,2}_8bit_line_avx2 work on x86_32	James Almer	2019-09-06
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/vf_v360: x86 SIMD for interpolations	Paul B Mahol	2019-09-06
*	avfilter/vf_convolution: add x86 SIMD for filter_3x3()	Ruiling Song	2019-08-07
	Tested using a simple command (apply edge enhance):
	./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
	-vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
	-an -vframes 1000 -f null /dev/null
	The fps increases from 151 to 270 on my local machine.
	Signed-off-by: Ruiling Song <ruiling.song@intel.com>
*	avfilter/vf_gblur: add missing preprocessor check	James Almer	2019-06-12
	Fixes compilation on x86_32
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/vf_gblur: add x86 SIMD optimizations	Ruiling Song	2019-06-12
	The horizontal pass gets ~2x performance with the patch under a single thread.
	Tested overall performance using the commands (avx2 enabled):
	./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
	./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
	For single thread, the fps improves from 43 to 60, about 40%.
	For multi-thread, the fps improves from 110 to 130, about 20%.
	Signed-off-by: Ruiling Song <ruiling.song@intel.com>
*	avfilter: add anlmdn filter x86 SIMD optimizations	Paul B Mahol	2019-01-10
*	x86/af_afir: use three operand form for some instructions	James Almer	2019-01-03
	Fixes compilation with old yasm versions.
	Signed-off-by: James Almer <jamrial@gmail.com>