libav.git - [no description]

	Commit message	Author	Age
*	avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation	Mark Reid	2021-10-10
\|	I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and I was able to remove quite a few lines. I also added use of FMA in the AVX2 version.
\|
\|	f32 1920x1080 1 thread with prelut
\|	c impl
\|	1434012700 UNITS in lut3d->interp, 1 runs, 0 skips
\|	1434035335 UNITS in lut3d->interp, 2 runs, 0 skips
\|	1423615347 UNITS in lut3d->interp, 4 runs, 0 skips
\|	1426268863 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	sse2
\|	905484420 UNITS in lut3d->interp, 1 runs, 0 skips
\|	905659010 UNITS in lut3d->interp, 2 runs, 0 skips
\|	915167140 UNITS in lut3d->interp, 4 runs, 0 skips
\|	915834222 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	avx
\|	574794860 UNITS in lut3d->interp, 1 runs, 0 skips
\|	581035090 UNITS in lut3d->interp, 2 runs, 0 skips
\|	584116720 UNITS in lut3d->interp, 4 runs, 0 skips
\|	581460290 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	avx2
\|	301698880 UNITS in lut3d->interp, 1 runs, 0 skips
\|	301982880 UNITS in lut3d->interp, 2 runs, 0 skips
\|	306962430 UNITS in lut3d->interp, 4 runs, 0 skips
\|	305472025 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	gbrap16 1920x1080 1 thread with prelut
\|	c impl
\|	1480894840 UNITS in lut3d->interp, 1 runs, 0 skips
\|	1502922990 UNITS in lut3d->interp, 2 runs, 0 skips
\|	1496114307 UNITS in lut3d->interp, 4 runs, 0 skips
\|	1492554551 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	sse2
\|	980777180 UNITS in lut3d->interp, 1 runs, 0 skips
\|	986121520 UNITS in lut3d->interp, 2 runs, 0 skips
\|	986489840 UNITS in lut3d->interp, 4 runs, 0 skips
\|	998832248 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	avx
\|	622212360 UNITS in lut3d->interp, 1 runs, 0 skips
\|	622981160 UNITS in lut3d->interp, 2 runs, 0 skips
\|	645396315 UNITS in lut3d->interp, 4 runs, 0 skips
\|	641057075 UNITS in lut3d->interp, 8 runs, 0 skips
\|
\|	avx2
\|	321336400 UNITS in lut3d->interp, 1 runs, 0 skips
\|	321268920 UNITS in lut3d->interp, 2 runs, 0 skips
\|	323459895 UNITS in lut3d->interp, 4 runs, 0 skips
\|	324949967 UNITS in lut3d->interp, 8 runs, 0 skips
*	avfilter/vf_maskedclamp: add x86 SIMD	Paul B Mahol	2019-10-23
\|
*	avfilter/vf_transpose: add x86 SIMD	Paul B Mahol	2019-10-21
\|
*	avfilter/vf_adadenoise: add x86 SIMD	Paul B Mahol	2019-10-17
\|
*	avfilter/vf_eq: fix compilation with x86 asm disabled	James Almer	2019-09-26
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/x86/vf_eq: Change inline assembly into nasm code	Ting Fu	2019-09-26
\|	Signed-off-by: Ting Fu <ting.fu@intel.com>
*	avfilter/vf_v360: x86 SIMD for interpolations	Paul B Mahol	2019-09-06
\|
*	avfilter/vf_convolution: add x86 SIMD for filter_3x3()	Ruiling Song	2019-08-07
\|	Tested using a simple command (apply edge enhance):
\|	./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
\|	-vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
\|	-an -vframes 1000 -f null /dev/null
\|
\|	The fps increased from 151 to 270 on my local machine.
\|
\|	Signed-off-by: Ruiling Song <ruiling.song@intel.com>
*	avfilter/vf_gblur: add x86 SIMD optimizations	Ruiling Song	2019-06-12
\|	The horizontal pass gets ~2x performance with the patch under a single thread.
\|
\|	Tested overall performance using the commands (avx2 enabled):
\|	./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
\|	./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
\|
\|	For single thread, the fps improves from 43 to 60, about 40%.
\|	For multi-thread, the fps improves from 110 to 130, about 20%.
\|
\|	Signed-off-by: Ruiling Song <ruiling.song@intel.com>
*	avfilter: add anlmdn filter x86 SIMD optimizations	Paul B Mahol	2019-01-10
\|
*	avfilter/vf_framerate: factorize SAD functions which compute SAD for a whole frame	Marton Balint	2018-11-11
\|	Also add SIMD which works on lines because it is faster than calculating it on 8x8 blocks using pixelutils.
\|
\|	Signed-off-by: Marton Balint <cus@passwd.hu>
*	avfilter/vf_overlay: add x86 SIMD	Paul B Mahol	2018-05-02
\|	Specifically for yuv444, yuv422, yuv420 format when main stream has no alpha, and alpha is straight.
\|
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/vf_interlace: remove duplicate code with same functionality	Vasile Toncu	2018-04-23
\|
*	avfilter/vf_framerate: add SIMD functions for frame blending	Marton Balint	2018-01-28
\|	Blend function speedups on x86_64 Core i5 4460:
\|
\|	ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none
\|
\|	C: 447548411 decicycles in Blend, 2048 runs, 0 skips
\|	SSSE3: 130020087 decicycles in Blend, 2048 runs, 0 skips
\|	AVX2: 128508221 decicycles in Blend, 2048 runs, 0 skips
\|
\|	ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none
\|
\|	C: 228932745 decicycles in Blend, 2048 runs, 0 skips
\|	SSE4: 123357781 decicycles in Blend, 2048 runs, 0 skips
\|	AVX2: 121215353 decicycles in Blend, 2048 runs, 0 skips
\|
\|	Signed-off-by: Marton Balint <cus@passwd.hu>
*	avfilter: add hflip x86 SIMD	Paul B Mahol	2017-12-04
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/vf_threshold: add x86 SIMD	Paul B Mahol	2017-12-02
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter: add limiter filter	Paul B Mahol	2017-07-08
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	build: Generalize yasm/nasm-related variable names	Diego Biurrun	2017-06-21
\|	None of them are specific to the YASM assembler.
\|
\|	(Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter: add arbitrary audio FIR filter	Paul B Mahol	2017-05-09
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/avf_showcqt: cqt_calc optimization on x86	Muhammad Faiz	2016-06-08
\|	on x86_64:
\|	        time    PSNR
\|	plain   3.303   inf
\|	SSE     1.649   107.087535
\|	SSE3    1.632   107.087535
\|	AVX     1.409   106.986771
\|	FMA3    1.265   107.108437
\|
\|	on x86_32 (PSNR compared to x86_64 plain):
\|	        time    PSNR
\|	plain   7.225   103.951979
\|	SSE     1.827   105.859282
\|	SSE3    1.819   105.859282
\|	AVX     1.533   105.997661
\|	FMA3    1.384   105.885377
\|
\|	FMA4 test is not available
\|
\|	Reviewed-by: James Almer <jamrial@gmail.com>
\|	Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
*	vf_colorspace: x86-64 SIMD (SSE2) optimizations.	Ronald S. Bultje	2016-04-12
\|
*	avfilter/vf_bwdif: add x86 SIMD	Thomas Mundt	2016-03-13
\|	Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
*	avfilter/vf_w3fdif: add x86 SIMD	Paul B Mahol	2015-10-10
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs	Paul B Mahol	2015-10-06
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/vf_blend: add x86 SIMD for some modes	Paul B Mahol	2015-10-03
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input	Paul B Mahol	2015-10-02
\|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
*	avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions	James Darnley	2015-07-14
\|	Speed of all modes increased by a factor between 7.4 and 19.8 largely depending on whether bytes are unpacked into words. Modes 2, 3, and 4 have been sped-up by a factor of 43 (thanks quick sort!)
\|
\|	All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20, 21, and 22 are available on x86 due to the number of SIMD registers used.
\|
\|	With a contribution from James Almer <jamrial@gmail.com>
*	vf_psnr: sse2 optimizations for sum-squared-error.	Ronald S. Bultje	2015-07-14
\|	The internal line accumulator for 16bit can overflow, so I changed that from int to uint64_t in the C code. The matching assembly looks a little weird but output looks correct.
\|
\|	(avx2 should be trivial to add later.)
\|
\|	Reviewed-by: Paul B Mahol <onemda@gmail.com>
\|	Reviewed-by: James Almer <jamrial@gmail.com>
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.	Ronald S. Bultje	2015-07-14
\|	Both are 2-2.5x faster than their C counterpart.
\|
\|	Reviewed-by: Paul B Mahol <onemda@gmail.com>
\|	Reviewed-by: James Almer <jamrial@gmail.com>
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	avfilter: Port mp=eq/eq2 to lavfi	Arwa Arif	2015-01-26
\|	Code adapted from James Darnley's port
\|
\|	Some fixes from Paul B Mahol <onemda@gmail.com>
\|
\|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/vf_pp7: port dctB_mmx to yasm	James Almer	2015-01-09
\|	Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	lavfi: port mp=pp7 to libavfilter	Arwa Arif	2015-01-09
\|	The only difference with mp=pp7 is that default mode is "medium", as stated in the MPlayer docs, rather than "hard".
\|
\|	Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
*	x86/vf_fspp: port inline asm to yasm	James Almer	2014-12-26
\|	Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	lavfi: port mp=fspp to a native libavfilter filter	Arwa Arif	2014-12-24
\|	Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
*	avfilter/tinterlace: add Support for ff_lowpass_line_avx() & ff_lowpass_line_sse2()	Michael Niedermayer	2014-11-15
\|	Based-on: 2e1704059ae8625beda2ffde847ad22c5ba416dc by Kieran Kunhya
\|
\|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	Merge commit '2e1704059ae8625beda2ffde847ad22c5ba416dc'	Michael Niedermayer	2014-11-15
\|\	* commit '2e1704059ae8625beda2ffde847ad22c5ba416dc':
\| \|	vf_interlace: Add SIMD for lowpass filter
\| \|
\| \|	Conflicts:
\| \|	libavfilter/vf_interlace.c
\| \|	libavfilter/x86/Makefile
\| \|
\| \|	Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	vf_interlace: Add SIMD for lowpass filter	Kieran Kunhya	2014-11-15
\| \|	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* \|	x86/vf_noise: move asm code to a separate file	James Almer	2014-10-17
\| \|	Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
\| \|	Signed-off-by: James Almer <jamrial@gmail.com>
* \|	avfilter/vf_idet: MMX/MMXEXT/SSE2 implementation of idet's filter_line()	skal	2014-09-04
\| \|	integration by Neil Birkbeck, with help from Vitor Sessak.
\| \|	core SSE2 loop by Skal (pascal.massimino@gmail.com)
\| \|
\| \|	Reviewed-by: Clément Bœsch <u@pkh.me>
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Revert "Revert "vf_yadif: move x86 init code to x86/yadif.c""	Robert Krüger	2014-01-14
\| \|	This reverts commit 975110a85ef8e794fdc041455ff41b0ad30bc01e.
\| \|
\| \|	Signed-off-by: Robert Krüger <krueger@lesspain.de>
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Revert "vf_yadif: move x86 init code to x86/yadif.c"	Michael Niedermayer	2013-12-01
\| \|	This reverts commit a87b17f3283aada762820f1b797eeb7a2dff6c61.
\| \|
\| \|	This reduces the amount of non LGPL code, making a relicensing to LGPL easier
\| \|
\| \|	Conflicts:
\| \|	libavfilter/vf_yadif.c
\| \|	libavfilter/x86/yadif.c
\| \|	libavfilter/x86/yadif_template.c
\| \|	libavfilter/yadif.h
\| \|
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Merge commit '0e730494160d973400aed8d2addd1f58a0ec883e'	Michael Niedermayer	2013-10-24
\|\\|	* commit '0e730494160d973400aed8d2addd1f58a0ec883e':
\| \|	avfilter: x86: Port gradfun filter optimizations to yasm
\| \|
\| \|	Conflicts:
\| \|	libavfilter/x86/vf_gradfun_init.c
\| \|
\| \|	Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	avfilter: x86: Port gradfun filter optimizations to yasm	Daniel Kang	2013-10-23
\| \|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
* \|	avfilter: port pullup filter from libmpcodecs	Paul B Mahol	2013-09-17
\| \|	Signed-off-by: Paul B Mahol <onemda@gmail.com>
* \|	lavfi: add spp filter.	Clément Bœsch	2013-06-14
\| \|
* \|	yadif: x86 assembly for 9 to 14-bit samples	James Darnley	2013-03-16
\| \|	These smaller samples do not need to be unpacked to double words allowing the code to process more pixels every iteration (still 2 in MMX but 6 in SSE2). It also avoids emulating the missing double word instructions on older instruction sets.
\| \|
\| \|	Like with the previous code for 16-bit samples this has been tested on an Athlon64 and a Core2Quad.
\| \|
\| \|	Athlon64:
\| \|	1809275 decicycles in C, 32718 runs, 50 skips
\| \|	911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster
\| \|	495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
\| \|
\| \|	Core2Quad:
\| \|	921363 decicycles in C, 32756 runs, 12 skips
\| \|	486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster
\| \|	293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster
\| \|	284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster
\| \|
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	yadif: x86 assembly for 16-bit samples	James Darnley	2013-03-16
\|/	This is a fairly dumb copy of the assembly for 8-bit samples but it works and produces identical output to the C version. The options have been tested on an Athlon64 and a Core2Quad.
\|
\|	Athlon64:
\|	1810385 decicycles in C, 32726 runs, 42 skips
\|	1080744 decicycles in mmx, 32744 runs, 24 skips, 1.7x faster
\|	818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster
\|
\|	Core2Quad:
\|	924025 decicycles in C, 32750 runs, 18 skips
\|	623995 decicycles in mmx, 32767 runs, 1 skips, 1.5x faster
\|	406223 decicycles in sse2, 32764 runs, 4 skips, 2.3x faster
\|	387842 decicycles in ssse3, 32767 runs, 1 skips, 2.4x faster
\|	307726 decicycles in sse4, 32763 runs, 5 skips, 3.0x faster
\|
\|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avfilter: x86: consistent filenames for filter optimizations	Diego Biurrun	2013-02-04
\|
*	vf_hqdn3d: x86: Add proper arch optimization initialization	Diego Biurrun	2013-02-01
\|
*	yadif: Port inline assembly to yasm	Daniel Kang	2013-01-09
\|	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>