| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
| |
The problem was caused by if the width of the processed block
minus 1 is a multiple of the aligned number the instruction
jle .bscale_scalar would skip the Optimized Loop Step, which
will lead to an incorrect sampling when specifying steps more
than 1. Move the Optimized Loop Step after .bscale_scalar to
ensure the loop step is enabled.
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
| |
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We introduced a ff_horiz_slice_avx2/512() implemented on a new algorithm.
In a nutshell, the new algorithm does three things, gathering data from
8/16 rows, blurring data, and scattering data back to the image buffer.
Here we used a customized transpose 8x8/16x16 to avoid the huge overhead
brought by gather and scatter instructions, which is dependent on the
temporary buffer called localbuf added newly.
Performance data:
ff_horiz_slice_avx2(old): 109.89
ff_horiz_slice_avx2(new): 666.67
ff_horiz_slice_avx512: 1000
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new vertical slice with AVX2/512 acceleration can significantly
improve the performance of Gaussian Filter 2D.
Performance data:
ff_verti_slice_c: 32.57
ff_verti_slice_avx2: 476.19
ff_verti_slice_avx512: 833.33
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
| |
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
| |
|
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64
ABI uses only the first four regs for this purpose.
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
|
|
|
| |
Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>
|
| |
|
| |
|
| |
|
|
|
|
| |
This reverts commit 2a9b934675b9e2d3850b46f8a618c19b03f02551.
|
|
|
|
| |
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
|
|
|
|
|
|
| |
Finishes fixing ticket #8771
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
| |
|
|
|
|
|
|
|
| |
This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb
on i386 after breaking in fcc0424c933742c8fc852371e985d16b6eb4bfe9.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Use doubles for accumulating floats.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Fixes crashes in command lines such as:
ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none
Related to ticket #6491.
Signed-off-by: Marton Balint <cus@passwd.hu>
|
| |
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Fixes #8282
|
|
|
|
|
|
|
|
| |
They are not allowed outside of functions. Fixes the warning
"ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]"
when compiling with GCC and -pedantic.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Ting Fu <ting.fu@intel.com>
|
|
|
|
| |
Signed-off-by: Ting Fu <ting.fu@intel.com>
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Tested using a simple command (apply edge enhance):
./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
-an -vframes 1000 -f null /dev/null
The fps increase from 151 to 270 on my local machine.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
| |
Fixes compilation on x86_32
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The horizontal pass get ~2x performance with the patch
under single thread.
Tested overall performance using the command(avx2 enabled):
./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
For single thread, the fps improves from 43 to 60, about 40%.
For multi-thread, the fps improves from 110 to 130, about 20%.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
|
| |
|
|
|
|
|
|
| |
Fixes compilation with old yasm versions.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
fcmul_add_c: 1228.8
fcmul_add_sse3: 334.3
fcmul_add_avx: 186.3
Tested on a Core i5 4460 @ 3.2GHz
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
ff_fcmul_add_sse3() is now identical to the C version.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|