From 4041c1029b93162faacda9e3f3cd083d1fbca7ce Mon Sep 17 00:00:00 2001
From: Wu Jianhua
Date: Wed, 4 Aug 2021 10:06:15 +0800
Subject: libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512()

Introduce ff_horiz_slice_avx2/512(), implemented with a new algorithm.
In a nutshell, the new algorithm does three things: it gathers data from
8/16 rows, blurs the data, and scatters the data back to the image buffer.
A customized 8x8/16x16 transpose is used to avoid the huge overhead of
gather and scatter instructions; the transpose depends on a newly added
temporary buffer called localbuf.

Performance data:
ff_horiz_slice_avx2(old): 109.89
ff_horiz_slice_avx2(new): 666.67
ff_horiz_slice_avx512:    1000

Co-authored-by: Cheng Yanfei
Co-authored-by: Jin Jun
Signed-off-by: Wu Jianhua
---
 libavfilter/gblur.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'libavfilter/gblur.h')

diff --git a/libavfilter/gblur.h b/libavfilter/gblur.h
index 367575a6db..3a66984b06 100644
--- a/libavfilter/gblur.h
+++ b/libavfilter/gblur.h
@@ -39,9 +39,11 @@ typedef struct GBlurContext {
     int flt;
     int depth;
+    int stride;
     int planewidth[4];
     int planeheight[4];
     float *buffer;
+    float *localbuf; ///< temporary buffer for horiz_slice. NULL if not used
     float boundaryscale;
     float boundaryscaleV;
     float postscale;
@@ -49,7 +51,7 @@ typedef struct GBlurContext {
     float nu;
     float nuV;
     int nb_planes;
-    void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale);
+    void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale, float *localbuf);
     void (*verti_slice)(float *buffer, int width, int height, int slice_start, int slice_end, int steps, float nu, float bscale);
     void (*postscale_slice)(float *buffer, int length, float postscale, float min, float max);
-- 
cgit v1.2.3
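
Note: the gather / blur / scatter scheme described in the commit message can
be illustrated in plain scalar C. This is only a minimal sketch under stated
assumptions, not the AVX2/AVX-512 code from this commit. The names LANES,
blur_row(), horiz_block() and horiz_slice_sketch(), the explicit stride
parameter, and the recursive filter used as the blur step are all
hypothetical and chosen for illustration. The point it tries to show is that
once a block of 8 rows is transposed into localbuf, the samples of all 8 rows
at a given x sit next to each other, so each inner loop over r could map to
one vector operation with no gather or scatter instruction.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8 /* rows processed together; 8 mirrors the float width of AVX2 */

/* Reference: in-place recursive blur of a single row (stand-in filter,
 * assumed here for illustration only). */
static void blur_row(float *row, int width, int steps, float nu, float bscale)
{
    for (int step = 0; step < steps; step++) {
        int x;
        row[0] *= bscale;
        for (x = 1; x < width; x++)     /* filter rightwards */
            row[x] += nu * row[x - 1];
        row[width - 1] *= bscale;
        for (x = width - 1; x > 0; x--) /* filter leftwards */
            row[x - 1] += nu * row[x];
    }
}

/* Blur up to LANES rows at once through a transposed copy in localbuf.
 * localbuf must hold at least width * LANES floats. */
static void horiz_block(float *buffer, int width, int stride, int rows,
                        int steps, float nu, float bscale, float *localbuf)
{
    int x, r;

    assert(rows > 0 && rows <= LANES);

    /* 1. Gather (transpose in): sample x of row r goes to slot x*LANES + r. */
    for (r = 0; r < rows; r++)
        for (x = 0; x < width; x++)
            localbuf[x * LANES + r] = buffer[(size_t)r * stride + x];

    /* 2. Blur: every loop over r touches contiguous floats. */
    for (int step = 0; step < steps; step++) {
        for (r = 0; r < rows; r++)
            localbuf[r] *= bscale;
        for (x = 1; x < width; x++)
            for (r = 0; r < rows; r++)
                localbuf[x * LANES + r] += nu * localbuf[(x - 1) * LANES + r];
        for (r = 0; r < rows; r++)
            localbuf[(width - 1) * LANES + r] *= bscale;
        for (x = width - 1; x > 0; x--)
            for (r = 0; r < rows; r++)
                localbuf[(x - 1) * LANES + r] += nu * localbuf[x * LANES + r];
    }

    /* 3. Scatter (transpose out): write the blurred samples back. */
    for (r = 0; r < rows; r++)
        for (x = 0; x < width; x++)
            buffer[(size_t)r * stride + x] = localbuf[x * LANES + r];
}

/* Walk the image in blocks of LANES rows; the last block may be shorter. */
static void horiz_slice_sketch(float *buffer, int width, int height,
                               int stride, int steps, float nu, float bscale,
                               float *localbuf)
{
    for (int y = 0; y < height; y += LANES) {
        int rows = height - y < LANES ? height - y : LANES;
        horiz_block(buffer + (size_t)y * stride, width, stride, rows,
                    steps, nu, bscale, localbuf);
    }
}
```

Because the transposed version applies the same operations in the same order
to each sample, it should produce the same result as calling blur_row() on
every row; only the memory access pattern changes.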