libav.git - [no description]

	Commit message	Author	Age
*	swscale: document some missing arguments	Marvin Scholz	2022-10-17
\|
*	swscale: Fix bogus doxy comment #ifdefs	Marvin Scholz	2022-10-17
\|	The intention here was probably to document this, as use of conditionals
\|	does not make sense in a comment.
\|
\|	Fixes doxy warning:
\|	warning: explicit link request to 'if' could not be resolved
*	libswscale: force a minimum size of the slide for bayer sources	Chema Gonzalez	2022-10-14
\|	Bayer sources are read in groups of 2 lines (e.g. for a BGGR flavor, the
\|	first row contains only B and G samples, while the second row contains
\|	only G and R samples). They need to be read as a whole.
\|
\|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2	Rémi Denis-Courmont	2022-09-30
\|	This is currently 64-bit only because the stack spilling code would not
\|	assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory).
\|
\|	This could be added later in the unlikely event that someone wants it.
*	sws/rgb2rgb: RISC-V V interleaveBytes	Rémi Denis-Courmont	2022-09-30
\|
*	sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions	Rémi Denis-Courmont	2022-09-30
\|
*	swscale/output: Don't call av_pix_fmt_desc_get() in a loop	Andreas Rheinhardt	2022-09-19
\|	Up until now, libswscale/output.c used a macro to write an output pixel
\|	which involved a call to av_pix_fmt_desc_get() to find out whether the
\|	input pixel format is BE or LE despite this being known at compile-time
\|	(there are templates per pixfmt).
\|
\|	Even worse, these calls are made in a loop, so that e.g. there are eight
\|	calls to av_pix_fmt_desc_get() for every pixel processed in
\|	yuv2rgba64_X_c_template() for 64bit RGB formats.
\|
\|	This commit modifies these macros to ensure that isBE() is evaluated at
\|	compile-time. This saved 41184B of .text for me (GCC 11.2, -O3).
\|	Of course, it also improved performance. E.g.
\|
\|	ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
\|	    -threads 1 -t 1:00 -f null -
\|
\|	(which uses yuv2rgba64le_X_c, which is an invocation of
\|	yuv2rgba64_X_c_template() mentioned above), performance improved from
\|	95589 to 41387 decicycles for one call to yuv2packedX; for the be variant
\|	the numbers went down from 76087 to 43024 decicycles.
\|
\|	Reviewed-by: Anton Khirnov <anton@khirnov.net>
\|	Reviewed-by: Paul B Mahol <onemda@gmail.com>
\|	Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
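The compile-time evaluation described above can be sketched as follows. This is a minimal illustration, not the swscale code: put_le16(), put_be16(), write_row_template() and DEFINE_WRITE_ROW are hypothetical names. Because each generated function passes a literal for is_be, a compiler that inlines the template can resolve the branch once instead of looking up the byte order for every pixel.

```c
#include <stdint.h>

/* Hypothetical helpers (not swscale API): store a 16-bit value in a fixed
 * byte order. */
static inline void put_le16(uint8_t *p, unsigned v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

static inline void put_be16(uint8_t *p, unsigned v)
{
    p[0] = (v >> 8) & 0xff;
    p[1] = v & 0xff;
}

/* Template: is_be is a constant at every call site below, so after
 * inlining the branch does not need to be evaluated per pixel. */
static inline void write_row_template(uint8_t *dst, const uint16_t *src,
                                      int n, int is_be)
{
    for (int i = 0; i < n; i++) {
        if (is_be)
            put_be16(dst + 2 * i, src[i]);
        else
            put_le16(dst + 2 * i, src[i]);
    }
}

/* Generate one concrete function per byte order from the template. */
#define DEFINE_WRITE_ROW(name, is_be)                                      \
static void write_row_ ## name(uint8_t *dst, const uint16_t *src, int n)   \
{                                                                          \
    write_row_template(dst, src, n, is_be);                                \
}

DEFINE_WRITE_ROW(le, 0)
DEFINE_WRITE_ROW(be, 1)
```

Calling write_row_le() or write_row_be() then selects the byte order by function name, which mirrors the "templates per pixfmt" idea in the message.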
*	swscale/input: Avoid calls to av_pix_fmt_desc_get()	Andreas Rheinhardt	2022-09-19
\|	Up until now, libswscale/input.c used a macro to read an input pixel
\|	which involved a call to av_pix_fmt_desc_get() to find out whether the
\|	input pixel format is BE or LE despite this being known at compile-time
\|	(there are templates per pixfmt).
\|
\|	Even worse, these calls are made in a loop, so that e.g. there are six
\|	calls to av_pix_fmt_desc_get() for every pair of UV pixels processed in
\|	rgb64ToUV_half_c_template().
\|
\|	This commit modifies these macros to ensure that isBE() is evaluated at
\|	compile-time. This saved 9743B of .text for me (GCC 11.2, -O3).
\|	For a simple RGB64LE->YUV420P transformation like
\|
\|	ffmpeg -f lavfi -i haldclutsrc,format=rgba64le -pix_fmt yuv420p \
\|	    -threads 1 -t 1:00 -f null -
\|
\|	the amount of decicycles spent in rgb64LEToUV_half_c (which is created
\|	via the template mentioned above) decreases from 19751 to 5341; for
\|	RGBA64BE the number went down from 11945 to 5393.
\|
\|	For shared builds (where the call to av_pix_fmt_desc_get() is indirect)
\|	the old numbers are 15230 for RGBA64BE and 27502 for RGBA64LE, whereas
\|	the numbers with this patch are indistinguishable from the numbers from
\|	a static build.
\|
\|	Also make the macros that are touched conform to the usual convention of
\|	using uppercase names while just at it.
\|
\|	Reviewed-by: Anton Khirnov <anton@khirnov.net>
\|	Reviewed-by: Paul B Mahol <onemda@gmail.com>
\|	Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale/la: Add output_lasx.c file.	Hao Chen	2022-09-10
\|	ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt rgb24 -y /dev/null -an
\|	before: 150fps
\|	after:  183fps
\|
\|	Signed-off-by: Hao Chen <chenhao@loongson.cn>
\|	Reviewed-by: yinshiyou-hf@loongson.cn
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files	Hao Chen	2022-09-10
\|	ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an
\|	before: 178fps
\|	after:  210fps
\|
\|	Signed-off-by: Hao Chen <chenhao@loongson.cn>
\|	Reviewed-by: yinshiyou-hf@loongson.cn
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	swscale/la: Optimize hscale functions with lasx.	Hao Chen	2022-09-10
\|	ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -y /dev/null -an
\|	before: 101fps
\|	after:  138fps
\|
\|	Signed-off-by: Hao Chen <chenhao@loongson.cn>
\|	Reviewed-by: yinshiyou-hf@loongson.cn
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	swscale/output: add support for Y210LE and Y212LE	Philip Langdale	2022-09-10
\|
*	swscale/output: add support for XV30LE	Philip Langdale	2022-09-10
\|
*	swscale/output: add support for XV36LE	Philip Langdale	2022-09-10
\|
*	swscale/output: add support for P012	Philip Langdale	2022-09-10
\|	This generalises the existing P010 support.
*	swscale/input: Remove spec-incompliant ';'	Andreas Rheinhardt	2022-09-08
\|	These macros are definitions, not only declarations and therefore
\|	should not contain a semicolon. Such a semicolon is actually
\|	spec-incompliant, but compilers happen to accept them.
\|
\|	Reviewed-by: Philip Langdale <philipl@overt.org>
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
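A small sketch of the point about semicolons, using a hypothetical DEFINE_SCALE macro rather than the macros from input.c. The macro expands to a complete function definition, so its invocation is written without a trailing ';'. At file scope, a stray ';' after a function definition is an empty declaration, which ISO C does not permit, although compilers commonly accept it (some warn in pedantic mode).

```c
/* Hypothetical function-defining macro: the expansion is a complete
 * definition that already ends with '}'. */
#define DEFINE_SCALE(name, factor)      \
static int scale_ ## name(int x)        \
{                                       \
    return x * (factor);                \
}

/* Invocations carry no trailing ';' because nothing remains to terminate. */
DEFINE_SCALE(by2, 2)
DEFINE_SCALE(by3, 3)
```

A macro that only declares something (for example a prototype) would, by contrast, need the ';' either in the macro or at the invocation.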
*	swscale/input: add support for Y212LE	Philip Langdale	2022-09-06
\|
*	swscale/input: add support for XV30LE	Philip Langdale	2022-09-06
\|
*	swscale/input: add support for P012	Philip Langdale	2022-09-06
\|	As we now have three of these formats, I added macros to generate the
\|	conversion functions.
*	swscale/input: add support for XV36LE	Philip Langdale	2022-09-06
\|
*	libswscale: add support for VUYX format	Philip Langdale	2022-08-25
\|	As we already have support for VUYA, I figured I should do the small
\|	amount of work to support VUYX as well. That means a little refactoring
\|	to share code.
*	swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext	Andreas Rheinhardt	2022-08-23
\|	Fixes FATE-failures with the
\|	filter-2xbr filter-3xbr filter-4xbr filter-ep2x filter-ep3x filter-hq2x
\|	filter-hq3x filter-hq4x filter-paletteuse-bayer filter-paletteuse-bayer0
\|	filter-paletteuse-nodither and filter-paletteuse-sierra2_4a
\|	tests when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to
\|	"mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten
\|	when using SSSE3).
\|
\|	Reviewed-by: Lynne <dev@lynne.ee>
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale/input: add rgbaf16 input support	Timo Rothenpieler	2022-08-19
\|	This is by no means perfect, since at least ddagrab will return scRGB
\|	data with values outside of 0.0f to 1.0f for HDR values.
\|	Its primary purpose is to be able to work with the format at all.
*	swscale: add opaque parameter to input functions	Timo Rothenpieler	2022-08-19
\|
*	swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx()	Andreas Rheinhardt	2022-08-19
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	libswscale: Enable hscale_avx2 for all input sizes.	Alan Kelly	2022-08-18
\|	ff_shuffle_filter_coefficients shuffles the tail as required.
\|
\|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	sws: allow avx2 hscale to process inputs of any size.	Alan Kelly	2022-08-18
\|	The main loop processes blocks of 16 pixels. The tail processes blocks
\|	of size 4.
\|
\|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
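The main-loop/tail split can be sketched in scalar C as below. This is only an illustration of the loop structure, not the AVX2 code: add_one_block() is a hypothetical stand-in for a SIMD kernel, and the sketch assumes the width is a multiple of 4.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a SIMD kernel: handles exactly `block`
 * consecutive outputs. */
static void add_one_block(int16_t *dst, const uint8_t *src, int block)
{
    for (int i = 0; i < block; i++)
        dst[i] = (int16_t)(src[i] + 1);
}

/* Main loop in blocks of 16, followed by a tail in blocks of 4.
 * Assumption for this sketch: n is a multiple of 4. */
static void process_row(int16_t *dst, const uint8_t *src, int n)
{
    int i = 0;

    for (; i + 16 <= n; i += 16)
        add_one_block(dst + i, src + i, 16);

    for (; i + 4 <= n; i += 4)
        add_one_block(dst + i, src + i, 4);
}
```

With n = 28, for example, the main loop runs once (16 outputs) and the tail loop runs three times (12 outputs).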
*	sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext	Alan Kelly	2022-08-18
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale/aarch64: add vscale specializations	Swinney, Jonathan	2022-08-16
\|	This commit adds new code paths for vscale when filterSize is 2, 4, or 8.
\|	By using specialized code with unrolling to match the filterSize we can
\|	improve performance.
\|
\|	On AWS c7g (Graviton 3, Neoverse V1) instances:
\|	                                    before    after
\|	yuv2yuvX_2_0_512_accurate_neon:      558.8    268.9
\|	yuv2yuvX_4_0_512_accurate_neon:      637.5    434.9
\|	yuv2yuvX_8_0_512_accurate_neon:     1144.8    806.2
\|	yuv2yuvX_16_0_512_accurate_neon:    2080.5   1853.7
\|
\|	Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	swscale/aarch64: vscale optimization	Swinney, Jonathan	2022-08-16
\|	Use scalar times vector multiply accumulate instructions instead of
\|	vector times vector to remove the need for replicating load
\|	instructions which are slightly slower.
\|
\|	On AWS c7g (Graviton 3, Neoverse V1) instances:
\|	yuv2yuvX_8_0_512_accurate_neon:     1144.8    987.4
\|	yuv2yuvX_16_0_512_accurate_neon:    2080.5   1869.4
\|
\|	Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	checkasm: updated tests for sw_scale	Swinney, Jonathan	2022-08-16
\|	Change the reference to exactly match the C reference in swscale,
\|	instead of exactly matching the x86 SIMD implementations (which
\|	differ slightly).
\|
\|	Test with and without SWS_ACCURATE_RND - if this flag isn't set, the
\|	output must match the C reference exactly, otherwise it is allowed to
\|	be off by 2.
\|
\|	Mark a couple of x86 functions as unavailable when SWS_ACCURATE_RND is
\|	set - apparently this discrepancy hasn't been noticed in other exact
\|	tests before.
\|
\|	Add a test for yuv2plane1.
\|
\|	Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	libswscale/aarch64: add another hscale specialization	Swinney, Jonathan	2022-08-16
\|	This specialization handles the case where filtersize is 4 mod 8, e.g.
\|	12, 20, etc. Aarch64 was previously using the c function for this case.
\|	This implementation speeds up that case significantly.
\|
\|	hscale_8_to_15__fs_12_dstW_512_c:       6234.1
\|	hscale_8_to_15__fs_12_dstW_512_neon:    1505.6
\|
\|	Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	configure: always enable gnu_windres if available	Timo Rothenpieler	2022-08-13
\|	Use the appropriate Makefile variable to ensure the resource file is
\|	only built into shared libraries instead.
*	swscale/output: fix reading chroma values when generating vuya output	James Almer	2022-08-08
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	swscale/output: add VUYA output support	James Almer	2022-08-07
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	swscale/input: add VUYA input support	James Almer	2022-08-05
\|	Reviewed-by: Philip Langdale <philipl@overt.org>
\|	Signed-off-by: James Almer <jamrial@gmail.com>
*	swscale/rgb2rgb: Don't cast const away	Andreas Rheinhardt	2022-07-31
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale: add NV16 input/output	Matthieu Bouron	2022-07-19
\|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	Bump versions after 5.1 branch	Michael Niedermayer	2022-07-13
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	Bump Versions for 5.1 branch	Michael Niedermayer	2022-07-13
\|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
*	swscale/x86/swscale: Simplify macro	Andreas Rheinhardt	2022-06-22
\|	This is possible now that it is no longer used by MMX.
\|
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale/x86/swscale: Remove obsolete and harmful MMX(EXT) functions	Andreas Rheinhardt	2022-06-22
\|	x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some
\|	functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other
\|	functions (unless one e.g. explicitly disables SSE2). So given that the
\|	only systems that benefit from these functions are truly ancient 32bit
\|	x86s they are removed.
\|
\|	Moreover, some of the removed code was buggy/not bitexact and led to
\|	failures involving the f32le and f32be versions of gray, gbrp and gbrap
\|	on x86-32 when SSE2 was not disabled. See e.g.
\|	https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx
\|
\|	Notice that yuv2yuvX_mmx is not removed, because it is used by SSE3 and
\|	AVX2 as fallback in case of unaligned data and also for tail processing.
\|	I don't know why yuv2yuvX_mmxext isn't being used for this; an earlier
\|	version [1] of 554c2bc7086f49ef5a6a989ad6bc4bc11807eb6f used it, but the
\|	version that was eventually applied does not.
\|
\|	[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html
\|
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale/x86/yuv2rgb: Remove obsolete MMX functions	Andreas Rheinhardt	2022-06-22
\|	x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some
\|	functions for MMX, MMXEXT and 3dnow are always overridden by other
\|	functions (unless one e.g. explicitly disables SSE2) for x64. So given
\|	that the only systems that benefit from these functions are truly
\|	ancient 32bit x86s they are removed.
\|
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale/x86/rgb2rgb: Remove obsolete MMX, 3dnow functions	Andreas Rheinhardt	2022-06-22
\|	x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some
\|	functions for MMX, MMXEXT and 3dnow are always overridden by other
\|	functions (unless one e.g. explicitly disables SSE2) for x64. So given
\|	that the only systems that benefit from these functions are truly
\|	ancient 32bit x86s they are removed.
\|
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	all: Replace if (ARCH_FOO) checks by #if ARCH_FOO	Andreas Rheinhardt	2022-06-15
\|	This is more spec-compliant because it does not rely on dead-code
\|	elimination by the compiler. Especially MSVC has problems with this, as
\|	can be seen in
\|	https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
\|	or
\|	https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html
\|
\|	This commit does not eliminate every instance where we rely on dead code
\|	elimination: It only tackles branching to the initialization of
\|	arch-specific dsp code, not e.g. all uses of CONFIG_ and HAVE_ checks.
\|	But maybe it is already enough to compile FFmpeg with MSVC with
\|	whole-program-optimizations enabled (if one does not disable too many
\|	components).
\|
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
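The difference between the two styles of check can be sketched as follows. All names here (ARCH_FOO, dsp_init_c(), dsp_init_foo(), init_calls) are hypothetical and chosen for illustration; in a real build the macro would come from a generated configuration header.

```c
/* Hypothetical configuration macro; defaults to "arch disabled" here. */
#ifndef ARCH_FOO
#define ARCH_FOO 0
#endif

static int init_calls;   /* instrumentation for the sketch only */

static void dsp_init_c(void)
{
    init_calls++;
}

#if ARCH_FOO
/* Declared, and required at link time, only when the arch is enabled. */
void dsp_init_foo(void);
#endif

static void dsp_init(void)
{
    dsp_init_c();

    /* Written as "if (ARCH_FOO) dsp_init_foo();", the call would still be
     * compiled, and a build with ARCH_FOO == 0 would depend on the
     * optimizer discarding it; if it is kept, the linker reports an
     * undefined reference. The preprocessor check removes the call before
     * the compiler ever sees it. */
#if ARCH_FOO
    dsp_init_foo();
#endif
}
```

With ARCH_FOO left at 0, dsp_init() only calls dsp_init_c() and the translation unit links without any definition of dsp_init_foo().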
*	swscale/x86/yuv_2_rgb: fix access to memory past the frame data in yuv to rgb conversion	Vardan Margaryan	2022-06-06
\|	Y, U, V data is loaded at the end of the current iteration for the next
\|	iteration. It results in memory access past the frame data on the last
\|	iteration (that data is never used after the loading). So load data at
\|	the start of the iteration, so that only useful data is loaded.
\|
\|	Signed-off-by: Vardan Margaryan <v.t.margaryan@gmail.com>
\|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
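The overread described above can be illustrated with a scalar sketch. This is not the x86 assembly: load(), sum_preload(), sum_load_first() and max_index_read are hypothetical names, and the instrumentation only records the highest index touched. Note that sum_preload() really does read src[n], so a caller must supply at least one element of padding when running it.

```c
#include <stddef.h>
#include <stdint.h>

static size_t max_index_read;   /* instrumentation for the sketch only */

static uint8_t load(const uint8_t *src, size_t i)
{
    if (i > max_index_read)
        max_index_read = i;
    return src[i];
}

/* Old pattern: each iteration preloads the element for the next one, so
 * the final iteration reads src[n], which is past the n valid elements. */
static unsigned sum_preload(const uint8_t *src, size_t n)
{
    unsigned sum = 0;
    uint8_t cur;

    if (n == 0)
        return 0;
    cur = load(src, 0);
    for (size_t i = 0; i < n; i++) {
        sum += cur;
        cur = load(src, i + 1);   /* unused after the last iteration */
    }
    return sum;
}

/* New pattern: load at the start of the iteration, so only elements that
 * are actually used are read. */
static unsigned sum_load_first(const uint8_t *src, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += load(src, i);
    return sum;
}
```

Both functions return the same sum; they differ only in the highest index they touch (n for the preloading variant, n - 1 for the other).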
*	swscale/aarch64: add hscale specializations	Swinney, Jonathan	2022-05-28
\|	This patch adds code to support specializations of the hscale function
\|	and adds a specialization for filterSize == 4.
\|
\|	ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck
\|	here is loading the data from src, this data is loaded a whole block
\|	ahead and stored back to the stack to be loaded again with ld4. This
\|	arranges the data for most efficient use of the vector instructions and
\|	removes the need for completion adds at the end. The number of
\|	iterations of the C per iteration of the assembly is increased from 4 to
\|	8, but because of the prefetching, there must be a special section
\|	without prefetching when dstW < 16.
\|
\|	This improves speed on Graviton 2 (Neoverse N1) dramatically in the case
\|	where previously fs=8 would have been required.
\|
\|	before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
\|	after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9
\|
\|	Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	lib*/version: Move library version functions into files of their own	Andreas Rheinhardt	2022-05-10
\|	This avoids having to rebuild big files every time FFMPEG_VERSION
\|	changes (which it does with every commit).
\|
\|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	swscale: aarch64: Optimize the final summation in the hscale routine	Martin Storsjö	2022-04-22
\|	Before:                       Cortex A53       A72       A73  Graviton 2  Graviton 3
\|	hscale_8_to_15_width8_neon:       8273.0    4602.5    4289.5      2429.7      1629.1
\|	hscale_8_to_15_width16_neon:     12405.7    6803.0    6359.0      3549.0      2378.4
\|	hscale_8_to_15_width32_neon:     21258.7   11491.7   11469.2      5797.2      3919.6
\|	hscale_8_to_15_width40_neon:     25652.0   14173.7   12488.2      6893.5      4810.4
\|
\|	After:
\|	hscale_8_to_15_width8_neon:       7633.0    3981.5    3350.2      1980.7      1261.1
\|	hscale_8_to_15_width16_neon:     11666.7    5951.0    5512.0      3080.7      2131.4
\|	hscale_8_to_15_width32_neon:     20900.7   10733.2    9481.7      5275.2      3862.1
\|	hscale_8_to_15_width40_neon:     24826.0   13536.2   11502.0      6397.2      4731.9
\|
\|	Thus, this gives overall an 8-29% speedup for the smaller filter sizes,
\|	around 1-8% for the larger filter sizes.
\|
\|	Inspired by a patch by Jonathan Swinney <jswinney@amazon.com>.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	Keep including the full version.h when headers are included externally	Martin Storsjö	2022-03-19
\|	This avoids unnecessary churn and build breakage for users, by making
\|	sure the whole version.h is included like it has been so far, while
\|	keeping the benefit of not needing to rebuild most files in the ffmpeg
\|	tree on minor/micro bumps.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>