| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
Fixes https://trac.ffmpeg.org/ticket/8903
relevant https://github.com/msys2/MINGW-packages/discussions/9258
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
| |
Fixes many -Warray-parameter warnings from GCC 11.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
| |
Also remove other unnecessary headers and include headers directly while
at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
|
|
| |
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
| |
Deprecated in commits 7fc329e2dd6226dfecaa4a1d7adf353bf2773726
and 31f6a4b4b83aca1d73f3cfc99ce2b39331970bf3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
| |
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
| |
Unused since 80944df720da98d6e5ee0e355db5814735914ec9.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
|
|
|
|
|
| |
This allowed to remove forward declarations. Because compilers expect
declarations for all functions they encounter even when it is within
blocks disabled via "if (0 && foo)", one has to use a real #if in
ff_diracdsp_init_x86.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
|
|
|
|
| |
Unused since a1f3b18bf55f106c974eacb1dc831be4d2bd5277, yet as nonstatic
functions the compiler can't detect this, so that these functions aren't
stripped and no warning is emitted.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
|
|
| |
Fixes checkheaders after 8c01eb0a315fec8f09ba6210ce8b0296de6cc784.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
| |
|
|
|
|
| |
They are not properly namespaced and not intended for public use.
|
|
|
|
| |
That is a more appropriate place for it.
|
|
|
|
|
|
|
|
|
|
| |
The only thing missing for this is to make ff_mpadsp_init_x86()
thread-safe; it currently isn't because a static table is initialized
every time ff_mpadsp_init() is called (when ARCH_X86 is true). Solve
this by initializing this table only once, namely together with the
ordinary not-arch specific tables. This also allows to reuse their AVOnce.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
if taken from stack, they may have garbage in the upper bits otherwise.
Also, there are only 8 arguments, so don't attempt to load 11.
Fixes SIGSEV crashes in some targets.
Reviewed-by: durandal_1707
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
|
|
|
| |
Overall speed changes for 1920x1080, yuv422p10le, 60fps from: 0.19x to 0.343x
|
|
|
|
|
|
| |
Fixes part of ticket #8771
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use this in vf_spp.c, where the get_pixels operation is done on
unaligned source addresses.
Hook up the x86 (mmx and sse) versions of get_pixels to this
function pointer, as those implementations seem to support unaligned
use.
This fixes fate-filter-spp on armv7.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix overflow for coeff -32768 in function ADD_RES_SSE_16_32_8 with no
performance drop.(SSE2/AVX/AVX2)
./checkasm --test=hevc_add_res --bench
Mainline:
- hevc_add_res.add_residual [OK]
hevc_add_res_32x32_8_sse2: 127.5
hevc_add_res_32x32_8_avx: 127.0
hevc_add_res_32x32_8_avx2: 86.5
Add overflow test case:
- hevc_add_res.add_residual [FAILED]
After:
- hevc_add_res.add_residual [OK]
hevc_add_res_32x32_8_sse2: 126.8
hevc_add_res_32x32_8_avx: 128.3
hevc_add_res_32x32_8_avx2: 86.8
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix overflow for coeff -32768 in function ADD_RES_SSE_8_8 with
no performance drop.
./checkasm --test=hevc_add_res --bench
Mainline:
- hevc_add_res.add_residual [OK]
hevc_add_res_8x8_8_sse2: 15.5
Add overflow test case:
- hevc_add_res.add_residual [FAILED]
After:
- hevc_add_res.add_residual [OK]
hevc_add_res_8x8_8_sse2: 15.5
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix overflow for coeff -32768 in function ADD_RES_MMX_4_8 with no
performance drop.
./checkasm --test=hevc_add_res --bench
Mainline:
- hevc_add_res.add_residual [OK]
hevc_add_res_4x4_8_mmxext: 15.5
Add overflow test case:
- hevc_add_res.add_residual [FAILED]
After:
- hevc_add_res.add_residual [OK]
hevc_add_res_4x4_8_mmxext: 15.0
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
| |
Found-by: james
|
|
|
|
|
|
|
|
|
|
| |
Fixes: Segfault (not reproducable with asm, which made this hard to debug)
Fixes: decoding errors
Fixes: 19854/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DIRAC_fuzzer-5729372837511168
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
| |
VP4 applies a loop filter during motion compensation, causing the block offset
will often by unaligned. This produces a bus error on some platforms, namely
ARMv7 NEON.
This patch adds a unaligned version of the loop filter function pointer
to VP3DSPContext.
Reported-by: Mike Melanson <mike@multimedia.cx>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
| |
It's not using ymm registers, so limiting it to CPUs with fast AVX
is not necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
Fixes checkasm on systems like win64.
Reviewed-by: Lynne
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
Prevents pointless register saving on win64 for the sse3 and avx
versions of the function.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Assembly failed when using yasm rather than nasm.
|
|
|
|
|
| |
Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is 1.4x faster than AVX
|
|
|
|
|
|
|
|
| |
Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
58893 decicycles in deemphasis_c, 130548 runs, 524 skips
9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips -> 6.21x speedup
24866 decicycles in postfilter_c, 65386 runs, 150 skips
5268 decicycles in postfilter_fma3, 65505 runs, 31 skips -> 4.72x speedup
Total decoder speedup: ~14%
Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;
for (int i = 0; i < len; i += 4) {
y[0] = x[0] + c1*state;
y[1] = x[1] + c2*state + c1*x[0];
y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
state = y[3];
y += 4;
x += 4;
}
|
|
|
|
| |
The entire function was defined away before.
|
|
|
|
| |
Its only used in the encoder and in CELT's PVQ.
|
| |
|
| |
|
|
|
|
|
|
| |
bits per raw sample
based on patch by Kieran Kunhya
|
|
|
|
|
|
| |
Removes an unneeded copy and does the 5-point permute in-place.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
|
|
|
|
| |
Saves 1 gpr and 2 instructions and simplifies the macros a bit.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
|
|
| |
This is a profile supporting > 8-bit video and has a higher quality DCT
|
|
|
|
|
|
|
|
| |
This was originally based on libsbc, and was fully integrated into ffmpeg.
Rough speed test:
C version: speed= 592x
MMX version: speed= 785x
|