path: root/libavcodec/x86
Commit message | Author | Age
* get_cabac_inline_x86: Don't inline if 32-bit clang on windows | Christopher Degawa | 2021-08-19
  Fixes https://trac.ffmpeg.org/ticket/8903
  Relevant: https://github.com/msys2/MINGW-packages/discussions/9258
  Signed-off-by: Christopher Degawa <ccom@randomderp.com>
  Signed-off-by: Martin Storsjö <martin@martin.st>
* avcodec/h264dsp, h264idct: Fix lengths of array parameters | Andreas Rheinhardt | 2021-08-08
  Fixes many -Warray-parameter warnings from GCC 11.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
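The class of warning fixed above arises when a function is declared with one array bound for a parameter and defined with another. A minimal sketch, assuming a hypothetical `sum_nnz` function (not an FFmpeg API), of the consistent form that keeps GCC 11 quiet:

```c
#include <stdint.h>

/* Declaration as it might appear in a header. sum_nnz is a hypothetical
 * function, not part of FFmpeg. */
int sum_nnz(const uint8_t nnz[15 * 8]);

/* The definition repeats the same bound. Writing nnz[6 * 8] here instead
 * would make GCC 11 (-Warray-parameter, enabled by -Wall) warn about the
 * mismatch between declaration and definition. */
int sum_nnz(const uint8_t nnz[15 * 8])
{
    int total = 0;
    for (int i = 0; i < 15 * 8; i++)
        total += nnz[i];
    return total;
}
```

Beyond silencing the warning, GCC treats the bound in the first declaration as the expected minimum size, so it can also flag callers that pass a smaller array.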
* Remove/replace some unnecessary avcodec.h inclusions | Andreas Rheinhardt | 2021-07-22
  Also remove other unnecessary headers and include headers directly while at it.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* Remove unnecessary mem.h inclusions | Andreas Rheinhardt | 2021-07-22
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/internal, swresample/audioconvert: Remove cpu.h inclusions | Andreas Rheinhardt | 2021-07-22
  These inclusions are not necessary, as cpu.h is already included wherever it is needed (via direct inclusion or via the arch-specific headers). Also remove other unnecessary cpu.h inclusions from ordinary non-headers.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec: Remove deprecated old encode/decode APIs | Andreas Rheinhardt | 2021-04-27
  Deprecated in commits 7fc329e2dd6226dfecaa4a1d7adf353bf2773726 and 31f6a4b4b83aca1d73f3cfc99ce2b39331970bf3.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
  Signed-off-by: James Almer <jamrial@gmail.com>
* Include attributes.h directly | Andreas Rheinhardt | 2021-04-19
  Some files currently rely on libavutil/cpu.h to include it for them; yet said file will no longer include it after the currently deprecated functions are removed, so include attributes.h directly.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/x86: add cfhdenc SIMD | Paul B Mahol | 2021-02-27
* avcodec: add missing FF_API_OLD_ENCDEC wrappers to xmm clobber functions | James Almer | 2021-02-26
  Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/x86/constants: Remove unused ff_pw_17 | Andreas Rheinhardt | 2021-02-24
  Unused since 80944df720da98d6e5ee0e355db5814735914ec9.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avcodec/x86/diracdsp_init: Reuse macro | Andreas Rheinhardt | 2021-02-24
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avcodec/x86/diracdsp_init: Simplify macro | Andreas Rheinhardt | 2021-02-24
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avcodec/x86/diracdsp_init: Make functions only used here static | Andreas Rheinhardt | 2021-02-24
  This allowed removing the forward declarations. Because compilers expect declarations for all functions they encounter, even within blocks disabled via "if (0 && foo)", one has to use a real #if in ff_diracdsp_init_x86.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
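The point about "if (0 && foo)" can be illustrated with a small sketch. The config macro `HAVE_FAST_PATH` and the `add_one_*` functions are hypothetical stand-ins, not FFmpeg's real names; the pattern is what matters: a static function that only exists under a preprocessor condition can only be referenced under the same `#if`.

```c
/* Hypothetical configure-style switch; FFmpeg's real macros differ. */
#define HAVE_FAST_PATH 0

static int add_one_c(int x) { return x + 1; }

#if HAVE_FAST_PATH
static int add_one_fast(int x) { return x + 1; }
#endif

typedef int (*add_one_fn)(int);

static add_one_fn select_add_one(void)
{
    add_one_fn fn = add_one_c;
    /* `if (HAVE_FAST_PATH) fn = add_one_fast;` would not compile here:
     * add_one_fast is never declared when HAVE_FAST_PATH is 0, and the
     * compiler needs a declaration even inside a dead branch. */
#if HAVE_FAST_PATH
    fn = add_one_fast;
#endif
    return fn;
}
```

A side benefit of making such functions static, as the next entry notes, is that the compiler can then warn when one of them becomes unused.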
* avcodec/x86/diracdsp_init: Remove unused MMX functions | Andreas Rheinhardt | 2021-02-24
  Unused since a1f3b18bf55f106c974eacb1dc831be4d2bd5277, yet as nonstatic functions the compiler can't detect this, so that these functions aren't stripped and no warning is emitted.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avcodec/cabac_functions, x86/cabac: Include stddef.h | Andreas Rheinhardt | 2021-02-04
  Fixes checkheaders after 8c01eb0a315fec8f09ba6210ce8b0296de6cc784.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* ac3enc_fixed: drop unnecessary fixed-point DSP code | Lynne | 2021-01-14
* lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump | Anton Khirnov | 2021-01-01
  They are not properly namespaced and not intended for public use.
* lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h | Anton Khirnov | 2021-01-01
  That is a more appropriate place for it.
* avcodec/mpegaudiodsp: Make ff_mpadsp_init() thread-safe | Andreas Rheinhardt | 2020-11-24
  The only thing missing for this is to make ff_mpadsp_init_x86() thread-safe; it currently isn't because a static table is initialized every time ff_mpadsp_init() is called (when ARCH_X86 is true). Solve this by initializing this table only once, namely together with the ordinary non-arch-specific tables. This also allows reusing their AVOnce.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
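The "initialize a static table only once" pattern described above can be sketched with POSIX `pthread_once`, which FFmpeg's AVOnce maps to on pthread builds. The table, its contents and the `dsp_init` name are hypothetical; this is not the mpegaudiodsp code itself.

```c
#include <pthread.h>

/* Hypothetical table standing in for the one the x86 init used to rebuild. */
static float table[64];
static int init_calls;  /* counts how often the initializer actually ran */
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

static void init_table(void)
{
    for (int i = 0; i < 64; i++)
        table[i] = (float)i * 0.5f;
    init_calls++;
}

/* Safe to call from any number of threads: pthread_once guarantees that
 * init_table runs exactly once and that later callers see its result. */
void dsp_init(void)
{
    pthread_once(&table_once, init_table);
}
```

Rewriting the table on every call, as the old code did, is a data race when two threads initialize decoders concurrently, even if both write the same values.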
* x86/cfhddsp: zero extend int arguments | James Almer | 2020-08-28
  If taken from the stack, they may otherwise have garbage in the upper bits. Also, there are only 8 arguments, so don't attempt to load 11. Fixes SIGSEGV crashes on some targets.
  Reviewed-by: durandal_1707
  Signed-off-by: James Almer <jamrial@gmail.com>
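The failure mode behind this commit (and the similar "high bits" fixes for Windows x86_64 further down the log) can be modeled in C. On x86-64 only the low 32 bits of a register or stack slot carrying an `int` argument are defined; asm that uses the full 64-bit value must first extend it. The helper names below are hypothetical and only mimic what the corresponding instructions do.

```c
#include <stdint.h>

/* Model a 64-bit register or stack slot holding a 32-bit int argument.
 * Only the low 32 bits are defined; the upper half may be garbage. */
static uint64_t zero_extend_arg(uint64_t slot)
{
    /* Same effect as `mov eax, eax` in x86-64 asm: keep the low 32 bits
     * and clear the upper 32. Correct for non-negative ints (widths, counts). */
    return slot & UINT64_C(0xFFFFFFFF);
}

static int64_t sign_extend_arg(uint64_t slot)
{
    /* Same effect as `movsxd`: needed when the int may be negative. */
    uint32_t low = (uint32_t)slot;
    return low >= UINT32_C(0x80000000) ? (int64_t)low - INT64_C(0x100000000)
                                       : (int64_t)low;
}
```

Using `slot` directly as a loop count or offset would, for instance, turn a width of 0x780 into 0xDEADBEEF00000780, which explains crashes of the SIGSEGV kind.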
* avcodec/x86/cfhddsp: try to fix build on x32 | Paul B Mahol | 2020-08-26
* avcodec/cfhd: add x86 SIMD | Paul B Mahol | 2020-08-26
  Overall speed for 1920x1080, yuv422p10le, 60 fps goes from 0.19x to 0.343x.
* x86/h264_deblock: fix warning about trailing empty parameter | James Almer | 2020-07-12
  Fixes part of ticket #8771
  Signed-off-by: James Almer <jamrial@gmail.com>
* pixblockdsp, avdct: Add get_pixels_unaligned | Martin Storsjö | 2020-05-13
  Use this in vf_spp.c, where the get_pixels operation is done on unaligned source addresses. Hook up the x86 (mmx and sse) versions of get_pixels to this function pointer, as those implementations seem to support unaligned use. This fixes fate-filter-spp on armv7.
  Signed-off-by: Martin Storsjö <martin@martin.st>
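What a get_pixels operation computes, and why an unaligned source is harmless in plain C, can be sketched as a scalar loop. `get_pixels_sketch` is a hypothetical name modeled on an 8x8 get_pixels; it is not the pixblockdsp C function itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen an 8x8 block of 8-bit pixels into 16-bit coefficients.
 * Byte-wise loads carry no alignment requirement, so `pixels` may point
 * anywhere in the frame; only SIMD versions using aligned loads care. */
static void get_pixels_sketch(int16_t *block, const uint8_t *pixels,
                              ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[x] = pixels[x];
        pixels += stride;
        block  += 8;
    }
}
```

Calling it with `frame + 1` as the source, as a filter working at arbitrary offsets might, is fine here; a version that assumes 8- or 16-byte aligned rows is what needs the separate unaligned entry point.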
* lavc/x86/hevc_add_res: Fix coeff overflow in ADD_RES_SSE_16_32_8 | Linjie Fu | 2020-03-27
  Fix overflow for coeff -32768 in function ADD_RES_SSE_16_32_8 with no performance drop (SSE2/AVX/AVX2).
  ./checkasm --test=hevc_add_res --bench
  Mainline:
   - hevc_add_res.add_residual [OK]
  hevc_add_res_32x32_8_sse2: 127.5
  hevc_add_res_32x32_8_avx: 127.0
  hevc_add_res_32x32_8_avx2: 86.5
  Add overflow test case:
   - hevc_add_res.add_residual [FAILED]
  After:
   - hevc_add_res.add_residual [OK]
  hevc_add_res_32x32_8_sse2: 126.8
  hevc_add_res_32x32_8_avx: 128.3
  hevc_add_res_32x32_8_avx2: 86.8
  Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
  Signed-off-by: Linjie Fu <linjie.fu@intel.com>
  Signed-off-by: Anton Khirnov <anton@khirnov.net>
* lavc/x86/hevc_add_res: Fix overflow in ADD_RES_SSE_8_8 | Linjie Fu | 2020-03-27
  Fix overflow for coeff -32768 in function ADD_RES_SSE_8_8 with no performance drop.
  ./checkasm --test=hevc_add_res --bench
  Mainline:
   - hevc_add_res.add_residual [OK]
  hevc_add_res_8x8_8_sse2: 15.5
  Add overflow test case:
   - hevc_add_res.add_residual [FAILED]
  After:
   - hevc_add_res.add_residual [OK]
  hevc_add_res_8x8_8_sse2: 15.5
  Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
  Signed-off-by: Linjie Fu <linjie.fu@intel.com>
  Signed-off-by: Anton Khirnov <anton@khirnov.net>
* lavc/x86/hevc_add_res: Fix overflow in ADD_RES_MMX_4_8 | Linjie Fu | 2020-03-27
  Fix overflow for coeff -32768 in function ADD_RES_MMX_4_8 with no performance drop.
  ./checkasm --test=hevc_add_res --bench
  Mainline:
   - hevc_add_res.add_residual [OK]
  hevc_add_res_4x4_8_mmxext: 15.5
  Add overflow test case:
   - hevc_add_res.add_residual [FAILED]
  After:
   - hevc_add_res.add_residual [OK]
  hevc_add_res_4x4_8_mmxext: 15.0
  Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
  Signed-off-by: Linjie Fu <linjie.fu@intel.com>
  Signed-off-by: Anton Khirnov <anton@khirnov.net>
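Why a single value like -32768 can break an otherwise correct residual add is easiest to see in a scalar model. The diffs are not shown here, so this sketch assumes the affected macros split each coefficient into a positive part and a negated part with a wrapping 16-bit subtract (psubw-style), pack both to unsigned bytes with saturation, then apply a saturating add and subtract. The saturating negate shown as the remedy is one possible fix, not necessarily the one these commits use; all names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Reference: add one residual coefficient to a pixel and clip to 8 bits. */
static uint8_t add_res_ref(uint8_t dst, int16_t res) { return clip_u8(dst + res); }

/* Scalar model of the split-and-saturate SIMD approach. */
static uint8_t add_res_split(uint8_t dst, int16_t res, int saturating_negate)
{
    int neg = -(int)res;                          /* exact value, up to 32768 */
    if (neg > INT16_MAX)                          /* only for res == -32768 */
        neg = saturating_negate ? INT16_MAX       /* psubsw-style clamp */
                                : neg - 65536;    /* psubw-style wrap to -32768 */
    uint8_t pos_b = clip_u8(res);                 /* packuswb of res */
    uint8_t neg_b = clip_u8(neg);                 /* packuswb of -res */
    int sum = dst + pos_b;                        /* paddusb */
    if (sum > 255)
        sum = 255;
    int out = sum - neg_b;                        /* psubusb */
    return (uint8_t)(out < 0 ? 0 : out);
}
```

With wrapping, -(-32768) stays -32768, both packed parts become 0, and the pixel is left unchanged instead of being clipped to 0. That is exactly the kind of case the new checkasm overflow test reports as FAILED.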
* avcodec/x86/diracdsp: Fix high bits on Windows x86_64 | Michael Niedermayer | 2020-01-31
  Found-by: james
* avcodec/x86/diracdsp: Fix incorrect src addressing in dequant_subband_32() | Michael Niedermayer | 2020-01-30
  Fixes: Segfault (not reproducible with asm, which made this hard to debug)
  Fixes: decoding errors
  Fixes: 19854/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DIRAC_fuzzer-5729372837511168
  Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
  Reviewed-by: Paul B Mahol <onemda@gmail.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* vp4: prevent unaligned memory access in loop filter | Peter Ross | 2019-10-30
  VP4 applies a loop filter during motion compensation, so the block offset will often be unaligned. This produces a bus error on some platforms, namely ARMv7 NEON. This patch adds an unaligned version of the loop filter function pointer to VP3DSPContext.
  Reported-by: Mike Melanson <mike@multimedia.cx>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
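The commit handles misalignment at the DSP level by adding a separate function pointer. In plain C, the portable way to read from an address that may be misaligned is `memcpy` rather than a pointer cast. `load_u32_unaligned` is a hypothetical helper for illustration, not part of VP3DSPContext.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address. Dereferencing
 * a cast pointer such as *(const uint32_t *)p is undefined behaviour when
 * p is misaligned and can raise a bus error on strict-alignment targets;
 * memcpy lets the compiler choose a load sequence that is safe there
 * (often a single unaligned load where the hardware allows it). */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```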
* x86/opusdsp: enable the functions on all FMA3 CPUs | James Almer | 2019-09-11
  It's not using ymm registers, so limiting it to CPUs with fast AVX is not necessary.
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86/opusdsp: clear the high bits from some gprs | James Almer | 2019-09-11
  Fixes checkasm on systems like win64.
  Reviewed-by: Lynne
  Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/Makefile: add missing pngdsp dependency to the lscr decoder | James Almer | 2019-05-14
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86/v210dec: use named registers | James Almer | 2019-05-03
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86/v210dec: don't reserve more xmm regs than needed | James Almer | 2019-05-03
  Prevents pointless register saving on win64 for the sse3 and avx versions of the function.
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86/v210dec: remove duplicate load instruction | James Almer | 2019-05-03
  Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/x86/v210: fix operands of vpblendd used in new avx2 code | James Darnley | 2019-05-02
  Assembly failed when using yasm rather than nasm.
* libavcodec Adding ff_v210_planar_unpack AVX2 | Michael Stoner | 2019-05-02
  Replaced VSHUFPS with VPBLENDD to relieve the port 5 bottleneck. AVX2 is 1.4x faster than AVX.
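For context on what ff_v210_planar_unpack computes, here is a scalar sketch of v210 to planar 10-bit 4:2:2, following the word layout FFmpeg's C v210 decoder reads. The function names are hypothetical, and the SIMD versions differ in how they handle the row tail.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word byte by byte (endian- and
 * alignment-independent). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Unpack one word into three 10-bit samples (bits 0-9, 10-19, 20-29)
 * and advance each destination pointer it wrote to. */
static void read3(const uint8_t **src, uint16_t **a, uint16_t **b, uint16_t **c)
{
    uint32_t w = read_le32(*src);
    *src += 4;
    *(*a)++ = (uint16_t)(w & 0x3FF);
    *(*b)++ = (uint16_t)((w >> 10) & 0x3FF);
    *(*c)++ = (uint16_t)((w >> 20) & 0x3FF);
}

/* Every 16 bytes carry 6 luma and 3 + 3 chroma samples in the order
 * Cb Y Cr | Y Cb Y | Cr Y Cb | Y Cr Y. Trailing pixels (width % 6)
 * are ignored in this sketch. */
static void v210_unpack_sketch(const uint8_t *src, uint16_t *y,
                               uint16_t *u, uint16_t *v, int width)
{
    for (int i = 0; i + 6 <= width; i += 6) {
        read3(&src, &u, &y, &v);
        read3(&src, &y, &u, &y);
        read3(&src, &v, &y, &u);
        read3(&src, &y, &v, &y);
    }
}
```

Because the three samples in each word go to different planes in a rotating pattern, a SIMD version must regroup lanes after extracting the fields. That regrouping is where shuffle or blend instructions like VSHUFPS and VPBLENDD come in.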
* x86/opusdsp: replace loads with shuffles | Lynne | 2019-04-26
  Gives a slight speedup. Can't be carried over to aarch64, since it has no shufps-like instruction.
  Reviewed-by: Paul B Mahol <onemda@gmail.com>
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86/opusdsp: fix WIN64 return value | Lynne | 2019-04-01
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis | Lynne | 2019-04-01
  58893 decicycles in deemphasis_c, 130548 runs, 524 skips
  9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips
  -> 6.21x speedup
  24866 decicycles in postfilter_c, 65386 runs, 150 skips
  5268 decicycles in postfilter_fma3, 65505 runs, 31 skips
  -> 4.72x speedup
  Total decoder speedup: ~14%
  Deemphasis SIMD based on the following unrolling:
      const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
      float state = coeff;
      for (int i = 0; i < len; i += 4) {
          y[0] = x[0] + c1*state;
          y[1] = x[1] + c2*state + c1*x[0];
          y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
          y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
          state = y[3];
          y += 4;
          x += 4;
      }
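The unrolling in this commit message can be checked against the direct recursive form y[i] = x[i] + c * y[i-1]. Expanding that recurrence four steps gives exactly the c1..c4 terms. The sketch below makes both forms runnable; `EMPH_COEFF` is a stand-in value for illustration, not the decoder's CELT_EMPH_COEFF definition, and the function names are hypothetical.

```c
#include <stddef.h>

/* Stand-in coefficient; the real decoder uses CELT_EMPH_COEFF. */
#define EMPH_COEFF 0.85f

/* Direct recursive form, carrying `state` (the previous output) across
 * calls. Returns the new state. */
static float deemphasis_scalar(float *y, const float *x, float state, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        y[i] = x[i] + EMPH_COEFF * state;
        state = y[i];
    }
    return state;
}

/* Four-way unrolled form from the commit message: within a group each
 * output depends only on that group's inputs and the carried state, so
 * the four lanes can be computed in parallel. len must be a multiple of 4. */
static float deemphasis_unrolled(float *y, const float *x, float state, size_t len)
{
    const float c1 = EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;
    for (size_t i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

The two versions agree up to float rounding, since the unrolled one sums the same terms in a different order.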
* celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled | Lynne | 2019-03-31
  The entire function was defined away before.
* x86/opus_dsp: rename to celt_pvq | Lynne | 2019-03-31
  It's only used in the encoder and in CELT's PVQ.
* avcodec/h264dsp: change loop filter stride argument to ptrdiff_t | James Almer | 2019-02-20
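Passing a stride as `ptrdiff_t` instead of `int` means it already has pointer width when it reaches the asm, so it can be added to a pointer without first being sign-extended. That is the usual motivation for such changes in FFmpeg's DSP code. Negative steps, such as walking upwards from the current row as deblocking does, then work naturally. `column_sum` is a hypothetical helper, not an h264dsp function.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum `rows` pixels of one column starting at p, stepping by stride.
 * The offset i * stride is computed in ptrdiff_t, so a negative stride
 * needs no separate sign extension and no out-of-range pointer is formed. */
static int column_sum(const uint8_t *p, ptrdiff_t stride, int rows)
{
    int sum = 0;
    for (int i = 0; i < rows; i++)
        sum += p[i * stride];
    return sum;
}
```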
* avcodec/proresdsp indent after prev commit | Martin Vignali | 2018-12-02
* avcodec/proresdec : rename dsp part for 10b and check dspinit for supported bits per raw sample | Martin Vignali | 2018-12-02
  Based on patch by Kieran Kunhya
* mdct15: simplify x86 exptab permutation | Rostislav Pehlivanov | 2018-05-07
  Removes an unneeded copy and does the 5-point permute in-place.
  Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
* mdct15: simplify the fft15 x86 SIMD | Rostislav Pehlivanov | 2018-05-07
  Saves 1 gpr and 2 instructions and simplifies the macros a bit.
  Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
* mpeg4video: Add support for MPEG-4 Simple Studio Profile. | Kieran Kunhya | 2018-04-02
  This is a profile supporting > 8-bit video and has a higher quality DCT.
* sbcenc: add MMX optimizations | Aurelien Jacobs | 2018-03-07
  This was originally based on libsbc, and was fully integrated into ffmpeg.
  Rough speed test:
  C version: speed= 592x
  MMX version: speed= 785x