path: root/libavcodec/x86
Commit message / Author / Age
* avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients (James Darnley, 2017-06-20)
|
* avcodec/x86: modify simple_idct10 macros to add an action parameter (James Darnley, 2017-06-20)
|
* avcodec/x86: cleanup simple_idct10 (James Darnley, 2017-06-20)
|     Use named arguments for the functions so we can remove a define. The stride/linesize argument now has type ptrdiff_t, so we no longer need to sign-extend the register.
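On x86-64, a 32-bit `int` argument arrives in the low half of a 64-bit register and the calling convention does not guarantee what the upper half holds, so assembly that uses an `int` stride in 64-bit address arithmetic has to sign-extend it first (`movsxd`). A `ptrdiff_t` stride is already pointer-sized, so that step disappears. The C sketch below, with hypothetical helper names, models the difference for a negative stride:

```c
#include <stddef.h>
#include <stdint.h>

/* Row offset when a 32-bit stride is widened correctly, i.e. sign-extended
 * as a movsxd instruction would do. */
static int64_t row_offset_sign_extended(int32_t stride, int row)
{
    return (int64_t)stride * row;
}

/* Row offset if the 32-bit stride were zero-extended instead (for example
 * after a plain 32-bit move, which clears the upper half of the register):
 * a negative stride turns into a huge positive offset. */
static uint64_t row_offset_zero_extended(int32_t stride, int row)
{
    return (uint64_t)(uint32_t)stride * (uint64_t)row;
}

/* With a ptrdiff_t stride the value is already pointer-sized and can be
 * used directly, with no extension step. */
static ptrdiff_t row_offset_ptrdiff(ptrdiff_t stride, int row)
{
    return stride * row;
}
```

With a stride of -16, the sign-extended and `ptrdiff_t` versions agree on -32 for two rows up, while the zero-extended one yields 4294967280 for a single row.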
* avcodec/x86/mpegenc: support transpose permutation type (James Darnley, 2017-06-20)
|
* avcodec/x86/mpegenc: check IDCT permutation type is a valid value (James Darnley, 2017-06-20)
|
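The two mpegenc commits above concern which coefficient order the x86 quantizer produces for the selected IDCT. As a rough sketch (the enum and function names are hypothetical, not FFmpeg's), a transpose permutation for an 8x8 block swaps row and column, and an unrecognized permutation type is rejected instead of being silently used:

```c
#include <stdint.h>

/* Hypothetical permutation selector; not FFmpeg's actual enum. */
enum perm_type {
    PERM_NONE,
    PERM_TRANSPOSE,
};

/* Fill perm[] for the given type. Returns 0 on success, -1 if the type is
 * not one this code knows how to handle. */
static int build_permutation(uint8_t perm[64], int type)
{
    switch (type) {
    case PERM_NONE:
        for (int i = 0; i < 64; i++)
            perm[i] = (uint8_t)i;
        return 0;
    case PERM_TRANSPOSE:
        /* Raster index r*8+c maps to c*8+r. */
        for (int i = 0; i < 64; i++)
            perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
        return 0;
    default:
        return -1; /* reject unknown values */
    }
}
```

Under the transpose, index 1 (row 0, column 1) maps to 8 and vice versa, while diagonal entries such as 9 and 63 stay in place.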
* avcodec/x86/mpegvideo: Use intra scantable in dct_unquantize_h263_intra_mmx() (Michael Niedermayer, 2017-06-20)
|     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse (James Almer, 2017-06-18)
|     About 2x faster than the C version.
* x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} (James Almer, 2017-06-18)
|     About 2x faster than the C version.
* avcodec/aacps: move checks for valid length outside the stereo_interpolate dsp function (James Almer, 2017-06-15)
|     Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vorbisdsp: optimize ff_vorbis_inverse_coupling_sse (James Almer, 2017-06-15)
|     About 7% faster.
* vp9: fix overwrite in ff_vp9_ipred_dr_16x16_16_avx2 (Ronald S. Bultje, 2017-06-14)
|     Fixes trac issue 6459.
* avcodec/vp9: ipred_dr_16x16_16 avx2 implementation (Ilia Valiakhmetov, 2017-06-12)
|     Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com>
|     Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* x86/aacpsdsp: fix output of ff_ps_stereo_interpolate_ipdopd_sse3 (James Almer, 2017-06-07)
|     The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.
* libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation (Ilia Valiakhmetov, 2017-06-06)
|     vp9_diag_downleft_32x32   c       sse2    ssse3   avx     avx2
|     8bpp                      580.2   75.6    73.7    72.7    -
|     10bpp                     1101.2  145.4   137.5   134.8   94.0
|     12bpp                     1108.5  145.5   137.3   135.2   94.0
|
|     ~30% faster than avx implementation
|     Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
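The "~30% faster" figure is consistent with reading it as time saved: at 10 bpp the avx2 version takes 94.0 against 134.8 for avx, about 30% less time, which is the same thing as roughly 1.43x the throughput. The commit does not state the units of these numbers. A small sketch of the two measures (function names are illustrative):

```c
/* How many times faster the new code is: old time / new time. */
static double speedup(double old_time, double new_time)
{
    return old_time / new_time;
}

/* Fraction of the old running time that the new code saves. */
static double time_saved(double old_time, double new_time)
{
    return 1.0 - new_time / old_time;
}
```

For the 10 bpp row, `speedup(134.8, 94.0)` is about 1.43 and `time_saved(134.8, 94.0)` about 0.30; the 12 bpp row gives nearly the same values.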
* x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse (James Almer, 2017-06-04)
|     ~2% faster.
* x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3 (James Almer, 2017-06-03)
|     Move the unpacking outside of the loop. 5% to 10% faster.
|     Suggested-by: ubitux
|     Signed-off-by: James Almer <jamrial@gmail.com>
* x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3() (James Almer, 2017-06-02)
|     About 2x faster than the C version.
|     Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/x86/idctdsp_init: reindent (James Darnley, 2017-05-30)
|
* avcodec/x86: move simple_idct to external assembly (James Darnley, 2017-05-30)
|
* lavc/mpegvideoenc: reformat inv_zigzag_direct16 so the zigzag pattern is visible (Clément Bœsch, 2017-05-19)
|
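Assuming inv_zigzag_direct16 holds the inverse of the standard 8x8 zigzag scan, offset by one (an assumption; the log does not show the table's contents), the same values can be generated rather than typed out. A sketch with hypothetical function names:

```c
#include <stdint.h>

/* Classic 8x8 zigzag scan: zigzag[i] is the raster index of the i-th
 * coefficient in scan order. Anti-diagonals s = row + col are walked in
 * alternating directions. */
static void build_zigzag(uint8_t zigzag[64])
{
    int i = 0;
    for (int s = 0; s <= 14; s++) {
        int rmin = s > 7 ? s - 7 : 0;
        int rmax = s < 7 ? s : 7;
        if (s % 2 == 0) {            /* even diagonal: bottom-left to top-right */
            for (int r = rmax; r >= rmin; r--)
                zigzag[i++] = (uint8_t)(r * 8 + (s - r));
        } else {                     /* odd diagonal: top-right to bottom-left */
            for (int r = rmin; r <= rmax; r++)
                zigzag[i++] = (uint8_t)(r * 8 + (s - r));
        }
    }
}

/* inv[raster index] = 1-based position in zigzag order (hypothetical
 * stand-in for a table like inv_zigzag_direct16). */
static void build_inv_zigzag16(const uint8_t zigzag[64], uint16_t inv[64])
{
    for (int i = 0; i < 64; i++)
        inv[zigzag[i]] = (uint16_t)(i + 1);
}
```

Printing `inv` eight entries per line gives the same kind of visible zigzag layout the commit describes.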
* Merge commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4' (Clément Bœsch, 2017-05-19)
|\      * commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4':
| |       mpegvideoenc: make a table const
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * mpegvideoenc: make a table const (Anton Khirnov, 2017-01-19)
| |
* | avcodec/h264: add sse2 versions of previous idct functions (James Darnley, 2017-05-15)
| |     Kaby Lake Pentium:
| |     - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext
| |     - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
* | avcodec/h264: add avx 8-bit h264_idct_dc_add (James Darnley, 2017-05-15)
| |     Haswell:
| |     - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext
| |     Skylake-U:
| |     - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
* | avcodec/h264: add avx 8-bit h264_idct_add (James Darnley, 2017-05-15)
| |     Haswell:
| |     - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
| |     Skylake-U:
| |     - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
* | avcodec/h264: use some 3 operand forms (James Darnley, 2017-05-15)
| |
* | avcodec/h264: change RETs into REP_RETs where appropriate (James Darnley, 2017-05-15)
| |
* | avcodec/x86/vc1dsp_init: Fix build failure with --disable-optimizations and clang (Michael Niedermayer, 2017-04-27)
| |     Compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions. The build succeeds with this change; this was the only failure.
| |     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
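FFmpeg's init code commonly guards optional assembly behind conditions built from configure-time constants and relies on dead-code elimination to drop references to functions that were never built. According to the commit, a compiler doing DCE at -O0 may not do so when the condition is a "complex" boolean expression. The commit text does not show the change itself; the sketch below, with hypothetical names throughout, illustrates the failure mode and one common way to avoid it, a preprocessor guard that removes the reference regardless of optimization level:

```c
/* Hypothetical configure-style switch; a real build would get this from a
 * generated config header. */
#define HAVE_FAST_ASM 0

/* Declared but deliberately never defined: any reference that survives
 * compilation would fail at link time. */
void fast_asm_transform(int *block);

static void transform_c(int *block)
{
    block[0] += 1;
}

typedef void (*transform_fn)(int *block);

static transform_fn select_transform(int cpu_has_feature)
{
    transform_fn fn = transform_c;
    /* Writing this as
     *     if (HAVE_FAST_ASM && cpu_has_feature) fn = fast_asm_transform;
     * only links if the compiler eliminates the dead branch. The #if below
     * removes the reference before the compiler ever sees it. */
#if HAVE_FAST_ASM
    if (cpu_has_feature)
        fn = fast_asm_transform;
#else
    (void)cpu_has_feature;
#endif
    return fn;
}
```

Because the undefined function is never referenced, this links even without optimization, and the C fallback is always selected.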
* | Merge commit '0a35f128f3c6e0ae9a0a2236c557602c108da269' (Clément Bœsch, 2017-04-08)
|\|     * commit '0a35f128f3c6e0ae9a0a2236c557602c108da269':
| |       cabac: x86: Give optimizations header a more meaningful name
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * cabac: x86: Give optimizations header a more meaningful name (Diego Biurrun, 2016-12-01)
| |
* | x86/idctdsp_init: reindent (Ronald S. Bultje, 2017-04-06)
| |
* | x86/simple_idct: add explicit sse2 simple_idct_put/add versions (Ronald S. Bultje, 2017-04-06)
| |     These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. This way we don't need to use the ff_put/add_pixels_clamped function pointers.
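The operations being vectorized here are simple: "put" stores IDCT output saturated to 0..255, and "add" adds the output to the pixels already there with the same saturation. A plain-C sketch of 8x8 versions (the names are illustrative, not FFmpeg's exact functions):

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Store an 8x8 block of IDCT output, saturating each value. */
static void put_pixels_clamped_8x8(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = clip_uint8(block[x]);
        block  += 8;
        pixels += line_size;
    }
}

/* Add an 8x8 block of IDCT output to existing pixels, saturating each sum. */
static void add_pixels_clamped_8x8(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = clip_uint8(pixels[x] + block[x]);
        block  += 8;
        pixels += line_size;
    }
}
```

An SIMD version does the same thing with packed saturating instructions, which is why a single fixed implementation can replace an indirect call through a function pointer.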
* | cavs: add a sse2 idct implementation (Ronald S. Bultje, 2017-04-06)
| |     This makes using the function pointer ff_add_pixels_clamped() unnecessary, since we always know what the best implementation is at compile-time.
* | cavs: convert idct from inline asm to yasm (Ronald S. Bultje, 2017-04-06)
| |
* | x86/xvididct: remove use of ff_put/add_pixels_clamped function pointer (Ronald S. Bultje, 2017-04-06)
| |     Since there are separate SSE2 implementations of xvid_idct_put/add, this patch has no practical impact on performance.
* | x86/hevc_add_res: merge last remaining changes from 3d6535983282bea542dac2e568ae50da5796be34 (James Almer, 2017-03-31)
| |     See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html
* | Merge commit '0361e4dcb4d394c88c33364415a3b8fe315b67d1' (Clément Bœsch, 2017-03-31)
|\|     * commit '0361e4dcb4d394c88c33364415a3b8fe315b67d1':
| |       h264_qpel: x86: Move function with only one instance out of template macro
| |     Note: warning is present with clang.
| |     Merged-by: Clément Bœsch <cboesch@gopro.com>
| * h264_qpel: x86: Move function with only one instance out of template macro (Diego Biurrun, 2016-11-08)
| |     libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
| * x86: Drop stray semicolons after function definitions (Diego Biurrun, 2016-11-05)
| |     libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
| |     libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
| * vp9: Flip the order of arguments in MC functions (Martin Storsjö, 2016-11-03)
| |     This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions.
| |     Signed-off-by: Martin Storsjö <martin@martin.st>
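The commit message does not spell out the new order. As a sketch, assuming a destination-first layout `(dst, dst_stride, ref, ref_stride, h, mx, my)` (the typedef and function below are illustrative, not copied from either codebase), a full-pel copy MC function would look like this:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed argument order: destination and its stride first, then the
 * reference and its stride, then block height and subpel motion vector. */
typedef void (*mc_func)(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *ref, ptrdiff_t ref_stride,
                        int h, int mx, int my);

/* Full-pel copy of an 8-pixel-wide block; mx/my are ignored because no
 * subpel filtering is needed. */
static void put_copy8(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *ref, ptrdiff_t ref_stride,
                      int h, int mx, int my)
{
    (void)mx;
    (void)my;
    for (int y = 0; y < h; y++) {
        memcpy(dst, ref, 8);
        dst += dst_stride;
        ref += ref_stride;
    }
}
```

Keeping every MC variant on one signature is what lets DSP tables, and code ported between the two projects, use the functions interchangeably.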
* | vp9: re-split the decoder/format/dsp interface header files (Ronald S. Bultje, 2017-03-28)
| |     The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.
* | lavc/vp9: split into vp9{block,data,mvs} (Clément Bœsch, 2017-03-27)
| |     This follows the Libav layout to ease merges.
* | avcodec/x86/idctdsp: Remove duplicate include (Michael Niedermayer, 2017-03-26)
| |     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* | x86/hevc_add_res: merge missing changes from 3d6535983282bea542dac2e568ae50da5796be34 (James Almer, 2017-03-24)
| |     Unrolling the loops triples the size of the assembled output without any gain in performance.
* | Merge commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b' (Clément Bœsch, 2017-03-24)
|\|     * commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b':
| |       hevc: x86: Add add_residual() SIMD optimizations
| |     See a6af4bf64dae46356a5f91537a1c8c5f86456b37
| |     This merge is cosmetic only (renames, space shuffling, etc.). The functional changes in the ASM are *not* merged:
| |     - unrolling with %rep is kept
| |     - ADD_RES_MMX_4_8 is left untouched: this needs investigation
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * hevc: x86: Add add_residual() SIMD optimizations (Pierre Edouard Lepere, 2016-10-22)
| |     Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>, extended by James Almer <jamrial@gmail.com>.
| |     Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
| * audiodsp: x86: Remove pointless header file (Diego Biurrun, 2016-10-19)
| |     Its single forward declaration can be moved to the only place it is used, as is done for all other dsp init files.
* | lavc/x86/hevc: rename hevc_res_add to hevc_add_res (Clément Bœsch, 2017-03-24)
| |     This will simplify an upcoming merge.
* | Merge commit 'b89804da9bad2d94dd95bf20ac6187447e9c17e9' (James Almer, 2017-03-23)
|\|     * commit 'b89804da9bad2d94dd95bf20ac6187447e9c17e9':
| |       x86: videodsp: Add parentheses to expression to work around warning
| |     Merged-by: James Almer <jamrial@gmail.com>
| * x86: videodsp: Add parentheses to expression to work around warning (Diego Biurrun, 2016-10-19)
| |     libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds