path: root/libavcodec/x86/Makefile
Commit message (Author, Date)
* avcodec/x86: add cfhdenc SIMD (Paul B Mahol, 2021-02-27)
|
* avcodec/cfhd: add x86 SIMD (Paul B Mahol, 2020-08-26)
|   Overall speed change for 1920x1080, yuv422p10le, 60fps: from 0.19x to 0.343x
* avcodec/Makefile: add missing pngdsp dependency to the lscr decoder (James Almer, 2019-05-14)
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis (Lynne, 2019-04-01)
|   58893 decicycles in deemphasis_c, 130548 runs, 524 skips
|   9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips
|   -> 6.21x speedup
|
|   24866 decicycles in postfilter_c, 65386 runs, 150 skips
|   5268 decicycles in postfilter_fma3, 65505 runs, 31 skips
|   -> 4.72x speedup
|
|   Total decoder speedup: ~14%
|
|   Deemphasis SIMD based on the following unrolling:
|       const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
|       float state = coeff;
|
|       for (int i = 0; i < len; i += 4) {
|           y[0] = x[0] + c1*state;
|           y[1] = x[1] + c2*state + c1*x[0];
|           y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
|           y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
|           state = y[3];
|           y += 4;
|           x += 4;
|       }
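The unrolling quoted above can be checked against the plain recursive filter y[n] = x[n] + c1*y[n-1]. What follows is a minimal C sketch, not the FMA3 code from the commit. The function names are hypothetical, buffers are assumed to be plain float arrays, and the length is assumed to be a multiple of 4. The two forms should agree up to float rounding, because the unrolled form only reorders the arithmetic.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Scalar reference: one-pole de-emphasis, y[n] = x[n] + c1 * y[n - 1].
 * `state` is y[-1] on entry; the last output is returned as the new state. */
static float deemph_scalar(float *y, const float *x, float c1,
                           float state, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        state = x[i] + c1 * state;
        y[i]  = state;
    }
    return state;
}

/* The 4-way unrolling from the commit message. Within a group of four, each
 * output depends only on the carried state and the inputs, so the four lanes
 * can be evaluated independently (e.g. in one SIMD register).
 * Requires len to be a multiple of 4. */
static float deemph_unrolled4(float *y, const float *x, float c1,
                              float state, size_t len)
{
    const float c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        y[0]  = x[0] + c1 * state;
        y[1]  = x[1] + c2 * state + c1 * x[0];
        y[2]  = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3]  = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}

/* Largest absolute difference between two output buffers. */
static float max_abs_diff(const float *a, const float *b, size_t len)
{
    float m = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float d = fabsf(a[i] - b[i]);
        if (d > m)
            m = d;
    }
    return m;
}
```

To compare the two forms, call both with the same input, initial state and coefficient. Any value near 0.85f can stand in for CELT_EMPH_COEFF, whose exact value is not given here. max_abs_diff() on the two outputs should then stay at the level of float rounding error.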
* celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled (Lynne, 2019-03-31)
|   The entire function was defined away before.
* x86/opus_dsp: rename to celt_pvq (Lynne, 2019-03-31)
|   It's only used in the encoder and in CELT's PVQ.
* sbcenc: add MMX optimizations (Aurelien Jacobs, 2018-03-07)
|   This was originally based on libsbc, and was fully integrated into ffmpeg.
|
|   Rough speed test:
|   C version:   speed= 592x
|   MMX version: speed= 785x
* libavcodec/exr : add X86 SIMD for reorder_pixels (Martin Vignali, 2017-09-17)
|   Signed-off-by: James Almer <jamrial@gmail.com>
* SIMD opus pvq_search implementation (Ivan Kalvachev, 2017-08-18)
|   An explanation of the workings and methods used by the
|   Pyramid Vector Quantization Search function can be found
|   in the following Work-In-Progress mail threads:
|   http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
|   http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
|   http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
|   http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html
|
|   Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
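The mail threads above describe the SIMD search itself. For orientation, here is a minimal scalar sketch of a common greedy approach to the same problem: find an integer vector y with sum |y[i]| == k that maximizes (x.y)^2 / (y.y). The sketch first projects onto the pyramid and then adds pulses one at a time. This is a substituted technique shown under stated assumptions, not Ivan Kalvachev's implementation, and pvq_search_greedy() is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Greedy PVQ search sketch (hypothetical helper): choose integer y[0..n-1]
 * with sum |y[i]| == k that maximizes (x.y)^2 / (y.y), i.e. the pyramid
 * codeword whose direction is closest to x. */
static void pvq_search_greedy(int *y, const float *x, int n, int k)
{
    float l1 = 0.0f, xy = 0.0f, yy = 0.0f;
    int pulses = 0;

    assert(n > 0 && k > 0);
    for (int i = 0; i < n; i++)
        l1 += fabsf(x[i]);

    /* 1. Project |x| onto the pyramid and truncate (floor for values >= 0),
     *    which places at most k pulses. */
    for (int i = 0; i < n; i++) {
        float ax = fabsf(x[i]);
        y[i]     = l1 > 0.0f ? (int)(ax * k / l1) : 0;
        pulses  += y[i];
        xy      += ax * y[i];
        yy      += (float)y[i] * y[i];
    }
    if (pulses > k) { /* guard against float rounding overshoot */
        memset(y, 0, (size_t)n * sizeof(*y));
        pulses = 0;
        xy = yy = 0.0f;
    }

    /* 2. Add the remaining pulses one at a time where they raise the
     *    objective most; ratios num/den are compared by cross-multiplying. */
    while (pulses < k) {
        int best = 0;
        float best_num = -1.0f, best_den = 1.0f;
        for (int i = 0; i < n; i++) {
            float num = xy + fabsf(x[i]);
            float den = yy + 2.0f * y[i] + 1.0f;
            num *= num;
            if (num * best_den > best_num * den) {
                best     = i;
                best_num = num;
                best_den = den;
            }
        }
        xy += fabsf(x[best]);
        yy += 2.0f * y[best] + 1.0f;
        y[best]++;
        pulses++;
    }

    /* 3. Restore the signs of x. */
    for (int i = 0; i < n; i++)
        if (x[i] < 0.0f)
            y[i] = -y[i];
}
```

For example, x = {0.9, -0.3, 0.1, 0.0} with k = 4 yields y = {3, -1, 0, 0}.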
* avcodec/utvideodec: add SIMD for restore_rgb_planes (Paul B Mahol, 2017-06-27)
|   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* mdct15: add assembly optimizations for the 15-point FFT (Rostislav Pehlivanov, 2017-06-23)
|   c:   1802 decicycles in fft15, 16774635 runs, 2581 skips
|   avx:  865 decicycles in fft15, 16776378 runs,  838 skips
|
|   Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
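The decicycle figures quoted in these messages are average cycle counts scaled by 10, as printed by FFmpeg's timer macros. Dividing the C figure by the SIMD figure therefore gives the speedup directly. A trivial sketch follows; speedup() is a hypothetical helper.

```c
#include <assert.h>

/* Speedup of an optimized routine over the C reference, computed from two
 * average "decicycles" figures. The unit (tenths of a cycle) cancels out
 * in the ratio. */
static double speedup(double ref_decicycles, double opt_decicycles)
{
    assert(ref_decicycles > 0.0 && opt_decicycles > 0.0);
    return ref_decicycles / opt_decicycles;
}
```

With the fft15 numbers above, speedup(1802, 865) comes out at roughly 2.08. In other words, the AVX version takes a little under half the cycles of the C version.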
* build: Generalize yasm/nasm-related variable names (Diego Biurrun, 2017-06-21)
|   None of them are specific to the YASM assembler.
|
|   (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/x86: move simple_idct to external assembly (James Darnley, 2017-05-30)
|
* cavs: convert idct from inline asm to yasm. (Ronald S. Bultje, 2017-04-06)
|
* lavc/x86/hevc: rename hevc_res_add to hevc_add_res (Clément Bœsch, 2017-03-24)
|   This will simplify the incoming merge.
* Merge commit 'b57e38f52cc3f31a27105c28887d57cd6812c3eb' (Clément Bœsch, 2017-03-22)
|\    * commit 'b57e38f52cc3f31a27105c28887d57cd6812c3eb':
| |     ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
| |
| |   Merged-by: Clément Bœsch <u@pkh.me>
| * ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm (Justin Ruggles, 2016-10-01)
| |   Adds a wrapper function for downmixing which detects channel count changes
| |   and updates the selected downmix function accordingly.
| |
| |   Simplification and porting to current x86inc infrastructure by Diego Biurrun.
| |
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
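The wrapper described above caches the selected downmix routine and selects it again only when the channel count changes. Below is a minimal C sketch of that dispatch pattern. All names are hypothetical, and it uses a deliberately simple mono-averaging downmix. It does not reproduce the real ac3dsp routines, coefficients or signatures.

```c
#include <assert.h>

/* Mix in_ch planar channels down to mono, in place, into samples[0]. */
typedef void (*downmix_fn)(float **samples, int in_ch, int len);

/* Specialised routine, only valid for exactly two input channels. */
static void downmix_stereo_to_mono(float **s, int in_ch, int len)
{
    (void)in_ch;
    for (int i = 0; i < len; i++)
        s[0][i] = 0.5f * (s[0][i] + s[1][i]);
}

/* Generic fallback for any channel count: plain average. */
static void downmix_any_to_mono(float **s, int in_ch, int len)
{
    for (int i = 0; i < len; i++) {
        float sum = 0.0f;
        for (int c = 0; c < in_ch; c++)
            sum += s[c][i];
        s[0][i] = sum / in_ch;
    }
}

typedef struct DownmixCtx {
    int        in_ch;     /* channel count the cached routine was chosen for */
    downmix_fn fn;        /* cached routine, NULL before the first call */
    int        reselects; /* number of times a routine was (re)selected */
} DownmixCtx;

/* Wrapper: pick the routine again only when the channel count changes, so
 * the common case costs one comparison per call. */
static void downmix_wrapper(DownmixCtx *c, float **s, int in_ch, int len)
{
    assert(in_ch > 0);
    if (!c->fn || in_ch != c->in_ch) {
        c->fn    = in_ch == 2 ? downmix_stereo_to_mono : downmix_any_to_mono;
        c->in_ch = in_ch;
        c->reselects++;
    }
    c->fn(s, in_ch, len);
}
```

Checking the channel count inside the wrapper keeps callers from tracking layout changes themselves. It also lets the stereo case use a tighter specialised routine.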
| * audiodsp/x86: yasmify vector_clipf_sse (Anton Khirnov, 2016-09-22)
| |
| * vp9/x86: rename vp9dsp to vp9mc (Anton Khirnov, 2016-08-03)
| |   It only contains the MC SIMD; other SIMD will go into different files.
* | Merge commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5' (James Almer, 2017-01-31)
|\|    * commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5':
| |     x86: hpeldsp: Split off VP3-specific bits into a separate file
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * x86: hpeldsp: Split off VP3-specific bits into a separate file (Diego Biurrun, 2016-07-20)
| |
| * hevc: Add AVX2 DC IDCT (James Almer, 2016-07-18)
| |   Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
| |   Integrated into Libav by Josh de Kock <josh@itanimul.li>.
| |
| |   Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
| * build: miscellaneous cosmetics (Diego Biurrun, 2016-04-07)
| |   Restore alphabetical order in lists, break overly long lines, do some
| |   prettyprinting, add some explanatory section comments, group parts
| |   together that belong together logically.
| * fft: Split MDCT bits off from FFT (Diego Biurrun, 2016-03-01)
| |
* | huffyuvencdsp: move shared functions to a new lossless_videoencdsp context (James Almer, 2017-01-12)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | aacenc: add SIMD optimizations for abs_pow34 and quantization (Rostislav Pehlivanov, 2016-10-18)
| |   Performance improvements:
| |
| |   quant_bands:
| |   with:     681 decicycles in quant_bands, 8388453 runs, 155 skips
| |   without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
| |   Around 42% fewer cycles for the function
| |
| |   Twoloop coder:
| |
| |   abs_pow34:
| |   with/without: 7.82s/8.17s
| |   Around 4% less time for the entire encoder
| |
| |   Both:
| |   with/without: 7.15s/8.17s
| |   Around 12% less time for the entire encoder
| |
| |   Fast coder:
| |
| |   abs_pow34:
| |   with/without: 3.40s/3.77s
| |   Around 10% less time for the entire encoder
| |
| |   Both:
| |   with/without: 3.02s/3.77s
| |   Around 20% less time for the entire encoder
| |
| |   Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
| |   Tested-by: Michael Niedermayer <michael@niedermayer.cc>
| |   Reviewed-by: James Almer <jamrial@gmail.com>
* | x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} (James Almer, 2016-08-02)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/vc1dsp: Split the file into MC and loopfilter (Timothy Gu, 2016-02-29)
| |
* | Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c' (Derek Buitenhuis, 2016-02-24)
|\|    * commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
| |     build: Add vc1dsp component for more fine-grained dependencies
| |
| |   Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| * build: Add vc1dsp component for more fine-grained dependencies (Diego Biurrun, 2016-02-19)
| |
| * x86: build: Group all encoder objects together (Diego Biurrun, 2016-01-18)
| |
| * hevcdsp: add x86 SIMD for MC (Anton Khirnov, 2015-12-05)
| |
* | x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} (James Almer, 2016-02-06)
| |   Up to ~4 times faster on x86_64, and ~8 times faster on x86_32 when
| |   compiling with x87 fp math.
| |
| |   Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | dirac_dwt: Make x86 files/functions names consistent (Timothy Gu, 2016-02-05)
| |
* | diracdsp: Make x86 files/functions names consistent (Timothy Gu, 2016-02-05)
| |
* | avcodec/dca: add new decoder based on libdcadec (foo86, 2016-01-31)
| |
* | avcodec/dca: remove old decoder (foo86, 2016-01-31)
| |   Remove all files and functions which are not going to be reused, and
| |   temporarily disable all functions and FATE tests which will be.
* | avcodec/synth_filter: split off remaining code from dcadec files (James Almer, 2016-01-25)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/Makefile: move decoder/encoder objects out of the subsystems section (James Almer, 2015-10-22)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm (Timothy Gu, 2015-10-20)
| |   Heavily based upon ff_add_bytes by Christophe Gisquet.
| |
| |   Reviewed-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Timothy Gu <timothygu99@gmail.com>
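diff_bytes is the encoder-side counterpart of add_bytes: one stores byte-wise differences, and the other adds them back. The scalar sketch below assumes those usual semantics; the _c names are hypothetical. Because uint8_t arithmetic wraps modulo 256, the round trip is exact.

```c
#include <assert.h>
#include <stdint.h>

/* dst[i] = src1[i] - src2[i], wrapping modulo 256. */
static void diff_bytes_c(uint8_t *dst, const uint8_t *src1,
                         const uint8_t *src2, intptr_t w)
{
    assert(w >= 0);
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}

/* dst[i] += src[i], wrapping modulo 256; undoes diff_bytes_c(). */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, intptr_t w)
{
    assert(w >= 0);
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```

Since one kernel is the inverse of the other, the same load/op/store loop structure serves both. That may be why the yasm version could be derived so closely from ff_add_bytes.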
* | vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function. (Ronald S. Bultje, 2015-10-13)
| |
* | x86: simple_idct(_put): 10bits versions (Christophe Gisquet, 2015-10-13)
| |   Modeled from the prores version. Clips to [0;1023] and is bitexact.
| |   Bitexactness requires adding offsets in different places compared to
| |   prores or C, and makes the function approximately 2% slower.
| |
| |   For 16 frames of a DNxHD 4:2:2 10bits test sequence:
| |   C:    60861 decicycles in idct, 1048205 runs, 371 skips
| |   sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
| |   avx:  26272 decicycles in idct, 1048171 runs, 405 skips
| |
| |   The add version is not implemented, so the corresponding dsp function
| |   is set to NULL to make this clear to any code that would call it.
| |
| |   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
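The "put" path clamps each IDCT output to [0;1023] before storing it in a 10-bit plane. The minimal C sketch below covers only that store step. It assumes an 8x8 block of already-transformed int32_t values and a 16-bit destination whose stride is counted in samples. The function names are hypothetical, and the IDCT itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a value to the 10-bit pixel range [0, 1023]. */
static inline uint16_t clip_10bit(int32_t v)
{
    return v < 0 ? 0 : v > 1023 ? 1023 : (uint16_t)v;
}

/* Store an 8x8 block of IDCT output into a 10-bit plane, clamping each
 * sample. stride is in uint16_t samples, not bytes. */
static void put_pixels_clamped_10(uint16_t *dst, ptrdiff_t stride,
                                  const int32_t *block)
{
    assert(stride >= 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = clip_10bit(block[y * 8 + x]);
}
```

An "add" variant would add the IDCT output to the existing pixels before clamping. Per the commit above, that variant is the one left unimplemented in SIMD.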
* | avcodec/takdec: add x86 SIMD for rest of decorrelation modes (Paul B Mahol, 2015-10-09)
| |   Signed-off-by: Paul B Mahol <onemda@gmail.com>
* | x86/alacdsp: add simd optimized functions (James Almer, 2015-10-06)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. (Ronald S. Bultje, 2015-10-03)
| |
* | vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. (Ronald S. Bultje, 2015-10-03)
| |
* | x86/hevc_sao: move 10/12bit functions into a separate file (James Almer, 2015-09-30)
| |   Tested-by: Michael Niedermayer <michaelni@gmx.at>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | vp9: add subpel MC SIMD for 10/12bpp. (Ronald S. Bultje, 2015-09-16)
| |
* | vp9: add fullpel (put) MC SIMD for 10/12bpp. (Ronald S. Bultje, 2015-09-16)
| |
* | Merge commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798' (Hendrik Leppkes, 2015-09-05)
|\|    * commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798':
| |     lavc: Drop deprecated deinterlace module
| |
| |   Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>