path: root/libavcodec/x86
Commit message  Author  Age
* Merge commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428'  James Almer  2017-03-21
|\
| |     * commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428':
| |       idct: Change type of array stride parameters to ptrdiff_t
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * idct: Change type of array stride parameters to ptrdiff_t  Diego Biurrun  2016-09-29
| |     ptrdiff_t is the correct type for array strides and similar.
* | Merge commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa'  Clément Bœsch  2017-03-21
|\|
| |     * commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa':
| |       x86: fpel: Remove unnecessary sign extend
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * x86: fpel: Remove unnecessary sign extend  Diego Biurrun  2016-09-29
| |
* | Merge commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0'  Clément Bœsch  2017-03-21
|\|
| |     * commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0':
| |       lavc: add clobber tests for the new encoding/decoding API
| |
| |     The merge only reorders what we already have.
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * lavc: add clobber tests for the new encoding/decoding API  Anton Khirnov  2016-09-28
| |
* | Merge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'  Clément Bœsch  2017-03-20
|\|
| |     * commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
| |       audiodsp/x86: yasmify vector_clipf_sse
| |       audiodsp: reorder arguments for vector_clipf
| |
| |     Merged the version from Libav after a discussion with James Almer on IRC:
| |
| |     19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
| |     19:23 <ubitux> it was apparently yasmified differently
| |     19:23 <ubitux> (it depends on the previous commit arg shuffle)
| |     19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
| |     19:24 <ubitux> it's a port from 1d36defe94c7d7ebf995d4dbb4f878d06272f9c6
| |     19:25 <jamrial> seems better thanks to said arg shuffle
| |     19:25 <jamrial> the loop is the same, but init is simpler
| |     19:25 <jamrial> probably worth merging
| |     19:25 <ubitux> OK
| |     19:25 <ubitux> thanks
| |     19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
| |     19:26 <ubitux> yeah indeed
| |
| |     Both commits are merged at the same time to prevent a conflict with our
| |     existing yasmified ff_vector_clipf_sse.
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * audiodsp/x86: yasmify vector_clipf_sse  Anton Khirnov  2016-09-22
| |
| * audiodsp: reorder arguments for vector_clipf  Anton Khirnov  2016-09-22
| |     This will make the x86 asm simpler.
| |
| |     ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
| |     <janne-libav@jannau.net>
| * blockdsp: drop the high_bit_depth parameter  Anton Khirnov  2016-09-22
| |     It has no effect, since the code is supposed to operate the same way for
| |     any bit depth.
* | Merge commit '75d98e30afab61542faab3c0f11880834653bd6b'  Clément Bœsch  2017-03-20
|\|
| |     * commit '75d98e30afab61542faab3c0f11880834653bd6b':
| |       audiodsp/x86: clear the high bits of the order parameter on 64bit
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * audiodsp/x86: clear the high bits of the order parameter on 64bit  Anton Khirnov  2016-09-19
| |     Also change shl to add, since it can be faster on some CPUs.
| |
| |     CC: libav-stable@libav.org
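The commit above combines two ideas: ignore whatever sits in the upper half of a 64-bit register that carries a 32-bit argument, and double a value with an add instead of a shift. The following is only a rough C model of that reasoning, not the actual assembly. The function name, the use of a uint64_t to stand in for a register, and the assumption that order is being doubled are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: `reg` stands in for a 64-bit register whose low
 * 32 bits hold a non-negative int argument (order) and whose high
 * 32 bits may contain garbage. */
uint64_t sketch_double_order(uint64_t reg)
{
    /* Take the 32-bit view of the register: the high bits are dropped. */
    uint32_t order = (uint32_t)reg;
    assert(order <= INT32_MAX);      /* order is assumed non-negative */

    /* order + order gives the same result as order << 1. */
    uint32_t doubled = order + order;

    /* On x86-64, writing a 32-bit result zero-extends it to 64 bits,
     * which is modelled here by the widening conversion. */
    return (uint64_t)doubled;
}
```

For example, passing 0xdeadbeef00000010 yields 0x20: the garbage in the upper half never reaches the result.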
* | Merge commit '1d6c76e11febb58738c9647c47079d02b5e10094'  Clément Bœsch  2017-03-20
|\|
| |     * commit '1d6c76e11febb58738c9647c47079d02b5e10094':
| |       audiodsp/x86: fix ff_vector_clip_int32_sse2
| |
| |     No functional changes, only cosmetics.
| |
| |     This issue was fixed in 9a9e2f1c8aa4539a261625145e5c1f46a8106ac2.
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * audiodsp/x86: fix ff_vector_clip_int32_sse2  Anton Khirnov  2016-09-19
| |     This version, which is the only one doing two processing cycles per loop
| |     iteration, computes the load/store indices incorrectly for the second
| |     cycle.
| |
| |     CC: libav-stable@libav.org
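To illustrate the kind of indexing the fix above is about, here is a scalar C sketch of a clip loop that does two processing steps per iteration. It is not the SSE2 routine itself. The function names are hypothetical, and the sketch assumes each step covers four 32-bit values (what one 128-bit register would hold); the real code's unrolling may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v to the range [lo, hi]. */
int32_t sketch_clip_i32(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical scalar stand-in for a 2x-unrolled vector clip.
 * len is assumed to be a multiple of 8. */
void sketch_vector_clip_int32(int32_t *dst, const int32_t *src,
                              int32_t min, int32_t max, unsigned len)
{
    assert(len % 8 == 0);
    for (unsigned i = 0; i < len; i += 8) {
        /* First step: elements i .. i+3. */
        for (unsigned k = 0; k < 4; k++)
            dst[i + k] = sketch_clip_i32(src[i + k], min, max);
        /* Second step: elements i+4 .. i+7. The offset of this step
         * must continue where the first one stopped; a wrong offset
         * here is the class of bug the commit message describes. */
        for (unsigned k = 0; k < 4; k++)
            dst[i + 4 + k] = sketch_clip_i32(src[i + 4 + k], min, max);
    }
}
```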
* | Merge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3'  Clément Bœsch  2017-03-20
|\|
| |     * commit 'de452e503734ebb0fdbce86e9d16693b3530fad3':
| |       pixblockdsp: Change type of stride parameters to ptrdiff_t
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * pixblockdsp: Change type of stride parameters to ptrdiff_t  Diego Biurrun  2016-09-14
| |     This avoids SIMD-optimized functions having to sign-extend their
| |     line size argument manually to be able to do pointer arithmetic.
| |
| |     Also adjust parameter names to be "stride" everywhere.
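The reasoning behind the ptrdiff_t stride changes in this log can be sketched in plain C. The helper below is hypothetical (it is not the pixblockdsp code) and assumes an 8x8 block of 8-bit pixels. Because the stride already has pointer width and is signed, it can be added to a pointer directly, including when it is negative for bottom-up images.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen an 8x8 block of 8-bit pixels to 16 bits.
 * `stride` is the distance in bytes between the starts of two rows. */
void sketch_get_pixels_8x8(int16_t *block, const uint8_t *pixels,
                           ptrdiff_t stride)
{
    assert(block && pixels);
    for (int y = 0; y < 8; y++) {
        /* y * stride is computed in ptrdiff_t, so no manual sign
         * extension from a narrower int is needed before the add. */
        const uint8_t *row = pixels + y * stride;
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = row[x];
    }
}
```

With a negative stride the caller passes a pointer to the last row in memory, and the helper walks upward through the buffer.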
* | avcodec/vp9: avx2 implementation of ipred_dl_16x16_16  Ilia  2017-03-20
| |     vp9_diag_downleft_16x16_10bpp_c: 263.0
| |     vp9_diag_downleft_16x16_10bpp_sse2: 44.7
| |     vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
| |     vp9_diag_downleft_16x16_10bpp_avx: 31.9
| |     vp9_diag_downleft_16x16_10bpp_avx2: 25.7
| |     vp9_diag_downleft_16x16_12bpp_c: 264.7
| |     vp9_diag_downleft_16x16_12bpp_sse2: 44.4
| |     vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
| |     vp9_diag_downleft_16x16_12bpp_avx: 32.4
| |     vp9_diag_downleft_16x16_12bpp_avx2: 25.5
| |
| |     Benchmarked with 10000 runs
| |
| |     Signed-off-by: Ilia <zakne0ne@gmail.com>
| |     Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | h264pred: added AVX2 implementation for tm_vp8 16x16.  Mirage Abeysekara  2017-03-20
| |     checkasm --bench results with 5000 runs
| |
| |     pred16x16_tm_vp8_c: 302.8
| |     pred16x16_tm_vp8_mmx: 101.4
| |     pred16x16_tm_vp8_mmxext: 95.5
| |     pred16x16_tm_vp8_sse2: 95.1
| |     pred16x16_tm_vp8_avx2: 38.2
| |
| |     Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | Merge commit '721d57e608dc4fd6c86f27c5ae76ef559d646220'  James Almer  2017-03-19
|\|
| |     * commit '721d57e608dc4fd6c86f27c5ae76ef559d646220':
| |       vp56: Separate VP5 and VP6 dsp initialization
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * vp56: Separate VP5 and VP6 dsp initialization  Diego Biurrun  2016-08-26
| |     VP5 has no arch-specific optimizations (nor will it get some in the
| |     future), so it makes no sense to try to share dsp init code with VP6.
* | Merge commit '3fd22538bc0e0de84b31335266b4b1577d3d609e'  James Almer  2017-03-19
|\|
| |     * commit '3fd22538bc0e0de84b31335266b4b1577d3d609e':
| |       prores: Change type of stride parameters to ptrdiff_t
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * prores: Change type of stride parameters to ptrdiff_t  Diego Biurrun  2016-08-26
| |     This avoids SIMD-optimized functions having to sign-extend their
| |     line size argument manually to be able to do pointer arithmetic.
| |
| |     Also adjust parameter names to be "linesize" everywhere.
* | Merge commit 'f81be06cf614919d71ded29b8f595bef40123ad8'  James Almer  2017-03-19
|\|
| |     * commit 'f81be06cf614919d71ded29b8f595bef40123ad8':
| |       cavs: Change type of stride parameters to ptrdiff_t
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * cavs: Change type of stride parameters to ptrdiff_t  Diego Biurrun  2016-08-26
| |     ptrdiff_t is the correct type for array strides and similar.
* | Merge commit '802727b538b484e3f9d1345bfcc4ab24cfea8898'  James Almer  2017-03-19
|\|
| |     * commit '802727b538b484e3f9d1345bfcc4ab24cfea8898':
| |       vp8: Update some assembly comments left unchanged in bd66f073fe7286bd3c
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * vp8: Update some assembly comments left unchanged in bd66f073fe7286bd3c  Diego Biurrun  2016-08-26
| |
* | Merge commit 'd9d26a3674f31f482f54e936fcb382160830877a'  James Almer  2017-03-19
|\|
| |     * commit 'd9d26a3674f31f482f54e936fcb382160830877a':
| |       vp56: Change type of stride parameters to ptrdiff_t
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * vp56: Change type of stride parameters to ptrdiff_t  Diego Biurrun  2016-08-26
| |     This avoids SIMD-optimized functions having to sign-extend their
| |     line size argument manually to be able to do pointer arithmetic.
* | Merge commit '6892df9294d93322d43255ada299507465bc93c8'  Clément Bœsch  2017-03-19
|\|
| |     * commit '6892df9294d93322d43255ada299507465bc93c8':
| |       vp3: Change type of stride parameters to ptrdiff_t
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * vp3: Change type of stride parameters to ptrdiff_t  Diego Biurrun  2016-08-26
| |     This avoids SIMD-optimized functions having to sign-extend their
| |     stride argument manually to be able to do pointer arithmetic.
| |
| |     Also adjust parameter names to be "stride" everywhere.
* | Merge commit 'e2b9993558b6adee42dcc6eb385a14943aaca974'  Clément Bœsch  2017-03-19
|\|
| |     * commit 'e2b9993558b6adee42dcc6eb385a14943aaca974':
| |       simple_idct: x86: Drop disabled IDCT implementation
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * simple_idct: x86: Drop disabled IDCT implementation  Diego Biurrun  2016-08-17
| |     This gem has been disabled since 2001.
* | Merge commit 'e99ecda55082cb9dde8fd349361e169dc383943a'  Clément Bœsch  2017-03-16
|\|
| |     * commit 'e99ecda55082cb9dde8fd349361e169dc383943a':
| |       checkasm: add vp9 MC tests.
| |       vp9mc/x86: sse2 MC assembly.
| |       vp9mc/x86: add AVX and AVX2 MC
| |       vp9mc/x86: rename ff_* to ff_vp9_*
| |       vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
| |       vp9mc/x86: simplify a few inits.
| |       vp9mc/x86: add 16px functions (64bit only).
| |
| |     Noop (aside from a formatting comment in vp9mc.asm). We already have
| |     all of this.
| |
| |     We should consider making a final diff between the two projects when
| |     the dust comes down.
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * vp9mc/x86: sse2 MC assembly.  Ronald S. Bultje  2016-08-03
| |     Also a slight change to the ssse3 code, which prevents a theoretical
| |     overflow in the sharp filter.
| |
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9mc/x86: add AVX and AVX2 MC  James Almer  2016-08-03
| |     Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
| |
| |     Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9mc/x86: rename ff_* to ff_vp9_*  Clément Bœsch  2016-08-03
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext  James Almer  2016-08-03
| |     pavgb is an sse integer instruction, so the mmxext flag is enough
| |
| |     Signed-off-by: James Almer <jamrial@gmail.com>
| |     Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
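For readers unfamiliar with pavgb, the operation it performs on each pair of unsigned bytes is a rounded average, (a + b + 1) >> 1. The scalar C sketch below shows that arithmetic only. The function names are hypothetical and the row helper is an illustrative guess at how an "avg" routine would use it, not the vp9mc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes, matching what pavgb computes
 * per byte lane. The sum is formed in int, so it cannot overflow. */
uint8_t sketch_avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical helper: average a row of source pixels into dst. */
void sketch_avg_row(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++)
        dst[i] = sketch_avg_round_u8(dst[i], src[i]);
}
```

Note the rounding direction: averaging 1 and 2 gives 2, not 1.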
| * vp9mc/x86: simplify a few inits.  Clément Bœsch  2016-08-03
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9mc/x86: add 16px functions (64bit only).  Ronald S. Bultje  2016-08-03
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
* | Merge commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6'  Clément Bœsch  2017-03-16
|\|
| |     * commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6':
| |       vp9/x86: rename vp9dsp to vp9mc
| |
| |     File was already renamed, only the top description is updated.
| |
| |     Merged-by: Clément Bœsch <u@pkh.me>
| * vp9/x86: rename vp9dsp to vp9mc  Anton Khirnov  2016-08-03
| |     It only contains the MC SIMD, other SIMD will go into different files.
* | Merge commit '3c504bc3599f00bfc5923adc114beef34bce11d0'  James Almer  2017-03-15
|\|
| |     * commit '3c504bc3599f00bfc5923adc114beef34bce11d0':
| |       x86: deduplicate some constants
| |
| |     Merged-by: James Almer <jamrial@gmail.com>
| * x86: deduplicate some constants  Christophe Gisquet  2016-08-03
| |     Signed-off-by: Anton Khirnov <anton@khirnov.net>
* | avcodec/x86/cavsdsp: Put MMX code under mmx check  Michael Niedermayer  2017-03-06
| |     Without this the FPU state becomes trashed and causes mysterious fate
| |     failures with cpuflags=0
| |
| |     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* | avcodec/h264: enable sse2 chroma deblock/loop filter functions  James Darnley  2017-02-27
| |     Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
| |     Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
* | avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter  James Darnley  2017-02-27
| |     ~1.37x faster (147 vs. 108 cycles) compared to mmxext function
* | avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter  James Darnley  2017-02-27
| |     ~1.10x faster (69 vs. 63 cycles) compared to mmxext function
* | avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter  James Darnley  2017-02-27
| |     ~1.14x faster (90 vs 78 cycles) compared with mmxext
* | avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter  James Darnley  2017-02-27
| |     ~1.21x faster (68 vs. 56 cycles) compared with mmxext function
* | avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter  James Darnley  2017-02-27
| |     ~1.14x faster (93 vs. 81 cycles) compared with mmxext function