| Commit message (Collapse) | Author | Age |
|\
| |
| |
| |
| |
| |
| | |
* commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428':
idct: Change type of array stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
ptrdiff_t is the correct type for array strides and similar.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa':
x86: fpel: Remove unnecessary sign extend
Merged-by: Clément Bœsch <u@pkh.me>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| |
| |
| | |
* commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0':
lavc: add clobber tests for the new encoding/decoding API
The merge only re-order what we already have.
Merged-by: Clément Bœsch <u@pkh.me>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
audiodsp/x86: yasmify vector_clipf_sse
audiodsp: reorder arguments for vector_clipf
Merged the version from Libav after a discussion with James Almer on
IRC:
19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94c7d7ebf995d4dbb4f878d06272f9c6
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed
Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.
Merged-by: Clément Bœsch <u@pkh.me>
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
This will make the x86 asm simpler.
ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
<janne-libav@jannau.net>
|
| |
| |
| |
| |
| | |
It has no effect, since the code is supposed to operate the same way for
any bit depth.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '75d98e30afab61542faab3c0f11880834653bd6b':
audiodsp/x86: clear the high bits of the order parameter on 64bit
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| | |
Also change shl to add, since it can be faster on some CPUs.
CC: libav-stable@libav.org
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '1d6c76e11febb58738c9647c47079d02b5e10094':
audiodsp/x86: fix ff_vector_clip_int32_sse2
No functionnal changes, only cosmetics. This issue was fixed in
9a9e2f1c8aa4539a261625145e5c1f46a8106ac2.
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This version, which is the only one doing two processing cycles per loop
iteration, computes the load/store indices incorrectly for the second
cycle.
CC: libav-stable@libav.org
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'de452e503734ebb0fdbce86e9d16693b3530fad3':
pixblockdsp: Change type of stride parameters to ptrdiff_t
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| |
| | |
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "stride" everywhere.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5
Benchmarked with 10000 runs
Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
checkasm --bench results with 5000 runs
pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '721d57e608dc4fd6c86f27c5ae76ef559d646220':
vp56: Separate VP5 and VP6 dsp initialization
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| | |
VP5 has no arch-specific optimizations (nor will it get some in the
future), so it makes no sense to try to share dsp init code with VP6.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '3fd22538bc0e0de84b31335266b4b1577d3d609e':
prores: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "linesize" everywhere.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'f81be06cf614919d71ded29b8f595bef40123ad8':
cavs: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
ptrdiff_t is the correct type for array strides and similar.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '802727b538b484e3f9d1345bfcc4ab24cfea8898':
vp8: Update some assembly comments left unchanged in bd66f073fe7286bd3c
Merged-by: James Almer <jamrial@gmail.com>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'd9d26a3674f31f482f54e936fcb382160830877a':
vp56: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| | |
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '6892df9294d93322d43255ada299507465bc93c8':
vp3: Change type of stride parameters to ptrdiff_t
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| |
| | |
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "stride" everywhere.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'e2b9993558b6adee42dcc6eb385a14943aaca974':
simple_idct: x86: Drop disabled IDCT implementation
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| | |
This gem has been disabled since 2001.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit 'e99ecda55082cb9dde8fd349361e169dc383943a':
checkasm: add vp9 MC tests.
vp9mc/x86: sse2 MC assembly.
vp9mc/x86: add AVX and AVX2 MC
vp9mc/x86: rename ff_* to ff_vp9_*
vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
vp9mc/x86: simplify a few inits.
vp9mc/x86: add 16px functions (64bit only).
Noop (aside from a formatting comment in vp9mc.asm). We already have all
of this. We should consider making a final diff between the two projects
when the dust comes down.
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| |
| | |
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| | |
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
pavgb is an sse integer instruction, so the mmxext flag is enough
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| | |
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| | |
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6':
vp9/x86: rename vp9dsp to vp9mc
File was already renamed, only the top description is updated.
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| | |
It only contains the MC SIMD, other SIMD will go into different files.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '3c504bc3599f00bfc5923adc114beef34bce11d0':
x86: deduplicate some constants
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| |
| |
| |
| |
| |
| |
| | |
Without this the FPU state becomes trashed and causes mysterious
fate failures with cpuflags=0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| | |
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
|
| |
| |
| |
| | |
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
|
| |
| |
| |
| | |
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
|
| |
| |
| |
| | |
~1.14x faster (90 vs 78 cycles) compared with mmxext
|
| |
| |
| |
| | |
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
|
| |
| |
| |
| | |
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
|