| Commit message | Author | Age |
|
|
|
| |
coefficients
|
| |
|
|
|
|
|
|
| |
Use named arguments for the functions so we can remove a define. The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
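A minimal C-side sketch of why a `ptrdiff_t` stride helps, assuming a hypothetical helper `sum_column` that is not part of FFmpeg. On x86-64 a 32-bit `int` argument leaves the upper half of its register unspecified, so assembly that adds it to a pointer must sign-extend it first; a `ptrdiff_t` argument already fills the whole register, including for negative strides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not an FFmpeg function: sums one column of
 * an 8-bit image. `stride` is the byte distance between rows and may be
 * negative for images stored bottom-up. */
unsigned sum_column(const uint8_t *src, ptrdiff_t stride, int height)
{
    unsigned sum = 0;
    for (int y = 0; y < height; y++) {
        /* y is converted to ptrdiff_t, so the offset is computed at
         * pointer width and no separate widening of stride is needed. */
        sum += src[y * stride];
    }
    return sum;
}
```

Called with a pointer to the last row and a negative stride, the same function walks the image from bottom to top.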
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
About 2x faster than the C version.
|
|
|
|
| |
About 2x faster than the C version.
|
|
|
|
|
|
| |
dsp function
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
About 7% faster.
|
|
|
|
| |
Fixes trac issue 6459.
|
|
|
|
|
| |
Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
| |
The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vp9_diag_downleft_32x32_8bpp_c: 580.2
vp9_diag_downleft_32x32_8bpp_sse2: 75.6
vp9_diag_downleft_32x32_8bpp_ssse3: 73.7
vp9_diag_downleft_32x32_8bpp_avx: 72.7
vp9_diag_downleft_32x32_10bpp_c: 1101.2
vp9_diag_downleft_32x32_10bpp_sse2: 145.4
vp9_diag_downleft_32x32_10bpp_ssse3: 137.5
vp9_diag_downleft_32x32_10bpp_avx: 134.8
vp9_diag_downleft_32x32_10bpp_avx2: 94.0
vp9_diag_downleft_32x32_12bpp_c: 1108.5
vp9_diag_downleft_32x32_12bpp_sse2: 145.5
vp9_diag_downleft_32x32_12bpp_ssse3: 137.3
vp9_diag_downleft_32x32_12bpp_avx: 135.2
vp9_diag_downleft_32x32_12bpp_avx2: 94.0
~30% faster than avx implementation
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
| |
~2% faster.
|
|
|
|
|
|
|
| |
Move the unpacking outside of the loop. 5% to 10% faster.
Suggested-by: ubitux
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
| |
About 2x faster than the C version.
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
| |
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
* commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4':
mpegvideoenc: make a table const
Merged-by: Clément Bœsch <u@pkh.me>
|
| | |
|
| |
| |
| |
| |
| |
| | |
Kaby Lake Pentium:
- ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext
- ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
|
| |
| |
| |
| |
| |
| |
| |
| | |
Haswell:
- 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext
Skylake-U:
- 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
|
| |
| |
| |
| |
| |
| |
| |
| | |
Haswell:
- 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
Skylake-U:
- 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
clang
Compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions.
The build succeeds with this change; this was the only failure.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '0a35f128f3c6e0ae9a0a2236c557602c108da269':
cabac: x86: Give optimizations header a more meaningful name
Merged-by: Clément Bœsch <u@pkh.me>
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
This way we don't need to use the ff_put/add_pixels_clamped function
pointers.
|
| |
| |
| |
| |
| | |
This makes using the function pointer ff_add_pixels_clamped() unnecessary,
since we always know what the best implementation is at compile-time.
|
| | |
|
| |
| |
| |
| |
| | |
Since there are separate SSE2 implementations of xvid_idct_put/add, this
patch has no practical impact on performance.
|
| |
| |
| |
| |
| |
| | |
3d6535983282bea542dac2e568ae50da5796be34
See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html
|
|\|
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '0361e4dcb4d394c88c33364415a3b8fe315b67d1':
h264_qpel: x86: Move function with only one instance out of template macro
Note: warning is present with clang.
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
| |
| |
| |
| | |
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
|
| |
| |
| |
| |
| | |
libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This makes it match the pattern already used for VP8 MC functions.
This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
| |
| |
| |
| | |
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
|
| |
| |
| |
| | |
This follows the Libav layout to ease merges.
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| |
| | |
3d6535983282bea542dac2e568ae50da5796be34
Unrolling the loops triples the size of the assembled output
while not generating any gain in performance.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b':
hevc: x86: Add add_residual() SIMD optimizations
See a6af4bf64dae46356a5f91537a1c8c5f86456b37
This merge is only cosmetics (renames, space shuffling, etc).
The functional changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| |
| |
| |
| | |
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
|
| |
| |
| |
| |
| | |
Its single forward declaration can be moved to the only place
it is used, as is done for all other dsp init files.
|
| |
| |
| |
| | |
This will simplify the incoming merge.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'b89804da9bad2d94dd95bf20ac6187447e9c17e9':
x86: videodsp: Add parentheses to expression to work around warning
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
|