| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
ff_hevc_put_hevc_qpel_h64_8_sse4 56782981
ff_hevc_put_hevc_qpel_h64_8_avx2 40097816
ff_hevc_put_hevc_qpel_h64_8_avx512icl 25488576
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
|
| |
ff_hevc_put_hevc_qpel_h32_8_sse4 14122151
ff_hevc_put_hevc_qpel_h32_8_avx2 9337675
ff_hevc_put_hevc_qpel_h32_8_avx512icl 6424654
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
| |
ff_hevc_put_hevc_qpel_h4_8_sse4 993694
ff_hevc_put_hevc_qpel_h4_8_avx512icl 686647
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
| |
ff_hevc_put_hevc_qpel_h16_8_sse4 3290870
ff_hevc_put_hevc_qpel_h16_8_avx512icl 1730033
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit uses the instruction `vpdpbusd` introduced by AVX512 VNNI
to calculate the horizontal filter.
ff_hevc_put_hevc_qpel_h8_8_sse4 1039169
ff_hevc_put_hevc_qpel_h8_8_avx512icl 677153
ff_hevc_put_hevc_qpel_hv8_8_sse4 3603511
ff_hevc_put_hevc_qpel_hv8_8_avx512icl 2995354
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b':
hevc: x86: Add add_residual() SIMD optimizations
See a6af4bf64dae46356a5f91537a1c8c5f86456b37
This merge is only cosmetics (renames, space shuffling, etc).
The functionnal changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation
Merged-by: Clément Bœsch <u@pkh.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* commit 'fca3c3b61952aacc45e9ca54d86a762946c21942':
hevc: Add AVX2 DC IDCT
Mostly noop as we already have that code.
In the ASM, code is merged with the exception of SECTION which is kept
uppercase for consistency with the rest of the codebase.
Still in the ASM, the prototype comment is fixed to honor the '_' added
from the original commit.
idct_dc_proto() is dropped as it's not used anymore here.
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d':
hevc: Separate adding residual to prediction from IDCT
This commit should be a noop but isn't because of the following renames:
- transform_add → add_residual
- transform_skip → dequant
- idct_4x4_luma → transform_4x4_luma
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
|
|
|
|
|
|
| |
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
before
33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
38138 decicycles in luma_bi_2, 523427 runs, 861 skips
13490 decicycles in luma_uni, 516138 runs, 8150 skips
after
20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
10397 decicycles in luma_uni, 515715 runs, 8573 skips
Conflicts:
libavcodec/x86/hevc_mc.asm
libavcodec/x86/hevcdsp_init.c
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
| |
~20% faster than AVX.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
| |
The dststride parameter is always MAX_PB_SIZE.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
|
| |
~15% faster than sse2
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
| |
Reviewed-by: James Almer <jamrial@gmail.com>
Approved-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:
idct8x8_dc
SSE2 MMXEXT C
cycles 22 26 57
idct16x16_dc
AVX2 SSE2 C
cycles 27 32 249
idct32x32_dc
AVX2 SSE2 C
cycles 62 126 1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
| |
cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
|
| |
Including stdint.h was enough for systems like Mingw, but apparently not for Linux.
This should fix make checkheaders failures on every platform
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
|
| |
Fixes make checkheaders
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
| |
not sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
pretty print x86
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|