| Commit message (Collapse) | Author | Age |
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
| |
Also use a vpx prefix for them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
|
|
| |
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
vc1dsp.vc1_unescape_buffer_c: 918624.7
vc1dsp.vc1_unescape_buffer_neon: 142958.0
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the time, the worst case happens about 40% of the
time, and the complexity of the remaining cases fall somewhere in between.
Therefore, taking the average of the best and worst case timings is
probably a conservative estimate of the degree by which the NEON code
improves performance.
vc1dsp.vc1_h_loop_filter4_bestcase_c: 19.0
vc1dsp.vc1_h_loop_filter4_bestcase_neon: 48.5
vc1dsp.vc1_h_loop_filter4_worstcase_c: 144.7
vc1dsp.vc1_h_loop_filter4_worstcase_neon: 76.2
vc1dsp.vc1_h_loop_filter8_bestcase_c: 41.0
vc1dsp.vc1_h_loop_filter8_bestcase_neon: 75.0
vc1dsp.vc1_h_loop_filter8_worstcase_c: 294.0
vc1dsp.vc1_h_loop_filter8_worstcase_neon: 102.7
vc1dsp.vc1_h_loop_filter16_bestcase_c: 54.7
vc1dsp.vc1_h_loop_filter16_bestcase_neon: 130.0
vc1dsp.vc1_h_loop_filter16_worstcase_c: 569.7
vc1dsp.vc1_h_loop_filter16_worstcase_neon: 186.7
vc1dsp.vc1_v_loop_filter4_bestcase_c: 20.2
vc1dsp.vc1_v_loop_filter4_bestcase_neon: 47.2
vc1dsp.vc1_v_loop_filter4_worstcase_c: 164.2
vc1dsp.vc1_v_loop_filter4_worstcase_neon: 68.5
vc1dsp.vc1_v_loop_filter8_bestcase_c: 43.5
vc1dsp.vc1_v_loop_filter8_bestcase_neon: 55.2
vc1dsp.vc1_v_loop_filter8_worstcase_c: 316.2
vc1dsp.vc1_v_loop_filter8_worstcase_neon: 72.7
vc1dsp.vc1_v_loop_filter16_bestcase_c: 62.2
vc1dsp.vc1_v_loop_filter16_bestcase_neon: 103.7
vc1dsp.vc1_v_loop_filter16_worstcase_c: 646.5
vc1dsp.vc1_v_loop_filter16_worstcase_neon: 110.7
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
| |
This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
| |
This reverts commit 2589060b92eeeb944c6e2b50e38412c0c5fabcf4 which was
originally to fix the FATE test. The real cause of the test breakage was
fixed in 22b7c37275c611b5417722d8941844028aed7f25.
Signed-off-by: J. Dekker <jdek@itanimul.li>
|
|
|
|
|
|
|
|
|
|
|
| |
The assembly is written assuming that the width is a multiple of 8.
However the real issue is the functions were errorneously assigned to
the 2, 4, 6 & 12 widths. This behaviour never broke the decoder as
samples which trigger the functions for these widths have not been found
in the wild. This relies on the mappings in ff_hevc_pel_weight[].
Signed-off-by: J. Dekker <jdek@itanimul.li>
|
|
|
|
|
|
|
| |
Don't use the loaded registers directly, avoiding stalls on in
order cores. Use vrhadd.u8 with q registers where easily possible.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This unbreaks the fate-checkasm-hevc_pel test on arm targets.
The assembly assumed that the width passed to the DSP functions is
a multiple of 8, while the checkasm test used other widths too.
This wasn't noticed before, because the hevc_pel checkasm tests
(that were added in 9c513edb7999a35ddcc6e3a8d984a96c8fb492a3 in
January) weren't run as part of fate until in
b492cacffd36ad4cb251ba1f13ac398318ee639a in August.
As this hasn't been an issue in practice with actual full decoding
tests, it seems like the actual decoder doesn't call these functions
with such widths. Therefore, we could alternatively fix the test
to only test things that the real decoder does, and this modification
could be reverted.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
| |
Fixes many -Warray-parameter warnings from GCC 11.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
| |
Deprecated in commits 7fc329e2dd6226dfecaa4a1d7adf353bf2773726
and 31f6a4b4b83aca1d73f3cfc99ce2b39331970bf3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
| |
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
| |
No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
simply too much work for code meant to be all removed anyway.
|
| |
|
|
|
|
| |
They are not properly namespaced and not intended for public use.
|
|
|
|
| |
That is a more appropriate place for it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cortex A7 A8 A9 A53 A72
get_pixels_c: 144.7 146.0 143.0 137.7 69.0
get_pixels_armv6: 112.0 106.7 90.2 95.0 72.5
get_pixels_neon: 69.0 29.7 68.7 40.2 19.0
get_pixels_unaligned_c: 144.7 146.2 143.0 137.7 69.0
get_pixels_unaligned_neon: 77.0 36.5 72.5 48.5 19.0
diff_pixels_c: 376.7 319.7 265.5 307.7 148.0
diff_pixels_armv6: 179.0 159.5 205.5 139.0 142.0
diff_pixels_neon: 69.0 40.2 77.5 53.2 26.0
diff_pixels_unaligned_c: 376.7 319.7 265.5 307.7 148.0
diff_pixels_unaligned_neon: 85.0 54.5 93.5 66.7 26.0
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix an occasional crash for hevc decoder in ARM 32 platform, the
root cause is the memory over read(read cross the memory boundary)
in SAO NENO functions ff_hevc_sao_band_filter_neon_8 and
ff_hevc_sao_edge_filter_neon_8.
After this fix, the crash disapper in the massive Android phone
test.
Signed-off-by: qoroliang <qoroliang@tencent.com>
|
|
|
|
| |
Signed-off-by: Aman Gupta <aman@tmm1.net>
|
|\
| |
| |
| |
| |
| |
| | |
* commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff':
arm: Implement a NEON version of 422 h264_h_loop_filter_chroma
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Previously, the 420 version was used even for 422.
This fixes occasional checkasm failures.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'cef914e08310166112ac09567e66452a7679bfc8':
arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.
Before: Cortex A7 A8 A9 A53 A72
vp8_put_epel16_h6v6_neon: 3058.0 2218.5 2459.8 2183.0 1572.2
After:
vp8_put_epel16_h6v6_neon: 2670.8 1934.2 2244.4 1729.4 1503.9
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
| |
| |
| |
| |
| | |
This was missed in d5d699ab6e6f8a8290748d107416fd5c19757a1b
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| | |
Signed-off-by: Meng Wang <wangmeng.kids@bytedance.com>
Reviewed-by: Shengbin Meng <shengbinmeng@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Clang supports the macro expansion counter (used for making unique
labels within macro expansions), but not when targeting darwin.
Convert uses of the counter into normal local labels, as used
elsewhere.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'ab05d3934de8e932dbd77979a687e6598e67535c':
arm: vc1dsp: Add commas between macro arguments
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Checkasm timings:
block size bitdepth C NEON
4 8 bit: 146.7 48.7
10 bit: 146.7 52.7
8 8 bit: 430.3 84.4
10 bit: 430.4 119.5
12 8 bit: 812.8 141.0
10 bit: 812.8 195.0
16 8 bit: 1499.1 268.0
10 bit: 1498.9 368.4
24 8 bit: 4394.2 574.8
10 bit: 3696.3 804.8
32 8 bit: 5108.6 568.9
10 bit: 4249.6 918.8
48 8 bit: 16819.6 2304.9
10 bit: 13882.0 3178.5
64 8 bit: 13490.8 1799.5
10 bit: 11018.5 2519.4
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
| |
| |
| | |
This was originally based on libsbc, and was fully integrated into ffmpeg.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes high pitched shriek
Fixes: 25420848_1478428308873746_4255813235963330560_n.mp4
Reported-by: Dale Curtis <dalecurtis@google.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Compilation error "out of range" fixed for armeabi-v7a. Compilation failed
trying to build libvlc.aar for ARM7 android on ubuntu 16.04 host. Error
messages is "Offset out of range". The reason of the error is assembler LDR
directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage
in range <1k, but no such storage provided.
Based on a patch by Ihor Bobalo <bob@eleks.com>
Suggested-by: wbs
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'b487add7ecf78efda36d49815f8f8757bd24d4cb':
arm: Remove a redundant check in fmtconvert_init_arm.c
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
This was missed in e2710e790c0, where have_vfp && !have_vfpv3 were
converted into have_vfp_vm.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '9dde6ab06c48f9447cd16f39bee33569cddb7be4':
arm: Fix SIGBUS on ARM when compiled with binutils 2.29
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
added to the address of a Thumb function (previously nothing was added). This
allows the loaded address to be passed to a BLX instruction and the correct
mode change will occur.
See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458
By using adr with a label that isn't annotated as a thumb function,
we avoid the new behaviour in binutils 2.29 and get the same behaviour
as in prior releases, and as in other assemblers (ms armasm.exe,
clang's built in assembler) - an idea that Janne Grunau came up with.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'd7320ca3ed10f0d35b3740fa03341161e74275ea':
arm: Avoid using .dn register aliases
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
clang now (in the upcoming 5.0 version) is capable of building our
arm assembly without relying on gas-preprocessor, although clang/LLVM
doesn't support .dn register aliases.
The VC1 MC assembly was only built and used if the chosen assembler
supported the .dn directives though. This was supported as long as
gas-preprocessor was used.
This means that VC1 decoding got a speed regression on clang 5.0,
unless the user manually chose using gas-preprocessor again.
By avoiding using the .dn register aliases, we can build the VC1 MC
assembly with the latest clang version.
Support for the .dn/.qn directives in clang/LLVM isn't actively planned,
see https://bugs.llvm.org/show_bug.cgi?id=18199.
This partially reverts 896a5bff64264f4d01ed98eacc97a67260c1e17e.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'ce080f47b8b55ab3d41eb00487b138d9906d114d':
hevc: Add NEON 32x32 IDCT
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '118dd4a321a2d67f67c21b076abd0b4d939ab642':
hevc: 16x16 NEON idct: Use the right element size for loads/stores
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
This doesn't change the actual behaviour of the code but improves
readability.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'edbf0fffb15dde7a1de70b05855529d5fc769f14':
hevc: Add NEON add_residual for bitdepth 10
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'e1c2453a4fac1f7116244d0d05310935c20887e6':
arm: hevc_idct: Tune the add_res_8x8 and add_res_32x32 functions
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Before: Cortex A7 A8 A9 A53
hevc_add_res_8x8_8_neon: 116.0 58.7 80.2 90.7
hevc_add_res_32x32_8_neon: 1230.0 737.5 1187.5 974.4
After:
hevc_add_res_8x8_8_neon: 97.7 57.0 73.7 80.0
hevc_add_res_32x32_8_neon: 1216.0 698.7 1127.5 827.1
Signed-off-by: Martin Storsjö <martin@martin.st>
|