path: root/libavcodec/aarch64
Commit message (Author, Age)
...
| * aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon (Martin Storsjö, 2019-02-19)
| |
| |   The original arm version didn't do saturation here. This probably
| |   doesn't make any difference for performance, but reduces the
| |   differences between the arm and aarch64 versions.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
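To make the distinction in the commit above concrete, here is a scalar C model of the two narrowing behaviours: a plain narrowing shift (SHRN-style) keeps only the low 16 bits of the shifted value, while a saturating one (SQSHRN-style) clamps to the int16_t range. The helper names are hypothetical and this is not the NEON code from the commit; it is only a sketch of when the two results can differ. It assumes that `>>` on a negative `int32_t` is an arithmetic shift, which is what common compilers provide.

```c
#include <stdint.h>

/* Scalar model of a plain narrowing right shift: shift, then keep only
 * the low 16 bits of the result. Hypothetical helper name. */
static int16_t narrow_shift_wrap(int32_t x, unsigned shift)
{
    /* Assumes arithmetic right shift for negative x. */
    uint16_t low = (uint16_t)((uint32_t)(x >> shift) & 0xFFFFu);
    /* Reinterpret the 16 bits as two's complement without relying on
     * implementation-defined narrowing conversions. */
    return low >= 0x8000u ? (int16_t)((int)low - 0x10000) : (int16_t)low;
}

/* Scalar model of a saturating narrowing right shift: shift, then clamp
 * to the int16_t range. Hypothetical helper name. */
static int16_t narrow_shift_sat(int32_t x, unsigned shift)
{
    int32_t v = x >> shift;
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}
```

The two functions agree whenever the shifted value already fits in 16 bits, which is presumably why the commit expects no behavioural difference for valid input.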
* | Merge commit '37394ef01b040605f8e1c98e73aa12b1c0bcba07' (James Almer, 2019-03-14)
|\|
| |   * commit '37394ef01b040605f8e1c98e73aa12b1c0bcba07':
| |     aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 (Martin Storsjö, 2019-02-19)
| |
| |   This makes it similar to put_epel16_v6, and gives a large speedup
| |   on Cortex A53, a minor speedup on A72 and a very minor slowdown on A73.
| |
| |   Before:                     Cortex A53     A72     A73
| |   vp8_put_epel16_h6v6_neon:       2211.4  1586.5  1431.7
| |   After:
| |   vp8_put_epel16_h6v6_neon:       1736.9  1522.0  1448.1
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit 'e39a9212ab37a55b346801c77487d8a47b6f9fe2' (James Almer, 2019-03-14)
|\|
| |   * commit 'e39a9212ab37a55b346801c77487d8a47b6f9fe2':
| |     aarch64: vp8: Port bilin functions from arm version
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Port bilin functions from arm version (Martin Storsjö, 2019-02-19)
| |
| |                             Cortex A53     A72     A73
| |   vp8_put_bilin4_h_c:            303.8   102.2   161.8
| |   vp8_put_bilin4_h_neon:         100.0    40.9    41.2
| |   vp8_put_bilin4_hv_c:           322.8   201.0   305.9
| |   vp8_put_bilin4_hv_neon:        156.8    72.6    77.0
| |   vp8_put_bilin4_v_c:            304.7   101.7   166.5
| |   vp8_put_bilin4_v_neon:          82.7    41.2    33.0
| |   vp8_put_bilin8_h_c:           1192.7   352.5   623.8
| |   vp8_put_bilin8_h_neon:         213.5    70.2    87.8
| |   vp8_put_bilin8_hv_c:          1098.6   769.2  1041.9
| |   vp8_put_bilin8_hv_neon:        324.0   123.5   146.0
| |   vp8_put_bilin8_v_c:           1193.9   350.4   617.7
| |   vp8_put_bilin8_v_neon:         183.9    60.7    64.7
| |   vp8_put_bilin16_h_c:          2353.1   671.2  1223.3
| |   vp8_put_bilin16_h_neon:        261.9   140.7   145.0
| |   vp8_put_bilin16_hv_c:         2453.2  1470.9  2355.2
| |   vp8_put_bilin16_hv_neon:       383.9   196.0   217.0
| |   vp8_put_bilin16_v_c:          2349.3   669.8  1251.2
| |   vp8_put_bilin16_v_neon:        202.9   110.7    96.2
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit '58d154922707bfeb873cb3a7476e0f94b17463dd' (James Almer, 2019-03-14)
|\|
| |   * commit '58d154922707bfeb873cb3a7476e0f94b17463dd':
| |     aarch64: vp8: Port epel4 functions from arm version
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Port epel4 functions from arm version (Martin Storsjö, 2019-02-19)
| |
| |                             Cortex A53     A72     A73
| |   vp8_put_epel4_h4_c:            631.4   291.7   367.8
| |   vp8_put_epel4_h4_neon:         241.0   131.0   155.7
| |   vp8_put_epel4_h4v4_c:          967.5   529.3   667.7
| |   vp8_put_epel4_h4v4_neon:       429.3   241.8   279.7
| |   vp8_put_epel4_h4v6_c:         1374.7   657.5   864.5
| |   vp8_put_epel4_h4v6_neon:       515.5   295.5   334.7
| |   vp8_put_epel4_h6_c:            851.0   421.0   486.0
| |   vp8_put_epel4_h6_neon:         321.5   195.0   217.7
| |   vp8_put_epel4_h6v4_c:         1111.3   621.1   781.2
| |   vp8_put_epel4_h6v4_neon:       539.2   328.0   365.3
| |   vp8_put_epel4_h6v6_c:         1561.3   763.3   999.7
| |   vp8_put_epel4_h6v6_neon:       645.5   401.0   434.7
| |   vp8_put_epel4_v4_c:            663.8   298.3   357.0
| |   vp8_put_epel4_v4_neon:         116.0    81.5    72.5
| |   vp8_put_epel4_v6_c:            870.5   437.0   507.4
| |   vp8_put_epel4_v6_neon:         147.7   108.8    92.0
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit 'cc7ba00c35faf0478f1f56215e926f70ccb31282' (James Almer, 2019-03-14)
|\|
| |   * commit 'cc7ba00c35faf0478f1f56215e926f70ccb31282':
| |     aarch64: vp8: Port missing epel8 functions from arm version
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Port missing epel8 functions from arm version (Martin Storsjö, 2019-02-19)
| |
| |                             Cortex A53     A72     A73
| |   vp8_put_epel8_h4_c:           2594.8  1159.6  1374.8
| |   vp8_put_epel8_h4_neon:         506.4   244.2   314.0
| |   vp8_put_epel8_h6_c:           3445.8  1677.1  1811.3
| |   vp8_put_epel8_h6_neon:         634.4   371.7   433.0
| |   vp8_put_epel8_v4_c:           2614.0  1174.8  1378.0
| |   vp8_put_epel8_v4_neon:         321.0   221.7   235.8
| |   vp8_put_epel8_v6_c:           3635.5  1703.0  2079.2
| |   vp8_put_epel8_v6_neon:         416.9   317.0   295.5
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit '52c9b0a6c0d02cff6caebcf6989e565e05b55200' (James Almer, 2019-03-14)
|\|
| |   * commit '52c9b0a6c0d02cff6caebcf6989e565e05b55200':
| |     aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version (Martin Storsjö, 2019-02-19)
| |
| |                             Cortex A53     A72     A73
| |   vp8_luma_dc_wht_c:             115.7    75.7    90.7
| |   vp8_luma_dc_wht_neon:           60.7    41.2    45.7
| |   vp8_idct_dc_add4uv_c:          376.1   262.9   282.5
| |   vp8_idct_dc_add4uv_neon:        52.0    29.0    37.0
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit 'c513fcd7d235aa4cef45a6c3125bd4dcc03bf276' (James Almer, 2019-03-14)
|\|
| |   * commit 'c513fcd7d235aa4cef45a6c3125bd4dcc03bf276':
| |     aarch64: vp8: Fix a typo in a comment
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Fix a typo in a comment (Martin Storsjö, 2019-02-19)
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit 'f1011ea28a4048ddec97794ca3e9901474fe055f' (James Almer, 2019-03-14)
|\|
| |   * commit 'f1011ea28a4048ddec97794ca3e9901474fe055f':
| |     aarch64: vp8: Reorder the function pointer inits to match the arm original
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Reorder the function pointer inits to match the arm original (Martin Storsjö, 2019-02-19)
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp8: Move the vp8dsp makefile entries to the right places (Martin Storsjö, 2019-02-19)
| |
| |   Even if NEON would be disabled, the init functions should be built
| |   as they are called as long as ARCH_AARCH64 is set.
| |
| |   These functions are part of a generic DSP subsystem, not tied directly
| |   to one decoder. (They should be built if the vp7 decoder is enabled,
| |   even if the vp8 decoder is disabled.)
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp8: Remove superfluous includes (Martin Storsjö, 2019-02-19)
| |
| |   This fixes building with MSVC, which lacks unistd.h.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit '85bfaa4949f4afcde19061def3e8a18988964858' (James Almer, 2019-03-14)
|\|
| |   * commit '85bfaa4949f4afcde19061def3e8a18988964858':
| |     aarch64: vp8: Use the proper aarch64 form for conditional branches
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: vp8: Use the proper aarch64 form for conditional branches (Martin Storsjö, 2019-02-19)
| |
| |   The previous form does seem to assemble on current tools, but
| |   I think it might fail on some older aarch64 tools.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp8: Fix assembling with armasm64 (Martin Storsjö, 2019-02-19)
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp8: Fix assembling with clang (Martin Storsjö, 2019-02-19)
| |
| |   This also partially fixes assembling with MS armasm64 (via
| |   gas-preprocessor).
| |
| |   The movrel macro invocations need to pass the offset via a separate
| |   parameter. Mach-o and COFF relocations don't allow a negative offset
| |   to a symbol, which is handled properly if the offset is passed via
| |   the parameter. If no offset parameter is given, the macro evaluates
| |   to something like "adrp x17, subpel_filters-16+(0)", which older
| |   clang versions also fail to parse (the older clang versions only
| |   support one single offset term, although it can be a parenthesis).
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | Merge commit '0801853e640624537db386727b36fa97aa6258e7' (James Almer, 2019-03-14)
|\|
| |   * commit '0801853e640624537db386727b36fa97aa6258e7':
| |     libavcodec: vp8 neon optimizations for aarch64
| |
| |   See 833fed5253617924c41132e0ab261c1d8c076360
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * libavcodec: vp8 neon optimizations for aarch64 (Magnus Röös, 2019-02-19)
| |
| |   Partial port of the ARM Neon for aarch64.
| |
| |   Benchmarks from fate:
| |
| |   benchmarking with Linux Perf Monitoring API
| |   nop: 58.6
| |   checkasm: using random seed 1760970128
| |   NEON:
| |    - vp8dsp.idct       [OK]
| |    - vp8dsp.mc         [OK]
| |    - vp8dsp.loopfilter [OK]
| |   checkasm: all 21 tests passed
| |   vp8_idct_add_c:                      201.6
| |   vp8_idct_add_neon:                    83.1
| |   vp8_idct_dc_add_c:                   107.6
| |   vp8_idct_dc_add_neon:                 33.8
| |   vp8_idct_dc_add4y_c:                 426.4
| |   vp8_idct_dc_add4y_neon:               59.4
| |   vp8_loop_filter8uv_h_c:              688.1
| |   vp8_loop_filter8uv_h_neon:           216.3
| |   vp8_loop_filter8uv_inner_h_c:        649.3
| |   vp8_loop_filter8uv_inner_h_neon:     195.3
| |   vp8_loop_filter8uv_inner_v_c:        544.8
| |   vp8_loop_filter8uv_inner_v_neon:     131.3
| |   vp8_loop_filter8uv_v_c:              706.1
| |   vp8_loop_filter8uv_v_neon:           141.1
| |   vp8_loop_filter16y_h_c:              668.8
| |   vp8_loop_filter16y_h_neon:           242.8
| |   vp8_loop_filter16y_inner_h_c:        647.3
| |   vp8_loop_filter16y_inner_h_neon:     224.6
| |   vp8_loop_filter16y_inner_v_c:        647.8
| |   vp8_loop_filter16y_inner_v_neon:     128.8
| |   vp8_loop_filter16y_v_c:              721.8
| |   vp8_loop_filter16y_v_neon:           154.3
| |   vp8_loop_filter_simple_h_c:          387.8
| |   vp8_loop_filter_simple_h_neon:       187.6
| |   vp8_loop_filter_simple_v_c:          384.1
| |   vp8_loop_filter_simple_v_neon:        78.6
| |   vp8_put_epel8_h4v4_c:               3971.1
| |   vp8_put_epel8_h4v4_neon:             855.1
| |   vp8_put_epel8_h4v6_c:               5060.1
| |   vp8_put_epel8_h4v6_neon:             989.6
| |   vp8_put_epel8_h6v4_c:               4320.8
| |   vp8_put_epel8_h6v4_neon:            1007.3
| |   vp8_put_epel8_h6v6_c:               5449.3
| |   vp8_put_epel8_h6v6_neon:            1158.1
| |   vp8_put_epel16_h6_c:                6683.8
| |   vp8_put_epel16_h6_neon:              831.8
| |   vp8_put_epel16_h6v6_c:             11110.8
| |   vp8_put_epel16_h6v6_neon:           2214.8
| |   vp8_put_epel16_v6_c:                7024.8
| |   vp8_put_epel16_v6_neon:              799.6
| |   vp8_put_pixels8_c:                   112.8
| |   vp8_put_pixels8_neon:                 78.1
| |   vp8_put_pixels16_c:                  131.3
| |   vp8_put_pixels16_neon:               129.8
| |
| |   This contains a fix to include guards by Carl Eugen Hoyos.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
* | lavc/aarch64/h264dsp_init: Only use neon horizontal intra loopfilter for 4:2:0. (Carl Eugen Hoyos, 2019-02-20)
| |
* | aarch64/h264dsp: change loop filter stride argument to ptrdiff_t (James Almer, 2019-02-20)
| |
| |   This was missed in d5d699ab6e6f8a8290748d107416fd5c19757a1b
| |
| |   Signed-off-by: James Almer <jamrial@gmail.com>
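As background for the stride changes in this part of the log, the following C sketch shows why the width of a stride argument matters once it is used as a 64-bit address offset. It assumes the usual AArch64 situation where a 32-bit `int` argument occupies only the low half of a 64-bit register, so assembly that uses the full register must sign-extend it first (or the prototype must use `ptrdiff_t`). The helper names are hypothetical and only model the arithmetic; they are not code from FFmpeg.

```c
#include <stdint.h>

/* Offset seen when the low 32 bits of a stride are treated as an
 * unsigned 64-bit quantity, i.e. zero-extended. Hypothetical helper. */
static uint64_t offset_zero_extended(int stride)
{
    return (uint64_t)(uint32_t)stride;
}

/* Offset seen when the stride is sign-extended to 64 bits first, which
 * is what a ptrdiff_t parameter or an explicit sign extension gives. */
static int64_t offset_sign_extended(int stride)
{
    return (int64_t)stride;
}
```

For a positive stride both helpers return the same value; for a negative stride such as -16 the zero-extended form becomes a large positive offset, which would address memory far from the intended row.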
* | Merge commit '28a8b5413b64b831dfb8650208bccd8b78360484' (James Almer, 2019-02-20)
|\|
| |   * commit '28a8b5413b64b831dfb8650208bccd8b78360484':
| |     h264/aarch64: add intra loop filter neon asm
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * h264/aarch64: add intra loop filter neon asm (Janne Grunau, 2019-01-26)
| |
| |   Add my neon asm from x264 relicensed under the LGPL 2.1 or later.
| |   Ported (x264 uses nv12 chroma) and optimized.
| |
| |   Cycle count for checkasm --bench on a Snapdragon 820e:
| |   h264_h_loop_filter_luma_intra_8bpp_c:               60.0
| |   h264_h_loop_filter_luma_intra_8bpp_neon:            54.2
| |   h264_v_loop_filter_luma_intra_8bpp_c:              148.3
| |   h264_v_loop_filter_luma_intra_8bpp_neon:            73.8
| |   h264_h_loop_filter_chroma_intra_8bpp_c:             27.8
| |   h264_h_loop_filter_chroma_intra_8bpp_neon:          21.4
| |   h264_h_loop_filter_chroma_mbaff_intra_8bpp_c:       15.8
| |   h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon:    15.7
| |   h264_v_loop_filter_chroma_intra_8bpp_c:             45.8
| |   h264_v_loop_filter_chroma_intra_8bpp_neon:          17.3
* | Merge commit '846c3d6aca5484904e60946c4fe8b8833bc07f92' (James Almer, 2019-02-20)
|\|
| |   * commit '846c3d6aca5484904e60946c4fe8b8833bc07f92':
| |     h264/aarch64: optimize neon loop filter
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * h264/aarch64: optimize neon loop filter (Janne Grunau, 2019-01-26)
| |
| |   Exit as soon as possible if no filtering will be done.
| |
| |   Improves the checkasm --bench cycle count on a Snapdragon 820e:
| |   h264_h_loop_filter_luma_8bpp_c:        72.4 ->  72.5
| |   h264_h_loop_filter_luma_8bpp_neon:     97.1 ->  56.3
| |   h264_v_loop_filter_luma_8bpp_c:       174.0 -> 173.5
| |   h264_v_loop_filter_luma_8bpp_neon:     62.9 ->  60.9
| |   h264_h_loop_filter_chroma_8bpp_c:      30.2 ->  30.3
| |   h264_h_loop_filter_chroma_8bpp_neon:   51.6 ->  25.7
| |   h264_v_loop_filter_chroma_8bpp_c:      57.3 ->  57.3
| |   h264_v_loop_filter_chroma_8bpp_neon:   28.0 ->  24.0
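The commit above does not spell out which condition triggers the early exit. As an illustration only, the sketch below uses the standard H.264 deblocking thresholds (a position is filtered when |p0-q0| < alpha, |p1-p0| < beta and |q1-q0| < beta) and reports whether any position along an edge needs work, so a caller could return before doing the expensive part. The function names and the array-based layout are assumptions made for this example, not the NEON implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if one position across the edge passes the standard H.264
 * deblocking thresholds, 0 otherwise. Hypothetical helper name. */
static int needs_filter(int p1, int p0, int q0, int q1, int alpha, int beta)
{
    return abs(p0 - q0) < alpha && abs(p1 - p0) < beta && abs(q1 - q0) < beta;
}

/* Returns 1 if at least one of the n positions along the edge would be
 * filtered. A filter routine could call this first and exit early on 0. */
static int edge_has_work(const uint8_t *p1, const uint8_t *p0,
                         const uint8_t *q0, const uint8_t *q1,
                         int n, int alpha, int beta)
{
    /* With alpha == 0 or beta == 0 the strict comparisons can never hold. */
    if (alpha == 0 || beta == 0)
        return 0;
    for (int i = 0; i < n; i++)
        if (needs_filter(p1[i], p0[i], q0[i], q1[i], alpha, beta))
            return 1;
    return 0;
}
```

In a vectorised version the same idea amounts to computing the comparison mask for all lanes and branching out if the mask is entirely zero.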
* | Merge commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22' (James Almer, 2019-02-20)
|\|
| |   * commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22':
| |     h264/aarch64: sign extend int stride in loop filter asm
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * h264/aarch64: sign extend int stride in loop filter asm (Janne Grunau, 2019-01-26)
| |
* | aarch64: vp8: Move the vp8dsp makefile entries to the right places (Martin Storsjö, 2019-02-19)
| |
| |   Even if NEON would be disabled, the init functions should be built
| |   as they are called as long as ARCH_AARCH64 is set.
| |
| |   These functions are part of a generic DSP subsystem, not tied directly
| |   to one decoder. (They should be built if the vp7 decoder is enabled,
| |   even if the vp8 decoder is disabled.)
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| |   (cherry picked from commit b4b27dce95a6d40bfcd78043d3abec7d80dae143)
* | aarch64: vp8: Remove superfluous includes (Martin Storsjö, 2019-02-19)
| |
| |   This fixes building with MSVC, which lacks unistd.h.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| |   (cherry picked from commit ad32f7b1264dbc614f0db1c443d5361420e9e07e)
* | aarch64: vp8: Fix assembling with armasm64 (Martin Storsjö, 2019-02-19)
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| |   (cherry picked from commit 2eeac79936e83c4495cbe5905064ab797e9b45ff)
* | aarch64: vp8: Fix assembling with clang (Martin Storsjö, 2019-02-19)
| |
| |   This also partially fixes assembling with MS armasm64 (via
| |   gas-preprocessor).
| |
| |   The movrel macro invocations need to pass the offset via a separate
| |   parameter. Mach-o and COFF relocations don't allow a negative offset
| |   to a symbol, which is handled properly if the offset is passed via
| |   the parameter. If no offset parameter is given, the macro evaluates
| |   to something like "adrp x17, subpel_filters-16+(0)", which older
| |   clang versions also fail to parse (the older clang versions only
| |   support one single offset term, although it can be a parenthesis).
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| |   (cherry picked from commit 26d7af4c381ee3c7b13b032b3817168b84b98ca6)
* | lavc/aarch64/vp8dsp: Fix the include guard. (Carl Eugen Hoyos, 2019-01-31)
| |
| |   Fixes fate-source.
* | libavcodec: vp8 neon optimizations for aarch64 (Magnus Röös, 2019-01-31)
| |
| |   Partial port of the ARM Neon for aarch64.
| |
| |   Benchmarks from fate:
| |
| |   benchmarking with Linux Perf Monitoring API
| |   nop: 58.6
| |   checkasm: using random seed 1760970128
| |   NEON:
| |    - vp8dsp.idct       [OK]
| |    - vp8dsp.mc         [OK]
| |    - vp8dsp.loopfilter [OK]
| |   checkasm: all 21 tests passed
| |   vp8_idct_add_c:                      201.6
| |   vp8_idct_add_neon:                    83.1
| |   vp8_idct_dc_add_c:                   107.6
| |   vp8_idct_dc_add_neon:                 33.8
| |   vp8_idct_dc_add4y_c:                 426.4
| |   vp8_idct_dc_add4y_neon:               59.4
| |   vp8_loop_filter8uv_h_c:              688.1
| |   vp8_loop_filter8uv_h_neon:           216.3
| |   vp8_loop_filter8uv_inner_h_c:        649.3
| |   vp8_loop_filter8uv_inner_h_neon:     195.3
| |   vp8_loop_filter8uv_inner_v_c:        544.8
| |   vp8_loop_filter8uv_inner_v_neon:     131.3
| |   vp8_loop_filter8uv_v_c:              706.1
| |   vp8_loop_filter8uv_v_neon:           141.1
| |   vp8_loop_filter16y_h_c:              668.8
| |   vp8_loop_filter16y_h_neon:           242.8
| |   vp8_loop_filter16y_inner_h_c:        647.3
| |   vp8_loop_filter16y_inner_h_neon:     224.6
| |   vp8_loop_filter16y_inner_v_c:        647.8
| |   vp8_loop_filter16y_inner_v_neon:     128.8
| |   vp8_loop_filter16y_v_c:              721.8
| |   vp8_loop_filter16y_v_neon:           154.3
| |   vp8_loop_filter_simple_h_c:          387.8
| |   vp8_loop_filter_simple_h_neon:       187.6
| |   vp8_loop_filter_simple_v_c:          384.1
| |   vp8_loop_filter_simple_v_neon:        78.6
| |   vp8_put_epel8_h4v4_c:               3971.1
| |   vp8_put_epel8_h4v4_neon:             855.1
| |   vp8_put_epel8_h4v6_c:               5060.1
| |   vp8_put_epel8_h4v6_neon:             989.6
| |   vp8_put_epel8_h6v4_c:               4320.8
| |   vp8_put_epel8_h6v4_neon:            1007.3
| |   vp8_put_epel8_h6v6_c:               5449.3
| |   vp8_put_epel8_h6v6_neon:            1158.1
| |   vp8_put_epel16_h6_c:                6683.8
| |   vp8_put_epel16_h6_neon:              831.8
| |   vp8_put_epel16_h6v6_c:             11110.8
| |   vp8_put_epel16_h6v6_neon:           2214.8
| |   vp8_put_epel16_v6_c:                7024.8
| |   vp8_put_epel16_v6_neon:              799.6
| |   vp8_put_pixels8_c:                   112.8
| |   vp8_put_pixels8_neon:                 78.1
| |   vp8_put_pixels16_c:                  131.3
| |   vp8_put_pixels16_neon:               129.8
| |
| |   Signed-off-by: Magnus Röös <mla2.roos@gmail.com>
* | libavcodec: Remove dynamic relocs from aarch64/h264idct_neon.S (Manoj Gupta, 2019-01-03)
| |
| |   Some of the assembly functions, e.g. ff_h264_idct_dc_add_neon,
| |   have code like:
| |       movrel x14, X(ff_h264_idct_add_neon)
| |
| |   The linker cannot resolve these fully at link time and emits
| |   dynamic relocations. Use explicit labels instead so that no
| |   dynamic relocations are needed at all.
| |
| |   This avoids lld complaining about text relocations.
| |
| |   For background, see https://crbug.com/917919
| |
| |   Signed-off-by: Manoj Gupta <manojgupta@chromium.org>
| |   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* | lavc/aarch64/h264dsp_init_aarch64: Fix weight function prototypes. (Carl Eugen Hoyos, 2018-07-13)
| |
| |   Fixes the following warnings:
| |   libavcodec/aarch64/h264dsp_init_aarch64.c: In function ‘ff_h264dsp_init_aarch64’:
| |   libavcodec/aarch64/h264dsp_init_aarch64.c:84:38: warning: assignment from incompatible pointer type [enabled by default]
| |           c->weight_h264_pixels_tab[0] = ff_weight_h264_pixels_16_neon;
| |                                        ^
| |   libavcodec/aarch64/h264dsp_init_aarch64.c:85:38: warning: assignment from incompatible pointer type [enabled by default]
| |           c->weight_h264_pixels_tab[1] = ff_weight_h264_pixels_8_neon;
| |                                        ^
| |   libavcodec/aarch64/h264dsp_init_aarch64.c:86:38: warning: assignment from incompatible pointer type [enabled by default]
| |           c->weight_h264_pixels_tab[2] = ff_weight_h264_pixels_4_neon;
| |                                        ^
| |   libavcodec/aarch64/h264dsp_init_aarch64.c:88:40: warning: assignment from incompatible pointer type [enabled by default]
| |           c->biweight_h264_pixels_tab[0] = ff_biweight_h264_pixels_16_neon;
| |                                          ^
| |   libavcodec/aarch64/h264dsp_init_aarch64.c:89:40: warning: assignment from incompatible pointer type [enabled by default]
| |           c->biweight_h264_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
| |                                          ^
| |   libavcodec/aarch64/h264dsp_init_aarch64.c:90:40: warning: assignment from incompatible pointer type [enabled by default]
| |           c->biweight_h264_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
| |                                          ^
* | lavc/aarch64/sbrdsp_neon: fix build on old binutils (Rodger Combs, 2018-01-26)
| |
* | Merge commit '732510636e597585a79be7d111c88b3f7e174fe7' (James Almer, 2017-11-11)
|\|
| |   * commit '732510636e597585a79be7d111c88b3f7e174fe7':
| |     aarch64: Remove a dot from a label
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * aarch64: Remove a dot from a label (Martin Storsjö, 2017-10-18)
| |
| |   This fixes building with armasm64 (when run through gas-preprocessor).
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp9: Fix assembling with Xcode 6.2 and older (Memphiz, 2017-06-20)
| |
| |   Properly use the b.eq/b.ge forms instead of the nonstandard forms
| |   (which both gas and newer clang accept though), and expand the
| |   register list that used a range (which the Xcode 6.2 clang, based
| |   on clang 3.5 svn, didn't support).
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * arm/aarch64: vp9: Fix vertical alignment (Martin Storsjö, 2017-03-16)
| |
| |   Align the second/third operands as they usually are.
| |
| |   Due to the wildly varying sizes of the written out operands
| |   in aarch64 assembly, the column alignment is usually not as clear
| |   as in arm assembly.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * arm/aarch64: vp9itxfm: Skip loading the min_eob pointer when it won't be used (Martin Storsjö, 2017-03-11)
| |
| |   In the half/quarter cases where we don't use the min_eob array, defer
| |   loading the pointer until we know it will be needed.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp9itxfm: Reorder iadst16 coeffs (Martin Storsjö, 2017-02-24)
| |
| |   This matches the order they are in the 16 bpp version.
| |
| |   There they are in this order, to make sure we access them in the
| |   same order they are declared, easing loading only half of the
| |   coefficients at a time.
| |
| |   This makes the 8 bpp version match the 16 bpp version better.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp9itxfm: Reorder the idct coefficients for better pairing (Martin Storsjö, 2017-02-24)
| |
| |   All elements are used pairwise, except for the first one.
| |   Previously, the 16th element was unused. Move the unused element
| |   to the second slot, to make the later element pairs not split
| |   across registers.
| |
| |   This simplifies loading only parts of the coefficients,
| |   reducing the difference to the 16 bpp version.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp9itxfm: Avoid reloading the idct32 coefficients (Martin Storsjö, 2017-02-24)
| |
| |   The idct32x32 function actually pushed d8-d15 onto the stack even
| |   though it didn't clobber them; there are plenty of registers that
| |   can be used to allow keeping all the idct coefficients in registers
| |   without having to reload different subsets of them at different
| |   stages in the transform.
| |
| |   After this, we still can skip pushing d12-d15.
| |
| |   Before:
| |   vp9_inv_dct_dct_32x32_sub32_add_neon:    8128.3
| |   After:
| |   vp9_inv_dct_dct_32x32_sub32_add_neon:    8053.3
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 (Martin Storsjö, 2017-02-24)
| |
| |   This is one cycle faster in total, and three instructions fewer.
| |
| |   Before:
| |   vp9_loop_filter_mix2_v_44_16_neon:   123.2
| |   After:
| |   vp9_loop_filter_mix2_v_44_16_neon:   122.2
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit (Martin Storsjö, 2017-02-24)
| |
| |   The theoretical maximum value of E is 193, so we can just
| |   saturate the addition to 255.
| |
| |   Before:                         Cortex A7     A8     A9    A53  A53/AArch64
| |   vp9_loop_filter_v_4_8_neon:         143.0  127.7  114.8   88.0         87.7
| |   vp9_loop_filter_v_8_8_neon:         241.0  197.2  173.7  140.0        136.7
| |   vp9_loop_filter_v_16_8_neon:        497.0  419.5  379.7  293.0        275.7
| |   vp9_loop_filter_v_16_16_neon:       965.2  818.7  731.4  579.0        452.0
| |   After:
| |   vp9_loop_filter_v_4_8_neon:         136.0  125.7  112.6   84.0         83.0
| |   vp9_loop_filter_v_8_8_neon:         234.0  195.5  171.5  136.0        133.7
| |   vp9_loop_filter_v_16_8_neon:        490.0  417.5  377.7  289.0        271.0
| |   vp9_loop_filter_v_16_16_neon:       951.2  814.7  732.3  571.0        446.7
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
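The reasoning in the last commit can be checked by brute force. The sketch below assumes the edge test has the usual VP9 form `2*|p0-q0| + |p1-q1|/2 <= E` (the commit message itself does not state the formula) and compares a full-precision evaluation with one where every addition saturates at 255. All function names are hypothetical. A saturated sum of 255 can only be mistaken for a pass when E itself is 255, so for any E up to 193 the two variants should agree.

```c
#include <stdint.h>

/* 8-bit addition that saturates at 255 instead of wrapping. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255u ? 255u : (uint8_t)s;
}

/* Reference: evaluate d0*2 + d1/2 <= E in full (int) precision.
 * d0 stands for |p0-q0| and d1 for |p1-q1| (assumed formula). */
static int edge_test_wide(uint8_t d0, uint8_t d1, uint8_t e)
{
    return d0 * 2 + (d1 >> 1) <= e;
}

/* 8-bit variant: every addition saturates at 255. */
static int edge_test_sat8(uint8_t d0, uint8_t d1, uint8_t e)
{
    uint8_t twice = sat_add_u8(d0, d0);
    uint8_t sum = sat_add_u8(twice, (uint8_t)(d1 >> 1));
    return sum <= e;
}

/* Counts the (d0, d1, E) triples with E <= max_e (max_e <= 255) for
 * which the two variants disagree. */
static long count_mismatches(unsigned max_e)
{
    long bad = 0;
    for (unsigned e = 0; e <= max_e; e++)
        for (unsigned d0 = 0; d0 < 256; d0++)
            for (unsigned d1 = 0; d1 < 256; d1++)
                bad += edge_test_wide((uint8_t)d0, (uint8_t)d1, (uint8_t)e) !=
                       edge_test_sat8((uint8_t)d0, (uint8_t)d1, (uint8_t)e);
    return bad;
}
```

Calling `count_mismatches(193)` walks every input for the range of E named in the commit, while `count_mismatches(255)` includes the one threshold where saturation would change the outcome.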