swscale/aarch64: add hscale specializations

This patch adds code to support specializations of the hscale function
and adds a specialization for filterSize == 4.

ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck
here is loading the data from src, this data is loaded a whole block
ahead and stored back to the stack to be loaded again with ld4. This
arranges the data for most efficient use of the vector instructions and
removes the need for completion adds at the end. The number of
iterations of the C per iteration of the assembly is increased from 4 to
8, but because of the prefetching, there must be a special section
without prefetching when dstW < 16.

This improves speed on Graviton 2 (Neoverse N1) dramatically in the case
where previously fs=8 would have been required.

before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
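
For readers unfamiliar with hscale, the following is a minimal scalar sketch of what a filterSize == 4 specialization computes per output pixel. It is not the NEON code from this patch. The function name hscale8to15_4_ref is hypothetical, and the 14-bit coefficient scale with a right shift by 7 is assumed to match the generic C path in libswscale.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for the filterSize == 4 case.
 * The name is illustrative only and is not part of FFmpeg. */
void hscale8to15_4_ref(int16_t *dst, int dstW, const uint8_t *src,
                       const int16_t *filter, const int32_t *filterPos)
{
    assert(dstW >= 0);
    for (int i = 0; i < dstW; i++) {
        const uint8_t *s = src + filterPos[i]; /* first source pixel for output i */
        const int16_t *f = filter + 4 * i;     /* four coefficients per output */
        int val = 0;

        /* Four taps: a fixed trip count is what lets a specialization unroll. */
        for (int j = 0; j < 4; j++)
            val += (int)s[j] * f[j];

        /* Assumed scaling: 8-bit input times 14-bit coefficients, shifted
         * down to 15 bits and clipped at the top of the range. */
        val >>= 7;
        dst[i] = (int16_t)(val > 32767 ? 32767 : val);
    }
}
```

The assembly in this patch produces 8 such outputs per loop iteration; the sketch only shows the arithmetic for one output at a time.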
author: Swinney, Jonathan <jswinney@amazon.com> 2022-05-26 02:02:13 +0000
committer: Martin Storsjö <martin@martin.st> 2022-05-28 01:09:05 +0300
commit: 0ea61725b1bd35f47d0ebc49597e73e6798c553d (patch)
tree: b3e4fc037138a450c0538bfc74cc4665b5c9fc3e /libswscale/utils.c
parent: 92ea8e03dfc17cc580e8c6e0fb1923a2c02f68aa (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/libswscale/utils.c b/libswscale/utils.c
index ffa130524a..105781c4f4 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1820,7 +1820,7 @@ av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
         {
             const int filterAlign = X86_MMX(cpu_flags)     ? 4 :
                                     PPC_ALTIVEC(cpu_flags) ? 8 :
-                                    have_neon(cpu_flags)   ? 8 : 1;
+                                    have_neon(cpu_flags)   ? 4 : 1;
 
             if ((ret = initFilter(&c->hLumFilter, &c->hLumFilterPos,
                            &c->hLumFilterSize, c->lumXInc,
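
The effect of changing filterAlign from 8 to 4 can be sketched as follows, assuming initFilter rounds the minimum tap count up to the next multiple of filterAlign. The helper name align_filter_size is hypothetical and only illustrates the rounding; it is not a libswscale function.

```c
#include <assert.h>

/* Hypothetical helper: round a minimum tap count up to a multiple of a
 * power-of-two alignment, as initFilter is assumed to do. */
int align_filter_size(int minFilterSize, int filterAlign)
{
    /* The bit trick below is only valid for power-of-two alignments. */
    assert(filterAlign > 0 && (filterAlign & (filterAlign - 1)) == 0);
    return (minFilterSize + filterAlign - 1) & ~(filterAlign - 1);
}
```

Under that assumption, a filter needing 3 or 4 taps was padded to 8 with the old alignment and stays at 4 with the new one, which is the case the commit message describes as "where previously fs=8 would have been required".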