lavc/bswapdsp: RISC-V B bswap_buf

Simply taking the Zbb REV8 instruction into use in a simple loop gives
some significant savings:

bswap_buf_c:       1081.0
bswap_buf_rvb_b:    771.0

But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:

bswap_buf_c:       1081.0
bswap_buf_rvb_b:    341.0 (this patch)

On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:

bswap16_buf_c:     1542.0
bswap16_buf_rvb_b: 1803.7

Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:

bswap_buf_c:       1081.0
bswap_buf_rvb_b:    408.5
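
The pseudo-SIMD idea in the message above can be sketched in portable C. This is only an illustration, not the assembly added by the patch: rev8_64() and bswap_buf_pairs() are hypothetical names, and rev8_64() is a plain-C stand-in for what a single 64-bit REV8 would do. Reversing all eight bytes of a register holding two 32-bit samples byte-swaps both samples at once and also exchanges their positions, so one shift is enough to separate them again before the two stores.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the 8 bytes of a 64-bit value.
 * Stands in for a single 64-bit Zbb REV8 instruction. */
static uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical sketch: byte-swap len 32-bit samples, two per iteration. */
static void bswap_buf_pairs(uint32_t *dst, const uint32_t *src, int len)
{
    int i = 0;

    for (; i + 1 < len; i += 2) {
        /* Pack two samples into one 64-bit value. On a little-endian
         * 64-bit target this matches what one 64-bit load would fetch;
         * it is built by value here to stay endian-independent. */
        uint64_t v = (uint64_t)src[i] | ((uint64_t)src[i + 1] << 32);
        uint64_t r = rev8_64(v);

        /* The full reversal swapped the bytes inside each sample and
         * moved the first sample to the upper half. */
        dst[i]     = (uint32_t)(r >> 32); /* the one extra shift */
        dst[i + 1] = (uint32_t)r;
    }

    if (i < len) { /* odd tail: swap the last sample on its own */
        uint32_t x = src[i];
        dst[i] = (x >> 24) | ((x >> 8) & 0x0000FF00u) |
                 ((x << 8) & 0x00FF0000u) | (x << 24);
    }
}
```

The same trick applied to 16-bit samples would need three shifts and four stores per 64-bit word, which is consistent with the unfavorable bswap16_buf numbers quoted above.
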
author: Rémi Denis-Courmont <remi@remlab.net> 2022-10-02 14:54:59 +0300
committer: Lynne <dev@lynne.ee> 2022-10-05 08:26:19 +0200
commit: f0ef11ea835181c74afe11fea445790873d5f6bc (patch)
tree: 864452d44ab968f39d198c1a4e8ac92636cdf1e0 /libavcodec/bswapdsp.c
parent: 37d5ddc317c35bded22fee8d79020653781d8230 (diff)
1 files changed, 3 insertions, 1 deletions
diff --git a/libavcodec/bswapdsp.c b/libavcodec/bswapdsp.c
index 4c4ea10acc..f0ea2b55c5 100644
--- a/libavcodec/bswapdsp.c
+++ b/libavcodec/bswapdsp.c
@@ -51,7 +51,9 @@ av_cold void ff_bswapdsp_init(BswapDSPContext *c)
     c->bswap_buf   = bswap_buf;
     c->bswap16_buf = bswap16_buf;
 
-#if ARCH_X86
+#if ARCH_RISCV
+    ff_bswapdsp_init_riscv(c);
+#elif ARCH_X86
     ff_bswapdsp_init_x86(c);
 #endif
 }