author | Rémi Denis-Courmont <remi@remlab.net> | 2022-10-02 14:54:59 +0300 |
---|---|---|
committer | Lynne <dev@lynne.ee> | 2022-10-05 08:26:19 +0200 |
commit | f0ef11ea835181c74afe11fea445790873d5f6bc (patch) | |
tree | 864452d44ab968f39d198c1a4e8ac92636cdf1e0 /libavcodec/bswapdsp.c | |
parent | 37d5ddc317c35bded22fee8d79020653781d8230 (diff) |
lavc/bswapdsp: RISC-V B bswap_buf
Simply using the Zbb REV8 instruction in a straightforward loop already
gives significant savings:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0
But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0 (this patch)
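
To illustrate the pseudo-SIMD idea, here is a minimal portable C sketch. It is not the assembly from this patch: `swap32`, `swap64`, `bswap32_buf_scalar` and `bswap32_buf_pairs` are hypothetical names, `swap64` stands in for the 64-bit REV8 instruction, and the explicit combination of two words models a single little-endian 64-bit load.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for a 32-bit byte swap. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Portable stand-in for the 64-bit REV8 instruction. */
static uint64_t swap64(uint64_t x)
{
    return ((uint64_t)swap32((uint32_t)x) << 32) | swap32((uint32_t)(x >> 32));
}

/* Baseline: one load, one swap and one store per element. */
static void bswap32_buf_scalar(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = swap32(src[i]);
}

/* Pseudo-SIMD: one 64-bit swap handles two 32-bit elements. */
static void bswap32_buf_pairs(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        /* Models one little-endian 64-bit load of src[i] and src[i + 1]. */
        uint64_t x = (uint64_t)src[i] | ((uint64_t)src[i + 1] << 32);
        uint64_t y = swap64(x); /* swaps both elements and their order */

        dst[i]     = (uint32_t)(y >> 32); /* the one additional shift */
        dst[i + 1] = (uint32_t)y;
    }
    if (i < n) /* odd trailing element */
        dst[i] = swap32(src[i]);
}
```

Per pair of elements, the second function needs one load, one swap, one shift and two stores, where the baseline needs two loads, two swaps and two stores.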
On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:
bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7
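
The rejected bswap16_buf code is not part of this commit, so the following is only a guess at what the same trick looks like for 16-bit samples. The names `swap16`, `swap64` and `bswap16_buf_quads` are hypothetical, and `swap64` again stands in for the 64-bit REV8. The sketch shows how each 64-bit load now needs three shifts and four stores, instead of one shift and two stores in the 32-bit case.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for a 16-bit byte swap. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Portable stand-in for the 64-bit REV8 instruction. */
static uint64_t swap64(uint64_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);
        x >>= 8;
    }
    return r;
}

/* Four 16-bit elements per 64-bit swap: 3 shifts and 4 stores per load. */
static void bswap16_buf_quads(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Models one little-endian 64-bit load of four samples. */
        uint64_t x = (uint64_t)src[i]
                   | ((uint64_t)src[i + 1] << 16)
                   | ((uint64_t)src[i + 2] << 32)
                   | ((uint64_t)src[i + 3] << 48);
        uint64_t y = swap64(x); /* swaps each sample and reverses their order */

        dst[i]     = (uint16_t)(y >> 48);
        dst[i + 1] = (uint16_t)(y >> 32);
        dst[i + 2] = (uint16_t)(y >> 16);
        dst[i + 3] = (uint16_t)y;
    }
    for (; i < n; i++) /* remaining samples */
        dst[i] = swap16(src[i]);
}
```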
Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
Diffstat (limited to 'libavcodec/bswapdsp.c')
-rw-r--r-- | libavcodec/bswapdsp.c | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/libavcodec/bswapdsp.c b/libavcodec/bswapdsp.c
index 4c4ea10acc..f0ea2b55c5 100644
--- a/libavcodec/bswapdsp.c
+++ b/libavcodec/bswapdsp.c
@@ -51,7 +51,9 @@ av_cold void ff_bswapdsp_init(BswapDSPContext *c)
     c->bswap_buf = bswap_buf;
     c->bswap16_buf = bswap16_buf;
 
-#if ARCH_X86
+#if ARCH_RISCV
+    ff_bswapdsp_init_riscv(c);
+#elif ARCH_X86
     ff_bswapdsp_init_x86(c);
 #endif
 }
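
The hunk above follows a common dispatch pattern: the context is first filled with the generic C functions, then an architecture-specific init may overwrite individual pointers. Below is a minimal self-contained sketch of that pattern. All names (`SwapDSP`, `swap_buf_c`, `swap_buf_fast`, `swapdsp_init_arch`, `HAVE_FAST_ARCH`) and the `cpu_has_fast_swap` flag are hypothetical. The body of `ff_bswapdsp_init_riscv` is not shown in this diff, so the run-time check here is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DSP context holding function pointers. */
typedef struct SwapDSP {
    void (*swap_buf)(uint32_t *dst, const uint32_t *src, size_t n);
} SwapDSP;

static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Generic fallback, always available. */
static void swap_buf_c(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = swap32(src[i]);
}

/* Placeholder for an optimised routine; same behaviour in this sketch. */
static void swap_buf_fast(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = swap32(src[i]);
}

#define HAVE_FAST_ARCH 1 /* assumed compile-time architecture switch */

/* Architecture init: overrides a pointer only if the CPU supports it. */
static void swapdsp_init_arch(SwapDSP *c, int cpu_has_fast_swap)
{
    if (cpu_has_fast_swap)
        c->swap_buf = swap_buf_fast;
}

static void swapdsp_init(SwapDSP *c, int cpu_has_fast_swap)
{
    c->swap_buf = swap_buf_c; /* defaults first */
#if HAVE_FAST_ARCH
    swapdsp_init_arch(c, cpu_has_fast_swap); /* then optional overrides */
#endif
}
```

Because the defaults are assigned before the architecture hook runs, callers always get a valid function pointer even when no optimised routine applies.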