author | Christophe Gisquet <christophe.gisquet@gmail.com> | 2015-10-12 19:37:46 +0200 |
---|---|---|
committer | Michael Niedermayer <michael@niedermayer.cc> | 2015-10-13 12:51:10 +0200 |
commit | e652f69b354bc6b5819012979985794cfd2805c9 (patch) | |
tree | e4a5f232f369360fa587de0d9ccc64692004a1fb /libavcodec/x86/proresdsp.asm | |
parent | 3b336ec2fbd4b9e16144d3247428009c6fb301f0 (diff) |
x86: simple_idct10_template: fix overflow in pass
When the input of a pass has 15 or 16 bits of precision (in particular
the column pass), the addition of a bias to W4 may lead to overflows
in the input to pmaddwd.
Fixing this requires postponing the addition of the bias until after the
first butterfly. To do so, m15, which was zeroed but otherwise unused, is
exploited. If the pass is safe, an address can be used directly, and the
number of xmm regs can be decreased. Otherwise, the 32-bit bias is loaded
into it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
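The overflow described in the commit message can be illustrated with a scalar C sketch. It is not the assembly from the template: the function names, the W4 value of 16384, and the word bias of 88 are assumptions chosen for illustration, and the 16-bit wraparound is modeled explicitly to mimic a packed word add feeding pmaddwd.

```c
#include <stdint.h>

/* Model the wraparound of a packed 16-bit add: reduce modulo 2^16,
 * then reinterpret the result as a signed word. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v; /* modulo 2^16, well defined in C */
    return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Hypothetical "early bias" path: the bias is added to the 16-bit input
 * word before the multiply, so the sum can wrap for large inputs. */
static int32_t early_bias(int16_t in, int16_t bias_w, int16_t w4)
{
    return (int32_t)wrap16((int32_t)in + bias_w) * w4;
}

/* Hypothetical "late bias" path: multiply first, then add the bias as a
 * 32-bit value, which is the postponement the commit message describes. */
static int32_t late_bias(int16_t in, int32_t bias_d, int16_t w4)
{
    return (int32_t)in * w4 + bias_d;
}
```

With these assumed values, an input of 100 gives the same result on both paths, while an input of 32757 (about 15 bits of magnitude) makes `early_bias` wrap to a negative product and `late_bias` stays positive.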
Diffstat (limited to 'libavcodec/x86/proresdsp.asm')
-rw-r--r-- | libavcodec/x86/proresdsp.asm | 8 |
1 files changed, 4 insertions, 4 deletions
diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm
index 18cf15b3ca..3fb71badba 100644
--- a/libavcodec/x86/proresdsp.asm
+++ b/libavcodec/x86/proresdsp.asm
@@ -37,17 +37,17 @@ cextern pw_1019
 
 section .text align=16
 
-%macro idct_put_fn 1
-cglobal prores_idct_put_10, 4, 4, %1
+%macro idct_put_fn 0
+cglobal prores_idct_put_10, 4, 4, 15
     IDCT_PUT_FN pw_1, 15, pw_88, 18, pw_4, pw_1019, r3
     RET
 %endmacro
 
 INIT_XMM sse2
-idct_put_fn 16
+idct_put_fn
 %if HAVE_AVX_EXTERNAL
 INIT_XMM avx
-idct_put_fn 16
+idct_put_fn
 %endif
 
 %endif