| Commit message (Collapse) | Author | Age |
|
|
|
|
|
| |
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.
|
|
|
|
|
|
|
| |
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.
squash: rv34 idct rearrange partial register loads
|
|
|
|
| |
Implement 1-pass inverse transform and reconstruction for inter blocks.
|
|
|
|
|
| |
30-50% faster than the C implementation, 0.5% overall speedup on
bourne.rmvb.
|
|
|
|
|
|
|
|
|
|
|
| |
Perform dequantization while decoding coefficients instead of performing it
on the entire coefficients buffer.
Since quantized coefficients are very sparse, this usually causes a small
speedup. Speedup of around 1% on Panda board compared to the removed here
neon code. Global speedup is probably around 3%.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
|
|
|
|
| |
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
Signed-off-by: Mans Rullgard <mans@mansr.com>
|