aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 - libav.git

diff options

author	Martin Storsjö <martin@martin.st>	2016-11-18 12:26:04 +0200
committer	Martin Storsjö <martin@martin.st>	2016-11-30 23:57:05 +0200
commit	cad42fadcd2c2ae1b3676bb398844a1f521a2d7b (patch)
tree	f345f4a01a70a84dc6abf6a3cce6cc1ad984f887 /libavformat/img2dec.c
parent	9c8bc74c2b40537b0997f646c87c008042d788c2 (diff)

aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0 By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1 vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5 vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5 vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3 vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4 vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8 I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. Signed-off-by: Martin Storsjö <martin@martin.st>

Diffstat (limited to 'libavformat/img2dec.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: