| Commit message | Author | Age |
... | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips
9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips
1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips
2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips
5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1)
Adding SSE2 support should be relatively trivial (just a matter of
replacing the pshufb [mask_mix] with something else); patches welcome.
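The timer figures above can be turned into speedup ratios with simple arithmetic. The following is a minimal sketch; the helper names are hypothetical and are not part of the patch.

```c
#include <stdio.h>

/* Speedup of an optimized routine relative to the C reference,
 * given the two cycle counts reported by the timer output. */
static double speedup(double c_cycles, double simd_cycles)
{
    return c_cycles / simd_cycles;
}

/* Relative reduction, in percent, between two timings. */
static double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Prints the ratios for the figures quoted in the message above. */
static void print_loop_filter_summary(void)
{
    printf("loop_filter_v_88_16: %.2fx\n", speedup(9680, 1929));
    printf("loop_filter_h_88_16: %.2fx\n", speedup(9233, 2738));
    printf("overall decode time: %.1f%% lower\n",
           percent_saved(5.978, 5.417));
}
```

With the numbers quoted above, the vertical filter comes out at roughly 5.0x, the horizontal filter at roughly 3.4x, and the overall decode time drops by a little over 9%.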
|
| | |
| | |
| | |
| | |
| | |
| | | |
version>2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
AV_CH_LAYOUT_7POINT1_WIDE_BACK
This was suggested by Rodeo on IRC
<Rodeo> for consistency with the rest, MODE_7_1_FRONT_CENTER would be AV_CH_LAYOUT_7POINT1_WIDE_BACK (since LS+RS is mapped to back channels in other modes)
Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
AV_CH_LAYOUT_7POINT1
This was suggested by Rodeo on IRC
<Rodeo> sorry, I meant MODE_7_1_REAR_SURROUND would probably be AV_CH_LAYOUT_7POINT1
Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
| | |
| | |
| | | |
Allow some macro refactoring in filter14().
|
| | | |
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| | |
Introduce 2 additional registers for stride3 and mstride3 to allow
direct accesses (this drops the lea instructions).
3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3
Also use defines to clarify the code.
|
| |
| |
| |
| |
| |
| | |
No testcase known.
Reviewed-by: Michael Bradshaw
|
| |
| |
| |
| |
| |
| | |
Suggested by heleppkes on https://trac.ffmpeg.org/ticket/3133
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes invalid reads and crashes in vp90-2-05-resize.webm and fuzzed6.ivf.
The output is still not identical to what libvpx does (because we don't
actually scale in MC).
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Prevents some invalid memory accesses after resolution change in
vp90-2-05-resize.webm, and libvpx does this too.
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
97479 -> 54891 decicycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
truncating
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
The 7.1(wide) and 7.1(wide-side) channel layouts have been supported in fdk_aac since October 2013 (commit fa3eba1644).
Signed-off-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes the speed regression from 20626f53e9f41cb3db82329ed3db7d773cfa3a8f
and still performs enough checks to prevent accesses outside allocated
memory caused by the index.
Before:
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
After:
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388494 runs, 114 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes the speed regression from 20626f53e9f41cb3db82329ed3db7d773cfa3a8f
and still performs enough checks to prevent accesses outside allocated
memory caused by the index.
Before:
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
After:
1658 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '6d93307f8df81808f0dcdbc064b848054a6e83b3':
mpeg12: check scantable indices in all decode_block functions
Benchmarks
Before:
1878 decicycles in mpeg2_decode_block_non_intra, 8388487 runs, 121 skips
1700 decicycles in mpeg2_decode_block_intra, 4194239 runs, 65 skips
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388492 runs, 116 skips
1669 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips
--
2056 decicycles in mpeg1_decode_block_inter, 65535 runs, 1 skips
2346 decicycles in mpeg1_decode_block_intra, 32768 runs, 0 skips
2011 decicycles in mpeg1_fast_decode_block_inter, 65533 runs, 3 skips
----------------
After:
1858 decicycles in mpeg2_decode_block_non_intra, 8388490 runs, 118 skips
1691 decicycles in mpeg2_decode_block_intra, 4194233 runs, 71 skips
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
--
2010 decicycles in mpeg1_decode_block_inter, 65535 runs, 1 skips
2322 decicycles in mpeg1_decode_block_intra, 32766 runs, 2 skips
1995 decicycles in mpeg1_fast_decode_block_inter, 65535 runs, 1 skips
All benchmarks are the best scores of several runs
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add checks to the fast functions used with CODEC_FLAG2_FAST and move
the check for all other functions to before the invalid memory is
accessed. Fixes https://trac.videolan.org/vlc/ticket/9713 with
CODEC_FLAG2_FAST.
CC: libav-stable@libav.org
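The idea of checking the scan index before it is used can be sketched as follows. This is a simplified, hypothetical run/level loop, not the actual mpeg12 decode_block code; the function name, the parameter layout, and the assumption that every scantable entry is below 64 are illustrative only.

```c
#include <stdint.h>

#define BLOCK_COEFFS 64

/* Places (run, level) pairs into a 64-coefficient block through a scan
 * table. Each run is the number of scan positions to advance.
 * Returns 0 on success, -1 if a run would move the scan index past the
 * end of the block. Assumes every scantable entry is < BLOCK_COEFFS. */
static int place_coeffs(int16_t block[BLOCK_COEFFS],
                        const uint8_t scantable[BLOCK_COEFFS],
                        const uint8_t *runs, const int16_t *levels, int n)
{
    int i = 0; /* position in scan order; 0 is treated as already filled */

    for (int k = 0; k < n; k++) {
        i += runs[k];
        /* Check BEFORE scantable[i] is read, so an oversized run can
         * never index outside the table or the block. */
        if (i > BLOCK_COEFFS - 1)
            return -1;
        block[scantable[i]] = levels[k];
    }
    return 0;
}
```

Performing the comparison ahead of the table lookup is the point: a check placed after `scantable[i]` has been read would come too late to prevent the invalid access.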
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit 'fb0c9d41d685abb58575c5482ca33b8cd457c5ec':
avutil: remove timer.h include from internal.h
Conflicts:
libavcodec/ffv1dec.c
libavutil/internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Added libavutil/timer.h include to all files with {START,STOP}_TIMER.
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
Setting fps = 1/timebase is not correct
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
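As a general illustration of why the inverse of the time base is not a frame rate: the time base is only the unit in which timestamps are expressed, so the frame rate also depends on how many ticks each frame lasts. The sketch below uses a hypothetical rational type and helper (not libavutil's AVRational API) and assumes a constant per-frame duration; the specific code touched by this commit is not shown here.

```c
/* Minimal rational type for illustration only. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Frame rate implied by a constant frame duration given in time-base
 * ticks: fps = 1 / (ticks_per_frame * time_base).
 * The result is not reduced to lowest terms. */
static Rational fps_from_duration(Rational time_base, int ticks_per_frame)
{
    Rational fps = { time_base.den, time_base.num * ticks_per_frame };
    return fps;
}
```

For example, with a time base of 1/1000 and frames lasting 40 ticks, this gives 1000/40, that is 25 frames per second, whereas 1/timebase would claim 1000.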
|
| | |
|
| | |
|
| | |
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* rbultje/vp9-simd:
vp9: fix memory corruption if header decoding fails after size change.
vp9/x86: use explicit register for relative stack references.
vp9/x86: iwht4x4 (lossless) mmx.
vp9/x86: 4x4 iadst SIMD (ssse3) variants.
vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Before this patch, we explicitly modify rsp, which isn't necessarily
universally acceptable, since the space under the stack pointer might
be modified in things like signal handlers. Therefore, use an explicit
register to hold the stack pointer relative to the bottom of the stack
(i.e. rsp). This will also clear out valgrind errors about the use of
uninitialized data that started occurring after the idct16x16/ssse3
optimizations were first merged.
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Cycle measurements for intra itxfm_4x4_add on ped1080p.webm:
idct_idct: 66 -> 67 cycles (noise measurement)
idct_iadst: 199 -> 79 cycles
iadst_idct: 165 -> 70 cycles
iadst_iadst: 183 -> 82 cycles
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Cycle measurements for intra itxfm_8x8_add on ped1080p.webm:
idct_idct: 133 -> 135 cycles (noise measurement)
idct_iadst: 900 -> 241 cycles
iadst_idct: 864 -> 215 cycles
iadst_iadst: 973 -> 310 cycles
|
|\ \ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* qatar/master:
dxtory: compressed RGB555/RGB565 decoding support
Conflicts:
libavcodec/dxtory.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | | |
|
|\| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* commit '0e1ad2f591b87e944550c15b54e54f8189743289':
dxtory: add more compressed and uncompressed modes
Conflicts:
libavcodec/dxtory.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- The memcpy was completely wrong because
s->prob_ctx[s->framectxid].coef is a [4][2][2][6][6][3] array, whereas
s->prob.coef is a [4][2][2][6][6][11] array.
- The additional check was committed to ffmpeg by Ronald S. Bultje.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
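The shape mismatch described above can be sketched with stand-in types. The typedef names, the uint8_t element type, and the copy helper are assumptions made for illustration; only the array dimensions are taken from the message. A single memcpy over the whole array cannot work because the innermost rows differ in length (3 versus 11), so one possible approach is to copy row by row.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins with the dimensions quoted in the message above. */
typedef uint8_t CoefCtx[4][2][2][6][6][3];   /* like s->prob_ctx[...].coef */
typedef uint8_t CoefFull[4][2][2][6][6][11]; /* like s->prob.coef */

/* Copies the 3 stored entries of every innermost row into the first 3
 * slots of the matching 11-entry row, leaving the other 8 untouched. */
static void copy_coef(CoefFull *dst, CoefCtx *src)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                for (int l = 0; l < 6; l++)
                    for (int m = 0; m < 6; m++)
                        memcpy((*dst)[i][j][k][l][m],
                               (*src)[i][j][k][l][m],
                               sizeof((*src)[i][j][k][l][m]));
}
```

With one-byte elements the two arrays occupy 1728 and 6336 bytes respectively, so a flat copy of either size would either misplace the data or read past the end of the smaller array.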
|
| | |
| | |
| | |
| | |
| | |
| | | |
Fixes a particular YouTube video that I unfortunately can't share.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |/
|/|
| |
| |
| |
| |
| | |
freeing the associated rects.
Signed-off-by: Wim Vander Schelden <lists@fixnum.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
MPEG-1/2 should not need it
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
This is needed in case the checked bitstream reader is disabled
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
Prevents some overreads at the cost of 1 CPU cycle
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
Sandy Bridge i7: 274 -> 260 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
No speed loss measured
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
breaking out on invalid vlcs
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
Fixes CID1163850
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|