| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
| |
This fixes the test with mmxext disabled because the current reference
frame hashes correspond to the non-bitexact mmxext optimizations.
|
|
|
|
|
|
| |
This error is treated specially by the API.
CC: libav-stable@libav.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size field in the header/footer accounts for the entire APE tag
structure except the 32 bytes from header, for compatibility with
APEv1.
Signed-off-by: James Almer <jamrial@gmail.com>
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the spec[1], a value of 0 means the footer is present and a value
of 1 means it's absent, the exact opposite of header presence flag where 1
means present and 0 absent.
The reason for this is compatibility with APEv1 tags, where there's no header,
footer presence was mandatory for all files, and the flags field was a zeroed
reserved field.
[1] http://wiki.hydrogenaud.io/index.php?title=Ape_Tags_Flags
Signed-off-by: James Almer <jamrial@gmail.com>
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
|
|
| |
Currently it incorrectly compares bits with bytes.
Also, move the check right before where it's relevant, so that the
correct number of remaining bits is used.
CC: libav-stable@libav.org
|
|
|
|
| |
avio_skip returns file position and overflows int
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the order they are in the 16 bpp version.
There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.
This makes the 8 bpp version match the 16 bpp version better.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the order they are in the 16 bpp version.
There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.
This makes the 8 bpp version match the 16 bpp version better.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
| |
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.
This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
| |
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.
This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idct32x32 function actually pushed d8-d15 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.
After this, we still can skip pushing d12-d15.
Before:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.
Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
in the idct16 function), and the lanewise vmul needs a register in
the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
while doing idct16.
While keeping these coefficients in registers, we still can skip pushing
q7.
Before: Cortex A7 A8 A9 A53
vp9_inv_dct_dct_32x32_sub32_add_neon: 18553.8 17182.7 14303.3 12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 18470.3 16717.7 14173.6 11860.8
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For this case, with 8 inputs but only changing 4 of them, we can fit
all 16 input pixels into a q register, and still have enough temporary
registers for doing the loop filter.
The wd=8 filters would require too many temporary registers for
processing all 16 pixels at once though.
Before: Cortex A7 A8 A9 A53
vp9_loop_filter_mix2_v_44_16_neon: 289.7 256.2 237.5 181.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 221.2 150.5 177.7 138.0
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
| |
This is one cycle faster in total, and three instructions fewer.
Before:
vp9_loop_filter_mix2_v_44_16_neon: 123.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 122.2
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The theoretical maximum value of E is 193, so we can just
saturate the addition to 255.
Before: Cortex A7 A8 A9 A53 A53/AArch64
vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7
vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7
vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7
vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0
After:
vp9_loop_filter_v_4_8_neon: 136.0 125.7 112.6 84.0 83.0
vp9_loop_filter_v_8_8_neon: 234.0 195.5 171.5 136.0 133.7
vp9_loop_filter_v_16_8_neon: 490.0 417.5 377.7 289.0 271.0
vp9_loop_filter_v_16_16_neon: 951.2 814.7 732.3 571.0 446.7
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
libavcodec/vaapi.h:58:1: warning: attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ts_offset was added to cluster timecode, but then effectively subtracted
back off the block timecode
When setting initial_padding for an audio stream, the timestamps are
written incorrectly to the mkv file. cluster timecode gets written
as pts0 + ts_offset which is correct, but then block timecode gets
written as pts - cluster timecode which expanded is
pts - (pts0 + ts_offset). Adding cluster and block tc back together:
cluster + block = (pts0 + ts_offset) + (pts - (pts0 + ts_offset)) = pts
But the result should be pts + ts_offset since demux will subtract the
CodecDelay element from pts and set initial_padding to CodecDelay.
This patch gives the correct result.
|
|
|
|
| |
This unclutters the top-level directory and groups related files together.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Have check_pkg_config() enable variables and set cflags and extralibs
instead of relegating that task to require_pkg_config. This simplifies
require_pkg_config(), is consistent with what other helper functions
like check_lib() do and allows getting rid of some manual variable
setting in places where check_pkg_config() is used.
|
|
|
|
| |
This is less confusing than encountering "" in the argument list.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This was broken by 4e528206bc4d968706401206cf54471739250ec7 - the webp
decoder was assuming that it could set the output pixfmt of the vp8
decoder directly, but after that change it no longer could because
ff_get_format() was used instead. This adds an internal get_format()
callback to webp use of the vp8 decoder to override the pixfmt
appropriately.
|
|
|
|
|
|
| |
The Intel proprietary VAAPI driver enforces the restriction that a
buffer must be created inside an existing context, so just ensure
this is always true.
|
|
|
|
|
| |
Previously this was leaking, though it actually hit an assert making
sure that the buffer had already been cleared when freeing the picture.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When slice_h is rounded up due to chroma subsampling, there's
a risk that jobnr * slice_h exceeds frame->height.
Prior to a638e9184d63, this wasn't an issue for the last slice
of a frame, since slice_end was set to frame->height for the last
slice.
a638e9184d63 tried to fix the case where other slices than the
last one would exceed frame->height (which can happen where the
number of slices/threads is very large compared to the frame
height).
However, the fix in a638e9184d63 instead broke other cases,
where slice_h * nb_threads < frame->height. Therefore, make
sure the last slice always ends at frame->height.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|
|
|
|
|
|
| |
This fixes building with clang for linux with PIC enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
| |
If the stream timebase is coarser than the muxing timebase then the
monotonisation process may fail because adding one to the timestamp
need not actually produce a different timestamp after the rescale.
|
|
|
|
|
|
| |
This avoids a lot of boilerplate code within the decoder wrapper itself.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
| |
Some muxers may use the BMP_HEADER Format Data size instead
of the ASF-specific one.
Bug-Id: 1020
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
|
| |
Newer versions of libxcb have xcb-foo pkg-config files that do not declare
their xcb dependency so that required linker flags will not be generated.
|
|
|
|
|
|
|
| |
Bug-Id: 1017
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
|
|
| |
The driver is somewhat bitrotten (not updated for years) but is still
usable for decoding with this change. To support it, this adds a new
driver quirk to indicate no support at all for surface attributes.
Based on a patch by wm4 <nfxjfg@googlemail.com>.
|
|
|
|
|
| |
In this case, the user only supplies a device and the frame context
is allocated internally by lavc.
|
|
|
|
| |
For use by codec implementations which can allocate frames internally.
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
| |
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
| |
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|