Commit message, Author, Age
...
* avcodec/ffv1enc: Don't create and keep unnecessary referenceAndreas Rheinhardt2022-08-18
| | | | Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/get_buffer: Don't get AVPixFmtDescriptor unnecessarilyAndreas Rheinhardt2022-08-18
| | | | | | | | | It is unused since 3575a495f6dcc395656343380e13c57d48b9f976 (and the error message is dangerous: av_get_pix_fmt_name(format) returns NULL iff av_pix_fmt_desc_get(format) returns NULL and using a NULL string for %s would be UB). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
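The hazard named above, handing a NULL pointer to a `%s` conversion, can be illustrated with a minimal sketch. `toy_fmt_name()` and `describe_format()` are hypothetical helpers invented for this illustration, not FFmpeg functions, and the commit message does not say that a guard like this was added; the sketch only shows the failure mode being avoided.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a name lookup that returns NULL
 * for unknown formats (as the commit message describes for
 * av_get_pix_fmt_name()). */
static const char *toy_fmt_name(int format)
{
    static const char *const names[] = { "yuv420p", "rgb24" };

    if (format < 0 || format >= (int)(sizeof(names) / sizeof(names[0])))
        return NULL;
    return names[format];
}

/* Formats a message without ever passing NULL to %s.
 * Returns the value returned by snprintf(). */
static int describe_format(char *dst, size_t size, int format)
{
    const char *name = toy_fmt_name(format);

    return snprintf(dst, size, "pixel format: %s", name ? name : "unknown");
}
```

Calling `describe_format(buf, sizeof(buf), 99)` in this sketch writes a fallback string instead of invoking undefined behaviour.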
* avcodec/mpegpicture: Reset fields explicitly instead of memsetting themAndreas Rheinhardt2022-08-18
| | | | | | | | Improves the grepability of the code. (Furthermore, I hope that no compiler will really call memset for 28 bytes.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
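As a rough illustration of the trade-off described above, the sketch below resets a few fields by name instead of zeroing a byte range. `ToyPicture` and its field names are assumptions made up for this example; they do not reproduce the real mpegpicture structure.

```c
#include <stdint.h>

/* Hypothetical picture state; field names are illustrative only. */
typedef struct ToyPicture {
    uint8_t *data;          /* must survive a reset */
    int      field_picture;
    int      reference;
    int      shared;
    int64_t  mb_mean;
} ToyPicture;

/* Each assignment is visible to grep, unlike a memset() over an
 * offsetof()-derived range that silently covers several fields. */
static void toy_picture_reset(ToyPicture *pic)
{
    pic->field_picture = 0;
    pic->reference     = 0;
    pic->shared        = 0;
    pic->mb_mean       = 0;
}
```

Searching for `->reference` now finds the reset site, which a range-based `memset()` would hide.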
* avcodec/h263dec: Don't set frame parameters redundantlyAndreas Rheinhardt2022-08-18
| | | | | | | This frame will be reset later in ff_mpv_frame_start() anyway. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/h263dec: Remove redundant code to set cur_pic_ptrAndreas Rheinhardt2022-08-18
| | | | | | | | | | | | | | It is done later in ff_mpv_frame_start() (and nobody uses current_picture_ptr between setting it in ff_mpv_frame_start()). (The reason the vsynth*-h263-obmc ref files change is because the call to ff_find_unused_picture() now happens after the older pictures have been unreferenced in ff_mpv_frame_start(), so that their slots in the picture array can be immediately reused; the obmc code is somehow buggy and changes its output depending on the earlier contents of the motion_val buffer.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* checkasm/sw_scale: hscale does not require cpuflag test.Alan Kelly2022-08-18
| | | | | | This is done in ff_shuffle_filter_coefficients. Signed-off-by: Anton Khirnov <anton@khirnov.net>
* libswscale: Enable hscale_avx2 for all input sizes.Alan Kelly2022-08-18
| | | | | | ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>
* sws: allow avx2 hscale to process inputs of any size.Alan Kelly2022-08-18
| | | | | | | The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. Signed-off-by: Anton Khirnov <anton@khirnov.net>
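The loop structure described above (a 16-wide main loop followed by a 4-wide tail) can be sketched in scalar C. This is not the AVX2 code: `scale_by_two()` and `BlockCounts` are hypothetical names, the doubling operation is a placeholder for the real filter, and the final one-at-a-time loop is an assumption added so the sketch accepts lengths that are not a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how many blocks each stage handled (for illustration). */
typedef struct BlockCounts {
    size_t wide;   /* 16-element blocks */
    size_t tail;   /* 4-element blocks  */
    size_t scalar; /* leftover elements */
} BlockCounts;

static BlockCounts scale_by_two(int16_t *dst, const uint8_t *src, size_t n)
{
    BlockCounts c = { 0, 0, 0 };
    size_t i = 0;

    /* Main loop: 16 elements per iteration. */
    for (; i + 16 <= n; i += 16) {
        for (size_t j = 0; j < 16; j++)
            dst[i + j] = (int16_t)(src[i + j] * 2);
        c.wide++;
    }
    /* Tail: 4 elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        for (size_t j = 0; j < 4; j++)
            dst[i + j] = (int16_t)(src[i + j] * 2);
        c.tail++;
    }
    /* Assumption of this sketch: handle any remainder one by one. */
    for (; i < n; i++) {
        dst[i] = (int16_t)(src[i] * 2);
        c.scalar++;
    }
    return c;
}
```

For an input of 39 elements, the sketch runs two wide blocks, one tail block and three scalar steps.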
* sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxextAlan Kelly2022-08-18
| | | | Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavc/aarch64: hevc_add_res add 12bit variantsJ. Dekker2022-08-18
| hevc_add_res_4x4_12_c: 46.0
| hevc_add_res_4x4_12_neon: 18.7
| hevc_add_res_8x8_12_c: 194.7
| hevc_add_res_8x8_12_neon: 25.2
| hevc_add_res_16x16_12_c: 716.0
| hevc_add_res_16x16_12_neon: 69.7
| hevc_add_res_32x32_12_c: 3820.7
| hevc_add_res_32x32_12_neon: 261.0
|
| Signed-off-by: J. Dekker <jdek@itanimul.li>
* aarch64: me_cmp: Remove a leftover unnecessary instructionMartin Storsjö2022-08-18
| | | | | | This was missed in a2e45ad407c526cd5ce2f3a361fb98084228cd6e. Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: Add neon implementation for pix_abs8Hubert Mazur2022-08-18
| | | | | | | | | | | | | | Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below. - pix_abs_1_0_c: 101.2 - pix_abs_1_0_neon: 22.5 - sad_1_c: 101.2 - sad_1_neon: 22.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: Add neon implementation for sse8Hubert Mazur2022-08-18
| | | | | | | | | | | | | Provide optimized implementation of sse8 function for arm64. Performance comparison tests are shown below. - sse_1_c: 130.7 - sse_1_neon: 29.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: Add neon implementation for pix_abs16_y2Hubert Mazur2022-08-18
| | | | | | | | | | | | | Provide optimized implementation of pix_abs16_y2 function for arm64. Performance comparison tests are shown below. pix_abs_0_2_c: 317.2 pix_abs_0_2_neon: 37.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: Add neon implementation for sse4Hubert Mazur2022-08-18
| | | | | | | | | | | | | Provide neon implementation for sse4 function. Performance comparison tests are shown below. - sse_2_c: 80.7 - sse_2_neon: 31.0 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: Add neon implementation for sse16Hubert Mazur2022-08-18
| | | | | | | | | | | | | Provide neon implementation for sse16 function. Performance comparison tests are shown below. - sse_0_c: 268.2 - sse_0_neon: 43.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64: me_cmp: Fix the indentation of function declarationsMartin Storsjö2022-08-18
| | | | Signed-off-by: Martin Storsjö <martin@martin.st>
* ffprobe: restore reporting error code for failed inputsGyan Doshi2022-08-17
| | | | | | c11fb46731 led to a regression whereby the return code for missing input or input probe is overridden by writer close return code and hence not conveyed in the exit code.
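The regression described above amounts to a later return value overwriting an earlier failure. A minimal sketch of the "first error wins" pattern follows; `combine_status()` is a hypothetical helper, not the actual ffprobe code.

```c
/* Returns the status to report: an earlier failure (negative value)
 * takes precedence over the result of the later close step. */
static int combine_status(int input_ret, int close_ret)
{
    return input_ret < 0 ? input_ret : close_ret;
}
```

With this ordering, a failed input probe still determines the exit code even when closing the writer succeeds.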
* avcodec/me_cmp: Remove now incorrect av_assert2()Andreas Rheinhardt2022-08-17
| | | | | | | | | | | | Since d69d12a5b9236b9d2f1fd247ea452f84cdd1aaf9 these av_assert2() (or more exactly, the ones in hadamard8_diff8x8_c() and hadamard8_intra8x8_c()) are hit. So just remove all of these asserts. (If the test were improved to know which functions expect h == 8 and which support any value, the asserts could be readded at the appropriate places.) Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* tools: Make sure to create the tools directory before building decode_simple.oMartin Storsjö2022-08-17
| | | | | | | | | | | This directory dependency is normally added implicitly by rules in ffbuild/common.mak; for tools it's created by a rule for TOOLOBJS. TOOLOBJS is populated implicitly from TOOLS, and decode_simple.o doesn't end up there because it's an odd occurrence of a lone object file in the tools subdirectory, not belonging to any other tool. Signed-off-by: Martin Storsjö <martin@martin.st>
* checkasm: motion: Test different h parametersMartin Storsjö2022-08-17
| | | | | | | | | | | | | | | | | | Previously, the checkasm test always passed h=8, so no other cases were tested. Out of the me_cmp functions, in practice, some functions are hardcoded to always assume an 8x8 block (ignoring the h parameter), while others do use the parameter. For those with hardcoded height, both the reference C function and the assembly implementations ignore the parameter similarly. The documentation for the functions indicates that heights between w/2 and 2*w, within the range of 4 to 16, should be supported. This patch just tests random heights in that range, without knowing what width the current function actually uses. Signed-off-by: Martin Storsjö <martin@martin.st>
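The height constraint quoted above can be written down as a small predicate, together with one way of drawing a test height from the overall 4 to 16 range. Both function names are hypothetical, and the modulo-based mapping is an assumption of this sketch, not the expression used in checkasm.

```c
/* Nonzero if h lies between w/2 and 2*w and within 4..16. */
static int height_supported(int w, int h)
{
    return h >= 4 && h <= 16 && 2 * h >= w && h <= 2 * w;
}

/* Maps an arbitrary random value onto the range 4..16 inclusive,
 * without knowing which block width the tested function uses. */
static int pick_test_height(unsigned r)
{
    return 4 + (int)(r % 13u);
}
```

For `w = 8` every height from 4 to 16 satisfies the predicate, while for `w = 16` only 8 to 16 do.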
* x86: Don't hardcode the height to 8 in sad8_xy2_mmxMartin Storsjö2022-08-17
| | | | | | | | | | | | The height is hardcoded in some of the me_cmp functions, but not in all of them. But in the case of all other functions, it's hardcoded in the same place in SIMD functions as in the C reference functions, while this one function differs from the behaviour of the C code. (Before 542765ce3eccbca587d54262a512cbdb1407230d, there were a couple other sad8_*_mmx functions with similar hardcoded height.) Signed-off-by: Martin Storsjö <martin@martin.st>
* checkasm: Provide enough alignment in the new yuv2plane1 testMartin Storsjö2022-08-16
| | | | | | This fixes the checkasm test in some setups on x86. Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: reformat add_res funcsJ. Dekker2022-08-16
| | | | Signed-off-by: J. Dekker <jdek@itanimul.li>
* checkasm/hevc_add_res: add 12bit testJ. Dekker2022-08-16
| | | | | | | Also fix the bug where in every other byte only the lower 2 bits were used in the 8bit test. Signed-off-by: J. Dekker <jdek@itanimul.li>
* swscale/aarch64: add vscale specializationsSwinney, Jonathan2022-08-16
| This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance.
|
| On AWS c7g (Graviton 3, Neoverse V1) instances:
|                                    before    after
| yuv2yuvX_2_0_512_accurate_neon:     558.8    268.9
| yuv2yuvX_4_0_512_accurate_neon:     637.5    434.9
| yuv2yuvX_8_0_512_accurate_neon:    1144.8    806.2
| yuv2yuvX_16_0_512_accurate_neon:   2080.5   1853.7
|
| Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
| Signed-off-by: Martin Storsjö <martin@martin.st>
* swscale/aarch64: vscale optimizationSwinney, Jonathan2022-08-16
| | | | | | | | | | | | | Use scalar times vector multiply accumulate instructions instead of vector times vector to remove the need for replicating load instructions which are slightly slower. On AWS c7g (Graviton 3, Neoverse V1) instances: yuv2yuvX_8_0_512_accurate_neon: 1144.8 987.4 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>
* checkasm: updated tests for sw_scaleSwinney, Jonathan2022-08-16
| | | | | | | | | | | | | | | | | Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differ slightly). Test with and without SWS_ACCURATE_RND - if this flag is set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple of x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>
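The "allowed to be off by 2" comparison mentioned above can be sketched as a tolerance check over two byte buffers. `differs_by_more_than()` is a hypothetical name chosen for this illustration; a tolerance of 0 degenerates into an exact comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if any pair of samples differs by more than n,
 * and 0 if every pair is within the tolerance. */
static int differs_by_more_than(const uint8_t *ref, const uint8_t *test,
                                size_t size, int n)
{
    for (size_t i = 0; i < size; i++)
        if (abs((int)ref[i] - (int)test[i]) > n)
            return 1;
    return 0;
}
```

A test harness built this way would pass the tolerance as a parameter, so the same helper serves both the exact and the approximate case.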
* doc/APIchanges: add missing rgbaf16 pixfmt entryTimo Rothenpieler2022-08-16
|
* fftools/ffmpeg: store a separate copy of input codec parametersAnton Khirnov2022-08-16
| | | | | | | | | | | Use it instead of AVStream.codecpar in the main thread. While AVStream.codecpar is documented to only be updated when the stream is added or by avformat_find_stream_info(), it is actually updated during demuxing. Accessing it from a different thread then constitutes a race. Ideally, some mechanism should eventually be provided for signalling parameter updates to the user. Then the demuxing thread could pick up the changes and propagate them to the decoder.
* libswscale/aarch64: add another hscale specializationSwinney, Jonathan2022-08-16
| | | | | | | | | | | | This specialization handles the case where filtersize is 4 mod 8, e.g. 12, 20, etc. Aarch64 was previously using the C function for this case. This implementation speeds up that case significantly. hscale_8_to_15__fs_12_dstW_512_c: 6234.1 hscale_8_to_15__fs_12_dstW_512_neon: 1505.6 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>
* avformat/mov: fix encryption index in the case of multiple trunZhao Zhili2022-08-16
| | | | | | | | | | frag_stream_info->index_entry isn't the first sample/trun index. cenc.frag_index_entry_base failed to catch the case since current_index > 0. Fix ticket #9807. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* avformat/mov: fix frag_index.current out of syncZhao Zhili2022-08-16
| | | | | | | | | frag_index.current is used by cenc_filter, and is updated inside mov_read_moof. It can get out of sync with mov_read_packet. Partly fix ticket #9807. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* lavu/tx: optimize and simplify inverse MDCTsLynne2022-08-16
| | | | | | | | | | | | | | | | Convert the input from a scatter to a gather instead, which is faster and better for SIMD. Also, add a pre-shuffled exptab version to avoid gathering there at all. This doubles the exptab size, but the speedup makes it worth it. In SIMD, the exptab will likely be purged to a higher cache anyway because of the FFT in the middle, and the amount of loads stays identical. For a 960-point inverse MDCT, the speedup is 10%. This makes it possible to write sane and fast SIMD versions of inverse MDCTs.
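The difference between a scatter and a gather can be shown with a plain permutation. The sketch below is not the lavu/tx code; the function names and the use of an inverted index map are assumptions made for illustration. A scatter writes to computed addresses, while a gather reads from computed addresses and writes sequentially, which tends to suit SIMD stores better.

```c
/* Scatter: sequential reads, permuted writes. */
static void permute_scatter(float *out, const float *in,
                            const int *map, int n)
{
    for (int i = 0; i < n; i++)
        out[map[i]] = in[i];
}

/* Builds the inverse permutation so the same reordering can be
 * expressed as a gather. */
static void invert_map(int *inv, const int *map, int n)
{
    for (int i = 0; i < n; i++)
        inv[map[i]] = i;
}

/* Gather: permuted reads, sequential writes. */
static void permute_gather(float *out, const float *in,
                           const int *inv, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[inv[i]];
}
```

Both routines produce the same output for a given permutation; only the side on which the indirect access happens changes.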
* ipfsgateway: Remove default gatewayDerek Buitenhuis2022-08-15
| | | | | | | A gateway can see everything, and we should not be shipping a hardcoded default from a third party company; it's a security risk. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* avcodec/mpegvideo: Don't zero unnecessarilyAndreas Rheinhardt2022-08-15
| | | | Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/mpegvideodec: Constify some functionsAndreas Rheinhardt2022-08-15
| | | | Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/mpegpicture: Don't copy unnecessarily, fix raceAndreas Rheinhardt2022-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mpegvideo uses an array of Pictures and when it is done with using them, it only unreferences them incompletely: Some buffers are kept so that they can be reused later on if the same slot in the Picture array is reused, making this a sort of buffer pool. (Basically, a Picture is considered used if the AVFrame's buf is set.) Yet given that other pieces of the decoder may have a reference to these buffers, they need not be writable and are made writable using av_buffer_make_writable() when preparing a new Picture. This involves reading the buffer's data, although the old content of the buffer need not be retained. Worse, this read can be racy, because the buffer can be used by another thread at the same time. This happens for Real Video 3 and 4. This commit fixes this race by no longer copying the data; instead the old buffer is replaced by a new, zero-allocated buffer. (Here are the details of what happens with three or more decoding threads when decoding rv30.rm from the FATE-suite as happens in the rv30 test: The first decoding thread uses the first slot of its picture array to store its current pic; update_thread_context copies this for the second thread that decodes a P-frame. It uses the second slot in its Picture array to store its P-frame. This arrangement is then copied to the third decode thread, which decodes a B-frame. It uses the third slot in its Picture array for its current frame. update_thread_context copies this to the next thread. It unreferences the third slot containing the other B-frame and then it reuses this slot for its current frame. Because the pic array slots are only incompletely unreferenced, the buffers of the previous B-frame are still in there and they are not writable; in fact the previous thread is concurrently writing to them, causing races when making the buffer writable.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
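The idea of replacing a shared buffer with a fresh zeroed one, rather than copying it to make it writable, can be sketched with a toy reference-counted buffer. `ToyBuf` and its helpers are hypothetical and single-threaded (the reference count is a plain `int`); they only mimic the shape of the approach and are not the AVBuffer API.

```c
#include <stdlib.h>

/* Toy reference-counted buffer (not thread-safe, illustration only). */
typedef struct ToyBuf {
    unsigned char *data;
    size_t         size;
    int            refcount;
} ToyBuf;

/* Allocates a zero-filled buffer holding one reference. */
static ToyBuf *toy_allocz(size_t size)
{
    ToyBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = calloc(1, size);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size     = size;
    b->refcount = 1;
    return b;
}

/* Drops one reference and clears the caller's pointer. */
static void toy_unref(ToyBuf **pb)
{
    ToyBuf *b = *pb;
    if (!b)
        return;
    *pb = NULL;
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

/* Installs a new zeroed buffer of the same size without ever
 * reading the old buffer's data, so no copy is needed. */
static int toy_replace_with_zeroed(ToyBuf **pb)
{
    ToyBuf *fresh = toy_allocz((*pb)->size);
    if (!fresh)
        return -1;
    toy_unref(pb);
    *pb = fresh;
    return 0;
}
```

Because the old contents are never read, another holder of the old buffer can keep writing to it without this code observing that data.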
* avcodec/avcodec: Remove redundant checkAndreas Rheinhardt2022-08-15
| | | | | | | | At this point active_thread_type is set iff active_thread_type is set to FF_THREAD_FRAME iff AVCodecInternal.frame_thread_encoder is set. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/avcodec: Move initializing frame-thrd encoder to encode_preinitAndreas Rheinhardt2022-08-15
| | | | Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/vsrc_ddagrab: add options for more control over output format fallbackTimo Rothenpieler2022-08-13
|
* avfilter/vsrc_ddagrab: add rgbaf16 output supportTimo Rothenpieler2022-08-13
|
* avutil/hwcontext_d3d11va: add support for rgbaf16 pixel formatTimo Rothenpieler2022-08-13
|
* lavu/pixfmt: add packed RGBA float16 formatTimo Rothenpieler2022-08-13
| | | | | This is the default format of the Windows compositor and what DXGI Desktop Duplication will give you for any kind of HDR output.
* compat: add msvc windres wrapperTimo Rothenpieler2022-08-13
| | | | | This is by no means a complete wrapper. It's only designed to fit the use case FFmpeg's build system has.
* fftools: add DPI awareness manifestTimo Rothenpieler2022-08-13
| | | | | Some filters, like gdigrab, rely on this to be set to see and report proper dimensions.
* configure: always enable gnu_windres if availableTimo Rothenpieler2022-08-13
| | | | | Use the appropriate Makefile variable to ensure the resource file is only built into shared libraries instead.
* fftools/ffmpeg: move packet timestamp processing to demuxer threadAnton Khirnov2022-08-13
| | | | | | | | | Discontinuity detection/correction is left in the main thread, as it is entangled with InputStream.next_dts and related variables, which may be set by decoding code. Fixes races e.g. in fate-ffmpeg-streamloop after aae9de0cb2887e6e0bbfda6ffdf85ab77d3390f0.
* fftools/ffmpeg: use a separate variable for discontinuity offsetAnton Khirnov2022-08-13
| | | | | | This will allow moving normal offset handling to the demuxer thread, since discontinuities currently have to be processed in the main thread, as the code uses some decoder-produced values.
* fftools/ffmpeg: simplify conditions in ts_discontinuity_processAnton Khirnov2022-08-13
|