path: root/libavutil
Commit message (Author, Date)
* avutil/dict: Error out in case of key == NULL (Andreas Rheinhardt, 2022-09-19)
| Up until now, using NULL as key in av_dict_get() on a non-empty AVDictionary would crash; using NULL as key in av_dict_set() would also crash for a non-empty AVDictionary unless AV_DICT_MULTIKEY was set; in case the dictionary was initially empty or AV_DICT_MULTIKEY was set, it was even possible for av_dict_set() to succeed when adding a NULL key, namely when one uses a value != NULL and the AV_DICT_DONT_STRDUP_VAL flag. Using av_dict_get() on such an AVDictionary will usually lead to crashes, though.
|
| Fix this by actually checking for key in both functions; error out if it is NULL. While at it, also stop relying on av_strdup(NULL) to return NULL in av_dict_set().
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
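The check described in this commit can be illustrated with a small standalone sketch. It is not the libavutil code: `ToyDict`, `toy_dict_get()` and `toy_dict_set()` are hypothetical names, the storage layout is invented for the example, and the error codes are plain negative errno values chosen for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy dictionary; this is not the real AVDictionary layout. */
typedef struct ToyEntry { char *key; char *value; } ToyEntry;
typedef struct ToyDict  { ToyEntry *elems; size_t count; } ToyDict;

/* Duplicate a string. The caller guarantees s != NULL, so nothing
 * depends on a "strdup(NULL) returns NULL" convention. */
static char *toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Look up key; a NULL dictionary or a NULL key yields "not found"
 * instead of dereferencing the NULL pointer in strcmp(). */
const char *toy_dict_get(const ToyDict *d, const char *key)
{
    if (!d || !key)
        return NULL;
    for (size_t i = 0; i < d->count; i++)
        if (!strcmp(d->elems[i].key, key))
            return d->elems[i].value;
    return NULL;
}

/* Append a key/value pair. Returns 0 on success or a negative errno value. */
int toy_dict_set(ToyDict *d, const char *key, const char *value)
{
    if (!d || !key || !value)
        return -EINVAL;           /* reject before touching the dictionary */

    char *k = toy_strdup(key);
    char *v = toy_strdup(value);
    ToyEntry *tmp = k && v ? realloc(d->elems, (d->count + 1) * sizeof(*tmp))
                           : NULL;
    if (!tmp) {                   /* any allocation failed: nothing changes */
        free(k);
        free(v);
        return -ENOMEM;
    }
    d->elems = tmp;
    d->elems[d->count].key   = k;
    d->elems[d->count].value = v;
    d->count++;
    return 0;
}
```

Rejecting the NULL key at the top of both functions means no entry with a NULL key can ever be stored, so later lookups never have to guard each comparison.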
* x86/tx_float: add asm call versions of the 2pt and 4pt transforms (Lynne, 2022-09-19)
| Verified to be working.
* x86/tx_float: fully support 128bit regs in LOAD64_LUT (Lynne, 2022-09-19)
| The gather path didn't support 128bit registers. It's not faster on Zen 3, but it's here for completeness.
* x86/tx_float: simplify and describe the intra-asm call convention (Lynne, 2022-09-19)
|
* lavu/pixdesc: favour formats where depth and subsampling exactly match (Philip Langdale, 2022-09-17)
| Since introducing the various packed formats used by VAAPI (and p012), we've noticed that there's actually a gap in how av_find_best_pix_fmt_of_2 works. It doesn't actually assign any value to having the same bit depth as the source format, when comparing against formats with a higher bit depth. This usually doesn't matter, because av_get_padded_bits_per_pixel() will account for it.
|
| However, as many of these formats use padding internally, we find that av_get_padded_bits_per_pixel() actually returns the same value for the 10 bit, 12 bit, 16 bit flavours, etc. In these tied situations, we end up just picking the first of the two provided formats, even if the second one should be preferred because it matches the actual bit depth.
|
| This bug already existed if you tried to compare yuv420p10 against p016 and p010, for example, but it simply hadn't come up before so we never noticed. But now, we actually got a situation in the VAAPI VP9 decoder where it offers both p010 and p012 because Profile 3 could be either depth and ends up picking p012 for 10 bit content due to the ordering of the testing.
|
| In addition, in the process of testing the fix, I realised we have the same gap when it comes to chroma subsampling - we do not favour a format that has exactly the same subsampling vs one with less subsampling when all else is equal.
|
| To fix this, I'm introducing a small score penalty if the bit depth or subsampling doesn't exactly match the source format. This will break the tie in favour of the format with the exact match, but not offset any of the other scoring penalties we already have.
|
| I have added a set of tests around these formats which will fail without this fix.
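The tie-break described above can be sketched with a toy scoring function. Everything here is an assumption made for illustration: `ToyFmt`, `toy_score()`, `toy_best_of_2()` and the penalty constants are hypothetical and do not reflect the actual scoring values in libavutil. The only point is that a penalty of 1 is too small to outweigh the large penalties, yet enough to break an otherwise exact tie.

```c
#include <stddef.h>

/* Hypothetical, simplified pixel format description. */
typedef struct ToyFmt {
    const char *name;
    int depth;          /* bits actually used per component */
    int log2_chroma_w;  /* horizontal chroma subsampling shift */
    int log2_chroma_h;  /* vertical chroma subsampling shift */
} ToyFmt;

/* Illustrative penalty values, not the ones used by libavutil. */
enum {
    TOY_LOSS_DEPTH    = 1 << 16, /* destination has fewer bits than source */
    TOY_LOSS_CHROMA   = 1 << 12, /* destination subsamples chroma more */
    TOY_INEXACT_MATCH = 1        /* small penalty that only breaks ties */
};

/* Higher score means a better destination format for this source. */
int toy_score(const ToyFmt *src, const ToyFmt *dst)
{
    int score = 1 << 24;

    if (dst->depth < src->depth)
        score -= TOY_LOSS_DEPTH;
    if (dst->log2_chroma_w > src->log2_chroma_w ||
        dst->log2_chroma_h > src->log2_chroma_h)
        score -= TOY_LOSS_CHROMA;

    /* Inexact but lossless matches lose a single point each. */
    if (dst->depth != src->depth)
        score -= TOY_INEXACT_MATCH;
    if (dst->log2_chroma_w != src->log2_chroma_w ||
        dst->log2_chroma_h != src->log2_chroma_h)
        score -= TOY_INEXACT_MATCH;

    return score;
}

/* On a tie the first candidate wins, mirroring the ordering problem
 * described in the commit message. */
const ToyFmt *toy_best_of_2(const ToyFmt *a, const ToyFmt *b,
                            const ToyFmt *src)
{
    return toy_score(src, b) > toy_score(src, a) ? b : a;
}
```

With a 10-bit 4:2:0 source, a 12-bit candidate offered first no longer wins over a 10-bit candidate offered second, because the 12-bit one now scores one point lower.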
* lavu/riscv: fix off-by-one in bit-magnitude clip (Rémi Denis-Courmont, 2022-09-15)
|
* avutil/lfg: fix comment typo (Rémi Denis-Courmont, 2022-09-15)
|
* lavu/riscv: fix av_clip_int16 (Rémi Denis-Courmont, 2022-09-14)
| Some serious copy-paste / squash / rebase mismanipulation here.
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/dict: Improve appending values (Andreas Rheinhardt, 2022-09-14)
| When appending two values (due to AV_DICT_APPEND), the earlier code would first zero-allocate a buffer of the required size and then copy both parts into it via av_strlcat(). This is problematic, as it leads to quadratic performance in case of frequent enlargements. Fix this by using av_realloc() (which is hopefully designed to handle such cases in a better way than simply throwing the buffer we already have away) and by copying the string via memcpy() (after all, we already calculated the strlen of both strings).
|
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
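A minimal sketch of the append strategy described here, assuming the standard `realloc()` in place of `av_realloc()`. The helper name `toy_append()` is hypothetical and the function is not taken from dict.c.

```c
#include <stdlib.h>
#include <string.h>

/* Append suffix to the heap-allocated string *dst (which may be NULL).
 * Returns 0 on success, -1 on allocation failure; on failure *dst is
 * left untouched and still owned by the caller. */
int toy_append(char **dst, const char *suffix)
{
    size_t old_len = *dst ? strlen(*dst) : 0;
    size_t add_len = strlen(suffix);

    /* Grow the existing buffer instead of allocating a fresh zeroed one,
     * so the allocator gets a chance to extend it in place. */
    char *tmp = realloc(*dst, old_len + add_len + 1);
    if (!tmp)
        return -1;

    /* Both lengths are already known, so a single memcpy() places the
     * suffix (including its terminating NUL) right after the old contents. */
    memcpy(tmp + old_len, suffix, add_len + 1);
    *dst = tmp;
    return 0;
}
```

Because the old contents are never copied by this function itself, any copying is left to the allocator, which only needs to move data when it cannot grow the block in place.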
* avutil/dict: Fix memleak when using AV_DICT_APPEND (Andreas Rheinhardt, 2022-09-14)
| If a key already exists in an AVDictionary and the AV_DICT_APPEND flag is set, the old entry is at first discarded from the dictionary, but a pointer to the value is kept. Later on, enough memory to store the appended string is allocated; should this allocation fail, the old string is not freed and hence leaks.
|
| This commit changes this by moving the creation of the combined value to an earlier point in the function, which also ensures that the AVDictionary is unchanged in case of errors.
|
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/dict: Avoid check whose result is known in advance (Andreas Rheinhardt, 2022-09-14)
| We know that an AVDictionary is not empty if we have just added an entry to it, so only check for it being empty on the branch that does not do so.
|
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* Revert "avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx" (Andreas Rheinhardt, 2022-09-14)
| This reverts commit 2c8dc7e953e532752500e8145aa1ceee908bda2f. The loongarch headers have been fixed, so that this wrapper is no longer necessary.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* Revert "avcodec/loongarch: Add wrapper for __lsx_vldx" (Andreas Rheinhardt, 2022-09-14)
| This reverts commit 6c9a60ada4256cf5c388d8dc48860e24c15396c0. The loongarch headers have been fixed, so that this workaround is no longer necessary.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavu/riscv: add <intmath.h> optimisations (Rémi Denis-Courmont, 2022-09-13)
| This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.
* lavu/riscv: byte-swap operations (Rémi Denis-Courmont, 2022-09-13)
| If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order.
|
| Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width.
|
| If Zbb is not supported, then this patchset does nothing. Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.
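The right shift mentioned above can be shown with a portable emulation. This sketch assumes XLEN = 64 and replaces the REV8 instruction with a plain C function (`toy_rev8_xlen64()`, a hypothetical name), so it illustrates the arithmetic only and is not the inline assembly from the patch.

```c
#include <stdint.h>

/* Portable stand-in for REV8 on a 64-bit register: reverses all 8 bytes. */
static uint64_t toy_rev8_xlen64(uint64_t x)
{
    /* Swap adjacent bytes, then adjacent 16-bit pairs, then the halves. */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* A 32-bit value occupies the low half of the register. After the full
 * 64-bit reversal its swapped bytes sit in the high half, so the result
 * is shifted right by XLEN - 32 bits. */
uint32_t toy_bswap32(uint32_t x)
{
    return (uint32_t)(toy_rev8_xlen64(x) >> 32);
}

/* Same idea for 16-bit data: shift right by XLEN - 16 bits. */
uint16_t toy_bswap16(uint16_t x)
{
    return (uint16_t)(toy_rev8_xlen64(x) >> 48);
}
```

For example, 0x11223344 held in a 64-bit register reverses to 0x4433221100000000, and shifting right by 32 leaves 0x44332211.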
* lavu/riscv: AV_READ_TIME cycle counter (Rémi Denis-Courmont, 2022-09-13)
| This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.
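The retry loop for the 32-bit case can be sketched in portable C by simulating the two CSR halves. `ToyCounter` and its helpers are hypothetical stand-ins for the hardware counter: each simulated read advances the counter, which is what makes the wrap between two reads reproducible in a test.

```c
#include <stdint.h>

/* Simulated 64-bit counter that keeps ticking between reads. */
typedef struct ToyCounter {
    uint64_t value; /* current counter value */
    uint64_t step;  /* ticks that elapse during each half read */
} ToyCounter;

/* Stand-in for reading the low 32 bits of the counter. */
static uint32_t toy_read_lo(ToyCounter *c)
{
    uint32_t v = (uint32_t)c->value;
    c->value += c->step;
    return v;
}

/* Stand-in for reading the high 32 bits of the counter. */
static uint32_t toy_read_hi(ToyCounter *c)
{
    uint32_t v = (uint32_t)(c->value >> 32);
    c->value += c->step;
    return v;
}

/* Read a consistent 64-bit value from two halves that cannot be read
 * atomically: read high, low, then high again, and retry if the high
 * half changed, which means the low half wrapped in between. */
uint64_t toy_read_cycle64(ToyCounter *c)
{
    uint32_t hi, lo, check;

    do {
        hi    = toy_read_hi(c);
        lo    = toy_read_lo(c);
        check = toy_read_hi(c);
    } while (hi != check);

    return ((uint64_t)hi << 32) | lo;
}
```

Without the second high read, a counter at 0x00000001FFFFFFFF could be reported as 0x0000000100000000, which is off by almost 2^32 cycles.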
* x86/float_dsp: use three operand form for some instructions (James Almer, 2022-09-13)
| Fixes compilation with old yasm
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/float_dsp: add fma3 for scalarproduct (Paul B Mahol, 2022-09-13)
|
* avutil/x86/intreadwrite: Add ability to detect whether MMX code is used (Andreas Rheinhardt, 2022-09-11)
| It can be used to call emms_c() only when needed.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavu/tx: remove av_cold from table definitions (Lynne, 2022-09-11)
| How did this get here?
* lavu/tx: rotate 3 & 15-point exptabs (Lynne, 2022-09-10)
| This just inverts their signs. Simplifies SIMD.
* lavu/tx: generalize MDCTs (Lynne, 2022-09-10)
| The same code can perform any-length MDCTs with minimal changes.
* lavu/tx: add the inplace flag to PFA FFTs (Lynne, 2022-09-10)
| They support in-place, because they have to use a temporary buffer.
* lavu/tx: propagate the codelet flags into the context (Lynne, 2022-09-10)
| The field is documented as a combination of both.
* lavu/hwcontext_qsv: add support for AV_PIX_FMT_VUYX on Linux (Haihao Xiang, 2022-09-07)
| AV_PIX_FMT_VUYX is used for 8bit 4:4:4 content in FFmpeg VAAPI, so AV_PIX_FMT_VUYX should be used for 8bit 4:4:4 content in FFmpeg QSV too, because QSV is based on VAAPI on Linux. However, the SDK only declares support for AYUV and does nothing with the alpha, so this commit fudges a mapping between AV_PIX_FMT_VUYX and MFX_FOURCC_AYUV.
|
| Reviewed-by: Philip Langdale <philipl@overt.org>
| Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
* x86/tx_float: add missing check for AVX2 (James Almer, 2022-09-06)
| Fixes compilation with old yasm.
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/tx_float: set all operands for shufps (James Almer, 2022-09-06)
| Fixes compilation with AVX2 enabled yasm.
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* slicethread: Limit the automatic number of threads to 16 (Martin Storsjö, 2022-09-06)
| This matches a similar cap on the number of automatic threads in libavcodec/pthread_slice.c.
|
| On systems with lots of cores, this fixes a couple of FATE failures in 32 bit mode (where spawning a huge number of threads runs out of address space).
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
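As a rough sketch of such a cap, assuming that a non-positive request means "choose automatically": the function name `toy_pick_thread_count()` and the constant `TOY_MAX_AUTO_THREADS` are hypothetical, and the exact rule used in slicethread.c may differ from this one.

```c
/* Upper bound applied only when the thread count is chosen automatically. */
#define TOY_MAX_AUTO_THREADS 16

/* requested > 0: the caller asked for an explicit count, which is honoured.
 * requested <= 0: derive the count from the CPU count, capped at 16 so that
 * machines with very many cores do not exhaust a 32-bit address space with
 * thread stacks. */
int toy_pick_thread_count(int requested, int nb_cpus)
{
    if (requested > 0)
        return requested;
    if (nb_cpus < 1)
        nb_cpus = 1;
    return nb_cpus > TOY_MAX_AUTO_THREADS ? TOY_MAX_AUTO_THREADS : nb_cpus;
}
```

The cap deliberately leaves explicit requests alone, so a caller that really wants more threads can still ask for them.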
* x86/tx_float: Fix building for platforms with a symbol prefix (Martin Storsjö, 2022-09-06)
| This fixes building for x86 macOS (both i386 and x86_64) and i386 windows.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64/tx_float: fix compilation (Lynne, 2022-09-06)
| Forgot to add the new function arguments.
* x86/tx_float: implement inverse MDCT AVX2 assembly (Lynne, 2022-09-06)
| This commit implements an iMDCT in pure assembly.
|
| This is capable of processing any mod-8 transforms, rather than just power of two, but since power of two is all we have assembly for currently, that's what's supported. It would really benefit if we could somehow use the C code to decide which function to jump into, but exposing function labels from assembly into C is anything but easy. The post-transform loop could probably be improved.
|
| This was somewhat annoying to write, as we must support arbitrary strides during runtime. There's a fast branch for stride == 4 bytes and a slower one which uses vgatherdps.
|
| Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
|
| 128pt:
|    2811 decicycles in av_tx (imdct),16775916 runs, 1300 skips
|    3082 decicycles in av_imdct_half,16776751 runs, 465 skips
|
| 256pt:
|    4920 decicycles in av_tx (imdct),16775820 runs, 1396 skips
|    5378 decicycles in av_imdct_half,16776411 runs, 805 skips
|
| 512pt:
|    9668 decicycles in av_tx (imdct),16775774 runs, 1442 skips
|   10626 decicycles in av_imdct_half,16775647 runs, 1569 skips
|
| 1024pt:
|   19812 decicycles in av_tx (imdct),16777144 runs, 72 skips
|   23036 decicycles in av_imdct_half,16777167 runs, 49 skips
* x86/tx_float: add support for calling assembly functions from assembly (Lynne, 2022-09-06)
| Needed for the next patch. We get this for the extremely small cost of a branch on _ns functions, which wouldn't be used anyway with assembly.
* lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_write() (Anton Khirnov, 2022-09-05)
|
* lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_can_write() (Anton Khirnov, 2022-09-05)
|
* lavu/fifo: add the header to its own doxy group (Anton Khirnov, 2022-09-05)
| Also, drop mentions of it being a circular buffer, as this is an internal implementation detail that should be invisible to the caller.
* avutil/tests/.gitignore: Add channel_layout testtool (Andreas Rheinhardt, 2022-09-05)
| Reviewed-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavu/hwcontext_vulkan: support mapping VUYX, P012, and XV36 (Philip Langdale, 2022-09-03)
| If we want to be able to map between VAAPI and Vulkan (to do Vulkan filtering), we need to have matching formats on each side.
|
| The mappings here are not exact. In the same way that P010 is still mapped to full 16 bit formats, P012 has to be mapped that way as well. Similarly, VUYX has to be mapped to an alpha-equipped format, and XV36 has to be mapped to a fully 16bit alpha-equipped format.
|
| While Vulkan seems to fundamentally lack formats with an undefined, but physically present, alpha channel, it does have 10X6 and 12X4 formats that you could imagine using for P010, P012 and XV36, but these formats don't support the STORAGE usage flag. Today, hwcontext_vulkan requires all formats to be storable because it wants to be able to use them to create writable images. Until that changes, which might happen, we have to restrict the set of formats we use.
|
| Finally, when mapping a Vulkan image back to vaapi, I observed that the VK_FORMAT_R16G16B16A16_UNORM format we have to use for XV36 going to Vulkan is mapped to Y416 when going to vaapi (which makes sense as it's the exact matching format) so I had to add an entry for it even though we don't use it directly.
* lavc/vaapi: Add support for remaining 10/12bit profiles (Philip Langdale, 2022-09-03)
| With the necessary pixel formats defined, we can now expose support for the remaining 10/12bit combinations that VAAPI can handle.
|
| Specifically, we are adding support for:
|
| * HEVC
| ** 12bit 420
| ** 10bit 422
| ** 12bit 422
| ** 10bit 444
| ** 12bit 444
|
| * VP9
| ** 10bit 444
| ** 12bit 444
|
| These obviously require actual hardware support to be usable, but where that exists, it is now enabled.
|
| Note that unlike YUVA/YUVX, the Intel driver does not formally expose support for the alphaless formats XV30 and XV36, and so we are implicitly discarding the alpha from the decoder and passing undefined values for the alpha to the encoder. If a future encoder iteration were to actually do something with the alpha bits, we would need to use a formal alpha capable format or the encoder would need to explicitly accept the alphaless format.
* lavu/pixfmt: Add P012, Y212, XV30, and XV36 formats (Philip Langdale, 2022-09-03)
| These are the formats we want/need to use when dealing with the Intel VAAPI decoder for 12bit 4:2:0, 12bit 4:2:2, 10bit 4:4:4 and 12bit 4:4:4 respectively.
|
| As with the already supported Y210 and YUVX (XVUY) formats, they are based on formats Microsoft picked as their preferred 4:2:2 and 4:4:4 video formats, and Intel ran with it.
|
| P012 and Y212 are simply an extension of 10 bit formats to say 12 bits will be used, with 4 unused bits instead of 6.
|
| XV30 and XV36, as exotic as they sound, are variants of Y410 and Y412 where the alpha channel is left formally undefined. We prefer these over the alpha versions because the hardware cannot actually do anything with the alpha channel and respecting it is just overhead.
|
| Y412/XV36 is a normal looking packed 4 channel format where each channel is 16bits wide but only the 12msb are used (like P012).
|
| Y410/XV30 packs three 10bit channels in 32bits with 2bits of alpha, like A/X2RGB10 style formats. This annoying layout forced me to define the BE version as a bitstream format. It seems like our pixdesc infrastructure can handle the LE version being byte-defined, but not when it's reversed. If there's a better way to handle this, please let me know. Our existing X2 formats all have the 2 bits at the MSB end, but this format places them at the LSB end and that seems to be the root of the problem.
* arm: relax byte-swap assembler constraints (Rémi Denis-Courmont, 2022-09-03)
| There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap.
|
| In most cases, this makes no difference, as the compiler will select the same register for both operands either way.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64: relax byte-swap assembler constraints (Rémi Denis-Courmont, 2022-09-03)
| There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap.
|
| In most cases, this makes no difference, as the compiler will select the same register for both operands either way.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* avcodec/codec_internal: Add macros for update_thread_context(_for_user) (Andreas Rheinhardt, 2022-09-03)
| It reduces typing: Before this patch, there were 11 callbacks that exceeded the 80 char line length limit; now there are zero. It also allows removing ONLY_IF_THREADS_ENABLED() in libavutil/internal.h.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/codec_internal: Add macro to set AVCodec.long_name (Andreas Rheinhardt, 2022-09-03)
| It reduces typing: Before this patch, there were 105 codecs whose long_name-definition exceeded the 80 char line length limit. Now there are only nine of them.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/file: Properly deprecate av_tempfile() (Andreas Rheinhardt, 2022-09-03)
| It has been deprecated in b4f59beeb4c2171879d0d7607a4a7d6165f07791, but the attribute_deprecated was not set and there was no entry in APIchanges. This commit adds these and schedules it for removal. Given that the reason behind the deprecation is exactly the same as in av_fopen_utf8(), reuse its FF_API_AV_FOPEN_UTF8.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/internal: Move avpriv-file API to a header of its own (Andreas Rheinhardt, 2022-09-03)
| It is not used by the large majority of files that include lavu/internal.h.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/dict: Move avpriv_dict_set_timestamp() to a header of its own (Andreas Rheinhardt, 2022-09-03)
| It is used almost nowhere, so it needn't be auto-included almost everywhere.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/internal: Remove unused FF_SYMVER (Andreas Rheinhardt, 2022-09-03)
| They are unused since d63443b9684fa7b3e086634f7b44b203b6d9221e. Furthermore, they were always in the wrong header: libavutil/internal.h is auto-included almost everywhere, but FF_SYMVER would only ever be used at a few places, so a proper header of its own would be appropriate for it.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avutil/internal: Remove unused ff_rint64_clip() (Andreas Rheinhardt, 2022-09-03)
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly (Martin Storsjö, 2022-09-02)
| The AArch64 assembly accesses those symbols directly, without indirection via e.g. the GOT on ELF. In order for this not to require text relocations, those symbols need to be resolved fully at link time, i.e. those symbols can't be interposable.
|
| Normally, so far, this is achieved when linking shared libraries in two ways; we have a version script (libavcodec/libavcodec.v) which marks all symbols that don't start with av* as local. Additionally, we try to add -Wl,-Bsymbolic to the linker options if supported, making sure that such symbol references are resolved fully at link time, instead of making them interposable.
|
| When the libavcodec static library is linked into another shared library, there's no guarantee that it uses similar options (even though that would be favourable), which would end up requiring text relocations in the AArch64 assembly.
|
| Explicitly mark the symbols that are accessed from AArch64 assembly as hidden, so that they are resolved fully at link time even without the version script and -Wl,-Bsymbolic.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* arm: Check the build time constants in av_clip_*intp2 (Martin Storsjö, 2022-09-02)
| This fixes building for arm targets with optimizations disabled.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>