summaryrefslogtreecommitdiff
path: root/libswscale
Commit message (Collapse)AuthorAge
* libswscale/x86/yuv2rgb: add ssse3 versionTing Fu2020-02-10
| | | | | | | | | | Tested using this command: /ffmpeg -pix_fmt yuv420p -s 1920*1080 -i ArashRawYuv420.yuv \ -vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null The fps increase from 389 to 640 on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz Signed-off-by: Ting Fu <ting.fu@intel.com>
* libswscale/utils.c: Fix bug #8255Gautam Ramakrishnan2020-02-09
| | | | | | | | | | Bug #8255 points out a double free error in libwscale/utils.c file. The double free is because the pointer to cascaded_context of an sw_context is not set to NULL after freeing it. When the sw_context is later freed, sws_freeContext is called on the cascaded_context, causing a double free. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libswscale/x86/yuv2rgb: Change inline assembly into nasm codeTing Fu2020-02-05
| | | | | | | | The original inline assembly and nasm code have the same fps when called by command. NASM code almost has no impact on the perfromance. Signed-off-by: Ting Fu <ting.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/input: Fix several invalid shifts related to rgb2yuv constantsMichael Niedermayer2020-01-22
| | | | | | | | Fixes: Invalid shifts Fixes: #8140 Fixes: #8146 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: Fix several invalid shifts in yuv2rgb_full_1_c_template()Michael Niedermayer2020-01-22
| | | | | | | | Fixes: Invalid shifts Fixes: #8320 Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/swscale: Fix several invalid shifts related to vChrDropMichael Niedermayer2020-01-22
| | | | | | | | | Fixes: Invalid shifts Fixes: #8166 Fixes: filter-crop_scale_vflip FATE-test Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* Silence "string-plus-int" warning shown by clang.Carl Eugen Hoyos2020-01-06
| | | | libswscale/utils.c:89:42: warning: adding 'unsigned long' to a string does not append to the string [-Wstring-plus-int]
* swscale/aarch64: use multiply accumulate and shift-right narrowSebastian Pop2020-01-04
| | | | | | | | | | | | | | | | | | | This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and horizontal adds by using fused multiply adds. The patch also uses ld1r to load one element and replicate it across all lanes of the vector. The patch also improves the clipping code by removing the shift right instructions and performing the shift with the shift-right narrow instructions. I see 8% difference on an m6g instance with neoverse-n1 CPUs: $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971 after: t:0.012985 avg:0.013013 max:0.013996 min:0.012818 Tested with `make check` on aarch64-linux. Signed-off-by: Sebastian Pop <spop@amazon.com> Reviewed-by: Clément Bœsch <u@pkh.me> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/utils: remove access of AV_PIX_FMT_NBZhao Zhili2019-12-31
| | | | Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/aarch64: use multiply accumulate and increase vector factor to 4Sebastian Pop2019-12-17
| | | | | | | | | | | | | | | | | | | | | This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate and bumps the vectorization factor from 2 to 4. The speedup is of 25% on Graviton1 A1 instances based on A-72 cpus: $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214 after: t:0.032168 avg:0.032215 max:0.033081 min:0.032146 The speedup is of 39% on Graviton2 m6g instances based on Neoverse-N1 cpus: $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181 after: t:0.014015 avg:0.014096 max:0.015018 min:0.013971 Tested with `make check` on aarch64-linux. Signed-off-by: Sebastian Pop <spop@amazon.com> Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion ↵Limin Wang2019-12-10
| | | | | | | wrapper Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libswscale/swscale_unscaled.c: remove redundant codeTing Fu2019-12-06
| | | | | Signed-off-by: Ting Fu <ting.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/swscale_unscaled: fix gbrap10be md5 different on big endian systemLimin Wang2019-11-01
| | | | | | | | | | | | | | | | You can reproduce it by below command: ./ffmpeg -f lavfi -i "testsrc=duration=1:rate=30" -vf format=gbrap10 -vcodec rawvideo \ -pix_fmt gbrap10le -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact \ -frames:v 1 -f nut md5: little-endian: f91e2edd8098276579c1929e5e160416 big-endian: ba4d011dbbdc78ccbf6cc7d698630929 Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template()Michael Niedermayer2019-10-16
| | | | Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: Correct Alpha in yuv2ya16_X_c_template()Michael Niedermayer2019-10-16
| | | | | | Untested, no testcase Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: Implement Luma computation from yuv2ya16_X_c_template() ↵Michael Niedermayer2019-10-16
| | | | | | | | | without 64bit This also reverts 21838cad2fc44023ad85e35d5c677e2f8d29a0ef The revert is in this commit to avoid 2 fate updates Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale: Fix AltiVec/VSX build with recent GCCDaniel Kolesa2019-10-04
| | | | | | | | | | | | | The argument to vec_splat_u16 must be a literal. By making the function always inline and marking the arguments const, gcc can turn those into literals, and avoid build errors like: swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal Fixes #7861. Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Signed-off-by: Lauri Kasanen <cand@gmx.com>
* swscale: Replace illegal vector keyword usage in altivec codeDaniel Kolesa2019-10-04
| | | | | | | | | | | | | | | | | | | | | While this technically compiles in current ffmpeg, this is only because ffmpeg is compiled in strict ISO C mode, which disables the builtin 'vector' keyword for AltiVec/VSX. Instead this gets replaced with a macro inside altivec.h, which defines vector to be actually __vector, which accepts random types. Normally, the vector keyword should be used only with plain scalar non-typedef types, such as unsigned int. But we have the vec_(s|u)(8|16|32) macros, which can be used in a portable manner, in util_altivec.h in libavutil. This is also consistent with other AltiVec/VSX code elsewhere in the tree. Fixes #7861. Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Signed-off-by: Lauri Kasanen <cand@gmx.com>
* swscale/utils: Fix invalid left shifts of negative numbersAndreas Rheinhardt2019-09-28
| | | | | | | | Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411, vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/x86/swscale: Fix undefined left shifts of negative numbersAndreas Rheinhardt2019-09-28
| | | | | | | | | This affected many FATE-tests: The number of failing tests went down from 663 to 344. (Both numbers exclude tests that failed because of unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/swscale: cosmeticsLimin Wang2019-09-27
| | | | | Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: fix signed integer overflow for ya16Paul B Mahol2019-09-26
| | | | Fixes #7666.
* swscale/swscale: delete unwanted assignmentsLimin Wang2019-09-09
| | | | | Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: fix some code indentationsLinjie Fu2019-09-06
| | | | | Signed-off-by: Linjie Fu <linjie.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* lsws/ppc/yuv2rgb_altivec: Replace vec_lvsl/vec_perm with vec_xlChip Kerchner2019-08-13
| | | | | | | | | | gcc 6.x and 7.x generate wrong code for little endian machines for the vec_lvsl/vec_perm instruction combos in some cases. The bug was fixed in version 8.x If these instructions are replaced with vec_xl, the problem goes away for all versions of the compilers. Fixes ticket #7124.
* Bump minor versions again on master to keep 4.2 versions separate from masterMichael Niedermayer2019-07-21
| | | | Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* Bump minor versions to separate 4.2 from masterMichael Niedermayer2019-07-21
| | | | Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytesMichael Niedermayer2019-05-13
| | | | | | Some formats use longer names than 12. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libswcale: Fix possible string overflow in test.Adam Richter2019-05-13
| | | | | | | | | | | In libswcale/tests/swcale.c, the function fileTest() calls sscanf in an argument of "%12s" on character srcStr[] and dstStr[], which are only 12 bytes. So, if the input string is 12 characters, a terminating null byte can be written past the end of these arrays. This bug was found by cppcheck. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale: Add test for isSemiPlanarYUV to pixdesc_queryPhilip Langdale2019-05-12
| | | | | | Lauri had asked me what the semi planar formats were and that reminded me that we could add it to pixdesc_query so we know exactly what the list is.
* swscale: Add support for NV24 and NV42Philip Langdale2019-05-12
| | | | | | | | | | | The implementation is pretty straight-forward. Most of the existing NV12 codepaths work regardless of subsampling and are re-used as is. Where necessary I wrote the slightly different NV24 versions. Finally, the one thing that confused me for a long time was the asm specific x86 path that did an explicit exclusion check for NV12. I replaced that with a semi-planar check and also updated the equivalent PPC code, which Lauri kindly checked.
* swscale/ppc: Shorten power8 tests via a varLauri Kasanen2019-05-07
|
* swscale/ppc: VSX-optimize hScale16To*Lauri Kasanen2019-05-07
| | | | | | | | | | | | | | | | | | ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \ -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \ -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw 32-bit mul, power8 only 2x speedup for hScale8To19_vsx (x86 SSE2 is 2.37): 30896 UNITS in hscale, 8192 runs, 0 skips 63956 UNITS in hscale, 8192 runs, 0 skips 2.06 for hScale16To15_vsx: 30531 UNITS in hscale, 8192 runs, 0 skips 63161 UNITS in hscale, 8192 runs, 0 skips
* swscale/ppc: IndentLauri Kasanen2019-05-07
|
* swscale/ppc: VSX-optimize hScale8To19Lauri Kasanen2019-05-07
| | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw 2.26 speedup (x86 SSE2 is 2.32): 23772 UNITS in hscale, 4096 runs, 0 skips 53862 UNITS in hscale, 4096 runs, 0 skips
* swscale/ppc: VSX-optimize hscale_fastLauri Kasanen2019-04-30
| | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \ -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw 4.27 speedup for hyscale_fast: 24796 UNITS in hyscale_fast, 4096 runs, 0 skips 5797 UNITS in hyscale_fast, 4096 runs, 0 skips 4.48 speedup for hcscale_fast: 19911 UNITS in hcscale_fast, 4095 runs, 1 skips 4437 UNITS in hcscale_fast, 4096 runs, 0 skips
* swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2Lauri Kasanen2019-04-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. ~2x speedup: rgb24 24431 UNITS in yuv2packed2, 16384 runs, 0 skips 13783 UNITS in yuv2packed2, 16383 runs, 1 skips bgr24 24396 UNITS in yuv2packed2, 16384 runs, 0 skips 14059 UNITS in yuv2packed2, 16384 runs, 0 skips rgba 26815 UNITS in yuv2packed2, 16383 runs, 1 skips 12797 UNITS in yuv2packed2, 16383 runs, 1 skips bgra 27060 UNITS in yuv2packed2, 16384 runs, 0 skips 13138 UNITS in yuv2packed2, 16384 runs, 0 skips argb 26998 UNITS in yuv2packed2, 16384 runs, 0 skips 12728 UNITS in yuv2packed2, 16381 runs, 3 skips bgra 26651 UNITS in yuv2packed2, 16384 runs, 0 skips 13124 UNITS in yuv2packed2, 16384 runs, 0 skips This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version is also heavily inaccurate, while the vsx version has high accuracy.
* swscale/ppc: VSX-optimize yuv2rgb_full_XLauri Kasanen2019-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. ~6.4x speedup: rgb24 214278 UNITS in yuv2packedX, 16384 runs, 0 skips 33249 UNITS in yuv2packedX, 16384 runs, 0 skips bgr24 214616 UNITS in yuv2packedX, 16384 runs, 0 skips 33233 UNITS in yuv2packedX, 16384 runs, 0 skips rgba 214517 UNITS in yuv2packedX, 16384 runs, 0 skips 33271 UNITS in yuv2packedX, 16384 runs, 0 skips bgra 214973 UNITS in yuv2packedX, 16384 runs, 0 skips 33397 UNITS in yuv2packedX, 16384 runs, 0 skips argb 214613 UNITS in yuv2packedX, 16384 runs, 0 skips 33310 UNITS in yuv2packedX, 16384 runs, 0 skips bgra 214637 UNITS in yuv2packedX, 16384 runs, 0 skips 33330 UNITS in yuv2packedX, 16384 runs, 0 skips
* swscale/ppc: VSX-optimize yuv2rgb_full_2Lauri Kasanen2019-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. ~4x speedup: rgb24 52763 UNITS in yuv2packed2, 16384 runs, 0 skips 13453 UNITS in yuv2packed2, 16384 runs, 0 skips bgr24 53144 UNITS in yuv2packed2, 16384 runs, 0 skips 13616 UNITS in yuv2packed2, 16384 runs, 0 skips rgba 52796 UNITS in yuv2packed2, 16384 runs, 0 skips 12904 UNITS in yuv2packed2, 16384 runs, 0 skips bgra 52732 UNITS in yuv2packed2, 16384 runs, 0 skips 13262 UNITS in yuv2packed2, 16384 runs, 0 skips argb 52661 UNITS in yuv2packed2, 16384 runs, 0 skips 12879 UNITS in yuv2packed2, 16384 runs, 0 skips bgra 52662 UNITS in yuv2packed2, 16384 runs, 0 skips 12932 UNITS in yuv2packed2, 16384 runs, 0 skips
* swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1Lauri Kasanen2019-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \ -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. 1.8-2.3x speedup: rgb24 18192 UNITS in yuv2packed1, 32767 runs, 1 skips 9983 UNITS in yuv2packed1, 32760 runs, 8 skips bgr24 18665 UNITS in yuv2packed1, 32766 runs, 2 skips 9925 UNITS in yuv2packed1, 32763 runs, 5 skips rgba 20239 UNITS in yuv2packed1, 32767 runs, 1 skips 8794 UNITS in yuv2packed1, 32759 runs, 9 skips bgra 20354 UNITS in yuv2packed1, 32768 runs, 0 skips 8770 UNITS in yuv2packed1, 32761 runs, 7 skips argb 20185 UNITS in yuv2packed1, 32768 runs, 0 skips 8761 UNITS in yuv2packed1, 32761 runs, 7 skips bgra 20360 UNITS in yuv2packed1, 32766 runs, 2 skips 8759 UNITS in yuv2packed1, 32764 runs, 4 skips This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version is also heavily inaccurate, while the vsx version has high accuracy.
* swscale/ppc: VSX-optimize yuv2422_XLauri Kasanen2019-03-31
| | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 7.2x speedup: yuyv422 126354 UNITS in yuv2packedX, 16384 runs, 0 skips 16383 UNITS in yuv2packedX, 16382 runs, 2 skips yvyu422 117669 UNITS in yuv2packedX, 16384 runs, 0 skips 16271 UNITS in yuv2packedX, 16379 runs, 5 skips uyvy422 117310 UNITS in yuv2packedX, 16384 runs, 0 skips 16226 UNITS in yuv2packedX, 16382 runs, 2 skips
* swscale/ppc: VSX-optimize yuv2422_2Lauri Kasanen2019-03-31
| | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 5.1x speedup: yuyv422 19339 UNITS in yuv2packed2, 16384 runs, 0 skips 3718 UNITS in yuv2packed2, 16383 runs, 1 skips yvyu422 19438 UNITS in yuv2packed2, 16384 runs, 0 skips 3800 UNITS in yuv2packed2, 16380 runs, 4 skips uyvy422 19128 UNITS in yuv2packed2, 16384 runs, 0 skips 3721 UNITS in yuv2packed2, 16380 runs, 4 skips
* swscale/ppc: VSX-optimize yuv2422_1Lauri Kasanen2019-03-31
| | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 15.3x speedup: yuyv422 14513 UNITS in yuv2packed1, 32768 runs, 0 skips 949 UNITS in yuv2packed1, 32767 runs, 1 skips yvyu422 14516 UNITS in yuv2packed1, 32767 runs, 1 skips 943 UNITS in yuv2packed1, 32767 runs, 1 skips uyvy422 14530 UNITS in yuv2packed1, 32767 runs, 1 skips 941 UNITS in yuv2packed1, 32766 runs, 2 skips
* swscale/swscale_unscaled: Fix chroma slice heightMichael Niedermayer2019-03-28
| | | | Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/swscale_unscaled: fixed the issue that when width/height is not ↵Dong, Jerry2019-03-28
| | | | | | | | 2-multiple, transition of nv12 to u/v planes is not completed. Signed-off-by: Dong, Jerry <jerry.dong@intel.com> Signed-off-by: Decai Lin <decai.lin@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/ppc: VSX-optimize yuv2rgb_fullLauri Kasanen2019-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - This uses 32-bit mul, so POWER8 only. The following output formats get about 4.5x speedup: rgb24 39980 UNITS in yuv2packed1, 32768 runs, 0 skips 8774 UNITS in yuv2packed1, 32768 runs, 0 skips bgr24 40069 UNITS in yuv2packed1, 32768 runs, 0 skips 8772 UNITS in yuv2packed1, 32766 runs, 2 skips rgba 39759 UNITS in yuv2packed1, 32768 runs, 0 skips 8681 UNITS in yuv2packed1, 32767 runs, 1 skips bgra 39729 UNITS in yuv2packed1, 32768 runs, 0 skips 8696 UNITS in yuv2packed1, 32766 runs, 2 skips argb 39766 UNITS in yuv2packed1, 32768 runs, 0 skips 8672 UNITS in yuv2packed1, 32766 runs, 2 skips bgra 39784 UNITS in yuv2packed1, 32768 runs, 0 skips 8659 UNITS in yuv2packed1, 32767 runs, 1 skips
* swscale: Remove duplicated codeLauri Kasanen2019-03-27
| | | | In this function, the exact same clamping happens both in the if and unconditionally.
* swscale/ppc: Add av_unused to template vars only used in one includerLauri Kasanen2019-03-20
|
* swscale/ppc: Clean up some mixed decl warningsLauri Kasanen2019-03-20
|
* libswscale/ppc: VSX-optimize 9-16 bit yuv2planeXLauri Kasanen2019-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \ -s 1920x1728 -f null -vframes 100 -v error -nostats - 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x. Fate passes, each format tested with an image to video conversion. Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out of the 16-bit function. This includes the vec_mulo/mule functions too, not just vmuluwm. With TIMER_REPORT skips disabled: yuv420p9le 12412 UNITS in planarX, 131072 runs, 0 skips 73136 UNITS in planarX, 131072 runs, 0 skips yuv420p9be 12481 UNITS in planarX, 131072 runs, 0 skips 73410 UNITS in planarX, 131072 runs, 0 skips yuv420p10le 12322 UNITS in planarX, 131072 runs, 0 skips 72546 UNITS in planarX, 131072 runs, 0 skips yuv420p10be 12291 UNITS in planarX, 131072 runs, 0 skips 72935 UNITS in planarX, 131072 runs, 0 skips yuv420p12le 12316 UNITS in planarX, 131072 runs, 0 skips 72708 UNITS in planarX, 131072 runs, 0 skips yuv420p12be 12319 UNITS in planarX, 131072 runs, 0 skips 72577 UNITS in planarX, 131072 runs, 0 skips yuv420p14le 12259 UNITS in planarX, 131072 runs, 0 skips 72516 UNITS in planarX, 131072 runs, 0 skips yuv420p14be 12440 UNITS in planarX, 131072 runs, 0 skips 72962 UNITS in planarX, 131072 runs, 0 skips yuv420p16le 10548 UNITS in planarX, 131072 runs, 0 skips 73429 UNITS in planarX, 131072 runs, 0 skips yuv420p16be 10634 UNITS in planarX, 131072 runs, 0 skips 150959 UNITS in planarX, 131072 runs, 0 skips Signed-off-by: Lauri Kasanen <cand@gmx.com>