| Commit message (Collapse) | Author | Age |
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
Tested on http://ps-auxw.de/10bit-h264-sample/10bit-eldorado.mkv
MMX: ~30% faster decoding overall
SSE2: ~40% faster decoding overall
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
paddq is an SSE2 instruction so it cannot be used for MMX.
This was probably just a typo because the sums are dwords anyway.
Reviewed-by: Pascal Massimino <pascal.massimino@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
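
The difference between the two additions can be modeled outside assembly. The sketch below is a plain Python model, not the filter's code; the helper names pack, paddd and paddq are illustrative. paddd adds two packed 32-bit lanes independently, while paddq adds the whole 64-bit value. The results differ only when the low lane carries into the high one, so dword sums are handled correctly by paddd.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def pack(lo, hi):
    """Pack two unsigned 32-bit lanes into one 64-bit register value."""
    return ((hi & MASK32) << 32) | (lo & MASK32)

def paddd(a, b):
    """Model of paddd on a 64-bit register: two independent 32-bit adds."""
    lo = ((a & MASK32) + (b & MASK32)) & MASK32
    hi = ((a >> 32) + (b >> 32)) & MASK32
    return (hi << 32) | lo

def paddq(a, b):
    """Model of paddq on a 64-bit register: one 64-bit add."""
    return (a + b) & MASK64
```

For example, paddd(pack(5, 7), pack(1, 2)) and paddq(pack(5, 7), pack(1, 2)) both equal pack(6, 9); only a carry out of the low lane makes them diverge.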
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
integration by Neil Birkbeck, with help from Vitor Sessak.
core SSE2 loop by Skal (pascal.massimino@gmail.com)
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
Reviewed-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
And use the x86util ones instead, which are optimized for mmxext/sse2.
Roughly a 1% increase in performance on pre-SSSE3 processors.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
with nasm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
This makes the C and MMX output match. FATE is unchanged, as the
differences were apparently too small to show up in it.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
This should avoid issues on x86_64
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
These macros take the shift amount as a byte count, since the shift
argument differs between the MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
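
A rough illustration of such a wrapper, written in Python rather than as an assembler macro. It assumes the MMX path uses the bit-count shift psrlq and the SSE2 path uses the byte-count shift psrldq; the commit message does not name the instructions, and the function name shift_right_bytes is hypothetical.

```python
def shift_right_bytes(reg, nbytes, isa):
    """Return the instruction text for a right shift of `nbytes` bytes."""
    if isa == "mmx":
        # psrlq shifts a 64-bit MMX register by a bit count.
        return f"psrlq {reg}, {nbytes * 8}"
    if isa == "sse2":
        # psrldq shifts a 128-bit XMM register by a byte count.
        return f"psrldq {reg}, {nbytes}"
    raise ValueError(f"unsupported instruction set: {isa}")
```

The caller always passes bytes, and the conversion to the unit each instruction expects happens in one place.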
|
* commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8':
x86: Drop some unnecessary YASM ifdefs
Conflicts:
libavfilter/x86/vf_yadif_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Dead code elimination is enough to avoid undefined references in these cases.
|
All copyright holders have agreed to the relicensing.
|
Signed-off-by: Robert Krüger <krueger@lesspain.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
This reverts commit 975110a85ef8e794fdc041455ff41b0ad30bc01e.
Signed-off-by: Robert Krüger <krueger@lesspain.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
This reverts commit a87b17f3283aada762820f1b797eeb7a2dff6c61.
This reduces the amount of non-LGPL code, making relicensing to LGPL
easier.
Conflicts:
libavfilter/vf_yadif.c
libavfilter/x86/yadif.c
libavfilter/x86/yadif_template.c
libavfilter/yadif.h
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
This reverts commit fc5fe4804fd2ee9a29de502e9431b12d027c0c89, reversing
changes made to ffe33500983983946048def3a6047920d97d957b.
The factoring is broken: it no longer calls the ssse3 code, and it
calls the mmx2 code with bad alignment. It also broke some FATE
instances.
Conflicts:
libavfilter/x86/vf_gradfun_init.c
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
* commit 'ed1a11ed52bbd1f15bb9b0416d69b7924bee3191':
gradfun: x86: Factor out common code for some gradfun_filter_line() variants
Conflicts:
libavfilter/x86/vf_gradfun_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
* commit 'ee80cf741a44115758e62399b7bde08d33161151':
avfilter: x86: K&R formatting cosmetics
Conflicts:
libavfilter/x86/vf_gradfun_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
* commit '0e730494160d973400aed8d2addd1f58a0ec883e':
avfilter: x86: Port gradfun filter optimizations to yasm
Conflicts:
libavfilter/x86/vf_gradfun_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
* commit 'f6633c55a3c0e93a5b2bab6aa0692fb608f2a38d':
avfilter: Fix typo in Loren's email address
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
* qatar/master:
Consistently use "cpu_flags" as variable/parameter name for CPU flags
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideo.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Always use the special filter for the first and last 3 columns (only).
Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was. This commit restores the speed while maintaining identical output.
For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this
Signed-off-by: Anton Khirnov <anton@khirnov.net>
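
The column split described above can be sketched as follows. This is a generic Python illustration, not the filter's implementation: fast and safe are hypothetical per-pixel callbacks, where fast is assumed to read up to 3 neighbours on each side without bounds checks and safe handles the borders.

```python
def filter_row(row, fast, safe, edge=3):
    """Use `safe` for the first and last `edge` columns and `fast` in between."""
    width = len(row)
    left = min(edge, width)            # end of the left border
    right = max(left, width - edge)    # start of the right border
    out = [safe(row, x) for x in range(left)]
    out += [fast(row, x) for x in range(left, right)]
    out += [safe(row, x) for x in range(right, width)]
    return out
```

Because the border test is resolved once per row rather than once per pixel, the interior loop carries no per-pixel check, and narrow rows simply fall back to the safe path.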
|
* commit '6e9f8d6a7d7392a236df19fef6f4eba41f18167e':
x86: vf_yadif: Remove stray dsputil_mmx #include
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
* commit '093804a93cc5da3f95f98265a5df116912443cec':
avfilter: Add av_cold attributes to init/uninit functions
Conflicts:
libavfilter/af_ashowinfo.c
libavfilter/af_volume.c
libavfilter/src_movie.c
libavfilter/vf_lut.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
* qatar/master:
x86: Move some conditional code around to avoid unused variable warnings
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/vf_yadif_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
There is no noticeable benefit from such precision.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
The current dithering uses only the first 4 of the 8 random values.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
The current code divides before increasing precision.
Also reduce the upper bound for strength from 255 to 64. This prevents
an overflow in the SSSE3 and MMX filter_line code: delta is expressed as
a u16 that is shifted left by 2. If it overflows, a strength of at most
64 ensures that m is set to 0 (making the m*m*delta >> 14 expression
void).
A value above 64 should not make sense unless gradfun is used as
a blur filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
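
The bound can be checked numerically with a small Python model. Only the m*m*delta >> 14 term and the strength limits come from the message; the threshold derivation (1 << 15) / strength and the clamp m = max(0, 127 - (|delta| * thresh >> 16)) are assumptions about the filter, and all function names are illustrative.

```python
def thresh_for(strength):
    """Assumed threshold derivation: (1 << 15) / strength, truncated to int."""
    return int((1 << 15) / strength)

def blend_weight(delta, strength):
    """Assumed weight: m = max(0, 127 - (|delta| * thresh >> 16))."""
    m = abs(delta) * thresh_for(strength) >> 16
    return max(0, 127 - m)

def correction(delta, strength):
    """The quoted term m*m*delta >> 14; it is 0 whenever m is 0."""
    m = blend_weight(delta, strength)
    return m * m * delta >> 14

# Smallest |delta| whose left shift by 2 no longer fits in 16 bits.
OVERFLOW_DELTA = 1 << 14
```

Under these assumptions, blend_weight(OVERFLOW_DELTA, 64) is 0, so the correction term vanishes for every delta large enough to overflow, while at strength 255 the same delta still yields a nonzero weight.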
|
CC:libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
The filter already checks that width (and height) are greater than 3.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
These smaller samples do not need to be unpacked to double words,
allowing the code to process more pixels per iteration (still 2 in MMX
but 6 in SSE2). It also avoids emulating the missing double word
instructions on older instruction sets.
As with the previous code for 16-bit samples, this has been tested on
an Athlon64 and a Core2Quad.
Athlon64:
1809275 decicycles in C, 32718 runs, 50 skips
911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster
495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
Core2Quad:
921363 decicycles in C, 32756 runs, 12 skips
486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster
293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster
284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
This is a fairly dumb copy of the assembly for 8-bit samples but it
works and produces identical output to the C version. The options have
been tested on an Athlon64 and a Core2Quad.
Athlon64:
1810385 decicycles in C, 32726 runs, 42 skips
1080744 decicycles in mmx, 32744 runs, 24 skips, 1.7x faster
818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster
Core2Quad:
924025 decicycles in C, 32750 runs, 18 skips
623995 decicycles in mmx, 32767 runs, 1 skips, 1.5x faster
406223 decicycles in sse2, 32764 runs, 4 skips, 2.3x faster
387842 decicycles in ssse3, 32767 runs, 1 skips, 2.4x faster
307726 decicycles in sse4, 32763 runs, 5 skips, 3.0x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
Always use the special filter for the first and last 3 columns (only).
Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was. This commit restores the speed while maintaining identical output.
For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|