| Commit message | Author | Age |
|
|
|
|
|
| |
Fixes failures with yasm 1.1.0 and older
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
| |
Old yasm/nasm versions don't support some of these
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on x86_64:
         time     PSNR
plain    3.303    inf
SSE      1.649    107.087535
SSE3     1.632    107.087535
AVX      1.409    106.986771
FMA3     1.265    107.108437
on x86_32 (PSNR compared to x86_64 plain):
         time     PSNR
plain    7.225    103.951979
SSE      1.827    105.859282
SSE3     1.819    105.859282
AVX      1.533    105.997661
FMA3     1.384    105.885377
FMA4 test is not available
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Also add some documentation for each function to colorspacedsp.h.
|
| |
|
| |
|
|
|
|
|
| |
Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
|
|
|
|
|
|
| |
4.5x faster than C float version with autovectorization
10x faster than C int version
25x faster than C float version without autovectorization
|
| |
|
|
|
|
|
|
| |
10x faster than C.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
5 times faster than C, 3 times overall.
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
Fixes crash
Fixes: Ticket5055
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
|
| |
Found-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Found-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
| |
|
|
|
|
| |
Convert last users to av_opt_get_*() counterparts.
|
|
|
|
|
| |
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
|
|
|
|
|
|
| |
Silences warnings with NASM
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
~20% faster than SSSE3. Also enabled for x86_32.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Speed of all modes increased by a factor between 7.4 and 19.8, largely depending
on whether bytes are unpacked into words. Modes 2, 3, and 4 have been sped up
by a factor of 43 (thanks, quick sort!).
All modes are available on x86_64, but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86_32 due to the number of SIMD registers used.
With a contribution from James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The internal line accumulator for 16-bit input can overflow, so I changed it
from int to uint64_t in the C code. The matching assembly looks a little
weird, but the output looks correct.
(AVX2 should be trivial to add later.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
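A minimal C sketch of why a wider accumulator matters for 16-bit input. The log does not show which function was changed or what it accumulates; this sketch assumes a per-line sum of squared differences, and `sse_line_16bit` is a hypothetical name rather than the actual FFmpeg code. Under that assumption, a single squared difference can reach 65535 * 65535 = 4294836225, which already exceeds INT_MAX for a 32-bit int.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the FFmpeg function: sums squared differences
 * over one line of 16-bit samples using a 64-bit accumulator. */
static uint64_t sse_line_16bit(const uint16_t *a, const uint16_t *b, size_t n)
{
    uint64_t sum = 0; /* a 32-bit int here could overflow on the first sample */

    for (size_t i = 0; i < n; i++) {
        /* Widen before subtracting so the difference and its square
         * are computed in 64-bit arithmetic. */
        int64_t d = (int64_t)a[i] - (int64_t)b[i];
        sum += (uint64_t)(d * d);
    }
    return sum;
}
```

With a uint64_t accumulator, even a line of maximally different samples stays far below the 64-bit limit for any realistic line width.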
|
|
|
|
|
|
|
|
| |
Both are 2-2.5x faster than their C counterparts.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|