* commit '40ad05bab206c932a32171d45581080c914b06ec':
checkasm: Cast unsigned to signed
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
Avoid a warning about passing an unsigned value to abs(). Some compilers
might optimize away abs() in that case.
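As an illustration of the kind of cast described above, here is a minimal sketch. The helper name abs_diff_u is hypothetical and is not taken from checkasm; it only shows the unsigned difference being converted to a signed type before abs() sees it.

```c
#include <stdlib.h>

/* Hypothetical helper (not from checkasm): absolute difference of two
 * unsigned values. */
static int abs_diff_u(unsigned a, unsigned b)
{
    /* a - b wraps around when a < b. Converting the wrapped result to int
     * yields the negative difference on the usual two's-complement targets
     * (the conversion is implementation-defined in older C standards).
     * abs() then operates on a signed value, so there is nothing for the
     * compiler to warn about or to drop. */
    return abs((int)(a - b));
}
```

Without the cast, abs() would be handed a value of unsigned type, which is what triggers the warning mentioned in the commit message.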
|
* commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc':
checkasm: add HEVC test for testing IDCT DC
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
* commit '6f9e34baea4f6f484392e4e67f606a0835d07b73':
arm: Check for support for the .fpu directive
Merged-by: Clément Bœsch <cboesch@gopro.com>
|
When targeting COFF (windows), clang doesn't support this
directive (while binutils supports it for all targets).
Signed-off-by: Martin Storsjö <martin@martin.st>
|
These bits are set by exceptions in NEON instructions.
Also print the differing bits when FPSCR is clobbered,
and use bic instead of lsl, for clearing the topmost bits.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
Fixes an AS error on non-NEON builds introduced in 71a04721145. Also
set the fpu directly to vfp in checkasm.S to cause build errors on NEON
builds.
|
Each const block needs to be terminated by exactly one endconst
invocation, so either call endconst after each block, or just
declare plain labels for the later strings.
This fixes errors such as this, on some binutils versions:
checkasm.S:38: Error: Macro `endconst' was already defined
Signed-off-by: Martin Storsjö <martin@martin.st>
|
* commit '71a0472114574993df7035f4de9aa007e03817b8':
checkasm: arm: report the first clobbered register in checkasm_checked_call
Also includes 446353ea18, 59aeed93e4, and 37961044c6 to avoid breaking
too much stuff.
Merged-by: Clément Bœsch <u@pkh.me>
|
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:
                                      Cortex A7        A8        A9       A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1    2435.4    2499.0    1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7   16582.3   14207.6   12000.3
By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:
vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6     189.5     211.7     235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0    1534.8    1719.4    1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0    1477.2    1736.3    1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7    1828.7    1993.6    1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4    2118.3    2266.5    1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7    2475.3    2523.5    1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2     456.7     862.0     553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2    8190.4    8539.2    6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5    8014.9    8518.3    6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6    9313.0    9347.4    7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6   10752.4   10192.2    8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6   11946.5   11001.4    9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9   13662.7   11816.1    9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9   14940.1   12626.7   10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7   15776.1   13446.2   11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5   17157.0   14249.3   12015.1
I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.
In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.
This is cherrypicked from libav commit
9c8bc74c2b40537b0997f646c87c008042d788c2.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
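The slice-skipping idea from the message above can be sketched in plain C. This is only a toy model under stated assumptions: toy_transform_col is a made-up linear stand-in for the real 1-D IDCT, first_pass and its slices parameter are hypothetical names, and how the real NEON code derives the number of slices from eob is not shown here.

```c
#include <string.h>

#define N     16  /* block size (the 16x16 case) */
#define SLICE 4   /* columns handled per first-pass iteration */

/* Toy stand-in for the 1-D transform of one column (assumption, not the
 * real IDCT). Like any linear transform, it maps an all-zero column to an
 * all-zero column, which is the property that makes skipping valid. */
static void toy_transform_col(const int *in, int *out)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += in[i * N];
    for (int i = 0; i < N; i++)
        out[i * N] = sum + in[i * N];
}

/* First pass: transform only the first `slices` groups of 4 columns and
 * leave the remaining columns of the intermediate buffer at zero. */
static void first_pass(const int *in, int *tmp, int slices)
{
    memset(tmp, 0, sizeof(int) * N * N);
    for (int s = 0; s < slices; s++)
        for (int c = s * SLICE; c < (s + 1) * SLICE; c++)
            toy_transform_col(in + c, tmp + c);
}
```

When all nonzero coefficients sit in the first 8 columns, calling first_pass with slices set to 2 gives the same intermediate buffer as processing all 4 slices, at roughly half the first-pass work.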
|
* commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f':
checkasm/arm: preserve the stack alignment in checkasm_checked_call
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
The stack used by checkasm_checked_call_vfp was a multiple of 4 when the
checked function is called. AAPCS requires a double word (8 byte)
aligned stack at public interfaces. Since both calls are public interfaces,
the stack is misaligned when the checked function is called.
Might fix the SIGBUS error in the armv7-linux-clang-3.7 fate config.
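The alignment arithmetic behind the requirement above can be sketched in C. The helper names round_up_8 and is_aligned_8 are hypothetical; the actual fix lives in ARM assembly and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a stack usage in bytes up to the next multiple of 8, so that a
 * stack pointer that was 8-byte aligned stays aligned after the adjustment. */
static size_t round_up_8(size_t bytes)
{
    return (bytes + 7) & ~(size_t)7;
}

/* Check whether an address satisfies double word (8 byte) alignment. */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}
```

For example, a frame that uses 12 bytes is a multiple of 4 but leaves the stack misaligned for an 8-byte requirement; rounding it up to 16 keeps the alignment intact.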
|
* commit '80fbb7becae530167373fe5178966b7d7604306e':
checkasm: vp8.mc: initialize the full src buffer after ec32574209f
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
Fixes "Use of uninitialised value" valgrind warnings in checkasm.
|
* commit '8c816c0c9b12fdefd9046415e97df299880bc9b8':
checkasm/arm: align the clobber check data properly for ldrd
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
Should fix the SIGBUS in the armv7-linux-clang-3.7 fate target.
|
* commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6':
checkasm: vp8: mc: test unequal width/height for partitions
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
* commit 'f8d17d53957056c053a46f9320fa7ae6fe1479a5':
checkasm: Add tests for vp8dsp
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
The tests are inspired by similar tests for vp9 by
Ronald Bultje.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
This avoids listing the same feature multiple times in the
test output. Previously the output contained something like this:
SSE2:
- hevc_mc.qpel [OK]
- hevc_mc.epel [OK]
- hevc_mc.unweighted_pred [OK]
- hevc_mc.qpel [OK]
- hevc_mc.epel [OK]
- hevc_mc.unweighted_pred [OK]
Signed-off-by: Martin Storsjö <martin@martin.st>
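One way to avoid the duplicated output shown above is to run the tests for a CPU flag only when that flag actually changes the set of active features. The sketch below assumes this approach; the function name flag_adds_features and its parameters are hypothetical and may differ from what checkasm does.

```c
/* Decide whether tests should be run for `flag`.
 *   active:    in/out, the set of flags enabled so far
 *   supported: flags the running CPU actually supports
 *   flag:      the flag under consideration (0 means the plain C baseline)
 * Returns 1 if the flag adds something new (or is the baseline), else 0. */
static int flag_adds_features(int *active, int supported, int flag)
{
    int old = *active;

    *active = (old | flag) & supported;
    return !flag || *active != old;
}
```

With this check, a flag that the CPU does not support leaves the active set unchanged, so its tests are skipped rather than being listed again under a new heading.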
|
* commit 'e48746deec48e9ff195841bc3266b4e153a878cd':
checkasm: h264dsp: Move the x and y variables into the randomize_buffer macro
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
This avoids the risk of accidentally clobbering such variables outside
of the macro if the same variables are used there.
Signed-off-by: Martin Storsjö <martin@martin.st>
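The scoping change described above can be illustrated with a simplified macro. The macro name fill_buffer and its arguments are hypothetical; the real randomize_buffer macro in the h264dsp test is not reproduced here.

```c
/* The loop counters x and y are declared inside the macro's own block, so
 * expanding the macro cannot overwrite variables named x or y that already
 * exist at the call site. */
#define fill_buffer(buf, w, h, val)              \
    do {                                         \
        int x, y;                                \
        for (y = 0; y < (h); y++)                \
            for (x = 0; x < (w); x++)            \
                (buf)[y * (w) + x] = (val);      \
    } while (0)
```

A caller that keeps its own x and y can use the macro without seeing those values change. Passing the caller's x or y as a macro argument would still be shadowed by the inner declarations, so such arguments are best avoided.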
|
This fixes valgrind warnings about conditional jumps based on
uninitialized data (even though the uninitialized data only ever
was compared with a direct copy of the same uninitialized data).
Signed-off-by: Martin Storsjö <martin@martin.st>
|
* commit 'dc7501e524dc3270335749302c7aa449973625f3':
checkasm: Issue emms after benchmarking functions
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
The functions may not clean up properly after using MMX
registers. For the normal testing calls, the checkasm_checked_call
functions will do the cleanup (and check that functions that
should clean up do it as well), but when benchmarking functions
that don't clean up, we don't currently properly clean up at all.
This causes issues if a benchmarked function is followed by testing
of a function that is supposed to not clobber the MMX/FPU state but
doesn't touch it at all.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
This fixes valgrind warnings about conditional jumps based on
uninitialized data (even though the uninitialized data only ever
was compared with a direct copy of the same uninitialized data).
Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
Fixes checkasm failures on mmxext functions
Signed-off-by: James Almer <jamrial@gmail.com>
|
* commit '105998fb5ca3c343f5c8cb39ce3197f87a5e4d36':
checkasm: Add tests for h264 idct
Merged-by: Matthieu Bouron <matthieu.bouron@stupeflix.com>
|
The tests are inspired by similar tests for vp9 by
Ronald Bultje.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
Chunk was not merged in ca5ec2bf51d8c4f8bb0a829d0a65c70c968888a3.
|
The code is documented as requiring 8-byte alignment
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
* commit '257f00ec1ab06a2a161f535036c6512f3fc8e801':
Split global .gitignore file into per-directory files
Merged-by: Clément Bœsch <clement@stupeflix.com>
|
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
cosmetics: Fix spelling mistakes
Merged-by: Clément Bœsch <u@pkh.me>
|
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
* commit '01621202aad7e27b2a05c71d9ad7a19dfcbe17ec':
build: miscellaneous cosmetics
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
|
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
Suggested & Approved by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
* commit '7c82d31cbe9fc5d5a321ad49c14a472bd629b50f':
checkasm: Use standard multiple inclusion guards
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
Also bench a smaller buffer. This drastically reduces --bench runtime
and reports smaller, more readable numbers.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
They will now compile if avcodec is disabled
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
The test is already slow.
|