| Commit message (Collapse) | Author | Age |
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
* commit '01621202aad7e27b2a05c71d9ad7a19dfcbe17ec':
build: miscellaneous cosmetics
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
|
* commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873':
aarch64: Make transpose_4x4H do a regular transpose
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
Previously, ff_h264_idct_add_neon (originally in the arm version) used
a non-regular transpose in order to be able to use more instructions
that deal with registers as 128 bit register pairs. The aarch64
translation doesn't do it to the same extent, but brought along the
same structure since it was a straight translation.
This reshuffles ff_h264_idct_add_neon, bringing it closer to
the C implementation, making the transpose_4x4H macro do a regular
transpose, usable for other algorithms as well.
Previously, the third and fourth output from transpose_4x4H were
swapped, and prior to cc29d96d5a, the same inputs as well. In
addition to just swapping the outputs, also renumber the intermediate
registers for better readability (making the register order match
transpose_4x8B).
This runs with the same number of cycles as before.
Signed-off-by: Martin Storsjö <martin@martin.st>
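For reference, a regular 4x4 transpose of 16-bit elements can be sketched in plain C as below. This is only a scalar illustration of the result the message describes, not the NEON macro itself; the function name transpose_4x4_i16 and the flat row-major layout are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not FFmpeg code): regular transpose of a
 * 4x4 block of 16-bit values stored row-major in a flat array of 16
 * elements, so that out[c][r] = in[r][c]. */
static void transpose_4x4_i16(const int16_t *in, int16_t *out)
{
    /* The sketch does not support in-place operation. */
    assert(in != out);

    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[c * 4 + r] = in[r * 4 + c];
}
```

With the earlier behaviour described in the message, the third and fourth output rows would have come back exchanged relative to this layout.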
|
* commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555':
fft: arm: Drop unnecessary #include, add missing ones
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
Remove all files and functions that are not going to be reused,
and temporarily disable all functions and FATE tests that will be.
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
They were superseded by their integer equivalents. Rename integer
decode_hf to decode_hf.
|
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
|
* commit 'a0fc780a2093784e8664f88205ee1b215e109cee':
arm64: int32_to_float_fmul neon asm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
3% faster dts decoding on a cortex-a57.
                                  cortex-a57   cortex-a53
int32_to_float_fmul_array8_c:         1270.9       4475.6
int32_to_float_fmul_array8_neon:       328.6        569.2
int32_to_float_fmul_scalar_c:          928.5       4119.6
int32_to_float_fmul_scalar_neon:       309.1        524.1
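As a rough guide to what the benchmarked functions compute, here is a scalar C sketch inferred from their names: convert int32 samples to float and multiply by a scale factor, either one factor for the whole run (scalar) or one factor per group of eight samples (array8). The _ref names and exact signatures are assumptions made for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: convert each int32 sample to float and scale
 * it by a single factor. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int32_t *src,
                                           float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Hypothetical reference: same conversion, but with one scale factor per
 * group of eight samples. Assumes len is a multiple of 8. */
static void int32_to_float_fmul_array8_ref(float *dst, const int32_t *src,
                                           const float *mul, int len)
{
    assert(len % 8 == 0);

    for (int i = 0; i < len; i += 8)
        int32_to_float_fmul_scalar_ref(dst + i, src + i, mul[i / 8], 8);
}
```

Both loops are simple element-wise operations with no dependencies between iterations, which is the kind of code that tends to map well onto SIMD instructions.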
|
* commit '705f5e5e155f6f280a360af220fc5b30cfcee702':
arm64: port synth_filter_float_neon from arm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
~25% faster dts decoding overall. The checkasm CPU cycles numbers are
not that useful since synth_filter_float() calls FFTContext.imdct_half().
                           cortex-a57   cortex-a53
synth_filter_float_c:          1866.2       3490.9
synth_filter_float_neon:        915.0       1531.5
With fftc.imdct_half forced to imdct_half_neon:
                           cortex-a57   cortex-a53
synth_filter_float_c:          1718.4       3025.3
synth_filter_float_neon:        926.2       1530.1
|
* commit 'c33c1fa8af2b2e82418a06901b6ad17b3d61b73e':
arm64: convert dcadsp neon asm from arm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
~2% faster dts decoding overall.
                      cortex-a57   cortex-a53
dca_decode_hf_c:           474.8       1659.9
dca_decode_hf_neon:        225.2        301.1
dca_lfe_fir0_c:            913.2       1537.7
dca_lfe_fir0_neon:         286.8        451.9
dca_lfe_fir1_c:            848.7       1711.5
dca_lfe_fir1_neon:         387.1        506.4
|
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
The change was not correct and broke H264.
This reverts commit cd83f899c94f691b045697d12efa21f83eb2329f.
|
The transpose_4x4H macro is wrong: the order of r2 and r3 is swapped.
This bug cost me a lot of time to track down while writing aarch64 assembly that uses the macro.
|
* commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68':
h264: aarch64: intra prediction optimisations
Conflicts:
libavcodec/h264pred.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
|
* commit '3d5d46233cd81f78138a6d7418d480af04d3f6c8':
opus: Factor out imdct15 into a standalone component
Conflicts:
configure
libavcodec/opus_celt.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
It will be reused by the AAC decoder.
|
* commit '780cd20b00a69e26bbfffbb8eec16fbe999ea793':
aarch64: Use .data.rel.ro for const data with relocations
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
This reverts commit c00365b46d464ce47716315c1801818d811bdb9a
in addition to using a different section.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
* commit 'c00365b46d464ce47716315c1801818d811bdb9a':
aarch64: Make the function pointer tables position independent
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
This allows running the code on Android, where 64-bit binaries with
text relocations aren't allowed to be loaded.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
* commit 'ac6b95dbc0b53b3ea461bd5e5e7f7f31d2983733':
aarch64: add ',' between assembler macro arguments where missing
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
llvm's integrated assembler does not accept spaces as macro argument
delimiters when targeting darwin. Using an explicit delimiter is a good
idea in principle since it makes cases like 'macro 4 -2' vs 'macro 4 - 2'
clear.
|
* commit 'f23d26a6864128001b03876b0b92fffe131f2060':
h264: avoid using uninitialized memory in NEON chroma mc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Adapt commit 982b596ea6640bfe218a31f6c3fc542d9fe61c31 for the arm and
aarch64 NEON asm. 5-10% faster on Cortex-A9.
|
* commit 'd3f5b94762fb803c0f3b29f9ad6c5eaa813998ba':
aarch64: opus NEON iMDCT and FFT
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Opus celt decoding 11% faster and the iMDCT over 2.5 times faster on
Apple's A7.
|
* commit '9aa4592076d4dbb29d1198b0e258f9f85c0c00b5':
aarch64: assembler in clang-3.4 ignores the division by two
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Values are positive powers of two, so just replace it with right shift.
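The arithmetic behind that substitution can be illustrated in C. The actual change is to assembler expressions, so this is only a sketch of the identity it relies on; the helper name half_pow2 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg code): halve n with a right shift.
 * For a positive power of two, n >> 1 gives the same result as n / 2,
 * which is the precondition the commit message states. */
static uint32_t half_pow2(uint32_t n)
{
    /* Enforce the precondition: exactly one bit set. */
    assert(n != 0 && (n & (n - 1)) == 0);
    return n >> 1;
}
```

The precondition matters mainly for signed negative values, where an arithmetic shift and a truncating division can round differently.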
|
* commit '3956a5e0ea46ed7e27ca888fe11c47986ad99261':
aarch64: NEON vorbis_inverse_coupling
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
From the ARMv7 NEON version. 16 times faster than the C version, overall
more than 12% faster vorbis decoding on Apple's A7.
|
* commit '8f9fe6ae3461ce270bce6b7083fda5ec314cdad4':
aarch64: NEON fixed/floating point MPADSP apply_window
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
30%/25% (fixed/float) faster mp3 decoding on Apple's A7. The floating
point decoder is approximately 7% faster.
|
* commit 'ee2bc5974fe64fd214f52574400ae01c85f4b855':
aarch64: NEON float (i)MDCT
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Approximately as fast as the ARM NEON version on Apple's A7.
|
* commit '650c4300d94aa9398ff1dd4f454bf39eaa285f62':
aarch64: NEON float FFT
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
Approximately as fast as the ARM NEON version on Apple's A7.
|