path: root/libavcodec/aarch64
Commit message | Author | Age
* cosmetics: Fix spelling mistakes | Vittorio Giovara | 2016-05-04
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* build: miscellaneous cosmetics | Diego Biurrun | 2016-04-07
  Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, and group together parts that belong together logically.
* aarch64: Make transpose_4x4H do a regular transpose | Martin Storsjö | 2016-03-26
  Previously, ff_h264_idct_add_neon (originally in the arm version) used a non-regular transpose in order to be able to use more instructions that deal with registers as 128-bit register pairs. The aarch64 translation doesn't do this to the same extent, but brought along the same structure since it was a straight translation.
  This reshuffles ff_h264_idct_add_neon, bringing it closer to the C implementation and making the transpose_4x4H macro do a regular transpose that is usable for other algorithms as well.
  Previously, the third and fourth outputs from transpose_4x4H were swapped, and prior to cc29d96d5a, the same inputs as well. In addition to just swapping the outputs, also renumber the intermediate registers for better readability (making the register order match transpose_4x8B).
  This runs with the same number of cycles as before.
  Signed-off-by: Martin Storsjö <martin@martin.st>
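For reference, a "regular" transpose means that output row r holds column r of the input, in input order. The sketch below models that in plain C; it is our illustration, not code from the commit. The NEON macro works on four 64-bit registers of four 16-bit lanes each; here each register is modelled as one row of a hypothetical int16_t[4][4] array, and the function name is made up.

```c
#include <stdint.h>

/* Hypothetical C model of a regular 4x4 transpose of 16-bit elements.
 * Row r of m stands in for input register r (4 x int16 lanes). After the
 * call, row r holds lane r of every input register, in input order. */
static void transpose_4x4_i16(int16_t m[4][4])
{
    for (int r = 0; r < 4; r++) {
        /* Swap each element above the diagonal with its mirror below it. */
        for (int c = r + 1; c < 4; c++) {
            int16_t t = m[r][c];
            m[r][c] = m[c][r];
            m[c][r] = t;
        }
    }
}
```

Before this change, the macro's third and fourth outputs came out swapped relative to this layout, which is why callers had to be aware of the non-standard order.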
* fft: Split MDCT bits off from FFT | Diego Biurrun | 2016-03-01
* fft: arm: Drop unnecessary #include, add missing ones | Diego Biurrun | 2016-02-26
* dca: remove unused decode_hf function and quant_d tables | Alexandra Hájková | 2015-12-24
  They were superseded by their integer equivalents. Rename the integer decode_hf to decode_hf.
* arm64: fix inverted register order in transpose_4x4H | Janne Grunau | 2015-12-21
  Fix related register order issue in ff_h264_idct_add_neon.
  Found-by: zjh8890 <243186085@qq.com>
* arm64: int32_to_float_fmul neon asm | Janne Grunau | 2015-12-14
  3% faster dts decoding on a cortex-a57.
                                       cortex-a57  cortex-a53
    int32_to_float_fmul_array8_c:        1270.9      4475.6
    int32_to_float_fmul_array8_neon:      328.6       569.2
    int32_to_float_fmul_scalar_c:         928.5      4119.6
    int32_to_float_fmul_scalar_neon:      309.1       524.1
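As a rough C model of what the two benchmarked conversions compute, as we understand them: the scalar variant converts each int32 sample to float and scales it by one common factor, and the array8 variant does the same with a separate factor for each group of eight samples. The function names below are hypothetical, and the real libavcodec prototypes may differ (for example by taking a context argument).

```c
#include <stdint.h>

/* Sketch: convert int32 samples to float, scaling all of them by 'mul'. */
static void fmul_scalar_sketch(float *dst, const int32_t *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (float)src[i] * mul;
}

/* Sketch: same conversion, but with one scale factor per block of 8
 * samples; 'len' is assumed to be a multiple of 8. */
static void fmul_array8_sketch(float *dst, const int32_t *src,
                               const float *mul, int len)
{
    for (int i = 0; i < len; i += 8)
        fmul_scalar_sketch(dst + i, src + i, *mul++, 8);
}
```

Both loops are independent per element, which is why they map well onto 4-lane NEON float vectors.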
* arm64: port synth_filter_float_neon from arm | Janne Grunau | 2015-12-14
  ~25% faster dts decoding overall. The checkasm CPU cycle numbers are not that useful since synth_filter_float() calls FFTContext.imdct_half().
                               cortex-a57  cortex-a53
    synth_filter_float_c:          1866.2      3490.9
    synth_filter_float_neon:        915.0      1531.5
  With fftc.imdct_half forced to imdct_half_neon:
                               cortex-a57  cortex-a53
    synth_filter_float_c:          1718.4      3025.3
    synth_filter_float_neon:        926.2      1530.1
* arm64: convert dcadsp neon asm from arm | Janne Grunau | 2015-12-14
  ~2% faster dts decoding overall.
                          cortex-a57  cortex-a53
    dca_decode_hf_c:           474.8      1659.9
    dca_decode_hf_neon:        225.2       301.1
    dca_lfe_fir0_c:            913.2      1537.7
    dca_lfe_fir0_neon:         286.8       451.9
    dca_lfe_fir1_c:            848.7      1711.5
    dca_lfe_fir1_neon:         387.1       506.4
* h264: aarch64: intra prediction optimisations | Janne Grunau | 2015-07-20
* arm64: constify src in h264qpel dsp function definitions | Janne Grunau | 2015-06-24
* opus: Factor out imdct15 into a standalone component | Diego Biurrun | 2015-02-02
  It will be reused by the AAC decoder.
* aarch64: Use .data.rel.ro for const data with relocations | Martin Storsjö | 2014-12-09
  This reverts commit c00365b46d464ce47716315c1801818d811bdb9a in addition to using a different section.
  Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64: Make the function pointer tables position independent | Martin Storsjö | 2014-11-16
  This allows running the code on Android, where 64-bit binaries with text relocations aren't allowed to be loaded.
  Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64: add ',' between assembler macro arguments where missing | Janne Grunau | 2014-08-04
  LLVM's integrated assembler does not accept spaces as a macro argument delimiter when targeting Darwin. Using an explicit delimiter is a good idea in principle, since it makes cases like 'macro 4 -2' vs 'macro 4 - 2' clear.
* h264: avoid using uninitialized memory in NEON chroma mc | Janne Grunau | 2014-06-23
  Adapt commit 982b596ea6640bfe218a31f6c3fc542d9fe61c31 for the arm and aarch64 NEON asm. 5-10% faster on Cortex-A9.
* aarch64: opus NEON iMDCT and FFT | Janne Grunau | 2014-05-15
  Opus CELT decoding is 11% faster and the iMDCT over 2.5 times faster on Apple's A7.
* aarch64: assembler in clang-3.4 ignores the division by two | Janne Grunau | 2014-05-13
  The values are positive powers of two, so just replace the division with a right shift.
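The reasoning behind this fix can be checked numerically. The sketch below (hypothetical helper name, not from the commit) confirms that for every power of two representable in a 32-bit int, integer division by two and a right shift by one agree. The positivity restriction matters: for negative odd operands the two can differ, and right-shifting a negative value is implementation-defined in C.

```c
#include <stdbool.h>

/* Returns true if x / 2 == x >> 1 for every power of two x = 2^k,
 * 0 <= k <= 30 (all powers of two that fit in a 32-bit signed int). */
static bool shift_matches_division(void)
{
    for (int k = 0; k <= 30; k++) {
        int x = 1 << k;
        if (x / 2 != x >> 1)
            return false;
    }
    return true;
}
```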
* aarch64: NEON vorbis_inverse_coupling | Janne Grunau | 2014-04-22
  From the ARMv7 NEON version. 16 times faster than the C version; overall, vorbis decoding is more than 12% faster on Apple's A7.
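For context, Vorbis inverse coupling turns a magnitude/angle pair back into two channel values, element by element. The sketch below follows the decision table in the Vorbis I specification as we recall it; the function name is hypothetical and this is an illustration, not the libavcodec C reference itself. A SIMD version would typically replace the per-element branches with compare masks and bitwise selects.

```c
/* Sketch of Vorbis inverse channel coupling, applied in place:
 * mag[] and ang[] hold the coupled pair and receive the decoupled values. */
static void vorbis_inverse_coupling_sketch(float *mag, float *ang, int blocksize)
{
    for (int i = 0; i < blocksize; i++) {
        float m = mag[i], a = ang[i];
        if (m > 0.0f) {
            if (a > 0.0f) { mag[i] = m;     ang[i] = m - a; }
            else          { mag[i] = m + a; ang[i] = m;     }
        } else {
            if (a > 0.0f) { mag[i] = m;     ang[i] = m + a; }
            else          { mag[i] = m - a; ang[i] = m;     }
        }
    }
}
```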
* aarch64: NEON fixed/floating point MPADSP apply_window | Janne Grunau | 2014-04-22
  30%/25% (fixed/float) faster mp3 decoding on Apple's A7. The floating point decoder is approximately 7% faster.
* aarch64: NEON float (i)MDCT | Janne Grunau | 2014-04-22
  Approximately as fast as the ARM NEON version on Apple's A7.
* aarch64: NEON float FFT | Janne Grunau | 2014-04-22
  Approximately as fast as the ARM NEON version on Apple's A7.
* aarch64: implement videodsp.prefetch | Janne Grunau | 2014-04-06
  8% faster h264 decoding on Apple A7.
* build: Group general components separate from de/encoders in arch Makefiles | Diego Biurrun | 2014-03-20
  This is in line with how the top-level libavcodec Makefile is structured.
* aarch64: get_cabac inline asm | Janne Grunau | 2014-03-09
  Based on the x86 branchless get_cabac asm. get_cabac_noinline() becomes approximately 20% faster (no cycle counts available) compared to clang from Xcode 5.1 beta5. More than 6% faster overall. Part of the overall speedup might be explained by additional inlining of get_cabac().
* aarch64: use EXTERN_ASM consistently for exported symbols | Janne Grunau | 2014-02-20
  Based on e3fec3f095ab5ea08ee662942d98526aaf5e3635 for arm.
* aarch64: port neon clobber test from arm | Janne Grunau | 2014-01-15
* aarch64: h264 (bi)weight NEON optimizations | Janne Grunau | 2014-01-15
  Ported from ARMv7 NEON.
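The (bi)weight functions implement H.264 explicit weighted prediction. As a per-sample sketch of the 8-bit formulas from the H.264 specification (our illustration; the helper names are hypothetical, and the real DSP functions operate on whole blocks with a stride):

```c
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Unidirectional weighting of one sample: log_wd is the log2 weight
 * denominator, w the weight, o the offset. Weights may be negative; the
 * spec's >> is arithmetic, which is what GCC and Clang do for int. */
static uint8_t h264_weight_px(uint8_t src, int log_wd, int w, int o)
{
    if (log_wd >= 1)
        return clip_u8(((src * w + (1 << (log_wd - 1))) >> log_wd) + o);
    return clip_u8(src * w + o);
}

/* Bidirectional weighting: two predictions with their own weights and
 * offsets, combined with rounding and a shift of log_wd + 1. */
static uint8_t h264_biweight_px(uint8_t s0, uint8_t s1, int log_wd,
                                int w0, int w1, int o0, int o1)
{
    return clip_u8(((s0 * w0 + s1 * w1 + (1 << log_wd)) >> (log_wd + 1))
                   + ((o0 + o1 + 1) >> 1));
}
```

With log_wd = 5 and both weights equal to 32, the bidirectional case reduces to a rounded average of the two predictions.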
* aarch64: h264 loop filter NEON optimizations | Janne Grunau | 2014-01-15
  Ported from ARMv7 NEON.
* aarch64: hpeldsp NEON optimizations | Janne Grunau | 2014-01-15
  Ported from ARMv7 NEON.
* aarch64: h264 qpel NEON optimizations | Janne Grunau | 2014-01-15
  Ported from ARMv7 NEON.
* aarch64: h264 idct NEON assembler optimizations | Janne Grunau | 2014-01-15
  Ported from ARMv7 NEON.
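For orientation, the core of these routines is the H.264 4x4 inverse integer transform followed by adding the residual to the prediction. The sketch below is a scalar C model of that step as given in the H.264 specification. The function name and the row-major coefficient layout are our assumptions; libavcodec's own C code may store coefficients differently. Vectorized versions commonly run the same 1-D butterfly on four lanes at once and transpose between the two passes, which is where a macro such as transpose_4x4H comes in.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Sketch: 4x4 H.264 inverse transform of coef (row-major), result added
 * to the 8-bit prediction in dst with clipping. >> on negative ints is
 * assumed arithmetic, as on GCC and Clang. */
static void h264_idct4_add_sketch(uint8_t *dst, ptrdiff_t stride,
                                  const int16_t coef[16])
{
    int tmp[16];
    /* Horizontal pass: 1-D butterfly over each row. */
    for (int i = 0; i < 4; i++) {
        const int16_t *b = coef + 4 * i;
        int z0 = b[0] + b[2];
        int z1 = b[0] - b[2];
        int z2 = (b[1] >> 1) - b[3];
        int z3 = b[1] + (b[3] >> 1);
        tmp[4 * i + 0] = z0 + z3;
        tmp[4 * i + 1] = z1 + z2;
        tmp[4 * i + 2] = z1 - z2;
        tmp[4 * i + 3] = z0 - z3;
    }
    /* Vertical pass over each column, then round, shift by 6 and add. */
    for (int j = 0; j < 4; j++) {
        int z0 = tmp[j] + tmp[8 + j];
        int z1 = tmp[j] - tmp[8 + j];
        int z2 = (tmp[4 + j] >> 1) - tmp[12 + j];
        int z3 = tmp[4 + j] + (tmp[12 + j] >> 1);
        dst[0 * stride + j] = clip_u8(dst[0 * stride + j] + ((z0 + z3 + 32) >> 6));
        dst[1 * stride + j] = clip_u8(dst[1 * stride + j] + ((z1 + z2 + 32) >> 6));
        dst[2 * stride + j] = clip_u8(dst[2 * stride + j] + ((z1 - z2 + 32) >> 6));
        dst[3 * stride + j] = clip_u8(dst[3 * stride + j] + ((z0 - z3 + 32) >> 6));
    }
}
```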
* aarch64: h264 chroma motion compensation NEON optimizations | Janne Grunau | 2014-01-15
  Since RV40 and VC-1 use almost the same algorithm, optimizations for those two decoders are easy to do and are included.
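H.264 chroma motion compensation is bilinear interpolation at eighth-sample positions, with four weights that always sum to 64. The sketch below models it in C. The function name is hypothetical, and the bias parameter is our own device for showing where codec variants could plug in a different rounding constant (H.264 uses 32). That RV40 and VC-1 differ mainly in rounding is an assumption here, not something the commit states.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of bilinear chroma MC for a w x h block at fractional position
 * (x, y), 0 <= x, y < 8. Reads one extra column and row of src, so the
 * source must have (w + 1) x (h + 1) readable samples. */
static void chroma_mc_sketch(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                             int w, int h, int x, int y, int bias)
{
    const int A = (8 - x) * (8 - y);
    const int B = x * (8 - y);
    const int C = (8 - x) * y;
    const int D = x * y; /* A + B + C + D == 64 */
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            /* Weighted sum of the 2x2 neighbourhood; max 64 * 255 + bias,
             * so the result after >> 6 always fits in 8 bits. */
            dst[j] = (uint8_t)((A * src[j] + B * src[j + 1] +
                                C * src[j + stride] + D * src[j + stride + 1] +
                                bias) >> 6);
        }
        dst += stride;
        src += stride;
    }
}
```

At (x, y) = (0, 0) only A is non-zero and the block is a rounded copy of the source; at (4, 4) all four neighbours contribute equally.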