| Commit message | Author | Age |
The assembler may fail to place literal pools close enough to
instructions referencing them. An explicit .ltorg directive
fixes this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Based on patch by Ronald S. Bultje <rsbultje@gmail.com>,
partially ported from libvpx.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This is a preparation for complete ARMv6 optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds some macros simplifying Thumb and pre-v6T2 compatibility.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This allows masking CPU features with the -cpuflags avconv option,
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
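The effect of such a mask can be sketched in C as below. Everything here is hypothetical and for illustration only: the flag bits, `detect_cpu_flags()`, and the `add_*` routines are not the libav API, and the "optimised" versions simply reuse the C code. The point is that the implementation is chosen from the detected flags ANDed with a user-supplied mask, so a slower path can be selected at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bits (not the libavutil constants). */
#define CPU_FLAG_ARMV6 (1u << 0)
#define CPU_FLAG_NEON  (1u << 1)

typedef void (*add_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Reference C version: dst[i] += src[i]. */
static void add_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Stand-ins for optimised versions; here they just call the C code. */
static void add_armv6(uint8_t *dst, const uint8_t *src, size_t n) { add_c(dst, src, n); }
static void add_neon(uint8_t *dst, const uint8_t *src, size_t n)  { add_c(dst, src, n); }

/* Hypothetical detection: pretend the CPU supports everything. */
static unsigned detect_cpu_flags(void)
{
    return CPU_FLAG_ARMV6 | CPU_FLAG_NEON;
}

/* Pick the best routine allowed by both the hardware and the user mask. */
static add_fn select_add(unsigned user_mask)
{
    unsigned flags = detect_cpu_flags() & user_mask;
    if (flags & CPU_FLAG_NEON)
        return add_neon;
    if (flags & CPU_FLAG_ARMV6)
        return add_armv6;
    return add_c;
}
```

With a mask of `CPU_FLAG_ARMV6`, `select_add()` returns the ARMv6 routine even though NEON is detected, and a mask of 0 forces the plain C version.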
This feature is complex, of questionable utility, and slows down
normal decoding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Quite often, the original weights are multiples of 512. Prescaling them
by 1/512 when they are computed (once per frame) means that no intermediate
shifting is needed, and no prescaling on each call either.
The x86 code already used that trick.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
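A minimal C sketch of why exact multiples of 512 make the shifts unnecessary. It assumes a bi-weighted average of the form `(((w1*a) >> 9) + ((w2*b) >> 9) + 16) >> 5`; that formula and the function names are illustrative assumptions, not the decoder's actual code. When `w` is a multiple of 512, `(w*x) >> 9` equals `(w/512)*x` exactly, so dividing the weights once per frame gives identical output without the per-term shifts. The sketch only covers that case; how other weights are handled is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Per-call shifting: weights kept in their original (x512) scale. */
static uint8_t weight_shift(uint8_t a, uint8_t b, int w1, int w2)
{
    return (uint8_t)((((w1 * a) >> 9) + ((w2 * b) >> 9) + 16) >> 5);
}

/* Frame-level setup: divide the weights by 512 once.
 * Only valid when both are exact multiples of 512. */
static void prescale_weights(int w1, int w2, int *w1p, int *w2p)
{
    assert(w1 % 512 == 0 && w2 % 512 == 0);
    *w1p = w1 / 512;
    *w2p = w2 / 512;
}

/* Per-call work with prescaled weights: no intermediate shifts. */
static uint8_t weight_prescaled(uint8_t a, uint8_t b, int w1p, int w2p)
{
    return (uint8_t)((w1p * a + w2p * b + 16) >> 5);
}
```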
They have been broken since August 2010 without anyone noticing until
three weeks ago. Nobody cares about them anymore, and hopefully Marvell
will support NEON from now on, as the PXA978 does.
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.
The x86 and neon versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
On 64-bit platforms with 32-bit int, this means we no longer have to
sign-extend the integer.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This avoids having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. It also fixes crashes on systems where we don't sign-extend
and arguments are not passed in registers, such as Win64 for all weight functions.
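At the C level the change amounts to giving the parameter a pointer-sized type. In the hypothetical routine below (`copy_block` is an illustrative name, not a function from the source), `stride` is a `ptrdiff_t`, so it already has pointer width when added to `dst` and `src`. With an `int`, 64-bit code must sign-extend it first, and assembly that instead reads the full 64-bit argument slot can pick up undefined upper bits. Negative strides, for example for bottom-up traversal, keep working.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy h rows of w bytes. The stride is pointer-sized, so it can be
 * added to a pointer directly without a sign-extension step. */
static void copy_block(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                       int w, int h)
{
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        memcpy(dst, src, (size_t)w);
        dst += stride;
        src += stride;
    }
}
```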
Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Martin Storsjö <martin@martin.st>
This function was broken when the start bin was not at the start
of a band.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Almost 4% faster overall; idct_add is down from 350 to 85 cycles and
idct_dc_add from 83 to 30 cycles.
squash: rv34 idct rearrange partial register loads
Implement 1-pass inverse transform and reconstruction for inter blocks.
The alignment directive must obviously precede the label.
This was never noticed in ARM mode since the location is
already aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Due to apparent bugs in the GNU assembler and/or linker, relocations
can be incorrectly processed if the alignment of a Thumb instruction
is changed in the output file compared to the input object.
This fixes crashes in h264 decoding with Thumb enabled. No effect in
ARM mode since everything is 4-byte aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
30-50% faster than the C implementation, 0.5% overall speedup on
bourne.rmvb.
Perform dequantization while decoding coefficients instead of performing it
on the entire coefficient buffer.
Since quantized coefficients are very sparse, this usually gives a small
speedup: around 1% on a PandaBoard compared to the NEON code removed here.
The overall speedup is probably around 3%.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
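A minimal C sketch of the difference, assuming a hypothetical decoder that yields (position, level) pairs for the nonzero coefficients of a 4x4 block and a single quantiser step `q`; the real bitstream parsing and quantiser tables are not reproduced. The two-pass version multiplies all 16 entries, while the fused one multiplies only the coefficients it actually writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One decoded nonzero coefficient: position in the 4x4 block and level. */
struct coef { int pos; int level; };

/* Two passes: place quantised levels, then dequantise the whole block. */
static void decode_then_dequant(int block[16], const struct coef *c, size_t n, int q)
{
    memset(block, 0, 16 * sizeof(int));
    for (size_t i = 0; i < n; i++)
        block[c[i].pos] = c[i].level;
    for (int i = 0; i < 16; i++)      /* touches every entry, zero or not */
        block[i] *= q;
}

/* Fused: dequantise each coefficient as it is decoded. */
static void decode_dequant(int block[16], const struct coef *c, size_t n, int q)
{
    memset(block, 0, 16 * sizeof(int));
    for (size_t i = 0; i < n; i++) {
        assert(c[i].pos >= 0 && c[i].pos < 16);
        block[c[i].pos] = c[i].level * q;  /* zeros need no work */
    }
}
```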
Signed-off-by: Mans Rullgard <mans@mansr.com>
External symbol references need prefixes on some systems.
This should fix build errors on Darwin.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Old gcc versions have trouble compiling this function, and
no simple, targeted test is possible.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Based on patch by Janne Grunau.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
This allows sharing code with the rv40 version of these functions.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
- Replace 'ip' with 'r12'.
- Use correct size designators for vld1/vst1.
- Whitespace fixes.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
This makes whitespace and register names consistent with
the style used in more recent code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Although this adds a few lines, the macro calls are less convoluted.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This is a hand-tuned version of the code with impossible parts of
the FASTDIV function omitted.
2-5% faster overall on Cortex-A8.
Signed-off-by: Mans Rullgard <mans@mansr.com>
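FASTDIV-style division replaces an integer divide with a multiplication by a precomputed reciprocal and a 32-bit shift. A minimal C sketch of that general technique follows; the table name, the divisor range 2..256, and the use of `ceil(2^32 / b)` as the reciprocal are assumptions for illustration, not the libavcodec definition, and the hand-tuned ARM code is not reproduced. With that reciprocal the result is exact whenever `a * b < 2^32`, which the second `assert` checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reciprocal table: ceil(2^32 / b) for b = 2..256.
 * Entries 0 and 1 are unused. */
static uint32_t inverse[257];

static void init_inverse(void)
{
    for (uint32_t b = 2; b <= 256; b++)
        inverse[b] = (uint32_t)(((UINT64_C(1) << 32) + b - 1) / b);
}

/* a / b using a multiply and a shift instead of a divide.
 * Exact for 2 <= b <= 256 and a * b < 2^32. */
static uint32_t fast_div(uint32_t a, uint32_t b)
{
    assert(b >= 2 && b <= 256);
    assert((uint64_t)a * b < (UINT64_C(1) << 32));
    return (uint32_t)(((uint64_t)a * inverse[b]) >> 32);
}
```

Divisor 1 is deliberately left out: its reciprocal, 2^32, does not fit in 32 bits, so an implementation along these lines has to special-case it.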
The 'function' macro already includes the appropriate
directives.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This prevents build errors when the compiler and assembler default
targets differ. Ideally each file would declare the highest
level it requires. This is, however, not easily possible, as it
complicates assembling pre-ARMv6T2 code in Thumb-2 mode.
HAVE_NEON is used as an indicator for ARMv7-A since no other
symbol exists for this and NEON is only available in this
variant.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Filenames are brittle across renames and add no useful information.
Neon parts by Mans Rullgard <mans@mansr.com>.