| Commit message (Collapse) | Author | Age |
... | |
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Altivec can only load naturally aligned vectors. To handle possibly
unaligned data a second vector is loaded from an offset of the original
location and the data is recovered through a vector permutation.
Overreads are minimal if the offset for second load points to the last
element of data. This is 7 for loading eight 8-bit pixels and overreads
are reduced from 16 bytes to 8 bytes if the pixels are 64-bit aligned.
For unaligned pixels the overread is reduced from 23 bytes to 15 bytes
in the worst case.
|
|
|
|
|
| |
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
|
|
|
| |
Fixes invalid memory access.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Implicit vector loads on POWER7 hardware can use the VSX
instruction set instead of classic Altivec/VMX. Let's force
a VMX load in this case.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
the second and third sources were incremented only by half of the needed size
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Now that the headers themselves have ifdef protection this is no
longer necessary and more consistent with normal include handling.
|
| |
|
| |
|
|
|
|
|
|
| |
This fixes building in configurations where altivec is disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
| |
This way, the special IDCT permutations are no longer needed. This
is similar to how H264 does it, and removes the dsputil dependency
imposed by the scantable code.
Also remove the unused type == 0 cases from the plain C version
of the idct.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
| |
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
| |
It does not help as an abstraction and adds dsputil dependencies.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
|
|
| |
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
|
| |
|
|
|
|
| |
This saves one instruction in the x86-64 assembly.
|
|
|
|
| |
Also fixes compilation with AltiVec disabled.
|
|
|
|
|
|
|
| |
This fixes build failures on ppc machines with a compiler that
supports -Werror=implicit-function-declaration.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
| |
Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
|
|
|
|
|
|
|
|
|
| |
This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).
Arm asm updated by Janne Grunau <janne-libav@jannau.net>.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
| |
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
|
|
| |
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
| |
This removes warnings about strict aliasing violations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
| |
The third argument of OP_U8_ALTIVEC is evaluated at most once so
there is no need for a potentially unused temporary variable.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
The vec_splat() intrinsic requires a constant argument for the
element number, and the code relies on the compiler unrolling
the loop to provide this. Manually unrolling the loop avoids
this reliance and works with all compilers.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
| |
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|