path: root/libavcodec/x86/proresdsp.asm
Commit message (Author, Date)
* avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients (James Darnley, 2017-06-20)
* avcodec/x86: modify simple_idct10 macros to add an action parameter (James Darnley, 2017-06-20)
|
* avcodec/x86: cleanup simple_idct10 (James Darnley, 2017-06-20)
|     Use named arguments for the functions so we can remove a define.
|     The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register.
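The ptrdiff_t remark above can be illustrated with a small C sketch. It is not FFmpeg code: `row_ptr` and `row_ptr_int` are hypothetical helpers. The assumption is the usual x86-64 calling-convention behaviour, where a 32-bit `int` argument arrives with unspecified upper register bits, so hand-written assembly has to sign extend it before using it in 64-bit address arithmetic. A `ptrdiff_t` argument is already pointer-width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg code: address of row y in a picture
 * whose rows are `stride` bytes apart. A negative stride (bottom-up
 * layout) is legal, so the offset is computed at pointer width. */
static uint8_t *row_ptr(uint8_t *base, ptrdiff_t stride, int y)
{
    return base + stride * y;
}

/* With an int stride, the explicit conversion below is the C-level
 * counterpart of the sign extension the assembly previously had to do. */
static uint8_t *row_ptr_int(uint8_t *base, int stride, int y)
{
    return row_ptr(base, (ptrdiff_t)stride, y);
}
```

Both helpers give the same address for positive and negative strides; the difference is only where the widening happens.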
* x86inc: Add debug symbols indicating sizes of compiled functions (Geza Lore, 2016-01-21)
|     Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those).
|     On the other hand, some tools are still confused even with this metadata, e.g. this fixes `gdb`, but not `perf`.
|     Currently only implemented for ELF.
* x86: simple_idct10_template: use const (Christophe Gisquet, 2015-10-13)
|     This avoids going through constants.c while still sharing them with proresdsp.asm
|
|     Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
|     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86: simple_idct(_put): 10bits versions (Christophe Gisquet, 2015-10-13)
|     Modeled from the prores version. Clips to [0;1023] and is bitexact.
|     Bitexactness requires adding offsets in different places compared to prores or C, and makes the function approximately 2% slower.
|
|     For 16 frames of a DNxHD 4:2:2 10bits test sequence:
|     C:    60861 decicycles in idct, 1048205 runs, 371 skips
|     sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
|     avx:  26272 decicycles in idct, 1048171 runs, 405 skips
|
|     The add version is not implemented, so the corresponding dsp function is set to NULL to make this obvious to any code that tries to execute it.
|
|     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
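The "clips to [0;1023]" step of a 10-bit put function can be sketched in scalar C as follows. This is only an illustration of the clamping behaviour; `clip_pixel10` and `put_row10` are hypothetical names, and the real code does this with SIMD instructions rather than a loop.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a reconstructed sample to the 10-bit
 * pixel range [0, 1023]. */
static uint16_t clip_pixel10(int v)
{
    if (v < 0)
        return 0;
    if (v > 1023)
        return 1023;
    return (uint16_t)v;
}

/* Hypothetical helper: store one row of 8 IDCT outputs as 10-bit pixels. */
static void put_row10(uint16_t *dst, const int *src)
{
    for (int i = 0; i < 8; i++)
        dst[i] = clip_pixel10(src[i]);
}
```

An "add" variant would add the IDCT output to the existing pixel before clamping; as the message says, that variant is not provided here.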
* x86: simple_idct10_template: fix overflow in pass (Christophe Gisquet, 2015-10-13)
|     When the input of a pass has 15 or 16 bits of precision (in particular the column pass), the addition of a bias to W4 may lead to overflows in the input to pmaddwd.
|     This requires postponing the adding of the bias to after the first butterfly.
|     To do so, the fact that m15 is unused (although zeroed) is exploited.
|     In case the pass is safe, an address can be directly used, and the number of xmm regs can be decreased.
|     Otherwise, the 32bits bias is loaded into it.
|
|     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
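The general hazard described above can be shown with a scalar model of one pmaddwd lane. This is a sketch under stated assumptions, not the template's actual register layout: `wrap16`, `pmaddwd_lane`, `bias_early` and `bias_late` are hypothetical helpers, and the coefficient and bias values used with them are illustrative. The point is only that pmaddwd takes signed 16-bit inputs, so anything folded into an operand before the multiply must still fit in 16 bits, whereas a 32-bit bias added after the multiply has far more headroom.

```c
#include <stdint.h>

/* Wrap an int to the signed 16-bit range, as a packed 16-bit add would. */
static int16_t wrap16(int v)
{
    v &= 0xFFFF;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* One 32-bit lane of pmaddwd: two signed 16x16 products, summed.
 * (The single wrapping case, all four inputs equal to -32768, is not
 * exercised here.) */
static int32_t pmaddwd_lane(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    return (int32_t)sum;
}

/* Bias folded into the 16-bit operand before the multiply: may wrap. */
static int32_t bias_early(int x, int bias, int16_t coef)
{
    return pmaddwd_lane(wrap16(x + bias), coef, 0, 0);
}

/* Bias applied in 32 bits after the multiply: no 16-bit wrap. */
static int32_t bias_late(int16_t x, int bias, int16_t coef)
{
    return pmaddwd_lane(x, coef, 0, 0) + bias * coef;
}
```

For small inputs the two orderings agree; once the input already uses the full 16-bit range, only the late ordering gives the intended result, which is the motivation for postponing the bias.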
* x86: prores: templatize 10 bits simple_idct (Christophe Gisquet, 2015-10-13)
|     This should be reused for a generic simple_idct10 function.
|     Requires a bit of trickery to declare common constants in C.
|
|     Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86/proresdsp: remove ff_prores_idct_put_10_sse4 (James Almer, 2015-03-16)
|     It's exactly the same as the sse2 version.
|
|     Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
|     Signed-off-by: James Almer <jamrial@gmail.com>
* x86/proresdsp: remove unused macro (James Almer, 2015-03-16)
|     Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
|     Signed-off-by: James Almer <jamrial@gmail.com>
* Merge commit '55519926ef855c671d084ccc151056de9e3d3a77' (Michael Niedermayer, 2014-03-14)
|\
| |     * commit '55519926ef855c671d084ccc151056de9e3d3a77':
| |       x86: Make function prototype comments in assembly code consistent
| |
| |     Conflicts:
| |       libavcodec/x86/sbrdsp.asm
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Make function prototype comments in assembly code consistent (Diego Biurrun, 2014-03-13)
| |     This helps grepping for functions, among other things.
* | Merge commit '88bd7fdc821aaa0cbcf44cf075c62aaa42121e3f' (Michael Niedermayer, 2013-01-23)
|\|
| |     * commit '88bd7fdc821aaa0cbcf44cf075c62aaa42121e3f':
| |       Drop DCTELEM typedef
| |
| |     Conflicts:
| |       libavcodec/alpha/dsputil_alpha.h
| |       libavcodec/alpha/motion_est_alpha.c
| |       libavcodec/arm/dsputil_init_armv6.c
| |       libavcodec/bfin/dsputil_bfin.h
| |       libavcodec/bfin/pixels_bfin.S
| |       libavcodec/cavs.c
| |       libavcodec/cavsdec.c
| |       libavcodec/dct-test.c
| |       libavcodec/dnxhdenc.c
| |       libavcodec/dsputil.c
| |       libavcodec/dsputil.h
| |       libavcodec/dsputil_template.c
| |       libavcodec/eamad.c
| |       libavcodec/h264_cavlc.c
| |       libavcodec/h264idct_template.c
| |       libavcodec/mpeg12.c
| |       libavcodec/mpegvideo.c
| |       libavcodec/mpegvideo.h
| |       libavcodec/mpegvideo_enc.c
| |       libavcodec/ppc/dsputil_altivec.c
| |       libavcodec/proresdsp.c
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * Drop DCTELEM typedef (Diego Biurrun, 2013-01-22)
| |     It does not help as an abstraction and adds dsputil dependencies.
| |
| |     Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | Merge commit '04581c8c77ce779e4e70684ac45302972766be0f' (Michael Niedermayer, 2012-10-31)
|\|
| |     * commit '04581c8c77ce779e4e70684ac45302972766be0f':
| |       x86: yasm: Use complete source path for macro helper %includes
| |
| |     Conflicts:
| |       Makefile
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: yasm: Use complete source path for macro helper %includes (Diego Biurrun, 2012-10-31)
| |     This is more consistent with the way we handle C #includes and it simplifies the build system.
* | Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73' (Michael Niedermayer, 2012-10-31)
|\|
| |     * commit '6860b4081d046558c44b1b42f22022ea341a2a73':
| |       x86: include x86inc.asm in x86util.asm
| |       cng: Reindent some incorrectly indented lines
| |       cngdec: Allow flushing the decoder
| |       cngdec: Make the dbov variable have the right unit
| |       cngdec: Fix the memset size to cover the full array
| |       cngdec: Update the LPC coefficients after averaging the reflection coefficients
| |       configure: fix print_config() with broke awks
| |
| |     Conflicts:
| |       libavcodec/x86/ac3dsp.asm
| |       libavcodec/x86/dct32.asm
| |       libavcodec/x86/deinterlace.asm
| |       libavcodec/x86/dsputil.asm
| |       libavcodec/x86/dsputilenc.asm
| |       libavcodec/x86/fft.asm
| |       libavcodec/x86/fmtconvert.asm
| |       libavcodec/x86/h264_chromamc.asm
| |       libavcodec/x86/h264_deblock.asm
| |       libavcodec/x86/h264_deblock_10bit.asm
| |       libavcodec/x86/h264_idct.asm
| |       libavcodec/x86/h264_idct_10bit.asm
| |       libavcodec/x86/h264_intrapred.asm
| |       libavcodec/x86/h264_intrapred_10bit.asm
| |       libavcodec/x86/h264_weight.asm
| |       libavcodec/x86/vc1dsp.asm
| |       libavcodec/x86/vp3dsp.asm
| |       libavcodec/x86/vp56dsp.asm
| |       libavcodec/x86/vp8dsp.asm
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: include x86inc.asm in x86util.asm (Diego Biurrun, 2012-10-31)
| |     This is necessary to allow refactoring some x86util macros with cpuflags.
* | Add some missing _EXTERNAL suffixes to yasm source files. (Carl Eugen Hoyos, 2012-08-31)
| |
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-08-03)
|\|
| |     * qatar/master:
| |       vc1dec: Remove separate scaling function for interlaced field MVs
| |       vc1dec: Invoke edge_emulation regardless of MV precision
| |       x86: Use consistent 3dnowext function and macro name suffixes
| |       g723_1: scale output as supposed for the case with postfilter disabled
| |       g723_1: increase excitation storage by 4
| |       g723_1: fix upper bound parameter from inverse maximum autocorrelation
| |       g723_1: make scale_vector() behave like the reference
| |       g723_1: fix off-by-one error in normalize_bits()
| |       g723_1: save/restore excitation with offset to store LPC history
| |       wmapro: prevent division by zero when sample rate is unspecified
| |       x86: proresdsp: improve SIGNEXTEND macro comments
| |       x86: h264dsp: K&R formatting cosmetics
| |       LICENSE: Document all GPL files
| |
| |     Conflicts:
| |       libavcodec/g723_1.c
| |       libavcodec/wmaprodec.c
| |       libavcodec/x86/h264dsp_mmx.c
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: proresdsp: improve SIGNEXTEND macro comments (Diego Biurrun, 2012-08-02)
| |
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-07-27)
|\|
| |     * qatar/master:
| |       proresdsp: port x86 assembly to cpuflags.
| |       lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
| |       lavfi: better channel layout negotiation
| |       alac: check for truncated packets
| |       alac: reverse lpc coeff order, simplify filter
| |       lavr: add x86-optimized mixing functions
| |       x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
| |       tscc2: fix typo in array index
| |       build: use COMPILE template for HOSTOBJS
| |       build: do full flag handling for all compiler-type tools
| |       eval: fix printing of NaN in eval fate test.
| |       build: Rename aandct component to more descriptive aandcttables
| |       mpegaudio: bury inline asm under HAVE_INLINE_ASM.
| |       x86inc: automatically insert vzeroupper for YMM functions.
| |       rtmp: Check the buffer length of ping packets
| |       rtmp: Allow having more unknown data at the end of a chunk size packet without failing
| |       rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets
| |
| |     Conflicts:
| |       Makefile
| |       configure
| |       libavcodec/x86/proresdsp.asm
| |       libavutil/eval.c
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * proresdsp: port x86 assembly to cpuflags. (Ronald S. Bultje, 2012-07-27)
| |
* | Fix compilation without HAVE_AVX. (Reimar Döffinger, 2012-02-12)
| |     %ifdef HAVE_AVX must now be %if HAVE_AVX.
| |
| |     Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-01-28)
|\|
| |     * qatar/master: (71 commits)
| |       movenc: Allow writing to a non-seekable output if using empty moov
| |       movenc: Support adding isml (smooth streaming live) metadata
| |       libavcodec: Don't crash in avcodec_encode_audio if time_base isn't set
| |       sunrast: Document the different Sun Raster file format types.
| |       sunrast: Add a check for experimental type.
| |       libspeexenc: use AVSampleFormat instead of deprecated/removed SampleFormat
| |       lavf: remove disabled FF_API_SET_PTS_INFO cruft
| |       lavf: remove disabled FF_API_OLD_INTERRUPT_CB cruft
| |       lavf: remove disabled FF_API_REORDER_PRIVATE cruft
| |       lavf: remove disabled FF_API_SEEK_PUBLIC cruft
| |       lavf: remove disabled FF_API_STREAM_COPY cruft
| |       lavf: remove disabled FF_API_PRELOAD cruft
| |       lavf: remove disabled FF_API_NEW_STREAM cruft
| |       lavf: remove disabled FF_API_RTSP_URL_OPTIONS cruft
| |       lavf: remove disabled FF_API_MUXRATE cruft
| |       lavf: remove disabled FF_API_FILESIZE cruft
| |       lavf: remove disabled FF_API_TIMESTAMP cruft
| |       lavf: remove disabled FF_API_LOOP_OUTPUT cruft
| |       lavf: remove disabled FF_API_LOOP_INPUT cruft
| |       lavf: remove disabled FF_API_AVSTREAM_QUALITY cruft
| |       ...
| |
| |     Conflicts:
| |       doc/APIchanges
| |       libavcodec/8bps.c
| |       libavcodec/avcodec.h
| |       libavcodec/libx264.c
| |       libavcodec/mjpegbdec.c
| |       libavcodec/options.c
| |       libavcodec/sunrast.c
| |       libavcodec/utils.c
| |       libavcodec/version.h
| |       libavcodec/x86/h264_deblock.asm
| |       libavdevice/libdc1394.c
| |       libavdevice/v4l2.c
| |       libavformat/avformat.h
| |       libavformat/avio.c
| |       libavformat/avio.h
| |       libavformat/aviobuf.c
| |       libavformat/dv.c
| |       libavformat/mov.c
| |       libavformat/utils.c
| |       libavformat/version.h
| |       libavformat/wtv.c
| |       libavutil/Makefile
| |       libavutil/file.c
| |       libswscale/x86/input.asm
| |       libswscale/x86/swscale_mmx.c
| |       libswscale/x86/swscale_template.c
| |       tests/ref/lavf/ffm
| |
| |     Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * config.asm: change %ifdef directives to %if directives. (Ronald S. Bultje, 2012-01-27)
| |     This allows combining multiple conditionals in a single statement.
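The difference between the two directive styles can be sketched with the C preprocessor, whose `#ifdef` and `#if` behave analogously to yasm's `%ifdef` and `%if`. This is an analogy, not the contents of config.asm: the macro values and the `AVX_SEEN_BY_*` and `USE_SSE2_ONLY` names are made up for illustration, and it assumes every feature macro is always defined, as either 0 or 1.

```c
/* Stand-in for a generated config header; the values are illustrative. */
#define HAVE_AVX  0   /* feature disabled, but the macro is still defined */
#define HAVE_SSE2 1

#ifdef HAVE_AVX
#define AVX_SEEN_BY_IFDEF 1   /* taken: #ifdef only asks "is it defined?" */
#else
#define AVX_SEEN_BY_IFDEF 0
#endif

#if HAVE_AVX
#define AVX_SEEN_BY_IF 1
#else
#define AVX_SEEN_BY_IF 0      /* taken: #if tests the value */
#endif

/* Value tests can be combined in a single expression. */
#if HAVE_SSE2 && !HAVE_AVX
#define USE_SSE2_ONLY 1
#else
#define USE_SSE2_ONLY 0
#endif
```

Once the macros are always defined, a leftover definedness test is true even for a disabled feature, which is why such tests have to be converted to value tests.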
* | Fix compilation with yasm-0.6.2 (Carl Eugen Hoyos, 2012-01-12)
| |
* | proresdsp: fix rounding (Michael Niedermayer, 2011-10-12)
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | proresdsp: Correct credits to point to the Author and not just the code this is based on. (Michael Niedermayer, 2011-10-12)
| |     Also change Libav to FFmpeg
| |
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | proresdsp: Optimize series of padds out (Michael Niedermayer, 2011-10-12)
| |     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | proresdsp.asm: Remove useless instructions. (Michael Niedermayer, 2011-10-12)
| |
* | proresdsp.asm: drop useless shifts (Elvis Presley, 2011-10-12)
|/
|     Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* prores: idct sse2/sse4 optimizations. (Ronald S. Bultje, 2011-10-11)
      ~3.0-3.5x as fast as original C version, 1.6x as fast overall.