Commit message | Author | Age
...
* lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h | Logan Lyu | 2023-12-01
    put_hevc_epel_bi_h4_8_c: 96.0
    put_hevc_epel_bi_h4_8_neon: 36.3
    put_hevc_epel_bi_h6_8_c: 288.3
    put_hevc_epel_bi_h6_8_neon: 59.3
    put_hevc_epel_bi_h8_8_c: 358.5
    put_hevc_epel_bi_h8_8_neon: 61.5
    put_hevc_epel_bi_h12_8_c: 759.8
    put_hevc_epel_bi_h12_8_neon: 159.5
    put_hevc_epel_bi_h16_8_c: 1307.0
    put_hevc_epel_bi_h16_8_neon: 182.0
    put_hevc_epel_bi_h24_8_c: 2778.3
    put_hevc_epel_bi_h24_8_neon: 430.5
    put_hevc_epel_bi_h32_8_c: 4952.3
    put_hevc_epel_bi_h32_8_neon: 679.5
    put_hevc_epel_bi_h48_8_c: 11803.3
    put_hevc_epel_bi_h48_8_neon: 1443.5
    put_hevc_epel_bi_h64_8_c: 20654.8
    put_hevc_epel_bi_h64_8_neon: 2737.0
    put_hevc_qpel_bi_h4_8_c: 140.0
    put_hevc_qpel_bi_h4_8_neon: 111.5
    put_hevc_qpel_bi_h6_8_c: 318.0
    put_hevc_qpel_bi_h6_8_neon: 85.8
    put_hevc_qpel_bi_h8_8_c: 536.5
    put_hevc_qpel_bi_h8_8_neon: 95.3
    put_hevc_qpel_bi_h12_8_c: 1188.5
    put_hevc_qpel_bi_h12_8_neon: 291.3
    put_hevc_qpel_bi_h16_8_c: 2064.3
    put_hevc_qpel_bi_h16_8_neon: 365.3
    put_hevc_qpel_bi_h24_8_c: 4757.5
    put_hevc_qpel_bi_h24_8_neon: 1010.0
    put_hevc_qpel_bi_h32_8_c: 8351.8
    put_hevc_qpel_bi_h32_8_neon: 2917.8
    put_hevc_qpel_bi_h48_8_c: 19299.8
    put_hevc_qpel_bi_h48_8_neon: 2976.8
    put_hevc_qpel_bi_h64_8_c: 34182.5
    put_hevc_qpel_bi_h64_8_neon: 5236.3

    Co-Authored-By: J. Dekker <jdek@itanimul.li>
    Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels | Logan Lyu | 2023-12-01
    put_hevc_pel_bi_pixels4_8_c: 54.7
    put_hevc_pel_bi_pixels4_8_neon: 43.0
    put_hevc_pel_bi_pixels6_8_c: 94.7
    put_hevc_pel_bi_pixels6_8_neon: 37.0
    put_hevc_pel_bi_pixels8_8_c: 171.0
    put_hevc_pel_bi_pixels8_8_neon: 24.0
    put_hevc_pel_bi_pixels12_8_c: 354.0
    put_hevc_pel_bi_pixels12_8_neon: 68.7
    put_hevc_pel_bi_pixels16_8_c: 588.2
    put_hevc_pel_bi_pixels16_8_neon: 77.5
    put_hevc_pel_bi_pixels24_8_c: 1670.7
    put_hevc_pel_bi_pixels24_8_neon: 173.0
    put_hevc_pel_bi_pixels32_8_c: 2267.7
    put_hevc_pel_bi_pixels32_8_neon: 281.2
    put_hevc_pel_bi_pixels48_8_c: 5787.5
    put_hevc_pel_bi_pixels48_8_neon: 673.5
    put_hevc_pel_bi_pixels64_8_c: 9897.0
    put_hevc_pel_bi_pixels64_8_neon: 1159.5

    Co-Authored-By: J. Dekker <jdek@itanimul.li>
    Signed-off-by: Martin Storsjö <martin@martin.st>
* checkasm/ac3dsp: add float_to_fixed24 test | sunyuechi | 2023-12-01
    Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
* avcodec/ac3dsp: add missing stddef.h include | James Almer | 2023-12-01
    Should fix make checkheaders

    Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/framesync: fix OOM case | Paul B Mahol | 2023-11-30
    Fixes OOM when the caller keeps adding frames into a filtergraph that
    has reached EOF by other means, for example when EOF is signalled by
    another filter in the filtergraph or by buffersink.
* avfilter/arls_template: use defines for all constants | Paul B Mahol | 2023-11-28
* avfilter: add Affine Projection adaptive audio filter | Paul B Mahol | 2023-11-28
* lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d | xufuji456 | 2023-11-28
    When building for iOS on arm64, the compiler emits a warning:
    "instruction movi.2d with immediate #0 may not function correctly on
    this CPU, converting to movi.16b"

    Signed-off-by: xufuji456 <839789740@qq.com>
    Signed-off-by: Martin Storsjö <martin@martin.st>
* avfilter/af_anlms: set output frame duration | Paul B Mahol | 2023-11-28
* avfilter/af_arls: set output frame duration | Paul B Mahol | 2023-11-28
* avfilter/af_amix: set output frame duration | Paul B Mahol | 2023-11-28
* avfilter/af_amultiply: set output frame duration | Paul B Mahol | 2023-11-28
* avfilter/af_amerge: use already provided outlink | Paul B Mahol | 2023-11-28
* avfilter: no need to request more samples if internal frame is available | Paul B Mahol | 2023-11-28
* tools/general_assembly: add newly voted-in extra GA members | Anton Khirnov | 2023-11-28
    Cf.
    * https://vote.ffmpeg.org/cgi-bin/civs/results.pl?id=E_d0b225b9aa8d45d5
    * http://lists.ffmpeg.org/pipermail/ffmpeg-devel/2023-November/317496.html
      Message-Id <170115613784.8914.4950266152609138336@lain.khirnov.net>
* avfilter/af_arls: add double sample format support | Paul B Mahol | 2023-11-27
* avfilter/af_anlms: add double sample format support | Paul B Mahol | 2023-11-27
* checkasm: test for dcmul_add | sunyuechi | 2023-11-27
    Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
* avcodec/videotoolboxenc: refactor dump encoder name | Zhao Zhili | 2023-11-27
    Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* avcodec/videotoolboxenc: Fix build failure due to PropertyKey_EncoderID | Zhao Zhili | 2023-11-27
    Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* fftools/ffplay_renderer: declare function argument as const | Leo Izen | 2023-11-27
    Declaring the function argument as const fixes a warning down the line
    that the const parameter is stripped. We don't modify this argument.

    Signed-off-by: Leo Izen <leo.izen@gmail.com>
    Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* avfilter/vf_colorcorrect: fix memory leaks | Paul B Mahol | 2023-11-27
* avfilter/af_dialoguenhance: do output scaling once | Paul B Mahol | 2023-11-27
* avfilter/af_afwtdn: fix crash with EOF handling | Paul B Mahol | 2023-11-27
* avfilter/af_dialoguenhance: simplify channels copy | Paul B Mahol | 2023-11-27
* doc/filters: restore entry for libvmaf option pool | Gyan Doshi | 2023-11-27
    3d29724c00 removed the doc entry for the option pool while adding a
    parser function for it at the same time! The option remains available
    and undeprecated.

    Fixes trac #10693
* avformat: add QOA demuxer | Paul B Mahol | 2023-11-26
* avcodec: add QOA decoder | Paul B Mahol | 2023-11-26
* libavcodec/mlpdec: add missing correction to ch_layout when downmixing | Geoffrey McRae | 2023-11-26
    This fixes corrupted audio for applications relying on ch_layout when
    codec downmixing is active.

    Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
    Signed-off-by: James Almer <jamrial@gmail.com>
* libavcodec/dcadec: adjust the `ch_layout` when downmix is active | Geoffrey McRae | 2023-11-26
    Applications making use of this codec with the `downmix` option
    segfault unless the `ch_layout` is overridden after `avcodec_open2`,
    as can be seen in projects like MythTV[1].

    This patch fixes this by overriding the ch_layout as done in other
    decoders such as AC3.

    1: https://github.com/MythTV/mythtv/blob/af6f362a140cd59b9ed784a8c639fd456b5f6967/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp#L4607

    Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
    Signed-off-by: James Almer <jamrial@gmail.com>
* libavfilter/vf_dnn_detect: Add yolo support | Wenbin Chen | 2023-11-26
    Add yolo support. A yolo model doesn't output the final result. It
    outputs candidate boxes, so post-processing is needed to remove
    overlapping boxes and get the final results. Also, the box coordinates
    are relative to cells and anchors, so this information is needed to
    calculate the boxes as well.

    For model details, please refer to:
    https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v2-tf

    Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
    Reviewed-by: Guo Yejun <yejun.guo@intel.com>
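    The overlap-removal step described above is commonly done with greedy
    non-maximum suppression. The following is a minimal sketch of that
    general technique, not the filter's actual code: the `Box` type, the
    `box_iou` and `nms` names, and the threshold parameter are all
    hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box type for illustration; not the filter's own structure. */
typedef struct Box {
    float x, y, w, h; /* top-left corner, width, height */
    float score;      /* detection confidence */
    int   state;      /* -1 = undecided, 0 = suppressed, 1 = kept */
} Box;

/* Intersection over union of two axis-aligned boxes. */
static float box_iou(const Box *a, const Box *b)
{
    float x0  = a->x > b->x ? a->x : b->x;
    float y0  = a->y > b->y ? a->y : b->y;
    float ax1 = a->x + a->w, bx1 = b->x + b->w;
    float ay1 = a->y + a->h, by1 = b->y + b->h;
    float x1  = ax1 < bx1 ? ax1 : bx1;
    float y1  = ay1 < by1 ? ay1 : by1;
    float inter, uni;

    if (x1 <= x0 || y1 <= y0)
        return 0.0f; /* no overlap */
    inter = (x1 - x0) * (y1 - y0);
    uni   = a->w * a->h + b->w * b->h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

/* Greedy O(n^2) suppression: repeatedly keep the best undecided box and
 * suppress every undecided box that overlaps it by more than the threshold.
 * Returns the number of kept boxes; results are stored in boxes[i].state. */
static size_t nms(Box *boxes, size_t n, float iou_threshold)
{
    size_t kept = 0;

    assert(boxes || n == 0);
    for (size_t i = 0; i < n; i++)
        boxes[i].state = -1;

    for (;;) {
        Box *best = NULL;

        for (size_t i = 0; i < n; i++)
            if (boxes[i].state == -1 && (!best || boxes[i].score > best->score))
                best = &boxes[i];
        if (!best)
            break;

        best->state = 1;
        kept++;
        for (size_t i = 0; i < n; i++)
            if (boxes[i].state == -1 &&
                box_iou(best, &boxes[i]) > iou_threshold)
                boxes[i].state = 0;
    }
    return kept;
}
```

    Calling `nms(boxes, n, 0.5f)` on the decoded candidates leaves
    `state == 1` only on boxes that survive; the quadratic scan is kept for
    clarity and assumes a small candidate count.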
* libavfilter/vf_dnn_detect: Add model_type option. | Wenbin Chen | 2023-11-26
    There are many kinds of detection DNN models, and they have different
    preprocessing and postprocessing methods. To support more models, the
    "model_type" option is added to select the preprocessing and
    postprocessing functions.

    Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
    Reviewed-by: Guo Yejun <yejun.guo@intel.com>
* tools/general_assembly: restore printing HEAD | Anton Khirnov | 2023-11-26
* tools/general_assembly: implement extra GA members | Anton Khirnov | 2023-11-26
* avfilter/vsrc_gradients: allow zero speed | Paul B Mahol | 2023-11-26
* avfilter/vsrc_gradients: add square type | Paul B Mahol | 2023-11-26
* mips/ac3dsp_mips: add missing stddef.h header include | James Almer | 2023-11-25
    Fixes compilation failures after 567c67c6c8cb9be083f56198bfa979e4bda84c99.

    Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
    Signed-off-by: James Almer <jamrial@gmail.com>
* x86/ac3dsp: add ff_float_to_fixed24_avx() | James Almer | 2023-11-25
    Signed-off-by: James Almer <jamrial@gmail.com>
* x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop | James Almer | 2023-11-25
    Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/af_dialoguenhance: fix overreads | Paul B Mahol | 2023-11-25
* avfilter/af_channelmap: do not override set channel layout | Paul B Mahol | 2023-11-25
* Revert "avformat/rtmpproto: Pass rw_timeout to underlying transport protocol" | Zhao Zhili | 2023-11-25
    This reverts commit bec6dfcd5c0b59dd6d947ec3074986aeffd525aa.

    The patch is a no-op, since ffurl_open_whitelist copies options from
    the parent automatically.

    Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* checkasm/riscv: report an error upon SIGILL | Rémi Denis-Courmont | 2023-11-23
    Terminating the whole checkasm process is not very helpful. This will
    report if an illegal instruction occurs while executing a tested
    function. This is a common occurrence whilst developing RISC-V
    assembler, due to the compatibility check between vector configuration
    and instruction being done at run-time.
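    One common POSIX pattern for turning a fatal signal into a reportable
    error is to install a handler that jumps back to a recovery point. The
    sketch below illustrates that general pattern only; it is not
    checkasm's implementation, and `run_guarded`, `ok_function` and
    `faulting_function` are hypothetical names. `raise(SIGILL)` stands in
    for an actual illegal instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Recovery point the signal handler jumps back to. */
static sigjmp_buf recover_point;

static void on_fatal_signal(int sig)
{
    /* Abandon the faulting function and resume at sigsetjmp(). */
    siglongjmp(recover_point, sig);
}

/* Run fn() with SIGILL trapped. Returns 0 if fn() completed normally,
 * or the signal number if it was interrupted by SIGILL. */
static int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;
    int sig;

    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGILL, &sa, &old);

    /* savemask = 1 so the signal mask is restored after the jump and a
     * later SIGILL can be caught again. */
    sig = sigsetjmp(recover_point, 1);
    if (sig == 0)
        fn();

    sigaction(SIGILL, &old, NULL); /* restore the previous disposition */
    return sig;
}

/* Hypothetical functions under test. */
static void ok_function(void) { }
static void faulting_function(void) { raise(SIGILL); }
```

    A test harness built this way can mark the function as failed when
    `run_guarded` returns non-zero and carry on with the next test instead
    of dying with the process.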
* checkasm: add helper to report a fatal signal | Rémi Denis-Courmont | 2023-11-23
* lavc/llvidencdsp: add R-V V diff_bytes | Rémi Denis-Courmont | 2023-11-23
    diff_bytes_c: 163.0
    diff_bytes_rvv_i32: 52.7
* lavc/aacpsdsp: use LMUL=2 and amortise strides | Rémi Denis-Courmont | 2023-11-23
    The input is laid out in 16 segments, of which 13 actually need to be
    loaded. There are no really efficient ways to deal with this:

    1) If we load 8 segments with unit stride, then narrow to 16 segments
       with right shifts, we can only get one half-size vector per segment,
       or just 2 elements per vector (EMUL=1/2) - at least with 128-bit
       vectors. This ends up unsurprisingly about as fast as the C code.

    2) The current approach is to load with strides. We keep that approach,
       but improve it using three 4-segmented loads instead of 12
       single-segment loads. This divides the number of distinct loaded
       addresses by 4.

    3) A potential third approach would be to avoid segmentation altogether
       and splat the scalar coefficient into vectors. Then we can use a
       unit-stride and maximum EMUL. But the downside then is that we have
       to multiply the 3 (of 16) unused segments with zero as part of the
       multiply-accumulate operations.

    In addition, we also reuse vectors mid-loop so as to increase the EMUL
    from 1 to 2, which also improves performance a little bit.

    Overall the gains are quite small with the device under test, as it
    does not deal with segmented loads very well. But at least the code is
    tidier, and should enjoy bigger speed-ups on better hardware
    implementations.

    Before:
    ps_hybrid_analysis_c: 1819.2
    ps_hybrid_analysis_rvv_f32: 1037.0 (before)
    ps_hybrid_analysis_rvv_f32: 990.0 (after)
* lavc/g722dsp: optimise R-V V apply_qmf | Rémi Denis-Courmont | 2023-11-23
    This stores the constant coefficients deinterleaved, so that they can
    be loaded directly with NF=0. Unfortunately, we cannot optimise loading
    the input, due to insufficient memory alignment (not 32-bit).

    Before:
    g722_apply_qmf_c: 82.5
    g722_apply_qmf_rvv_i32: 78.2

    After:
    g722_apply_qmf_c: 82.5
    g722_apply_qmf_rvv_i32: 65.2
* lavu/fixed_dsp: R-V V fmul_window_scaled | Rémi Denis-Courmont | 2023-11-23
    vector_fmul_window_scaled_fixed_c: 4393.7
    vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
* lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window | Rémi Denis-Courmont | 2023-11-23
    Roll the loop to avoid slow gathers.

    Before:
    vector_fmul_reverse_c: 1561.7
    vector_fmul_reverse_rvv_f32: 2410.2
    vector_fmul_window_c: 2068.2
    vector_fmul_window_rvv_f32: 1879.5

    After:
    vector_fmul_reverse_c: 1561.7
    vector_fmul_reverse_rvv_f32: 916.2
    vector_fmul_window_c: 2068.2
    vector_fmul_window_rvv_f32: 1202.5
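    For context on why gathers come up at all: a "multiply by reversed"
    operation reads one input front to back and the other back to front.
    The scalar sketch below assumes those usual semantics; it is an
    illustration with a hypothetical name (`fmul_reverse_ref`), not
    FFmpeg's code, and it does not show the vector loop itself.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference (assumed semantics): multiply src0 element-wise by
 * src1 read in reverse order, so dst[i] = src0[i] * src1[len - 1 - i]. */
static void fmul_reverse_ref(float *dst, const float *src0,
                             const float *src1, size_t len)
{
    assert(dst && src0 && src1);
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}
```

    A vector implementation has to reverse the elements of each chunk of
    `src1` somehow, and an indexed gather is one way to do that, which is
    where the cost discussed in the commit message comes from.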
* lavu/fixed_dsp: optimise R-V V fmul_reverse | Rémi Denis-Courmont | 2023-11-23
    Gathers are (unsurprisingly) a notable exception to the rule that R-V V
    gets faster with larger group multipliers. So roll the function to
    speed it up.

    Before:
    vector_fmul_reverse_fixed_c: 2840.7
    vector_fmul_reverse_fixed_rvv_i32: 2430.2

    After:
    vector_fmul_reverse_fixed_c: 2841.0
    vector_fmul_reverse_fixed_rvv_i32: 962.2

    It might be possible to further optimise the function by moving the
    reverse-subtract out of the loop and adding ad-hoc tail handling.