libav.git - [no description]

	Commit message	Author	Age
*	lavc: use av_cpu_max_align() instead of hardcoding alignment requirements	Anton Khirnov	2017-02-11
\|
*	arm: vp9lpf: Use orrs instead of orr+cmp	Martin Storsjö	2017-02-11
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm/aarch64: vp9lpf: Calculate !hev directly	Martin Storsjö	2017-02-11
\|	Previously we first calculated hev, and then negated it.
\|
\|	Since we were able to schedule the negation in the middle of another calculation, we don't see a gain in every case.
\|
\|	Before:                       Cortex A7      A8      A9     A53  A53/AArch64
\|	vp9_loop_filter_v_4_8_neon:       147.0   129.0   115.8    89.0         88.7
\|	vp9_loop_filter_v_8_8_neon:       242.0   198.5   174.7   140.0        136.7
\|	vp9_loop_filter_v_16_8_neon:      500.0   419.5   382.7   293.0        275.7
\|	vp9_loop_filter_v_16_16_neon:     971.2   825.5   731.5   579.0        453.0
\|	After:
\|	vp9_loop_filter_v_4_8_neon:       143.0   127.7   114.8    88.0         87.7
\|	vp9_loop_filter_v_8_8_neon:       241.0   197.2   173.7   140.0        136.7
\|	vp9_loop_filter_v_16_8_neon:      497.0   419.5   379.7   293.0        275.7
\|	vp9_loop_filter_v_16_16_neon:     965.2   818.7   731.4   579.0        452.0
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling	Martin Storsjö	2017-02-11
\|	This work is sponsored by, and copyright, Google.
\|
\|	Before:                               Cortex A53
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:       235.3
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:       555.1
\|	After:
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:       180.2
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:       475.3
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling	Martin Storsjö	2017-02-11
\|	This work is sponsored by, and copyright, Google.
\|
\|	Before:                                Cortex A7      A8      A9     A53
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:       273.0   189.5   211.7   235.8
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:       752.0   459.2   862.2   553.9
\|	After:
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:       226.5   145.0   225.1   171.8
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:       721.2   415.7   727.6   475.0
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter	Martin Storsjö	2017-02-11
\|	No measured speedup on a Cortex A53, but other cores might benefit.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter	Martin Storsjö	2017-02-11
\|	Before:                       Cortex A7      A8      A9     A53
\|	vp9_put_8tap_smooth_4h_neon:      378.1   273.2   340.7   229.5
\|	After:
\|	vp9_put_8tap_smooth_4h_neon:      352.1   222.2   290.5   229.5
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9mc: Simplify the extmla macro parameters	Martin Storsjö	2017-02-11
\|	Fold the field lengths into the macro.
\|
\|	This makes the macro invocations much more readable, since the lines are shorter.
\|
\|	This also makes it easier to use only half the registers within the macro.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	utvideodec: Add a missing include	Martin Storsjö	2017-02-10
\|	This include was missing from 77c23704c76; adding it fixes the build.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	nvenc: make gpu indices independent of supported capabilities	Timo Rothenpieler	2017-02-09
\|	Do not allocate a CUDA context for every available gpu.
\|
\|	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
*	avcodec: Mark some codecs with threadsafe init as such	Derek Buitenhuis	2017-02-09
\|	Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
\|	Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
*	aarch64: vp9itxfm: Fix incorrect vertical alignment	Martin Storsjö	2017-02-09
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Update a comment to refer to a register with a different name	Martin Storsjö	2017-02-09
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability	Martin Storsjö	2017-02-09
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible	Martin Storsjö	2017-02-09
\|	The ld1r is a leftover from the arm version, where this trick is beneficial on some cores.
\|
\|	Use a single-lane load where we don't need the semantics of ld1r.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function	Martin Storsjö	2017-02-09
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function	Martin Storsjö	2017-02-09
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32	Martin Storsjö	2017-02-09
\|	This work is sponsored by, and copyright, Google.
\|
\|	This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read.
\|
\|	This gives a pretty substantial speedup for the smaller subpartitions.
\|
\|	The code size increases from 14740 bytes to 24292 bytes.
\|
\|	The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy.
\|
\|	Before:
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
\|	vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
\|	vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
\|	vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
\|	vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
\|	vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
\|	vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
\|	vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
\|	vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
\|	vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
\|	vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7
\|	After:
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
\|	vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
\|	vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
\|	vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
\|	vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
\|	vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
\|	vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
\|	vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
\|	vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
\|	vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
\|	vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible	Martin Storsjö	2017-02-09
\|	This work is sponsored by, and copyright, Google.
\|
\|	This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read.
\|
\|	This gives a pretty substantial speedup for the smaller subpartitions.
\|
\|	The code size increases from 12388 bytes to 19784 bytes.
\|
\|	The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy.
\|
\|	Before:                                Cortex A7       A8       A9      A53
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:       273.0    189.5    212.0    235.8
\|	vp9_inv_dct_dct_16x16_sub2_add_neon:      2102.1   1521.7   1736.2   1265.8
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:      2104.5   1533.0   1736.6   1265.5
\|	vp9_inv_dct_dct_16x16_sub8_add_neon:      2484.8   1828.7   2014.4   1506.5
\|	vp9_inv_dct_dct_16x16_sub12_add_neon:     2851.2   2117.8   2294.8   1753.2
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:     3239.4   2408.3   2543.5   1994.9
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:       758.3    456.7    864.5    553.9
\|	vp9_inv_dct_dct_32x32_sub2_add_neon:     10776.7   7949.8   8567.7   6819.7
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:     10865.6   8131.5   8589.6   6816.3
\|	vp9_inv_dct_dct_32x32_sub8_add_neon:     12053.9   9271.3   9387.7   7564.0
\|	vp9_inv_dct_dct_32x32_sub12_add_neon:    13328.3  10463.2  10217.0   8321.3
\|	vp9_inv_dct_dct_32x32_sub16_add_neon:    14176.4  11509.5  11018.7   9062.3
\|	vp9_inv_dct_dct_32x32_sub20_add_neon:    15301.5  12999.9  11855.1   9828.2
\|	vp9_inv_dct_dct_32x32_sub24_add_neon:    16482.7  14931.5  12650.1  10575.0
\|	vp9_inv_dct_dct_32x32_sub28_add_neon:    17589.5  15811.9  13482.8  11333.4
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:    18696.2  17049.2  14355.6  12089.7
\|	After:
\|	vp9_inv_dct_dct_16x16_sub1_add_neon:       273.0    189.5    211.7    235.8
\|	vp9_inv_dct_dct_16x16_sub2_add_neon:      1203.5    998.2   1035.3    763.0
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:      1203.5    998.1   1035.5    760.8
\|	vp9_inv_dct_dct_16x16_sub8_add_neon:      1926.1   1610.6   1722.1   1271.7
\|	vp9_inv_dct_dct_16x16_sub12_add_neon:     2873.2   2129.7   2285.1   1757.3
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:     3221.4   2520.3   2557.6   2002.1
\|	vp9_inv_dct_dct_32x32_sub1_add_neon:       753.0    457.5    866.6    554.6
\|	vp9_inv_dct_dct_32x32_sub2_add_neon:      7554.6   5652.4   6048.4   4920.2
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:      7549.9   5685.0   6046.9   4925.7
\|	vp9_inv_dct_dct_32x32_sub8_add_neon:      8336.9   6704.5   6604.0   5478.0
\|	vp9_inv_dct_dct_32x32_sub12_add_neon:    10914.0   9777.2   9240.4   7416.9
\|	vp9_inv_dct_dct_32x32_sub16_add_neon:    11859.2  11223.3   9966.3   8095.1
\|	vp9_inv_dct_dct_32x32_sub20_add_neon:    15237.1  13029.4  11838.3   9829.4
\|	vp9_inv_dct_dct_32x32_sub24_add_neon:    16293.2  14379.8  12644.9  10572.0
\|	vp9_inv_dct_dct_32x32_sub28_add_neon:    17424.3  15734.7  13473.0  11326.9
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:    18531.3  17457.0  14298.6  12080.0
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function	Martin Storsjö	2017-02-09
\|	This allows reusing the macro for a separate implementation of the pass2 function.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function	Martin Storsjö	2017-02-09
\|	This allows reusing the macro for a separate implementation of the pass2 function.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	aarch64: vp9itxfm: Make the larger core transforms standalone functions	Martin Storsjö	2017-02-09
\|	This work is sponsored by, and copyright, Google.
\|
\|	This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes.
\|
\|	This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms.
\|
\|	Before:
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7
\|	After:
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9itxfm: Make the larger core transforms standalone functions	Martin Storsjö	2017-02-09
\|	This work is sponsored by, and copyright, Google.
\|
\|	This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes.
\|
\|	This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms.
\|
\|	Before:                                Cortex A7       A8       A9      A53
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:      2063.4   1516.0   1719.5   1245.1
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:     3279.3   2454.5   2525.2   1982.3
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:     10750.0   7955.4   8525.6   6754.2
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:    18574.0  17108.4  14216.7  12010.2
\|	After:
\|	vp9_inv_dct_dct_16x16_sub4_add_neon:      2060.8   1608.5   1735.7   1262.0
\|	vp9_inv_dct_dct_16x16_sub16_add_neon:     3211.2   2443.5   2546.1   1999.5
\|	vp9_inv_dct_dct_32x32_sub4_add_neon:     10682.0   8043.8   8581.3   6810.1
\|	vp9_inv_dct_dct_32x32_sub32_add_neon:    18522.4  17277.4  14286.7  12087.9
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	omx: Use the EOS flag to handle flushing at the end	Martin Storsjö	2017-02-08
\|	This avoids having to count the number of frames sent to the codec and the number of output packets received; instead just wait until the encoder returns a buffer with the EOS flag set.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	Use bitstream_init8() where appropriate	Diego Biurrun	2017-02-07
\|
*	wma: Convert to the new bitstream reader	Alexandra Hájková	2017-02-06
\|
*	aarch64: vp9itxfm: Restructure the idct32 store macros	Martin Storsjö	2017-02-05
\|	This avoids concatenation, which can't be used if the whole macro is wrapped within another macro.
\|
\|	This is also arguably more readable.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	arm: vp9itxfm: Avoid .irp when it doesn't save any lines	Martin Storsjö	2017-02-05
\|	This makes it more readable.
\|
\|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	asm: Consistently uppercase SECTION markers	Diego Biurrun	2017-02-03
\|
*	svq3: Convert to the new bitstream reader	Alexandra Hájková	2017-02-02
\|
*	lavc: deprecate refcounted_frames field	wm4	2017-02-01
\|	No deprecation guards, because the old decode API (for which this field is needed) doesn't have any either.
\|
\|	This field should be removed together with the old decode calls.
\|
\|	Signed-off-by: Anton Khirnov <anton@khirnov.net>
*	Mark some arrays that never change as const.	Anton Khirnov	2017-02-01
\|
*	ffv1: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	h261dec: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	shorten: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	ralf: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	loco: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	fic: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	dirac: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	cavs: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	aic: Convert to the new bitstream reader	Alexandra Hájková	2017-01-31
\|
*	golomb: Convert to the new bitstream reader	Diego Biurrun	2017-01-31
\|
*	pgssubdec: reset rle_data_len/rle_remaining_len on allocation error	Andreas Cadhalpun	2017-01-31
\|	The code relies on their validity and otherwise can try to access a NULL object->rle pointer, causing segmentation faults.
\|
\|	Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	vaapi_encode: Add VP8 support	Mark Thompson	2017-01-30
\|
*	vaapi_encode: Pass framerate parameters to driver	Mark Thompson	2017-01-30
\| \| \| \| \| \| \|	Only do this when building for a recent VAAPI version - initial driver implementations were confused about the interpretation of the framerate field, but hopefully this will be consistent everywhere once 0.40.0 is released.
*	vaapi_h264: Enable VBR mode	Mark Thompson	2017-01-30
\| \| \| \| \| \| \| \|	Default to using VBR when a target bitrate is set, unless the max rate is also set and matches the target. Changes to the Intel driver mean that min_qp is also respected in this case, so set a codec default to unset the value rather than using the current default inherited from the MPEG-4 part 2 encoder.
*	vaapi_encode: Support VBR mode	Mark Thompson	2017-01-30
\|	This includes a backward-compatibility hack to choose CBR anyway on old drivers which have no VBR support, so that existing programs will continue to work even though their options now map to VBR.
*	vaapi_encode: Add MPEG-2 support	Mark Thompson	2017-01-29
\|
*	tak: Convert to the new bitstream reader	Alexandra Hájková	2017-01-25
\|
*	magicyuv: Convert to the new bitstream reader	Diego Biurrun	2017-01-25
\|
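
The pgssubdec fix listed above (reset rle_data_len/rle_remaining_len on allocation error) follows a general pattern: when a buffer allocation fails, the length fields that describe that buffer must be zeroed too, so later code that trusts them never touches a NULL pointer. The following is a minimal sketch of that pattern only, not the actual libavcodec code. The struct, the function `object_set_rle`, and the injectable `alloc_fn` allocator are hypothetical names introduced for illustration; only the field names rle, rle_data_len and rle_remaining_len come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the decoder's per-object state. */
typedef struct PGSObjectSketch {
    unsigned char *rle;        /* RLE data buffer, NULL if not allocated */
    size_t rle_data_len;       /* bytes currently stored in rle */
    size_t rle_remaining_len;  /* bytes still expected in later segments */
} PGSObjectSketch;

/* Allocator type, injectable so a failure can be simulated (assumption). */
typedef void *(*alloc_fn)(size_t);

/* Store the first fragment of an RLE bitmap of total_len bytes.
 * Returns 0 on success, -1 on failure. On any failure the length fields
 * are reset together with the pointer, so they never describe a buffer
 * that does not exist. */
static int object_set_rle(PGSObjectSketch *obj, const unsigned char *buf,
                          size_t buf_size, size_t total_len, alloc_fn alloc)
{
    assert(obj && buf && alloc);

    free(obj->rle);
    obj->rle = NULL;

    if (buf_size <= total_len)
        obj->rle = alloc(total_len);

    if (!obj->rle) {
        /* The essential part: keep the lengths consistent with rle == NULL. */
        obj->rle_data_len      = 0;
        obj->rle_remaining_len = 0;
        return -1;
    }

    memcpy(obj->rle, buf, buf_size);
    obj->rle_data_len      = buf_size;
    obj->rle_remaining_len = total_len - buf_size;
    return 0;
}

/* Allocator that always fails, for exercising the error path. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling `object_set_rle` with `failing_alloc` on an object whose length fields hold stale values leaves all three fields cleared, which is the invariant the commit message says the rest of the decoder relies on.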