Todo

-- For other people
- Multithread vp8 or vc1.
- Multithread an intra codec like mjpeg.
- Fix mpeg1 (see below).
- Try the first three items under Optimization.
- Fix h264 (see below).
- Try mpeg4 (see below).

-- Bug fixes

General critical:
- Error resilience has to run before ff_report_frame_progress() is called.
  Otherwise there will be race conditions. (This might already work.)
  In general, error paths need more testing.

h264:
- Files split at the wrong NAL unit aren't (and can't be) decoded with
  threads (e.g. TS split so PPS is after the frame, PAFF with two fields in
  a packet). Scan the packet at the start of decode and don't finish setup
  until all PPS/SPS have been encountered.

mpeg4:
- Packed B-frames need to be explicitly split up when frame threading is on.
  It's not very fast without this.
- The buffer age optimization is disabled due to the way buffers are
  allocated across threads. The branch 'fix_buffer_age' has an attempt to
  fix it, but it breaks ffplay.
- Support interlaced.

mpeg1/2:
- Seeking always prints "first frame not a keyframe" with threads on.
  Threading is currently disabled for this reason.

-- Prove correct
- decode_update_progress() in h264.c
  The h264_race_checking branch has some work on h264, but not that function.
  It might be worth putting the branch under #ifdef DEBUG in mainline, but
  the code would have to be cleaner.
- MPV_lowest_referenced_row() and co in mpegvideo.c
- Same in vp3.

-- Optimization
- EMU_EDGE is always set for h264 PAFF+MT because draw_edges() writes into
  the other field's thread's pixels.
- Check update_thread_context() functions and make sure they only copy what
  they need to.
- Try some more optimization of the "ref < 48; ref++" loop in h264.c
  await_references(); try turning the list0/list1 check above into a loop
  without making it slower.
- Support frame+slice threading at the same time by assigning slice_count
  threads for frame threads to use with execute(). This is simpler but
  unbalanced if only one frame thread uses any.

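As a starting point for the "Prove correct" item on
MPV_lowest_referenced_row(), here is a minimal sketch of the kind of
calculation involved. It is a simplified, hypothetical model and not the code
in mpegvideo.c: the function name, its signature, the quarter-pel units and
the 16-pixel macroblock height are all assumptions made for illustration, and
it ignores any extra rows an interpolation filter might read.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model (not the mpegvideo.c implementation).
 *
 * mb_y       macroblock row being decoded
 * mb_height  number of macroblock rows in the frame
 * mv_y_qpel  vertical motion vector components, assumed quarter-pel units
 * n_mvs      number of entries in mv_y_qpel
 *
 * Returns the lowest macroblock row (largest index) of the reference frame
 * that has to be finished before this macroblock row can be decoded.
 */
static int lowest_referenced_row(int mb_y, int mb_height,
                                 const int *mv_y_qpel, int n_mvs)
{
    int max_abs = 0;

    assert(mb_height > 0 && mb_y >= 0 && mb_y < mb_height);

    /* Use the largest magnitude so that upward vectors are treated as
     * conservatively as downward ones. */
    for (int i = 0; i < n_mvs; i++) {
        int a = abs(mv_y_qpel[i]);
        if (a > max_abs)
            max_abs = a;
    }

    /* 16 pixel rows per macroblock * 4 quarter-pel steps = 64 units per
     * macroblock row; round up to whole macroblock rows. */
    int row = mb_y + ((max_abs + 63) >> 6);

    /* Never wait for a row past the bottom of the frame. */
    if (row > mb_height - 1)
        row = mb_height - 1;
    return row;
}
```

A decoding thread would wait until the reference frame's thread has reported
at least this row before running motion compensation. Whether the rounding
here covers every read the real motion compensation does is exactly the kind
of thing that item asks to be proven.
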
-- Features
- Support streams with width/height changing. This requires flushing all
  current frames (and buffering the input in the meantime), closing the codec
  and reopening it. Or don't support it.
- Support encoding. Might need more threading primitives for good
  ratecontrol; would be nice for audio and libavfilter too.
- Async decoding part 1: instead of trying to start every thread at the
  beginning, return a picture if the earliest thread is already done, but
  don't wait for it. Not sure what effect this would have.
- Part 2: have an API that doesn't wait for the decoding thread and only
  returns EAGAIN if it's not ready. What will it do with the next input
  packet if it returns that?
- Have an API that returns finished pictures but doesn't require sending new
  ones. Maybe allow NULL avpkt when not at the end of the stream.

-- Samples

http://astrange.ithinksw.net/ffmpeg/mt-samples/

See yuvcmp.c in this directory to compare decoded samples.

For debugging, try commenting out ff_thread_finish_setup() calls so that only
one thread runs at once, then binary search and scatter printfs to look for
differences in codec contexts.
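
When comparing a threaded decode against a single-threaded one, the first
question is usually which frame diverges first. The sketch below shows one way
to find that for raw output already loaded into memory. It is not the yuvcmp.c
mentioned above; the function names are hypothetical, and it assumes planar
yuv420p data with even width and height.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes in one yuv420p frame: a full-size luma plane plus two chroma planes
 * at quarter size each. Assumes even dimensions. */
static size_t yuv420p_frame_size(int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    return (size_t)width * (size_t)height * 3 / 2;
}

/*
 * Compare two raw yuv420p dumps frame by frame.
 * Returns the index of the first frame whose bytes differ, or -1 if every
 * whole frame present in both buffers matches. A trailing partial frame and
 * any frames present in only one buffer are ignored.
 */
static long first_differing_frame(const unsigned char *a, size_t a_len,
                                  const unsigned char *b, size_t b_len,
                                  int width, int height)
{
    size_t frame  = yuv420p_frame_size(width, height);
    size_t common = a_len < b_len ? a_len : b_len;
    size_t frames = common / frame;

    for (size_t i = 0; i < frames; i++) {
        if (memcmp(a + i * frame, b + i * frame, frame) != 0)
            return (long)i;
    }
    return -1;
}
```

Knowing the first bad frame narrows down where to put the printfs: the codec
context differences worth looking at are the ones that appear just before that
frame is decoded.
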