| Commit message | Author | Age |
|
They do the same.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
This should speed it up significantly on systems where it matters.
|
load_functions() did not load the device-level functions.
|
While Vulkan itself went more or less the way it was expected to go,
libvulkan didn't quite solve all of the OpenGL loader's issues. It is
multi-vendor, yes, but unfortunately the code is Google/Khronos quality,
so it suffers from big static linking issues (static linking is
unsupported on anything but macOS), has bugs, and due to the prefix
system used, there are three or so ways to type out functions.
Just solve all of those problems by dlopening it. We even have nice
emulation for it on Windows.
|
This patch allows for alternative loader implementations.
|
VkPhysicalDeviceLimits.optimalBufferCopyRowPitchAlignment and
VkPhysicalDeviceExternalMemoryHostPropertiesEXT.minImportedHostPointerAlignment
are of type VkDeviceSize (a typedef for uint64_t).
VkPhysicalDeviceLimits.minMemoryMapAlignment is of type size_t.
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
|
Announced in 14040a1d913794d9a3fd6406a6d8c2f0e37e0062.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
fixes http://trac.ffmpeg.org/ticket/9055
The hw decoder may allocate a larger frame from the AVHWFramesContext and then adjust the width and height based on the bitstream.
We need to use the resolution from the src frame instead of the AVHWFramesContext.
test command:
ffmpeg -loglevel debug -hide_banner -hwaccel vaapi -init_hw_device vaapi=va:/dev/dri/renderD128 -hwaccel_device va -hwaccel_output_format vaapi -init_hw_device vulkan=vulk -filter_hw_device vulk -i 1920x1080.264 -c:v libx264 -r:v 30 -profile:v high -preset veryfast -vf "hwmap,chromaber_vulkan=0:0,hwdownload,format=nv12" -map 0 -y vaapiouts.mkv
expected:
No green bar at bottom.
|
Same as when downloading. Not sure why this isn't done, probably
because the CUDA code predates the sync mechanism we settled on.
|
Due to some endian-dependent overlap, these should be used last.
|
These two extensions and two features are both optionally used by
libplacebo to speed up rendering, so it makes sense for libavutil to
automatically enable them as well.
|
We support every single packed format possible now.
There are some fringe leftover mappings which are uninteresting.
|
Vulkan formats with a PACK suffix are in native endianness.
Vulkan formats without a PACK suffix are in bytestream order.
Pixel formats with an LE/BE suffix have an explicit endianness.
Pixel formats without an LE/BE suffix are in bytestream order.
|
Needed to support YUVA.
|
frames_uninit is always called on failure, and the free_exec_ctx function
did not zero the pool when freeing it, so it resulted in a double free.
|
This relies on the fact that host memory is always required to be
aligned to the platform's page size, which means we can adjust the
pointers when we map them to buffers and therefore skip an entire copy.
This has already had extensive testing in libplacebo without problems,
so it's safe to use here as well.
Speeds up downloads and uploads hugely on platforms which do not pool
their memory, but less so on platforms that do.
We can pool the buffers ourselves, but that can come as a later patch
if necessary.
|
It makes allocation a bit more robust, in case some weird device with
weird drivers that segment memory in weird ways appears.
|
It's a 64-bit bitfield being put directly into an int.
|
They were identical, save for variable names and order.
|
It's a validation layer thing.
|
The process address space is guaranteed to be aligned to the page size,
hence we're never going to map outside of our address space.
There are more optimizations to do with respect to chroma plane alignment and
buffer offsets, but that can be done later.
|
We want to copy the lowest amount of bytes per line, but while the buffer
stride is sanitized, the src/dst stride can be negative, and negative numbers
of bytes do not make a lot of sense.
|
Some vendors (AMD) require dedicated allocation to be used for all imported
images.
|
Otherwise, the frames context is considered to be ready to handle
mapping, and it doesn't get initialized the normal way through
.frames_init.
|
Speeds up both use cases by 30%.
|
Otherwise custom vulkan device contexts won't work.
|
This allows us to speed up only-uploading or only-downloading use cases.
|
They're nothing special, and there's no reason they should always use the
default flags.
|
Some users may need special formats that aren't available when the STORAGE
flag bit is set, which would result in allocations failing.
|
This was never actually used, likely due to confusion, as the device context
also had one used for uploads and downloads.
Also, since we're only using it for very quick image barriers (which are
practically free on all hardware), use the compute queue instead of the
transfer queue.
|
If an external pool was provided we skipped all of frames init,
including the exec context.
|
This commit makes full use of the enabled queues to provide asynchronous
uploads of images (downloads remain synchronous).
For pure uploading use cases, the performance gains can be significant.
|
Makes it easier to support multiple queues.
|
With this, the puzzle of making libplacebo, ffmpeg and any other Vulkan
API users interoperable is complete.
Users of both libraries can initialize one another's contexts without having
to create a new one.
|
This, along with the next patch, are the last missing pieces to being
interoperable with libplacebo.
|
This allows users who derive devices to set options for the new device
context they derive.
The main use case is to let users enable extensions (such as surface
drawing extensions) in Vulkan while deriving from the device their
frames are on. That way, users don't need to write any initialization
code themselves, since the Vulkan spec disallows mixing instances,
physical devices and active devices.
Apart from Vulkan, other hwcontexts ignore the opts argument, since they
don't support options at all (or, in VAAPI and OpenCL's case, options are
currently only used for device selection, which device_derive overrides).
|
Both API and CLI users can enable any extension they'd like using the options.
|
Only warn instead. API users can find out which extensions were unavailable
by using the enabled_inst_extensions and enabled_dev_extensions fields.
This avoids trial-and-error to find out which extensions are missing.
|
Due to our AVHWDevice infrastructure, where API users are offered a way
to derive contexts rather than always creating new ones, our filterchains,
being backed by a single hardware device context, can grow to considerable
size.
Hence, in such situations, using the maximum number of queues the device
offers can be beneficial in eliminating bottlenecks where queue submissions
on the same family have to wait for the previous one to finish.
|
This reverts commit 97b526c192add6f252b327245fd9223546867352.
It broke the API, and assumed no other APIs used multiple semaphores.
This also disallowed certain optimizations to happen.
Dealing with APIs that give or expect single semaphores is easier when
we use per-image semaphores.
|
The specs note that images should be in the GENERAL layout when exporting
for maximum compatibility.
CUDA exported images are handled differently, and the queue is the same,
so we don't need to do that there.
|
As it turns out, we were already assuming and treating all images as if
they had the concurrent access mode. This just changes the flag to
CONCURRENT, which has fewer restrictions than EXCLUSIVE, and fixes
validation messages on machines with multiple queues.
The validation layer didn't pick this up because the machine I was
testing on had only a single queue.
|
Calling vkGetImageSubresourceLayout is only legal for linear and DRM images.
|
This is a leftover from an old version which used the 1.0 Vulkan API
with the maintenance extensions being required.
|