summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorGuo, Yejun <yejun.guo@intel.com>2019-10-31 16:33:02 +0800
committerPedro Arthur <bygrandao@gmail.com>2019-11-07 15:46:00 -0300
commit4d980a8ceba9d0b7cc9f34a5f5cd769a4d7c2773 (patch)
treee315cb921282dafcc79dd458bbff6be65ff81e8e /doc
parentfc7b6d55741a926896958939a97b2958df1c1bb4 (diff)
avfilter/vf_dnn_processing: add a generic filter for image proccessing with dnn networks
This filter accepts all the dnn networks which do image processing. Currently, frame with formats rgb24 and bgr24 are supported. Other formats such as gray and YUV will be supported next. The dnn network can accept data in float32 or uint8 format. And the dnn network can change frame size. The following is a python script to halve the value of the first channel of the pixel. It demos how to setup and execute dnn model with python+tensorflow. It also generates .pb file which will be used by ffmpeg. import tensorflow as tf import numpy as np import imageio in_img = imageio.imread('in.bmp') in_img = in_img.astype(np.float32)/255.0 in_data = in_img[np.newaxis, :] filter_data = np.array([0.5, 0, 0, 0, 1., 0, 0, 0, 1.]).reshape(1,1,3,3).astype(np.float32) filter = tf.Variable(filter_data) x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in') y = tf.nn.conv2d(x, filter, strides=[1, 1, 1, 1], padding='VALID', name='dnn_out') sess=tf.Session() sess.run(tf.global_variables_initializer()) output = sess.run(y, feed_dict={x: in_data}) graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['dnn_out']) tf.train.write_graph(graph_def, '.', 'halve_first_channel.pb', as_text=False) output = output * 255.0 output = output.astype(np.uint8) imageio.imsave("out.bmp", np.squeeze(output)) To do the same thing with ffmpeg: - generate halve_first_channel.pb with the above script - generate halve_first_channel.model with tools/python/convert.py - try with following commands ./ffmpeg -i input.jpg -vf dnn_processing=model=halve_first_channel.model:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=native -y out.native.png ./ffmpeg -i input.jpg -vf dnn_processing=model=halve_first_channel.pb:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=tensorflow -y out.tf.png Signed-off-by: Guo, Yejun <yejun.guo@intel.com> Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Diffstat (limited to 'doc')
-rw-r--r--doc/filters.texi44
1 files changed, 44 insertions, 0 deletions
diff --git a/doc/filters.texi b/doc/filters.texi
index 6d893d8b87..6800124574 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -8928,6 +8928,50 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
@end example
@end itemize
+@section dnn_processing
+
+Do image processing with deep neural networks. Currently only AVFrame with RGB24
+and BGR24 are supported, more formats will be added later.
+
+The filter accepts the following options:
+
+@table @option
+@item dnn_backend
+Specify which DNN backend to use for model loading and execution. This option accepts
+the following values:
+
+@table @samp
+@item native
+Native implementation of DNN loading and execution.
+
+@item tensorflow
+TensorFlow backend. To enable this backend you
+need to install the TensorFlow for C library (see
+@url{https://www.tensorflow.org/install/install_c}) and configure FFmpeg with
+@code{--enable-libtensorflow}
+@end table
+
+Default value is @samp{native}.
+
+@item model
+Set path to model file specifying network architecture and its parameters.
+Note that different backends use different file formats. TensorFlow and native
+backend can load files for only its format.
+
+Native model file (.model) can be generated from TensorFlow model file (.pb) by using tools/python/convert.py
+
+@item input
+Set the input name of the dnn network.
+
+@item output
+Set the output name of the dnn network.
+
+@item fmt
+Set the pixel format for the Frame. Allowed values are @code{AV_PIX_FMT_RGB24}, and @code{AV_PIX_FMT_BGR24}.
+Default value is @code{AV_PIX_FMT_RGB24}.
+
+@end table
+
@section drawbox
Draw a colored box on the input image.