vidformer - Video Data Transformation
A research project providing infrastructure for video-native interfaces. Developed by the OSU Interactive Data Systems Lab.
Why vidformer
Vidformer efficiently transforms video data, enabling faster annotation, editing, and processing of video data, without having to focus on performance.
It uses a declarative specification format to represent transformations. This enables:
- Transparent Optimization: Vidformer optimizes the execution of declarative specifications just like a relational database optimizes relational queries.
- Lazy/Deferred Execution: Video results can be retrieved on demand, allowing for practically instantaneous playback of video results.
- Familiar Technologies: Vidformer builds on open technologies you may already use:
  - OpenCV: A cv2-compatible interface ensures both you (and LLMs) can use existing knowledge and code.
  - Supervision: Supervision-compatible annotators make visualizing computer vision models trivial.
  - Jupyter: View transformed videos instantly right in your notebook.
  - FFmpeg: Built on the same libraries, codecs, and formats that run the world.
  - HTTP Live Streaming (HLS): Serve transformed videos over a network directly into any media player.
  - Apache OpenDAL: Access source videos no matter where they are stored.
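To make the declarative, deferred-execution idea concrete, here is a toy sketch in plain Python. It is not vidformer's API or spec format: `Source`, `Text`, `build_spec`, and `render` are hypothetical names invented for illustration. The only point is that describing the output does no pixel work, and a frame is computed when someone asks for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical node: frame `index` of a named input video."""
    path: str
    index: int


@dataclass(frozen=True)
class Text:
    """Hypothetical node: draw `text` on top of another node."""
    child: object
    text: str


def build_spec(path: str, n_frames: int) -> list:
    # Describe every output frame up front; nothing is decoded here.
    return [Text(Source(path, i), "Hello, World!") for i in range(n_frames)]


def render(node) -> str:
    # Stand-in for real decoding and drawing: evaluate one frame on demand.
    if isinstance(node, Source):
        return f"{node.path}[{node.index}]"
    if isinstance(node, Text):
        return f"text({render(node.child)}, {node.text!r})"
    raise TypeError(f"unknown node: {node!r}")


spec = build_spec("my_input.mp4", 1000)
# Only the frame a player asks for is evaluated.
frame_42 = render(spec[42])
```

Because the description is plain data, a real engine is free to reorder, batch, or skip work before any frame is rendered, which is the property the bullets above rely on.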
Quick Start
The easiest way to get started is using vidformer's cv2 frontend, which allows most Python OpenCV visualization scripts to replace import cv2 with import vidformer.cv2 as cv2:
import vidformer.cv2 as cv2

cap = cv2.VideoCapture("my_input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# The frame size is given as (width, height), as in OpenCV
out = cv2.VideoWriter("my_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the input video
    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()
You can find details on this in our Getting Started Guide.
Documentation
About the project
Vidformer is a highly modular suite of tools that work together; these are detailed here.
vidformer is NOT:
- A conventional video editor (like Premiere Pro or Final Cut)
- A video database/VDBMS
- A natural language query interface for video
- A computer vision library (like OpenCV)
- A computer vision AI model (like CLIP or YOLO)
However, vidformer is highly complementary to each of these. If you're working on any of the latter four, vidformer may be for you.
File Layout:
- ./vidformer: The core transformation library
- ./vidformer-py: A Python video editing client
- ./vidformer-cli: A command-line interface + the yrden server
- ./vidformer-igni: The second generation vidformer server
- ./snake-pit: The main vidformer test suite
- ./viper-den: Igni server test suite
- ./docs: The vidformer website
License: Vidformer is open source under Apache-2.0. Contributions welcome.
Acknowledgements: Vidformer is supported by the U.S. National Science Foundation under Awards #2118240 and #1910356.
Getting Started
Vidformer can be run in a cloud deployment (with the Igni server) or as a local process (with the Yrden server).
In the cloud (start here):
Walk through a demo using a hosted guest account:
You can host an Igni server yourself (here).
Local:
Installing locally allows for accessing the local file system and saving video results.
Local Install
Using vidformer requires the Python client library, vidformer-py, and a yrden server, which is distributed through vidformer-cli.
vidformer-py
pip install vidformer
vidformer-cli
Docker:
docker pull dominikwinecki/vidformer:latest
docker run --rm -it -p 8000:8000 dominikwinecki/vidformer:latest yrden --print-url
This launches a vidformer yrden server, which is our reference server implementation for local usage, on port 8000.
If you want to read or save video files locally, add -v /my/local/dir:/data to the docker run command and then reference them under /data in the code.
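As a small illustration of that mapping, the sketch below translates a host path into the path the containerized server would see. The helper name `to_container_path` is hypothetical, and the two directory constants simply mirror the example `-v /my/local/dir:/data` mount; adjust them to your own mount.

```python
from pathlib import PurePosixPath

HOST_DIR = PurePosixPath("/my/local/dir")  # left side of the -v mount
CONTAINER_DIR = PurePosixPath("/data")     # right side of the -v mount


def to_container_path(host_path: str) -> str:
    """Translate a host path under HOST_DIR to the path seen by the server."""
    # relative_to() raises ValueError if host_path is outside the mounted directory.
    relative = PurePosixPath(host_path).relative_to(HOST_DIR)
    return str(CONTAINER_DIR / relative)


print(to_container_path("/my/local/dir/clips/input.mp4"))
```

The resulting string is what you would pass to calls such as cv2.VideoCapture when the server runs in Docker.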
To use:
import vidformer as vf
server = vf.YrdenServer(domain="localhost", port=8000)
# or for cv2
import vidformer.cv2 as cv2
cv2.set_server(server)
Precompiled binary:
Precompiled binaries are available for vidformer releases.
For example:
wget https://github.com/ixlab/vidformer/releases/download/<version>/vidformer-cli-ubuntu22.04-amd64
sudo mv vidformer-cli-ubuntu22.04-amd64 /usr/local/bin/vidformer-cli
sudo chmod +x /usr/local/bin/vidformer-cli
sudo apt install -y libopencv-dev libfdk-aac-dev
To use:
import vidformer as vf
server = vf.YrdenServer(bin="vidformer-cli")
or
export VIDFORMER_BIN='vidformer-cli'
import vidformer as vf
server = vf.YrdenServer()
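Before constructing the server, it can help to confirm that the binary is actually discoverable. The following is a minimal pre-flight sketch using only the standard library. The helper `resolve_vidformer_bin` is hypothetical, and how vidformer-py itself interprets VIDFORMER_BIN is not described here, so treat this as a sanity check rather than a mirror of the library's lookup logic.

```python
import os
import shutil


def resolve_vidformer_bin(default: str = "vidformer-cli"):
    """Return the absolute path of the binary, or None if it cannot be found."""
    # Prefer the environment variable, fall back to the default name on PATH.
    name = os.environ.get("VIDFORMER_BIN", default)
    return shutil.which(name)


path = resolve_vidformer_bin()
if path is None:
    print("vidformer-cli not found; check PATH or VIDFORMER_BIN")
else:
    print(f"using {path}")
```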
Build from Sources
vidformer-cli can be compiled from our git repo with a standard cargo build.
This depends on the core vidformer library, which itself requires linking to FFmpeg and OpenCV.
Details are available here.
Getting Started - cv2
This is a walkthrough of getting started with vidformer's OpenCV cv2 compatibility layer.
⚠️ Adding cv2 functions is a work in progress. See the cv2 filters page for which functions have been implemented.
Installation
⚠️ Due to how Colab networking works, vidformer can't stream/play results in Colab, only save them to disk. cv2.vidplay() will not work!
Hello, world!
Copy in your video, or use ours:
curl -O https://f.dominik.win/data/dve2/tos_720p.mp4
Then just replace import cv2 with import vidformer.cv2 as cv2.
Here's our example script:
import vidformer.cv2 as cv2

cap = cv2.VideoCapture("tos_720p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# The frame size is given as (width, height), as in OpenCV
out = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the input video
    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()
Stream the Results
Saving videos to disk works, but we can also display them in the notebook. Since we stream the results and only render them on demand, playback can start practically instantly!
First, replace "output.mp4" with None to skip writing the video to disk.
Then you can use cv2.vidplay() to play the video!
import vidformer.cv2 as cv2

cap = cv2.VideoCapture("tos_720p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Passing None instead of a path skips writing the video to disk
out = cv2.VideoWriter(None, cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the input video
    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()

cv2.vidplay(out)
⚠️ By default, cv2.vidplay() will return a video which plays in a Jupyter Notebook. If running outside a Jupyter notebook, you can pass method="link" to return a link instead.
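If the same script has to run both inside and outside a notebook, the choice can be made at runtime. This is a sketch under assumptions: `vidplay_kwargs` and `running_in_notebook` are hypothetical helpers, the notebook check is a best-effort heuristic (it is also true in a plain IPython terminal), and only the method="link" argument comes from the note above.

```python
def vidplay_kwargs(in_notebook: bool) -> dict:
    """Extra keyword arguments for cv2.vidplay() in the current environment."""
    # In a notebook, rely on the default behavior; elsewhere, ask for a link.
    return {} if in_notebook else {"method": "link"}


def running_in_notebook() -> bool:
    """Best-effort check: IPython injects get_ipython() into interactive sessions."""
    try:
        get_ipython  # noqa: F821 (only defined under IPython/Jupyter)
    except NameError:
        return False
    return True


# Intended use with the `out` writer from the script above:
#   cv2.vidplay(out, **vidplay_kwargs(running_in_notebook()))
print(vidplay_kwargs(running_in_notebook()))
```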
The vidformer modules
vidformer is a highly modular suite of tools that work together:
- vidformer-py: A Python client for declarative video synthesis
  - Provides an easy-to-use library for symbolically representing transformed videos
  - Acts as a client for a VoD server (i.e., for yrden)
  - Using vidformer-py is the best place to get started
- libvidformer: The core data-oriented declarative video editing library
  - An embedded video processing execution engine with low-level interfaces
  - Systems code, written in Rust
  - You should use if: You are building a VDBMS or other multimodal data-system infrastructure.
  - You should not use if: You just want to use vidformer in your workflows or projects.
- vidformer-igni: A vidformer server for the cloud
  - A multi-tenant scale-out server
  - Designed for Video on Demand only
  - Does not support full-video exports
  - All video sources must be over the network, not local
  - Enables live streaming and waiting on external dependencies for even lower time-to-playback latency
- yrden: A vidformer server for local use
  - Designed for local single-tenant use
  - Enables broad drop-in cv2 compatibility (i.e., local fs access, imread/imwrite, etc.)
  - Supports basic Video on Demand hosting
Client libraries in other languages: Writing a vidformer client library for other languages is simple. It's a few hundred lines of code, and you just have to construct some JSON. Contributions or suggestions for other languages are welcome.
vidformer - Video Data Transformation Library
(lib)vidformer is a core video synthesis/transformation library. It handles the movement, control flow, and processing of video and conventional (non-video) data.
About
- It's written in Rust
  - So it does some fancy parallel processing and does so safely
- Uses the FFmpeg libav libraries for multimedia stuff
  - So it should work with nearly every video file ever made
- Uses Apache OpenDAL for I/O
  - So it can access videos in a bunch of storage services
- Implements filters using OpenCV
Building
This crate requires linking with FFmpeg, as detailed in the rusty_ffmpeg crate.
We currently target FFmpeg 7.0.
vidformer-py
vidformer-py is a Python frontend for vidformer. It has an API compatibility layer with OpenCV cv2, as well as some supervision annotators. Our getting started guide explains how to use it.
Quick links:
- PyPI
- Documentation - vidformer-py
- Documentation - vidformer.cv2
- Documentation - vidformer.supervision
- Source Code
Publish:
export FLIT_USERNAME='__token__' FLIT_PASSWORD='<token>'
flit publish
vidformer-igni
The vidformer server for the cloud.
Development Setup
docker-compose -f docker-compose-db.yaml up
export 'IGNI_DB=postgres://igni:igni@localhost:5432/igni'
cargo run -- user add --name test --api-key test --permissions test
cargo run --release -- server --config igni.toml
Deployment
# From vidformer project root
docker build -t igni -f vidformer-igni/Dockerfile .
cd vidformer-igni
docker-compose -f docker-compose-prod.yaml up
Filters
Built-in Filters
While most applications will use user-defined filters, vidformer ships with a handful of built-in filters to get you started:
DrawText
DrawText does exactly what it sounds like: it draws text on a frame.
For example:
DrawText(frame, text="Hello, world!", x=100, y=100, size=48, color="white")
BoundingBox
BoundingBox draws bounding boxes on a frame.
For example:
BoundingBox(frame, bounds=obj)
Where obj is JSON with this schema:
[
{
"class": "person",
"confidence": 0.916827917098999,
"x1": 683.0721842447916,
"y1": 100.92174338626751,
"x2": 1006.863525390625,
"y2": 720
},
{
"class": "dog",
"confidence": 0.902531921863556,
"x1": 360.8750813802083,
"y1": 47.983140622720974,
"x2": 606.76171875,
"y2": 717.9591837897462
}
]
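Detections from a model rarely arrive in exactly this shape, so a small adapter is often useful. The sketch below is an illustration only: `to_bounds` is a hypothetical helper, the input tuple layout is an assumption, and the check for degenerate boxes is a precaution added here, not something BoundingBox is documented to require. Whether the filter expects the Python list or its serialized JSON string is also not stated above, so both forms are shown.

```python
import json


def to_bounds(detections):
    """Convert (class, confidence, x1, y1, x2, y2) tuples to the schema above."""
    keys = ("class", "confidence", "x1", "y1", "x2", "y2")
    bounds = []
    for det in detections:
        box = dict(zip(keys, det))
        # Assumed sanity check: the top-left corner must precede the bottom-right.
        if not (box["x1"] < box["x2"] and box["y1"] < box["y2"]):
            raise ValueError(f"degenerate box: {box}")
        bounds.append(box)
    return bounds


obj = to_bounds([
    ("person", 0.92, 683.0, 100.9, 1006.8, 720),
    ("dog", 0.90, 360.8, 47.9, 606.7, 717.9),
])
obj_json = json.dumps(obj, indent=2)  # serialized form, if a string is needed
print(obj_json)
```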
Scale
The Scale filter transforms one frame type to another.
It changes both resolution and pixel format.
This is the most important filter and is essential for building with vidformer.
Arguments:
Scale(
frame: Frame,
width: int = None,
height: int = None,
pix_fmt: str = None)
By default, missing width, height, and pix_fmt values are set to match frame.
pix_fmt must match FFmpeg's name for a pixel format.
For example:
frame = Scale(frame, width=1280, height=720, pix_fmt="rgb24")
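The defaulting rule can be sketched in plain Python. This is a toy model, not vidformer code: `FrameType` and `scaled_type` are hypothetical names standing in for however the library tracks a frame's resolution and pixel format. It only shows how an omitted argument inherits the input frame's value.

```python
from typing import NamedTuple, Optional


class FrameType(NamedTuple):
    """Hypothetical stand-in for a frame's type: resolution plus pixel format."""
    width: int
    height: int
    pix_fmt: str


def scaled_type(frame: FrameType, width: Optional[int] = None,
                height: Optional[int] = None,
                pix_fmt: Optional[str] = None) -> FrameType:
    # Any argument left as None falls back to the input frame's value.
    return FrameType(
        width if width is not None else frame.width,
        height if height is not None else frame.height,
        pix_fmt if pix_fmt is not None else frame.pix_fmt,
    )


src = FrameType(1920, 1080, "yuv420p")
print(scaled_type(src, pix_fmt="rgb24"))         # only the pixel format changes
print(scaled_type(src, width=1280, height=720))  # only the resolution changes
```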
IPC
IPC allows for calling User-Defined Filters (UDFs) running on the same system.
It is an infrastructure-level filter and is used to implement other filters.
It is configured with a socket and a func (the filter's name), both strings.
The IPC filter cannot be invoked directly; rather, IPC filters are constructed by a server upon request.
This can be difficult, but vidformer-py handles this for you.
As of right now, IPC only supports rgb24 frames.
HStack & VStack
HStack & VStack allow for composing multiple frames together, stacking them either horizontally or vertically. They try to automatically find a reasonable layout.
Arguments:
HStack(
*frames: list[Frame],
width: int,
height: int,
format: str)
At least one frame is required, along with a width, height, and format.
For example:
compilation = HStack(left_frame, right_frame, width=1280, height=720, format="rgb24")
OpenCV/cv2 Functions
⚠️ The cv2 module is a work in progress. If you find a bug or need a missing feature implemented, feel free to file an issue or contribute yourself!
Legend:
- β - Support
- πΈ - Support via OpenCV cv2
- β - Not yet implemented
Vidformer-specific Functions
- cv2.vidplay(video2) - Play a VideoWriter, Spec, or Source
- VideoWriter.spec() - Return the Spec of an output video
- Frame.numpy() - Return the frame as a numpy array
- cv2.setTo - The OpenCV Mat.setTo function (not in cv2)
- cv2.zeros - Create a black frame (equiv to numpy.zeros)
opencv
Class | Status |
---|---|
VideoCapture | β |
VideoWriter | β |
VideoWriter_fourcc | β |
Function | Status |
---|---|
imread | β |
imwrite | β (Yrden only) |
opencv.imgproc
Drawing Functions:
Function | Status |
---|---|
arrowedLine | β |
circle | β |
clipLine | β |
drawContours | β |
drawMarker | β |
ellipse | β |
ellipse2Poly | β |
fillConvexPoly | β |
fillPoly | β |
getFontScaleFromHeight | πΈ |
getTextSize | πΈ |
line | β |
polylines | β |
putText | β |
rectangle | β |
opencv.core
Function | Status |
---|---|
addWeighted | β |
resize | β |
User-Defined Filters
To implement a new user-defined filter (UDF) you need to host a filter server over a UNIX Domain Socket.
The vidformer-py library makes this easy.
Filters take some combination of frames and data (string, int, bool) and return a single frame result.
The vidformer project uses Python-style arguments, allowing ordered and named arguments (*args and **kwargs style).
To do this we define a new filter class and host it:
import vidformer as vf
import cv2

class MyFilter(vf.UDF):

    def filter(self, frame: vf.UDFFrame, name: str):
        """Return the result frame."""
        text = f"Hello, {name}!"
        image = frame.data().copy()
        cv2.putText(
            image,
            text,
            (100, 100),
            cv2.FONT_HERSHEY_SIMPLEX,
            1,
            (255, 0, 0),
            1,
        )
        return vf.UDFFrame(image, frame.frame_type())

    def filter_type(self, frame: vf.UDFFrameType, _name: str):
        """Returns the type of the output frame."""
        return frame

mf_udf = MyFilter("MyFilter")  # name used for pretty printing
my_filter = mf_udf.into_filter()  # host the UDF in a subprocess, returns a vf.Filter
Now we can use our newly-created filter in specs: my_filter(some_frame, "vidformer").
There is a catch: UDFs currently only support the rgb24 pixel format.
So invoking my_filter requires converting to rgb24 before the call and back afterwards:
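scale = vf.Filter('Scale')

def render(t, i):
    # tos is assumed to be a source video defined elsewhere
    f = scale(tos[t], pix_fmt="rgb24", width=1280, height=720)  # convert for the UDF
    f = my_filter(f, "world")
    f = scale(f, pix_fmt="yuv420p", width=1280, height=720)  # convert back for output
    return f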
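When several UDFs need the same round trip, the conversion can be factored into a wrapper. This is a sketch, not part of vidformer-py: `via_rgb24` is a hypothetical helper, and it assumes only that the scale filter accepts the pix_fmt, width, and height keyword arguments used above. The two filters are passed in as plain callables.

```python
def via_rgb24(udf_filter, scale, width, height, out_pix_fmt="yuv420p"):
    """Wrap `udf_filter` so callers can pass and receive non-rgb24 frames."""

    def wrapped(frame, *args, **kwargs):
        # Convert to the only format UDFs currently accept.
        f = scale(frame, pix_fmt="rgb24", width=width, height=height)
        # Forward ordered and named arguments unchanged.
        f = udf_filter(f, *args, **kwargs)
        # Convert back to the format the rest of the spec expects.
        return scale(f, pix_fmt=out_pix_fmt, width=width, height=height)

    return wrapped
```

With this in place, the render function above would reduce to a single call such as hello(tos[t], "world"), where hello = via_rgb24(my_filter, scale, 1280, 720).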
Roadmap
An unordered list of potential future features:
- v3 API
  - Supporting compression, deep recursion, deduplication, and artifact uploads
- Full GPU Acceleration
- WebAssembly user defined filters
FAQ
What video formats does vidformer support?
In short, essentially everything. vidformer uses the FFmpeg/libav* libraries internally, so any media FFmpeg works with should work in vidformer as well. We support many container formats (e.g., mp4, mov) and codecs (e.g., H.264, VP8).
A full list of supported codecs enabled in a vidformer build can be found by running:
vidformer-cli codecs
Can I access remote videos on the internet?
Yes, vidformer uses Apache OpenDAL for I/O, so most common data/storage access protocols are supported. However, not all storage services are enabled in distributed binaries. We guarantee that HTTP, S3, and the local filesystem are always available.
How does vidformer compare to FFmpeg?
vidformer is far more expressive than the FFmpeg filter interface. Mainly, vidformer is designed for work around data, so edits are created programmatically and can reference data. Also, vidformer enables serving result videos on demand.
vidformer uses the FFmpeg/libav* libraries internally, so any media FFmpeg works with should also work in vidformer.
How does vidformer compare to OpenCV/cv2?
vidformer orchestrates data movement in video synthesis tasks, but does not implement image processing directly. Most use cases will still use OpenCV for this.