vidformer

A research project providing infrastructure for video-native interfaces and accelerating computer vision visualization. Developed by the OSU Interactive Data Systems Lab.

🎯 Why vidformer

Vidformer efficiently transforms videos, enabling faster annotation, editing, and processing of video data, without requiring you to focus on performance.

It uses a declarative specification format to represent transformations. This enables:

Transparent Optimization: Vidformer optimizes the execution of declarative specifications just like a relational database optimizes relational queries.
Lazy/Deferred Execution: Video results can be retrieved on-demand, allowing for practically instantaneous playback of video results.
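
To make lazy execution concrete, here is a minimal toy sketch in plain Python. It is not vidformer's implementation or API: LazyVideo, apply, and render_frame are hypothetical names, and a "frame" is just a string. It only illustrates how declaring transformations can be kept separate from doing the work.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Toy model only: none of these names are part of vidformer.
Frame = str
FrameOp = Callable[[Frame], Frame]


@dataclass
class LazyVideo:
    """Records the operations for each frame without running them."""
    sources: List[Frame]
    ops: List[FrameOp] = field(default_factory=list)
    rendered: int = 0  # how many frames were actually computed

    def apply(self, op: FrameOp) -> "LazyVideo":
        # Declaring a transformation only extends the specification.
        self.ops.append(op)
        return self

    def render_frame(self, index: int) -> Frame:
        # Work happens here, and only for the requested frame.
        frame = self.sources[index]
        for op in self.ops:
            frame = op(frame)
        self.rendered += 1
        return frame


video = LazyVideo([f"frame{i}" for i in range(1000)])
video.apply(lambda f: f + "+text").apply(lambda f: f + "+box")

# A player seeking to frame 500 triggers exactly one frame of work.
print(video.render_frame(500))  # frame500+text+box
print(video.rendered)           # 1
```

Because the specification is known before any frame is rendered, a real system is also free to reorder or batch the work, which is where the database-style optimization comes in.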

Vidformer usually renders videos 2-3x faster than cv2, and hundreds of times faster when serving videos on-demand.

Vidformer builds on open technologies you may already use:

OpenCV: A cv2-compatible interface ensures that both you and LLMs can use existing knowledge and code.
Supervision: Supervision-compatible annotators make visualizing computer vision models trivial.
FFmpeg: Built on the same libraries, codecs, and formats that run the world.
Jupyter: View transformed videos instantly right in your notebook.
HTTP Live Streaming (HLS): Serve transformed videos over a network directly into any media player.
Apache OpenDAL: Access source videos no matter where they are stored.

🚀 Quick Start

The easiest way to get started is using vidformer's cv2 frontend, which allows most Python OpenCV visualization scripts to replace import cv2 with import vidformer.cv2 as cv2:

git clone https://github.com/ixlab/vidformer
cd vidformer
docker build -t igni -f Dockerfile .
docker-compose -f vidformer-igni/docker-compose-local.yaml up

import vidformer.cv2 as cv2

cap = cv2.VideoCapture("my_input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("my_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                        fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()

You can find details on this in our Getting Started Guide.

About the project

File Layout:

./vidformer: The core transformation library
./vidformer-py: A Python video editing client
./vidformer-cli: A command-line interface
./vidformer-igni: The second generation vidformer server
./snake-pit: The main vidformer test suite
./docs: The vidformer website

Vidformer components are detailed here.

❌ vidformer is NOT:

A conventional video editor (like Premiere Pro or Final Cut)
A video database/VDBMS
A natural language query interface for video
A computer vision library (like OpenCV)
A computer vision AI model (like CLIP or Yolo)

However, vidformer is strongly complementary to each of these. If you're working on any of the latter four, vidformer may be for you.

License: Vidformer is open source under Apache-2.0. Contributions are welcome.

Acknowledgements: Vidformer is supported by the U.S. National Science Foundation under Awards #2118240 and #1910356.

Getting Started

Vidformer can be run in a cloud deployment or locally.

In the cloud (start here):

Walk through a demo using a hosted guest account:

Local:

You can host a server yourself (here). Installing locally allows for accessing the local file system and saving video results.

Getting started with the cv2 compatibility layer

Local Install

You can deploy the server locally with docker:

git clone https://github.com/ixlab/vidformer
cd vidformer
docker build -t igni -f Dockerfile .
docker-compose -f vidformer-igni/docker-compose-local.yaml up

Vidformer-py can be installed with pip:

pip3 install vidformer

You can connect the client to the server in two ways. Either use the environment variables printed by the server, or set the connection manually:

import vidformer as vf
import vidformer.cv2 as cv2

cv2.set_server(vf.Server("<ENDPOINT>", "<API_KEY>"))

Run admin commands

Admin commands can be run from inside the server container:

docker-compose -f vidformer-igni/docker-compose-local.yaml exec igni bash
vidformer-igni user ls

Run vidformer-igni --help for other commands.

Getting Started - cv2

This is a walkthrough of getting started with vidformer's OpenCV cv2 compatibility layer.

⚠️ Adding cv2 functions is a work in progress. See the cv2 filters page for which functions have been implemented.

Hello, world!

Copy in your video, or use ours:

curl -O https://f.dominik.win/data/dve2/tos_720p.mp4

Then just replace import cv2 with import vidformer.cv2 as cv2. Here's our example script:

import vidformer.cv2 as cv2

cap = cv2.VideoCapture("tos_720p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                        fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()

Stream the Results

Saving videos to disk works, but we can also display them in the notebook. Since we stream the results and only render them on demand, playback can start practically instantly!

First, replace "output.mp4" with None to skip writing the video to disk. Then you can use cv2.vidplay() to play the video!

import vidformer.cv2 as cv2

cap = cv2.VideoCapture("tos_720p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter(None, cv2.VideoWriter_fourcc(*"mp4v"),
                        fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()

cv2.vidplay(out)

⚠️ By default, cv2.vidplay() returns a video that plays in a Jupyter notebook. If you are running outside a Jupyter notebook, you can pass method="link" to return a link instead.

The vidformer modules

vidformer is a highly modular suite of tools that work together:

vidformer-py: A Python 🐍 client for declarative video synthesis
- Provides an easy-to-use library for symbolically representing transformed videos
- Acts as a client for a vidformer server
- Using vidformer-py is the best place to get started
libvidformer: The core data-oriented declarative video editing library
- An embedded video processing execution engine with low-level interfaces
- Systems code, written in Rust 🦀
- You should use if: You are building a VDBMS or other multimodal data-system infrastructure.
- You should not use if: You just want to use vidformer in your workflows or projects.
vidformer-igni: The vidformer server
- A multi-tenant scale-out server
- Designed for Video on Demand only
  - Does not support full-video exports
  - All video sources must be over the network, not local
- Enables live streaming and waiting on external dependencies for even lower time-to-playback latency

Client libraries in other languages: Writing a vidformer client library for other languages is simple. It's a few hundred lines of code, and you just have to construct some JSON. Contributions or suggestions for other languages are welcome.

vidformer - Video Data Transformation Library

(lib)vidformer is a core video synthesis/transformation library. It handles the movement, control flow, and processing of video and conventional (non-video) data.

About

It's written in Rust 🦀
- So it does some fancy parallel processing and does so safely
Uses the FFmpeg libav libraries for multimedia stuff
- So it should work with nearly every video file ever made
Uses Apache OpenDAL for I/O
- So it can access videos in a bunch of storage services
Implements filters using OpenCV

Building

This crate requires linking with FFmpeg, as detailed in the rusty_ffmpeg crate. We currently target FFmpeg 7.0.

vidformer-py

vidformer-py is a Python 🐍 frontend for vidformer. It has an API compatibility layer with OpenCV cv2, as well as some supervision annotators. Our getting started guide explains how to use it.

Publish:

export FLIT_USERNAME='__token__' FLIT_PASSWORD='<token>'
flit publish

vidformer-igni

The vidformer server for the cloud.

Local Setup

git clone https://github.com/ixlab/vidformer
cd vidformer
docker build -t igni -f Dockerfile .
docker-compose -f vidformer-igni/docker-compose-local.yaml up

Development Setup

docker-compose -f docker-compose-db.yaml up
export 'IGNI_DB=postgres://igni:igni@localhost:5432/igni'
cargo run -- user add --name test --api-key test --permissions test
cargo run --release -- server --config igni.toml

Server Deployment

# From vidformer project root
docker build -t igni -f Dockerfile .
docker-compose -f vidformer-igni/docker-compose-prod.yaml up

For TLS certs:

docker-compose -f vidformer-igni/docker-compose-prod.yaml run --rm certbot certonly --webroot --webroot-path /var/www/certbot/ -d api.example.com -d cdn.example.com

Guest account setup (for colab notebook)

docker ps
docker exec -it <igni container> bash
vidformer-igni user add --name guest --permissions guest --api-key VF_GUEST
vidformer-igni user ls
vidformer-igni source add --user-id 98f6aa2a-e622-40bc-a0cd-e05f73f7e398 --name vf-sample-media/tos_720p.mp4 --stream-idx 0 --storage-service http --storage-config '{"endpoint":"https://f.dominik.win"}'
vidformer-igni source add --user-id 98f6aa2a-e622-40bc-a0cd-e05f73f7e398 --name vf-sample-media/tos_720p-yolov8x-seg-masks.mkv --stream-idx 0 --storage-service http --storage-config '{"endpoint":"https://f.dominik.win"}'
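
When registering many sources, hand-writing the --storage-config JSON gets error-prone. The sketch below builds the same command line in Python, using only the flags shown above. The helper name source_add_argv is hypothetical, and the user ID is the example one from above; use the ID printed by vidformer-igni user ls on your deployment.

```python
import json
import shlex


def source_add_argv(user_id, name, endpoint, stream_idx=0):
    """Build the argv for `vidformer-igni source add` with an http storage service."""
    # Compact separators reproduce the JSON formatting used in the commands above.
    storage_config = json.dumps({"endpoint": endpoint}, separators=(",", ":"))
    return [
        "vidformer-igni", "source", "add",
        "--user-id", user_id,
        "--name", name,
        "--stream-idx", str(stream_idx),
        "--storage-service", "http",
        "--storage-config", storage_config,
    ]


argv = source_add_argv(
    "98f6aa2a-e622-40bc-a0cd-e05f73f7e398",
    "vf-sample-media/tos_720p.mp4",
    "https://f.dominik.win",
)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to subprocess.run inside the container avoids shell quoting entirely.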

Filters

Built-in Filters

While most applications will use user-defined filters, vidformer ships with a handful of built-in filters to get you started:

DrawText

DrawText does exactly what it sounds like: draw text on a frame.

For example:

DrawText(frame, text="Hello, world!", x=100, y=100, size=48, color="white")

BoundingBox

BoundingBox draws bounding boxes on a frame.

For example:

BoundingBox(frame, bounds=obj)

Where obj is JSON with this schema:

[
  {
    "class": "person",
    "confidence": 0.916827917098999,
    "x1": 683.0721842447916,
    "y1": 100.92174338626751,
    "x2": 1006.863525390625,
    "y2": 720
  },
  {
    "class": "dog",
    "confidence": 0.902531921863556,
    "x1": 360.8750813802083,
    "y1": 47.983140622720974,
    "x2": 606.76171875,
    "y2": 717.9591837897462
  }
]
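
Detections usually come out of a model as tuples or arrays rather than in this shape. As a rough sketch, the helpers below convert tuples into the schema above and check that each entry has the expected keys. The function names are hypothetical, and the check that x1 <= x2 and y1 <= y2 is an assumption based on the sample values, not a documented requirement.

```python
import json

# Keys taken from the schema shown above.
REQUIRED_KEYS = ("class", "confidence", "x1", "y1", "x2", "y2")


def to_bounds(detections):
    """Convert (label, confidence, x1, y1, x2, y2) tuples to schema-shaped dicts."""
    bounds = []
    for label, confidence, x1, y1, x2, y2 in detections:
        # Assumption: (x1, y1) is the top-left corner, (x2, y2) the bottom-right.
        if x2 < x1 or y2 < y1:
            raise ValueError(f"box corners out of order for {label!r}")
        bounds.append({
            "class": label,
            "confidence": float(confidence),
            "x1": float(x1), "y1": float(y1),
            "x2": float(x2), "y2": float(y2),
        })
    return bounds


def validate_bounds(obj):
    """Return True if every entry is a dict with exactly the schema's keys."""
    return all(
        isinstance(entry, dict) and set(entry) == set(REQUIRED_KEYS)
        for entry in obj
    )


obj = to_bounds([
    ("person", 0.92, 683.1, 100.9, 1006.9, 720),
    ("dog", 0.90, 360.9, 48.0, 606.8, 718.0),
])
print(json.dumps(obj, indent=2))
print(validate_bounds(obj))  # True
```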

Scale

The Scale filter transforms one frame type to another. It changes both resolution and pixel format. This is the most important filter and is essential for building with vidformer.

Arguments:

Scale(
    frame: Frame,
    width: int = None,
    height: int = None,
    pix_fmt: str = None)

By default, missing width, height, and pix_fmt values are set to match frame. pix_fmt must match FFmpeg's name for a pixel format.

For example:

frame = Scale(frame, width=1280, height=720, pix_fmt="rgb24")
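
The defaulting rule can be modeled in a few lines. This is an illustrative sketch of the behavior described above, not vidformer's code; FrameType and scaled_type are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameType:
    """The 'type' of a frame: its resolution and pixel format."""
    width: int
    height: int
    pix_fmt: str


def scaled_type(src: FrameType,
                width: Optional[int] = None,
                height: Optional[int] = None,
                pix_fmt: Optional[str] = None) -> FrameType:
    """Return the frame type after scaling; missing values match the input."""
    return FrameType(
        width=src.width if width is None else width,
        height=src.height if height is None else height,
        pix_fmt=src.pix_fmt if pix_fmt is None else pix_fmt,
    )


src = FrameType(1920, 1080, "yuv420p")
# Only the pixel format changes; the resolution is inherited from src.
print(scaled_type(src, pix_fmt="rgb24"))
# Only the resolution changes; the pixel format is inherited from src.
print(scaled_type(src, width=1280, height=720))
```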

IPC

IPC allows for calling User-Defined Filters (UDFs) running on the same system. It is an infrastructure-level filter and is used to implement other filters. It is configured with two string arguments: socket and func (the filter's name).

The IPC filter cannot be invoked directly; instead, IPC filters are constructed by a server upon request. This can be difficult, but vidformer-py handles it for you. Currently, IPC only supports rgb24 frames.
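
Since UDFs receive rgb24 frames, it helps to know how large one frame is. The arithmetic below relies only on the definition of rgb24 (three packed bytes per pixel) and assumes no row padding; the function name is hypothetical.

```python
def rgb24_frame_bytes(width: int, height: int) -> int:
    """Size of one tightly packed rgb24 frame in bytes."""
    # rgb24 stores one byte each for R, G, and B per pixel, with no alpha channel.
    return width * height * 3


per_frame = rgb24_frame_bytes(1280, 720)
print(per_frame)       # 2764800 bytes, roughly 2.6 MiB per 720p frame
print(per_frame * 24)  # 66355200 bytes per second of video at 24 fps
```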

HStack & VStack

HStack & VStack allow for composing multiple frames together, stacking them either horizontally or vertically. They try to automatically find a reasonable layout.

Arguments:

HStack(
    *frames: Frame,
    width: int,
    height: int,
    format: str)

At least one frame is required, along with a width, height and format.

For example:

compilation = HStack(left_frame, right_frame, width=1280, height=720, format="rgb24")
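
The layout HStack chooses is not specified here. As one plausible illustration (an assumption, not vidformer's actual algorithm), the sketch below divides the canvas into equal-width cells and fits each frame into its cell while preserving aspect ratio. The function name hstack_layout is hypothetical.

```python
def hstack_layout(frame_sizes, width, height):
    """Return an (x, y, w, h) placement per frame on a width x height canvas."""
    if not frame_sizes:
        raise ValueError("at least one frame is required")
    cell_w = width // len(frame_sizes)
    placements = []
    for i, (fw, fh) in enumerate(frame_sizes):
        # Shrink or grow the frame to fit its cell without distorting it.
        scale = min(cell_w / fw, height / fh)
        w, h = int(fw * scale), int(fh * scale)
        # Center the scaled frame inside its cell.
        x = i * cell_w + (cell_w - w) // 2
        y = (height - h) // 2
        placements.append((x, y, w, h))
    return placements


# Two 720p frames side by side on a 1280x720 canvas.
print(hstack_layout([(1280, 720), (1280, 720)], 1280, 720))
# [(0, 180, 640, 360), (640, 180, 640, 360)]
```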

OpenCV/cv2 Functions

See vidformer.cv2 API docs.

⚠️ The cv2 module is a work in progress. If you find a bug or need a missing feature implemented feel free to file an issue or contribute yourself!

Legend:

✅ - Supported
🔸 - Supported via OpenCV cv2
❌ - Not yet implemented

Vidformer-specific Functions

cv2.vidplay(video2) - Play a VideoWriter, Spec, or Source
VideoWriter.spec() - Return the Spec of an output video
Frame.numpy() - Return the frame as a numpy array
cv2.setTo - The OpenCV Mat.setTo function (not in cv2)
cv2.zeros - Create a black frame (equiv to numpy.zeros)

opencv

Class	Status
VideoCapture	✅
VideoWriter	✅
VideoWriter_fourcc	✅

Function	Status
imread	✅
imwrite	✅

opencv.imgproc

Drawing Functions:

Function	Status
arrowedLine	✅
circle	✅
clipLine	❌
drawContours	❌
drawMarker	❌
ellipse	✅
ellipse2Poly	❌
fillConvexPoly	❌
fillPoly	❌
getFontScaleFromHeight	🔸
getTextSize	🔸
line	✅
polylines	❌
putText	✅
rectangle	✅

opencv.core

Function	Status
addWeighted	✅
resize	✅
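
Before porting an existing script, it can be handy to check whether it calls anything marked ❌ above. The sketch below does this with Python's ast module. The name set is copied from the tables on this page and may go stale as functions are added; unsupported_calls is a hypothetical helper, not part of vidformer.

```python
import ast

# Functions marked as not yet implemented in the tables above.
NOT_IMPLEMENTED = {
    "clipLine", "drawContours", "drawMarker", "ellipse2Poly",
    "fillConvexPoly", "fillPoly", "polylines",
}


def unsupported_calls(source: str, module_alias: str = "cv2"):
    """Return sorted names of <alias>.<name>(...) calls that are not implemented."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module_alias
                and node.func.attr in NOT_IMPLEMENTED):
            found.add(node.func.attr)
    return sorted(found)


# The script is only parsed, never executed, so undefined names are fine.
script = '''
import cv2
cv2.putText(frame, "hi", (10, 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1)
cv2.polylines(frame, [pts], True, (0, 255, 0))
cv2.fillPoly(frame, [pts], (0, 0, 255))
'''
print(unsupported_calls(script))  # ['fillPoly', 'polylines']
```

This only catches direct calls through the module alias; functions imported by name or called indirectly would need a more thorough analysis.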

Hardware Acceleration

This page details how to compile vidformer with NVIDIA NVENC and similar hardware-accelerated codecs. We assume Docker is running on a system with CUDA. Other codecs also work; see the FFmpeg docs. Testing this with GitHub Actions is impossible, so this page may be a tad outdated.

The container must be run with these arguments: --gpus all --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=all. If using Dev Containers, these can be added to the devcontainer.json file under runArgs.

The scripts/deps_ffmpeg.sh script needs to be patched to include --enable-ffnvcodec.

Then you can run this in the container:

# From project root delete old FFmpeg build
rm -rf ffmpeg

sudo apt update -y
sudo apt install build-essential yasm cmake libtool libc6 libc6-dev unzip wget libnuma1 libnuma-dev -y

# Install (see https://trac.ffmpeg.org/wiki/HWAccelIntro)
rm -rf nv-codec-headers
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers
# NOTE: Depending on your driver, you may want to checkout an older version tag here
make
sudo make install
cd -

# Install cuda
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update -y
sudo apt-get install -y nvidia-container-toolkit nvidia-cuda-toolkit

# Build ffmpeg
./scripts/deps_ffmpeg.sh

# Test it out
./ffmpeg/build/bin/ffmpeg -i myinputvid.mp4 -c:v h264_nvenc out.mp4 -y

Now you can recompile vidformer and use hardware accelerated codecs.
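
To confirm from a script that the rebuilt FFmpeg exposes the encoder, you can parse the output of ffmpeg -encoders. This sketch assumes the usual table layout, in which encoder rows follow a dashed separator line and the name is the second column. The helper names and the abbreviated sample output are illustrative only.

```python
import subprocess


def parse_encoders(ffmpeg_output: str):
    """Parse the text printed by `ffmpeg -encoders` into a set of encoder names."""
    names = set()
    in_table = False
    for line in ffmpeg_output.splitlines():
        if line.strip().startswith("------"):
            in_table = True  # encoder rows start after the separator
            continue
        parts = line.split()
        if in_table and len(parts) >= 2:
            names.add(parts[1])  # column 1 is the flags, column 2 the name
    return names


def has_encoder(ffmpeg_path: str, name: str) -> bool:
    """Run the given ffmpeg binary and check whether it lists the encoder."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return name in parse_encoders(result.stdout)


# Abbreviated, illustrative sample of the output format.
SAMPLE = """Encoders:
 V..... = Video
 A..... = Audio
 ------
 V....D libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
"""
print(sorted(parse_encoders(SAMPLE)))  # ['h264_nvenc', 'libx264']
```

On a real build you would call has_encoder("./ffmpeg/build/bin/ffmpeg", "h264_nvenc") from the project root.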

Roadmap

An unordered list of potential future features:

Full GPU Acceleration
WebAssembly user defined filters

FAQ

What video formats does vidformer support?

In short, essentially everything. vidformer uses the FFmpeg/libav* libraries internally, so any media FFmpeg works with should work in vidformer as well. We support many container formats (e.g., mp4, mov) and codecs (e.g., H.264, VP8).

A full list of supported codecs enabled in a vidformer build can be found by running:

vidformer-cli codecs
