Getting Started

This is a walkthrough of getting started with vidformer-py.

Installation

Getting vidformer-py:

pip3 install vidformer

Additionally, you need a vidformer server. We distribute the locally-running version, yrden, through vidformer-cli. Currently, vidformer-cli only runs on Linux, so using the vidformer Docker image is highly advised:

docker pull dominikwinecki/vidformer:latest
docker run --rm -it -p 8000:8000 dominikwinecki/vidformer:latest yrden --print-url

This launches a vidformer yrden server, which is our reference server implementation for local usage, on port 8000. If you want to read or save video files locally, add -v /my/local/dir:/data and then reference them as /data in the code.

Hello, world!

We assume this is running in a Jupyter notebook. If not, .play() won't work and you'll have to use .save() instead.

We start by connecting to a server and registering a source:

import vidformer as vf
from fractions import Fraction

server = vf.YrdenServer(domain='localhost', port=8000)
example_vids = vf.StorageService(
    "http",
    endpoint="https://f.dominik.win",
    root="/data/dve2/")

tos = vf.Source(
    server,
    "tos_720p",     # name (for pretty printing)
    "tos_720p.mp4", # path
    stream=0,       # index of the video stream we want to use
    service=example_vids)

print(tos.ts())
print(tos.fmt())

This will print the timestamps of all the frames in the video, followed by its format information. This may take a few seconds the first time, but frame times are cached afterwards.

> [Fraction(0, 1), Fraction(1, 24), Fraction(1, 12), Fraction(1, 8), ...]
> {'width': 1280, 'height': 720, 'pix_fmt': 'yuv420p'}
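The timestamps are exact rational times, and the spacing between consecutive entries gives the frame rate. As a standalone illustration (plain Python, no server needed), here is how the printed values imply a 24 FPS source:

```python
from fractions import Fraction

# The first few timestamps printed above
ts = [Fraction(0, 1), Fraction(1, 24), Fraction(1, 12), Fraction(1, 8)]

# The gap between consecutive frames is a constant 1/24 second...
gaps = [b - a for a, b in zip(ts, ts[1:])]
assert all(g == Fraction(1, 24) for g in gaps)

# ...so the source runs at 24 frames per second
fps = 1 / gaps[0]
print(fps)  # 24
```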

Now let's create a 30-second clip starting at the 5-minute mark. The source video is at a constant 24 FPS, so let's create a 24 FPS output as well:

domain = [Fraction(i, 24) for i in range(24 * 30)]

Now we need to render each of these frames, so we define a render function.

def render(t: Fraction, i: int):
    clip_start_point = Fraction(5 * 60, 1) # start at 5 * 60 seconds
    return tos[t + clip_start_point]

We used timestamp-based indexing here, but you can also use integer indexing (tos.iloc[i + 5 * 60 * 24]).
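The two indexing schemes are equivalent here because the source runs at a constant 24 FPS. A standalone sketch of that index arithmetic (plain Python, not vidformer API):

```python
from fractions import Fraction

FPS = 24
clip_start = Fraction(5 * 60, 1)  # clip starts 5 minutes into the source

def source_time(i: int) -> Fraction:
    # timestamp-based indexing: output frame i maps to this source time
    return Fraction(i, FPS) + clip_start

def source_index(i: int) -> int:
    # integer indexing: the same frame expressed as a source frame number
    return i + 5 * 60 * FPS

# For a constant 24 FPS source, the two schemes pick the same frame
assert source_time(0) == Fraction(source_index(0), FPS)
assert source_time(100) == Fraction(source_index(100), FPS)
```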

Now we can create a spec and play it in the browser. We create a spec from the resulting video's frame timestamps (domain), a function to construct each output frame (render), and the output video's format (matching tos.fmt()).

spec = vf.Spec(domain, render, tos.fmt())
spec.play(server)

This plays this result:

Some Jupyter environments are weird (e.g., VS Code), so .play() might not work. Using .play(..., method="iframe") may help.

It's worth noting that we are playing frames in order here and outputting video at the same framerate we received, but that doesn't need to be the case. Here are some other things you can now try:

  • Reverse the video
  • Double the speed of the video
    • Either double the framerate or sample every other frame
  • Shuffle the frames into a random order
  • Combine frames from multiple videos
  • Create a variable frame rate video
    • Note: .play() will not work with VFR, but .save() will.

Bounding Boxes

Now let's overlay some bounding boxes over the entire clip:

# Load some data
import urllib.request, json 
with urllib.request.urlopen("https://f.dominik.win/data/dve2/tos_720p-objects.json") as r:
    detections_per_frame = json.load(r)

bbox = vf.Filter("BoundingBox") # load the built-in BoundingBox filter

domain = tos.ts() # output should have the same frame timestamps as the source video

def render(t, i):
    return bbox(
        tos[t],
        bounds=detections_per_frame[i])

spec = vf.Spec(domain, render, tos.fmt())
spec.play(server)

This plays this result (video is just a sample clip):

Composition

We can place frames next to each other with the HStack and VStack filters. For example, HStack(left_frame, middle_frame, right_frame, width=1280, height=720, format="yuv420p") will place three frames side-by-side.
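The grid example that follows is driven by plain index arithmetic: output frame i shows source frames i through i + N*N - 1, with the tile at (row, col) offset by row * N + col. A standalone sketch of that mapping (plain Python, not vidformer API):

```python
N = 5  # a 5x5 grid

def tile_frame(i: int, row: int, col: int) -> int:
    # source frame number shown in tile (row, col) of output frame i
    return i + row * N + col

# Output frame 0 shows source frames 0..24
assert tile_frame(0, 0, 0) == 0
assert tile_frame(0, 4, 4) == 24
# The window slides forward one source frame per output frame
assert tile_frame(1, 0, 0) == 1
```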

As a larger example, we can view a sliding window over the frames as a 5x5 grid:

hstack = vf.Filter("HStack")
vstack = vf.Filter("VStack")

w, h = 1920, 1080

def create_grid(tos, i, N, width, height, fmt="yuv420p"):
    grid = []
    for row in range(N):
        columns = []
        for col in range(N):
            index = row * N + col
            columns.append(tos.iloc[i + index])
        grid.append(hstack(*columns, width=width, height=height//N, format=fmt))
    final_grid = vstack(*grid, width=width, height=height, format=fmt)
    return final_grid

domain = [Fraction(i, 24) for i in range(0, 5000)]

def render(t, i):
    return create_grid(tos, i, 5, w, h)

fmt = {'width': w, 'height': h, 'pix_fmt': 'yuv420p'}

spec = vf.Spec(domain, render, fmt)
spec.play(server)

This plays this result (video is just a sample clip):

Viewing Telemetry (and User-Defined Filters)

This notebook shows how to build custom filters to overlay data.

This plays this result (video is just a sample clip):