Motion Interpolation for Glitch Aesthetics using FFmpeg part 0

As you may have seen in this blog post I made use of FFmpeg’s minterpolate motion interpolation options to make all of the faces morph. There are quite a few options for minterpolate and many different combinations of options that can be used. I had to consult Wikipedia to figure out exactly what the different motion estimation algorithms were, but even with that information I couldn’t visualise how they would change the output. To add to this, how I’m using minterpolate isn’t a typical use case.

To make things easier for those wishing to use FFmpeg’s minterpolate to create glitch aesthetics, I have compiled 36 videos, each showing a different combination of processing options. The source video can be seen below and features two of my favourite things: cats (obtained from here) and rainbows.

I’ve slowed it down so that you can see exactly what’s in the video, but the original can be downloaded here.

The base script used for each video is:

ffmpeg -i cat_rainbow_original.mp4 -filter:v "setpts=62.5*PTS,minterpolate='fps=25:mb_size=16:search_param=400:vsbmc=0:scd=none:mc_mode=<mc_mode>:me_mode=<me_mode>:me=<me>'" -y <output>.mp4
(where <mc_mode>, <me_mode> and <me> are the settings that change for each video)

In part two of March’s Development Update I explained why I set scd to none and search_param to 400. I could, of course, have documented what happens when all of the processing options are changed, but that would have meant making hundreds of videos! The options that were changed were mc_mode (motion compensation mode), me_mode (motion estimation mode), and me (motion estimation algorithm).
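If you want to reproduce all 36 combinations yourself, a single bash loop can generate them. This is only a sketch of how I’d approach it rather than the exact script I used, and the zero-padded output naming is an assumption based on the filenames mentioned below, but the option values are the ones minterpolate accepts (two mc_mode values × two me_mode values × nine me algorithms = 36 combinations):

i=1
for me_mode in bidir bilat; do
  for mc_mode in obmc aobmc; do
    for me in esa tss tdls ntss fss ds hexbs epzs umh; do
      # same base script as above, with the three varying options filled in
      ffmpeg -i cat_rainbow_original.mp4 -filter:v "setpts=62.5*PTS,minterpolate='fps=25:mb_size=16:search_param=400:vsbmc=0:scd=none:mc_mode=${mc_mode}:me_mode=${me_mode}:me=${me}'" -y "$(printf '%03d' "$i")_mc_mode=${mc_mode}_me_mode=${me_mode}_me=${me}.mp4"
      i=$((i+1))
    done
  done
done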

Test conditions

These videos were created using FFmpeg 7:4.1.4-1build2, installed from the Ubuntu repositories, on a Dell XPS 15 (2017 edition) with 16GB memory, an i7 processor and an Nvidia GeForce GTX 1050 graphics card, all running on Ubuntu 19.10 using proprietary drivers.

I don’t have a Windows or Mac machine, and haven’t used other Linux distributions, so I can’t test these scripts in those conditions. If there are any problems getting FFmpeg running on your machine it’s best to contact the developers for assistance.

Observations

My first observation is that the esa me_mode takes frikkin ages to complete! Each video using this me_mode took about four hours to process. I did consider killing the script but for completeness I let it run.

Using bilat me_mode produces the most chaotic results by far. Just compare 026_mc_mode=obmc_me_mode=bilat_me=epzs.mp4 to 008_mc_mode=obmc_me_mode=bidir_me=epzs.mp4 and you’ll see what I mean.

For a video of this length nearly all of the scripts (except for those using esa) took between 30 seconds and 1 minute to complete, and that’s on machines with and without a GPU. This is good news if you don’t want to have to carry around a powerhouse laptop all the time.

All of this reminds me a bit of datamoshing. It’s more predictable and controllable, but the noise and melty movement it creates, especially in some of the videos using the bilat me_mode, reminds me of the bloom effect in datamoshing. This could be down to the source material, and I’d be interested to see experiments involving datamoshed videos.

Let’s a go!

With that all said, let’s jump into sharing the results. As there are 36 videos I’ll be splitting them across nine blog posts over nine days, with the last being posted on 28th March 2020. Each will contain the script I used as well as the output video. Links to each part can be found below:

(mis)Using FFmpeg’s Motion Interpolation Options

Towards the end of the Let’s Never Meet video the robotic faces slowly morph into something a little bit more human-like.

These faces continue to morph between lots of different faces, suggesting that when getting to know people you can never really settle on who they are. To make the faces morph I used motion interpolation to interpolate between each face. Here’s what Wikipedia has to say about motion interpolation:

Motion interpolation or motion-compensated frame interpolation (MCFI) is a form of video processing in which intermediate animation frames are generated between existing ones by means of interpolation, in an attempt to make animation more fluid, to compensate for display motion blur, and for fake slow motion effects.

For those that use proprietary software there are a few programs that can do this, including Twixtor and After Effects.

If, like me, you only use open source software there are a few options, but they’re not integrated within a general post-processing or video editing GUI.

slowmoVideo

slowmoVideo is an open source application which allows you to vary the speed of a video clip over time. I used this previously for the background images in the Visually Similar Artwork.

For Let’s Never Meet I did consider using slowmoVideo again. What I like about it is being able to vary the speed, and that it has a GUI. However, development on it seems kinda slow and, most importantly, it requires a GPU. Occasionally I find myself working on a machine that only has integrated graphics (i.e. no GPU), which makes using slowmoVideo impractical. So, I needed something that would reliably work on a CPU and produce similar, if not the same, visual results as slowmoVideo.

Butterflow

Butterflow is another tool for motion interpolation. It doesn’t have a native GUI but it has a nice set of command line options. Sadly it seems impossible to install on Linux. Many have tried, many have failed.

FFmpeg

Finally I tried FFmpeg. Pretty much all my artworks use FFmpeg at some point, whether as the final stage in compiling a Blender render or as the backend to a video editor or video converter. I’m already very familiar with how FFmpeg works and feel it can be relied upon to keep working and being developed in the future.

I actually first came across FFmpeg’s motion interpolation options sometime in late 2018, but only really cemented my understanding of how to use it in making Let’s Never Meet.

Going through FFmpeg’s minterpolate options was quite daunting at first. There are lots of options which have descriptions of how they work, but I didn’t really understand what results they would produce. Nonetheless I mixed and matched settings until I produced something close to my liking.

The first step in making the morphed video was making the original-speed video.

I’ve slowed the above video down so you can see each frame, but if you want the original video you can download it here. This consisted of 47 faces/images, played one image per frame. In total it lasted 1.88 seconds and I needed to slow it down to at least x minutes, which is the length of the video.
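(If you’re wondering how a video like that gets made in the first place: a folder of numbered stills can be turned into a one-image-per-frame video with FFmpeg. The face_%02d.png naming below is just an assumption for illustration, but 25 fps is consistent with 47 frames lasting 1.88 seconds.)

# turn numbered stills (face_01.png to face_47.png) into a 25 fps video, one image per frame
ffmpeg -framerate 25 -i face_%02d.png -c:v libx264 -pix_fmt yuv420p -y lnm_faces_original.mp4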

Here is the code that I used:

ffmpeg -i lnm_faces_original.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y output.mp4

I’ll explain three of the important parts of this code.

setpts

The FFmpeg wiki has a good explanation of what setpts does:

To double the speed of the video, you can use:

ffmpeg -i input.mkv -filter:v "setpts=0.5*PTS" output.mkv

The filter works by changing the presentation timestamp (PTS) of each video frame. For example, if there are two successive frames shown at timestamps 1 and 2, and you want to speed up the video, those timestamps need to become 0.5 and 1, respectively. Thus, we have to multiply them by 0.5.

So, by using setpts=40*PTS I’m essentially slowing the video down by a factor of 40. For this video I took a guess at how much I’d need to multiply the video of the faces by to make it match the length of the full video. If I wanted to be exact I’d need to do some maths: take the frame count of the full video (5268), divide it by the frame count of the face video (47) and use the result (112.085106383) as the PTS multiplier.
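If you’d rather not count frames by hand, ffprobe can report the frame counts for you. A quick sketch (lnm_faces_original.mp4 is the filename from the command above; lets_never_meet.mp4 is just a stand-in name for the full video):

# count the decoded video frames in each file
ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 lets_never_meet.mp4
ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 lnm_faces_original.mp4
# 5268 / 47 = 112.085106383, which then becomes the setpts multiplier
ffmpeg -i lnm_faces_original.mp4 -filter:v "setpts=112.085106383*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y output.mp4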

scd

scd is probably the most important part of this code. It attempts to detect if there are any scene changes and then doesn’t perform any motion interpolation across those frames. In this scenario, however, I want to interpolate between every frame, regardless of whether they appear to be part of the same “scene”. If you leave scd at its default of fdiff and scd_threshold at 5.0, FFmpeg tries to decide if there’s enough difference between frames to count as a scene change. Here’s what that would’ve looked like:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:me_mode=bidir:vsbmc=1:search_param=400'" -y lnm_faces_scd.mp4
(without setting scd the defaults are assumed)

Not ideal, so I disabled it by setting it to none.

search_param

This one I don’t quite understand, but I do understand how it affects the video. If I were to leave the setting at its default value of 32 then you can see that when it interpolates there isn’t much movement:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=32'" -y search_param_32.mp4

With the value of 400 which I used:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y search_param_400.mp4

And with the slightly ridiculous value of 2000:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=2000'" -y search_param_2000.mp4

The biggest difference is clearly between a search_param of 32 and one of 400. At 2000 there are only minor differences, though this may change depending on your source input.

It’s morphin’ time!

With all the settings of minterpolate now decided, I created the final video:


(I reduced the quality of the video a little bit to save on bandwidth)

I quite like the end results. It doesn’t look the same as the output of slowmoVideo, in that the morphing happens in blocks and doesn’t have the dust-grain look of slowmoVideo’s output. However, by using FFmpeg I can now use a familiar program that works on the CPU, even if it does take a long time!

Producing audio for Let’s Never Meet

For the majority of my career in art I’ve been primarily known for my visual artwork. I’ve dabbled in making noises with my Sonification Studies performances (which may make a comeback at some point) but it’s only since my 2018 performance at databit.me that I’ve regularly made and performed music.

On the performance side I’ve mostly used TidalCycles. You may have seen that I have been doing live streams of my rehearsals.

Outside of live coding I’ve spent most of my time getting to grips with software-based synthesisers and DAWs. When asking for advice on this most people told me to use software like Ableton. What these well-meaning people may not realise is that I exclusively use (Ubuntu) Linux and only open source software. This gives me the freedom that open source grants, but boy does it sometimes cause headaches! Plenty of people use the open source options available to them, but this approach is still the road less travelled, and so I’ve sometimes found myself asking lots of questions and either not getting a response or being told that what I’m trying to achieve is not possible.

And so for the last year or so I’ve been creating workflows that work for me. For this I’ve been using Ardour, which is a pretty good cross-platform DAW. So far I’ve produced soundtracks to two of my artworks, We Are Your Friends and Let’s Never Meet. In this Development Update I’ll go over a little trick I learnt whilst making the soundtrack for Let’s Never Meet.

In short, Let’s Never Meet is about meeting people over the internet. The soundtrack is actually a remix of an Android alarm ringtone.

It’s not an alarm tone that I use myself, but it was ambient enough to work in an outdoor setting for an extended period without getting annoying. Plus, using a sample from my phone just somehow felt appropriate, if you know what I mean. After many many many hours of producing, my remix sounded a bit like this:

I was really happy with the results but it felt like there was something missing. It was pretty samey throughout and I think there needed to be some kind of buildup or change in pace. To address this I decided to add some percussion. I turned to the glitch sample set that is downloaded when you install TidalCycles. It has a nice percussive quality and definitely sounds glitchy and electronic, again in fitting with the digital theme of the piece.

As for playing these samples, I did consider manipulating them in TidalCycles and importing the whole recorded file into Ardour, but I also wanted to get better with Ardour so I sought a solution within that software. The glitch pack contains eight samples and I needed to be able to load them into Ardour to trigger/play at will. The drumkv1 plugin is the perfect solution to this.

It’s a sampler where you assign samples to midi notes. To play the notes you could use a midi keyboard, send the notes from Pure Data, or basically any software that can send midi. I decided to use the x42 step sequencer to input the midi notes. It’s a very simple step sequencer originally built for the MOD platform but, because it’s an lv2 plugin, it can run in any host that supports it.

Using this sequencer I could easily create an eight-step loop that starts simple and builds up with more drums over time.

With the samples assigned to midi notes I just needed a way to press the pads in the step sequencer. I have two physical controllers, a Launchpad X and an MPK Mini. The latter only has two rows of four drum pads. The former is an 8×8 grid, but I can’t yet program it properly to work with the software I use (more on that another time). In any case, when I looked into how to use the Launchpad X with x42, the plugin’s author, Robin Gareus, told me that it’d never be possible because x42 doesn’t accept midi input 🙁

I accepted that using a software or hardware midi controller was a no go. I would have to use a mouse, which wasn’t ideal but it would work. The plugin’s author did recommend that I look into BSequencer, which appears to accept midi input, but with a deadline looming I didn’t want to spend more time on this by learning yet another piece of software.

Using my mouse in Ardour I started to record the input of me playing the step sequencer, but I noticed the midi notes from x42 weren’t being recorded.

I found this very strange. drumkv1 was blinking to show it was receiving midi but nothing was being recorded. After some research I discovered that this was because Ardour records external midi, and when I loaded x42 as a plugin within Ardour it was sending its midi internally. To get around this there were two solutions.

The first was to use Carla as a plugin host to load x42 and then send its midi output to the correct track in Ardour.

Carla showing x42 being connected to Ardour

This worked but I was getting a lot of latency on the input and the notes didn’t align properly. This is probably easy to solve by tuning my system to reduce latency (I already use the realtime kernel), or maybe it was something I was doing wrong, but again, with a looming deadline I didn’t want to do anything drastic and time consuming.

The second option was to send the output of x42 out to another application and then have that external application send its midi back into Ardour. To do this I loaded a2jmidid, connected the track’s midi output into it, and then connected the output of a2jmidid back into the track in Ardour.

Screenshot showing ardour connecting to a2jmidid and back again
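(If you prefer doing that routing from a terminal rather than a patchbay GUI, the same connections can be made with the JACK command line tools. This is a rough sketch only: a2jmidid, jack_lsp and jack_connect are real commands, but the port names below are illustrative and will differ on your system, so check jack_lsp for the real ones.)

# start the ALSA<->JACK midi bridge (-e also exposes hardware ports)
a2jmidid -e &
# list the available midi ports to find the exact names on your system
jack_lsp | grep -i midi
# route the track's midi out through the bridged "Midi Through" port and back into Ardour
# (example port names only)
jack_connect "ardour:x42 Step Sequencer/midi_out 1" "a2j:Midi Through [14] (playback): Midi Through Port-0"
jack_connect "a2j:Midi Through [14] (capture): Midi Through Port-0" "ardour:MIDI 1/midi_in 1"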

When I started up x42 again in Ardour and started clicking on its pads it all worked as expected!

After all of that effort I recorded myself building up the percussion. Here’s the finished track 🙂

I’ve been having a lot of fun making music, so expect more of it from me in the future.