
saccades

Published on Aug 29, 2024

https://aimc2024.pubpub.org/pub/vftqi55v/draft?access=d11ef909

Project Description

saccades is presented in four movements. In the first movement, machine listening algorithms send streams of audio descriptors to a bespoke generative video system designed in openFrameworks. Audio onsets stochastically trigger video synthesis modules and video processing effects. The varying strategies for visualizing the audio analyses lead to a tight relationship between the aural and visual media. The sheer density of information in this movement suggests the contemporary experience of navigating the high-entropy world in which we live.
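
The sketch below illustrates this onset-triggering idea in Python; it is not the author's openFrameworks code. It assumes librosa for onset detection, and the module names and file name are hypothetical placeholders for the actual video synthesis modules and tape part.

```python
# A minimal sketch: detect audio onsets, then stochastically pick a video
# synthesis module or processing effect to trigger at each onset.
import random
import librosa

y, sr = librosa.load("tape_part.wav")  # hypothetical file name
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")

MODULES = ["wave_synth", "feedback_blur", "pixel_sort", "edge_glow"]  # placeholders

for t in onset_times:
    module = random.choice(MODULES)           # stochastic trigger
    print(f"{t:0.3f}s -> trigger {module}")   # stand-in for a message to the video system
```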

In the second movement, the inverse approach is taken: rather than visualizing audio descriptors, video analyses are sonified. A multi-layer perceptron neural network (built with the FluCoMa Toolkit) is trained on a data set curated by the author to predict synthesis parameters from the position of an eyeball on screen (the x and y pupil location as well as the y positions of the top and bottom eyelids). The neural network enables a strong relationship between the motion of the eyeball and the sonic results of the synthesis algorithm.
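
A minimal sketch of this mapping, assuming scikit-learn's MLP in place of the FluCoMa one: four eye features in, a vector of synthesis parameters out. The random arrays and the parameter count are stand-ins for the author's curated data set.

```python
# Map four eye features (pupil x/y, top/bottom eyelid y) to synthesis parameters.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(200, 4)   # [pupil_x, pupil_y, top_lid_y, bottom_lid_y] per frame
Y = np.random.rand(200, 10)  # 10 synthesis parameters per frame (hypothetical size)

mlp = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000).fit(X, Y)

eye_state = np.array([[0.4, 0.6, 0.2, 0.8]])  # one incoming frame of eye features
params = mlp.predict(eye_state)               # drive the synthesizer with these values
```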

While an automatic motion tracking algorithm could have been used to locate the eyeball features, I found these systems lacking in speed and accuracy. Instead, I created a system that allowed me to click my mouse on every image in the video sequence to “human tag” my own data, creating a much more accurate data set. This experience reflects the many problems and strategies surrounding data quality and data set creation.
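
A minimal sketch of such a hand-tagging workflow, assuming OpenCV: show each frame, record a mouse click as the tagged coordinate, then advance. The file name and the single-point-per-frame simplification are assumptions (the actual work tags several eye features per image).

```python
import cv2

clicks = []

def on_click(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        clicks.append((x, y))  # record one tagged point for the current frame

cap = cv2.VideoCapture("eyeball_sequence.mp4")  # hypothetical file
cv2.namedWindow("tag")
cv2.setMouseCallback("tag", on_click)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    n = len(clicks)
    cv2.imshow("tag", frame)
    while len(clicks) == n:        # wait until this frame has been clicked
        if cv2.waitKey(20) == 27:  # Esc aborts the session
            raise SystemExit

cap.release()
cv2.destroyAllWindows()
```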

A computer vision algorithm finds salient points in the images of the eyeball, which are turned into a graph of vertices and edges, suggesting a common visual representation of the search for relationships within data sets.
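
A minimal sketch of one way to build such a graph, assuming OpenCV corner detection for the salient points and a Delaunay triangulation for the edges; the author's exact choice of detector and edge rule is not specified.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

img = cv2.imread("eyeball_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
pts = cv2.goodFeaturesToTrack(img, maxCorners=60, qualityLevel=0.01, minDistance=10)
pts = pts.reshape(-1, 2)  # (N, 2) array of salient-point vertices

tri = Delaunay(pts)
edges = set()
for a, b, c in tri.simplices:  # each triangle contributes three edges
    edges.update({tuple(sorted(e)) for e in [(a, b), (b, c), (a, c)]})

print(f"{len(pts)} vertices, {len(edges)} edges")
```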

The flashing blue bars that enter towards the end of the third movement visualize the real-time presence of the “features” learned by applying non-negative matrix factorization (NMF) to the matrix of STFT magnitudes of the tape part. NMF sometimes separates sound objects in a way similar to human perception, but often does not. This sometimes-but-often-not correlation between the brightness of these rectangles and the presence and motion of sound objects in the tape mirrors (and explicitly demonstrates) the problems of human-AI alignment.
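
A minimal sketch of this factorization, assuming librosa and scikit-learn: the STFT magnitude matrix is factored into spectral templates W and activations H, and each row of H is the time-varying presence of one learned feature, the quantity the flashing bars visualize. The component count and file name are assumptions.

```python
import librosa
import numpy as np
from sklearn.decomposition import NMF

y, sr = librosa.load("tape_part.wav")  # hypothetical file
S = np.abs(librosa.stft(y))            # STFT magnitude matrix (freq x time)

model = NMF(n_components=8, init="nndsvd", max_iter=400)  # 8 features assumed
W = model.fit_transform(S)  # spectral templates, one column per feature
H = model.components_       # activations: brightness of each bar over time
```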

The fourth movement shows two latent space walk videos generated from generative adversarial networks (GANs) trained on the sequence of eyeball images. I find that letting the images randomly move through the latent space often creates many images that are not very interesting and/or are not perceived as an eyeball. In order to maintain the strong perception of the eye and keep the latent space walk engaging, I used a human-in-the-loop approach to generate these videos. I first generated 500 random images from the latent space and viewed each, keeping only the ones (a little over 100) that I decided would look good in the latent space walk. I then shuffled the kept images and computed interpolations between each adjacent image’s 512-dimensional latent space representation to create the latent space morphing seen in the video.
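
A minimal sketch of the interpolation step, assuming the curated latent vectors are stacked in a (K, 512) array and that the interpolation is linear; `generator` stands in for the trained GAN's latent-to-image function.

```python
import numpy as np

kept = np.random.randn(100, 512)  # stand-in for the ~100 hand-picked latents
rng = np.random.default_rng()
rng.shuffle(kept)                 # shuffle the kept images

frames_per_segment = 30
for z0, z1 in zip(kept[:-1], kept[1:]):
    for t in np.linspace(0.0, 1.0, frames_per_segment):
        z = (1.0 - t) * z0 + t * z1  # linear interpolation in latent space
        # frame = generator(z)       # render one video frame (GAN not shown)
```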

The saxophone multiphonics played throughout saccades are chosen algorithmically using a nearest-neighbor lookup approach. By analyzing the data set of multiphonic recordings from The Techniques of Saxophone Playing by Marcus Weiss and Giorgio Netti, I found the average chroma representation of each multiphonic. When I wanted the saxophonist to play a multiphonic, I performed a chroma analysis of that section of the piece and then found which multiphonic was nearest to the chroma analysis of the tape part. This resulted in a strong correlation between the pitch class content of the saxophone multiphonic and the accompanying tape part.
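
A minimal sketch of that lookup, assuming librosa chroma features and Euclidean distance: compare the tape section's average chroma against the average chroma of each multiphonic recording and pick the closest one. File names are placeholders.

```python
import numpy as np
import librosa

def mean_chroma(path):
    y, sr = librosa.load(path)
    return librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)  # 12-dim vector

multiphonics = [f"multiphonic_{i:03d}.wav" for i in range(5)]  # placeholder set
bank = np.stack([mean_chroma(p) for p in multiphonics])

target = mean_chroma("tape_section.wav")  # the tape section to accompany
nearest = int(np.argmin(np.linalg.norm(bank - target, axis=1)))
print(f"play {multiphonics[nearest]}")
```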

Type of Submission

Performance 1 or 2. If the conference can hire a saxophonist, that would be excellent. If not, I’ll endeavor to find one! Additionally, I sometimes screen this work as fixed media only (when I do, I title it in the singular: saccade) and would be happy to present it this way as well.

Technical / Stage Requirements

These specifications are adaptable to different performance spaces, so please be in touch with what is or isn't possible and we'll find a solution!


1. Projector (as large as possible) thrown onto a wall (preferably a blank white wall, with the performer positioned in front of the wall so that some of the projection falls onto them).

2. Stereo audio playback (more channels are possible and welcome; let me know ahead of time how many and the configuration).

3. If possible/necessary, a microphone for amplification and blending of the acoustic performer with the electronic sound.

4. The performer uses a click track to keep in time with the tape part, so a way of getting the click track to the performer’s position on stage is needed. (The author can bring headphones and some headphone extenders if that’s useful.)

Program Notes

To create saccades, personally created data sets of video and audio analyses (made with FluCoMa, computer vision algorithms, and CNNs) are mined for similarities and patterns via MLP neural networks, dimensionality reduction algorithms such as PCA and UMAP, and feature learning using NMF. The relationships these algorithms reveal are used to create tight perceptual connections between the audio and visual media in the work. A plethora of variations on visual themes is created by the combinatorics of stochastically triggered visual synthesis modules and processing effects, all of which visualize audio descriptor analyses of the tape part.
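
A minimal sketch of the dimensionality-reduction step mentioned above, using scikit-learn's PCA (UMAP, via the umap-learn package, would slot in the same way). The feature matrix is a stand-in for the author's audio and video analyses.

```python
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(1000, 40)  # 1000 analysis frames x 40 descriptors
embedding = PCA(n_components=2).fit_transform(features)
# `embedding` places similar frames near each other, exposing the patterns
# used to couple the audio and visual media.
```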

A “saccade” is a rapid movement of the eyeball between two fixation points. During this brief motion, the brain hides the blur from our perception. Once a saccade has begun, its destination cannot change, meaning that if the target of focus disappears, the viewer won’t know until the saccade completes.

This phenomenon is imitated by the sound and video presented in the piece. It also serves as a metaphor for the density of information and the high-entropy experiences we’re constantly trying to cope with: a scroll through social media, smartphone alerts, big data, technological advancements, the abundance of choices in the grocery aisle.

Media

Image:

Video Documentation:

Score:
