Category: Works

  • Black Latents | Latent Diffusion

    Black Latents | Latent Diffusion is a Gradio application that lets you generate audio items with RAVE-Latent Diffusion models and Black Latents, a RAVE V2 VAE trained on the Black Plastics series.

    A demo version is accessible on Hugging Face. The full application can be retrieved from GitHub for local inference.


    Latent Diffusion with RAVE

    The RAVE architecture makes timbre transfer on audio input possible, but you can also generate audio by using its decoder layer as a neural audio synthesizer, e.g. in Latent Jamming.

    Another approach to generating new audio with RAVE has been provided by Moisés Horta Valenzuela (aka 𝔥𝔢𝔵𝔬𝔯𝔠𝔦𝔰𝔪𝔬𝔰) with his RAVE-Latent Diffusion model.

    Latent diffusion models in general are quite efficient since they operate on highly compressed representations of the original data. The key idea of RAVE-Latent Diffusion is to replicate the structural coherence of audio by encoding (longer) audio sequences into their latent representations using a RAVE encoder and then training a denoising diffusion model on these embeddings. The trained model can unconditionally generate new, similar sequences of the same length, which can be decoded back into the audio domain using the RAVE model’s decoder.
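
    As a rough illustration of the denoising diffusion idea, the sketch below shows only the forward (noising) step on a toy latent sequence, using the standard DDPM formulation with a linear noise schedule. It is not taken from RAVE-Latent Diffusion: the schedule values, the function names and the toy data are all assumptions made for this example.

```python
import math
import random

NUM_STEPS = 1000                     # assumed number of diffusion steps
BETA_START, BETA_END = 1e-4, 0.02    # assumed linear noise schedule


def alpha_bar(t):
    """Cumulative signal fraction after t noising steps (0 <= t <= NUM_STEPS)."""
    product = 1.0
    for step in range(t):
        beta = BETA_START + (BETA_END - BETA_START) * step / (NUM_STEPS - 1)
        product *= 1.0 - beta
    return product


def add_noise(latents, t, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    a = alpha_bar(t)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [[signal * value + noise * rng.gauss(0.0, 1.0) for value in frame]
            for frame in latents]


# Toy "latent sequence": 4 frames with 2 latent dimensions each.
# In practice these values would come from the RAVE encoder.
x0 = [[0.5, -1.0], [0.2, 0.3], [-0.7, 0.9], [1.1, 0.0]]
rng = random.Random(0)

slightly_noisy = add_noise(x0, 10, rng)             # still close to x0
almost_pure_noise = add_noise(x0, NUM_STEPS, rng)   # nearly Gaussian noise
```

    A denoising model is trained to predict and remove this noise. Generation then runs the process in reverse, starting from pure noise and ending with a latent sequence that the RAVE decoder can turn into audio.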

    The original package by 𝔥𝔢𝔵𝔬𝔯𝔠𝔦𝔰𝔪𝔬𝔰 supports latent embedding window sizes down to 2048, which translates to about 95 seconds of audio at 44.1 kHz, suitable for compositional-level information.

    In my fork, RAVE-Latent Diffusion (Flex’ed), I extended the code to support a minimum window size of 256, which equals about 12 seconds at 44.1 kHz, and implemented a few other improvements and additional training options.
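
    The durations quoted above can be checked with a short calculation. This sketch assumes that each latent frame covers 2048 audio samples. The text does not state that compression ratio, but it is consistent with both figures.

```python
SAMPLE_RATE = 44_100        # Hz, as stated in the text
SAMPLES_PER_LATENT = 2048   # assumption: audio samples per latent frame


def window_seconds(latent_length,
                   sample_rate=SAMPLE_RATE,
                   samples_per_latent=SAMPLES_PER_LATENT):
    """Duration in seconds of the audio covered by `latent_length` latent frames."""
    return latent_length * samples_per_latent / sample_rate


for length in (2048, 256):
    print(f"{length} latent frames -> {window_seconds(length):.1f} s")
# 2048 latent frames -> 95.1 s
# 256 latent frames -> 11.9 s
```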

    Black Latents: turning Black Plastics into a RAVE model

    The motivation to train Black Latents was to extract dominant characteristics from my Black Plastics series, a compilation of 7 EPs with a total of 28 tracks in the genres Experimental Techno, Breakbeats and Drum & Bass, which I released between 2012 and 2020.

    I trained the model using the RAVE V2 architecture with a higher capacity of 128 and submitted it to the RAVE model challenge 2025 hosted by IRCAM, where it was publicly voted into first place. The model is available on the Forum IRCAM website.

    Using Black Latents | Latent Diffusion to spawn audio

    For Black Latents | Latent Diffusion, I trained diffusion models in 7 different configurations and context window lengths, once again using the audio material from the Black Plastics series as the base dataset, together with the Black Latents VAE.

    The application itself is a simple Gradio interface to the generate script of RAVE-Latent Diffusion (Flex’ed). In the UI, you can choose from the different diffusion models, define seeds and set additional parameters like temperature or latent normalization before generating audio items through the Black Latents model decoder.

    Depending on the diffusion model and parameter selection, the output ranges from stumbling rhythmic micro-structures to items that resemble the macro-scale structure of the training data.

    Other examples

    I published earlier experiments with RAVE-Latent Diffusion and a different set of RAVE models in the form of two albums:

    MARTSMÆN – RLDG_0da02c80cb [datamarts/2KOMMA4]: Bandcamp, Nina

    MARTSM^N – RLDG_835770db1c [datamarts/2KOMMA3]: Bandcamp, Nina

  • Reykjavík Sunburn

    In Reykjavík Sunburn, four different neural audio models are employed in an improvisational setting inside Pure Data, a visual audio programming environment. The models were trained on my own musical material (a corpus of conventionally written and produced electronic music) and on a private voice dataset. Within this setting I perform Latent Jamming, a real-time improvisation practice with neural audio models that embraces concepts of algorithmic and generative composition. I act in real time inside the models’ latent space, steering mood, density, and rhythmicality by exploring parameter settings in signal streams that resemble latent embeddings. By doing so, I aim to replace deterministic composition with guided exploration: tweak, listen, stabilize, vary.

    Reykjavík Sunburn concludes that neural audio synthesis can extend creative practices in music performance and composition by leveraging the unpredictable behaviour of the models from a control interface that caters for creative intent.

    Reykjavík Sunburn uses two RAVE models and two vschaos2 models:

    • Black Latents: a RAVE V2 model trained on the Black Plastics series – 28 tracks / 3 h of drum- and percussion-heavy electronic music. The resulting model generates mainly percussive output with rough textures and a generally high grittiness. In the framework, this model is used as a leading asset to generate the rhythmic baseline and general percussive structure.
    • Nobsparse: a RAVE V2 model trained on a hybrid dataset of Tech House and sonically sparse Drum & Bass (about 4 h of audio material). The model’s characteristics are relatively clear, sterile, and lightweight sounds, harmonic textures, and an isolated but dominant low end. Depending on how the improvisation develops, this model serves as a secondary texture generator but can also take over Black Latents’ role in the composition.
    • VSC2_Nobsparse: this vschaos2 model has been trained on the same dataset as the Nobsparse RAVE model. In the composition, this model is used to generate interchanging pads and drone-like noise textures for transitions or simply to enrich an ongoing section of the recording with a harmonic layer.
    • VSC2_Martha2023: being the only model trained on voice data, courtesy of my daughter, this model adds a layer of rhythmical, pseudo-vocal sound on top of the otherwise “instrumental” generations of the three other models.

    Output examples

    Reykjavík Sunburn (Take 1 Redux) received recognition at the AI Song Contest 2025, where it was selected for the finalist shortlist of 10 out of more than 150 submissions.

  • Latent Russando

    Latent Russando is a semi-generative compositional framework written in Pure Data dedicated to exploring musical qualities in working with generative neural nets for audio, conceived both as hybrid instruments and as autonomous actors.

    Practices from generative music and algorithmic composition are used as mediators between human performer and the generative abilities of the neural nets, displacing and circumventing concepts of authorship and genius by empowering multiple independent agents in an improvisation-driven, co-creative process.

    The work is based on Russando. Serenade for six German Sirens, op. 43 by Hallgrímur Vilhjálmsson, a heteronym of conceptual artist Georg Joachim Schmitt. The original piece was composed in 2008 and premiered in the context of the (also fictional) art exhibition cologne contemporary — international art biennale 08 at Asbach-Uralt Werke in Rüdesheim. It is a three-part composition of approx. 33 minutes in length, in which six German emergency and police sirens are alternately sounded together or alone. In consultation with the creator, I trained models based on two neural net architectures (RAVE, vschaos2, both courtesy of IRCAM, Paris) on the original piece.

    Public performances

    An exemplary instantiation, Nebuloso, a 7.0 output recording of the Latent Russando framework, was shortlisted for Soundcinema 2025, a recorded sound festival at FFT in Düsseldorf, where the framework was premiered in October 2025.

    In March 2026, Etereo, another 7.0 composition based on the framework, was publicly presented at Music for Cinemas, an event series dedicated to experimental music at Filmrauschpalast Moabit, Berlin.

    Fluidante, a quadraphonic recording from the framework, is presented at the International Computer Music Conference (ICMC) 2026 in Hamburg.

  • Saatgut Proxy

    Saatgut Proxy is an experimental generative setup in Pure Data that creates both randomized and repeatable pathways through the latent space of two neural audio model architectures (RAVE, vschaos2) at the same time.
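
    The idea of pathways that are both randomized and repeatable can be illustrated with a seeded random walk. The sketch below is written in Python rather than Pure Data and is not taken from the Saatgut Proxy patch. The number of latent dimensions, the step size and the function name are hypothetical.

```python
import random


def latent_path(seed, num_steps=8, num_dims=4, step_size=0.25):
    """Return a random walk through a num_dims-dimensional latent space.

    The same seed always yields the same path; a new seed yields a new one.
    """
    rng = random.Random(seed)
    position = [0.0] * num_dims
    path = [list(position)]
    for _ in range(num_steps):
        position = [value + rng.gauss(0.0, step_size) for value in position]
        path.append(list(position))
    return path


first = latent_path(seed=42)
again = latent_path(seed=42)   # identical to `first`
other = latent_path(seed=43)   # a different pathway
```

    In a setup with two models, the same path (or two paths derived from one seed) could be sent to both decoders so that a run can be reproduced later.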

    The framework is based on both generalized abstractions that I have developed for the Latent Jamming use case and additional prototypes of techniques that I later turned into dedicated abstractions.

    Saatgut Proxy was presented in a lecture performance at the ArtSearch symposium at ligeti zentrum in Hamburg, and alongside a presentation on Latent Jamming and shared human/AI agency in electronic music creation at the Storytellers+Machines 2024 conference at SODA (Manchester).

    Output examples

    The framework led to the following release artifacts:

    MARTSM=N – VARIA 3L [datamarts/2KOMMA1]: Nina

    MARTSM))N – Saatgut Proxy Reflux [datamarts/2KOMMA0]: Nina

    MARTSM))N – Saatgut Proxy [n/a]: Bandcamp

  • Fibonacci Jungle

    While individual generative composition techniques have already become an established part of the creative process in music writing, holistic approaches to generative music production in traditional electronic dance music genres still seem under-represented in both theory and practice.

    Fibonacci Jungle is a proof of concept (POC) for a simple-to-use generative framework for Jungle and Drum & Bass, built on the Fibonacci number sequence as a structural alternative to conventional meters and track build-up.

    The framework is implemented in Pure Data. It uses probability and randomization within a predefined set of genre-typical parameter settings (tempo, harmonics, sample selection). Fibonacci Jungle allows you to create standalone tracks in a Jungle and Drum & Bass aesthetic with only a few clicks, and it can be individually customized.
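
    One possible reading of this approach is sketched below in Python: section lengths (in bars) follow the Fibonacci sequence, while tempo, harmonic root and sample choice are drawn at random from small predefined sets. This is an illustration only, not the Pure Data implementation. The BPM range, root notes and sample names are hypothetical placeholders.

```python
import random


def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    numbers = []
    a, b = 1, 1
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers


def sketch_track(seed, num_sections=7):
    """Build a toy arrangement whose section lengths follow the Fibonacci sequence."""
    rng = random.Random(seed)
    tempo = rng.randint(160, 175)                   # hypothetical BPM range
    root_note = rng.choice(["C", "D", "F", "G"])    # hypothetical harmonic choices
    sections = [
        # Sample names are placeholders for a genre-typical sample pool.
        {"bars": bars, "break": rng.choice(["amen", "think", "apache"])}
        for bars in fibonacci(num_sections)
    ]
    return {"tempo": tempo, "root": root_note, "sections": sections}


track = sketch_track(seed=1)
print([section["bars"] for section in track["sections"]])
# -> [1, 1, 2, 3, 5, 8, 13]
```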

    For a detailed description of the concept and implementation, see this paper and the presentation video below from the Generative Music Prize 2024, hosted by IRCAM, where Fibonacci Jungle was awarded 2nd place.

    The source code for Fibonacci Jungle is publicly available on GitHub.

    Output examples

    Fibonacci Jungle Versions – an EP of recordings based on the Fibonacci Jungle framework. Each track/version has been recorded multiple times and individually distributed through different channels (Bandcamp, Nina, Spotify).

  • Spoor

    Early prototypes and setups in latent embedding mimicry and in establishing a control-level baseline in latent space have led to Spoor, which is both the name of a loosely coupled set of Latent Jamming techniques and the title of two releases:

    MARTSM/\N – Spoor Widen [datamarts/1KOMMA9]: Nina

    MARTSM/\N – Spoor [n/a]: Bandcamp

    The video below shows the setup that led to the tracks Loom and Loom Rewood.

    The track Architects was based on the following patch: