Poster Craze

10:40-11:00 on Wednesday, 4th September

P350 Lecture Theatre, Parkside

NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization
Patricio López-Serrano, Christian Dittmar, Yigitcan Oezer and Meinard Mueller

Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines, we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants; in particular, this paper gives an overview of two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.
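
As a rough illustration of the basic factorization that these variants build on, the sketch below implements plain NMF with the standard multiplicative updates for the Euclidean cost. It is not the toolbox's API: the function name `nmf`, its parameters, and the toy data are assumptions made for this example, and neither NMFD nor diagonal NMF is shown.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (K x M) as W (K x rank) @ H (rank x M).

    Hypothetical helper for illustration only: plain NMF with
    multiplicative updates for the Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random positive initialization; the abstract notes that better
    # initialization can matter a lot in practice.
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "magnitude spectrogram": 6 frequency bins, 8 frames, exactly rank 2.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a music setting, the columns of `W` would play the role of spectral templates and the rows of `H` their activations over time.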

An FPGA-Based Accelerator for Sound Field Rendering
Yiyu Tan and Toshiyuki Imamura

Finite difference time domain (FDTD) schemes are widely applied to analyse sound propagation, but are computationally intensive and memory-intensive. Current sound field rendering systems with FDTD schemes are mainly based on software simulations on personal computers (PCs) or general-purpose graphics processing units (GPGPUs). In this research, an accelerator for sound field rendering is designed and implemented using a field-programmable gate array (FPGA). Unlike software simulations on PCs and GPGPUs, the FPGA-based sound field rendering system directly implements wave equations in reconfigurable hardware. Furthermore, a sliding window-based data buffering system is adopted to alleviate external memory bandwidth bottlenecks. Compared to a software simulation carried out on a PC with 128 GB of DDR4 RAM and an Intel i7-7820X processor running at 3.6 GHz, the proposed FPGA-based accelerator takes half the rendering time and doubles the computation throughput, even though the FPGA system runs at a clock frequency of only about 267 MHz.
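
To make the computational load concrete, here is a minimal software sketch of a standard second-order FDTD update for the 2-D wave equation. It is not the authors' FPGA design: the grid size, the Courant number, the zero-pressure border, and the names `fdtd_step` and `simulate` are all assumptions chosen for illustration.

```python
import numpy as np

def fdtd_step(p_prev, p_curr, courant2):
    """Advance a 2-D pressure field by one time step (leapfrog scheme).

    courant2 is the squared Courant number (c * dt / dx) ** 2; in 2-D the
    scheme is stable when c * dt / dx <= 1 / sqrt(2).
    """
    p_next = np.zeros_like(p_curr)  # border cells are held at zero pressure
    centre = p_curr[1:-1, 1:-1]
    # Five-point discrete Laplacian on the interior of the grid.
    laplacian = (p_curr[2:, 1:-1] + p_curr[:-2, 1:-1]
                 + p_curr[1:-1, 2:] + p_curr[1:-1, :-2] - 4.0 * centre)
    p_next[1:-1, 1:-1] = 2.0 * centre - p_prev[1:-1, 1:-1] + courant2 * laplacian
    return p_next

def simulate(size=41, n_steps=20, courant=0.5):
    """Propagate a unit impulse placed at the centre of a square grid."""
    p_prev = np.zeros((size, size))
    p_curr = np.zeros((size, size))
    p_curr[size // 2, size // 2] = 1.0
    for _ in range(n_steps):
        p_prev, p_curr = p_curr, fdtd_step(p_prev, p_curr, courant ** 2)
    return p_curr

field = simulate()
print("peak absolute pressure after 20 steps:", np.abs(field).max())
```

Every grid cell reads its four neighbours and two time levels at each step, which is why memory bandwidth, rather than arithmetic alone, tends to limit such simulations.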

Synthetic Transaural Audio Rendering (STAR): a Perceptive Approach for Sound Spatialization
Eric Méaux and Sylvain Marchand

The principles of Synthetic Transaural Audio Rendering (STAR) were first introduced at DAFx-06. This is a perceptive approach to sound spatialization, whereas state-of-the-art methods are rather physical. With our STAR method, we focus neither on the wave field (as in HOA) nor on the sound wave (as in VBAP), but rather on the acoustic paths traveled by the sound to the listener's ears. The STAR method consists of canceling the cross-talk signals between two loudspeakers and the ears of the listener (in a transaural way), with acoustic paths that are not measured but computed by a model (thus synthetic). Our model is based on perceptive cues used by the human auditory system for sound localization. The aim is to give the listener the sensation of the position of each source, not to reconstruct the corresponding acoustic wave or field. This should work with various loudspeaker configurations, with a large sweet spot, since the model is neither specialized for a specific configuration nor individualized for a specific listener. Experimental tests were conducted in 2015 and 2019 with different rooms and audiences, for still, moving, and polyphonic musical sounds. It turns out that the proposed method is competitive with state-of-the-art methods. However, this is a work in progress and further work is needed to improve the quality.

Data Augmentation for Instrument Classification Robust to Audio Effects
António Ramires and Xavier Serra

Reusing recorded sounds (sampling) is a key component of Electronic Music Production (EMP); it has been present since EMP's early days and is at the core of genres like hip-hop or jungle. Commercial and non-commercial services allow users to obtain collections of sounds (sample packs) to reuse in their compositions. Automatic classification of one-shot instrumental sounds makes it possible to categorise the sounds in these collections, enabling easier navigation and better characterisation. Automatic instrument classification has mostly targeted unprocessed isolated instrumental sounds or the detection of predominant instruments in mixed music tracks. For this classification to be useful in audio databases for EMP, it has to be robust to the audio effects applied to unprocessed sounds. In this paper we evaluate how a state-of-the-art model trained with a large dataset of one-shot instrumental sounds performs when classifying instruments processed with audio effects. In order to evaluate the robustness of the model, we use data augmentation with audio effects and evaluate how each effect influences the classification accuracy.
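
The idea of augmenting a one-shot sound with audio effects can be sketched as follows. The two effects used here (a tanh distortion and a tremolo), their parameters, and the function names are illustrative assumptions; they are not the effect set or tooling used in the paper.

```python
import numpy as np

def distortion(x, gain=5.0):
    """Soft-clipping distortion; output stays within (-1, 1)."""
    return np.tanh(gain * x)

def tremolo(x, sr, rate_hz=5.0, depth=0.5):
    """Amplitude modulation with a sinusoidal low-frequency oscillator."""
    t = np.arange(len(x)) / sr
    # The gain oscillates between 1 - depth and 1.
    lfo = 1.0 - depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * rate_hz * t))
    return x * lfo

def augment(x, sr):
    """Return the dry signal plus one processed copy per effect."""
    return {
        "dry": x,
        "distortion": distortion(x),
        "tremolo": tremolo(x, sr),
    }

# Toy one-shot: 0.1 s of a 220 Hz sine at 8 kHz (hypothetical test signal).
sr = 8000
t = np.arange(int(0.1 * sr)) / sr
x = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)

variants = augment(x, sr)
for name, y in variants.items():
    print(name, "peak:", round(float(np.abs(y).max()), 3))
```

Each processed copy keeps the label of the original sound, so a classifier can be trained or evaluated per effect to see how much each one affects accuracy.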

Time Scale Modification of Audio Using Non-Negative Matrix Factorization
Gerard Roma, Owen Green and Pierre Alexandre Tremblay

This paper introduces an algorithm for time-scale modification of audio signals based on non-negative matrix factorization. The activation signals attributed to the detected components are used for identifying sound events. The segmentation of these events is used for detecting and preserving transients. In addition, the algorithm introduces the possibility of preserving the envelopes of overlapping sound events while globally modifying the duration of an audio clip.

Visualaudio-Design – Towards a Graphical Sounddesign
Lars Engeln and Rainer Groh

VisualAudio-Design (VAD) is a spectral-node-based approach to visually designing audio collages and sounds. The spectrogram, as a visualization of the frequency domain, can be intuitively manipulated with tools known from image processing. This yields a more comprehensible approach to sound design, addressing the abstract interfaces common to DSP algorithms, which still rely on direct value inputs, sliders, or knobs. In addition to interaction in the time domain of audio and conventional analysis and restoration tasks, there are many new possibilities for spectral manipulation of audio material. Here, affine transformations and two-dimensional convolution filters are proposed.
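
Treating a magnitude spectrogram as an image, a two-dimensional convolution filter can be sketched as below. This is a generic image-processing example, not VAD's implementation: the helper `convolve2d_same`, the zero padding, the box-blur kernel, and the toy spectrogram are assumptions for illustration.

```python
import numpy as np

def convolve2d_same(S, kernel):
    """2-D convolution with zero padding; output has the same shape as S.

    Assumes an odd-sized kernel so that it has a well-defined centre.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(S, ((ph, ph), (pw, pw)), mode="constant")
    flipped = kernel[::-1, ::-1]  # flip so this is convolution, not correlation
    out = np.zeros(S.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + S.shape[0], j:j + S.shape[1]]
    return out

# Toy magnitude spectrogram: 5 frequency bins x 7 frames, one active bin.
S = np.zeros((5, 7))
S[2, 3] = 9.0

# A 3 x 3 box blur smears the energy across neighbouring bins and frames.
box_blur = np.full((3, 3), 1.0 / 9.0)
blurred = convolve2d_same(S, box_blur)
print(blurred)
```

Blurring along the time axis smooths onsets, while blurring along the frequency axis widens partials; other kernels known from image processing (sharpening, edge detection) can be applied in the same way.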

A general-purpose deep learning approach to model time-varying audio effects
Marco A. Martínez Ramírez, Emmanouil Benetos and Joshua D. Reiss

Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling this type of effect unit are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of deep neural networks to learn such long temporal dependencies, and we show the network modeling various linear and nonlinear, time-varying and time-invariant audio effects. In order to measure the performance of the model, we propose an objective metric based on the psychoacoustics of modulation frequency perception. We also analyze what the model is actually learning and how the given task is accomplished.
