Diffusion Based Audio Generation


  • Ruben Nithyaganesh, Te Herenga Waka—Victoria University of Wellington


Keywords: Software Engineering, Diffusion Models, Audio Generation


Generating samples from complex data distributions has been a long-standing problem in the field of artificial intelligence, with generative models offering opportunities for rapid content creation, improved efficiency, and many other use cases. Diffusion models are one class of generative models that have seen great success in recent years. In this work, we leverage current state-of-the-art diffusion methods to generate musical audio. By estimating the gradient of an unknown target distribution, diffusion models have the capacity to generate new samples from complex data distributions. Recent work has improved diffusion methods, particularly their training and sampling procedures, raising sample quality while reducing sampling cost. This project applies contemporary diffusion techniques to musical audio generation and discusses the effectiveness of diffusion models in this setting. The work comprises converting a dataset of classical piano pieces into a set of spectrogram images, which are used within a diffusion-based setup to generate novel spectrogram images. The project converts generated spectrogram images back to raw audio, resulting in short audio sequences that resemble audio from our training set. Finally, the paper discusses opportunities for future work on diffusion methods, particularly in the domain of audio generation.
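The audio-to-spectrogram round trip described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the STFT parameters, the one-second sine stand-in for a piano clip, and the use of Griffin-Lim phase reconstruction for the inverse step are all assumptions, since the abstract does not specify how spectrograms were computed or inverted.

```python
import numpy as np
from scipy.signal import stft, istft

# Hypothetical parameters; the paper's actual spectrogram settings are not given.
sr, nperseg, noverlap = 22050, 1024, 768

# Stand-in for one second of a piano recording from the training set.
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Audio -> magnitude spectrogram: the "image" a diffusion model would train on.
f, _, Z = stft(y, fs=sr, nperseg=nperseg, noverlap=noverlap)
mag = np.abs(Z)

# Spectrogram -> audio: Griffin-Lim iteratively recovers a plausible phase,
# since a magnitude-only spectrogram image discards phase information.
phase = np.zeros_like(mag)
for _ in range(32):
    _, y_rec = istft(mag * np.exp(1j * phase), fs=sr,
                     nperseg=nperseg, noverlap=noverlap)
    # Keep a fixed length so spectrogram shapes stay aligned across iterations.
    y_rec = np.pad(y_rec[: len(y)], (0, max(0, len(y) - len(y_rec))))
    _, _, Z_rec = stft(y_rec, fs=sr, nperseg=nperseg, noverlap=noverlap)
    phase = np.angle(Z_rec)
```

In a diffusion setup, a model would be trained on arrays like `mag` (typically log-scaled) and sampled to produce novel spectrograms, which the inverse step above turns back into audio.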






How to Cite

Nithyaganesh, R. (2023). Diffusion Based Audio Generation. Wellington Faculty of Engineering Symposium. Retrieved from https://ojs.victoria.ac.nz/wfes/article/view/8414


