A New Model for Symbolic Music Generation Using Musical Metadata
Artificial intelligence (AI) is reshaping the music industry by enabling tools that automatically generate musical compositions and individual instrument tracks. However, most of these tools are designed for musicians, composers, and music producers, leaving them largely inaccessible to non-experts.
Researchers at LG AI Research have developed a novel interactive system that aims to democratize music creation, allowing any user to translate their ideas into music with little effort. The system, described in a paper posted to the arXiv preprint server, pairs a decoder-only autoregressive transformer, trained on large music datasets, with an intuitive user interface.
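In plain terms, "decoder-only autoregressive" means the model writes music one token at a time, with each prediction conditioned on everything generated so far. The Python sketch below illustrates that sampling loop in the abstract; the `model` callable, the token names, and the length limit are placeholders chosen for illustration, not details taken from the paper.

```python
import random

def sample_next_token(model, context):
    """Placeholder for a transformer forward pass: `model` is any
    callable mapping the token sequence so far to a
    {token: probability} dict over the vocabulary."""
    probs = model(context)
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

def generate(model, metadata_tokens, end_token="EOS", max_len=512):
    # The metadata tokens form the conditioning prefix; every new
    # token is predicted from the full sequence generated so far.
    sequence = list(metadata_tokens)
    while len(sequence) < max_len:
        token = sample_next_token(model, sequence)
        sequence.append(token)
        if token == end_token:
            break
    return sequence
```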
In their paper, Sangjun Han, Jiwon Ham, and colleagues introduce a symbolic music generation system that focuses on providing short musical motifs to serve as central themes. Their autoregressive model takes musical metadata as input and generates four bars of multitrack MIDI. It is trained on the Lakh MIDI and MetaMIDI datasets, which together comprise over 400,000 MIDI files encoding track details such as the notes played, their durations, and their velocities.
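For readers unfamiliar with the format, the note-level details a MIDI file encodes are easy to inspect programmatically. The sketch below uses the third-party pretty_midi library (an illustrative choice; the paper does not specify the authors' parsing tools, and the file path is hypothetical) to print each note's pitch, duration, and velocity.

```python
import pretty_midi

# Load a MIDI file and walk its tracks (path is hypothetical).
midi = pretty_midi.PrettyMIDI("example.mid")
for instrument in midi.instruments:
    name = pretty_midi.program_to_name(instrument.program)
    for note in instrument.notes[:4]:  # first few notes per track
        print(
            f"{name}: pitch={note.pitch}, "
            f"duration={note.end - note.start:.2f}s, "
            f"velocity={note.velocity}"
        )
```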
To prepare the data for training, the researchers converted each MIDI file into REMI (revamped MIDI-derived events), a representation that encodes MIDI data as a sequence of tokens describing musical features such as pitch and velocity, a format well suited to training generative music models. "During training, we randomly drop tokens from the musical metadata to guarantee flexible control," they explain. As a result, users can choose which input types to specify while the model maintains its generative performance.
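The token-dropping idea can be made concrete with a small, self-contained sketch. The token names, metadata fields, and drop probability below are assumptions chosen to mirror the general REMI style of bar/position/pitch/duration/velocity events preceded by a metadata prefix; the paper's actual vocabulary and dropout scheme may differ.

```python
import random

# Hypothetical metadata prefix tokens (instrument, tempo, etc.);
# the paper's actual vocabulary may differ.
metadata = ["Genre=Pop", "Tempo=120", "Inst=Piano", "Inst=Bass"]

# REMI-style event tokens for one bar: a bar marker, a position
# within the bar, then pitch / duration / velocity per note.
events = [
    "Bar", "Pos=1/16", "Pitch=60", "Dur=1/4", "Vel=80",
    "Pos=5/16", "Pitch=64", "Dur=1/4", "Vel=72",
]

def make_training_sequence(metadata, events, drop_prob=0.3):
    """Randomly drop metadata tokens so the model learns to generate
    with any subset of controls present (drop_prob is an assumed
    value, not taken from the paper)."""
    kept = [tok for tok in metadata if random.random() > drop_prob]
    return kept + events

print(make_training_sequence(metadata, events))
```

Because some training sequences see only a subset of the metadata, the model learns not to depend on any single control being present, which is what lets users leave fields unspecified at generation time.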
Beyond the transformer-based model itself, the team developed a user-friendly interface suited to both experts and non-experts. It features a sidebar and a central interactive panel: users specify musical elements such as instruments and tempo in the sidebar, then edit the generated track in the central panel, for example by adding or removing instruments or adjusting when each one enters.
"We validate the strategy's effectiveness through experiments assessing model capacity, musical fidelity, diversity, and controllability," they report. A subjective test comparing it with other models confirmed its superiority in control and music quality. Their model reliably generates up to 4 bars of music based on user specifications.
Future enhancements may include extending the length of the tracks the model can create, broadening the set of user-specifiable controls, and further refining the user interface. Although the current model, trained to generate four-bar sequences under global control, is limited in extending music length and in controlling bar-level elements, it already produces high-quality musical themes well suited to looping.