
Reviving Traditional Raga Music: An AI-Driven Approach for Melody Generation Using GPT-4

Published on Aug 29, 2024

Abstract

Traditional music, deeply intertwined with our culture, is witnessing a global decline, underscoring the urgent need to revive and sustain its rich heritage. Ragas, being unique forms of Indian music, evoke distinct emotions through specific note arrangements. In this study, we develop a melody track generation method using OpenAI ChatGPT to enable future Raga music composition assistance. We created a dataset of 20 singing-voice recordings in Jog Raga, which were extracted and pre-processed to form a Musical Instrument Digital Interface (MIDI) dataset. From this, we developed a mathematical representation of the MIDI notes for GPT-4 inputs by extracting pitch, volume, start time, and length from each MIDI file. Through prompt engineering and a set of instructions, we guided the model to respond in a pre-determined manner with the raga characteristics and constraints. For evaluation, we tested the generated songs for note adherence and found that the GPT-4 model produced songs with an average of 98% adherence to the Raga's note set. Incorporating cross-domain learning, we applied GPT models, commonly used in natural language processing, to generate musical codes, showcasing the potential of interdisciplinary approaches in music AI. Our contributions include the creation of a Jog Raga song dataset, the transformation of Raga compositions into MIDI formats, the development of a method for digital encoding of Raga nuances, and the engineering of advanced prompts for Raga melody generation. Moving forward, we plan to train the OpenAI Jukebox model with a larger dataset to determine whether it can produce tracks with better Raga traits.

Index Terms - Raga Music, AI-Driven Composition, GPT-4, MIDI Processing, Melody Generation, Traditional Music Revival, Cross-Domain Learning, Jog Raga Dataset, Music AI, OpenAI Jukebox.

Introduction

Indian traditional music is witnessing a declining trend, primarily due to the rapid modernization of music in India[1]. Raga music’s nature as a complex and traditional art form makes it vulnerable to the predations of an increasingly accelerated and simplified modern life[1]. This downtrend has also been noticed in other traditional musical forms such as the Arabic Maqam, American Blues, and Spanish Flamenco, revealing the ancient paradigm of the new pushing out the old[2]. Recently, researchers have begun applying artificial intelligence (AI) to Raga music, and it presents a promising solution for the revival of diminishing classical music forms[3].

Raga music has been shown to have beneficial effects on both the physical and emotional domains. Raga music can aid in emotional upliftment, alleviate various health issues by influencing the mind, and promote overall well-being in the listener[4]. Jog Raga is a captivating melody in Indian classical music, cherished for its emotional depth and soothing qualities. Traditionally played during the late evening, it is known for evoking a sense of tranquility and introspection. This raga is particularly admired for its ability to touch the hearts of listeners, bringing about a serene and contemplative mood. Its melodious nature and expressive nuances make it a favorite among musicians and audiences alike[5]. Jog Raga is a Hindustani equivalent of Natta Raga in which D♯ and E are missing in both directions. Similarly, ’Natta’ Raga, a child of Chalanatta, has B and E missing in its descending scale[6]. “Chalanattai” is the fundamental raga, which includes all the notes of Jog and Natta and creates similar emotional effects (Figure 1). In this project, when we invoke the term ’Jog’, we indicate this family of three similar Ragas, “Jog”, “Nattai”, and “Chalanattai”, for simplicity.

Figure 1: Chalanatta scale

During composition, we have to comply with these Raga rules, including such subtle differences, to stay within the Raga. These ragas, which evoke an exotic effect with closely placed note combinations, have unique ascending and descending scales, as seen in Figure 1, and express romance, joy, and serenity. The chosen Raga’s ascending and descending scales in Western musical notation, with the corresponding MIDI notes, are as follows.

C D# E F G A# B C
60 63 64 65 67 70 71 72
C B A# G F E D♯ C
72 71 70 67 65 64 63 60

This gives a constrained set of MIDI notes, [60, 63, 64, 65, 67, 70, 71, 72], in the C4 octave, along with the same notes in other octaves of this scale. In addition to this rule, any song has to follow a few further rules, as follows, to evoke a serene, contemplative mood:

| Rule | Constraint |
| --- | --- |
| Raga Adherence | Ascending: C D# E F G A# B C (S R3 G3 M1 P D3 N3 S̱); Descending: C B A# G F E D♯ C (S̱ N3 D3 P M1 G3 R3 S) |
| Tonal Purity | Emphasize the vadi (dominant) note 'G3' and samvadi (consonant) note 'N3' |
| Tala Compliance | Follow a rhythmic cycle like Rupak Tala (7 beats) or Jhaptal (10 beats) |
| Start And End On Tonic | Start and end on 'Sa' |
| Ornamentation | Use meends (glides) and andolan (gentle oscillations) around 'G3' and 'N3' |
| Singing Voice Range | Compose within the MIDI pitch range of 58 (B2) to 84 (C6) preferably. |
| Silence As Element | Incorporate rests and silences in alignment with the chosen tala |
| Characteristic Phrases | Use characteristic phrases like 'G3 M1 P N3 S̱' and 'S̱ N3 D3 P' |

Table II: Composition Constraints of Nattai / Jog Raga
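The note constraint above (the C4-octave set plus "the same notes in other octaves") can be expressed as a pitch-class test. The following is a minimal sketch, not the authors' code: the function name `allowed_midi_notes` and the constant `JOG_PITCH_CLASSES` are hypothetical, and the tonic is assumed to be C (MIDI 60), as in the scale listed above.

```python
from __future__ import annotations

# Pitch classes (semitones above the tonic C) of the scale listed above:
# C, D#, E, F, G, A#, B
JOG_PITCH_CLASSES = frozenset({0, 3, 4, 5, 7, 10, 11})


def allowed_midi_notes(low: int, high: int, tonic: int = 60) -> list[int]:
    """Return every MIDI note in [low, high] whose pitch class is in the scale."""
    return [
        note
        for note in range(low, high + 1)
        if (note - tonic) % 12 in JOG_PITCH_CLASSES
    ]


if __name__ == "__main__":
    # The C4 octave reproduces the constrained set given in the text.
    print(allowed_midi_notes(60, 72))
    # A wider range yields the same pitch classes in neighbouring octaves.
    print(allowed_midi_notes(58, 79))
```

Working with pitch classes rather than a hard-coded list keeps the rule independent of the octave, so the same check applies to any singing range.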

OpenAI custom GPTs allow the creation of a personalized ChatGPT that can further enhance the understanding and generation of Raga music[7]. With a custom GPT, users can upload documents to provide context, tailor the model with specific instructions, and even integrate user prompts in real-time chat interactions. This customization enables the creation of a more targeted and effective AI assistant, capable of handling complex Raga compositions and potentially aiding in the preservation and promotion of traditional music forms[8].

Musical notation cannot be entered directly into ChatGPT, as it only accepts user input as a string. Notation in the style of SCAMP (Suite for Computer-Assisted Music in Python) helps communicate the intricacies of a Raga, with its constraints, to GPT-4 in a mathematical format. SCAMP allows flexible management of musical time, playback of notes, and quantization, and thus it bridges the music composer and the LLM[9]. Inspired by SCAMP, we generated mathematical notations from MIDI files to communicate with GPT-4.

The application of GPT-4 in Raga melody generation not only contributes to the preservation of this traditional art form but also opens avenues for cross-domain learning. In our literature review, we could find only limited research that delves into a GPT-4-based OpenAI assistant that can compose singing-voice melodies, especially ones based on eclectic Indian Ragas. We aim to fill this gap in AI Raga music composition through the following contributions:

  • Curated Jog Raga Song Dataset: Recorded 20 Raga tracks for data processing and LLM contextualization.

  • Raga Melody Conversion: Transformed Raga compositions into MIDI formats.

  • MIDI Microtonal Pre-Processing: Developed a method for digital encoding of Raga nuances.

  • Advanced Raga AI Prompting: Engineered prompts guiding GPT-4 in Raga melody generation.

  • Applied SCAMP for Music AI: Utilized SCAMP to enable GPT-4 interpretation of Raga MIDI.

  • RAG System for Raga Context: Built a system enhancing GPT-4’s understanding of Ragas.

  • Text-to-Music Custom GPT: Developed a GPT model that generates Raga tunes from text input.

This work applies cross-domain learning, bringing techniques from Natural Language Processing, a field of machine learning, to music, and it is scalable to any complex traditional music using a chosen LLM. Further integration of such techniques can refine the model's ability to generate authentic and emotionally resonant Raga melodies.

Literature Review

For the literature study, a comprehensive review of 117 papers was performed, identifying 42 papers emphasizing audio processing, 28 of which specifically focused on the intricate study of Raga music. “It is that form of music composition comprised of melodic motions that have the effect of clearing the hearts of men,” writes Matanga, Indian musicologist and theorist (9th-10th century AD)[10]. The term ’Raga’ finds its roots in Sanskrit and holds the essence of ’coloring or dyeing’; applied to music, it metaphorically colors the mind and evokes emotions in the performer and listener[10]. A Raga is a form of Indian music in which a specific arrangement of musical notes forms a melody[11]. The Carnatic “swaras”, or notes, have corresponding piano/keyboard notations, as shown in Table I.

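| Swara | Carnatic Notation | Western Notation |
| --- | --- | --- |
| Sa | s | C |
| Ri | r1 | C# |
|  | r2 | D |
|  | r3 | D# |
| Ga | g1 | D |
|  | g2 | D# |
|  | g3 | E |
| Ma | m1 | F |
|  | m2 | F# |
| Pa | p | G |
| Da | d1 | G# |
|  | d2 | A |
|  | d3 | A# |
| Ni | n1 | A |
|  | n2 | A# |
|  | n3 | B |

Table I: Western Notations for Classical notes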

The performance of Ragas involves expressive nuances such as ’Gamakas’ (ornamentations), ’Glissandos’ (slides), and intricate timing variations, which are hard to represent digitally[3]. One major factor is that Indian music employs microtonal intervals, or ’shrutis’, which are smaller than the semitones of Western music and can vary between Ragas, making them difficult to model[12]. This complexity of Raga, a cornerstone of Indian music, poses a significant challenge for computational emulation.

The effectiveness of GPT-4 in the realm of music composition is a natural progression from its predecessors, capitalizing on its enhanced data processing and computational capabilities to refine the sophistication of language models[13]. These advancements have endowed GPT-4 with an elevated capacity for factual response generation and reasoning abilities, which can be harnessed for musical creativity[14]. In practical terms, GPT-4 can act as a dynamic tool for composers and producers, offering suggestions for musical structures, themes, and motifs[15][16]. The strength of GPT-4 lies in its ability to interact with detailed descriptions and prompts, generating a new series of musical notes through the permutations and combinations of the given notes within the specified goals and constraints, thereby transforming them into complex outputs[17].

SCAMP is a framework designed for computer-assisted composition, allowing composers to connect to various resources for playback and notation. It manages musical time flow, supports note playback via SoundFonts or MIDI, and exports music notation as MusicXML or LilyPond[18]. In this research, we adopted a simplified version of SCAMP's music representation, focusing solely on melody generation as an inspiration for music composers. Our simplified form includes pitch, start time, length, and volume, providing a straightforward approach for composers to experiment with musical ideas[18]. The code used for conversion from MIDI to SCAMP and vice versa is given in our GitHub repository[19].

(55, 3.46, 0.29, 0.51)
(pitch, start time, length, volume)
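To make the four-field representation concrete, the sketch below writes notes as text in the tuple form shown above and parses them back. It is an illustration only, not the conversion code from the authors' repository: the names `NoteEvent`, `velocity_to_volume`, `to_text`, and `from_text` are hypothetical, and mapping MIDI velocity to volume by dividing by 127 is an assumption, since the source does not state how volume is scaled.

```python
from __future__ import annotations

from typing import NamedTuple


class NoteEvent(NamedTuple):
    pitch: int      # MIDI note number
    start: float    # onset time
    length: float   # duration
    volume: float   # loudness on a 0..1 scale (assumed scaling)


def velocity_to_volume(velocity: int) -> float:
    """Map a MIDI velocity (0-127) to a 0..1 volume, rounded to 2 decimals."""
    return round(velocity / 127, 2)


def to_text(notes: list[NoteEvent]) -> str:
    """Render notes one per line as '(pitch, start, length, volume)'."""
    return "\n".join(
        f"({n.pitch}, {n.start:.2f}, {n.length:.2f}, {n.volume:.2f})"
        for n in notes
    )


def from_text(text: str) -> list[NoteEvent]:
    """Parse lines written by to_text back into NoteEvent tuples."""
    notes = []
    for line in text.splitlines():
        line = line.strip().strip("()")
        if not line:
            continue
        pitch, start, length, volume = (f.strip() for f in line.split(","))
        notes.append(
            NoteEvent(int(pitch), float(start), float(length), float(volume))
        )
    return notes


if __name__ == "__main__":
    note = NoteEvent(55, 3.46, 0.29, velocity_to_volume(65))
    print(to_text([note]))
```

A plain-text form like this is what allows the notes to be passed to an LLM as a string and read back from its reply.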

The Jog Raga, known for its expressive depth and complexity, offers a fertile ground for the application of advanced computational models to traditional music composition. By leveraging SCAMP’s capabilities, researchers and composers can navigate the unique melodic structures and rhythmic patterns and communicate with LLMs for AI music generation.

In this study, we explored the application of GPT-4 for Raga music generation. While GPT-4 is not explicitly trained on music data, its enhanced computational capabilities and extensive dataset enable it to understand and generate complex musical structures. Furthermore, by uploading files to provide context, the model can learn and generate responses relevant to retrieved Raga notations, enhancing its ability to produce compositions that resonate with the traditional essence of this genre. This makes it a valuable tool for composers and musicians seeking to experiment with traditional Raga music in a modern context. The approach to decode this complexity is described in the Methodology section below.

Methodology

Leveraging the power of GPT-4, we sidestep the need for dataset-specific training or fine-tuning by employing mathematical representations of music, akin to the SCAMP methodology, to impart the intricacies of Jog Raga. GPT-4's profound understanding of complex patterns is coaxed out through the conversion of musical compositions into a text-based MIDI format[20]. This transformation into a structured, numerical language is what allows GPT-4 to perceive and learn from the uploaded context documents. We provide the system with prompts through OpenAI's Assistants API, which supports tools such as uploading relevant documents, so that the model can interpret and analyze these numerical patterns as musical compositions in the given Raga.

The Raga composition pipeline is shown in Figure II.

Figure II: GPT-4 music composition pipeline

Dataset

In the pursuit of innovating Raga-based machine learning for music generation, we encountered the challenge of sourcing a dataset rich with the nuanced vocals inherent to the chosen Jog Raga. As such a dataset is rare, we embarked on curating a dataset exclusively comprising the singing voice, devoid of any instrumental accompaniment or harmonic support. This singular focus on vocal melodies aimed to distill the essence of Jog Raga in a form that can be used even for instrumental melody compositions. We selected 20 published tracks, five of which were composed by the author. These songs were selected considering the expression of the Raga's traditional melodic patterns. A professional singer sang these songs, and we recorded them in a professional setting. A sample of a song composed by the author for this project is given below.

Wav/mp3 files to MIDI conversion

The next stage involved converting these recorded vocal tracks into MIDI format. After extensive research on different tools, we chose Spotify’s Basic-Pitch model, as it extracts the melody in its purest form[21]. This model's ability to produce polyphonic outputs with speed and efficiency proved instrumental in our efforts. By utilizing the Constant-Q Transform and a harmonic stacking technique, the model aligns frequencies, creating a pure representation of pitch, note activation, and onset[21].

This file was further pre-processed in four stages: Transpose, Measures, Contour, and Patterns. The Transpose stage adjusts pitch, aligns any deviating notes to the Raga, and focuses on the lead melody. The Measures stage calculates note density, removes low-density measures, and standardizes rhythm. The Contour stage normalizes velocity, corrects melodic contours, and removes percussion. The Patterns stage quantizes notes and identifies repeating motifs, ensuring compliance with the Raga's structure. The resulting MIDI files capture the essence of Jog Raga, serving as a feasible source for LLM inputs.
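Two of the operations named above, aligning deviating notes to the Raga and quantizing notes, can be sketched as follows. This is an illustrative assumption rather than the authors' pipeline: the source does not say how deviating notes are aligned or which grid is used, so nearest-pitch snapping (ties resolved downward) and a 0.25 grid are choices made here, and the names `snap_to_raga`, `quantize`, and `preprocess` are hypothetical.

```python
# Pitch classes of the scale (semitones above the tonic C).
JOG_PITCH_CLASSES = frozenset({0, 3, 4, 5, 7, 10, 11})


def snap_to_raga(pitch, tonic=60):
    """Move a pitch to the nearest scale pitch; ties resolve downward."""
    for offset in (0, -1, 1, -2, 2):
        if (pitch + offset - tonic) % 12 in JOG_PITCH_CLASSES:
            return pitch + offset
    return pitch  # not reached for this scale: every gap is within one semitone


def quantize(time, grid=0.25):
    """Round a time value to the nearest multiple of the grid."""
    return round(round(time / grid) * grid, 6)


def preprocess(notes, grid=0.25):
    """Snap pitches to the scale and quantize timing for (pitch, start, length, volume) notes."""
    cleaned = []
    for pitch, start, length, volume in notes:
        cleaned.append((
            snap_to_raga(pitch),
            quantize(start, grid),
            max(grid, quantize(length, grid)),  # keep at least one grid step
            volume,
        ))
    return cleaned


if __name__ == "__main__":
    raw = [(62, 3.46, 0.29, 0.51), (66, 4.1, 0.05, 0.5)]
    print(preprocess(raw))
```

Snapping to the nearest scale pitch is only one possible policy; a pitch detector's errors are not always a semitone away, so a real pipeline may prefer to drop or manually review deviating notes.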

MIDI to SCAMP Conversion

SCAMP, a computer-assisted composition framework in Python, provides a flexible connection between the composer-programmer and various resources for playback and notation[9]. It allows for the management of musical time, playback of notes, and quantization, and exports the result to music notation formats like MusicXML. In SCAMP, notes are played back with additional properties such as articulations, notations, noteheads, spanners, dynamics, and text markings, which add expressive qualities to the Raga music expression[9]. Effectively, the MIDI information is accurately translated into a mathematical format that LLMs can understand and manipulate. These SCAMP files are uploaded as context documents into the custom GPT model for augmentation, enabling the LLM to generate melodies based on the Raga Jog patterns.

Prompt Engineering Development

Prompt engineering is crucial for capturing Raga patterns and composing tunes based on user requests, ensuring effective responses from the LLM. The prompts are categorized into two types: a general system prompt and a Raga Jog-specific prompt. The general system prompt outlines the role and goal of RagaGPT Composer, emphasizing its function in the exploration and understanding of Indian Raga music. The Raga Jog-specific prompt provides detailed instructions on the composition's constraints and characteristics, such as the specific MIDI note set, tonic note, intervals, characteristic phrases, emphasis notes, and temporal rules[19]. A prompt excerpt is given below:

Compose a piece in the Jog Raga with an average of 7 notes per line, varying between 5-20 notes as needed. The composition should avoid uniform or linear progressions, and each line should have a similar level of complexity and note distribution as reference songs. Ensure that no line is significantly shorter or simpler than the others. The overall length of the composition should match the original song.

The concept of Chain of Thought (CoT) prompting has been explored as a means to improve the reasoning capabilities of Large Language Models (LLMs) in various domains, including mathematics, commonsense reasoning, and symbolic reasoning[22]. CoT prompting involves providing LLMs with step-by-step examples or intermediate reasoning steps, which can significantly enhance their performance in solving complex problems[23][24]. We adopted chain of thought prompting here for more structured but creative musical notes generation in the given constraints as below.

{
  "Objective": "Compose a piece in the Jog Raga, focusing on non-linear melodic progressions and avoiding repetitive, step-wise movements.",
  "Melodic Range": "Utilize the MIDI range [58, 59, 60, 63, 64, 65, 67, 70, 71, 72, 75, 76, 77, 79] strictly to capture the expressive capacity of the Jog Raga.",
  "Note Density": "Aim for an average of 7 to 8 notes per line, with variation in line lengths to include sequences ranging from 13 to 20 notes.",
  "Rhythmic Complexity": "Incorporate diverse rhythmic elements such as dotted notes, syncopations, and variable rests.",
  "Non-Linear Progressions": "Focus on creating melodic phrases that leap across the scale, avoiding long sequences of step-wise movements.",
  "Resolution and Improvisation": "Balance between improvisational flair and structured melodic development, ensuring phrases resolve satisfactorily back to the tonic note (C)."
}

Custom GPTs let users customize ChatGPT for a specific purpose. In our study, we explore the potential of a custom GPT named “JogRagaGPT” [blinded] for AI Raga music generation. JogRagaGPT is specifically designed to generate music sequences in the Jog Raga style, adhering to the raga's characteristic notes and emotional depth. After being given the custom instructions, JogRagaGPT was asked what it is programmed for, and it gave the following answer (JSON format).

{
  "description": "JogRagaGPT is designed with instructions as followed and is programmed for jog based raga music notes sequence generation.",
  "purpose": "To facilitate the creation of music in the Jog Raga style, adhering to the specific notes that define the Raga.",
  "functionality": {
    "composition_generation": {
      "description": "Generates new compositions by varying existing songs with different permutations and combinations, creating a new piece each time.",
      "note_range": "Ensures that all notes fall within the valid signing range for Jog Raga: [58, 59, 60, 63, 64, 65, 67, 70, 71, 72, 75, 76, 77, 79].",
      "avoid_notes": "Avoids the pitches [57, 61, 62, 66, 68, 69, 73, 74, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88], which are outside the acceptable scale.",
      "emotion_and_variety": "Ensures that the composition is rich with the Raga's emotive capacity and improvisational variety, never resorting to simple wave-like patterns."
    }
  },
  "implementation": {
    "analysis": "When given a MIDI file, analyzes the pitch values to ensure adherence to the Jog Raga's scale.",
    "creativity": "Avoids repeating patterns and includes improvisational elements to maintain the unique characteristics of Jog Raga."
  }
}

After these instructions were applied, JogRagaGPT was asked how it composes Raga-specific note patterns, and it gave the following details in JSON format.

{
  "Understanding Jog Raga Characteristics": "Focuses on specific notes: Sa, Re (flat), Ga, Ma, Pa, Dha (flat), Ni (flat), capturing the Raga's emotional mood of longing and reflection.",
  "Adherence to the Scale": "Ensures compositions strictly follow Jog Raga's scale, using specific notes in both ascending and descending scales, avoiding notes outside this set.",
  "Raga Adherence": {
    "Ascending": "C D# E F G A# B C (S R3 G3 M1 P D3 N3 S)",
    "Descending": "C B A# G F E D# C (S N3 D3 P M1 G3 R3 S)"
  },
  "Tonal Purity": "Emphasize the vadi (dominant) note 'G3' and samvadi (consonant) note 'N3'.",
  "Tala Compliance": "Follow a rhythmic cycle like Rupak Tala (7 beats) or Jhaptal (10 beats).",
  "Start And End On Tonic": "Start and end on 'Sa'.",
  "Emotional Expression Through Gamakas": "Applies slight oscillations, slides, and bends on key notes like Ga and Ni to enhance the Raga’s emotional depth without altering its scale.",
  "Fluidity with Meends": "Incorporates smooth transitions between specific notes, ensuring the Raga's fluidity. Carefully plans the slides to maintain characteristic flow.",
  "Phrasing and Note Patterns": "Emphasizes characteristic notes through specific patterns, showcasing the Raga's unique mood. Patterns often highlight the longing quality of Ga and Ni.",
  "Rhythmic Structure": "Composes melodies with an understanding of the rhythmic cycle, ensuring melodic phrases complement and enhance the rhythm, aligning with tempo and beats.",
  "Development and Variation": "Gradually introduces notes and phrases, starting from the lower octave and moving higher, then elaborating on them with variations in speed, rhythm, and register.",
  "Emotional Expression": "Aims to evoke a sense of longing and introspection, choosing phrases that resonate with the Raga's mood, ensuring each note contributes to the emotional impact.",
  "Incorporation of Silence": "Thoughtfully incorporates pauses between notes to enhance the emotional and musical tension, reflecting and anticipating.",
  "Integration with Composition": "Integrates gamakas and meends as integral parts of the melody, considering how embellishments can highlight the Raga's key phrases and emotional peaks.",
  "Reference and Innovation": "Explores innovative expressions while respecting the Raga’s foundational structure, incorporating unexpected rhythms, new phrasing, or experimenting with dynamics."
}

Melody Composition

Following the interaction with GPT-4 for melody generation through prompt engineering, the LLM consults the provided reference documents to learn musical patterns based on user requests, incorporating user emotions and story context. It then generates a new song in SCAMP notations, which include information such as pitch, volume, start time, and length of each note.

A sample of GPT-4-produced notes, with the corresponding generated MIDI (Audio 1), is given below:

[
    [67, 3.0, 0.22, 0.57], [70, 3.19, 0.26, 0.58], [67, 3.38, 0.2, 0.56], [72, 3.56, 0.4, 0.57], 
    [70, 3.94, 0.42, 0.57], [67, 4.31, 0.18, 0.54], [67, 4.5, 0.11, 0.57], [65, 4.59, 0.11, 0.57], 
    [67, 4.69, 0.08, 0.57], [65, 4.75, 0.66, 0.54], [65, 5.44, 0.23, 0.55], [70, 5.62, 0.23, 0.55], 
    [67, 5.81, 0.21, 0.56], [65, 6.0, 0.14, 0.55], [64, 6.1, 0.07, 0.56], [65, 6.21, 0.03, 0.17], 
    [64, 6.22, 0.73, 0.55], [65, 6.94, 0.12, 0.56], [64, 7.02, 0.13, 0.57]
]

Audio 1: GPT-4 composition - sample notes

Subsequently, the conversion of SCAMP notation back to MIDI involves creating a stream of musical elements where each note is represented with its specific attributes[19]. The stream is then saved as a MIDI file, which can be used for playback or further musical analysis.
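As a rough illustration of this step, the sketch below turns note tuples into a time-ordered list of note-on and note-off events with delta times in ticks, which is the shape of data a MIDI writer consumes. It is not the conversion code from the repository[19] and deliberately avoids third-party libraries. The function name `to_midi_events` is hypothetical, and it assumes that start time and length are measured in beats, that volume maps to velocity by multiplying by 127, and that the file uses 480 ticks per beat; none of these are stated in the source.

```python
def to_midi_events(notes, ticks_per_beat=480):
    """Convert (pitch, start, length, volume) notes to delta-timed events.

    Returns a list of (delta_ticks, kind, pitch, velocity) tuples, where
    kind is "note_on" or "note_off" and delta_ticks counts from the
    previous event.
    """
    timeline = []
    for pitch, start, length, volume in notes:
        velocity = max(1, min(127, round(volume * 127)))
        on_tick = round(start * ticks_per_beat)
        off_tick = round((start + length) * ticks_per_beat)
        # The second field orders note_off (0) before note_on (1) at equal ticks.
        timeline.append((on_tick, 1, "note_on", pitch, velocity))
        timeline.append((off_tick, 0, "note_off", pitch, 0))

    timeline.sort(key=lambda event: (event[0], event[1]))

    events = []
    previous_tick = 0
    for tick, _, kind, pitch, velocity in timeline:
        events.append((tick - previous_tick, kind, pitch, velocity))
        previous_tick = tick
    return events


if __name__ == "__main__":
    for event in to_midi_events([(60, 0.0, 1.0, 1.0), (64, 1.0, 0.5, 1.0)]):
        print(event)
```

Releasing a note before starting the next one at the same tick avoids a retriggered note being cut off immediately when two consecutive notes share a pitch.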

Results

In our research, the utilization of GPT-4 for Jog Raga composition demonstrated 98% adherence to the structures and intricacies of the Raga scale, supported by a custom-developed prompt strategy. We have published the customized GPT as JogRagaGPT. After several rounds of reference-song learning and prompt updates, the system successfully generated melodies that followed Jog Raga's scales, characteristic phrases, and rules[19]. By analyzing reference songs for core motifs and rhythms, the model introduced creative variations in melody, rhythm, and dynamics while maintaining the raga's foundational elements. Techniques such as motif inversion, rhythmic alterations, and the strategic use of silence were employed to avoid repetition, enhance expressiveness, and maintain a consistent tempo. These compositional choices resulted in a dynamic and engaging musical narrative, as shown in the example Audio 2.

Audio 2: MIDI composed by GPT-4

From the MIDI file generated in the above process (Audio 1), the composer can listen to the melody and make any necessary adjustments or inferences to refine the composition.

Vocal translation and composition rough sample inspired by the above melody:

Audio 3: Singing Voice version of MIDI composed

As can be heard in Audio 3 above, the GPT-composed melody was enhanced and composed into a singing voice track in Jog Raga. This method allows for the creation of a final singing melody or even an instrumental track that can be performed by musicians.

Discussion

The integration of AI in Raga music composition presents a fascinating intersection between technology and traditional art forms. While AI models like GPT-4 can generate melodies based on specific Raga rules, it is important to recognize that Raga music encompasses more than just a sequence of notes. In Raga music, emotions are expressed through specific note sequences and ornamentations that evoke distinct feelings, creating a deep emotional connection with the listener. Jog Raga is typically characterized by a deep, soulful, and slightly melancholic sound, so we selected it and focused solely on it to create a working model first. LLMs could reproduce Raga emotional expressions by referencing the patterns and nuances of Ragas and generating compositions that mimic the traditional structure and emotional depth of Raga music. However, JogRagaGPT doesn't "learn" in the traditional sense of machine learning models being trained on new data. Instead, the insights gained from the dataset are manually incorporated into the custom GPT's programming and instructions.

The heart of a composer, combined with the soulful rendition of a singer, brings forth the intrinsic essence of a Raga song, which may not be fully captured by AI alone. In this context, AI serves as a tool for inspiration and creativity enhancement, rather than acting as a standalone composer. It can provide a foundation for melodies that adhere to Raga rules, but the human touch is essential to infuse emotion and depth into the music. This synergy between AI-generated structures and human interpretation underscores the potential of technology to augment, rather than replace, traditional musical composition. As we explore the links between music AI and adjacent domains, it becomes clear that cross-disciplinary research can enrich our understanding of both music and AI, offering valuable insights to the broader community and fostering a collaborative approach to innovation in the arts.

The evaluation of each song involved transposing the MIDI file to the key of C and checking whether the MIDI notes adhered to the Raga note set. The evaluation function ensures that the song is in the correct key and checks the adherence of the MIDI notes to the Raga. If the MIDI file does not fully adhere, the function identifies the non-adhering notes and calculates the accuracy percentage: the ratio of adhering notes to the total number of notes in the sequence, multiplied by 100. This evaluation provided a quantitative measure of how closely the song aligns with the Raga musical notes. In our experiment, the LLM-produced songs achieved an average accuracy of 98% with respect to note compliance. In addition, we consulted an Indian classical music teacher, who found that the GPT output followed the Raga style, especially when rendered as a recorded human voice. However, there are still areas for improvement in the embellishments, meends, and microtonal variations. These could be addressed by adding more songs to the dataset and refining the prompt published in our [blinded] GitHub repository[19].
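The accuracy measure described above, adhering notes divided by total notes and multiplied by 100, can be sketched as follows. This is a minimal illustration and not the authors' evaluation function: the name `raga_adherence` is hypothetical, and instead of transposing the file it takes the tonic as a parameter, which is an equivalent way of checking pitch classes relative to C. The example pitches are taken from the GPT-4 sample shown in the Melody Composition section.

```python
# Pitch classes of the scale (semitones above the tonic).
JOG_PITCH_CLASSES = frozenset({0, 3, 4, 5, 7, 10, 11})


def raga_adherence(pitches, tonic=60, scale=JOG_PITCH_CLASSES):
    """Return (accuracy_percent, non_adhering_pitches) for a pitch sequence."""
    if not pitches:
        raise ValueError("cannot score an empty note sequence")
    violations = [p for p in pitches if (p - tonic) % 12 not in scale]
    adhering = len(pitches) - len(violations)
    return 100.0 * adhering / len(pitches), violations


if __name__ == "__main__":
    # Pitches of the GPT-4 sample shown earlier in the paper.
    sample = [67, 70, 67, 72, 70, 67, 67, 65, 67, 65,
              65, 70, 67, 65, 64, 65, 64, 65, 64]
    accuracy, off_scale = raga_adherence(sample)
    print(accuracy, off_scale)
```

Note that this metric only measures scale membership. It does not capture ascending or descending order, characteristic phrases, or ornamentation, which is why the source also reports a review by a music teacher.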

Limitations

JogRagaGPT captures the unique aspects of Raga music but faces challenges due to its foundation on GPT-4, not specifically trained on music datasets. This can lead to compositions that may require adjustments to authentically reflect the Raga style, sometimes resulting in linear and repetitive music lacking traditional Raga complexity.

Conclusion

Raga music's complexity, with its gamakas, glides, and microtonal variations, presents unique challenges for AI-based music generation. While GPT-4, not specifically trained on music data, has shown promise through innovative methodologies, the intricacies of these musical elements underscore both the limitations and the expansive potential for further development. Our experiments have laid a foundational framework, and there is considerable scope for enhancing the AI's capability to authentically replicate the nuanced expressions of traditional music forms. With a larger dataset and more in-depth prompt engineering, GPT-4 could produce better Raga-based tunes. The ultimate goal of this ongoing research is to refine and expand these initial observations into a versatile text-to-music tool for Raga melody composition. It is applicable across the spectrum of vocal and instrumental music, and scalable to the intricate diversity of global musical traditions.

Ethics Statement

In this study, we have adhered to ethical standards in research and data handling. We declare no potential conflicts of interest, financial or non-financial, that could influence the results or interpretation of this work. The research did not involve human or animal subjects; hence, institutional ethics committee approval was not required. We are aware of the potential societal, social, and environmental impacts of our work, especially in the context of cultural preservation and technological advancement. We have endeavored to ensure that our research contributes positively to the field of musicology and AI, respecting the traditions and heritage of Raga music.

Future Research

Future research will focus on extending AI capabilities to create various melodies, generate chords, and produce multi-track songs for reviving traditional music. A targeted approach like training OpenAI's Jukebox, Meta’s MusicGen, or Google’s MusicLM model could enhance such complex music composition. Additionally, creating a larger dataset based on different moods would help in exploring the possibility of attaining emotional touch as per each Raga.
