
Jazz Guitar Voice-Leading Chord Fingerings With Long Short-Term Memory

Providing automatically generated guitar chord tablature practice material given chord name input

Published on Aug 29, 2024

Abstract

A problem in guitar practice is choosing chord voicings that fit together in sequence, a process known as voice leading. In jazz, a guitarist follows voice leading by maintaining stepwise or otherwise limited motion between chords for smoother harmony. The main avenues for learning jazz guitar voice-leading theory are a guitar instructor or chord books. To our knowledge, no computational method exists for generating voice-led fingerings from chord labels. Creating such a system would make advanced jazz guitar playing more accessible by providing expansive practice material. We present a novel approach to algorithmically derive tablature sequences for a given chord progression with a sequence-to-sequence long short-term memory (LSTM) model. We present a new dataset consisting of guitar chord names and tablature from multiple professional jazz guitarists. We then augment this data by transposing it to all twelve keys. We tokenize the data into an alphabet of chord labels and all seen tablatures. With the augmented data and our alphabet, we train our LSTM model on chord sequences of length three, evaluated with a mean reciprocal rank metric. In cross-validation, our model consistently ranked held-out ground-truth expert fingerings among the top predicted choices from hundreds of possible tablature sequences. This model is the groundwork for an educational tool for guitar students to practice meaningful chord voicings.

Introduction

The guitarist can play single-note lines or chords when playing jazz guitar. These chords are given to the guitarist in a chord chart, which contains both the chord name and a rhythm with which to play the chords. With this chord name, the guitarist can then make several choices: which intervals in the chord to emphasize or omit altogether, whether the notes should be played in a low or high register, and the spacing between each note in the voiced chord. The result of these decisions is known as the chord voicing. The decision of which chord voicing to use becomes more interesting when considered in the context of all the other chords in the song and their respective voicings. This process of moving each voice in the chord throughout the progression is known as voice leading [1].

The main difference between jazz guitar voice-leading theory (GVLT) and general voice-leading theory is the role of the bass. In typical voice leading, the bass leaps around while the other voices move stepwise. Jazz guitarists typically play with a bassist, so the guitarist does not have to worry about the harmonic role of the bass and mainly focuses on the upper part of the chord.

With the removal of the bass, GVLT may be solved by selecting from many possible chord inversions. Inverting a chord means reordering its notes by pitch, which lets us change the roles of different components of the chord. The root note of the chord would normally be played as the lowest-frequency note to sound the correct chord, but with GVLT and a bassist covering the root, the lowest note of the guitar voicing could be any note in the named chord. By using inversions, we have a larger group of chord voicings to select from when working out efficient voice leading.

Learning GVLT is difficult for the guitarist because the guitar is typically taught without voice-leading principles [2]. A beginner jazz guitarist will learn set chord shapes for each chord type, but playing a chord progression with these shapes involves huge leaps over the guitar neck. Therefore, the process for learning voice leading is to expand from these set shapes and learn the notes that are near each position to better understand the guitar neck. Once the guitarist has a better knowledge of the fretboard in each position, they can begin to play new progressions with this newly learned technique.

Due to this difficulty in mastering GVLT, we propose an algorithmic approach to approximate “good” GVLT. With two LSTM models [3] where one acts as an encoder and the other acts as a decoder, we can create a system that performs sequence-to-sequence generation. Using sequence-to-sequence generation, we can create a model that takes in a sequence of chord names and outputs a sequence of fingerings for each chord. Our goal is that these fingerings will have proper voice leading over the chord progression to assist a jazz guitar student in obtaining more practice materials for learning GVLT. We hope to make the study of jazz guitar more accessible to a wider audience.

Related Work

Previous research on algorithmically generated guitar tablature includes Cwitkowitz et al. [4], who find that score-to-tablature mapping for guitar has lacked significant study because of a lack of large datasets. McVicar et al. [5] expand on this idea by offering transposition as a method for expanding a dataset, similar to the method we use in this paper to obtain more transition data between chords.

Various models have been applied to find optimal tablature arrangements for the guitar. Radisavljevic et al. [6] proposed a dynamic programming approach with path-difference learning to converge on examples from professional guitarists. Hidden Markov models (HMMs) are common in the literature [7][8], both for arranging scores into tablature and for converting audio into the most likely original tablature. Finally, deep learning approaches have been proposed, with a convolutional neural network (CNN) model that converts audio into tablature [9] and a Transformer-XL model for tablature and rhythm generation [10]. Our work expands on this data-driven approach to guitar modeling by proposing the new problem of GVLT and utilizing an encoder-decoder model to solve it.

The inspiration for our model architecture came from Cheung et al. [11], who created a model for generating violin fingerings from scores. The violin model used a variational autoencoder (VAE) to input their scores and output a new fingering pattern. This VAE contained bidirectional long short-term memory layers (BLSTM) whereas our model is a long short-term memory (LSTM) model. Further research is needed into the trade-offs between BLSTM and LSTM in the context of guitar chord-to-fingering mapping.

Data Set

We hand-curated a small dataset to demonstrate that GVLT is a learnable problem. We used two sources: the book All Blues for Jazz Guitar (ABJG) by Jim Ferguson [12] and a collection of transcriptions made specifically for this project by Jason Ennis, an expert-level jazz guitarist who is a senior lecturer at Dartmouth College. We will refer to Ennis’s annotations as JE. We chose ABJG because it defines voice leading on guitar before all of its tablature examples, so we could guarantee its voicings were intentional examples of good voice leading. Similarly, Ennis studied and now teaches good voice-leading practices on guitar, and we verified with him that all of his chord voicings were best-practice examples of GVLT.

Sources

The following table breaks down each song, its source, and the total number of distinct chords in the transcription. A chord was recorded only if it differed from its neighbors in sequence, and we condensed neighboring measures with equal chord voicings.

| Source | Song Name | Total Chords |
| --- | --- | --- |
| ABJG | “Twelve For Three” | 12 |
| ABJG | “Blues Basis” | 12 |
| ABJG | “Work Down Blues” | 12 |
| ABJG | “Route 2 Blues” | 12 |
| ABJG | “Voice Leading Ex 1” | 3 |
| ABJG | “Voice Leading Ex 2” | 3 |
| ABJG | “Half-Step Blues” | 26 |
| ABJG | “Freddie’s Home” | 65 |
| ABJG | “Swinging on the Spot” | 24 |
| ABJG | “Two’s Blues” | 21 |
| ABJG | “Slow Funky Shuffle” | 48 |
| ABJG | “Slow Blues” | 16 |
| ABJG | “Herbie’s Here” | 48 |
| JE | “B♭ Blues” | 42 |
| JE | “It Could Happen to You” | 64 |

The data was codified as two equal-length sequences for each song. The first sequence contained chord names, each consisting of the root of the chord (with no inversions indicated) and the chord type. The second sequence contained the ground-truth fingering patterns for these chord names. The fingerings were recorded as six values corresponding to the E, A, D, G, B, and E strings, respectively. Each of these values was either a fret number from 0 to 22 for a fretted note or "x" for a muted note.
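As a concrete illustration, the sketch below shows how one song's two parallel sequences might be stored; the dictionary layout is our own illustrative choice rather than the exact file format, and the voicings are taken from the "Slow Funky Shuffle" example in the Evaluation section.

```python
# Sketch of the two equal-length sequences recorded for one song.
song = {
    "name": "Slow Funky Shuffle (excerpt)",
    "chord_names": ["Cmin7", "F7", "B7"],  # root + chord type, no inversions
    "voicings": [
        "3-x-1-3-x-x",   # six positions: E, A, D, G, B, E (low to high)
        "1-x-1-2-x-x",   # each is a fret number 0-22 or "x" for muted
        "7-x-7-8-x-x",
    ],
}
assert len(song["chord_names"]) == len(song["voicings"])
```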

Verification

We needed to verify that our data had a correct mapping from chord name to chord fingering for three reasons. First, all data was entered manually, so a verification script was necessary to check for typos. Second, the transcriptions by Ennis were made for this project and had no external review process. Finally, our dataset is small, so any error would have a large impact on the quality of our results.

The data comprised the song name, chord names, and chord voicings. To verify the data we performed three checks: checking the size of the voicings, checking for valid characters, and checking for valid notes in the voicings.

The first two tests were trivial. For checking the size of the voicings, we counted the positions in the voicing string and ensured that there were six of them (each either a fret number or the ‘x’ character marking a muted string). To check for valid characters, we ensured that the voicings only contained characters in the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -, x}. We also checked that the chord names contained valid characters as part of the longer check for valid notes in the voicings.
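A minimal sketch of these two checks (the function names are our own):

```python
VALID_CHARS = set("0123456789-x")

def has_six_positions(voicing: str) -> bool:
    """Voicing must contain exactly six positions, each a fret number or 'x'."""
    positions = voicing.split("-")
    return len(positions) == 6 and all(p == "x" or p.isdigit() for p in positions)

def has_valid_chars(voicing: str) -> bool:
    """Voicing may only use the characters {0-9, '-', 'x'}."""
    return set(voicing) <= VALID_CHARS

assert has_six_positions("5-x-5-5-5-x") and has_valid_chars("5-x-5-5-5-x")
assert not has_six_positions("5-x-5-5-5")   # only five positions
```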

We used a list of note roots (all 12 of the chromatic notes) and all valid chord types {min, min7, min6, 6, maj6, 7, 9, (major), 7 flat 9, 7 flat 5, min7 flat 5, aug, dim7, maj7}. We also had the intervals between the notes for each of the valid chord types. So, we built an array of all the valid notes in a given chord from the root name and the chord type.

Next, we traversed the ground-truth voicings in the dataset. We converted the fret position to the note name for each note in the voicing and checked that it existed in the array of all notes for that chord name. If the note was not in the chord, the test failed for that chord. Note that this was also how the test checked the chord names: if a chord name was not valid, then the array of notes for that chord would be empty, and the valid-notes test would fail on the chord.
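The sketch below illustrates this valid-notes check; the helper names are our own, and only a subset of the fourteen chord-type interval entries is shown.

```python
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
OPEN_STRINGS = ["E", "A", "D", "G", "B", "E"]   # low E to high E
CHORD_INTERVALS = {                              # semitones above the root
    "min7": [0, 3, 7, 10], "maj7": [0, 4, 7, 11],
    "7": [0, 4, 7, 10], "dim7": [0, 3, 6, 9],    # subset of the 14 types
}

def chord_notes(root: str, chord_type: str) -> set:
    """All valid pitch classes for a chord name; empty if the name is invalid."""
    if root not in CHROMATIC or chord_type not in CHORD_INTERVALS:
        return set()
    r = CHROMATIC.index(root)
    return {CHROMATIC[(r + i) % 12] for i in CHORD_INTERVALS[chord_type]}

def voicing_is_valid(voicing: str, root: str, chord_type: str) -> bool:
    """Every fretted note in the voicing must belong to the named chord."""
    allowed = chord_notes(root, chord_type)
    for string_idx, pos in enumerate(voicing.split("-")):
        if pos == "x":
            continue
        base = CHROMATIC.index(OPEN_STRINGS[string_idx])
        if CHROMATIC[(base + int(pos)) % 12] not in allowed:
            return False   # note outside the chord (or invalid chord name)
    return True

assert voicing_is_valid("3-x-1-3-x-x", "C", "min7")   # G, D#, A# are in Cmin7
```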

Our test also allowed for oversimplified voicings. For instance, the chord Amin7 could be voiced with all A notes and that would pass. Alternatively, we could have required that all of the notes in the chord exist in the voicing. However, that test fails with no5 voicings and any 9th voicing, because at least one chord degree must be omitted. Ultimately, since we did the data entry by hand, it was much more likely that a chord had a voicing that omitted a certain chord degree than one that contained so few distinct notes that the chord was invalid.

Ultimately, this verification script cleaned potential errors both in the data entry pipeline and in the guitar teacher's annotation process.

Data Augmentation

From manual data gathering, we were able to obtain songs that were in a certain key with exact fingering patterns. This collection process was slow, so we used the cyclical nature of music to generate additional data. We set the bounds of allowable frets between fret 1 and 18, inclusive. We did not include the open fret position because it is a special case in playing style and our unaugmented data does not have any examples of it. We also chose to end our frets at 18 because, in practice, the human hand has difficulty reaching the final couple of frets, especially when playing chords. Then, we iterated over all transpositions of each entire chord progression and kept every transposition in which all notes of every fingering fell within the bounds. This expanded list was saved as a separate CSV file.
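A sketch of this augmentation loop under the stated fret bounds (the function names are ours; transposing the chord names by the same number of semitones is omitted for brevity):

```python
LOW_FRET, HIGH_FRET = 1, 18   # allowable fret bounds, inclusive

def shift_voicing(voicing: str, offset: int):
    """Move every fretted note by `offset` frets; None if any note leaves bounds."""
    shifted = []
    for pos in voicing.split("-"):
        if pos == "x":
            shifted.append("x")
            continue
        fret = int(pos) + offset
        if not LOW_FRET <= fret <= HIGH_FRET:
            return None
        shifted.append(str(fret))
    return "-".join(shifted)

def transpositions(progression):
    """Every whole-progression shift that keeps all fingerings within bounds."""
    results = []
    for offset in range(-HIGH_FRET, HIGH_FRET + 1):
        shifted = [shift_voicing(v, offset) for v in progression]
        if all(v is not None for v in shifted):
            results.append((offset, shifted))
    return results
```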

The rationale behind transposing the data to all valid positions is that guitarists instinctively understand that the guitar is fundamentally a transposing instrument and is translationally invariant with respect to fret number. Guitarists frequently perform this transposition process when playing, so providing this data allows the model to learn the same process. Additionally, augmenting the data gives the model more exposure to all parts of the alphabet.

Sequence Chunking

The data that we gathered is variable in length, with the shortest sequence having a length of 3 chords and the longest a length of 48. Padding every song to the longest length makes training difficult because the most frequent chord or fingering token becomes the padding character. We approached this problem by cutting each sequence into chunks generated with a sliding window. For example, the sequence [A, B, C, D, E] becomes three chunks: [A, B, C], [B, C, D], [C, D, E].
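The sliding window is a one-liner in Python:

```python
def sliding_chunks(sequence, size=3):
    """[A, B, C, D, E] -> [A, B, C], [B, C, D], [C, D, E]."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

print(sliding_chunks(["A", "B", "C", "D", "E"]))
# [['A', 'B', 'C'], ['B', 'C', 'D'], ['C', 'D', 'E']]
```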

The other design choice would be to have each element of the sequence represented only once, which would make the output [A, B, C], [D, E, ‘’] where ‘’ is the padding character. We decided on a sliding window because we wanted to represent the dependency for each chord past the first chord on an equal number of previous chords, just as a human musician would think. If we only represented each element in the overall sequence once, we would be removing the important data on how we arrived at a chord that might be the first chord in a chunk, but not the first chord in a sequence.

Sequence chunking introduces a decision to be made when training the model. At a smaller chunk size, there will be more data. Additionally, with a smaller sequence, there are fewer possible patterns to recognize, and with the limited amount of data, convergence will be more likely. However, the overall structure of these chord sequences is also important. With a longer sequence, we can pay attention to the macro-structure of the song and the comping decisions. Ultimately, the size of the chunks can grow proportionally with the size of the data set.

We used chunks of three to train the model. We needed the chunks to be small to maximize the number of generated data points, but three was the tipping point before the chunks got too small, because three chords can still capture key jazz concepts. The two main examples we based this decision on were 2-5-1 progressions, which occur frequently in all types of jazz, and 5-4-1 progressions, which occur in most jazz blues progressions.

Alphabet

When working with an encoder-decoder model, we can have our input/output space be either a distribution or a one-hot encoding. One-hot encoding means forming an alphabet of possible inputs and outputs and classifying each input and output as one of the elements of the alphabet. Essentially, our choice is between a regression and a classification problem, where one-hot encoding is classification. The technology we are laying the groundwork for is an education tool, which should ideally never output chord fingerings that are never used or are unplayable by a human guitarist. Therefore, we use one-hot encoding rather than regression to constrain the output space and maximize correctness for educational purposes.
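Concretely, each symbol becomes a vector with a single 1 at its alphabet index (a minimal sketch; the alphabet itself is built in the next subsection):

```python
import numpy as np

def one_hot(symbol: str, alphabet: list) -> np.ndarray:
    """Classification target: a single 1 at the symbol's alphabet index."""
    vector = np.zeros(len(alphabet), dtype=np.float32)
    vector[alphabet.index(symbol)] = 1.0
    return vector
```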

Long Short-Term Memory Networks in Python (LSTMNIP) by Jason Brownlee inspired our alphabet structure [13]. In particular, section 9.3.3 demonstrates the specific notion of an alphabet as the union of two disjoint sets. In our case, one set contains chord names and one contains fingering patterns. Chord names are simple enough to define, and with some care the set does not become too large. There are 12 possible root names for chords. For our purposes, inversions are not named any differently than the root-position chord due to the assumption of a separate bass line. Then, we have a finite number of chord types, for example, {min7, maj7, 7, dim7, …}.

By generating all combinations we have our alphabet of chord names

\{\text{Amin7}, \ldots, \text{Adim7}, \ldots, \text{Gmaj7}, \ldots, \text{Gdim7}\}

Determining the set of all fingering patterns is a more difficult problem to solve. We could encode every possible fingering pattern. With around 22 frets on a standard guitar, the option to mute the string, and six strings to work with, we would have a tremendously large output space. Our dataset is not large enough to get any reliable classification on that output space, so we will need to constrain the output space.

The process that we used in our model training and evaluation was to form the alphabet only from seen chord fingerings, defined as fingering patterns that appear in the dataset. The final alphabet contained roughly 500 symbols. Before training the model, we preprocessed the dataset and extracted these fingering patterns to use as the alphabet. Using only seen chords provides three main benefits. First, our model trains much faster with this constrained alphabet space. Second, if a chord fingering does not exist in the data, we would never expect the model to output it even if that fingering existed in the alphabet. Finally, the alphabet can grow with the dataset: if some desired fingering position is not included in a certain alphabet, one only needs to expand the dataset to include that fingering.
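A sketch of the alphabet construction; the text encodings for chord types such as "7b9" are our own assumption, not necessarily the dataset's exact naming convention.

```python
ROOTS = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
CHORD_TYPES = ["min", "min7", "min6", "6", "maj6", "7", "9", "",  # "" = major
               "7b9", "7b5", "min7b5", "aug", "dim7", "maj7"]

def build_alphabet(dataset):
    """Union of two disjoint sets: all chord names, plus only seen fingerings."""
    chord_names = [root + ctype for root in ROOTS for ctype in CHORD_TYPES]
    seen_fingerings = sorted({v for song in dataset for v in song["voicings"]})
    return chord_names + seen_fingerings   # roughly 500 symbols for our data
```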

The notable drawback of this alphabet is that a fingering pattern in the held-out ground truth for the testing set might not be in the training labels. We could have condensed this alphabet further to only allow fingerings in the held-out testing set, giving our model's output the best chance of matching the testing-set ground truth. However, this constraint would have been too restrictive for a dataset of our size. Later, in section 4.3, we discuss evaluating the model beyond a binary correct/incorrect comparison against held-out ground truth. This holistic analysis allows us to work with our larger alphabet because we are not penalizing the model for a fingering pattern that did not exist in the training set.

Model

Model Architecture

We use an encoder-decoder model to perform sequence-to-sequence generation. Our encoder consists of two LSTM layers, each with an output smaller than the previous one. This encoder takes in the input as a one-hot encoded sequence of chord names and outputs a latent-space representation of these chords. The decoder consists of two LSTM layers and a dense layer, which increase in output size as the model feeds forward. The dense layer outputs the generated chord fingering. We trained this model with a categorical cross-entropy loss function.

LSTMNIP provides a sequence-to-sequence encoder-decoder LSTM that can evaluate the addition of strings of integers without hard-coding addition or string parsing. We expand on this model by increasing the number of LSTM layers for both the encoder and decoder to allow for the increased complexity of recognizing multiple features of a chord name or fingering. Additionally, we increase the number of nodes in each layer to accommodate our large alphabet.
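A Keras sketch of this architecture; the layer widths are illustrative assumptions, not the exact values we used.

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 3    # chord chunks of length three
VOCAB = 500    # set from the real alphabet size

model = keras.Sequential([
    # Encoder: two LSTM layers, each with a smaller output than the last.
    layers.LSTM(256, return_sequences=True, input_shape=(SEQ_LEN, VOCAB)),
    layers.LSTM(128),
    # Repeat the latent vector once per output chord in the sequence.
    layers.RepeatVector(SEQ_LEN),
    # Decoder: two LSTM layers and a dense layer that grow back out.
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(256, return_sequences=True),
    layers.TimeDistributed(layers.Dense(VOCAB, activation="softmax")),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```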

Training

We trained our model with Keras categorical cross-entropy loss, the Adam optimizer, a batch size of 5, and 10 epochs. To accurately determine how well our model was training, we employed a modified definition of mean reciprocal rank (MRR) [14].

MRR is typically used for classification models. The output of a standard classification model is a list of probabilities corresponding to each one-hot encoded class, where the rank of a ground-truth label is its index in the sorted list of probabilities. So, if our label was the fingering pattern '5-x-5-5-5-x' and the list of predictions in sorted order by likelihood was ['x-3-5-5-4-x', '5-x-5-5-5-x', ...], our rank would be 2. MRR is the mean reciprocal of that rank. Therefore, the standard definition of MRR on a set of test data T would be:

MRR = \frac{1}{|T|} \sum_{e \in T} \frac{1}{rank(e)}

MRR is a useful accuracy metric for training on our dataset because we sought a holistic view of our training convergence while remaining cognizant of our limited data. MRR was a natural choice because correct predictions are rewarded, but incorrect predictions can still earn partial credit. For instance, if we achieve an MRR above 0.5, we know that on average our ground-truth label was ranked inside the top two predictions.

To use MRR with our sequence-to-sequence model, we modified the MRR definition to work with sequences. This new definition follows two steps. First, for each sequence, we take the mean reciprocal rank over the sequence. Then, we take the mean of these sequence-level MRR values. This definition maintains the core idea of MRR while extending it to sequences. Additionally, since our alphabet is the union of the possible chord names and the possible fingerings, we amend typical MRR to only consider the rank of a correct prediction among the fingering-only alphabet. Therefore, letting our total fingering prediction array be F_p and t be some ground-truth label, we say

rank(t, F_p) = |\{e \in F_p : e \geq t\}|

Our MRR must be calculated over the entire set of chord sequences. Let C be the set of all chord sequences, F_p be the total fingering prediction array, and T be the set of true labels, where T(S, i) is the true label for the i-th chord in sequence S. Our MRR is then defined as

\sum_{S \in C} \sum_{i=1}^{|S|} \frac{1}{|C||S|\,rank(T(S, i), F_p)}

For our purposes, we always use a chord sequence length of 3, so we can simplify this calculation to

\sum_{S \in C} \sum_{i=1}^{3} \frac{1}{3|C|\,rank(T(S, i), F_p)}
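A sketch of this sequence-level MRR over softmax outputs; the array shapes and argument names are our own assumptions.

```python
import numpy as np

def sequence_mrr(pred_probs, true_idx, fingering_idx):
    """pred_probs: (num_seqs, seq_len, |alphabet|) softmax outputs.
    true_idx: (num_seqs, seq_len) alphabet indices of the ground truth.
    fingering_idx: indices of the fingering-only half of the alphabet."""
    seq_means = []
    for probs_seq, truth_seq in zip(pred_probs, true_idx):
        reciprocal_ranks = []
        for probs, t in zip(probs_seq, truth_seq):
            # rank(t, F_p) = |{e in F_p : e >= t}|, fingering symbols only
            rank = int(np.sum(probs[fingering_idx] >= probs[t]))
            reciprocal_ranks.append(1.0 / rank)
        seq_means.append(np.mean(reciprocal_ranks))  # MRR over one sequence
    return float(np.mean(seq_means))                 # mean of sequence MRRs
```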

We trained the model with the settings above (categorical cross-entropy loss, Adam optimizer, batch size 5, 10 epochs) and measured the improvement in MRR at each epoch. Although the MRR curve gives evidence that our model would still be improving after 10 epochs, it increases at a decreasing rate after epoch 6. Because our dataset is limited, we grew concerned about overfitting given the diminishing marginal utility of each added epoch, so we stopped at 10.

Evaluation

We built this model to work as an education tool for learning GVLT, so we need to account for this use case when evaluating model performance. There are many ways to play a chord sequence, and determining which way is "the best" involves some subjectivity. Standard voice-leading theory can help us determine whether a fingering pattern is good, but searching for the single "best" fingering pattern is misguided. So, we examine how well the model did from a holistic viewpoint rather than simply checking whether the model matched the held-out ground-truth label. For instance, voice-leading theory dictates that increasing the number of common tones between chords, defined here as the number of shared absolute positions (string and fret) between chords, is a good attribute to optimize. We therefore create two bounding functions for how well our model performed: an upper bound counting whether a chord fingering objectively matches the chord name, and a lower bound counting exact matches between ground-truth and model-outputted labels.

Examining individual chord sequence predictions versus the ground-truth held-out labels lets us get insight into the strengths and shortcomings of this model. We highlighted four main categories: exact matches between the model and ground truth, correct voicings from the model that improve on the ground truth, correct voicings from the model that do not voice lead as well as the ground truth, and an example where the model is not voicing the correct chord.

Category 1: The model exactly matched the ground-truth labels on the song "Slow Blues".

|  | F#min7 | F7 | E7 |
| --- | --- | --- | --- |
| Model | 9-x-7-9-x-x | 8-x-7-8-x-x | 7-x-6-7-x-x |
| Label | 9-x-7-9-x-x | 8-x-7-8-x-x | 7-x-6-7-x-x |

Category 2: The model differs from the ground-truth labels, but the chord voicings are still correct for the chord, on the song "Slow Funky Shuffle". In this case, the model provides better voice leading than the ground truth, requiring shorter overall finger movement: the model's pair of F7 and B7 voicings shares two common tones, and the remaining tone only needs to move down a half step.

|  | Cmin7 | F7 | B7 |
| --- | --- | --- | --- |
| Model | 3-x-1-3-x-x | 8-x-7-8-x-x | 7-x-7-8-x-x |
| Label | 3-x-1-3-x-x | 1-x-1-2-x-x | 7-x-7-8-x-x |

Category 3: The model differs from the ground-truth labels, and even though the model's labels are correct voicings of the chords, they employ worse voice-leading principles than the ground truth, on the song "B♭ Blues". The model correctly identified that there could be common tones between the G#7 and the Adim7, but only found voicings that permit one common tone, whereas the ground-truth labels contained three common tones. Additionally, the model found no common tones between the D#7 and G#7 even though a D# could have been shared. One possible explanation for this difference is the size of the dataset. Most of the training data came from the All Blues for Jazz Guitar book, which emphasizes the lower strings, whereas the Ennis examples have more chords on the upper strings. Therefore, the model may not have seen enough training data to understand higher-string chords and their importance for voice leading.

|  | D#7 | G#7 | Adim7 |
| --- | --- | --- | --- |
| Model | 6-x-5-6-x-x | 11-x-10-11-x-x | x-12-13-11-13-x |
| Label | x-x-8-8-8-9 | x-x-6-8-7-8 | x-x-7-8-7-8 |

Category 4: The model-generated chords contained a mistake on "Blues Basis". The D#min7 voicing that the model generated is a voicing for a D7 chord, containing D, C, and F# notes, of which only the F# belongs to D#min7.

|  | C#7 | A#7 | D#min7 |
| --- | --- | --- | --- |
| Model | 9-x-9-10-x-x | 6-x-6-7-x-x | 10-x-10-11-x-x |
| Label | 9-x-9-10-x-x | 6-x-6-7-x-x | 6-x-4-6-x-x |

We would like to count each of these four cases over the entire testing dataset. For the categories where the model output exactly matches the ground-truth label and where the model outputs an incorrect chord, this counting is easy. However, it is difficult to differentiate between the two cases where the model does not match the ground-truth label but the chords are technically correct; the difference between them is whether the model outperforms the ground truth. We cannot resolve this automatically, because doing so would require understanding more than our ground-truth dataset contains. However, we are interested in how "good" our model's performance is, and in defining goodness we include both the cases where our model matches the ground truth and the cases where it outperforms the ground truth.

We propose an estimate of this measure of "good" performance by bounding it with two known functions. For the lower bound, we know the rank of a true label with respect to the fingering predictions from our earlier definitions. The rank gives a lower bound because if the ground-truth label is ranked within some cutoff x, then the model has converged on the ground truth and all chords must be valid fingerings. For the upper bound, we can check whether each chord fingering in the sequence is a valid chord. This is an upper bound because if valid chords appear within the same cutoff x, we only know that the whole sequence is valid. We can formalize this by letting W_xij and Y_xij be indicator variables for the upper and lower bounds, respectively, letting t_ij be the true label of chord j in sequence i, and letting F_ij be the fingering predictions for chord j in sequence i. Letting sort(n) denote sorting the list n from high to low, we can write

W_{xij} = \left\{ \begin{array}{ll} 1 & \quad \text{if there is some } f \in sort(F_{ij})[1:x] \\ & \quad \text{such that } f \text{ is a valid voicing}\\ 0 & \quad \text{otherwise} \end{array} \right.

and

Y_{xij} = \left\{ \begin{array}{ll} 1 & \quad \text{if } rank(t_{ij}, F_{ij}) \leq x \\ 0 & \quad \text{otherwise} \end{array} \right.

Therefore, letting n be the number of sequences, s be the length of each sequence, U(x) be the upper-bound function, and L(x) be the lower-bound function, we have

U(x) = \sum_{i=1}^{n} \min_{j=1}^{s} W_{xij}\\ L(x) = \sum_{i=1}^{n} \min_{j=1}^{s} Y_{xij}

We use min here instead of the arithmetic mean we used in the MRR metric because we want a guarantee that each of the chords in sequence i has a valid or matching (depending on U or L) value before the cutoff x. We can now graph U and L on the same plot.
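A sketch of computing both bounds at a cutoff x, reusing the voicing validity check from section 3.2; the array shapes and the is_valid helper signature are our own assumptions.

```python
import numpy as np

def bounds_at_cutoff(pred_probs, true_idx, chord_names, alphabet, is_valid, x):
    """U(x) and L(x): counts of sequences whose every chord clears the cutoff.
    is_valid(symbol, chord_name) is the section 3.2 verification check."""
    U = L = 0
    for probs_seq, truth_seq, names_seq in zip(pred_probs, true_idx, chord_names):
        w_all = y_all = True                     # min over indicators == all()
        for probs, t, name in zip(probs_seq, truth_seq, names_seq):
            top_x = np.argsort(probs)[::-1][:x]  # x highest-ranked symbols
            w_all = w_all and any(is_valid(alphabet[i], name) for i in top_x)
            y_all = y_all and (t in top_x)       # rank(t) <= x
        U += int(w_all)
        L += int(y_all)
    return U, L
```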

Examining this graph provides insight into the performance of our model on the testing data. For instance, a "good" prediction occurs in the highest-ranked output about 20-40% of the time. To increase that portion to 50%, we need to look at roughly the first 10 predictions. Our model starts to steadily weaken above 60% of samples having "good" predictions, and to get around 90% of the samples to have "good" predictions, we would need to look at the top 50 outputs of the roughly 450-symbol output space.

Conclusion and Future Work

This study introduced the idea of algorithmically optimizing voice-leading guitar chord fingerings. With a hand-gathered dataset, we utilized transpositions and sequence chunking to expand this data to a viable scale. With our sequence-to-sequence LSTM architecture, we see "good" predictions ranked highly for a large portion of the testing dataset, a strong convergence considering the size of our data. A case-by-case analysis of the model's predictions on held-out annotated data showed promising trends, with a real grasp of fundamental concepts such as mapping chord labels to fingerings and minimizing the distance between chords.

Future work could address the size of the dataset and the assumptions made to obtain a working model. Expanding the dataset with more examples of voice leading through jazz standards could help a model achieve a greater MRR and allow for longer input sequences. With enough data, such a model could span an entire song with a unified voice-leading idea. As for assumptions, the key assumption in this model was that voice leading is done only by examining chords. However, most professional jazz musicians also follow the melody when voice leading. By including the melody as an added data point for each set of chords, a model may converge even better on the ground truth.

Future work could also be done towards making the education tool more user-friendly. Building a user interface that allows for inputting a desired set of chord changes as name labels and a digestible output tablature format would help the user learn. Additionally, we could use the verification script from section 3.2 to eliminate any tablature outputs that do not play the inputted chord label.

Ethics Statement

This endeavor seeks to make advanced guitar playing more accessible. A system built with these findings would provide intermediate jazz guitarists with advanced chordal practice examples, which are difficult to obtain without an instructor or textbook access. These findings do not heavily impact the standard beginner guitarist whose first task would be learning the instrument’s underlying mechanics.

We relied on participation from our university's guitar instructor to create ground-truth examples of chord playing. The instructor was made aware, over email and in person, that these examples would be used to train a machine-learning model. We acknowledge that this instructor can only give one expert's opinion, for two total expert opinions when combined with Ferguson. Other expert guitarists could have different, equally correct ways of playing the same chord progressions. We acknowledge that other playing patterns are valid and attempt to analyze our model holistically in the evaluation section.

We acknowledge that machine-learning models can have negative externalities in their applied area. In particular, we had to consider how this model could impact modern guitar teaching. This model does not seek to replace a guitar instructor. If a guitarist has access to instruction, these tools are intended to supplement and be used to practice the lessons provided in standard guitar instruction.
