
The Artist in A.I. Art

An argument for the role of the artist in the future of Generative A.I. produced artworks

Published on Aug 29, 2024

Abstract

Recent advances in artificial intelligence have created user-friendly production systems that allow novices to generate complex images, videos, and music from text prompts. This paper discusses the impact of Generative AI on the role of the artist, considering whether these systems could become the purview of neophytes who can now claim the status of artist, and how AI systems can move beyond tools to become partners in creative collaboration. The author argues that only artists can exploit the full potential of Generative AI systems.

Author Keywords

Generative AI, artist, musical metacreation, aesthetics

Introduction

The musical metacreation [1] (MuMe) community – a leading forum for the discussion of the newest developments in generative music in the 2010s – had a diverse population, ranging from artists to computer scientists. Many of its publications described artistic production systems that were generative with and without artificial intelligence; most systems discussed and presented were personal and bespoke. However, recent advances in artificial intelligence [2] during the AI Spring [3] have produced user-friendly systems that allow novice users to generate complex images [4], video [5], and music [6] from text prompts. This raises the question of whether someone using such a system is automatically considered an artist, and even whether the role of “artist” is now antiquated, as anyone and everyone can seemingly create art.

One of the dichotomies in the earlier MuMe era was the clear difference between how practitioners and scientists approached such production systems’ “creativity”. This is evidenced by Wiggins’ review [7] of Cope’s “Computer Models of Musical Creativity” [8], in which he labelled Cope’s work “pseudo-science” and criticised, among other aspects, Cope’s practice of selecting individual items from EMI’s output for presentation. Wiggins, and others in the scientific community [9], seemed to have little interest in traditional evaluations of artistic success, which were first left to the artists themselves – i.e. whether the work was complete, ready, and worthy to be shown; while secondary evaluation, such as acceptance into a festival or gallery, was also desirable, positive general public reception was not a requirement. Artistic evaluation was seen as problematic, and only more scientific methods could prove the value of a creative system [9]. Wiggins’ desire for a purely scientific solution to computational creativity would eliminate the subjective artist from the equation; he seemed to suggest that because art isn’t science, its practitioners are incapable of describing what they do, and it is therefore up to science to rectify this shortcoming [10]. Admittedly, the Wiggins vs. Cope situation was somewhat unique: Cope was producing computer-generated music at a time (the 1990s) when such practices were viewed suspiciously [11], and he thus attempted to frame his work more scientifically [12]; it was this framing that Wiggins challenged.

Evaluation of creative systems continues to be problematic [13]; while there has been brave research into objective evaluation of creative systems and their output [14][15][16], one remaining major obstacle is determining the level of creativity exhibited by the machine itself, versus that of “clever programming” by the artist [17].

Beyond issues of evaluation, I have argued for the continued need for artistic involvement in generative systems, particularly those that involve AI [18][19][20], chiefly because artists have spent lifetimes making art and are best positioned to determine its future direction. I acknowledge that I risk the charge of elitism; others may argue that artmaking should be egalitarian, especially now that domain knowledge may no longer be necessary. Instead of spending years learning the intricacies and power of harmony, for example, one will soon be able to use AI to generate endless chord progressions without understanding what a triad is. Other foci for creative musical artists, such as the delicate balance between repetition and novelty within a formal musical structure – formerly requiring not just study but also compositional practice – will be resolved by Deep Learning of existing music using the massive database of the internet. Notwithstanding issues of authorship – as that is another paper altogether – Wiggins’ desire to remove the artist may be upon us.

Background

I have been creating generative systems for over thirty-five years; in almost every case, the goal of such systems is not only to reflect the music I want to hear but also my conception of how music functions. In other words, I bring to each system not only the knowledge accumulated through my training and experience but also my bias. My generative systems could not function for others, as they would only produce my music due to the limits I purposefully imposed; they have never been general-purpose systems.

The latest Generative AI systems have been the work of teams and thus do not reflect the creative bias of any one individual: the various interfaces to DALL-E [21] are meant to be general-purpose. It should be noted that the current state of such text-to-image systems is limited when compared to text generation [22]; music generation is even further behind. For example, Meta’s 2023 MusicGen is unable to produce anything of interest beyond its limited-duration demo tracks [23], demonstrating current limitations in comprehending the complexity of both musical time and context. However, given that these systems improve almost monthly, it seems inevitable that prompt-based generation of complete, high-quality songs is close at hand.

An early paper on MuMe taxonomy [24] suggests the final level of metacreation would be volition: the ability of the system to decide, on its own, what kind of music it wants to make. This may seem impossible at this time for a variety of reasons. First, human artists tend to create within a restricted creative space of which they have taken years to develop a thorough understanding, including its implicit rules and its defining characteristics. They will also have internalised other artworks that exist within it – potential influences – while rejecting others; style involves not only exemplars but also excluded examples. In other words, the conceptual space encompasses an artist’s style and aesthetic preferences. How would a generative system define such a space? Would it exist within extant spaces, or could it define its own conceptual space?

Few artists are capable of straddling multiple genres and styles, let alone inventing new ones. One artist who comes to mind is Robert Glasper, who successfully produces jazz, R&B, hip hop, neo-soul, and soundtracks: to do so, Glasper has demonstrated expertise – both comprehension and execution – within each style, thereby avoiding notions of dabbling or dilettantism. Perhaps a computer system will soon be able to listen to all the music on the internet, understand every genre, and create within each conceptual space – even mix between them; but what will the role of humans be at that point?

Generative AI as Tool

The role of any tool is to make tasks easier, and the latest software technologies employing machine learning and AI are no different. As with almost every technological advancement, the ability to execute formerly complex tasks with greater ease has been met with resistance by those whose knowledge and training are threatened by progress (which seems to be the point of this paper, sadly enough).

That said, the ability to generate complex, evolving, and interesting music without extant musical ability has existed on cellphones for almost fifteen years, since the release of Bloom [25] by Brian Eno and Peter Chilvers in 2008 [26]. Bloom allowed users to tap the screen and have those actions translated into beautiful sounds and visuals that repeated in a virtual feedback loop before eventually fading, a variation on the process in Eno’s 1975 work Discreet Music. Eno claimed that the iPhone interface allowed users to “own the process rather than the results of the process”, in that nothing was saveable [27]. Although the app was successful – Gizmodo called it “the first great iPhone app” [28] – it seems to have had very little traction within the generative music scene. One could argue that Bloom doesn’t advance much beyond the level of a toy: yes, it makes lovely music, but the user is not invested in the result, as it takes very little effort to produce a convincing one. This raises a question: if pleasing outcomes are so easily attained, what is the motivation to improve on them, or to improve oneself? While advocates of generative music may point out that the output of any generative system is mercurial, and no single output is superior to another, one wonders if minor masterpieces produced by Bloom were even noticed, let alone appreciated, by users approaching it as they would a cellphone game. Eno’s later generative work for the iPhone, 2017’s Reflection, is for a passive audience: while generative, there is no doubt that it is Eno’s work the user is experiencing, rather than their own.
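
To make the mechanism concrete, below is a minimal sketch of a Bloom-like process – my own reconstruction, not Eno and Chilvers’ actual implementation, with all timings, pitches, and decay values assumed: each tap becomes a note that repeats once per loop at diminishing volume until it fades away.

```python
import math
import struct
import wave

SR = 44100      # sample rate (Hz)
LOOP = 4.0      # loop length in seconds (assumed; Bloom's actual timing differs)
DECAY = 0.6     # amplitude multiplier on each repetition
REPEATS = 8     # stop once the echo is effectively inaudible

# Each "tap" is (time within the loop, pitch in Hz) -- illustrative values.
taps = [(0.5, 392.0), (1.2, 440.0), (2.3, 523.25), (3.1, 587.33)]

buf = [0.0] * int(SR * LOOP * REPEATS)

def add_note(start, freq, amp, dur=1.5):
    """Mix a sine tone with a simple exponential envelope into the buffer."""
    for i in range(int(SR * dur)):
        t = i / SR
        idx = int(SR * start) + i
        if idx < len(buf):
            env = math.exp(-3.0 * t)  # fast fade, bell-like
            buf[idx] += amp * env * math.sin(2 * math.pi * freq * t)

# Every tap repeats once per loop, quieter each time, until it fades away.
for offset, freq in taps:
    amp = 0.3
    for rep in range(REPEATS):
        add_note(offset + rep * LOOP, freq, amp)
        amp *= DECAY

# Write the result as a 16-bit mono WAV file, clipping to [-1, 1].
with wave.open("bloom_sketch.wav", "w") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(SR)
    f.writeframes(b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in buf))
```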

Bloom is an example of a concept or generative idea with interactive possibilities: the user cannot alter any parameter the designers did not expose. Although there are some general presets in the app – with lovely names such as Neroli, Vetiver, Ylang, Labdanum – their function remains hidden and unknowable to the user. A user desiring more control, and fewer limitations on their interactions, mappings, and resulting music, would inherently require a general-purpose tool such as those built on OpenAI’s models.

One could imagine in the near future a prompt-based text-to-music system: the prompt “create some cool beats” from a neophyte would produce some kind of music, although the system would need to make inferences about what “cool” means in relation to existing musical styles and genres, injecting required bias. A more stylistically descriptive prompt, “create a techno song”, would provide more detail from which the system could search for examples, although “techno” is itself still a broad descriptor. Better yet, “create a Detroit techno song in the style of 1990s Derrick May” may produce something more stylistically accurate; however, any output from this prompt would be in danger of violating copyright, as there are only a few examples from which to learn. Instead of parroting back existing beats and structures found in the limited database, the system would need to understand the conceptual space that Derrick May’s music from the 1990s occupies: not an easy task.

The process of continually refining one’s prompts to achieve a more desirable result from a Generative AI system has been termed prompt engineering, as sketched below. This activity harkens back to artists’ iterative processes of execution, followed by critical and objective review, followed by further execution; it is also reflected in algorithmic art’s notion of “don’t edit the result, edit the process” [29]. While prompts in text-to-image models often begin with stylistic constraints or artist-specific requests – “in the style of…” [30] – these have been restricted in some text-to-music models: Google’s MusicLM [31] has implemented a filter that blocks the mention of specific artists to avoid copyright issues. My own highly curated database used for harmonic generation may seem to break copyright law; happily, harmony cannot be copyrighted [32].
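
In code, prompt engineering amounts to a simple execute-review-refine loop. The sketch below is purely illustrative: generate_music is a hypothetical placeholder, not any real text-to-music API, and the review step stands in for the artist’s critical judgment.

```python
def generate_music(prompt: str) -> str:
    # Hypothetical placeholder: a real system would return audio, not a string.
    return f"<audio generated from: {prompt!r}>"

def acceptable(result: str) -> bool:
    # Stand-in for the artist's critical and objective review of the output.
    return input(f"Keep {result}? [y/n] ").strip().lower() == "y"

prompt = "create a techno song"
while True:
    result = generate_music(prompt)
    if acceptable(result):
        break
    # "Don't edit the result, edit the process": revise the prompt, not the audio.
    prompt = input("Revised prompt: ")
```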

At the moment, the developmental direction of text-to-music models seems to parallel that of text-to-image models: prompts generate completed works. The latter produces images by combining portions of existing images – however small – in its database; the former does likewise, thus far resulting in poor-quality audio as the model attempts to transpose and time-stretch the audio grains pulled from its database of songs. One wonders if these artefacts are the equivalent of the “problem of fingers” in text-to-image generators [33], possibly soon eradicated through sheer technical willpower. The larger failure here is that music is not created in the same way as text or image: it is created in layers.

Instead of conceptualising text-to-music models as potential composer-producers, eliminating the role of the human beyond that of prompt generator, a more reasonable and useful methodology would be to consider these models as tools, albeit extremely powerful ones. In this way, a more traditional working method for the creation of a pop song, for example, would proceed through the aforementioned layers: first, generate a harmonic progression; then a melody to match the harmony; then an overall structure within which both melody and harmony can function; and finally, fill out the instrumental parts, including beats (acknowledging, of course, that AI need not follow the same processes as humans). Although this process may allow neophytes to operate somewhat successfully, the output will quite possibly follow that of text-to-image systems: several outputs are produced, from which the user selects and further develops. This requires the user to judge multiple outputs – the very definition of a fitness bottleneck. An expert user would therefore benefit far more than a novice, as they could both provide better prompts and identify superior output faster. Regardless of whether an amateur or expert interacts with such a system, the resulting interaction is still an example of user and tool.
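
The layered ordering described above can be sketched as a pipeline, as below. The generator functions are toy placeholders of my own invention, not any existing model’s API; only the dependency ordering of the layers reflects the text.

```python
import random
from dataclasses import dataclass

@dataclass
class Song:
    harmony: list    # a chord progression
    melody: list     # pitches constrained by the harmony
    structure: list  # section labels
    parts: dict      # instrumental layers, including beats

def generate_harmony(style: str) -> list:
    # Layer 1: a chord progression. A real system would condition on style.
    return random.choice([["C", "Am", "F", "G"], ["Dm", "G", "C", "C"]])

def generate_melody(harmony: list) -> list:
    # Layer 2: a (toy) melody note per chord, conditioned on the harmony.
    chord_tones = {"C": "E", "Am": "A", "F": "F", "G": "B", "Dm": "D"}
    return [chord_tones[chord] for chord in harmony]

def generate_structure(harmony: list, melody: list) -> list:
    # Layer 3: an overall form within which harmony and melody can function.
    return ["intro", "verse", "chorus", "verse", "chorus", "outro"]

def generate_parts(structure: list, harmony: list, melody: list) -> dict:
    # Layer 4: fill out the instrumental parts, including beats.
    return {"drums": "four_on_floor", "bass": [c.lower() for c in harmony]}

def compose(style: str) -> Song:
    harmony = generate_harmony(style)                    # layer 1
    melody = generate_melody(harmony)                    # layer 2 depends on 1
    structure = generate_structure(harmony, melody)      # layer 3 depends on 1-2
    parts = generate_parts(structure, harmony, melody)   # layer 4 depends on all
    return Song(harmony, melody, structure, parts)

print(compose("pop"))
```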

Collaboration

Interaction between artists and AI can progress beyond such a limited relationship, moving toward true collaboration [34]. Collaboration surpasses interaction in that it involves give-and-take on both sides rather than mere reaction. The collaborative goal may be unknown; the collective actions of the parties will require a willingness to change along the path toward it. Furthermore, contemporary artistic collaboration is often non-hierarchical, using aspects of devising [35]. The 20th century offers many examples of more typical hierarchical collaborations in which artists work to fulfil one person’s vision [36]: film directors, for example, have tended to hold onto this paradigm. The latter model more closely reflects the aforementioned relationship of user as all-knowing leader and tool as abiding follower.

I consider my relationship with my generative systems to be one of collaboration [37], in that there have been many times when the system – specifically musebots [38] – has suggested a different path or outcome than I had imagined. The goal of my systems is not only to produce artworks that I find useful – which I would consider equal to anything I could produce without their involvement – but also to surprise me, which requires more than straightforward novelty. My systems are not open-ended, in that they have clear limitations (usually in style); but I acknowledge those limitations just as I would acknowledge the strengths and weaknesses of any human collaborator.

Such collaboration is a delicate balance and one that could easily tumble into a Procrustean relationship. Procrustes was the innkeeper from Greek mythology who claimed he had a bed that would fit any traveller; what he failed to mention was that if the bed were too short, he would cut off the feet of the traveller, and if the bed were too long, he would hammer out the traveller’s legs and stretch them to fit. Creative software can provoke similar interactions: if the particular working method matches that of the user, it will make the creative tasks easier to accomplish; if, however, the paradigms do not match, the user is forced to adapt and change not only their methodology but potentially their goals as well. As suggested, the difference between such a relationship and one of collaboration is delicate.

Generality vs. Strength

I have argued above that for users to explore Generative AI from their own perspective, rather than through another artist’s bespoke system, a general-purpose system is required. This idea is described in Truax’s early paper on computer music systems [39], which suggests a continuum between general systems, which can do many things but require more information from, and control by, the user, and strong systems, which can do only a few things but execute them with minimal requirements. Personal generative music systems, if they function like my own, will tend to produce a limited stylistic range of music, but ideally do so both successfully and efficiently; to achieve this, an inherent bias must be instilled in the machine. Arguably, a generative system should be strong, while the coding language used to produce it requires generality.

My latest generative system has only two control parameters – valence and arousal [40] – making it a strong system that requires little user input. These two parameters allow for large-scale variation in the resulting output; however, as the designer, I mapped what valence means with respect to rhythm, harmony, structure, duration, and other musical parameters, thereby limiting its generality.
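
The sketch below illustrates how two controls can fan out into many musical parameters; the specific mappings are illustrative assumptions of mine, not the actual musebot design.

```python
def map_valence_arousal(valence: float, arousal: float) -> dict:
    """Map valence and arousal (each in 0..1) onto musical parameters.

    All mappings here are hypothetical examples of designer-imposed bias.
    """
    assert 0.0 <= valence <= 1.0 and 0.0 <= arousal <= 1.0
    return {
        "tempo_bpm": 60 + 100 * arousal,            # higher arousal -> faster
        "rhythmic_density": 0.2 + 0.8 * arousal,    # busier surface activity
        "mode": "major" if valence > 0.5 else "minor",
        "harmonic_tension": 1.0 - valence,          # low valence -> more dissonance
        "register_center": 48 + int(24 * valence),  # MIDI note: brighter register
        "section_length_bars": 16 if arousal < 0.5 else 8,
    }

# e.g. a calm, bright setting suited to ambient output
print(map_valence_arousal(valence=0.8, arousal=0.3))
```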

This system was originally designed to generate ambient music with somewhat complex harmonic progressions; it generates complete compositions, including selecting appropriate synthesiser timbres; my role, after selecting valence and arousal values, was that of curator, deciding which outputs were considered of personal musical interest and surprise (the acceptance rate was approximately 4 out of 5, or 80% of the generated works, so perhaps I’m easily surprised). The coding of the system took well over a year; however, a working version existed after approximately one month: the remaining time was spent listening carefully to its output and making musical decisions on how to improve it, hoping to find what Ollie Bown has called “the sweet spot” of predictability and surprise [41].

A Tolerably Detached Tone, from A Walk to Meryton (2023). Barbara Adler, text and reading; Meredith Bates, violin; Jon Bentley, saxophone.

I include the above as an example of something of which I am immensely proud: the musebots have produced music that achieves my personal balance between complexity and beauty. I viewed the generated works as the culmination of decades of practice within the field of generative art, so much so that I felt the need to create physical objects in the form of a double-vinyl release.

By altering which musical parameters the valence and arousal settings controlled – again, musical decisions based on years of training and experience – the same system was also able to produce uptempo music for robotic duo and jazz trio.

Clearing Prior to Empty (2024). Jon Bentley, saxophone; John Korsrud, trumpet; James Meger, bass.

These systems produce music worth listening to, in my biased opinion; whether or not a listener enjoys the music, its success is achieved not by the happenstance of a good prompt, but through expert design.

Conclusion

Generative AI is the latest technology available to artists, and like any leap in technological advancement, it is bound to disrupt artistic practice. With the introduction of sound synthesis and the first music synthesizers in the 1950s, there was a fear that these new instruments would replace musicians [42]; while that did not fully pan out, synthesizers have fundamentally disrupted both recording practices and many touring musical productions. One could imagine that, with the invention of European musical notation 1200 years ago, there were monks who complained that music was becoming too easy, with less reliance on memory and aural transmission; while those specialists at remembering the old chants may have become less important at that point, the potential to write down musical ideas was an unimaginable boost to creativity for others.

We are at a similar paradigm-shifting point in musical evolution. But just as the invention of musical notation allowed the most creative individuals to produce music impossible to conceive without it (while those same individuals no doubt endured complaints from a previous generation that notation now allowed anyone to create music), and just as synthesizers can produce music well beyond the limits of acoustic instruments (thus extending rather than replacing those instruments), Generative AI – much more than a toy for neophytes to play pretend-Producer – may usher in new forms of music previously unimaginable, if it remains in the hands of artists.

Ethics Statement

This paper is a personal reflection on aesthetics and does not involve any actual data; as such, there should be no concerns over data privacy and/or socio-economic unfairness. No research was conducted on humans (or animals). The paper was written during my academic employment and did not require any external grant funding. Although I was a co-owner of a musical AI company between 2014 and 2018, I am no longer associated with the company, so there are no financial conflicts of interest.
