Pulsed Melodic Affective Processing: Musical Structures for Increasing Transparency in Emotional Computation

Simulation: Transactions of the Society for Modeling and Simulation International
2014, Vol. 90(5) 606–622
© 2014 The Society for Modeling and Simulation International
DOI: 10.1177/0037549714531060
sim.sagepub.com

Alexis Kirke and Eduardo Miranda

Abstract

Pulsed Melodic Affective Processing (PMAP) is a method for the processing of artificial emotions in affective computing. PMAP is a data stream designed to be listened to, as well as computed with. The affective state is represented by numbers that are analogues of musical features, rather than by a binary stream. Previous affective computation has been done with emotion category indices, or real numbers representing various emotional dimensions. PMAP data can be generated directly by sound (e.g. heart rates or key-press speeds) and turned directly into music with minimal transformation. This is because PMAP data is music and computations done with PMAP data are computations done with music. This is important because PMAP is constructed so that the emotion that its data represents at the computational level will be similar to the emotion that a person “listening” to the PMAP melody hears. Thus, PMAP can be used to calculate “feelings” and the result data will “sound like” the feelings calculated. PMAP can be compared to neural spike streams, but ones in which pulse heights and rates encode affective information. This paper illustrates PMAP in a range of simulations. In a multi-agent simulation, initial results support that an affective multi-robot security system could use PMAP to provide a basic control mechanism for “search-and-destroy”. Results of fitting a musical neural network with gradient descent to help solve a text emotional detection problem are also presented. The paper concludes by discussing how PMAP may be applicable in the stock markets, using a simplified order book simulation.
Keywords

Communications, human–computer interaction, music, affective computing, Boolean logic, neural networks, emotions, multi-agent systems, robotics

1. Introduction

This paper is an investigation into the use of melodies as a tool for affective computation and communication in artificial systems, through a connectionist architecture, a simulation of a robot security team, and a stock market tool. Such an idea is not so unusual when one considers the data stream in spiking neural networks (SNNs). SNNs have been studied both as artificial entities and as part of biological neural networks in the brain. These are networks of biological or artificial neurons whose internal signals are made up of spike or pulse trains that propagate through the network in time. Bohte et al. [1] have developed a back-propagation algorithm for artificial SNNs. Back-propagation is one of the key machine learning algorithms used to develop neural networks that can respond intelligently. It is an established practice for scientists to listen to amplified neural spike trains via loudspeakers as a method of navigating the location of an electrode in the brain [2], and it is interesting to note that a series of timed pulses with differing heights can be naturally encoded by one of the most common musical representations used in computers: the Musical Instrument Digital Interface (MIDI) [3]. In its simplest form MIDI encodes a melody, which consists of note timing and note pitch information.
In this paper we argue that melodies can be viewed as functional and recreational – they can fulfill the function of encoding an artificial emotional state, in a form that can be used in affective computation tasks directly expressible to human beings (or indeed to other machines). The basis of the data stream used in this paper for processing is a pulse stream in which the pulse rate encodes tempo, and the pulse height encodes pitch.

Interdisciplinary Centre for Computer Music Research, School of Humanities, Music and Performing Arts, University of Plymouth, Plymouth, UK

Corresponding author: Alexis Kirke, Interdisciplinary Centre for Computer Music Research, Plymouth University, Smeaton Building, Room 206, Drake Circus, Plymouth PL4 8AA, UK. Email: Alexis.Kirke@Plymouth.ac.uk

1.1. Uses and novelty of Pulsed Melodic Affective Processing

Before explaining the motivations behind Pulsed Melodic Affective Processing (PMAP) in more detail, an overview of its functionality will be given. Similarly, the novelty of that functionality will be summarized. PMAP provides a method for the processing of artificial emotions that is useful in affective computing – for example combining emotional readings for input or output, making decisions based on that data or providing an artificial agent with simulated emotions to improve their computation abilities. PMAP is novel in that it is a data stream that can be listened to, as well as computed with. Affective state is represented by numbers that are analogues of musical features, rather than by a discrete binary stream. Previous work on affective computation has been done with normal data carrying techniques – for example emotion category index, a real number representing positivity of emotion, etc.

The encoding of PMAP is designed to provide extra utility – PMAP data can be generated directly by sound and turned directly into sound.
Thus, rhythms such as heart rates or key-press speeds can be directly turned into PMAP data; PMAP data can be directly turned into music with minimal transformation. This is because PMAP data is music and computations done with PMAP data are computations done with music. Why is this important? Because PMAP is constructed so that the emotion that a PMAP data stream represents in the computation engine will be similar to the emotion that a person “listening” to the PMAP-equivalent melody would hear. So PMAP can be used to calculate “feelings” and the resulting data will “sound like” the feelings calculated. This will be clarified over the course of this paper.

Due to the novelty of the PMAP approach, the structure of this paper involves providing multiple examples of the ability of melodies to be used in machine learning and processing. This does not follow the normal approach taken with machine learning, communications or unconventional computation for validation and comparison. For example, the musical neural network (MNN) demonstration does not include creating a formal description of the network and then rigorously demonstrating it in comparison to previous machine learning methods. This is for two reasons: lack of space and lack of comparable approaches. It is felt that such a novel approach needs to be shown to be at least relevant in multiple applications; hence, there is insufficient room to develop and demonstrate validations for all of the three demonstration areas presented later. Also, there is no basis for comparison. MNN methodologies are almost certainly less efficient than their non-melody based computation equivalents. The same can be said of the other examples demonstrated in the paper. The positive argument is that they, and the other PMAP approaches, provide a human–computer interaction (HCI) advantage in addition to their computational ability.
There are no other computation approaches that do this, hence no meaningful comparisons are possible without controlled listener evaluation results to determine how well the PMAP streams represent the elements of the affective computations. However, before doing these, it is first necessary to investigate if affective melodies are indeed useable in multiple affective applications.

In the previous subsection it was described how this paper is motivated by similarities between MIDI-type structures and the pulsed-processing [4] computation found in artificial and biological systems. It is further motivated by three other key elements that will now be examined: (i) the increasing prevalence of the simulation and communication of affective states by artificial and human agents/nodes; (ii) the view of music as the “language of emotions”; (iii) the concept of audio-display of non-audio data.

1.2. Affective processing and communication

It has been shown that affective states (emotions) play a vital role in human cognitive processing and expression [5]:

1. Universal and enhanced communication: two people who speak different languages are still able to communicate basic states such as happy, sad, angry and fearful.
2. Internal behavioral modification: a person’s internal emotional state will affect the planning paths they take. For example, affectivity can reduce the number of possible strategies in certain situations – if there is a snake in the grass, fear will cause you to only use navigation strategies that allow you to look down and walk quietly. It also pre- and de-emphasizes certain responses: for example, if a tiger is chasing you, fear will make you keep running and not get distracted by a beautiful sunset, a pebble in your path, etc.
3. Robust response: in extreme situations the affective reactions can bypass more complex cortical responses, allowing for a quicker reaction, or allowing the person to respond to emergencies when not able to think clearly – for example when very tired, in severe pain, and so on.

As a result, affective state processing has been incorporated into robotics and multi-agent systems (MASs) [6]. MASs are groups of agents where each agent is a digital entity that can interact with other agents to solve problems as a group, although not necessarily in an explicitly co-ordinated way. What often separates agent-based approaches from normal object-oriented or modular systems is their emergent behavior [7]. The solution of the problem tackled by the agents is often generated in an unexpected way due to their complex interactional dynamics, although individual agents may not be that complex.

A further reason, in relation to point (1) above and HCI studies, is that emotion may help machines to interact with and model humans more seamlessly and accurately [8]. So the representation and simulation of affective states is an active area of research. The dimensional approach to specifying emotional state is one common approach. It utilizes an n-dimensional space made up of emotion “factors”. Any emotion can be plotted as some combination of these factors. For example, in many emotional music systems [9] two dimensions are used: Valence and Arousal. In this model, emotions can be plotted on a graph (see Figure 1) with the first dimension being how positive or negative the emotion is (Valence), and the second dimension being how intense the physical arousal of the emotion is (Arousal). For example “Happy” is a high-valence, high-arousal affective state, and “Stressed” is a low-valence, high-arousal state.

1.3. Music and emotion

There have been a number of questionnaire studies done that support the argument that music communicates emotions [10]. Previous research [11] has suggested that a main indicator of valence is musical key mode. A major key mode implies higher valence, while minor key mode implies lower valence. For example the galloping “William Tell Overture” by G. Rossini opens in a major key and is a happy piece – that is, higher valence, whereas the first movement of L. V. Beethoven’s Symphony No. 5 is mostly in a minor key, and although it can be played at the same speed as the William Tell Overture, feels much more brooding and low valence. This is largely because of its mostly minor key mode. It has also been shown that tempo is a prime indicator of arousal, with high tempo indicating higher arousal, and low tempo indicating low arousal. For example, Beethoven’s first movement above is often played Allegro (fast). Compare this to his famous piano piece “Moonlight Sonata” – also minor key, but marked Adagio (slow). The piano piece has a melancholic feel. As well as being low valence, it is low arousal because of its low tempo.

1.4. Sonification

Sonification [12] involves representing non-musical data in audio form to aid its understanding. Common forms of sonification include Geiger counters and heart rate monitors. Sonification research has included tools for using music to debug programs [13], sonify activity in computer networks [14] and to give insight into stock market movements [15]. In the past, sonification has been used as an extra module attached to the output of the system under question. A key aim of PMAP is to allow sonification in affective systems at any point in the processing path within the system.
For example, between two neurons in an artificial neural network (ANN), or between two agents in a MAS, or between two processing modules within a single agent. The aim is to give the engineer or user quicker and more intuitive insight into what is occurring within the communication or processing path in simulated emotion systems by actually using simple music itself for processing and communication.

There are already systems that can take the underlying binary data and protocols in a network and map them onto musical features [16]. However, PMAP is currently the only data processing model that is its own sonification and requires no significant mapping for sonifying. This is because PMAP data is limited to use in affective communications and processing, where music can be both data and sonification simultaneously. PMAP is not a new sonification algorithm; rather it is a new data representation and processing approach that is already in a sonified form. This means that no conversion is needed between the actual processing/communication stream and the listening user – except perhaps downsampling. It also allows for the utilization of such musical features as harmony and timing synchronization to be incorporated into the monitoring when multiple modules/agents are being monitored simultaneously (although these capabilities are not examined here).

Figure 1. The Valence/Arousal model of emotion, from Kirke and Miranda [9].

2. Pulsed Melodic Affective Processing representation of affective state

In PMAP the data stream representing affective state is a stream of pulses. The pulses are transmitted at a variable rate. This can be compared to the variable rate of pulses in biological neural networks in the brain, with such pulse rates being considered as encoding information. In PMAP this pulse rate specifically encodes a representation of the arousal of an affective state.
A higher pulse rate is essentially a series of events at a high tempo (hence high arousal), whereas a lower pulse rate is a series of events at a low tempo (hence low arousal).

In addition, the PMAP pulses can have variable heights with 12 possible levels. For example, 12 different voltage levels for a low level stream, or 12 different integer values for a stream embedded in some sort of data structure. The purpose of pulse height is to represent the valence of an affective state, as follows. Each level represents one of the musical notes C, Db, D, Eb, E, F, Gb, G, Ab, A, Bb, B. For example 1 mV could be C, 2 mV be Db, 4 mV be Eb, etc. We will simply use integers here to represent the notes (i.e. 1 for C, 2 for Db, 4 for Eb, etc.). These note values are designed to represent a valence (positivity or negativity of emotion). This is because, in the key of C, pulse streams made up of only the notes C, D, E, F, G, A, B are the notes of the key C major, and so will be heard as having a major key mode – that is, positive valence. However, streams made up of C, D, Eb, F, G, Ab, Bb are the notes of the key C minor, and so will be heard as having a minor key mode – that is, negative valence.

For example, a PMAP stream of say [C, Bb, Eb, C, D, F, Eb, Ab, G, C] (i.e. [1, 11, 4, 1, 3, 6, 4, 9, 8, 1]) would be principally negative valence because it is mainly minor key mode. However, [C, B, E, C, D, F, E, A, G, C] (i.e. [1, 12, 5, 1, 3, 6, 5, 10, 8, 1]) would be seen as principally positive valence. In addition, the arousal of the pulse stream would be encoded in the rate at which the pulses were transmitted. So if [1, 12, 5, 1, 3, 6, 5, 10, 8, 1] was transmitted at a high rate, it would be high arousal and high valence – that is, a stream representing “happy” (see Figure 1); at a low rate it would be low arousal and high valence – that is, a stream representing “relaxed” or “tender” (Figure 1).
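The integer note encoding and the major/minor reading of a stream described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`encode`, `valence_sign`, `pulse_rate`) are hypothetical, and the rule used to decide that a stream is "principally" major or minor (counting the notes that belong to only one of the two keys) is an assumption made for the example.

```python
# Pitch levels 1..12 for the twelve notes, as in the text (1 for C, 2 for Db, ...).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
NOTE_TO_LEVEL = {name: i + 1 for i, name in enumerate(NOTE_NAMES)}

# Notes of C major and C minor, as listed in the text.
C_MAJOR = {NOTE_TO_LEVEL[n] for n in ["C", "D", "E", "F", "G", "A", "B"]}
C_MINOR = {NOTE_TO_LEVEL[n] for n in ["C", "D", "Eb", "F", "G", "Ab", "Bb"]}


def encode(notes):
    """Convert note names to integer pulse heights."""
    return [NOTE_TO_LEVEL[n] for n in notes]


def valence_sign(levels):
    """Assumed rule: compare counts of notes unique to each key mode."""
    major_only = sum(1 for p in levels if p in C_MAJOR - C_MINOR)
    minor_only = sum(1 for p in levels if p in C_MINOR - C_MAJOR)
    if major_only > minor_only:
        return "positive"
    if minor_only > major_only:
        return "negative"
    return "neutral"


def pulse_rate(onset_times):
    """Pulses per unit time, a stand-in for tempo (and hence arousal)."""
    if len(onset_times) < 2:
        raise ValueError("need at least two pulses")
    span = onset_times[-1] - onset_times[0]
    if span <= 0:
        raise ValueError("onset times must be increasing")
    return (len(onset_times) - 1) / span


minor_stream = encode(["C", "Bb", "Eb", "C", "D", "F", "Eb", "Ab", "G", "C"])
major_stream = encode(["C", "B", "E", "C", "D", "F", "E", "A", "G", "C"])
```

Under these assumptions, `valence_sign(minor_stream)` reads as negative and `valence_sign(major_stream)` as positive, matching the two example streams in the text; the arousal reading would come from comparing `pulse_rate` values between streams.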
However, if [1, 11, 4, 1, 3, 6, 4, 9, 8, 1] was transmitted at a low pulse rate then it would be low arousal and low valence – that is, a stream representing “sad”.

Note that [1, 12, 5, 1, 3, 6, 5, 10, 8, 1] and [3, 12, 1, 5, 1, 1, 5, 8, 10, 6] both represent high valence (i.e. are both major key melodies in C). This ambiguity has a potential extra use. If there are two modules or elements both with the same affective state, the different note groups that make up that state representation can be unique to the object generating them. This allows other objects, and human listeners, to identify where the affective data is coming from.

In non-simulated systems the PMAP data would be a stream of pulses. In fact in the first example below, a pulse-based data stream (MIDI) is used directly. However, for the analysis of PMAP in the second simulation, it is convenient to use a parametric form to represent the data stream. The parametric form represents a stream with a tempo-value variable and a key-mode-value variable. The tempo-value is a real number varying between 0 (minimum pulse rate) and 1 (maximum pulse rate). The key-mode-value is an integer varying between −3 (maximally minor) and 3 (maximally major).

3. Musical neural network example

This first example of the use of PMAP will focus on how PMAP streams can represent non-musical data as part of a machine learning algorithm. It will not be used to demonstrate the sonification abilities of PMAP explicitly but to show that PMAP can be used for non-musical computations. The example will utilize a form of simple ANN. ANNs are computational models inspired by the function and structure of neural networks in the biological brain. They are a connected collection of artificial neurons that process information through an input layer and produce the results of the processing through an output layer. An ANN is usually an adaptive system that changes its behavior during a learning phase.
Many adaptation methods utilize a method known as gradient descent [17]. This learning is used to develop a model linking the inputs and outputs so as to create a desired response. In recent years, there has also been work in making the neurons more realistic so they take spike trains, similar to those found in the brain, as input signals. As has been mentioned, these are known as SNNs, and learning algorithms have been developed for SNNs as well. The use of timed pulses in SNNs supports an investigation into PMAP pulses in ANNs; in particular, a neural network application in which emotion and rhythm are core elements. One such example is now presented.

A form of learning ANN that uses PMAP is first described. These artificial networks take as input, and use as their processing data, pulsed melodies. A musical neuron (muron – pronounced MEW-RON) is shown in Figure 2. The muron in this example has two inputs, although a muron can have more than this. Each input is a PMAP melody, and the output is a PMAP melody. The weights on the inputs, $w_1$ and $w_2$, are two-element vectors that define a key mode transposition and a tempo change, respectively. A positive $R_k$ will transpose more input tune notes into a major key mode, and a negative one will transpose more input notes into a minor key mode. Similarly, a positive $D_t$ will increase the tempo of the tune, and a negative $D_t$ will reduce the tempo. The muron combines input tunes by superimposing the spikes in time, that is, overlaying them. Any notes that occur at the same time are combined into a single note with the highest pitch being retained. This retaining rule is fairly arbitrary but some form of non-random decision should be made in this scenario (future work will examine if the “high retain” rule adds any significant bias). Murons can be combined into networks, called MNNs. The learning of a muron involves setting the weights to give the desired output tunes for the given input tunes.
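The muron's combination rule (overlay the input tunes in time and keep the highest pitch when notes coincide) can be sketched as follows. This is a minimal sketch rather than the authors' implementation. The text does not specify exactly how a weight acts on a tune, so `apply_weight` rests on two loudly stated assumptions: the magnitude of the key-mode weight `r` is taken as the number of mode-defining notes to shift by a semitone, and the tempo weight `d` is taken as a fractional tempo change, so that tempo is scaled by (1 + d). All function and variable names are hypothetical.

```python
from typing import List, Tuple

Note = Tuple[float, int]  # (onset time, pitch level 1..12)

# Assumed semitone shifts between the notes that distinguish C minor from C major.
MINOR_TO_MAJOR = {4: 5, 9: 10, 11: 12}  # Eb->E, Ab->A, Bb->B
MAJOR_TO_MINOR = {5: 4, 10: 9, 12: 11}  # E->Eb, A->Ab, B->Bb


def apply_weight(melody: List[Note], r: int, d: float) -> List[Note]:
    """Apply a hypothetical weight [r, d] to one input melody.

    Assumption: |r| mode-defining notes are shifted (toward major if r > 0,
    toward minor if r < 0), and the tempo is scaled by (1 + d), d > -1.
    """
    if d <= -1:
        raise ValueError("d must be greater than -1")
    table = MINOR_TO_MAJOR if r > 0 else MAJOR_TO_MINOR
    remaining = abs(r)
    out = []
    for onset, pitch in melody:
        if remaining > 0 and pitch in table:
            pitch = table[pitch]
            remaining -= 1
        # A faster tempo compresses onset times.
        out.append((onset / (1.0 + d), pitch))
    return out


def combine(melodies: List[List[Note]]) -> List[Note]:
    """Overlay melodies in time; simultaneous notes keep the highest pitch."""
    highest = {}
    for melody in melodies:
        for onset, pitch in melody:
            # Exact onset equality is used here; a real system would quantize.
            highest[onset] = max(pitch, highest.get(onset, pitch))
    return sorted(highest.items())


tune_a = [(0.0, 1), (1.0, 4), (2.0, 8)]
tune_b = [(0.0, 5), (1.5, 3), (2.0, 6)]
merged = combine([tune_a, tune_b])
```

In this sketch a muron's output would be `combine` applied to the weighted inputs, for example `combine([apply_weight(tune_a, 1, 0.4), apply_weight(tune_b, 0, 0.0)])`; learning would then amount to searching for the `r` and `d` values that bring the output tune closest to the desired one.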
Applications for which PMAP is most efficiently used are those that naturally utilize temporal or affective data (or for which internal and external sonification is particularly important).

One such system will now be proposed for the estimation of affective content of real-time typing. The system is inspired by research by the authors on analyzing QWERTY keyboard typing. This approach is based on the way that piano keyboard playing can be computer-analyzed to estimate the emotional communication of the piano player [18]. It has been found by researchers that the mood a musical performer is trying to communicate affects not only their basic playing tempo, but also the structure of the hierarchical patterns of the musical timing of their performance [19]. Similarly, we propose that a person’s mood will affect not only their typing rate [18], but also their relative word rate and paragraph rate, and so forth.

In Kirke et al. [18], a real-time system was developed to analyze the local tempo of typing and estimate affective state. The MNN/PMAP version demonstrated in this paper is not real time, and does not take into account base typing speed: it focuses on relative rates of offline pre-typed data. These simplifications are for the sake of expedient simulation and experiments. However, it does implicitly analyze hierarchies of tempo patterns, which the system in Kirke et al. [18] did not.

The proposed architecture for the emotion estimation is shown in Figure 3. It has two layers known as the input and output layers. The input layer has four murons, which generate notes. The idea of these four inputs is that they represent four levels of the timing hierarchy in language. The lowest level is letters, whose rate is not measured in the demo. These letters make up words, which are usually separated by a space. The words make up phrases. In an ideal system the syntax hierarchy would be used to define phrases. However, for simplification here, an approximation is made using commas.
This will reduce the accuracy of the results but allows for a simpler demonstration of the learning capacity of the network. So, phrases will be defined here as being punctuated by commas. These phrases make up sentences (separated by full stops), and sentences make up paragraphs (separated by a paragraph end). So the tempo of the tunes output from these four murons represents the relative word rate, phrase rate, sentence rate and paragraph rate of the text. Note that for data from an internet-based messenger application, the paragraph rate will represent the rate at which messages are sent. Every time a space character is detected, a note is output by the SPACE Flag. If a comma is detected then a musical note is output by the COMMA Flag, if a full stop/period is detected then the FULL STOP (PERIOD) Flag generates a note, and if an end of paragraph is detected then a note is output by the PARAGRAPH Flag.

Figure 2. A muron with two inputs (Input 1, Input 2) and one output, with weight vectors $w_1 = [R_1, D_1]$ and $w_2 = [R_2, D_2]$, respectively.

Figure 3. Four input musical neural networks for Offline Text Affective Analysis with final learned weight values. Weight labels in the figure: $w_3 = [1, 1.4]$, $w_1 = [0, 1.4]$, $w_2 = [2, 1.8]$, $w_4 = [1, 0.5]$. Input labels in the figure: PARAGRAPH Flag, FULL STOP (PERIOD) Flag, COMMA Flag, SPACE Flag.

The “carrier melodies” used in the input layer are a series of constantly rising pitches. The precise pitches in these melodies are not important – rather it is having a variety of pitches at a neutral tempo, so that they can be transformed through different affective states. The desired output of the MNN will be a tune that represents an affective estimate of the text content. A happy tune means the text structure is happy; likewise a sad tune means the text is sad. Normally, neural networks are trained using a number of methods, most commonly some variation of gradient descent, a type of algorithm that attempts to change the network parameters so as to lower the difference between