Emergence of Communication for Negotiation By a Recurrent Neural Network

Katsunari SHIBATA and Koji ITO

Dept. of Comp. Intell. & Sys. Sci., Interdisciplinary Grad. School of Sci. & Eng., Tokyo Institute of Technology

4259 Nagatsuta, Midori-ku, Yokohama 226-8502, JAPAN

shibata@ito.dis.titech.ac.jp

Abstract

We believe that communication in a multi-agent system has two major meanings. One of them is to transmit one agent's observed information to the others. The other is to transmit what an agent is thinking. Here we focus on the latter and aim at the emergence of autonomous and decentralized arbitration through communication among several agents. The communication contents, strategy and representation are not prescribed, but are acquired by learning using a reinforcement signal that is given to the agent after its action. The reinforcement signal is not shared with the other agents. In order to realize this learning, an agent often has to make a decision not only from the present communication signals but also from the past signals. Accordingly, a system architecture using a recurrent (Elman-type) neural network is proposed. The ability of this architecture was examined with two-agent and four-agent negotiation problems. A variety of negotiation strategies emerged among the agents through learning to avoid conflicts after their decisions.

1. Introduction

In a multi-agent system, communication is very effective for cooperation and arbitration among the agents. However, if we design in advance what the agents communicate and how they communicate, the agents cannot modify the communication contents, strategy and representation according to changes in the environment, and so they may lose flexibility. There are also many cases in which the most effective communication contents, strategy and representation are not known beforehand. Furthermore, the communication contents, strategy and representation among living creatures do not seem to be given from the outside, but are generated autonomously through individual learning from experience. For these reasons, the emergence of communication has attracted attention recently.

We believe that communication has two major meanings. One of them is to transmit one agent's observed information to the other. That is useful for

making up for a lack of the receiver's observations. The well-known work by G. M. Werner et al. [1] belongs to this category. In their work dealing with a mate-finding problem, there are females that cannot move but have an eye, and blind males that can move. The females can transmit communication signals to the males. The transformation from observed information to communication signals in the females, and the transformation from communication signals to action in the males, are done by recurrent neural networks whose connection weights are given by the values of their genes. When a male finds a female, they produce a son and a daughter. The offspring are produced by the standard genetic operations of crossover and mutation. Here, the communication from the females to the males is used so that a female informs a nearby blind male about their relative location, which the blind male cannot know by itself. The pioneering work by K. Nakano et al. [2] shows a learning method that generates common words for the objects existing in the environment. The common words are useful in that an agent can know of the existence of an object from the communication signals of the other agents. Accordingly, it can be said that the communication in this work is also used to transmit observed information to the others.

The other meaning of communication is to represent what an agent is intending. Through such communication the agents become able to cooperate and also to avoid conflicts. Many works on negotiation take this standpoint in the distributed artificial intelligence field [3][4], but in these studies the communication contents, strategy and representation were prescribed in advance and then evaluated.

Here we focus on the "emergence of communication" and the "communication of intention". We propose a learning architecture for multi-agent systems that can generate an appropriate transformation from the past communication signals received from the other agents to the present communication signal or action. Note that the communication contents, strategy and representation are not prescribed. A reward is given to the agents in

the case of appropriate communication and action, and a penalty is given in the case of conflicts. These reinforcement signals (reward and penalty) are not shared with the other agents. It is also an important point of this research that negotiation among several agents can be realized without sharing the reinforcement signals. As an example, a negotiation problem is solved here. In negotiation problems, an agent represents its intention at first. However, when a conflict in the final decision is predicted, the agent has to change its intention according to the communication signals from the other agents. Sometimes the past series of communication signals is required to decide on this change of intention. To achieve this function, an Elman-type recurrent neural network [5] is employed.

The proposed learning architecture is described in the next section, and two simulations are shown in the following section. The first simulation is the "two-agent negotiation problem". It examines whether one-to-one negotiation, the simplest case, can be realized by the learning. In this problem, there is only one solution, so the basic ability of the system is evaluated by whether that solution is obtained in spite of the many combinations of communication signals and decisions. The second one is the "four-agent negotiation problem". This problem is more difficult because it cannot be solved by the relation to only one opponent: each agent has to make a decision according to the other three agents and their locations, and it is difficult to design such agents manually. It is shown whether the agents have the ability to negotiate with more than two agents autonomously and decentralizedly without sharing reinforcement signals.

2. Learning Architecture

Fig. 1 shows the proposed architecture for two agents. The agent receives the other agent's previous communication signal and its own previous communication signal as the inputs to its own neural network. The neural network is Elman-type, in which the present outputs of the hidden neurons are used as inputs at the next time step. The output function of each neuron except in the input layer is a sigmoid function whose value range is from -0.5 to 0.5. The neural network has two outputs, one for its communication and the other for its action. The actual communication signal and action are stochastically chosen as -1 or 1; the probability p that each of them is 1 is the sum of the corresponding network output and 0.5. The agents exchange their communication signals synchronously with each other three times, and then make a final decision of the action. At the beginning of each negotiation, all the inputs, including the feedback inputs from the hidden neurons, are set to 0.0. Accordingly, the probability that the first

Figure 1. Proposed architecture of the agent for communication using a recurrent neural network. (Each output is passed through a selector that chooses -1 or 1 according to the probability p = output + 0.5; the two players exchange communication signals and produce actions.)

communication signal is 1 does not depend on the other agent's communication signal.
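As a concrete illustration of this architecture, the following is a minimal sketch in Python/NumPy. The class name, parameter names and layer sizes are our own assumptions, not the authors' code; only the shifted sigmoid, the -1/1 selector with p = output + 0.5, the zeroed initial context and the weight initialization follow the description above.

```python
import numpy as np

def sigmoid(u):
    # Shifted sigmoid: output range (-0.5, 0.5), as described in the text.
    return 1.0 / (1.0 + np.exp(-u)) - 0.5

class ElmanAgent:
    """Minimal sketch of one agent's Elman-type network (sizes are assumptions)."""
    def __init__(self, n_in=2, n_hidden=4, n_out=2, rng=None):
        self.rng = rng or np.random.default_rng()
        # Inputs: external communication signals + feedback from the hidden layer + bias.
        self.W_in = self.rng.uniform(-0.1, 0.1, (n_hidden, n_in + n_hidden + 1))
        # Hidden-to-output weights start at 0.0, so every output probability is 0.5.
        self.W_out = np.zeros((n_out, n_hidden + 1))
        self.context = np.zeros(n_hidden)        # hidden outputs of the previous step

    def reset(self):
        # At the beginning of each negotiation all inputs, including the
        # feedback inputs from the hidden neurons, are set to 0.0.
        self.context[:] = 0.0

    def step(self, comm_inputs):
        x = np.concatenate([comm_inputs, self.context, [1.0]])   # bias input
        hidden = sigmoid(self.W_in @ x)
        out = sigmoid(self.W_out @ np.concatenate([hidden, [1.0]]))
        self.context = hidden
        # Each actual output is -1 or 1; the probability of 1 is (output + 0.5).
        selected = np.where(self.rng.random(out.shape) < out + 0.5, 1, -1)
        return out, selected      # out[0]/selected[0]: communication, [1]: action
```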

The agent learns its communication and action from the reinforcement signal obtained after the action. When the agent obtains the reward, the probabilities of the action and of the series of communication signals become larger, and when it obtains the penalty, the probabilities become smaller. The neural network is trained by the normal BP (Back Propagation) learning algorithm or the BPTT (Back Propagation Through Time) learning algorithm [6]. Both are typical supervised learning algorithms for neural networks. In BPTT, the error signal propagates backward through time until the beginning of the negotiation. In normal BP, on the other hand, the error signal propagates backward through the network but not through time; in other words, it does not propagate from the context inputs back to the hidden layer of the previous time step. The training signal is given to the communication output or action output at each time step as follows,

x_ideal = x + η · r · x' · o

where x_ideal is the training signal, x = f(u) = 1/(1+exp(-u)) - 0.5 is the output of the neural network, u is the internal state of the output neuron, η is the learning rate (0.1 is employed here), and r is the reinforcement signal (1 in the case of reward and -5 in the case of penalty). What differs from popular reinforcement learning is that the reinforcement signal is not discounted for the training of the past communication signals. x' = dx/du = (0.5 - x)(0.5 + x), and o is the actual (stochastically determined) communication signal or action, which may differ from the raw output of the neural network. x' is included to keep the output stable when the probability p is close to 0 or 1. The initial connection weights from the input layer to the hidden layer are set to small random numbers, and those from

the hidden layer to the output layer are set to 0.0. The probabilities of all the actual outputs are therefore exactly 0.5 before learning.
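The training-signal computation itself is simple; a minimal sketch using the symbols above (the function name is ours) is:

```python
def training_signal(x, o, r, eta=0.1):
    """Training signal x_ideal = x + eta * r * x' * o.

    x : raw network output in (-0.5, 0.5)
    o : actual (stochastically chosen) signal, -1 or 1
    r : reinforcement signal (1 for reward, -5 for penalty)
    """
    x_prime = (0.5 - x) * (0.5 + x)   # derivative dx/du of the shifted sigmoid
    return x + eta * r * x_prime * o
```

The resulting x_ideal is then used as the supervised target for the corresponding output at each step of the episode, with normal BP or BPTT, and without discounting the reinforcement signal for earlier steps.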

3. Simulation

3.1. Two Agents Negotiation Problem

3.1.1. Setting. First, the two-agent negotiation problem is described. Figure 2 shows the simulation environment. Two players are randomly chosen from 16 agents, and a player cannot know its opponent directly. Three communication chances, #1, #2, #3, are given to the players. The players decide the communication signal (-1 or 1) at each chance according to the probability determined by the communication output, as described in the previous section. At chance #1, a player has to decide its first communication signal without knowing the other's signal. At chance #2, it decides the second communication signal depending on its own and the other's first signals. At the last chance #3, it can decide the third signal from its own and the other's first and second signals. Note that the information about the first communication signals is supposed to be kept indirectly in the hidden layer as context information through the hidden-input feedback connections. Finally, each player decides which route (action = 1 or action = -1) it goes through. If both players select the same route, they receive the penalty (r = -5), and if they select different routes, they receive the reward (r = 1). They then learn their three communication signals and action from this reinforcement signal as described in the previous section. In this problem, since there are three communication chances, 2^3 = 8 agents can be distinguished using the communication patterns. Even if all the communication signals are the same between two agents, the agents can still take different actions. Thus each of 2^4 = 16 agents can ideally go through a different route from its opponent's.
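As an illustration of this setting, the following sketch runs one negotiation trial between two players, building on the hypothetical ElmanAgent above; the exact step at which the action output is read (here, one extra network step after the third exchange) is our assumption.

```python
import numpy as np  # relies on the ElmanAgent sketch given earlier

def play_trial(agent_a, agent_b):
    """One two-agent negotiation trial (a sketch, not the authors' code)."""
    agent_a.reset(); agent_b.reset()
    sig_a = sig_b = 0.0                 # no signals are available at chance #1
    for chance in range(3):             # communication chances #1, #2, #3
        _, sel_a = agent_a.step(np.array([sig_b, sig_a]))
        _, sel_b = agent_b.step(np.array([sig_a, sig_b]))
        sig_a, sig_b = sel_a[0], sel_b[0]   # synchronous exchange of -1/1 signals
    # Action point: each player decides its route, -1 or 1.
    _, sel_a = agent_a.step(np.array([sig_b, sig_a]))
    _, sel_b = agent_b.step(np.array([sig_a, sig_b]))
    act_a, act_b = sel_a[1], sel_b[1]
    # Different routes: reward r = 1; same route (collision): penalty r = -5.
    # Each player receives its own reinforcement signal; here they coincide.
    r = 1 if act_a != act_b else -5
    return (act_a, act_b), r
```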

3.1.2. Results. The right-hand side of Table 1 shows negotiation examples after a successful learning run, which means that all pairs of agents could go through the routes without any collisions after learning. In almost all cases after successful learning, the probabilities of all the communication signals and the action were close to 1.0 or 0.0. The first three examples are explained as follows.

Example 1: Agents 0 and 1 were chosen as

players. Both agents continued to output the communication signal 1, and finally agent 0 went through one route and agent 1 went through the other.

Example 2: Agents 0 and 2 were chosen. Agent

0 made the same sequence of communication signals and action (it selected the same route) as in Example 1.

Figure 2. Simulation environment of the two-agent negotiation problem. (Two of the 16 agents are randomly chosen as players at each trial; # marks the communication points #1, #2, #3, and Act marks the action point.)

Agent 2 output the communication signal 1 twice, changed it to -1 at the third chance, and finally went through the other route.

Example 3: Agents 1 and 2 were chosen. Agent

1 continued to output the communication signal 1, and finally went through a route different from the one it took in Example 1. Agent 2 made the same communication signals and action as in Example 2.

Here, although the meaning of the communication signals had not been given to the agents before learning, we can assign meanings to them after learning. There were two agents which always went through the same route regardless of the opponent, like agent 0 in the above examples. The sequences of such agents' communication signals were also always the same regardless of the opponent. These agents are supposed to have persisted in going through their selected route. Accordingly, the communication signal 1 can be defined as the assertion to go through the route that agent 0 selected in the above examples. Agent 1 in Example 1 is then supposed to have persisted in that route at first, but to have finally changed its action to the other route.

Table 1 shows the negotiation results for all the pairs of players. The order of the agents in this table is sorted by the probability of selecting one of the two routes. A filled circle shows that the agent persisted in going along its route throughout. A circle with a superscript number means that the agent persisted in its route up to that communication chance and changed its intention to the other route at the next chance, like agent 12 in Example 7. A circle with a subscript number means that the agent persisted in its route at first, changed its mind to the other route at the chance indicated by the subscript, and finally returned to its original intention; such agents can be seen as agent 4 in Example 5 and agent 6 in Example 6 on the right-hand side of Table 1. The remaining marks have the same meanings as the corresponding circles, but the final selection is the other route. It can be seen that each agent was thereby ordered and arbitrated autonomously and decentralizedly.

Table 1. Communication signals and actions corresponding to the agent pairs after learning, when the Elman-type neural network is used in the agents. (For each pair: the agent number, the communication signals at #1, #2, #3, and the action Act; examples (1)-(7) are shown on the right-hand side.)

It was found in other simulations that the communication sequence acquired by the learning when the agent persisted in its route was one of (-1, -1, -1), (-1, 1, -1), (1, 1, 1) and (1, -1, 1). Which of the four sequences is acquired depends on the initial connection weights and on the stochastic factor in the choice of agents during the simulation. The reason why the sequence was not (1, 1, -1), (1, -1, -1), (-1, -1, 1) or (-1, 1, 1) is that it is easy either to repeat the same communication signal at every step or to change the communication signal at every step, because the previous signal is one of the inputs of the neural network. When the number of agents is reduced to 8, the solution could be found more easily. In this case, no agent appeared that changed its intention more than once, like agents 2, 4, 5, 6, 9, 10, 11 and 13 in Table 1.

Next, the connection weights of the trained neural network with two hidden neurons in agent 2 of Table 1 are shown in Fig. 3, and the time series of the hidden neurons' outputs for four types of opponent agents are shown in Fig. 4. The agents could solve this problem by learning with only two hidden neurons, but the probability of success was very small. The basic strategy of the agent can be interpreted as follows:

1. The communication output is always close to the action output because of the similar connection weights from the hidden layer to the output layer.
2. The initial communication output is 1 because of the positive bias of the hidden 2 neuron and the positive connection weight of hidden 2 -> output 1.
3. If the opponent agent's communication signal is -1, both outputs become 1 because of the positive connection weights of hidden 2 -> output 1 and hidden 2 -> output 2, and the negative connection weight of input 1 -> hidden 2. This can be observed in Fig. 4 (b), (c) and (d).
4. The agent tries to keep its output value by the negative connection weights of input 2 -> hidden 1 and hidden 1 -> output 1, and also by keeping the hidden 2 neuron's output through the positive feedback connection weight of input 4 -> hidden 2.
5. If the opponent agent's communication signal is 1, the hidden 2 value is decreased by the negative connection weight of input 1 -> hidden 2. If the state in which both the agent's and the opponent's communication signals are 1 continues for two time steps, the communication output becomes -1, because the hidden 2 neuron's output becomes less than 0 for the latter reason of item 4. This can be observed in Fig. 4 (a) and (b).

The reason why the situation of item 5 can be realized is that the neural network is recurrent and the context information can be stored through the feedback connection weights. When a normal layered neural network was used instead of the Elman network, agents that changed their intentions at chance #3, like agents 2, 3, 12 and 13 in Table 1, did not appear.

100 simulations were done varying the initial connection weights of the neural network of each agent. The number of hidden neurons was 4. When BPTT was applied, all the combinations of two agents could go through the routes without collisions in 41 simulations. In the case of normal BP, all the agents went through the routes without collisions in 52 simulations. We had expected that the number of successes would be larger with BPTT than with normal BP, but the result was the opposite. The connection weights from the context inputs to the hidden neurons are used both for keeping the necessary context information and for reflecting that information in the outputs. The learning for both keeping and reflecting the context information can be done by BPTT, whereas with normal BP only the learning for reflecting the context is done. In this case, it might be thought that the learning for reflecting the context information in the outputs is also useful for keeping the necessary context information.
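To make the distinction concrete, the following is a rough sketch of the two backward passes for the Elman-type network of the earlier sketch. The array shapes follow the hypothetical ElmanAgent, and the way the targets x_ideal are supplied for each step, as well as the separate supervised learning rate lr, are our assumptions (distinct from η in the training-signal formula).

```python
import numpy as np  # relies on the ElmanAgent/sigmoid sketch given earlier

def backprop_episode(agent, inputs, hiddens, contexts, targets, use_bptt=True, lr=0.1):
    """Sketch of normal BP vs. BPTT for the Elman-type agent network.

    inputs[t], hiddens[t], contexts[t] are the values stored during the
    forward pass of one negotiation; targets[t] is the training signal
    x_ideal for the outputs at step t (an output not trained at step t
    can simply be given its own value as target).
    """
    n_hidden = agent.context.size
    dW_in = np.zeros_like(agent.W_in)
    dW_out = np.zeros_like(agent.W_out)
    delta_context = np.zeros(n_hidden)          # error arriving through time
    for t in reversed(range(len(inputs))):
        h_ext = np.concatenate([hiddens[t], [1.0]])
        out = sigmoid(agent.W_out @ h_ext)
        # Output error times the derivative of the shifted sigmoid.
        delta_out = (targets[t] - out) * (0.5 - out) * (0.5 + out)
        dW_out += np.outer(delta_out, h_ext)
        # Hidden error: from the output layer and, for BPTT only, from the
        # following time step through the context (feedback) inputs.
        delta_h = agent.W_out[:, :n_hidden].T @ delta_out + delta_context
        delta_h *= (0.5 - hiddens[t]) * (0.5 + hiddens[t])
        x = np.concatenate([inputs[t], contexts[t], [1.0]])
        dW_in += np.outer(delta_h, x)
        if use_bptt:
            n_in = len(inputs[t])
            # BPTT: send the hidden error back through the feedback weights
            # to the previous time step.
            delta_context = agent.W_in[:, n_in:n_in + n_hidden].T @ delta_h
        else:
            delta_context = np.zeros(n_hidden)  # normal BP: cut the path through time
    agent.W_out += lr * dW_out
    agent.W_in += lr * dW_in
```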

3.2. Four Agents Negotiation Problem

3.2.1. Setting. Next, the four-agent negotiation problem is presented. The previous two-agent negotiation problem can be solved by ordering all the agents and deciding each route based on the order between the two players. The four-agent negotiation problem is more difficult, because even if an agent has negotiated successfully with one agent, it might still conflict with the other agents, and nevertheless the reinforcement signal of one agent is not shared with the others. Furthermore, to solve the problem described in the following, it is not enough to know which agents are chosen; it is also necessary to know how the other chosen agents are arranged relative to each other.

Figure 5 shows the environment of the four-agent negotiation problem. Four players are chosen randomly at each trial from among 8 agents and are located at the entrances of the routes. Each player can transmit a communication signal, -1 or 1, to all the players and can receive 4 communication signals three times. The first received signal is its own signal, the second is transmitted from the opposite-side agent, the third is from the neighbor agent along one route, and the last is from the other neighbor agent along the other route. However, the agent does not know the source of each signal, which agents are chosen, or where the chosen agents are located. Each agent has to decide which of its two routes it goes through. If the

Figure 3. Connection weights of the neural network in agent 2 after learning. (Inputs: the last communication signal of the opponent and the last communication signal of the agent itself; outputs: communication signal and action.)

Figure 4. Time series of the hidden neurons' outputs of agent 2 according to the opponent player. (Panels (a)-(d); horizontal axis: communication chances #1, #2, #3 and the action point Act; curves: hidden 1, hidden 2 and the output.)

agent can go through one route without collision with its neighbor, it obtains the reward r = 1. If the agent collides, the penalty r = -5 is given to it. The reward and penalty are not shared with the other agents. The architecture of each agent is almost the same as shown in Fig. 1, but the number of communication inputs is 4 instead of 2. The learning of each agent is the same as in the previous simulation. The problem is solved when all the agents go clockwise or counterclockwise at each trial through the routes arranged in a diamond shape, as shown in Fig. 5. In order that all the agents go clockwise or counterclockwise, each agent has to select the same action (route) as the opposite-side agent and a different action from its neighbors. It would be very difficult for us to design such a communication strategy for each agent by hand.

3.2.2. Results. The number of agents, 8, is not the maximum for which this problem can be solved. In contrast to the previous two-agent negotiation problem, many kinds of solutions could be found in the simulations. In some simulations, one agent did not change the sequence of its communication signals and action, whatever the other agents' communication signals were. Such an agent seems to take the initiative, and all the other agents seem to try to identify the location of the initiating agent and to decide their actions according to that location whenever the initiating agent is chosen as a player. In most cases, all the agents changed their communication signals and actions.
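The reward structure of this setting can be made concrete with a small sketch. The per-agent collision test used here (an agent is rewarded when its route differs from both neighbors' routes) is our own reading, consistent with the statement above that every agent must choose the same route as its opposite-side agent and a different route from its neighbors.

```python
def four_agent_rewards(actions):
    """Reward for each of the four players, ordered around the diamond.

    actions[i] is the route (-1 or 1) chosen by the player at corner i;
    corners i-1 and i+1 (mod 4) are its neighbors, i+2 its opposite side.
    The collision test is an assumption based on the solved condition.
    """
    rewards = []
    for i, a in enumerate(actions):
        left, right = actions[(i - 1) % 4], actions[(i + 1) % 4]
        rewards.append(1 if (a != left and a != right) else -5)
    return rewards

# Alternating choices around the diamond (all go the same way): all rewarded.
print(four_agent_rewards([1, -1, 1, -1]))   # [1, 1, 1, 1]
# Non-alternating choices: every player collides with a neighbor.
print(four_agent_rewards([1, 1, -1, -1]))   # [-5, -5, -5, -5]
```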

Although the probability of successful learning is very small, this problem could be solved by the network with only two hidden neurons. The details of one solution are described as follows. Table 2 shows typical negotiation examples in the solution. In Table 2 (a), agents 0, 2, 3 and 7 were chosen. The first and second communication signals of the four agents were the same, but only agent 7 changed its signal at communication chance #3. Finally, agents 0 and 3 took the action 1 and the others took the action -1. Table 3 shows the probability of each communication signal and action, and the number of communication-action patterns of each agent over all the combinations of agents after learning. The order of the agents is not sorted. Since the values in the table are 2p-1, where p is the probability that the communication signal or action is 1, the sign shows the agent's preferred direction and the absolute value shows the degree of the agent's insistence. It can be seen that the agents change their communication signals adaptively and with variety. In particular, agent 0 took seven communication-action patterns out of all eight patterns. Since the communication signal at chance #1 is always the same, the number of all possible

Figure 5. Simulation environment of the four-agent negotiation problem. (Four of the 8 agents are randomly chosen as players at each trial and placed at the four corners of the diamond; #1, #2, #3 are communication points and Act is the action point.)

patterns is 8. For all the agents, the signs of the communication signals at the three chances are the same, while the sign of the action is always different from that of the communication signals. It can be interpreted that the communication signal 1 denotes the insistence on the action -1, and the communication signal -1 denotes the insistence on the action 1. When we looked at the results for all the combinations of agents, some rules could be found as follows.

(Case 1) When the preferred route of all four chosen

agents is the same, e.g. when the chosen agents are 0, 2, 3 and 7 as shown in Table 2 (a), agent 7 always changed its communication signal from -1 to 1 at chance #3 and took the action -1. This does not depend on the arrangement of the agents. The agent on the opposite side of agent 7 did not change its communication signal but took the action -1. The others did not change their signals and took the action 1 according to their preference.

(Case 2) When the preferred route of three of the chosen agents

is one route and that of the remaining agent is the other, as shown in Table 2 (b), the minority agent and its two neighbors did not change their signals, and only the agent on the opposite side of the minority agent always changed its signal at chance #2 and took the same action as the minority agent.

(Case 3) When the preferred route of half of the

chosen agents is one route and that of the other half is the other, the situation can be divided into two cases. In one case, where each agent has the same preference as its opposite-side agent, as shown in Table 2 (c), none of the agents changed their signals and all took their preferred actions.

Table 2. Typical communication patterns after successful learning in the four-agent negotiation problem.

(a)        #1  #2  #3  Act        (c)        #1  #2  #3  Act
Agent 0    -1  -1  -1   1         Agent 2    -1  -1  -1   1
Agent 2    -1  -1  -1  -1         Agent 5     1   1   1  -1
Agent 3    -1  -1  -1   1         Agent 7    -1  -1  -1   1
Agent 7    -1  -1   1  -1         Agent 4     1   1   1  -1

(b)        #1  #2  #3  Act        (d)        #1  #2  #3  Act
Agent 2    -1  -1  -1   1         Agent 2    -1  -1  -1   1
Agent 7    -1   1   1  -1         Agent 4     1   1   1  -1
Agent 0    -1  -1  -1   1         Agent 5     1   1   1   1
Agent 5     1   1   1  -1         Agent 7    -1  -1   1  -1

Table 3. The probability of the communication signals and action of each agent after learning, and the number of communication patterns. The values in the table are 2p-1, where p is the probability that the output is 1.

           com #1   com #2   com #3   Action   num. of patterns
Agent 0    -1.0     -0.429   -0.457    0.352          7
Agent 1     1.0      0.029    0.286   -0.257          4
Agent 2    -1.0     -1.000   -1.000    0.581          2
Agent 3    -1.0     -0.086   -0.343    0.181          5
Agent 4     1.0      1.000    1.000   -0.571          2
Agent 5     1.0      0.771    0.533   -0.486          5
Agent 6     1.0      0.429    0.324   -0.286          5
Agent 7    -1.0     -0.771   -0.552    0.48

(Case 4) In the other case, where the two agents who

have the same preferred direction are neighbors, as shown in Table 2 (d), there are many combinations and no simple rule could be found.

As in this simulation, it had been expected that in other simulations as well, half of the 8 agents would prefer one

route and the other 4 agents would prefer the other route. However, in the most asymmetrical case found, only two agents preferred one route and the other 6 agents preferred the other.

Figure 6 shows the connection weights of the recurrent neural networks of agents 0, 2 and 7. The characteristics of each agent are described in the following.

Figure 6. Connection weights of the recurrent neural networks after learning: (a) Agent 0, (b) Agent 2, (c) Agent 7. (The communication inputs correspond to the last communication signals from the opposite-side agent, the neighbor at one side, the neighbor at the other side, and the agent itself; outputs: communication signal and action.)

[All agents] In the neural network of all the agents, it is

easily found that the signs of the connection weights to the hidden neurons from input 1 (the last communication signal from the opposite-side agent) and input 4 (the last communication signal of the agent itself) are opposite to those from input 2 and input 3. That is because the agent in this simulation has to make the same decision as the opposite-side agent and a different decision from the neighbor agents. Accordingly, all 8 agents have the same tendency in the strategy determined by their connection weights.

[Agent 0] The agent 0 can generate seven sequences of

communication signals and action, as shown in Table 3, with the recurrent neural network of Fig. 6 (a). The absolute values of the connection weights from inputs 1, 2 and 3 are large and those from inputs 4, 5 and 6 are small. This means that agent 0 does not have a strong intention and tries to adjust its action to the others. That is the reason why agent 0 generates a variety of sequences of communication signals.

[Agent 2] On the contrary, the agent 2 does not change its

communication signal, as shown in Table 3. That can also be seen from Fig. 6 (b), in which the connection weights from the hidden neurons to the communication output are close to 0.0 and the bias of the output is very small.

[Agent 7] The agent 7 has an ability to change the

communication output according to the context using the recurrent neural network. In Table 2 (a) and (d), the communication signals received at chance #1 are the same as those at #2, but the communication signal of agent 7 at chance #2 is different from that at #3. This ability can be explained roughly from the connection weights shown in Fig. 6 (c). Since the hidden 2 neuron's output is very small due to its bias at chance #1, the hidden 1 neuron's output becomes large and the communication output becomes small at chance #2 through the connection weight from input 6 to hidden 1 and the weight from hidden 1 to output 1. However, since the hidden 2 neuron's output becomes large through the connection weight from input 3 to hidden 2 at chance #2, the hidden 1 neuron's output becomes small and the communication output becomes large at chance #3.

In the case of the recurrent neural network with 4 hidden neurons, 100 simulations were done varying the initial connection weights of each agent. When BPTT was applied, all the combinations of agents could go through the routes without collisions in 42 simulations. In the case of normal BP, all the agents went through the routes without collisions in 26 simulations. BPTT is slightly more useful in this simulation.

4. Conclusion

We proposed to divide communication among multiple agents into two classes with respect to its meaning. The first is to transmit observed information, and the second is to transmit the agent's intention. We also proposed an architecture using an Elman-type recurrent neural network for learning the latter kind of communication autonomously and decentralizedly from a reinforcement signal. The problem in which the

agents avoid collisions by negotiation was adopted as an example. Although the communication contents, strategy and representation were not given to the agents beforehand, they became able to avoid the collisions through communication. It was shown that the recurrent neural network kept the past information as the occasion demanded, and that the agents negotiated adaptively according to the other agents. Among the agents, there were differences in the degree of persistence in their intentions, which can be called individuality. This individuality emerged even though the learning of all the agents was the same. Furthermore, in the four-agent negotiation problem, which cannot be solved only by one-to-one negotiation, the solution could be obtained without sharing the reinforcement signal with the other agents. This process is similar to the optimization process of a Hopfield network, in the respect that each element descends the potential surface of the whole system in a decentralized way through interaction with the others. This architecture is particularly useful for the emergence of communication that transmits intention, but it can be applied to the emergence of communication in multi-agent systems in general.

5. Acknowledgement

A part of this research was supported by the Japan Society for the Promotion of Science under "Biologically Inspired Adaptive Systems" (JSPS-RFTF96I00105) in the "Research for the Future Program", and also by the Grant-in-Aid for Scientific Research (No. 10450165) from the Ministry of Education, Science, Sports and Culture of Japan.

6. References

[1] Werner, G. M. & Dyer, M. G., "Evolution of Communication in Artificial Organisms", Proc. of Artificial Life II, pp. 1-47 (1991).

[2] Nakano, N., Sakaguchi, Y., Isotani, R. & Ohmori, T., "Self-Organizing System Obtaining Communication Ability", Biological Cybernetics, 58, pp. 417-425 (1988).

[3] Davis, R. & Smith, R. G., "Negotiation as a Metaphor for Distributed Problem Solving", Artificial Intelligence, Vol. 20, No. 1, pp. 63-109 (1983).

[4] Kreifelts, T. & von Martial, F., "A Negotiation Framework for Autonomous Agents", in Demazeau, Y. & Muller, J.-P. (eds.), Decentralized A.I. 2, pp. 71-88 (1991).

[5] Elman, J. L., "Finding Structure in Time", Technical Report CRL 8801, Center for Research in Language, Univ. of California, San Diego (1988).

[6] Rumelhart, D. E., Hinton, G. E. & Williams, R. J., "Learning Internal Representations by Error Propagation", Parallel Distributed Processing, Vol. 1, MIT Press, pp. 318-362 (1986).
