Repeated game theory as a framework for algorithm development in communication networks

This article presents a tutorial on how to use repeated game theory as a framework for algorithm development in communication networks. The article starts by introducing the basis of one-stage games and how the outcome of such games can be predicted, through iterative elimination and Nash equilibrium. In communication networks, however, not all problems can be modeled using one-stage games. Some problems are better modeled through multi-stage games, as many problems in communication networks consist of several iterations or decisions that need to be made over time. Of all the multi-stage games, infinite-horizon repeated games were chosen as the focus of this tutorial, because optimal equilibrium settings can be achieved, in contrast to the suboptimal equilibria achieved in other types of game. With the theoretical concepts introduced, it is then shown how the developed game theoretical model, and the devised equilibrium, can be used as a basis for the behavior of an algorithm that solves a particular problem and runs at specific network devices. Copyright © 2015 John Wiley & Sons, Ltd.


INTRODUCTION
Game theory is a mathematical tool that aims to study and predict the outcome of situations where two or more agents have conflicting interests [1]. The field of game theory has its roots in decision theory and, in fact, can be thought of as a generalization of decision theory to multiple agents [1]. As a field of its own, game theory was pioneered by John von Neumann and Oskar Morgenstern in [2], which laid the foundations of current game theory. A general formal description of games was presented, several zero-sum games were analyzed, and solutions to those games were devised.
Following the concepts published in [2], many other contributions appeared, such as the first mathematical discussion of the prisoner's dilemma in [3] and the Nash equilibrium in [4], probably one of the most relevant contributions. The Nash equilibrium was particularly important because it is applicable to a wide variety of game types [1,4]. The field kept evolving with the research and analysis of several types of games, such as extensive form and repeated games, which will be presented in this tutorial [5,6]. Game theory also laid the foundations for modern and very active disciplines such as algorithmic game theory and mechanism design [7,8].
Since its inception, game theory has been researched and used mainly for economic purposes, but other fields started to use it as well. For instance, game theory was extensively applied to biology, mainly because of the work of John Maynard Smith, who developed the evolutionarily stable strategy [9]. Other fields, like political and social sciences, followed and started using game theory [10]. Computer science and communication networks are no exception, and much research emerged at the frontier between computer science and game theory. Most of that research has been related to complexity theory, where several algorithms to compute Nash equilibria have been proposed and studied [11].
In the communication networks field, game theory has been used mainly for analytical purposes, where devices, such as routers, are players with selfish interests. Nonetheless, there are also some contributions where game theory is used in communication networks as a framework for algorithm development. In [12], an algorithm for fair bandwidth allocation, which controls the output queues of wireless routers, is developed with the aid of a repeated game model. In [13], a Stackelberg game model is adopted to address the issue of resource allocation in femtocells. In these scenarios, interference can play a significant role, and as such, the developed model enforces a pricing system, applied to femtocell users, that discourages overutilization of the available spectrum and lowers interference. In [14], non-cooperative game theory is adopted to model wireless networks where users have multiple connections available and are faced with multiple connectivity decisions that can unbalance quality of service throughout the network. In [15], the problem of resource allocation in an orthogonal frequency division multiple access network is studied through the use of cooperative game theory in order to achieve fairness in bandwidth allocation.
In the different environment of military unmanned air vehicles [16], a coalition formation game is used where the air vehicles form coalitions to collect data from different geographical locations and transmit it wirelessly to a central base station using each other as relays in the most efficient manner.
When discussing fairness in resource allocation, one possible strategy is to enforce such fairness by applying queueing mechanisms and rules. That is the approach taken in [17], where a cooperative game is developed and where the payoffs obtained by cooperation are described by stochastic variables.
In this tutorial, it is shown how repeated game theory can be used as a framework for algorithm development in communication networks, instead of being used just as an analytical tool. No prior knowledge of game theory is assumed. That is, the tutorial starts by introducing the basis of one-stage games and, with such knowledge, continues on to dynamic and repeated games. It is shown how optimal Nash equilibrium strategies can be obtained with infinite-horizon repeated games, while the equivalent one-stage version has only suboptimal Nash equilibria, and how that result can support the development of an algorithm to be run at devices in the network. This tutorial also exemplifies, with a simplified model taken from [12], the use of game theory to model a problem, devise an equilibrium strategy, and develop an algorithm that mimics such an equilibrium strategy.
The rest of this paper is organized as follows. The next section introduces one-stage games, pure and mixed strategies, and Nash equilibrium. In Section 3, multi-stage games are presented together with Nash equilibrium and backward induction. Repeated games are then presented in Section 4, as well as the Nash equilibrium in infinite-horizon repeated games and the folk theorem. An example is then shown in Section 5, from the model to the development of the algorithm. The tutorial concludes in Section 6.

ONE-STAGE GAMES
Game theory can be used to model and study situations where agents have conflicting interests. One example of such a situation is the widely known prisoner's dilemma, which is usually presented as follows [1]. Two men are arrested, but the police do not have enough evidence to convict them. Both prisoners are then interrogated separately and at the same time. Each prisoner can choose to stay silent or to betray the other. If both stay silent, both will go to prison for just 1 month. If one prisoner betrays the opponent while the opponent stays silent, then the silent prisoner goes to prison for 12 months and the betrayer goes free for cooperating with the police. Finally, if both prisoners betray each other simultaneously, then both will go to prison for 3 months. The question is what the prisoners will do, given that neither of them can be sure whether the other will betray or stay silent. If both stay silent, both get a minor sentence of 1 month. However, each prisoner may feel tempted to betray the other in order to be freed. As a result, both may end up betraying each other. Hence the dilemma.

In [18], a conceptually similar version of the prisoner's dilemma is presented. It is called the forwarder's dilemma and will be used throughout this article to help explain some definitions. The game can be described as follows. There are two players, router p1 and router p2, that want to send a packet to d1 and d2, respectively. As shown in Figure 1, for d1 to receive the packet from p1, p2 will have to cooperate and forward the packet. Conversely, the same applies to the packet sent from p2 to d2. If a packet reaches its destination, then the player who sent it receives a payment of 1. A player that chooses to forward the packet of the opponent incurs a cost of C, where 0 < C << 1. This cost represents the consumption of resources to forward foreign traffic. The question is whether or not players in the forwarder's dilemma will cooperate with each other by forwarding packets. If both players cooperate, then both will receive a payoff of 1 − C. However, a player might feel tempted to defect in order to receive a payoff of 1, which is the highest payoff in this game, leaving a payoff of −C for the opponent. In non-cooperative one-stage games, it is assumed that players decide at the same time what their actions will be, without communicating their preferences beforehand. Here, the term preference refers to the action that a player feels tempted to choose: forward or drop the packet, in the case of the forwarder's dilemma. However, even if players in the forwarder's dilemma communicate their preferences beforehand and agree to cooperate, both players will still be tempted to lie and drop the packet belonging to the opponent in order to receive the highest payoff of 1. As a precaution, both players will defect by not forwarding the packet of the opponent. This way, both players will play D and, as a result, will receive a payoff of 0, even though they could receive a better payoff of 1 − C. Hence the dilemma.

Normal and strategic form representations
Games can be represented in many different forms. One of the most common is the normal form representation, which is very useful for simple games with two players and only a few actions available to each player [1,19]. This representation consists of a table, where the rows represent the strategies of one player and the columns represent the strategies of the other player. The cell that results from the intersection of a row and a column contains the payoffs that both players will receive. Considering the just presented forwarder's dilemma, there are two players, p1 and p2, which can forward or drop a packet, represented by F and D, respectively. The normal form representation of the forwarder's dilemma is shown in Table I. The rows represent the actions available to p1, while the columns represent the actions available to p2. As already mentioned, the cell resulting from the chosen row and column contains the payoffs that players will receive. For instance, if p1 forwards and p2 drops the packet, then the resulting cell contains (−C, 1), which means that p1 receives a payoff of −C and p2 receives a payoff of 1. The tuple with the strategy chosen by each player is called a strategy profile. In the example just used, where p1 forwards and p2 drops, the strategy profile is (F, D).
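The normal form table translates directly into code. The sketch below (Python) stores the forwarder's dilemma of Table I as a dictionary from strategy profiles to payoff pairs; the forwarding cost C = 0.1 is an arbitrary illustrative value satisfying 0 < C << 1:

```python
# Forwarder's dilemma in normal form: profile -> (payoff of p1, payoff of p2).
# C is the forwarding cost, 0 < C << 1; the value 0.1 is an arbitrary choice.
C = 0.1

game = {
    ("F", "F"): (1 - C, 1 - C),  # both cooperate
    ("F", "D"): (-C, 1),         # p1 forwards, p2 drops
    ("D", "F"): (1, -C),         # p1 drops, p2 forwards
    ("D", "D"): (0, 0),          # both defect
}

# Payoffs for the profile where p1 forwards and p2 drops:
print(game[("F", "D")])  # (-0.1, 1)
```

Looking up `game[("F", "D")]` reproduces the cell discussed above: p1 receives −C and p2 receives 1.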
The normal form representation is good for simple examples; however, for games with many players and multiple strategies, the normal form becomes impractical. For those cases, the strategic form is the most suited. In this form, a game is represented by G = {P, S, U}, where P represents the set of players, S represents the set of all strategy profiles, and U represents the set of utility functions, explained next [1,18,19].
The set of all strategy profiles can be obtained by S = ×_{i∈P} S_i, where S_i is the set of all strategies available to player i. ‡ In the game theory literature, for convenience, the set of all players except i is denoted by −i. This way, one can represent a strategy profile (s_i, s_−i) that is composed of a specific strategy from i, s_i ∈ S_i, and any combination of strategies from all other players, s_−i ∈ S_−i. As for the set of utility functions, U = {u_i | i ∈ P}, it includes the payoff that each player receives as a result of the chosen strategy profile, that is, u_i : S → ℝ [1,18,19].
Players in a game can have complete or incomplete information. In a complete information game, every player i ∈ P knows everything about the game he or she is involved in. More specifically, every player i ∈ P knows all the other players, their available strategies, and the respective payoffs. Moreover, every player knows that the opponents also have that information. This knowledge can be used to intelligently choose the strategies that provide the highest possible payoffs [1,18,19].
On the other hand, in a game with incomplete information, players do not know which strategies are available to the opponents, nor the resulting payoffs. Certain beliefs might be held about the opponents, but those are not accurate, and as such, the behavior of players can be different. In this tutorial, only complete information games will be used.

Definition 1 (Complete information game)
A game with complete information is a game where every player i ∈ P knows all the other players, their available strategies, and all the payoffs that they receive as a result of the chosen strategy profiles.

Dominated strategies
In game theory, players choose their strategies in order to receive the highest possible payoff. Thus, it can be expected that strategies that never lead to high payoffs will never be chosen. Considering the game in Table II, taken from [1], player p2 will never choose strategy z2. That is because greater payoffs can be obtained by p2, either by choosing x2 or y2, no matter how the opponent plays. In this case, it is said that strategy z2 is strongly dominated [1,18,19].

Definition 2 (Strong dominance)
Strategy s′_i of player i is strongly dominated if there exists at least one s_i ≠ s′_i such that u_i(s′_i, s_−i) < u_i(s_i, s_−i) for every strategy profile s_−i ∈ S_−i adopted by the opponents of i.

Strongly dominated strategies can be removed from the game because intelligent players would never choose them. In the case of the game in Table II, if strategy z2 is eliminated, then the resulting game will be the one in Table III. Note that, in the resulting game, after the elimination of z2, strategy y1 of p1 also becomes strongly dominated and can therefore be removed. This elimination process of strongly dominated strategies is called iterative elimination [1,18,19]. At the end, for the given example, only one strategy remains for each player: x1 for p1 and x2 for p2. Because strategy profile (x1, x2) is expected to be chosen, p1 will receive a payoff of 2 and p2 will receive a payoff of 3. Strategies can also be weakly dominated [1,18,19].

Definition 3 (Weak dominance)
Strategy s′_i of player i is weakly dominated if there exists at least one s_i ≠ s′_i such that u_i(s′_i, s_−i) ≤ u_i(s_i, s_−i) for every s_−i ∈ S_−i, with strict inequality for at least one s_−i ∈ S_−i.
Removing weakly dominated strategies by iterative elimination can also be carried out; however, it can lead to unexpected results. Considering the game in Table IV, taken from [20], p1 has two weakly dominated strategies, x1 and y1. In Figure 2, it is possible to see how eliminating x1 or y1 first can lead to different results. That is, the order in which weakly dominated strategies are eliminated can lead to different outcomes. Such a situation does not happen with strongly dominated strategies, because elimination never causes a strongly dominated strategy to cease being strongly dominated. On the other hand, a weakly dominated strategy can cease being dominated when other strategies are removed.
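The iterative elimination procedure for strongly dominated strategies can be sketched in a few lines of Python. The payoff numbers below are illustrative stand-ins (the exact values of Table II are not reproduced here); they are chosen so that z2 is strongly dominated, after which y1 and then y2 fall as well, leaving the profile (x1, x2) with payoffs (2, 3), as in the text:

```python
# Illustrative payoff tables for a two-player game: u1 for the row player p1,
# u2 for the column player p2 (NOT the actual values of Table II).
u1 = {("x1", "x2"): 2, ("x1", "y2"): 3, ("x1", "z2"): 1,
      ("y1", "x2"): 1, ("y1", "y2"): 2, ("y1", "z2"): 4}
u2 = {("x1", "x2"): 3, ("x1", "y2"): 2, ("x1", "z2"): 1,
      ("y1", "x2"): 2, ("y1", "y2"): 3, ("y1", "z2"): 0}

def eliminate(u1, u2, rows, cols):
    """Repeatedly delete strictly dominated pure strategies of both players."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for s in list(rows):  # row player: is s strictly dominated by another row?
            if s in rows and any(all(u1[(d, c)] > u1[(s, c)] for c in cols)
                                 for d in rows if d != s):
                rows.remove(s)
                changed = True
        for s in list(cols):  # column player: is s strictly dominated by another column?
            if s in cols and any(all(u2[(r, d)] > u2[(r, s)] for r in rows)
                                 for d in cols if d != s):
                cols.remove(s)
                changed = True
    return rows, cols

surviving = eliminate(u1, u2, ["x1", "y1"], ["x2", "y2", "z2"])
print(surviving)  # (['x1'], ['x2'])
```

Note that the loop only ever removes *strictly* dominated strategies, so, as argued above, the surviving set does not depend on the order of elimination.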

Nash equilibrium
It is not always possible to predict the outcome of a game through iterative elimination. For instance, the game in Table V, taken from [1], has no dominated strategies. Nevertheless, it is still possible to predict what will be the outcome of the game. For that, the notion of best response needs to be introduced [1,18,19].

Definition 4 (Best response)
The best response of player i is a function br_i(s_−i) that outputs the strategy that i should choose in order to receive the highest possible payoff, given that the opponents play s_−i. That is:

br_i(s_−i) = argmax_{s_i ∈ S_i} u_i(s_i, s_−i)

In the game in Table V, strategy x1 from p1 is the best response to strategy x2 from p2. Strategy x2, in its turn, is the best response to strategy z1. One interesting strategy profile is the one where p1 plays y1 and p2 plays y2, with the payoff (1, 1). In this case, y1 is the best response to y2 and, similarly, y2 is the best response to y1. This strategy profile is actually the expected outcome of the game, because no player has an incentive to unilaterally choose a different strategy. That is, if p1 switches to x1 or z1, its payoff will decrease, considering that p2 does not change its strategy. Similarly, p2 will not change to x2 or z2, because its payoff would decrease while p1 is playing y1. This type of strategy profile, where no player has any incentive to deviate, is termed Nash equilibrium [1,18,19].

Definition 5 (Nash equilibrium)
A strategy profile (s*_i, s*_−i) ∈ S is a Nash equilibrium if, for every player i ∈ P,

u_i(s*_i, s*_−i) ≥ u_i(s_i, s*_−i) for all s_i ∈ S_i

that is, no player can increase its payoff by unilaterally deviating from s*.
It is possible to have more than one Nash equilibrium in a game. In the example in Table IV, both strategy profiles obtained through iterative elimination of weakly dominated strategies, (z1, x2) and (z1, y2), are actually Nash equilibrium strategies. Indeed, strategy profiles obtained by iterative elimination are always Nash equilibrium profiles. Note, however, that in the case of iterative elimination of weakly dominated strategies, the resulting profiles are a subset of the Nash equilibrium profiles, meaning that there might be more Nash equilibrium profiles [1]. As for iterative elimination of strongly dominated strategies, the resulting profile is the only Nash equilibrium, as in the game in Table II [1].
Nash equilibrium, as shown, predicts the outcome of a game. For example, in the forwarder's dilemma in Table I, the Nash equilibrium profile is (D, D). Note that this outcome is not the most efficient one, because both players could receive greater payoffs if the profile (F, F) was played instead. However, because any player might feel tempted to defect in order to receive the highest payoff of 1, both players, as a precaution, end up choosing D in order to avoid receiving −C. In this case, the Nash equilibrium strategy is not the most efficient outcome: players receive the payoff (0, 0), while a greater payoff (1 − C, 1 − C) could be earned if the profile (F, F) was chosen instead. In fact, Nash equilibrium only predicts the natural choices of intelligent players that do not trust each other, and in many games it is not the most efficient outcome. The challenge resides in designing systems where players have incentives to cooperate, forwarding traffic from each other in the case of the forwarder's dilemma, so that efficient Nash equilibria can be reached [7,8,21].
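The deviation check in Definition 5 can be applied mechanically to every profile of a small game. The sketch below enumerates the pure-strategy Nash equilibria of the forwarder's dilemma (with C = 0.1 as an arbitrary forwarding cost) and recovers (D, D) as the only equilibrium:

```python
# Enumerate pure-strategy Nash equilibria of a two-player game by checking,
# for every profile, whether either player can gain by a unilateral deviation.
C = 0.1  # forwarding cost, an arbitrary value with 0 < C << 1

u = {  # forwarder's dilemma, Table I: profile -> (u_p1, u_p2)
    ("F", "F"): (1 - C, 1 - C),
    ("F", "D"): (-C, 1),
    ("D", "F"): (1, -C),
    ("D", "D"): (0, 0),
}
actions = ["F", "D"]

def is_nash(a1, a2):
    # p1 must not profit from deviating given a2, and likewise for p2.
    no_dev_p1 = all(u[(a1, a2)][0] >= u[(d, a2)][0] for d in actions)
    no_dev_p2 = all(u[(a1, a2)][1] >= u[(a1, d)][1] for d in actions)
    return no_dev_p1 and no_dev_p2

equilibria = [(a1, a2) for a1 in actions for a2 in actions if is_nash(a1, a2)]
print(equilibria)  # [('D', 'D')]
```

The check confirms that (F, F) is not an equilibrium: given that the opponent forwards, switching to D raises a player's payoff from 1 − C to 1.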
In game theory, the strategy profile (F, F) of the forwarder's dilemma is said to be Pareto superior to other profiles.

Definition 6 (Pareto superior)
A strategy profile s is Pareto superior to another profile s′ if u_i(s) ≥ u_i(s′) for every player i ∈ P, with strict inequality for at least one player.

Definition 7 (Pareto optimal)
A strategy profile is Pareto optimal if no other strategy profile is Pareto superior to it.

The most efficient outcome in a game is one with the highest payoffs for every player, (F, F) in the case of the forwarder's dilemma. Such an efficient outcome is Pareto optimal, because no other profile is Pareto superior to it [1,18,19]. There are cases where a Nash equilibrium is Pareto optimal. In such cases, it is said that the Nash equilibrium is Pareto efficient. Naturally, the most desired Nash equilibrium is the Pareto efficient one, because payoffs are higher.

Mixed strategies
Until now, in this tutorial, it has been assumed that players choose one specific strategy to play and that the expected outcome of the game is a Nash equilibrium profile. However, in some games, a pure-strategy Nash equilibrium may not exist, as shown in the example in Table VI, taken from [22].
Instead of choosing which specific strategy should be played, players can define a probability distribution over their available strategies. In the example in Table VI, a Nash equilibrium exists if both players assign a probability of one-half to each of their strategies, as will become clear next. Such a distribution is termed a mixed strategy [22].

Definition 8 (Mixed strategy)
A mixed strategy σ_i is a probability distribution over the strategies of i, S_i.
The set of all mixed strategies of a player i ∈ P is denoted by Σ_i (capital of σ). Similarly to strategy profiles, s ∈ S, mixed strategy profiles can be defined by Σ = ×_{i∈P} Σ_i. From here on, to avoid confusion, the profiles in S will be called pure strategy profiles, while the profiles in Σ will be termed mixed strategy profiles.
Because mixed strategies define probabilities over the set of available pure strategies, the utility function in this case gives the expected payoff of the chosen mixed profile [1,19,22]:

u_i(σ) = Σ_{s ∈ S} u_i(s) Π_{j ∈ P} σ_j(s_j)

where s_j is the strategy of j in profile s and σ_j(s_j) represents the probability of s_j being chosen. Hence, u_i(s) Π_{j ∈ P} σ_j(s_j) represents the expected payoff of i if s is chosen. Regarding the game in Table VI, and assuming that p1 chooses x1 with probability q_x1 and p2 chooses x2 with probability q_x2, the expected payoff of p1 can be calculated by:

u_p1(σ) = q_x1 q_x2 u_p1(x1, x2) + q_x1 (1 − q_x2) u_p1(x1, y2) + (1 − q_x1) q_x2 u_p1(y1, x2) + (1 − q_x1)(1 − q_x2) u_p1(y1, y2)

where 1 and −1 are the payoffs u_p1(·) that p1 receives according to the different strategy profiles. As aforementioned, a Nash equilibrium exists, in this case, for q_x1 = q_x2 = 1/2, meaning that in this mixed Nash equilibrium both players receive a payoff of 0. According to Myerson [1] and Fudenberg and Tirole [19], every game with a finite set of strategies has at least one pure or mixed Nash equilibrium. Note that a mixed Nash equilibrium profile is never Pareto optimal. That is because a mixed profile is, in fact, a linear combination of pure strategies and, as such, cannot result in higher payoffs than the ones obtained by pure strategies. Before proceeding, let us summarize the important concepts introduced in this section:

- normal and strategic form representations;
- complete and incomplete information games;
- strongly and weakly dominated strategies;
- Nash equilibrium;
- Pareto superior and Pareto optimal strategies;
- mixed strategies.
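The expected payoff formula above can be evaluated numerically. Since Table VI is not reproduced here, the sketch below assumes the standard matching-pennies payoffs of ±1 (p1 earning +1 when the choices match), which is consistent with the 1 and −1 payoffs mentioned in the text:

```python
# Expected payoff of p1 under mixed strategies in a matching-pennies style game:
# p1 plays x1 with probability q1, p2 plays x2 with probability q2.
# Payoffs are an assumption (Table VI is not reproduced): +1 on a match, -1 otherwise.
u1 = {("x", "x"): 1, ("x", "y"): -1, ("y", "x"): -1, ("y", "y"): 1}

def expected_payoff_p1(q1, q2):
    probs = {("x", "x"): q1 * q2, ("x", "y"): q1 * (1 - q2),
             ("y", "x"): (1 - q1) * q2, ("y", "y"): (1 - q1) * (1 - q2)}
    return sum(u1[s] * p for s, p in probs.items())

# At the mixed Nash equilibrium q1 = q2 = 1/2 the expected payoff is 0, and a
# unilateral change of q1 does not improve it:
print(expected_payoff_p1(0.5, 0.5))  # 0.0
print(expected_payoff_p1(1.0, 0.5))  # 0.0 -> deviating to a pure strategy does not help
```

The second call illustrates why (1/2, 1/2) is an equilibrium: against an opponent mixing uniformly, every strategy of p1 yields the same expected payoff, so there is no profitable deviation.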

MULTI-STAGE GAMES
One-stage games can only model situations where all players take their decisions at the same time. However, many situations may be better modeled with games composed of several stages [1,19,23]. For instance, in the forwarder's dilemma, players may not have packets to send at the same time. Let us assume that p2 is the first player with a packet to be sent, which p1 may or may not forward. Immediately after, p1 also sends a packet, which p2 can choose to forward or not. Such games are termed dynamic games or multi-stage games. In this section, and throughout this tutorial, only dynamic games with perfect information are considered.

Nash equilibrium and backward induction
The concept of Nash equilibrium in dynamic games is no different from that in one-stage games. That is, a strategy profile is a Nash equilibrium if no player can increase its payoff by unilaterally deviating. Considering the example in Figure 4, taken from [22], the pure Nash equilibrium strategy profiles are shown in Table VII.

Definition 9 (Perfect information)
A dynamic game with perfect information is one where every player i ∈ P knows all the actions taken in previous stages by all opponents.
In the previous multi-stage forwarder's dilemma example, p 2 can decide whether or not to forward based on the action of p 1 in the previous stage.
Naturally, dynamic games need a different representation that must be capable of showing the order in which players make their moves [1]. The extensive form representation, shown in Figure 3 for the forwarder's dilemma, is the most suited for these situations. The extensive form consists of a tree structure where the root node represents the first decision in the game. In the previous example, the first decision belongs to p1. The edges with labels (F and D) represent the actions available to the players. Player p1, at the first stage, can decide to forward, F, or to drop, D, the packet from p2. The leaves of the tree contain the payoffs that players will receive according to their decisions.
In multi-stage games, the player moving in the first stage has a set of strategies equal to the one available in the one-stage game. For instance, p1 in Figure 3 has the following strategies available: S_p1 = {F, D}. The players in subsequent stages can take their decisions based on the actions from previous stages. Player p2 in Figure 3 can decide its action based on the move of p1 in the previous stage. For this reason, the strategies of p2 are different in the multi-stage game. In this case, p2 has the following strategies available: S_p2 = {FF, FD, DF, DD}. The first character in a strategy of p2 represents the action that p2 takes if p1 chooses F in the first stage, and the second character represents the action that p2 takes if p1 chooses D. For instance, strategy FD means that p2 will forward if p1 has forwarded in the previous stage and will drop if p1 has dropped. As previously explained, the leaves represent the payoffs that players will receive according to the decisions taken.

Returning to the game in Figure 4, its normal form representation is shown in Table VII. Note that the rows of the table include the possible moves for p1 (H and D) and that the columns include the possible moves for p2, which are based on the previous action of p1 (HH, HD, DH, and DD). In strategy profile (D, HH), which is one of the Nash equilibrium profiles, p2 threatens to play H regardless of the move of p1. Player p1 is aware of this threat and, as a result, could play his or her best response to HH, which is strategy D. However, looking more closely at Figure 4, if p1 chooses H in the first stage, then p2 is really not willing to choose H in the second stage, because D would give p2 a better payoff. This kind of threat is termed an empty threat, because p2 is actually bluffing and HH does not represent a real threat [19,22]. Finding Nash equilibrium strategy profiles in multi-stage games can lead to empty threats, and their removal can be carried out through backward induction [19,22].
This technique starts by analyzing the most profitable action in the last stage; then, based on the most profitable actions at the last stage, the most profitable action at the penultimate stage is determined. This analysis proceeds upward in the tree structure until the root node is reached.
To exemplify this, let us consider the game in Figure 4. First, the action that results in the highest payoff for p2 is determined, for each possible previous action of p1. If p1 played H, then the best choice for p2 is D. On the other hand, if p1 chose D, then p2 is better off with H. Given the best moves of p2, it is possible to decide which action results in the highest payoff for p1. Clearly, p1 is better off playing H, because it yields a higher payoff. In Figure 5, it is possible to see the result of the backward induction, where the thick lines mark the best actions at every stage. The continuous route of thick lines from the root to the leaf represents the predicted outcome of the game. Hence, (H, DD) and (H, DH) are the predicted outcomes, because both these strategies lead to the actions chosen by backward induction.
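Backward induction on a two-stage tree can be sketched as follows. The payoff values are illustrative stand-ins (Figure 4's exact numbers are not reproduced here), chosen to be consistent with the text: p2's best reply to H is D, its best reply to D is H, and p1 then prefers H:

```python
# Backward induction on a two-stage game tree, stored as a nested dict:
# the outer keys are p1's moves, the inner dicts map p2's replies to
# payoff tuples (u_p1, u_p2). Payoffs are hypothetical, chosen only to
# reproduce the best responses described in the text.
tree = {
    "H": {"H": (2, 1), "D": (3, 2)},   # p1 plays H, then p2 chooses
    "D": {"H": (1, 3), "D": (0, 0)},   # p1 plays D, then p2 chooses
}

# Last stage: p2's best reply to each move of p1 (maximize p2's payoff, index 1).
best_p2 = {a1: max(tree[a1], key=lambda a2: tree[a1][a2][1]) for a1 in tree}

# First stage: p1 anticipates p2's replies and maximizes its own payoff (index 0).
best_p1 = max(tree, key=lambda a1: tree[a1][best_p2[a1]][0])

print(best_p1, best_p2)  # H {'H': 'D', 'D': 'H'}
```

The computed plan matches the analysis above: p2 answers H with D and D with H, so p1 plays H, and the empty threat HH never survives the induction.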
As a reminder, the following list summarizes the important concepts introduced in this section:

- multi-stage games;
- perfect information games;
- backward induction and Nash equilibrium in multi-stage games.

REPEATED GAMES
Repeated games are a specific type of dynamic game where players face the same one-stage game repeatedly [1,19,22,23]. An example of a repeated game is the repeated forwarder's dilemma, where the one-stage game in Table I is played at every stage.

Finite-horizon games and Nash equilibrium
Repeated games can be finite-horizon, which means that the number of stages is limited, or infinite-horizon, which means that players interact over an infinite or unknown number of stages [1,22]. The payoff attributed to every player i ∈ P of a finite-horizon game can be calculated by summing the stage payoffs of all stages:

U_i = Σ_{t=0}^{T} u_i(t)     (2)

where T is the last stage, u_i(t) is the stage payoff of player i at stage t, and U_i is the total payoff. Considering the repeated forwarder's dilemma as an example, and assuming that both players use a strategy that chooses F at every stage, the payoff attributed to both players would be Σ_{t=0}^{T} (1 − C) = (T + 1)(1 − C) in this case.
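The finite-horizon total payoff is a plain sum of stage payoffs, as the short sketch below verifies (C and T are arbitrary illustrative values):

```python
# Total payoff in a finite-horizon repeated game: the sum of the stage payoffs.
# For the repeated forwarder's dilemma with both players always choosing F,
# every stage pays 1 - C, so the total is (T + 1) * (1 - C).
C = 0.1   # arbitrary forwarding cost
T = 9     # arbitrary last stage (stages 0..T, i.e., 10 stages)

stage_payoffs = [1 - C for t in range(T + 1)]
total = sum(stage_payoffs)
print(total, (T + 1) * (1 - C))  # both approximately 9.0 here
```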
To understand Nash equilibrium in finite-horizon games, let us keep considering the repeated forwarder's dilemma. If both players played F until stage T − 1, then one of the players could deviate to D at the last stage T to increase his or her payoff. The opponent knows that and, to avoid receiving −C at the last stage, can also play D. Moreover, because it is predicted that both players will play D at the last stage, players can also deviate at the penultimate stage in order to increase their payoff. Following this reasoning, the strategy profile that chooses the action (D, D) at every stage is a Nash equilibrium of the finite-horizon repeated forwarder's dilemma. Note that this method of finding the Nash equilibrium is similar to the backward induction introduced in Section 3.1. Hence, any strategy profile that produces the outcome predicted by the backward induction is a Nash equilibrium.

Infinite-horizon repeated games and Nash equilibrium
Infinite-horizon repeated games, as aforementioned, are played forever or for an unknown number of stages. As such, the payoff function of the finite-horizon game, shown in Equation (2), cannot be used for infinite-horizon games, because it could result in infinite payoffs. Instead, a weighted sum, termed discounted payoff, is used [1,22,23]:

U_i = (1 − δ) Σ_{t=0}^{∞} δ^t u_i(t)     (3)

where δ is the weighting factor, termed discounting factor, which accepts only values between 0 and 1, 0 < δ < 1. The factor (1 − δ) is responsible for normalizing the payoffs, allowing the comparison between discounted payoffs and the payoffs received at every stage. For instance, in an infinite-horizon repeated game where player i receives a stage payoff of 1 at all stages, the discounted payoff for i will be 1. Note that, as t grows, δ^t decreases. Hence, stage payoffs become less important as t grows, because the stage payoff is multiplied by δ^t. The actual value attributed to δ in a game will influence how fast the stage payoffs lose importance, altering the behavior of players and the Nash equilibrium profiles, as will be demonstrated next.
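The normalization by (1 − δ) can be checked numerically by truncating the infinite sum at a large horizon (δ = 0.95 is an arbitrary illustrative value):

```python
# Discounted payoff U_i = (1 - delta) * sum_t delta**t * u_i(t), truncated
# at a large horizon for numerical evaluation. With a constant stage payoff
# of 1, the geometric series gives a discounted payoff of exactly 1.
delta = 0.95  # arbitrary discounting factor, 0 < delta < 1

def discounted(stage_payoff, n_stages=10_000):
    return (1 - delta) * sum(stage_payoff(t) * delta**t for t in range(n_stages))

print(discounted(lambda t: 1))  # approximately 1.0
```

This confirms that the discounted payoff of a constant stream equals the per-stage payoff, which is what makes discounted and stage payoffs directly comparable.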
In infinite-horizon games, similarly to one-stage games, a strategy profile s ∈ S is a Nash equilibrium if:

U_i(s_i, s_−i) ≥ U_i(s′_i, s_−i) for all i ∈ P and all s′_i ∈ S_i     (4)

However, in the case of infinite-horizon games, Nash equilibrium is greatly influenced by the discounting factor and by the fact that the game is played forever [1,22,23]. To exemplify this, let us introduce the grim trigger strategy, which is widely used in the game theoretical literature [1,22,23]. A player i using this strategy will exert effort at every stage, as long as the opponent also cooperates. If the opponent shirks even once, then i will stop cooperating thereafter. Applying this strategy to the forwarder's dilemma, and labeling one of the players by i and the opponent by j (if i = p1 then j = p2; if i = p2 then j = p1), the grim trigger strategy can be defined by the following equation:

a_i(t) = F, if t = 0 or a_j(t − 1) = F; D, otherwise     (5)

If both players in the infinite-horizon repeated forwarder's dilemma play this strategy, then they will play F at all stages. The outcome of such a strategy profile for both players will be:

U_i = (1 − δ) Σ_{t=0}^{∞} δ^t (1 − C) = 1 − C

If player i deviates at some stage t, then i will receive a higher payoff in that stage but will receive zero thereafter. The resulting discounted payoff for the deviating player can be calculated by:

U_i^dev = (1 − δ) [ (1 − C) δ^0 + (1 − C) δ^1 + … + (1 − C) δ^{t−1} + 1 · δ^t + 0 · δ^{t+1} + 0 · δ^{t+2} + … ]

For the profile where both players use grim trigger to be a Nash equilibrium, the deviation cannot be profitable for i. That is:

1 − C ≥ U_i^dev

Because 0 < δ < 1, the inequality holds for δ > C. As long as the cost of forwarding a packet, C, is lower than the discounting factor, δ, it is more profitable to follow the grim trigger strategy than to deviate from it. This results in a Nash equilibrium profile where both players exert effort by forwarding packets from each other. As aforementioned and exemplified, the discounting factor is a key element of the Nash equilibrium of infinite-horizon repeated games.
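The profitability condition for deviating from grim trigger can be checked numerically. The sketch below compares the discounted payoff of always cooperating with that of deviating at a given stage; C and δ take arbitrary illustrative values on either side of the δ > C threshold:

```python
# Is grim trigger an equilibrium of the repeated forwarder's dilemma?
# Compare the discounted payoff of always cooperating (1 - C every stage)
# with deviating at stage t_dev (payoff 1 at t_dev, then 0 forever).
def discounted(payoffs, delta):
    return (1 - delta) * sum(p * delta**t for t, p in enumerate(payoffs))

def deviation_profitable(C, delta, t_dev=0, horizon=10_000):
    cooperate = [1 - C] * horizon
    deviate = [1 - C] * t_dev + [1] + [0] * (horizon - t_dev - 1)
    return discounted(deviate, delta) > discounted(cooperate, delta)

# Consistent with the condition delta > C derived above:
print(deviation_profitable(C=0.1, delta=0.95))  # False -> no incentive to deviate
print(deviation_profitable(C=0.1, delta=0.05))  # True  -> deviation pays off
```

With a patient player (δ = 0.95 > C), the one-shot gain of 1 never outweighs the lost stream of 1 − C payoffs; with an impatient player (δ = 0.05 < C), it does.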
If δ is close to 0, then the importance of the successive stage payoffs decreases rapidly and, as a result, the relevance of the first stage is much greater than that of the subsequent stages. As such, players will care mostly about the first stage and will try to earn the highest possible immediate payoff. This mimics the behavior of an impatient player, mainly interested in the current stage payoff. On the other hand, if δ is close to 1, the importance of the successive stage payoffs decreases slowly, driving players to be more patient and to cooperate in order to avoid severe punishments in future stages. Relating this reasoning to the necessary condition for the grim trigger strategy to be a Nash equilibrium in the repeated forwarder's dilemma, δ > C, impatient routers will deviate from the strategy and drop the packets, while patient routers will follow the strategy and forward the packets.
An alternative meaning for δ is that it can represent the probability of the game continuing to the next stage. That is, a high value of δ represents a high probability that there will exist more stages, while a low value of δ represents a low probability that the game will continue to a next stage. Therefore, if the probability of the game being played for many stages is high, then players will be patient and cooperate. Otherwise, if there is a high probability that the game will be played only for a few stages, then players will not care about cooperation or possible punishments and will try to earn the highest possible immediate payoffs.
As shown, equilibrium strategies where players cooperate are possible in repeated games, as long as the discounting factor is high enough. In communication networks, generally, it can be considered that the discounting factor is close to 1. The reasoning is that networks are supposed to operate for very long periods of time, and it is unknown when a network will cease operation [18]. That is, the probability that the network will operate for many stages is high, and therefore, δ can be assumed to be close to 1.
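The δ > C condition can be checked numerically. The following Python sketch is ours, assuming the usual forwarder's dilemma stage payoffs: 1 − C for mutual forwarding, 1 in the stage where the deviator drops while still being served, and 0 once both players punish. It compares following grim trigger forever against deviating at some stage t:

```python
# Discounted payoff of cooperating forever under grim trigger:
# (1 - delta) * sum (1 - C) * delta^t = 1 - C.
def cooperate_forever(C, delta):
    return 1 - C

# Discounted payoff of deviating at stage t: (1 - C) until t - 1,
# a one-off payoff of 1 at stage t, and 0 at every stage afterwards.
def deviate_at(t, C, delta):
    coop_part = sum((1 - C) * delta**s for s in range(t))
    return (1 - delta) * (coop_part + 1 * delta**t)

# With delta > C, deviating is never profitable (patient router)...
C, delta = 0.2, 0.9
patient = all(cooperate_forever(C, delta) >= deviate_at(t, C, delta)
              for t in range(50))
# ...while with delta < C, an impatient router gains by deviating at once.
C, delta = 0.8, 0.2
impatient = deviate_at(0, C, delta) > cooperate_forever(C, delta)
```

Both checks come out true, matching the patient/impatient reading of δ above.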

Folk theorem.
Many Nash equilibrium strategy profiles exist in infinite-horizon repeated games that do not exist in one-stage games. This allows for certain payoffs to be obtained that would not be possible in Nash equilibrium of one-stage games. To understand which payoff values are possible, let us introduce the min-max payoff [22,23].

Definition 10 (Min-max payoff)
The min-max payoff for player i is defined as u̲_i = min_{s_{−i}} max_{s_i} u_i(s_i, s_{−i}).
That is, the min-max payoff is the lowest payoff that some player i can receive, provided that all opponents choose a strategy to minimize the payoff of i, and i chooses the best response to such a strategy to maximize his or her payoff. In the case of the forwarder's dilemma, this corresponds to the payoff (0, 0), earned when both players play D. Any payoff greater than or equal to the min-max can be obtained by a Nash equilibrium strategy profile in an infinite-horizon repeated game. In Figure 6, the min-max payoff in the infinite-horizon forwarder's dilemma is shown with thick lines. The gray area represents all payoffs feasible by Nash equilibrium strategies [22,23].
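The min-max payoff of Definition 10 can be computed by brute force over pure strategies. The sketch below is ours, using the forwarder's dilemma payoff matrix with a hypothetical forwarding cost C = 0.2:

```python
# Min-max payoff over pure strategies in the one-stage forwarder's
# dilemma (hypothetical cost C = 0.2).
C = 0.2
ACTIONS = ("F", "D")
payoff_i = {  # u_i(a_i, a_j)
    ("F", "F"): 1 - C, ("F", "D"): -C,
    ("D", "F"): 1,     ("D", "D"): 0,
}

# min over the opponent's action of i's best-response payoff
min_max = min(max(payoff_i[(ai, aj)] for ai in ACTIONS) for aj in ACTIONS)
# The opponent minimizes by playing D, against which i's best response
# (also D) yields 0 -- matching the (0, 0) min-max profile in the text.
```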
Provided that δ is high enough, any payoff in the feasible area can be obtained by a Nash equilibrium strategy profile.

Theorem 1 (Folk theorem)
For every feasible payoff profile u ∈ {u = (u_1, u_2, …, u_{|P|}) : u_i > u̲_i, ∀ i ∈ P}, there exists a discounting factor δ̲ < 1 such that, for all δ ∈ (δ̲, 1), there is a Nash equilibrium profile with payoffs u.

From all the possible outcomes in equilibrium, the ones with the highest payoffs, that is, the ones obtained by a Pareto efficient equilibrium profile, are the most desired from the network point of view. An algorithm could be developed that mimics the behavior of such a Pareto efficient equilibrium strategy. For instance, in the case of the repeated forwarder's dilemma, an algorithm could be developed that leads every player to cooperate at every stage, as long as the opponent also cooperates. If the opponent defects, then the harmed player can punish the defecting opponent by not forwarding its packets for a certain number of stages. The punishment needs to last for enough stages to be severe enough to deter any deviations. That is, it has to be clear to the alleged defecting opponent that defecting will not be more profitable. The general idea is to model the problem in question and seek equilibrium profiles with the highest possible payoffs. With such knowledge, an algorithm can then be developed that mimics the behavior of the equilibrium strategy profile.
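To make the idea of a punishment that is "severe enough" concrete, the following sketch assumes a forgiving variant of grim trigger (a construction of ours, not taken from the text) that plays D for k stages after a defection and then resumes cooperation, and computes the smallest k that deters deviation for given C and δ:

```python
# Smallest punishment length k that deters deviation in the repeated
# forwarder's dilemma, under an assumed "punish for k stages, then
# forgive" strategy. Unnormalized continuation values from the
# deviation stage onwards:
#   cooperate forever: V = (1 - C) / (1 - delta)
#   deviate once:      1 + delta^(k + 1) * V   (k stages of payoff 0)
def min_punishment_stages(C, delta, k_max=1000):
    V = (1 - C) / (1 - delta)
    for k in range(k_max + 1):
        if V >= 1 + delta**(k + 1) * V:   # deviation not profitable
            return k
    return None  # no finite punishment deters deviation

k = min_punishment_stages(C=0.2, delta=0.9)
```

The more patient the players (δ close to 1) or the cheaper the forwarding cost, the shorter the punishment needs to be.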
The example of the forwarder's dilemma represents a game where players have clear conflicting objectives. This could be applied, for instance, to border routers that belong to different autonomous networks with selfish interests. However, game theory can also be used in settings where the conflict arises in certain situations only or when there is a lack of coordination between players, as will be shown in the next section through an example.
Before proceeding with the example of the next section, refer to the following list for a summary of all the important concepts introduced in this section:

- finite-horizon and infinite-horizon games;
- Nash equilibrium in finite and infinite-horizon games;
- the importance of the discounting factor in the Nash equilibrium of infinite-horizon games;
- min-max payoffs in infinite-horizon games;
- feasible payoffs of infinite-horizon games and the folk theorem.

ALGORITHM DEVELOPMENT IN A COMMUNICATION NETWORK CONTEXT
As aforementioned, devices/players in a network do not need to have persistent conflicting interests for game theory to be useful as a means to develop an algorithm. In order to demonstrate how the presented game theory can be applied to communication networks, a simplified model from [12] will be shown in this section. In this example, several wireless routers, deployed by a service provider, have the objective of forwarding as much traffic as possible, while avoiding wastage of resources.

Fiber-wireless access network scenario
Fiber-wireless access networks use a mixture of optical and wireless technologies to provide Internet access to users. They are composed of two sections: an optical back end section, which brings fiber from the central office to near the users, and a wireless front end section, which provides wireless Internet access to the users. Here, it is considered that the wireless front end is composed of wireless routers in a mesh topology, as shown in Figure 7. Some of those wireless routers are gateways responsible for the frontier between the optical and wireless environments. A user willing to send/receive traffic to/from the Internet can connect to the nearest wireless router or gateway.
Traffic may need to travel through several hops in the wireless mesh section, and as such, one of the key issues to address is the allocation of resources throughout the mesh section in order to serve all users in a fair manner. At the wireless section, every wireless router/gateway wants to send/receive traffic belonging to its users to/from the Internet, through the optical section. Like in [12], it is assumed that tree structures are already formed as a result of the path selection carried out by a routing algorithm. Such tree formation is exemplified in Figure 7, where the established connections are shown. The set of all wireless routers will be denoted by W. Because a tree structure is used, C_i is used to represent all descendants of a wireless router i ∈ W, while the routers in the route from i to the optical link are its ancestors. Every wireless router will forward traffic belonging to its directly connected users, both downstream and upstream traffic. Besides traffic belonging to its directly connected users, every wireless router i ∈ W also forwards downstream and upstream traffic belonging to users connected to wireless routers in C_i. The question is how much bandwidth every wireless router should allocate for traffic belonging to its own users and how much should be allocated for the traffic belonging to users connected to wireless routers in C_i. In this model, every wireless router and gateway is a player with the following objectives:

- Forward as much traffic as possible, either belonging to its directly connected users or belonging to users connected to routers in C_i.
- Have the least possible amount of packet drops. Such packet losses lead to unfruitful use of resources, because the dropped packets may have traveled through several hops and consumed resources from the wireless routers along those hops.
In summary, every wireless router, deployed by a service provider, wants to assure the best possible quality of service and to use resources in a useful manner.
The amount of bandwidth that users connected to a wireless router, or gateway, i ∈ W need to send/receive their traffic at time t is represented by B_i(t). Such a wireless router will then dedicate the amount B_i(t) of bandwidth to this traffic. Also, every node i ∈ W will dedicate B_{i,j}(t) of bandwidth to downstream and upstream traffic belonging to users connected to every wireless router j ∈ C_i. The actual bandwidth that a wireless router/gateway i will have available for traffic belonging to its users, equal to the minimum bandwidth made available to it throughout all wireless routers/gateways in the route to the optical link, is denoted by B^A_i(t). That is:

B^A_i(t) = min_{j ∈ ancestors of i} B_{j,i}(t)    (6)

Considering the objectives of the wireless routers/gateways, the stage payoff of every player is calculated by expression (7), which rewards the amount of traffic forwarded and includes a drop component, η_i. Note that the drop component has a logarithmic nature, meaning that if only a few packets are dropped, then the drop component will be close to 0. On the other hand, if too many drops occur, then the drop component value will be very negative, η_i ≪ 0. In computer networks, the reasoning behind this is that a few packet drops can be recoverable, while too many packets being dropped may end up in unrecoverable service failures. In Transmission Control Protocol/Internet Protocol (TCP/IP) networks, for example, a low number of packet losses can easily be solved by the fast retransmission mechanisms that TCP offers, while a high number of packet losses will cause TCP to enter into slow start [24]. Expression (7) calculates only the payoff of one stage. The discounted payoff that every player receives in the infinite-horizon repeated game is calculated with the discounted payoff function in (3), which uses expression (7) for the payoff of every stage. As for the strategies, they decide the values of B_i and B_{i,j}, ∀j ∈ C_i, to be used at every stage, according to the history of values of B_i and B_{i,j}, ∀j ∈ C_i, chosen in previous stages.
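The exact stage payoff expression is defined in [12]; the Python sketch below only illustrates the described shape of the drop component, using a hypothetical η_i equal to the logarithm of the delivered fraction, which is close to 0 when few packets are dropped and strongly negative when many are:

```python
import math

# Hypothetical drop component eta_i: log of the delivered fraction.
# 0 when nothing is dropped, increasingly negative as drops pile up.
def drop_component(delivered, forwarded):
    if forwarded == 0:
        return 0.0
    return math.log(delivered / forwarded)

# Assumed simplified stage payoff: useful throughput plus the penalty.
def stage_payoff(delivered, forwarded):
    return delivered + drop_component(delivered, forwarded)

few_drops  = drop_component(delivered=99, forwarded=100)   # close to 0
many_drops = drop_component(delivered=10, forwarded=100)   # strongly negative
```

Both the functional form and the variable names are assumptions for illustration; only the qualitative behavior (near 0 for few drops, η_i ≪ 0 for many) follows the text.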

Nash equilibrium, Pareto efficiency and algorithm development
As already explained in Sections 2.3 and 4.2, Nash equilibrium represents the expected outcome of the game. From the network point of view, the most desired outcome would be a Pareto efficient outcome, where all wireless routers forward as much traffic as possible with as few packet drops as possible.
In a one-stage version of the game described, the Nash equilibrium is for every wireless router i ∈ W to set B_i and B_{i,j}, ∀j ∈ C_i, to values near zero, because it is not known beforehand what the bandwidth needs of the other wireless routers, the ancestors of i and the routers in C_i, will be. This way, routers can rest assured that they will not receive a −∞ payoff because of the logarithmic nature of η_i. However, if the bandwidth needs were known beforehand, then every router i ∈ W could set B_i and B_{i,j}, ∀j ∈ C_i, to higher values, that is, as close to B_i and B_j, ∀j ∈ C_i, as possible. Note that in the forwarder's dilemma case, having players communicate or coordinate their preferences does not lead to a Pareto efficient Nash equilibrium, because players still feel compelled to lie and deviate by dropping the packet of the opponent. In the game presented in this section, however, players have no incentive to lie to their opponents, because they will receive a higher payoff if they do not lie or deviate. In the game theory literature, this type of game is called a coordination game. That is, players have incentives to cooperate, as long as they can coordinate their actions. In these games, strategies where players communicate their preferences in order to coordinate their actions are Nash equilibria and Pareto efficient [25,26].
In repeated coordination games, players try to coordinate their actions at every stage. In games where the set of actions is small and well known, such coordination can be reached by having players randomize their actions at every stage until the desired coordination is reached; once coordination is reached, players will keep using the coordinated actions. Another alternative, which reaches immediate coordination, is to have players communicate their preferences at every stage [25]. In the example shown here, the bandwidth needs of the users connected to every wireless router i, B_i(t), may change from stage to stage, and as such, the actions, B_i and B_{i,j}, ∀j ∈ C_i, need to change at every stage in order for coordination to be possible. As such, randomizing actions will most likely not lead to coordination. If players communicate their current preferences at every stage, then their actions can be coordinated. This is, in fact, a Pareto efficient Nash equilibrium in repeated coordination games [25]. Note that in the forwarder's dilemma, having players communicate their preferences at every stage would not be enough to achieve a Pareto efficient equilibrium. For instance, if players in the forwarder's dilemma communicated that they prefer to cooperate, then they would still feel compelled to lie and defect in order to receive the highest payoff, unless the discounting factor is high enough to deter deviations. In our coordination game, however, players have no incentive to lie about their needs, because lying would result in a lower payoff.
As shown, in this game players do not have intrinsically conflicting interests. Instead, players want to coordinate their actions. However, if traffic congestion is too high, then some wireless routers will end up having less bandwidth than what is needed. That is, a conflict of interest arises because there is a shortage of bandwidth to cover all requests. Let us consider the high traffic congestion scenario in Figure 8, where each wireless connection has a capacity of 100 Mbps. In this situation, wireless router x is using all its bandwidth capacity to forward traffic belonging to its users and traffic belonging to users connected to wireless routers y and z. That is, B_x = 20, B_{x,y} = 50, and B_{x,z} = 30. However, wireless router w cannot forward that much traffic, because w also needs 20 Mbps for its users and may choose to dedicate its resources in the following way: B_w = 20, B_{w,x} = 15, B_{w,y} = 40, and B_{w,z} = 25. As a result, x will be reserving more bandwidth for itself, y, and z than will actually be used, leaving x with η_x ≪ 0 due to unfruitful utilization of resources. Wireless router x, to safeguard itself, can lower B_{x,y} or B_{x,z}, or both, to increase η_x and its own payoff. This would leave y and z with smaller payoffs, because of η_y ≪ 0 and η_z ≪ 0. These two nodes, y and z, now have two options: (i) each node lowers its dedicated bandwidth, B_y and B_z, respectively; or (ii) they lie to x, by inflating their requests, in order to match B^A_y to B_y and B^A_z to B_z as much as possible. Note that, according to the developed model, x does not care which of B_{x,y} or B_{x,z} is reduced. Alternatively, the model could lead x to reduce the largest of the two first, because that will lead to a greater η_x and payoff§. This alternative would lead to a fairer bandwidth allocation among wireless routers. Thus, the model can be adjusted according to the objectives.
The discussed model, with the equilibrium strategy of communicating preferences and the detection of over-demanding wireless routers, provides the grounds for the development of an allocation algorithm. This algorithm could include the following steps:

§ For any α and β, if α > β and 1 has to be subtracted from either α or β, then (α − 1)β > α(β − 1).
Step 1: communication of the bandwidth needs among all wireless routers and gateways.
Step 2: detection of over-demanding wireless routers/gateways.
Step 3: decision, by every wireless router/gateway i ∈ W, on the values of B_i and B_{i,j}, ∀j ∈ C_i, with the objective of increasing the payoff at every stage as much as possible.

This represents only an example of a possible algorithm for the bandwidth allocation problem in fiber-wireless access networks. Note that the concepts explained here can be applied to other problems related to any of the layers of the Open Systems Interconnection (OSI) model. Moreover, other types of games, such as coalition games or network formation games, can also be used to model existing problems and develop algorithms. For instance, in [16], a coalition game is used to develop a mechanism where unmanned aerial vehicles collect messages from certain data sources, scattered throughout a field, to be then delivered to a common receiver. In [27], a network formation game is used to develop an energy-efficient routing algorithm for the mesh front end of fiber-wireless access networks.
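The steps above can be sketched as a per-stage allocation routine. Everything in the sketch is illustrative: a single parent router with a fixed link capacity, children that truthfully announce their needs (the coordination-game equilibrium), integer-Mbps granularity, and a largest-first reduction rule (suggested by the footnote on subtracting from the larger of two values) are all our assumptions, not details of [12]:

```python
# One stage of a hypothetical allocation routine at a parent router.
def allocate(own_need, child_needs, capacity):
    """Return (B_i, {child: B_ij}) for one stage of the repeated game."""
    # Communication of needs: the function's inputs (announced truthfully).
    alloc = {"self": own_need, **dict(child_needs)}
    # Detection of over-demand and decision: while total demand exceeds
    # capacity, shave the largest allocation first (cf. the footnote:
    # subtracting from the larger of two values keeps their product,
    # and hence the fairness of the split, higher).
    while sum(alloc.values()) > capacity:
        biggest = max(alloc, key=alloc.get)
        alloc[biggest] -= 1
    own = alloc.pop("self")
    return own, alloc

# Scenario in the spirit of Figure 8: capacity 100 Mbps, own need of
# 20 Mbps, children y and z requesting 50 and 30 Mbps.
own, children = allocate(20, {"y": 50, "z": 30}, capacity=100)
```

When demand fits the capacity, every request is honored; under congestion, the largest requests are trimmed first, converging toward an even split.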
CONCLUSION

The development of algorithms to solve existing problems in communication networks is a complex task. The purpose of this tutorial is to showcase how to use repeated game theory as a tool for algorithm development in communication networks. It starts by giving the basis of game theory and repeated game theory, including how their outcomes can be predicted through Nash equilibrium and how certain Nash equilibria are possible in infinite-horizon repeated games while not being possible in one-stage games. With this basis introduced, an example is then given where a model is developed and necessary conditions are devised for the existence of a Nash equilibrium where all players cooperate, which can be used as a basis for the development of an algorithm.