“A Game of Thrones”: When Human Behavior Models
Compete in Repeated Stackelberg Security Games
Debarun Kar, Fei Fang, Francesco Delle Fave, Nicole Sintov, Milind Tambe
University of Southern California, Los Angeles, CA, 90089
Disney Research, Boston, MA, 02142
{dkar,feifang,sintov,tambe}@usc.edu, francesco.dellefave@disneyresearch.com
ABSTRACT
Several competing human behavior models have been proposed
to model and protect against boundedly rational adversaries in re-
peated Stackelberg security games (SSGs). However, these existing
models fail to address three main issues which are extremely detri-
mental to defender performance. First, while they attempt to learn
adversary behavior models from adversaries’ past actions (“attacks
on targets”), they fail to take into account adversaries’ future adap-
tation based on successes or failures of these past actions. Second,
they assume that sufficient data in the initial rounds will lead to
a reliable model of the adversary. However, our analysis reveals
that the issue is not the amount of data, but that there just is not
enough of the attack surface exposed to the adversary to learn a
reliable model. Third, current leading approaches have failed to in-
clude probability weighting functions, even though it is well known
that human beings’ weighting of probability is typically nonlin-
ear. The first contribution of this paper is a new human behavior
model, SHARP, which mitigates these three limitations as follows:
(i) SHARP reasons based on success or failure of the adversary’s
past actions on exposed portions of the attack surface to model ad-
versary adaptiveness; (ii) SHARP reasons about similarity between
exposed and unexposed areas of the attack surface, and also in-
corporates a discounting parameter to mitigate adversary’s lack of
exposure to enough of the attack surface; and (iii) SHARP inte-
grates a non-linear probability weighting function to capture the
adversary’s true weighting of probability.
Our second contribution is a first “longitudinal study” – at least
in the context of SSGs – of competing models in settings involving
repeated interaction between the attacker and the defender. This
study, where each experiment lasted a period of multiple weeks
with individual sets of human subjects, illustrates the strengths and
weaknesses of different models and shows the advantages of SHARP.
Categories and Subject Descriptors
I.2.11 [Distributed Artificial Intelligence]: Multiagent systems
General Terms
Algorithms, Experimentation, Human Factors, Security, Performance
Keywords
Game Theory, Repeated Stackelberg Games, Human Behavior
Appears in: Proceedings of the 14th International
Conference on Autonomous Agents and Multiagent
Systems (AAMAS 2015), Bordini, Elkind, Weiss, Yolum
(eds.), May 4–8, 2015, Istanbul, Turkey.
Copyright © 2015, International Foundation for Autonomous Agents
and Multiagent Systems (www.ifaamas.org). All rights reserved.
1. INTRODUCTION
Whereas previous real-world deployments of Stackelberg Secu-
rity Games (SSGs) to protect airports, ports or flights have been
one-shot game models [32], recent work has focused on domains
involving repeated interactions between defenders and adversaries.
These domains include security of wildlife (repeated interactions
between rangers and poachers) [34], security of fisheries (repeated
interactions between coast guard and illegal fishermen) [12], for-
est protection or drug interdiction, and are modeled via repeated
SSGs. In a repeated SSG model, the defender periodically deploys
new patrol strategies (in “rounds” of the game) and the adversary
observes these strategies and acts accordingly.
Research in repeated SSGs has produced different approaches to
address uncertainty in key dimensions of the game such as payoff
uncertainty (but assuming a perfectly rational adversary) [4, 20, 23]
or uncertainty in adversary behavior model (but assuming known
payoffs) [12, 34]. Our work follows the second approach, learning
a model of boundedly rational adversaries with known adversary
payoffs, as (arguably) it provides a better fit for domains of inter-
est in this work. This is because (i) in real-world settings such as
wildlife or fishery protection, it is feasible to model adversary pay-
offs via animal or fish density in different locations; and (ii) there
is significant support in the literature for bounded rationality of hu-
man adversaries [35, 29].
Unfortunately, despite the promise of Bounded Rationality mod-
els in Repeated Stackelberg Games (henceforth referred to as BR-
RSG models), existing work in this area [12, 34] suffers from three
key limitations which are extremely detrimental to defender perfor-
mance. First, existing models reason about the adversary’s future
actions based on past actions taken but not the associated successes
and failures. Our analysis reveals that the adversary adapts in future
rounds based on past success and failure. Hence, failing to consider
an adaptive adversary leads to erroneous predictions about his fu-
ture behavior, and thus significantly flawed defender strategies.
Second, existing approaches for learning BR-RSG models as-
sume that enough data will be collected in the initial rounds to
learn a reliable adversary model. Our analysis reveals that the is-
sue is not the amount of data, but insufficient exposure of attack
surface [14, 21] in initial rounds to gather sufficient information
about adversary responses to various strategies and learn a reliable
model. Learning is biased towards the limited available informa-
tion and hence significant losses are incurred by the defender until
enough of the right kind of data becomes available. This degraded
performance in initial rounds may have severe consequences for
three reasons: (i) In domains like wildlife crime or fisheries protec-
tion, each round lasts for weeks or potentially months and so initial
round losses (if massive) could imply irrecoverable losses of re-
sources (e.g., animal populations). (ii) Following heavy losses, hu-
man defenders may lose confidence in recommendations provided
by the game-theoretic approach. (iii) Given the nature of these do-
mains, re-initialization of the game may periodically be necessary
and thus initial rounds may be repeated; in domains such as wildlife
crime, re-initialization can stem from man-made or natural changes
in parks, e.g., changes in vegetation, water bodies, or possible de-
velopmental activities. The construction of an oil-refinery [1] and
the simultaneous re-introduction of rhinos in the Murchison Falls
National Park in Uganda is an example. In addition, re-initializing
the game after a year or so would mean repeating the initial rounds
after four to five rounds, adding to the importance of addressing
initial rounds.
Finally, BR-RSG models have failed to include probability weighting
functions (how humans “perceive” probabilities), even though
it is well known that probability weighting curves for humans,
e.g., in prospect theory [33], are typically nonlinear. In light of
this, we show that direct application of existing models such as
SUQR [29], which assume a linear probability model, provides results
that would be extremely detrimental to defender performance.
The primary contribution of this paper is a new model called
SHARP (Stochastic Human behavior model with AttRactiveness
and Probability weighting) that mitigates these three limitations: (i)
Modeling the adversary’s adaptive decision making process, SHARP
reasons based on success or failure of the adversary’s past actions
on exposed portions of the attack surface. (ii) Addressing lim-
ited exposure to significant portions of the attack surface in ini-
tial rounds, SHARP reasons about similarity between exposed and
unexposed areas of the attack surface, and also incorporates a dis-
counting parameter to mitigate adversary’s lack of exposure to enough
of the attack surface. (iii) Addressing shortcomings of probability
weighting functions, we incorporate a two parameter probability
weighting function in existing human behavior models.
Our second contribution is to provide evidence from the first
“longitudinal study” of competing models in repeated SSGs. In
our study, a suite of well-established models and SHARP take the
battlefield (much like the warring factions in “A Game of Thrones”
who are fighting for supremacy) in an attempt to prove themselves
best in repeated SSGs. Our results show: (i) SHARP outperforms
existing approaches consistently over all rounds, most notably in
initial rounds. (ii) As discussed earlier, existing approaches per-
form poorly in initial rounds with some performing poorly through-
out all rounds. (iii) Surprisingly, simpler models which were orig-
inally proposed for single-shot games performed better than re-
cent advances which are geared specifically towards addressing re-
peated SSGs. Taken together, given the critical importance of the
repeated ‘initial rounds’ as discussed above, these results indicate
that SHARP should be the model of choice in repeated SSGs.
2. BACKGROUND
2.1 Background on SSGs
In an SSG, the defender plays the role of a leader who protects
a set of targets from the adversary, who acts as the follower [5,
30, 18]. The defender’s pure strategy is an assignment of a limited
number of security resources M to the set of targets T . An assign-
ment of a resource to a target is also referred to as covering a target.
A defender’s mixed strategy x (0 ≤ x_i ≤ 1; ∀x_i, i ∈ T) is then
defined as a probability distribution over the set of all possible pure
strategies. A pure strategy of an adversary is defined as attacking
a single target. The adversary receives a reward R^a_i for selecting i
if it is not covered and a penalty P^a_i for selecting i if it is covered.
Similarly, the defender receives a reward R^d_i for covering i if it is
selected by the adversary and penalty P^d_i for not covering i if it is
selected. Then, utility for the defender for protecting target i while
playing mixed strategy x is:

U^d_i(x) = x_i R^d_i + (1 − x_i) P^d_i    (1)

Similarly, the utility for the adversary for attacking target i is:

U^a_i(x) = (1 − x_i) R^a_i + x_i P^a_i    (2)
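For concreteness, here is a minimal Python sketch of Eqns. 1 and 2; the coverage values and payoffs below are illustrative, not from our experiments.

```python
# Illustrative sketch of Eqns. (1) and (2); payoff numbers are made up.
def defender_utility(x_i: float, R_d: float, P_d: float) -> float:
    """Eqn. (1): expected defender utility at a target covered with probability x_i."""
    return x_i * R_d + (1.0 - x_i) * P_d

def adversary_utility(x_i: float, R_a: float, P_a: float) -> float:
    """Eqn. (2): expected adversary utility for attacking a target covered with probability x_i."""
    return (1.0 - x_i) * R_a + x_i * P_a

if __name__ == "__main__":
    # Three hypothetical targets: (coverage, R_d, P_d, R_a, P_a).
    targets = [(0.7, 5, -2, 4, -3), (0.2, 3, -4, 6, -1), (0.1, 1, -1, 2, -2)]
    for i, (x_i, R_d, P_d, R_a, P_a) in enumerate(targets):
        print(i, defender_utility(x_i, R_d, P_d), adversary_utility(x_i, R_a, P_a))
```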
Recent work has focused on modeling boundedly rational adver-
saries in SSGs, developing models discussed below.
Subjective Utility Quantal Response (SUQR): SUQR [29] builds
upon prior work on quantal response [25] according to which rather
than strictly maximizing utility, an adversary stochastically chooses
to attack targets, i.e., the adversary attacks a target with higher ex-
pected utility with a higher probability. SUQR proposes a new util-
ity function called Subjective Utility, which is a linear combination
of key features that are considered to be the most important in each
adversary decision-making step. Nguyen et al. [29] experimented
with three features: defender’s coverage probability, adversary’s
reward and penalty at each target. According to this model, the
probability that the adversary will attack target i is given by:
q_i(ω|x) = e^{SU^a_i(x)} / Σ_{j∈T} e^{SU^a_j(x)}    (3)

where SU^a_i(x) is the Subjective Utility of an adversary for attacking
target i when defender employs strategy x and is given by:

SU^a_i(x) = ω_1 x_i + ω_2 R^a_i + ω_3 P^a_i    (4)

The vector ω = (ω_1, ω_2, ω_3) encodes information about the adversary's
behavior and each component of ω indicates the relative importance
the adversary gives to each attribute in the decision making
process. The weights are computed by performing Maximum
Likelihood Estimation (MLE) on available attack data.
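A minimal sketch of the SUQR response of Eqns. 3 and 4 follows (our own illustration; the ω vector and payoffs are hypothetical, and the MLE step is not shown).

```python
import math
from typing import List, Tuple

def suqr_attack_distribution(omega: Tuple[float, float, float],
                             coverage: List[float],
                             rewards: List[float],
                             penalties: List[float]) -> List[float]:
    """Eqns. (3)-(4): softmax over subjective utilities SU^a_i = w1*x_i + w2*R^a_i + w3*P^a_i."""
    w1, w2, w3 = omega
    su = [w1 * x + w2 * r + w3 * p for x, r, p in zip(coverage, rewards, penalties)]
    m = max(su)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in su]
    z = sum(exps)
    return [e / z for e in exps]

if __name__ == "__main__":
    # Hypothetical 3-target example; the returned probabilities sum to 1.
    q = suqr_attack_distribution((-2.0, 0.5, 0.3), [0.8, 0.3, 0.1], [4, 6, 2], [-3, -1, -2])
    print(q)
```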
Bayesian SUQR: SUQR assumes that there is a homogeneous pop-
ulation of adversaries, i.e., a single ω is used to represent an adver-
sary in [29]. However, in the real world we face heterogeneous
populations. Therefore, Bayesian SUQR was proposed to learn a
particular value of ω for each attack [34]. Protection Assistant for
created using Bayesian SUQR.
Robust SUQR: Robust SUQR [12] combines data-driven learning
and robust optimization to address settings where not enough data
is available to provide a reasonable hypothesis about the distribu-
tion of ω. It computes the worst-case expected utility over all previ-
ously seen SUQR models of the adversary and deploys the optimal
strategy against the adversary type that reduces the defender’s util-
ity the most. Robust SUQR has been applied to the fisheries protection
domain [12].
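As a rough illustration of the worst-case evaluation step at the heart of Robust SUQR (a sketch under our own simplifications, not the algorithm of [12]): given a set of previously learned ω vectors and a candidate mixed strategy, the defender's utility is evaluated against each adversary type and the minimum is taken; the optimization over strategies is omitted here, and all numbers are hypothetical.

```python
import math

def _suqr_q(omega, x, R_a, P_a):
    # SUQR response (Eqns. 3-4) for one omega vector.
    su = [omega[0] * xi + omega[1] * r + omega[2] * p for xi, r, p in zip(x, R_a, P_a)]
    m = max(su)
    e = [math.exp(s - m) for s in su]
    z = sum(e)
    return [v / z for v in e]

def worst_case_defender_utility(x, R_d, P_d, R_a, P_a, omega_set):
    """Minimum over previously seen adversary types (omega vectors) of defender expected utility."""
    def eu(q):
        return sum(qi * (xi * rd + (1 - xi) * pd)
                   for qi, xi, rd, pd in zip(q, x, R_d, P_d))
    return min(eu(_suqr_q(w, x, R_a, P_a)) for w in omega_set)

if __name__ == "__main__":
    x = [0.6, 0.3, 0.1]                            # hypothetical coverage vector
    R_d, P_d = [3, 5, 2], [-3, -5, -2]
    R_a, P_a = [3, 5, 2], [-1, -1, -1]
    seen = [(-2.0, 0.4, 0.2), (-1.0, 0.8, 0.1)]    # hypothetical learned omega vectors
    print(worst_case_defender_utility(x, R_d, P_d, R_a, P_a, seen))
```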
2.2 Probability Weighting Functions
Probability weighting functions model human perceptions of prob-
ability. Perhaps the most notable is the weighting function in the
Nobel Prize-winning work on Prospect Theory [17, 33], which suggests
that people weigh probability non-uniformly, as shown in Fig. 1.
It indicates that people tend to overweigh low probabilities and un-
derweigh high probabilities. Other works in this domain propose
and experiment with parametric models which capture both inverse
S-shaped as well as S-shaped probability curves [2, 10] (Fig. 2).
3. RELATED WORK
We have already discussed related work in SSGs in the previous
section, including key behavioral models. Here we discuss addi-
tional related work:
Figure 1: Probability Weighting Function (Prospect Theory)
Figure 2: Probability Weighting Function (Gonzalez & Wu, 1999)
Learning in repeated Stackelberg games: The problem of learn-
ing the adversary’s payoffs in an SSG by launching a minimum
number of games against a perfectly rational adversary is studied
in [20, 4]. Additionally, Marecki et al. [23] focused on optimiz-
ing the defender’s overall utility during the learning process when
faced with a perfectly rational adversary with unknown payoffs.
Robust strategies in repeated games: In cases where the oppo-
nent cannot be successfully modeled, McCracken et al. [24] pro-
posed techniques to generate ε-safe strategies which bound the loss
from a safe value by ε. Johanson et al. [16, 15] studied the prob-
lem of generating robust strategies in a repeated zero-sum game
while exploiting the tendency in the adversary’s decision making
and evaluated their technique in a game of two-player, Limit Texas
Hold’em. Recently, Ponsen et al. [31] proposed techniques to com-
pute robust best responses in partially observable stochastic games
using sampling methods.
All of the above work differs from ours in three ways: (i) it does
not model bounded rationality in human behavior; (ii) it does not
consider how humans weigh probabilities; and (iii) none of this
existing work addresses the important problem of significant initial
round losses. Initial round losses are a critical problem in domains
such as wildlife security as explained above, requiring a fundamental
shift at least in the learning paradigm for SSGs. In addition,
work on learning in SSGs differs because in our game the payoffs
are known, but we are faced with boundedly rational adversaries
whose parameters in their behavioral model are to be learned.
4. WILDLIFE POACHING GAME
We conducted longitudinal experiments¹ [22] with human subjects
to test the effectiveness of existing behavioral models and
algorithms against our proposed approach on repeated SSGs.
4.1 Game Overview
In our game, human subjects play the role of poachers looking to
place a snare to hunt a hippopotamus in a protected park. The game
interface is shown in Fig. 3. In the game, the portion of the park
shown in the map is divided into a 5×5 grid, i.e., 25 distinct cells.
Overlaid on the Google Maps view of the park is a heat-map, which
represents the rangers' mixed strategy x: a cell i with higher coverage
probability x_i is shown more in red, while a cell with lower
coverage probability is shown more in green. As the subjects play
the game, they are given the following detailed information: R^a_i,
P^a_i and x_i for each target i. However, they do not know the pure
strategy that will be played by the rangers, which is drawn randomly
from mixed strategy x shown on the game interface. Thus,
we model the real-world situation that poachers have knowledge
of past pattern of ranger deployment but not the exact location of
ranger patrols when they set out to lay snares. In our game, there
were M = 9 rangers protecting this park, with each ranger protecting
one grid cell. Therefore, at any point in time, only 9 out of the
25 distinct regions in the park are protected.

¹ Whereas “longitudinal study” is often used to describe research that spans years – in
which measurement occasions are conducted every X years – we use the term longitudinal
study because our study included 5 measurement points with a single population.

Figure 3: Game Interface for our simulated online repeated SSG (Reward,
penalty and coverage probability for a selected cell are shown)
In addition to animal density, which is strongly correlated with
high-risk areas of poaching [28, 27, 11], distance is another im-
portant factor in poaching, e.g., recent snare-density studies have
found that significant poaching happens within 5 kilometers of South
Africa’s Kruger National Park border [19]. Therefore, the reward
obtained by a poacher in successfully placing a snare at target i
is calculated by discounting the animal density by a factor of the
distance traveled and is calculated as follows:
R^a_i = int(φ_i − ζ · D_i / max_j(D_j))    (5)

Here, φ_i and D_i refer to the animal density at target i and the distance
to target i from the poacher's starting location respectively.
int(y) rounds the value y to the closest integer value. The parameter
ζ is the importance given to the distance factor in the reward
computation and may vary based on the domain. When the poacher
successfully poaches, he may thus obtain a reward that is less than
the animal density (Eqn. 5), but the defender loses a value equal to
that of the animal density, i.e., the game is non-zero-sum.
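A minimal sketch of the reward computation in Eqn. 5 (the densities, distances and ζ below are hypothetical; round() stands in for the paper's int() rounding).

```python
def poacher_reward(density, distance, zeta):
    """Eqn. (5): discount animal density phi_i by a factor of the normalized distance D_i."""
    d_max = max(distance)
    return [round(phi - zeta * d / d_max) for phi, d in zip(density, distance)]

if __name__ == "__main__":
    phi = [10, 4, 0, 7]              # hypothetical animal densities
    dist = [2.0, 5.0, 8.0, 8.0]      # hypothetical distances from the starting cell
    print(poacher_reward(phi, dist, zeta=2.0))
```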
4.2 Experimental Procedures
We recruited human subjects on Amazon Mechanical Turk (AMT).
We first primed participants with a background story about the
hardships of a poacher’s life. To enhance understanding of the
game, participants played two trial games, one validation game,
and finally the actual game. Data from subjects who played the
validation game incorrectly were discarded.
We tested a set of behavioral models introduced in Section 2.1
by deploying the mixed strategy generated based on each of these
models repeatedly over a set of five rounds. For each model, we
recruited a new set of participants to eliminate any learning bias.
Due to the unavailability of data, the strategy shown in the first round
of each experiment was Maximin. We then learned the model parameters based on
previous rounds’ data, recomputed and redeployed strategies, and
asked the same players to play again in the subsequent rounds. For
each model, all five rounds were deployed over a span of weeks.
4.3 Online Longitudinal Experiments
Longitudinal studies on AMT are rare at AAMAS (in fact we are
not aware of any); and certainly none have been conducted in the
context of SSGs. Indeed, while the total time of engagement over
our 20 experimental settings was 46 weeks, each setting required on
average 2.3 weeks, a duration typically not seen in related research
at AAMAS (Table 1).

Figure 4: Animal density structures; panels (a)–(d) show heatmaps of ADS_1, ADS_2, ADS_3 and ADS_4.

Table 1: Experiment Details
  Average time taken per model per payoff structure (all 5 rounds): 2.3 weeks
  Average time taken for a set of participants to play each round: 4 days
  Number of participants who played round 1 (minimum, maximum): (42, 49)
  Average number of participants who played all 5 rounds: 38
  Average retention rate from round 2 to round 5: 83.69%

Therefore this section highlights our methodological
contributions to conduct such experiments. Specifically,
challenges of longitudinal studies on AMT include: (i) minimizing
attrition and maximizing retention, particularly for our study which
required five separate measurement occasions; (ii) setting up a pay-
ment scheme to maximize participant retention across five rounds;
and (iii) lengthy rounds due to participants’ long latencies (in some
instances forgetfulness) in responding to notifications.
To mitigate the above challenges, we took the following steps: (i)
To be eligible for initial study enrollment, participants had to com-
mit to completing all five rounds, i.e., remain in the study through
completion. (ii) We allowed respondents sufficient time to respond
[26], as giving them an immediate deadline to finish a particular
task can result in high attrition rates. (iii) We maintained persis-
tent contact by sending repeated reminders [6]. (iv) We set up
the payment scheme to consistently reward participation in each
round plus offering a relatively high completion incentive at the
end of the experiment. Specifically, each participant was paid a
‘base compensation’ of $0.50 for each round and a relatively high
‘completion bonus’ of $2.50 for completing all the rounds. In ad-
dition, to motivate the participants to play the game competitively,
we also gave incentives, called ‘performance bonus’, based on the
reward/penalty of the region they chose to attack. On average, each
participant was paid $7.60 in total.
4.4 Payoff Structures
The payoff structures used in our human subject experiments
vary in terms of the animal densities and hence the adversary re-
wards. We henceforth refer to payoff structures and animal density
structures interchangeably in this paper. The total number of ani-
mals in all the payoffs we generate is the same (= 96). However,
the variation in these payoffs is in the way the animals are spread
out in the park. In payoff 1, the animal density is concentrated to-
wards the center of the park, whereas the animal density is higher
towards the edges of the park in payoff structure 2. These represent
scenarios that might happen in the real world. The animal density
for both payoffs is symmetric, thus eliminating any bias due to the
participant’s starting point in the game.
Contrary to the above, animals in the park may be randomly scat-
tered without any particular orientation. So, we randomly generate
two additional animal density structures (payoffs 3 and 4) and test
our proposed model on these payoffs. To generate a random struc-
ture, one out of 25 cells was chosen uniformly at random and then
an animal was allocated to it until the total number of animals in
all the cells was 96, keeping in mind that if a cell total reached 10
(maximum animal density in a cell), that cell was not chosen again.
Figs. 4(a)–4(d) show heatmaps of four animal density structures,
denoted as ADS_1, ADS_2, ADS_3 and ADS_4 respectively.
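The random payoff generation described above can be sketched as follows; this is our own reconstruction of the stated procedure, with the cell count, animal total and per-cell cap taken from the text.

```python
import random

def random_density(num_cells=25, total_animals=96, cap=10):
    """Allocate animals one at a time to uniformly chosen cells, skipping cells at the cap."""
    density = [0] * num_cells
    while sum(density) < total_animals:
        c = random.randrange(num_cells)
        if density[c] < cap:          # a cell whose total reached the cap is not chosen again
            density[c] += 1
    return density

if __name__ == "__main__":
    ads = random_density()
    print(ads, sum(ads))   # 25 cells summing to 96, each holding at most 10 animals
```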
5. SHARP: PROBABILITY WEIGHTING
This paper contributes a novel human behavior model called SHARP
for BR-RSG settings. SHARP has three key novelties, of which
probability weighting is the first. The need for probability weight-
ing became apparent after our initial experiments. In particular, ini-
tially following up on the approach used in previous work [29, 36,
34, 12], we applied MLE to learn the weights of the SUQR model
based on data collected from our human subject experiments and
found that the weights on coverage probability were positive for
all the experiments. That is, humans were counter-intuitively modeled
as being attracted to cells with high coverage probability, even
though they were not attacking targets with very high coverage;
instead, they were going after targets with moderate to very low coverage
probability. Examples of the learned weights for SUQR from data
collected from the first round deployment of the game for 48 human
subjects on ADS_1 and ADS_2 are: (ω_1, ω_2, ω_3) = (2.876, -0.186, 0.3)
and (ω_1, ω_2, ω_3) = (1.435, -0.21, 0.3). We prove a theorem
(Theorem 5.1) to show that, when the weight on the coverage probability
in the SUQR model (ω_1) is found to be positive, the optimal
defender strategy is a pure strategy. The proof of the theorem can
be found in the online appendix².
THEOREM 5.1. When ω_1 > 0, the optimal defender strategy is
a pure strategy.
Employing a pure strategy means that there will be no uncertainty
about the defender’s presence. Several cells will always be left
unprotected and in those cells, the attackers will always succeed.
In our example domains, even if the top-valued cells are covered
by a pure strategy, we can show that such a strategy would lead
to significantly worse defender expected utility than what results
from the simplest of our defender mixed strategies deployed. For
example, if cells of value 4 are left unprotected, the defender ex-
pected value will be -4, which is much lower than what we achieve
even with Maximin. In repeated SSG domains like wildlife crime,
this would mean that the poachers successfully kill animals in each
round without any uncertainty of capture by rangers.
We hypothesize that this counter-intuitive result of a model with
ω_1 > 0 may be because the SUQR model may not be considering
people's actual weighting of probability. SUQR assumes that
people weigh probabilities of events in a linear fashion, while
existing work on probability weighting (Sec. 2.2) suggests otherwise.
To address this issue, we augment the Subjective Utility function
with a two-parameter probability weighting function (Eqn. 6) pro-
posed by Gonzalez and Wu [10], that can be either inverse S-shaped
(concave near probability zero and convex near probability one) or
S-shaped.
f(p) = δp^γ / (δp^γ + (1 − p)^γ)    (6)
The SU of an adversary denoted by ‘a’ can then be computed as:

SU^a_i(x) = ω_1 f(x_i) + ω_2 R^a_i + ω_3 P^a_i    (7)

where f(x_i) for coverage probability x_i is computed as per Eqn.
6. The two parameters δ and γ control the elevation and curvature
of the function respectively.

² http://onlineappendixrsg.weebly.com/

Table 2: Performance (Squared Errors) of various feature sets
                              Eqn. 7   Eqn. 8   Eqn. 9   Eqn. 10
  P-SUQR ADS_1 (Algorithm 1)  0.1965   0.2031   0.1985   0.1025
  P-SUQR ADS_2 (Algorithm 1)  0.2065   0.2156   0.2625   0.1136

γ < 1 results in an inverse S-shaped
curve while γ > 1 results in an S-shaped curve. We will hence-
forth refer to this as the PSU (Probability weighted Subjective Util-
ity) function and the models (SUQR, Bayesian SUQR and Robust
SUQR) augmented with PSU will be referred to as P-SUQR, P-
BSUQR and P-RSUQR respectively. Our SHARP model will also
use PSU. We will use these PSU-based models in our experiments.
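For concreteness, a small sketch of the two-parameter weighting function of Eqn. 6 and the PSU of Eqn. 7; the δ, γ and ω values below are placeholders rather than learned values.

```python
def gw_weight(p, delta, gamma):
    """Gonzalez-Wu weighting (Eqn. 6): f(p) = delta*p^gamma / (delta*p^gamma + (1-p)^gamma)."""
    num = delta * (p ** gamma)
    return num / (num + (1.0 - p) ** gamma)

def psu(x_i, R_a, P_a, w1, w2, w3, delta, gamma):
    """Probability-weighted subjective utility (Eqn. 7)."""
    return w1 * gw_weight(x_i, delta, gamma) + w2 * R_a + w3 * P_a

if __name__ == "__main__":
    # gamma > 1 gives an S-shaped curve (as found in our experiments); gamma < 1 gives the inverse S-shape.
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(p, round(gw_weight(p, delta=1.0, gamma=2.0), 3))
    print(psu(0.25, R_a=4, P_a=-1, w1=-3.0, w2=0.9, w3=-0.3, delta=1.0, gamma=2.0))
```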
One of our key findings, based on experiments with the PSU
function, is that the curve representing human weights for probability
is S-shaped in nature, and not inverse S-shaped as prospect theory
suggests. The S-shaped curve indicates that people would overweigh
high probabilities and underweigh low to medium probabilities.
Some learned curves are shown in Sec. 9.2. Recent studies
[3, 13, 9] have also found S-shaped probability curves which contradict
the inverse S-shaped observation of prospect theory. Given
S-shaped probability weighting functions, the learned ω_1 was negative,
as it accurately captured the trend that a significantly higher
number of people were attacking targets with low to medium coverage
probabilities and not attacking high coverage targets.
Feature Selection and Weight Learning: In Sec. 4.1, we intro-
duced a new feature – distance – that affected the reward and hence
the obvious question for us was to investigate the effect of this new
feature in predicting adversary behavior. We considered several
variations of PSU with different combinations of features. In addi-
tion to Eqn. 7, three more are discussed below (Eqns. 8,9,10).
SU^a_i(x) = ω_1 f(x_i) + ω_2 φ_i + ω_3 P^a_i    (8)

SU^a_i(x) = ω_1 f(x_i) + ω_2 R^a_i + ω_3 P^a_i + ω_4 D_i    (9)

SU^a_i(x) = ω_1 f(x_i) + ω_2 φ_i + ω_3 P^a_i + ω_4 D_i    (10)
To compare these variations, we need to learn the behavioral
parameters for each of the variations (e.g., for Eqn. 10, a 6-tuple
b = <δ, γ, ω_1, ω_2, ω_3, ω_4> is to be learned; δ and γ due to inclusion
of Eqn. 6) from available data and evaluate their effectiveness
in predicting the attack probability. To learn the behavioral
parameters b from available data, we propose an algorithm based
on Repeated Random Sub-sampling Validation (Algorithm 1; see
online appendix²). For P-SUQR, we learn a single b, while for P-BSUQR
and P-RSUQR we learn a set of b ∈ B, one for each attack.
We divided the first round data for the experiment with P-SUQR
on ADS_1 into 10 random train-test splits. We then computed the
sum of squared errors (SE) of predicting the attack probability for
each of the test splits and for each of the feature combinations using
the weights learned by Algorithm 1. Results in Table 2 show
that Eqn. 10 achieves the least SE and the difference is statistically
significant (two-tailed t-tests at confidence = 0.05).
This shows that we can achieve higher accuracy in modeling by
generalizing the subjective utility form used in [29] that relied on
just three parameters, by adding more features as shown in Eqn.
10. This opens the door to novel subjective utility functions for
different domains that exploit different domain features.
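Algorithm 1 itself appears in the online appendix; the sketch below only illustrates the generic shape of such a comparison via repeated random sub-sampling and squared error. The split ratio and the fit_weights / predict_attack_probs helpers in the comments are placeholders of our own, standing in for the MLE fit and the PSU-based prediction.

```python
import random

def random_splits(n_points, n_splits=10, train_frac=0.8, seed=0):
    """Repeated random sub-sampling: return (train, test) index lists for each split."""
    rng = random.Random(seed)
    idx = list(range(n_points))
    splits = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        splits.append((idx[:cut], idx[cut:]))
    return splits

def squared_error(predicted, observed):
    """Sum of squared errors between predicted and empirical attack probabilities."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

# Sketch of the comparison loop (fit_weights, predict_attack_probs and empirical_probs
# are placeholders, not functions from the paper):
#
# for features in (eqn7, eqn8, eqn9, eqn10):
#     errors = []
#     for train, test in random_splits(len(attacks), n_splits=10):
#         b = fit_weights(attacks, train, features)
#         errors.append(squared_error(predict_attack_probs(b, attacks, test, features),
#                                     empirical_probs(attacks, test)))
#     print(features, sum(errors) / len(errors))
```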
Based on our experiments, in addition to ω_1 < 0, we found that
ω_2 > 0, ω_3 < 0 and ω_4 < 0. The rest of the formulations in this
paper will be based on these observations about the feature weights.
6. SHARP: ADAPTIVE UTILITY MODEL
A second major innovation in SHARP is the adaptive nature of
the adversary and addressing the issue of attack surface exposure.
Figure 5: Evidence for adaptivity of attackers: (a) Maximin on ADS_2, (b) P-RSUQR on ADS_2.
First, we define key concepts, present evidence from our experi-
ments, and then present SHARP’s innovations.
Definition. The attack surface α is defined as the n-dimensional
space of the features used to model adversary behavior. Formally,
α = <F_1, F_2, ..., F_n> for features F_j (1 ≤ j ≤ n).

For example, as per the PSU model in Eqn. 10, this would mean the
space represented by the following four features: coverage probability,
animal density, adversary penalty and distance from the starting
location.
Definition. A target profile β_k ∈ α is defined as a point on the
attack surface α and can be associated with a target. Formally,
β_k = <F_1^k, F_2^k, ..., F_n^k> denotes the kth target profile on the
attack surface.

In our example domain, the kth target profile can be represented as
β_k = <x_{β_k}, φ_{β_k}, P^a_{β_k}, D_{β_k}>, where x_{β_k}, φ_{β_k}, P^a_{β_k} and D_{β_k} denote
values for coverage probability, animal density, attacker penalty
and distance from starting location respectively³. For example,
<0.25, 2, -1, 4> is the target profile associated with the top-leftmost
cell of ADS_1 in round 1. Exposing the adversary to a lot of different
target profiles would therefore mean exposing the adversary to
more of the attack surface and gathering valuable information about
their behavior. While a particular target location, defined as a distinct
cell in the 2-d space, can only be associated with one target
profile in a particular round, more than one target may be associated
with the same target profile in the same round. β^i_k denotes that
target profile β_k is associated with target i in a particular round.
6.1 Observations and Evidence
Below are two observations from our human subjects data, based
on the above concepts, that reveal interesting trends in attacker be-
havior in repeated SSGs.
OBSERVATION 1. Adversaries who have succeeded in attacking
a target associated with a particular target profile in one round
tend to attack a target with a ‘similar’ target profile in the next round.

OBSERVATION 2. Adversaries who have failed in attacking a
target associated with a particular target profile in one round tend
not to attack a target with a ‘similar’ target profile in the next round.
In order to provide evidence in support of Observations 1 and 2,
we show results from our data highlighting these trends on ADS_2
in Figs. 5(a)–5(b). Results of other models on ADS_1 and ADS_2
can be found in the online appendix². In each plot, the y-axis denotes
the percentage of (i) attacks on similar targets out of the total
successful attacks in the previous round (ζ_ss) and (ii) attacks on
similar targets out of the total failed attacks in the previous round
(ζ_fs). The x-axis denotes pairs of rounds for which we are computing
the percentages; for example, in R12, 1 corresponds to round
(r−1) and 2 means round r in our claim. Thus, ζ_ss corresponding
to R23 in ADS_2 is 80%, meaning that out of all the people
who succeeded in round 2, 80% attacked similar target profiles in
round 3. Similarly, ζ_fs corresponding to R23 in ADS_2 is 33.43%,
meaning that out of all people who failed in round 2, 33.43% attacked
similar target profiles in round 3. All statistical significance
results reported below are based on two-tailed t-tests at confidence = 0.05.
The average (over all four models on two payoffs and for all round
pairs) of ζ_ss is 75.2% and the average of ζ_fs is 52.45%. This
difference is statistically significant, thus supporting Observation 1
and Observation 2.

³ In our experiments, φ_{β_i} > 0, P^a_{β_i} < 0 and D_{β_i} > 0.
These observations are also consistent with the “spillover effect”
in psychology [8], which in our case suggests that an adversary
will tend to associate properties of unexposed target profiles with
knowledge about similar target profiles to which he has been ex-
posed, where similarity is expressed in terms of the Euclidean dis-
tance between two points on the attack surface. Smaller distance in-
dicates higher similarity. The above aspects of adversary behavior
currently remain unaccounted for in BR-RSG models. Based on
observations above, we define two key properties below to capture
the consequences of past successes and failures on the adversary’s
behavior and reason based on them.
Definition. The vulnerability associated with a target profile β_i
which was shown to the adversary in round r, denoted V^r_{β_i}, is
defined as a function of the total number of successes and failures on
the concerned target profile in that round (denoted by success^r_{β_i}
and failure^r_{β_i} respectively). This is shown in Eqn. 11:

V^r_{β_i} = (success^r_{β_i} − failure^r_{β_i}) / (success^r_{β_i} + failure^r_{β_i})    (11)
Therefore, more successful attacks and few failures on a target
profile indicate that it was highly vulnerable in that round. Because
multiple targets can be associated with the same target profile and
the pure strategy generated based on the mixed strategy x in a par-
ticular round may result in a defender being present at some of
these targets while not at others, there may be both successes and
failures associated with the same target profile in that round.
Definition. The attractiveness of a target profile β_i at the end of
round R, denoted A^R_{β_i}, is defined as a function of the vulnerabilities
for β_i from round 1 to round R. It is computed using Eqn. 12:

A^R_{β_i} = (Σ_{r=1}^{R} V^r_{β_i}) / R    (12)
Therefore, we model the attractiveness of a target profile as the
average of the Vulnerabilities for that target profile over all the
rounds till round R. This is consistent with the notion that a tar-
get profile which has led to more successful attacks over several
rounds will be perceived as more attractive by the adversary.
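Eqns. 11 and 12 translate directly into code; the sketch below uses hypothetical per-round success/failure counts, and the zero fallback for a profile with no observations is our own placeholder (the paper instead estimates such profiles via kNN, Sec. 6.2).

```python
def vulnerability(successes, failures):
    """Eqn. (11): (successes - failures) / (successes + failures) for one target profile in one round."""
    total = successes + failures
    return (successes - failures) / total if total > 0 else 0.0  # placeholder guard for unexposed profiles

def attractiveness(vulnerabilities):
    """Eqn. (12): average vulnerability of a target profile over rounds 1..R."""
    return sum(vulnerabilities) / len(vulnerabilities)

if __name__ == "__main__":
    # Hypothetical per-round (successes, failures) counts for one target profile.
    per_round = [(4, 1), (3, 2), (0, 5)]
    v = [vulnerability(s, f) for s, f in per_round]
    print(v, attractiveness(v))
```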
6.2 SHARP’s Utility Computation
Existing models (such as SUQR, which is based on the subjective
utility function (Eqn. 4)) only consider the adversary's actions from
round (r−1) to predict their actions in round r. However, based
on our observations (Obs. 1 & 2), it is clear that the adversary’s
actions in a particular round are dependent on his past successes
and failures. The adaptive probability weighted subjective utility
function proposed in Eq. 13 captures this adaptive nature of the ad-
versary’s behavior by capturing the shifting trends in attractiveness
of various target profiles over rounds.
ASU^R_{β_i} = (1 − d·A^R_{β_i}) ω_1 f(x_{β_i}) + (1 + d·A^R_{β_i}) ω_2 φ_{β_i}
            + (1 + d·A^R_{β_i}) ω_3 P^a_{β_i} + (1 − d·A^R_{β_i}) ω_4 D_{β_i}    (13)
There are three main parts to SHARP’s computation: (i) Adapt-
ing the subjective utility based on past successes and failures on
exposed parts of the attack surface; (ii) Discounting to handle situ-
ations where not enough attack surface has been exposed; and (iii)
Reasoning about similarity of unexposed portions of the attack sur-
face based on other exposed parts of the attack surface.
The intuition behind the adaptive portion of this model is that
the subjective utility of target profiles which are highly attractive to
the adversary should be increased, and that of less attractive target
profiles should be decreased, to model the adversary's future decision
making. Hence, for a highly attractive target profile β_i, the
attacker would view the coverage x_{β_i} and distance from starting
location D_{β_i} to be of much lower value, but the animal density φ_{β_i}
to be of higher value, as compared to the actual values. The contribution
of the penalty term would also increase the utility (recall that
P^a_{β_i} < 0 and ω_3 < 0). Taking an example from our game, for a target
profile β_i = <0.25, 2, -1, 4> which had A^1_{β_i} = 1 after round
1, and the weights learned were b = <δ, γ, ω_1, ω_2, ω_3, ω_4> =
<2.2, 2.4, -3, 0.9, -0.3, -0.5>, P-SUQR would compute the subjective
utility as -0.29, while (assuming d (explained later) = 0.25,
for example) SHARP's adaptive utility function would compute the
subjective utility as 0.855. In comparison to the original subjective
utility function, this function is adaptive due to the positive or negative
boosting of model weights based on the defender's knowledge
about the adversary's past experiences. Here, learning the model
parameters b has been decoupled from boosting the model parameters
for future prediction to ensure simplicity in terms of both the
model formulation as well as the weight learning process. Through
an example in Sec. 8, we show the effect of this design decision on
the defender mixed strategy generated.
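The boosting of Eqn. 13 can be sketched as follows; the profile and weights mirror the example above, and the printed values simply illustrate that the adaptive utility comes out well above the plain PSU value for such an attractive profile.

```python
def gw_weight(p, delta, gamma):
    """Gonzalez-Wu probability weighting (Eqn. 6)."""
    num = delta * (p ** gamma)
    return num / (num + (1.0 - p) ** gamma)

def adaptive_su(x, phi, P_a, D, A, d, w, delta, gamma):
    """Eqn. (13): attractiveness A and discount d boost or suppress each feature's contribution."""
    w1, w2, w3, w4 = w
    up, down = 1.0 + d * A, 1.0 - d * A
    return (down * w1 * gw_weight(x, delta, gamma)
            + up * w2 * phi
            + up * w3 * P_a
            + down * w4 * D)

if __name__ == "__main__":
    # Target profile <0.25, 2, -1, 4> from the example, with A = 1 after round 1 and d = 0.25.
    b = (2.2, 2.4, -3.0, 0.9, -0.3, -0.5)          # <delta, gamma, w1, w2, w3, w4>
    delta, gamma, w = b[0], b[1], b[2:]
    plain_psu = (w[0] * gw_weight(0.25, delta, gamma) + w[1] * 2 + w[2] * (-1) + w[3] * 4)
    boosted = adaptive_su(0.25, 2, -1, 4, A=1.0, d=0.25, w=w, delta=delta, gamma=gamma)
    print(plain_psu, boosted)   # the adaptive utility is substantially higher for this attractive profile
```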
Now we turn to the next aspect of SHARP’s utility computa-
tion. Recall the problem that the defender does not necessarily have
information about the attacker’s preferences for enough of the at-
tack surface in the initial rounds. This is because, the attacker is
exposed to only a limited set of target profiles in each round and
the defender progressively gathers knowledge about the attacker’s
preferences for only those target profiles. We provide evidence in
support of this observation in Sec. 9.3.
The parameter d (0 ≤ d ≤ 1) in Eqn. 13 mitigates this attack
surface exposure problem. It is a discounting parameter which is
based on a measure of the amount of attack surface exposed. d is
low in the initial rounds when the defender does not have enough of
the right kind of data, but gradually increases as more information
about the attacker's preferences over various regions of the
attack surface becomes available. For our experiments, we varied d
based on Eqn. 14:

d = 1 / (N_r − r)    (14)
where N_r is the total number of rounds and r is the round whose
data is under consideration. For example, N_r = 5 and r = 1
for data collected in round 1 of an experiment conducted over five
rounds. The intuition behind this formulation is that, as more rounds
are played, more information about the adversary's preferences over
much of the attack surface becomes available, and hence d increases
gradually from a very small value as rounds progress.
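For completeness, the discounting schedule of Eqn. 14 in code, with N_r = 5 as in our five-round experiments; r ranges over the rounds whose data feed the next strategy.

```python
def discount(r, n_rounds=5):
    """Eqn. (14): d = 1 / (N_r - r) for the round r whose data is being used."""
    return 1.0 / (n_rounds - r)

if __name__ == "__main__":
    # d grows as later rounds' data become available: 0.25, 0.33, 0.5, 1.0
    print([round(discount(r), 2) for r in range(1, 5)])
```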
Finally, we look at how we reason about unexposed portions of
the attack surface based on the exposed areas. If a target profile
β_u was not exposed to attacker response in round r, the defender
will not be able to compute the vulnerability V^r_{β_u}. Therefore, we
will also not be able to estimate the attractiveness for β_u and hence
the optimal defender strategy. So, in keeping with our analysis on
available data and based on the spillover effect introduced earlier,
we use the distance-weighted k-nearest neighbors algorithm [7] to
obtain the vulnerability V^r_{β_u} of an unexposed target profile β_u in
round r, based on the k most similar target profiles which were
exposed to the attacker in round r (Eqns. 15 and 16):

V^r_{β_u} = (Σ_{i=1}^{k} θ_i V^r_{β_i}) / (Σ_{i=1}^{k} θ_i)    (15)

θ_i ∝ 1 / d(β_u, β_i)²    (16)

where d(β_u, β_i) denotes the Euclidean distance between β_u and
β_i in the feature space. We use k = 5 for our experiments.
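A compact sketch of the distance-weighted kNN estimate of Eqns. 15–16 follows; the exposed profiles and their vulnerabilities are hypothetical, and the small epsilon guarding against a zero distance is our own addition.

```python
import math

def knn_vulnerability(unexposed, exposed, k=5):
    """Eqns. (15)-(16): estimate V for an unexposed profile from its k nearest exposed profiles,
    weighting each neighbor by 1 / (Euclidean distance)^2."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbors = sorted(exposed, key=lambda pv: dist(unexposed, pv[0]))[:k]
    weights = [1.0 / max(dist(unexposed, p) ** 2, 1e-9) for p, _ in neighbors]  # guard zero distance
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)

if __name__ == "__main__":
    # Exposed target profiles <x, phi, P_a, D> with observed vulnerabilities (hypothetical).
    exposed = [((0.2, 2, -1, 4), 0.6), ((0.5, 3, -1, 2), -0.2),
               ((0.8, 5, -2, 1), -0.8), ((0.1, 1, -1, 6), 0.9)]
    print(knn_vulnerability((0.25, 2, -1, 4), exposed, k=3))
```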
7. GENERATING DEFENDER STRATEGIES
AGAINST SHARP
While SHARP provides an adversary model, we must now gen-
erate defender strategies against such a model. To that end, we first
learn the parameters of SHARP from available data (See Sec. 5).
We then generate future round strategies against the boundedly ra-
tional adversary characterized by the learned model parameters by
solving the following optimization problem:
max_{x∈X} Σ_{i∈T} U^d_i(x) · q^R_i(ω|x)    (17)

q^R_i(ω|x) is the probability that the adversary will attack target i in
round R and is calculated based on the following equation:

q^R_i(ω|x) = e^{ASU^R_{β^i_k}(x)} / Σ_{i∈T} e^{ASU^R_{β^i_k}(x)}    (18)

β^i_k denotes that target profile β_k is associated with target i. ASU^R_{β^i_k}
and U^d_i(x) are calculated according to Eqns. 13 and 1 respectively.
To solve the non-linear and non-convex optimization problem in
Eqn. 17 and generate the optimal defender strategy, we use PASAQ
[37] as it provides an efficient approximated computation of the
defender strategy with near-optimal solution quality.
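The paper solves Eqn. 17 with PASAQ; the sketch below instead feeds the same objective to a generic off-the-shelf nonlinear solver (scipy) purely to illustrate what is being maximized. All payoff numbers, weights and the single-start initialization are hypothetical, and no claim of PASAQ-level solution quality is made.

```python
import numpy as np
from scipy.optimize import minimize

def gw_weight(p, delta, gamma):
    num = delta * np.power(p, gamma)
    return num / (num + np.power(1.0 - p, gamma))

def neg_defender_eu(x, phi, P_a, D, R_d, P_d, A, d, w, delta, gamma):
    """Negative of Eqn. (17): expected defender utility under the SHARP response of Eqn. (18)."""
    w1, w2, w3, w4 = w
    asu = ((1 - d * A) * w1 * gw_weight(x, delta, gamma) + (1 + d * A) * w2 * phi
           + (1 + d * A) * w3 * P_a + (1 - d * A) * w4 * D)          # Eqn. (13)
    q = np.exp(asu - asu.max()); q /= q.sum()                        # Eqn. (18)
    u_d = x * R_d + (1 - x) * P_d                                    # Eqn. (1)
    return -float(np.dot(q, u_d))

if __name__ == "__main__":
    T, M = 6, 2                                    # hypothetical: 6 targets, 2 resources
    rng = np.random.default_rng(0)
    phi = rng.integers(1, 8, T).astype(float); D = rng.uniform(1, 8, T)
    P_a = -np.ones(T); R_d = phi.copy(); P_d = -phi
    A = np.zeros(T)                                # e.g., no attractiveness information yet
    args = (phi, P_a, D, R_d, P_d, A, 0.25, (-3.0, 0.9, -0.3, -0.5), 1.2, 1.6)
    res = minimize(neg_defender_eu, x0=np.full(T, M / T), args=args,
                   bounds=[(0, 1)] * T,
                   constraints=[{"type": "eq", "fun": lambda x: x.sum() - M}])
    print(np.round(res.x, 2), -res.fun)
```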
8. SHARP IN ACTION: AN EXAMPLE
Here we give an example to show the effectiveness of SHARP
in terms of the design of each component. Figs. 6(a), 6(b) and
6(c) show second round strategies generated by SHARP with discounting
of Eqn. 14 but without Attractiveness, SHARP without
discounting, i.e., d = 1, but with Attractiveness, and SHARP, based
on parameters learned (b = <δ, γ, ω_1, ω_2, ω_3, ω_4> = <1.2, 1.6,
-3.2791, 0.1952, -0.3, -0.8388>) from first round data collected
for the experiment with SHARP on ADS_1. The strategy generated
by SHARP with discounting but without Attractiveness can be
easily exploited due to several unprotected cells with positive ani-
mal density. SHARP without discounting but with Attractiveness
generates a comparatively more robust strategy than SHARP with
discounting but without Attractiveness due to its adaptive utility
function and similarity learning mechanism. SHARP generates the
best strategy due to its capability to model all the design parameters
together into one single framework.
9. RESULTS WITH HUMAN SUBJECTS
9.1 Defender Utilities
Figure 6: Round 2 strategies generated by (a) SHARP with discounting
but without Attractiveness, (b) SHARP without discounting but
with Attractiveness, and (c) SHARP.

In Figs. 7(a)–7(d) we show actual defender utilities obtained
over 5 rounds for P-SUQR, P-BSUQR, P-RSUQR, SHARP and
Maximin on ADS_1, ADS_2, ADS_3 and ADS_4 respectively, with
an average of 38 human subjects playing per round. In the plots, the
y-axis corresponds to defender utility and the models tested for each
round are shown on the x-axis. For example, in Fig. 7(b), P-SUQR
performs worst in round 2 with a utility of -5.26. In Fig. 7(b), we
also show (inset) zoomed-in results of the second round to highlight
the difference in performance between Maximin (= -0.18) and
SHARP (= -0.1). First round utilities for all models are the same, as
the Maximin strategy was played due to the absence of data. All
significance results reported below are computed with a bootstrap t-test.
The following are key observations from our experiments.
Heavy initial round losses: For all models except SHARP, there
is a statistically significant (p=0.05) loss in defender utility as compared
to Maximin in the second round. P-SUQR recovers from initial
round losses and outperforms Maximin in rounds 3, 4 and 5 for
ADS_1 (statistically significant at p=0.05), and in round 4 (statistically
significant at p=0.15) and round 5 for ADS_2. P-RSUQR,
which is a robust model, also outperforms Maximin in rounds 4 and
5 (statistically significant at p=0.05) for ADS_1 after initial round
losses. Surprisingly, P-BSUQR, which is the basis for the wildlife
security application PAWS, performs worst on both payoffs over all
rounds. Figs. 7(e)–7(h) show cumulative defender utility over five
rounds on ADS_1, ADS_2, ADS_3 and ADS_4 respectively. Observe
that it takes five rounds for P-SUQR to recover from initial round
losses and outperform Maximin in terms of cumulative defender
utility for ADS_1 (Fig. 7(e)). None of the other models recover
from the initial round losses on either payoff in five rounds, thus
highlighting the impact initial round losses have on model performance
for a long period of time.
Performance of SHARP against other models: SHARP consis-
tently outperforms (statistically significant at p=0.05) all the mod-
els over all rounds (Figs. 7(a)- 7(d)), most notably in initial rounds
(round 2) and ends up with significantly high cumulative utility at
the end of all rounds (Figs. 7(e)- 7(h)).
Performance of SHARP (with and without discounting): To test
the effectiveness of the design decisions in SHARP, we considered
SHARP both with and without discounting. SHARP with d = 1
is compared against SHARP and P-SUQR on ADS_1 in Fig. 8(a).
SHARP(d = 1) outperforms P-SUQR (statistically significant at
p=0.05) because it captures the adaptive nature of the adversary.
However, it performs worse than SHARP (statistically significant
at p=0.01) as SHARP also trusts the data less when we don't have
enough information about the adversary's responses to most of the
attack surface; in this case the initial rounds.
Therefore, our results on extensive human subjects experiments
on repeated SSGs show SHARP’s ability to perform well through-
out, including the important initial rounds.
Figure 7: (a), (b), (c) and (d): Defender utilities for various
models on ADS_1, ADS_2, ADS_3 and ADS_4 respectively; (e),
(f), (g) and (h): Cumulative defender utilities for various models
on ADS_1, ADS_2, ADS_3 and ADS_4 respectively.
9.2 Learned Probability Curves
Fig. 8(b) shows human perceptions of probability in rounds 1 to
4 when the participants were exposed to P-SUQR based strategies
on ADS_1. Learned curves from P-SUQR and SHARP on all payoffs
have this S-shaped nature (see online appendix²), showing that
even though there is little change in the curvature between rounds,
the curve retains the same S-shape throughout all rounds. The curves
indicate that people weigh high probabilities to be higher and low to
medium probabilities to be lower than the actual values.
9.3 Attack surface exposure
In our repeated SSG, the only variation in terms of feature values
for our model (Eqn. 13) from round to round is the mixed strategy
x and hence the coverage x_i at each target. Hence, exposure
to various regions of the attack surface means exposure to various
values of x_i for fixed values of the other model parameters. Fig.
9(a) shows how the adversary was exposed to more unique values
of the coverage probability, and hence attack surface, over the five
rounds for ADS_1. We discretize the range of x_i, i.e. [0,1], into 10
intervals (x-axis) and show the total number of unique coverages
exposed till a particular round (y-axis) for each interval. Observe
that more interval ranges and more unique coverage probabilities
get exposed in rounds 3 to 5. As we showed in Fig. 7(a), the defender
performance for P-SUQR improves significantly in rounds
4 and 5. A similar plot for ADS_2 is shown in the online appendix².

Figure 8: (a) Comparison of defender utilities between P-SUQR,
SHARP and SHARP(d=1) on ADS_1; (b) Learned probability
curves for P-SUQR on ADS_1 (rounds 1 to 4).

Figure 9: Total number of unique exposed target profiles till
the end of each round for each coverage probability interval
for ADS_1.
10. CONCLUSION
This paper provides two major contributions that are critical for
important domains such as protection of wildlife, fish and forests.
First, it introduces a novel human behavior model called SHARP
for repeated SSG settings. SHARP has three major novelties: (i)
It models the adversary’s adaptive decision making process by rea-
soning based on success or failure of the adversary’s past actions
on exposed portions of the attack surface. (ii) It accounts for lack
of information about the adversary’s preferences due to insufficient
exposure to attack surface by reasoning about similarity between
exposed and unexposed areas of the attack surface, and also in-
corporating a confidence based discounting parameter to model the
learner’s trust in the available data. (iii) It integrates a non-linear
probability weighting function.
Second, we conducted the first “longitudinal study” of compet-
ing models in repeated SSGs to test the performance of SHARP
along with existing approaches. Our results show that: (i) Human
perceptions of probability are S-shaped, contradicting the inverse
S-shaped observation from prospect theory. (ii) Existing human
behavior models and algorithms perform poorly in initial rounds
of repeated SSGs. (iii) SHARP performs significantly better than
existing approaches consistently over all the rounds, most notably
in the initial rounds, earning it the “iron throne” in our game of
competing models in repeated SSGs.
11. ACKNOWLEDGMENTS
This research was supported by MURI Grant W911NF-11-1-03.
REFERENCES
[1] Ugandans fear curse of oil wealth as it threatens to blight
’pearl of africa’.
http://www.theguardian.com/world/2013/
dec/29/ugandans-oil-blight-pearl-africa.
Accessed: November 8, 2014.
[2] M. Abdellaoui, O. L'Haridon, and H. Zank. Separating
curvature and elevation: A parametric probability weighting
function. Journal of Risk and Uncertainty, 41(1):39–65,
2010.
[3] Y. Alarie and G. Dionne. Lottery decisions and probability
weighting function. Journal of Risk and Uncertainty,
22(1):21–33, 2001.
[4] A. Blum, N. Haghtalab, and A. Procaccia. Learning optimal
commitment to overcome insecurity. In Proceedings of the
28th Annual Conference on Neural Information Processing
Systems (NIPS), 2014.
[5] V. Conitzer and T. Sandholm. Computing the optimal
strategy to commit to. In Proceedings of the 7th ACM
Conference on Electronic Commerce, EC ’06, pages 82–90,
2006.
[6] R. B. Cotter, J. D. Burke, M. Stouthamer-Loeber, and
R. Loeber. Contacting participants for follow-up: how much
effort is required to retain participants in longitudinal
studies? Evaluation and Program Planning, 28(1):15–21,
2005.
[7] S. A. Dudani. The distance-weighted k-nearest-neighbor
rule. Systems, Man and Cybernetics, IEEE Transactions on,
SMC-6(4):325–327, April 1976.
[8] J. Elster. A plea for mechanisms. Social mechanisms: an
analytical approach to social theory.
[9] N. Etchart-Vincent. Probability weighting and the level and
spacing of outcomes: An experimental study over losses.
Journal of Risk and Uncertainty, 39(1):45–63, 2009.
[10] R. Gonzalez and G. Wu. On the shape of the probability
weighting function. Cognitive Psychology, 38:129–166,
1999.
[11] M. Hamisi. Identification and mapping risk areas for zebra
poaching: A case of Tarangire National Park, Tanzania.
Thesis, ITC, 2008.
[12] W. Haskell, D. Kar, F. Fang, M. Tambe, S. Cheung, and
E. Denicola. Robust protection of fisheries with compass. In
Innovative Applications of Artificial Intelligence (IAAI),
2014.
[13] S. J. Humphrey and A. Verschoor. The probability weighting
function: experimental evidence from Uganda, India and
Ethiopia. Economics Letters, 84(3):419–425, September
2004.
[14] S. Jajodia, A. K. Ghosh, V. Swarup, C. Wang, and X. S.
Wang. Moving Target Defense: Creating Asymmetric
Uncertainty for Cyber Threats. Springer Publishing
Company, Incorporated, 1st edition, 2011.
[15] M. Johanson and M. Bowling. Data biased robust counter
strategies. In Proceedings of the Twelfth International
Conference on Artificial Intelligence and Statistics
(AISTATS), 2009.
[16] M. Johanson, M. Zinkevich, and M. Bowling. Computing
robust counter-strategies. In Proceedings of the Annual
Conference on Neural Information Processing Systems
(NIPS), 2007.
[17] D. Kahneman and A. Tversky. Prospect Theory: An Analysis
of Decision under Risk. Econometrica, 47(2):263–91, 1979.
[18] D. Korzhyk, V. Conitzer, and R. Parr. Complexity of
computing optimal stackelberg strategies in security resource
allocation games. In Proceedings of the National Conference
on Artificial Intelligence (AAAI), pages 805–810, 2010.
[19] A. M. Lemieux. Situational Crime Prevention of Poaching
(Crime Science Series). Routledge, 2014.
[20] J. Letchford, V. Conitzer, and K. Munagala. Learning and
approximating the optimal strategy to commit to. In
Proceedings of the 2Nd International Symposium on
Algorithmic Game Theory, SAGT ’09, pages 250–262,
Berlin, Heidelberg, 2009. Springer-Verlag.
[21] P. K. Manadhata and J. M. Wing. An attack surface metric.
Software Engineering, IEEE Transactions on,
37(3):371–386, 2011.
[22] A. Mao, D. Parkes, Y. Chen, A. D. Procaccia, K. Z. Gajos,
and H. Zhang. Turkserver: Enabling synchronous and
longitudinal online experiments. In AAAI HCOMP
Workshop, 2012.
[23] J. Marecki, G. Tesauro, and R. Segal. Playing repeated
stackelberg games with unknown opponents. In AAMAS,
pages 821–828, 2012.
[24] P. McCracken and M. Bowling. Safe strategies for agent
modelling in games. In Proceedings of the National
Conference on Artificial Intelligence (AAAI), 2004.
[25] D. McFadden. Quantal choice analysis: A survey. Annals of
Economic and Social Measurement, 5(4):363–390, 1976.
[26] S. W. Menard. Handbook of longitudinal research: Design,
measurement, and analysis. Academic Press, 2008.
[27] M. Montesh. Rhino poaching: A new form of organised
crime. Technical report, College of Law Research and
Innovation Committee of the University of South Africa,
2013.
[28] W. Moreto. To Conserve and Protect: Examining Law
Enforcement Ranger Culture and Operations in Queen
Elizabeth National Park, Uganda. Thesis, Rutgers, 2013.
[29] T. H. Nguyen, R. Yang, A. Azaria, S. Kraus, and M. Tambe.
Analyzing the effectiveness of adversary modeling in
security games. In AAAI, 2013.
[30] P. Paruchuri, J. P. Pearce, J. Marecki, M. Tambe, F. Ordonez,
and S. Kraus. Playing games for security: An efficient exact
algorithm for solving bayesian stackelberg games. In
Proceedings of the 7th International Joint Conference on
Autonomous Agents and Multiagent Systems - Volume 2,
AAMAS, pages 895–902, 2008.
[31] M. Ponsen, S. D. Jong, and M. Lanctot. Computing
approximate nash equilibria and robust best-responses using
sampling. J. Artif. Intell. Res. (JAIR), 2011.
[32] M. Tambe. Security and Game Theory: Algorithms,
Deployed Systems, Lessons Learned. Cambridge University
Press, New York, NY, 2011.
[33] A. Tversky and D. Kahneman. Advances in prospect theory:
Cumulative representation of uncertainty. Journal of Risk
and Uncertainty, 5(4):297–323, 1992.
[34] R. Yang, B. Ford, M. Tambe, and A. Lemieux. Adaptive
resource allocation for wildlife protection against illegal
poachers. In International Conference on Autonomous
Agents and Multiagent Systems (AAMAS), 2014.
[35] R. Yang, C. Kiekintveld, F. Ordonez, M. Tambe, and
R. John. Improving resource allocation strategy against
human adversaries in security games. In IJCAI, 2011.
[36] R. Yang, C. Kiekintveld, F. Ordonez, M. Tambe, and
R. John. Improving resource allocation strategies against
human adversaries in security games: An extended study.
Artif. Intell., 195:440–469, 2013.
[37] R. Yang, F. Ordonez, and M. Tambe. Computing optimal
strategy against quantal response in security games. In
Proceedings of the 11th International Conference on
Autonomous Agents and Multiagent Systems - Volume 2,
AAMAS ’12, pages 847–854, 2012.