Poisoning Attacks on Federated Learning-based
Wireless Traffic Prediction
Zifan Zhang
, Minghong Fang
, Jiayuan Huang
, Yuchen Liu
North Carolina State University, USA,
University of Louisville, USA
Abstract—Federated Learning (FL) offers a distributed frame-
work to train a global control model across multiple base
stations without compromising the privacy of their local network
data. This makes it ideal for applications like wireless traffic
prediction (WTP), which plays a crucial role in optimizing
network resources, enabling proactive traffic flow management,
and enhancing the reliability of downstream communication-
aided applications, such as IoT devices, autonomous vehicles,
and industrial automation systems. Despite its promise, the
security aspects of FL-based distributed wireless systems, partic-
ularly in regression-based WTP problems, remain inadequately
investigated. In this paper, we introduce a novel fake traffic
injection (FTI) attack, designed to undermine the FL-based WTP
system by injecting fabricated traffic distributions with minimal
knowledge. We further propose a defense mechanism, termed
global-local inconsistency detection (GLID), which strategically
removes abnormal model parameters that deviate beyond a
specific percentile range estimated through statistical methods in
each dimension. Extensive experimental evaluations, performed
on real-world wireless traffic datasets, demonstrate that both our
attack and defense strategies significantly outperform existing
baselines.
Index Terms—Poisoning attacks, wireless traffic prediction,
federated learning, injection attack.
I. INTRODUCTION
Federated learning (FL) represents an evolving paradigm
in distributed machine learning techniques, allowing a unified
model to be trained across numerous devices containing local
data samples, all without the need to transmit these samples to
a central server. This innovative framework empowers training
on diverse datasets characterized by heterogeneous distribu-
tions, offering substantial advantages in the current landscape
of big data. In practical applications, FL has found widespread
use in addressing real-world challenges, particularly in envi-
ronments dealing with sensitive or personal data, including
the Internet of Things (IoT) [1], [2], edge computing [3], and
health informatics [4], [5].
In the realm of wireless networks, FL leverages its dis-
tributed nature to facilitate multiple network services, includ-
ing wireless traffic prediction (WTP). With the exponential
growth in the number of connected devices and the ever-
increasing demand for data-intensive applications like stream-
ing, online gaming, and IoT services, predicting wireless traffic
accurately becomes vital for ensuring network reliability and
efficiency. By forecasting network load on a temporal basis,
service providers can dynamically allocate resources, reducing
the risk of congestion and ensuring a high Quality of Service
(QoS) for users [6]–[8]. Furthermore, accurate traffic predic-
tions enable operators to strategically plan network expansions
and efficiently upgrade infrastructure, resulting in cost savings
and enhanced network performance. Particularly, in the era of
5G and beyond, where technologies like network slicing and
edge computing play crucial roles, WTP becomes essential
for optimizing these advancements, which not only enhances
user experience but also facilitates the provision of innovative
services that demand high bandwidth and low latency. To
implement WTP, centralized methods exist [9], [10], but an FL-based solution stands out by utilizing training data distributed across diverse edge nodes. This approach enables precise and timely predictions of network traffic [11]. Despite FL's potential in accuracy, efficiency, and
privacy preservation, its integration into WTP is not devoid
of challenges. Notably, Byzantine attacks, particularly model
poisoning attacks, pose significant threats to the effectiveness
and trustworthiness of FL-based WTP systems [12].
In a model poisoning attack, malicious network entities
introduce adversarial modifications to the model parameters
during the training process of WTP. This tampering results in a
compromised global model when aggregated at the central
network controller, subsequently producing incorrect traffic
predictions. Such inaccuracies lead to the risk of network
inefficiencies and even severe service disruptions, especially
in real-time applications like autonomous driving systems. In
more extreme scenarios, these attacks may serve as gateways
to further malicious network intrusions, instigating broader
security and privacy concerns as illustrated in [13], [14].
The grave implications of model poisoning attacks underscore
the pressing need for robust security measures to ensure the
integrity, reliability, and resilience of FL-based WTP systems
against Byzantine failures, thereby safeguarding the overarch-
ing network infrastructure and the services reliant on it. While
most existing FL algorithms and their associated security
strategies are typically assessed within the context of classifi-
cation problems [15], [16], scant attention has been paid to regression problems such as the WTP scenarios examined here, which introduce distinct challenges related to data distribution, model complexity, and evaluation metrics. The distinction
between data manipulation strategies in regression and clas-
sification problems, as well as their detection methodologies,
underscores the nuanced challenges in safeguarding machine
learning models against attacks. For instance, in regression-
based WTP problems, attackers typically target the model’s
continuous output by altering the distribution or magnitude of
input time-series data, with the goal of steering predictions in a
specific direction. This differs from classification tasks, where
the manipulation revolves around modifying input features
to induce misclassification without noticeably changing the
input’s appearance to human observers.
To bridge this gap, we make the first attempt to introduce a
novel attack centered on injecting fake base station (BS) traffic
into wireless networks. Existing model poisoning attacks have
predominantly depended on additional access knowledge and
direct intrusions on BSs [12], [15], [17]. However, in practical
cellular network systems, BSs have exhibited a commendable
level of resilience against attacks, making the extraction of
training data from them a challenging endeavor. In contrast,
the cost of deploying fake BSs that mimic their behaviors
is comparatively lower than the resources required for com-
promising authentic ones [18]. Under this assumption, the fake BSs lack insight into the training data and only have access to the initial and current global models, aligning with the practical settings studied in [18]. Importantly, other information, such as data aggregation rules and model parameters from benign BSs, remains inaccessible to these fake BSs. Within the FL framework, the global
model is aggregated based on the model parameters of BSs
in each iterative round, encompassing both benign and fake
BSs. Consequently, our threat model envisions a minimum-
knowledge scenario for an adversary. To this end, we propose
Fake Traffic Injection (FTI), a methodology designed to create
undetectable fake BSs with minimal prior knowledge, where
each fake BS employs both its initial model and current
global information to determine the optimizing trajectory of
the FL process on WTP. These malicious participants aim
to subtly align the global model towards an outcome that
undermines the integrity and reliability of the data learning
process. Numerous numerical experiments are conducted to
validate that our FTI demonstrates efficacy across various
state-of-the-art aggregation rules, outperforming other model
poisoning attacks in terms of vulnerability impacts.
To counter such threats, we propose an innovative defensive strategy known as Global-Local Inconsistency Detection (GLID),
aimed at neutralizing the effects of model poisoning attacks
on WTP. This defense scheme involves strategically removing
abnormal model parameters that deviate beyond a specific
percentile range estimated through statistical methods in each
dimension. Such an adaptive approach allows us to trim
varying numbers of malicious model parameters instead of
a fixed quantity [19]. Next, a weighted mean mechanism is
employed to update the global model parameter, subsequently
disseminated back to each BS. Our extensive evaluations,
conducted on real-world datasets, demonstrate that the pro-
posed defensive mechanism substantially mitigates the impact
of model poisoning attacks on WTP, thereby showcasing a
promising avenue for securing FL-based WTP systems against
Byzantine attacks.
The contributions of this work are threefold:
1) We present a novel model poisoning attack, employing
fake BSs for traffic injection into FL-based WTP sys-
tems under a minimum-knowledge scenario.
2) Conversely, we propose an effective defense strategy
designed against various model poisoning attacks, which
dynamically trims an adaptive number of model param-
eters by leveraging the percentile estimation technique.
3) Lastly, we evaluate both the proposed poisoning attack
and the defensive mechanism using real-world traffic
datasets from Milan City, where the results demonstrate
that the FTI attack indeed compromises FL-based WTP
systems, and the proposed defensive strategy proves
notably more effective than other baseline approaches.
II. RELATED WORKS AND PRELIMINARIES
A. FL-based WTP
Consider a wireless traffic forecasting system that employs FL and incorporates a central server located in a macro-cell station along with $n$ small-cell BSs (e.g., gNBs). Every BS $i \in [n]$ possesses its own private training dataset $u_i = \{u_i^1, u_i^2, \ldots, u_i^M\}$, where $M$ represents the total count of time intervals, and $u_i^m$ denotes the traffic load on BS $i$ during the $m$-th interval, with $m \in [M]$. To delineate a prediction model, we construct a series of input-output pairs $\{a_i^j, b_i^j\}_{j=1}^{z}$. Here, each $a_i^j$ is a historical subset of traffic data, $\{u_i^{m-1}, \ldots, u_i^{m-r}, u_i^{m-\omega \cdot 1}, \ldots, u_i^{m-\omega \cdot s}\}$, that correlates with its associated output $b_i^j$. The parameters $r$ and $s$ serve as sliding windows capturing immediate temporal dependencies and cyclical patterns, respectively. Furthermore, $\omega$ encapsulates inherent periodicities within the network, potentially driven by diurnal user patterns or systematic service demands. Given the importance of real-time responsiveness in wireless networks, our prediction model is designed for a one-step-ahead forecast. To be specific, for the $i$-th BS, we seek to predict the traffic load $\tilde{b}_i^j$ based on the historical traffic data $a_i^j$ and the model parameter $\theta$ as $\tilde{b}_i^j = f(a_i^j, \theta)$, where $f(\cdot)$ is the regression function.
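For concreteness, the following minimal Python sketch shows how such input-output pairs can be assembled from one BS's traffic series; the window sizes, the period, and the exact indexing are illustrative assumptions rather than the configuration used in our experiments.

```python
import numpy as np

def build_pairs(u, r=3, s=2, omega=24):
    """Build input-output pairs from one BS's traffic series u: each input
    concatenates the r most recent loads and s loads taken one period (omega
    steps) apart, and the output is the next load. The exact indexing is an
    assumption made for illustration."""
    a, b = [], []
    start = max(r, omega * s)                # earliest index with a full history
    for m in range(start, len(u)):
        recent = [u[m - k] for k in range(1, r + 1)]            # u[m-1..m-r]
        periodic = [u[m - omega * k] for k in range(1, s + 1)]  # u[m-omega..m-omega*s]
        a.append(recent + periodic)
        b.append(u[m])
    return np.array(a), np.array(b)

# toy usage: three days of hourly traffic
u = np.sin(np.linspace(0, 6 * np.pi, 72)) + 1.0
A, B = build_pairs(u)
print(A.shape, B.shape)   # (24, 5) (24,)
```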
In an FL-based WTP system, the objective is to minimize prediction errors across the $n$ BSs. This can be formulated as the following optimization problem to determine the optimal global model $\theta^{*}$ at the central server:
$$\theta^{*} = \arg\min_{\theta} \frac{1}{nz} \sum_{i=1}^{n} \sum_{j=1}^{z} F\big(f(a_i^j, \theta),\, b_i^j\big), \qquad (1)$$
where $F$ is the quadratic loss, i.e., $F\big(f(a_i^j, \theta), b_i^j\big) = \big(f(a_i^j, \theta) - b_i^j\big)^2$. Eq. (1) can be solved in a distributed fashion based on FL with the following three steps in each global training round $t$.
Step I (Synchronization). The central server sends the current global model $\theta^t$ to all BSs.
Step II (Local model training). Each BS $i \in [n]$ utilizes its private time-series training data along with the current global model to refine its own local model, then transmits the updated local model $\theta_i^t$ back to the server.
Step III (Local model aggregation). The central server leverages the aggregation rule (AR) to merge the $n$ received local models and subsequently updates the global model as follows:
$$\theta^{t+1} = \text{AR}\{\theta_1^t, \theta_2^t, \ldots, \theta_n^t\}. \qquad (2)$$
The commonly used aggregation rule is FedAvg [20], where the server simply averages the $n$ local models received from the distributed BSs, i.e., $\text{AR}\{\theta_1^t, \theta_2^t, \ldots, \theta_n^t\} = \frac{1}{n}\sum_{i=1}^{n} \theta_i^t$.
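The three steps above can be sketched in a few lines of Python; the linear regression function $f$, the SGD hyperparameters, and the synthetic data are assumptions made purely for illustration, not the model used in our evaluation.

```python
import numpy as np

def local_update(theta, A, B, lr=0.01, epochs=5):
    """Step II: one BS refines the global model on its private pairs (A, B)
    with plain gradient descent on the quadratic loss (f(a, theta) - b)^2,
    where f is taken to be linear for this sketch."""
    theta = theta.copy()
    for _ in range(epochs):
        grad = 2 * A.T @ (A @ theta - B) / len(B)
        theta -= lr * grad
    return theta

def fedavg(local_models):
    """Step III with AR = FedAvg: coordinate-wise mean of the n local models."""
    return np.mean(local_models, axis=0)

# one global round over n BSs on synthetic data
rng = np.random.default_rng(0)
n, dim = 5, 5
theta_global = np.zeros(dim)                       # Step I: broadcast
datasets = [(rng.normal(size=(40, dim)), rng.normal(size=40)) for _ in range(n)]
local_models = [local_update(theta_global, A, B) for A, B in datasets]  # Step II
theta_global = fedavg(local_models)                # Step III
print(theta_global)
```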
B. Byzantine-robust Aggregation Rules
In non-adversarial scenarios, the server aggregates the re-
ceived local model updates by straightforwardly averaging
them [20]. Nevertheless, recent research [21] has revealed
that this averaging-based aggregation method is susceptible
to poisoning attacks, where a single malicious BS can ma-
nipulate the final aggregated outcome without constraints.
To counteract such potential threats, various Byzantine-robust
aggregation rules have been suggested [19], [21]–[27]. For
instance, in the Krum method [21], each client’s update is
scored based on the sum of Euclidean distances to other
clients’ updates. The global model is then set to the update from the client (i.e., BS) with the minimum score.
In a Median aggregation scheme [19], the server calculates the
median value for each dimension using all the local model
updates. In FLTrust [22], it is assumed that the server
possesses a validation dataset. The server maintains a model
derived from this dataset. To determine trust levels, the server
computes the cosine similarity between its model update and
the update of each BS. These scores are then used to weigh
the contribution of each BS to the final aggregated model.
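A hedged sketch of two of these rules is given below: coordinate-wise Median and a Krum-style selection. Note that the original Krum scores each update by its distances to only the $n - f - 2$ closest peers, a detail the summary above omits; the parameter $f$ and the toy data are illustrative assumptions.

```python
import numpy as np

def median_aggregate(local_models):
    """Coordinate-wise median over all received local models [19]."""
    return np.median(np.stack(local_models), axis=0)

def krum_aggregate(local_models, f):
    """Krum-style selection [21]: score each model by the sum of squared
    distances to its n - f - 2 closest peers and return the lowest-scoring
    model."""
    X = np.stack(local_models)
    n = len(X)
    dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        d = np.sort(np.delete(dists[i], i))       # distances to the other models
        scores.append(d[: max(n - f - 2, 1)].sum())
    return X[int(np.argmin(scores))]

# toy usage: eight benign-looking updates plus one crude outlier
models = [np.random.default_rng(i).normal(size=4) for i in range(8)]
models.append(np.full(4, 50.0))
print(median_aggregate(models))
print(krum_aggregate(models, f=1))
```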
C. Poisoning Attacks to FL-based Systems
The decentralized nature of FL makes our considered prob-
lem susceptible to Byzantine attacks [12], [15], [16], [18],
[28], [29], where attackers with control over malicious BSs
can compromise the FL-based WTP system. Malicious BSs
can corrupt their local training traffic data or alter their local
models directly. For instance, in the Trim attack [15], the at-
tacker intentionally manipulates the local models on malicious
BSs to cause a significant deviation between the aggregated
model after attack and the one before attack. In the Model
Poisoning Attack based on Fake clients (MPAF) attack [18],
each malicious BS first multiplies the global model update
synchronized from the central server by a negative scaling
factor and subsequently transmits these scaled model updates
to the server. In the Random attack [15], every malicious BS
randomly generates a vector from a Gaussian distribution and
transmits it to the server. Recently, [12] introduced poisoning
attacks for FL-based WTP systems, where the attacker controls
some deployed BSs, each with its own local training data.
These malicious BSs fine-tune their local models using their
respective training data. Subsequently, the attacker scales the
local model updates on malicious BSs by applying a scaling
factor and sends the scaled model updates to the server.
However, existing attacks suffer from practical implementation limitations. For instance, the attack described in [12]
is not feasible because it is based on the unrealistic assumption
that an attacker can readily take control of authentic BSs. In
reality, it is highly challenging for an attacker to gain such
influence over existing, authentic BSs. In the MPAF attack,
which has a simpler threat model, the model updates from
fake clients are exaggerated by a factor such as $10^6$. This
approach is impractical because the central server can easily
identify these excessive updates as anomalies and discard
them. By contrast, our proposed poisoning attack involves
carefully crafting model updates on fake BSs by addressing a
parametric optimization problem. This ensures that the server
is unable to differentiate these fake updates from benign ones,
allowing the attacker to simultaneously breach the integrity of
the system without detection.
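To make the contrast concrete, the sketch below mimics the kind of fake updates the MPAF and Random attacks submit, following the descriptions above; the scaling factors and variable names are illustrative assumptions, and the point is that such large-magnitude updates are exactly what a server can flag as anomalous.

```python
import numpy as np

def mpaf_fake_update(theta_current, theta_init, scale=1e6):
    """MPAF-style fake update [18] (simplified reading): push the global model
    back toward an attacker-chosen initial model by scaling the direction
    (theta_init - theta_current); the huge magnitude is what makes such
    updates easy for the server to flag as anomalies."""
    return scale * (theta_init - theta_current)

def random_fake_update(dim, scale=1e3, rng=None):
    """Random attack [15]: a scaled Gaussian vector submitted in place of a
    genuine local update."""
    rng = rng or np.random.default_rng()
    return scale * rng.normal(size=dim)

theta_t = np.ones(4)        # current global model (toy values)
theta_0 = np.zeros(4)       # attacker-chosen initial/base model (assumption)
print(mpaf_fake_update(theta_t, theta_0))
print(random_fake_update(4, rng=np.random.default_rng(0)))
```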
Fig. 1: Framework of Security Protection in FL-based WTP.
III. THREAT MODEL TO FL-BASED WTP SYSTEMS
In this section, we present a novel model poisoning attack,
employing fake BSs for traffic injection into FL-based WTP
systems under a minimum-knowledge scenario.
A. Attacker’s Goal
The attacker’s primary goal in compromising the integrity
of the FL-based WTP system is to degrade the final global
model’s performance. This degradation directly impacts the
accuracy of real-time traffic predictions, which is a critical
aspect of network management and resource allocation. In
practical cellular systems, inaccurate traffic predictions can
lead to network congestion, poor quality of service, and ineffi-
cient use of resources, thereby causing substantial operational
challenges for network providers. This disruption not only
affects service providers but also has a cascading effect on
end-users who rely on consistent and reliable network services.
B. Attacker’s Capability
The attacker achieves this objective by introducing fake BSs
into the targeted FL-based WTP system, as shown in Fig. 1.
These fake BSs, which could be simple network devices,
mimic the traffic processing behaviors of benign BSs with
minimal effort and expense. Unlike the methods proposed
in [12] which involve compromising genuine BSs, the use of
fake BSs is far more feasible in real-world contexts. Creating
fake BSs with open-source projects or emulators [18], [30]–
[32] is a low-cost approach that can be executed without the
need for sophisticated hacking skills or deep access to the net-
work infrastructure. This approach is particularly viable given
the heightened security measures in modern networks, which
make compromising genuine BSs increasingly challenging.
C. Attacker’s Knowledge
The attacker’s minimal knowledge about the targeted FL-
based WTP system significantly increases the difficulty of
executing the attack. In many real-world systems, gaining
detailed insights into the central server’s aggregation rules or
acquiring information about benign BSs is highly challenging
due to stringent security protocols and encryption. Therefore,
an attack strategy that requires limited knowledge is not only
more realistic but also more likely to go undetected. The
fake BSs’ operation, which is limited to receiving the global
information and sending malicious updates, can be executed
with basic technical skills, further lowering the barrier to entry
for potential attackers. This aspect opens the door to a broader
range of network adversaries, including those with limited
technical expertise or computing resources.
D. Fake Traffic Injection Attack
The proposed Algorithm 1, referred to as the Fake Traffic
Injection (FTI), outlines a Byzantine model poisoning attack
strategy designed to manipulate the prediction accuracy of an
FL-based WTP system under the aforementioned assumptions.
Central to the FTI attack is an iterative process where each iteration involves a thorough examination of the current global model $\theta^t$ and the base model $\hat{\theta}$. For each fake BS $i$, a malicious local model $\theta_i^t$ is constructed by combining the global model $\theta^t$ and the base model $\hat{\theta}$ in a weighted manner (Line 5). Following the creation of $\theta_i^t$, the algorithm evaluates its divergence from the global model using the Euclidean norm (Line 7). The algorithm then checks for an increase in this distance relative to the prior measurement (Line 8). If the distance has increased, indicating that the malicious local model $\theta_i^t$ from some fake BS is diverging further from the global model $\theta^t$ at the central server, the value of $\eta$ is adjusted upwards. Conversely, if no increase in distance is observed, $\eta$ is adjusted downwards. The adjustment of $\eta$ is made in half-steps of its initial value (Lines 8-12). In other words, the value of $\eta$ indicates the severity of the poisoning attack, measuring its impact or intensity.
To this end, the algorithm guides the global model to align more closely with a predefined base model in each round. Specifically, during the $t$-th round, fake BSs calculate the direction of local model updates, determined by the difference between the current global model and the base model, denoted as $H = \hat{\theta} - \theta^t$. Moving in this direction indicates that the global model is becoming more similar to the base model. A simple approach to acquiring the local model of a fake BS is to multiply $H$ by a scaling factor $\eta$. However, this direct method produces sub-optimal attack performance. Suppose $n$ is the number of benign BSs, and the attacker wants to inject $m$ fake BSs into the network system. We instead propose to calculate $\theta_i^t$ for each fake BS $i \in [n+1, n+m]$ as
$$\theta_i^t = \eta\hat{\theta} + (1 - \eta)\theta^t. \qquad (3)$$
Fig. 2: Optimal value of $\eta$ over the communication rounds $R$ in Algorithm 1.
Algorithm 1 Fake Traffic Injection (FTI)
Require: Current global model $\theta^t$, base model $\hat{\theta}$, $n$ benign BSs, $m$ fake BSs, $\eta$
Ensure: Fake models $\theta_i^t$, $i \in [n+1, n+m]$
1: $step \leftarrow \eta$
2: $PreDist \leftarrow -1$
3: for $r = 1, 2, \ldots, R$ do
4:   for each fake BS $i$ do
5:     $\theta_i^t \leftarrow \eta\hat{\theta} - (\eta - 1)\theta^t$
6:   end for
7:   $Dist \leftarrow \|\theta_i^t - \theta^t\|_2$
8:   if $PreDist < Dist$ then
9:     $\eta \leftarrow \eta + step/2$
10:  else
11:    $\eta \leftarrow \eta - step/2$
12:  end if
13:  $step \leftarrow step/2$
14:  $PreDist \leftarrow Dist$
15: end for
16: return $\theta_i^t$, $i \in [n+1, n+m]$
In such cases, an attacker tends to choose a higher value for
η to ensure the sustained effectiveness of the attack, as shown
in Fig. 2 with an initial η of 10. This holds true even after the
server consolidates the manipulated local updates from fake
BSs with legitimate updates from benign BSs.
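A minimal Python sketch of Eq. (3) and the $\eta$-adjustment loop of Algorithm 1 is shown below; the choice of base model $\hat{\theta}$, the number of refinement rounds, and the toy dimensions are assumptions made for illustration only.

```python
import numpy as np

def fti_fake_models(theta_t, theta_hat, m, eta=10.0, rounds=10):
    """Fake Traffic Injection (Algorithm 1, sketch): craft identical fake local
    models theta_i^t = eta * theta_hat + (1 - eta) * theta_t for the m fake BSs,
    halving the adjustment step of eta each round depending on whether the fake
    model moves further from the current global model."""
    step = eta
    prev_dist = -1.0
    fake = None
    for _ in range(rounds):
        fake = eta * theta_hat + (1.0 - eta) * theta_t   # Eq. (3)
        dist = np.linalg.norm(fake - theta_t)
        if prev_dist < dist:
            eta += step / 2.0     # diverging further: push harder
        else:
            eta -= step / 2.0     # otherwise back off
        step /= 2.0
        prev_dist = dist
    return [fake.copy() for _ in range(m)], eta

theta_t = np.zeros(4)                  # current global model (toy values)
theta_hat = 0.5 * np.ones(4)           # attacker-chosen base model (assumption)
fakes, eta_final = fti_fake_models(theta_t, theta_hat, m=2)
print(eta_final, fakes[0])
```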
IV. GLOBAL-LOCAL INCONSISTENCY DETECTION
The defense against model poisoning attacks on the FL-
based WTP system relies on an aggregation protocol designed
to identify malicious BSs. This protocol, named the Global-
local Inconsistency Detection (GLID) method, is detailed in
Algorithm 2. In each global round $t$, GLID primarily scrutinizes the anomalies present in each dimension of the model parameters $\theta_i^t$, aiding in the identification of any potentially malicious entities, where $i \in [1, n+m]$ and $n+m$ is the total number of BSs in the system. Such a robust and versatile design allows the network to adapt to various operational contexts without requiring the intricate similarity assessments used in other existing works, such as FLTrust [22].
Specifically, the GLID approach enhances the detection of potential malicious activities within the network by employing percentile-based trimming on each dimension of the model parameters. To establish an effective percentile pair for identifying abnormalities, four statistical methods can be adopted: Standard Deviation (SD), Interquartile Range (IQR), Z-scores, and One-class Support Vector Machine (One-class SVM). Suppose the total count of dimensions of the model parameter is $D$; then for the default SD method, the percentile pair for each dimension $d$ can be calculated as follows:
$$\text{percentile pair}_d^t = \Big[\, g\big(\bar{\theta}_d^t - k \cdot \sigma_d^t\big),\; g\big(\bar{\theta}_d^t + k \cdot \sigma_d^t\big) \,\Big], \qquad (4)$$
where $\bar{\theta}_d^t$ is the mean of the $d$-th dimension across all models in the $t$-th global training round, $\sigma_d^t$ is the standard deviation of the $d$-th dimension, and $k$ is a predefined constant dictating the sensitivity of outlier detection. $g(\cdot)$ is the interpolation function that maps a standard-deviation bound to an estimated percentile, given by
$$g(x) = \frac{P(x) - 0.5}{n+m} \times 100, \qquad (5)$$
where $P(x)$ is the position of $x$ in the sorted dataset. We use $k = 3$ for general purposes. Given that different tasks may require varied percentile bounds, a precise estimation method is crucial for generalizing our defense strategy. The detailed percentile estimation methods can be found later in this section. In the FL-based WTP system, model parameters in the $d$-th dimension that fall outside these percentile limits are flagged as malicious, and their weights $\alpha_{d,i}^t$ are set to 0.
The other benign values in this dimension are aggregated using a weighted average rule, where the weights $\alpha_{d,i}^t$ are inversely proportional to the absolute deviation of each value $\theta_{d,i}^t$ from the mean $\bar{\theta}_d^t$, normalized by the standard deviation $\sigma_d^t$:
$$\alpha_{d,i}^t = \frac{\sigma_d^t}{\big|\theta_{d,i}^t - \bar{\theta}_d^t\big|}. \qquad (6)$$
These weights of the $d$-th dimension are then normalized and applied to aggregate each BS's local model $\theta_i^t$ into a global model $\theta^{t+1}$, which, viewed per dimension, can be represented as:
$$\theta_d^{t+1} = \frac{\sum_{i=1}^{n+m} \alpha_{d,i}^t \cdot \theta_{d,i}^t}{\sum_{i=1}^{n+m} \alpha_{d,i}^t}. \qquad (7)$$
Subsequently, the server broadcasts this aggregated global model parameter $\theta^{t+1}$ back to all BSs for synchronization.
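A compact Python sketch of this per-dimension procedure (Eqs. (4)-(7)) is given below; for simplicity it applies the $\pm k\sigma$ band directly rather than going through the percentile interpolation $g(\cdot)$ of Eq. (5), and the small constant added to the denominators is an implementation assumption to avoid division by zero.

```python
import numpy as np

def glid_aggregate(local_models, k=3.0, eps=1e-12):
    """GLID, SD variant (sketch of Algorithm 2): per dimension, flag values
    outside mean +/- k*std as malicious (weight 0) and average the rest with
    weights inversely proportional to their absolute deviation (Eq. (6))."""
    X = np.stack(local_models)               # shape (n+m, D)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    benign = np.abs(X - mean) <= k * std     # inside the +/- k*sigma band
    alpha = std / (np.abs(X - mean) + eps)   # Eq. (6); eps avoids div-by-zero
    alpha = np.where(benign, alpha, 0.0)
    return (alpha * X).sum(axis=0) / (alpha.sum(axis=0) + eps)   # Eq. (7)

# toy usage: ten benign models plus two fake BSs sending extreme parameters
models = [np.random.default_rng(i).normal(size=6) for i in range(10)]
models += [np.full(6, 100.0)] * 2
print(glid_aggregate(models))
```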
Algorithm 2 Global-local Inconsistency Detection (GLID)
Require: Local models $\theta_1^t, \theta_2^t, \ldots, \theta_{n+m}^t$, current global model $\theta^t$, $k$
Ensure: Aggregated global model $\theta^{t+1}$
1: for $d = 1, 2, \ldots, D$ do
2:   $\bar{\theta}_d^t \leftarrow \frac{1}{n+m}\sum_{i=1}^{n+m} \theta_{d,i}^t$
3:   $\sigma_d^t \leftarrow \sqrt{\frac{1}{n+m}\sum_{i=1}^{n+m} \big(\theta_{d,i}^t - \bar{\theta}_d^t\big)^2}$
4:   $\text{percentile}_d^t \leftarrow \big[\, g(\bar{\theta}_d^t - k \cdot \sigma_d^t),\; g(\bar{\theta}_d^t + k \cdot \sigma_d^t) \,\big]$
5:   Identify malicious BSs based on the percentile pair
6:   for each BS $i$ do
7:     if $\theta_{d,i}^t$ is benign then
8:       $\alpha_{d,i}^t \leftarrow \sigma_d^t \,/\, \big|\theta_{d,i}^t - \bar{\theta}_d^t\big|$
9:     else
10:      $\alpha_{d,i}^t \leftarrow 0$
11:    end if
12:  end for
13:  $\theta_d^{t+1} \leftarrow \sum_{i=1}^{n+m} \alpha_{d,i}^t \cdot \theta_{d,i}^t \,/\, \sum_{i=1}^{n+m} \alpha_{d,i}^t$
14: end for
15: $\theta^{t+1} \leftarrow \big[\theta_1^{t+1}, \theta_2^{t+1}, \ldots, \theta_D^{t+1}\big]$
16: return $\theta^{t+1}$

There are three additional percentile estimation strategies listed below. Based on the upper and lower bounds computed below, we can obtain a final percentile estimation decision to detect abnormal values in each dimension.
Interquartile Range (IQR): The IQR method calculates the range between the first and third quartiles (the 25th and 75th percentiles) of the data, identifying outliers based on this range. For each dimension $d$, the outlier bounds are:
$$\text{lower bound}_{d,\text{IQR}}^t = Q1_d^t - k_{\text{IQR}} \cdot \text{IQR}_d^t, \qquad (8)$$
$$\text{upper bound}_{d,\text{IQR}}^t = Q3_d^t + k_{\text{IQR}} \cdot \text{IQR}_d^t, \qquad (9)$$
where $Q1_d^t$ and $Q3_d^t$ are the first and third quartiles, and $k_{\text{IQR}}$ adjusts the sensitivity.
Z-scores: The Z-score method measures how many standard deviations a point is from the mean. For each dimension $d$, the normal-range bounds are:
$$\text{lower bound}_{d,\text{Z-score}}^t = g\big(\bar{\beta}_d^t - k_Z \cdot \sigma_d^t\big), \qquad (10)$$
$$\text{upper bound}_{d,\text{Z-score}}^t = g\big(\bar{\beta}_d^t + k_Z \cdot \sigma_d^t\big), \qquad (11)$$
where $k_Z$ is the number of standard deviations defining the normal range.
One-Class SVM: The One-Class SVM constructs a decision boundary for anomaly detection. The decision function for each dimension $d$ is:
$$f_d^t(\beta) = \text{sign}\!\left( \sum_{i=1}^{n_{SV}} \gamma_i \cdot K\big(\beta_{SV_i,d}^t, \beta\big) - \rho \right), \qquad (12)$$
where $\beta_{SV_i,d}^t$ are the support vectors, $\gamma_i$ are the Lagrange multipliers, $K(\cdot,\cdot)$ is the kernel function, and $\rho$ is the offset. A point $\beta$ is an outlier if $f_d^t(\beta) < 0$.
In essence, this defense mechanism is a strategic amalgama-
tion of direct statistical trimming and aggregation, targeting the
preservation of the global model’s integrity against poisoning
attacks. By accurately isolating and excluding malicious BSs
prior to the model aggregation process, it significantly diminishes
the likelihood of adversarial disruption in the FL framework.
Additionally, its capacity to accommodate various dimensions
and adapt to different inconsistency metrics and aggregation
protocols considerably extends its applicability across a broad
spectrum of distributed wireless network scenarios.
V. EVALUATIONS
In this section, we demonstrate the effectiveness of our FTI
poisoning attack and the GLID defense mechanism. Extensive
evaluation results are provided regarding the performance
metrics in multiple dimensions.
A. Experiment Setup
1) Datasets: We utilize the real-world datasets obtained
from Telecom Italia [33] to evaluate our proposed methods.
The wireless traffic data in Milan is segmented into 10,000
grid cells, with each cell served by a BS covering an area
of approximately 235 meters on each side. The Milan dataset contains three subsets, “Milan-Internet”, “Milan-SMS”, and “Milan-Calls”. These datasets capture different types of wireless usage patterns, and we mainly focus on
“Milan-Internet”. Such comprehensive data collection enables
an in-depth analysis of urban telecommunication behavior.
2) Baseline Schemes: We evaluate various state-of-the-art
model poisoning attacks as comparison points to our proposed
FTI attack. Furthermore, we employ these baseline poisoning
attacks to highlight the effectiveness of our defense strategy
GLID.
Trim attack [15]: It processes each key within a model
dictionary, computing and utilizing the extremes in a
designated dimension to determine a directed dimension,
where model parameters are selectively zeroed or retained
to influence the model behavior.
History attack [18]: It iterates over model parameters,
replacing current values with historically scaled ones,
effectively warping the model parameters using past data
to misguide the aggregation process.
Random attack [18]: It disrupts the model by replacing
parameters with random and normally distributed values,
scaled to maintain a semblance of legitimacy, thereby
injecting controlled chaos into the aggregation process.
MPAF [18]: It calculates a directional vector derived
from the difference between initial and current param-
eters. This vector is then used to adjust model values,
intentionally diverging from the model’s original trajec-
tory to introduce an adversarial bias. Following these
calculations, the fake BSs are injected into the system.
Zheng attack [12]: It inverts the direction of model
updates by incorporating the negative of previous global
updates. This inversion is refined through error maximiza-
tion, generating a poison that proves challenging to detect
due to its alignment with the model’s error landscape.
Besides, we consider several baseline defensive mechanisms
to demonstrate the effectiveness of our attack and defense.
Mean [20]: It calculates the arithmetic mean of updates
in each dimension, assuming equal trustworthiness among
all BSs. However, this method is susceptible to the
influence of extreme values.
Median [20]: It identifies the median value in each
dimension for each parameter across updates, which
inherently discards extreme contributions to enhance the
robustness against outliers.
Trim [20]: It discards a specified percentage of the
highest and lowest updates before computing the mean in
each dimension, thereby reducing the potential sway of
anomalous or malicious updates on the aggregate model.
Krum [21]: Each BS’s update is scored based on the sum
of Euclidean distances to other BSs’ updates. The global model is then set to the update from the BS with the minimum score.
FoolsGold [23]: It calculates a cosine similarity matrix
among all BSs and adjusts the weights for each BS based
on these similarities. The weighted gradients are then
aggregated to form a global model.
FABA [25]: It computes the Euclidean distance for each
BS’s model from the mean of all received models. By
identifying and excluding a specific percentage of the
most distant models, this process effectively filters out
potential outliers or malicious updates.
FLTrust [22]: Cosine similarity is calculated between the
server’s current model and each BS’s model to generate
trust scores. These scores are then used to weigh the BS’s
contribution to the final aggregated model.
FLAIR [24]: Each BS calculates “flip-scores” derived
from the changes in gradient directions and “suspicion-
scores” based on historical behavior. These scores are
used to adjust the weights assigned to each BS’s contri-
butions to the global model.
3) Experimental Settings and Performance Metrics: In our
experimental setup, we randomly selected 100 BSs to evaluate
the impact of poisoning attacks and the effectiveness of
defense mechanisms. By default, we report the results on
Milan-Internet dataset. Model training is configured with a
learning rate of 0.001 and a batch size of 64. For the FTI attack, we inject fake BSs amounting to 20% of the BSs in the system to mimic benign ones; for the other baseline attacks, we simulate a scenario where 20% of the BSs are compromised. Our proposed FTI
attack utilizes a parameter η = 10, and other attacks utilize
a scaling factor of 1000. For the Trim aggregation rule, we
discard 20% of the model parameters from all BSs. In our pro-
posed GLID defense, we employ the standard deviation (SD)
method as the default percentile estimation method. Through-
out the measurement campaign, we adopt Mean Absolute Error
(MAE) and Mean Squared Error (MSE) as the primary metrics
for performance evaluation. MSE quantifies the average of the
squared discrepancies between estimated and actual values,
while MAE calculates the average absolute differences across
predictions, disregarding their direction. The larger the MAE and MSE, the more effective the attack.
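For reference, the two metrics, including the capping at 100 used when reporting the tables below, can be computed as in the following sketch; the cap value follows the note accompanying Table I, and the toy inputs are illustrative.

```python
import numpy as np

def mae_mse(pred, truth, cap=100.0):
    """MAE and MSE as reported in the tables; values above `cap` are shown as
    the cap itself (per the note accompanying Table I)."""
    err = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return min(mae, cap), min(mse, cap)

print(mae_mse([0.9, 1.2, 0.7], [1.0, 1.0, 1.0]))
```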
B. Numerical Results
1) Performance of Proposed Methods: The FTI Attack,
in particular, exposes significant vulnerabilities in numerous
aggregation methods. It is observed that under our FTI Attack,
both Mean and Krum Rules are completely compromised, as
reflected by their MAE and MSE values reaching over 100.0
(values exceeding 100 are capped at 100). This result denotes
a total breakdown in their WTP functionality. The Median
Rule further emphasizes the severity of FTI Attack, with both
its MAE and MSE escalating from modest baseline figures to
100.0.

TABLE I: Performance Metrics for Milan-Internet Dataset

Aggregation Rule | Metric | NO    | Trim  | History | Random | MPAF  | Zheng | FTI
Mean             | MAE    | 0.211 | 100.0 | 100.0   | 100.0  | 100.0 | 0.698 | 100.0
Mean             | MSE    | 0.086 | 100.0 | 100.0   | 100.0  | 100.0 | 0.294 | 100.0
Median           | MAE    | 0.211 | 0.213 | 0.211   | 0.212  | 0.211 | 0.217 | 100.0
Median           | MSE    | 0.086 | 0.086 | 0.087   | 0.086  | 0.086 | 0.095 | 100.0
Trim             | MAE    | 0.211 | 0.212 | 0.212   | 0.211  | 0.212 | 0.239 | 100.0
Trim             | MSE    | 0.086 | 0.087 | 0.089   | 0.086  | 0.088 | 0.106 | 100.0
Krum             | MAE    | 0.221 | 0.225 | 100.0   | 0.225  | 100.0 | 0.225 | 100.0
Krum             | MSE    | 0.091 | 0.093 | 100.0   | 0.094  | 100.0 | 0.094 | 100.0
FoolsGold        | MAE    | 0.213 | 100.0 | 100.0   | 100.0  | 100.0 | 0.934 | 100.0
FoolsGold        | MSE    | 0.095 | 100.0 | 100.0   | 100.0  | 100.0 | 0.607 | 100.0
FABA             | MAE    | 0.219 | 100.0 | 100.0   | 100.0  | 100.0 | 0.623 | 100.0
FABA             | MSE    | 0.089 | 100.0 | 100.0   | 100.0  | 100.0 | 0.249 | 100.0
FLTrust          | MAE    | 0.242 | 0.234 | 100.0   | 0.240  | 100.0 | 3.182 | 100.0
FLTrust          | MSE    | 0.094 | 0.092 | 100.0   | 0.094  | 100.0 | 1.208 | 100.0
FLAIR            | MAE    | 0.216 | 0.228 | 100.0   | 100.0  | 100.0 | 0.250 | 100.0
FLAIR            | MSE    | 0.094 | 0.088 | 100.0   | 100.0  | 100.0 | 0.096 | 100.0
GLID             | MAE    | 0.211 | 0.211 | 0.212   | 0.211  | 0.211 | 0.212 | 72.383
GLID             | MSE    | 0.086 | 0.087 | 0.086   | 0.086  | 0.087 | 0.086 | 27.528

This sharp contrast highlights the FTI attack's reliable performance against these defenses, unlike other attacks such as the Trim attack against the Median rule, where the increase in MAE and MSE is relatively minor at 0.234 and 0.092, respectively. Additionally, the Trim
Rule, typically considered robust, exhibits a drastic increase
in MAE to over 100.0, a significant rise from its baseline
without any attack (termed as NO in Table I) of 0.211. This
surge underscores Trim Rule’s vulnerability to the FTI Attack,
marking a notable departure from its typical resilience. Similar
results can also be found in other aggregation rules under
FTI attack, such as FoolsGold, FABA, FLTrust, and FLAIR.
The FTI attack has the best overall performance against
the given defenses. The Zheng attack, however, presents a
distinct pattern of disruption. When subjected to this attack,
FLTrust, which typically exhibits lower error metrics, shows a
significant compromise, evidenced by the dramatic increase
in its MAE to 3.182 and MSE to 1.208. Such a tailored
nature of Zheng attack appears to target specific vulnerabilities
within FLTrust, which are not as apparent in other scenarios,
such as Trim Attack, where the rise in MAE and MSE for
FLTrust is relatively modest. Regarding the MPAF Attack,
most aggregation rules in the table do not show a convincing
defense, except for a few like Median, Trim, and GLID.
Next, if we turn our attention to the defender’s stand-
point, the proposed GLID aggregation method demonstrates
consistent performance stability across various attacks. Both
its MAE and MSE values remain close to their baseline
levels. Even in the case of our FTI attack, GLID manages
to keep errors below 100, which is 72.383 and 27.528 for
MAE and MSE respectively. This stability is particularly
noteworthy, especially when compared to other rules such
as FLAIR, which exhibit a significant deviation from their
non-attacked baselines under the same adversarial conditions.
GLID’s ability to sustain its performance in the face of diverse
and severe attacks underscores its potential as a resilient
aggregation methodology.
2) Evaluation on the Impact of η: The step size η in our
proposed FTI attack (see Algorithm 1) serves as a dynamic
scaling factor, and its initial value significantly influences the
model’s performance metrics. This impact is illustrated in
Fig. 3, where the Median aggregation rule is employed as the
baseline defense strategy. A notable observation is the corre-
lation between increasing values of η and the corresponding
rise in MAE and MSE. For example, at η = 1, the MAE
and MSE are relatively low, recorded at 0.501 and 0.208,
respectively. However, increasing η to higher values, such as
10 or 20, results in a dramatic surge that reaches the maximum
error rate. This increase suggests a significant compromise in
the model, surpassing the predefined threshold for effective
detection of the attack. The rationale behind this analysis
emphasizes the pivotal role of η in determining the strength
of a poisoning attack. An increased initial η tends to degrade
model performance, deviating significantly from its expected
operational state. Simultaneously, a higher η also raises the
risk of the attack’s perturbations being detected and eliminated
during the defense process.
Fig. 3: Impact of Values of η.
3) Evaluation on Percentage of Fake BSs: The degree
of compromise in BSs significantly influences the model’s
performance, as evidenced in Table II. By adopting Median
aggregation as the defensive approach, the model first exhibits
resilience at lower compromise levels, such as with only
5%–10% fake BSs in the scenario. However, a noticeable
decline in performance is observed as the percentage of
fake BSs increases to 20% or higher. This deterioration is
evident as MAE and MSE values reach 100.0 in all categories,
signaling a complete model failure. The underlying principle
behind this trend suggests the model’s limited tolerance to
malicious interference. More precisely, the network system
can withstand below 20% compromise without significant
performance degradation. However, beyond this threshold,
the model’s integrity is severely undermined, resulting in a
complete system breakdown. This observation highlights the
critical importance of implementing robust security measures
to prevent excessive compromise of BSs, ensuring the model’s
reliability and effectiveness.
TABLE II: Impact of Percentages of Fake BSs

Pct. | Metric | Trim  | Hist. | Rand. | MPAF  | Zhe.  | FTI
5%   | MAE    | 0.221 | 0.215 | 0.219 | 0.215 | 0.213 | 0.229
5%   | MSE    | 0.088 | 0.089 | 0.088 | 0.088 | 0.088 | 0.089
10%  | MAE    | 0.220 | 0.213 | 0.218 | 0.213 | 0.214 | 0.258
10%  | MSE    | 0.087 | 0.090 | 0.088 | 0.090 | 0.096 | 0.104
20%  | MAE    | 0.223 | 0.218 | 0.218 | 0.216 | 0.269 | 100.0
20%  | MSE    | 0.087 | 0.096 | 0.087 | 0.092 | 0.136 | 100.0
30%  | MAE    | 100.0 | 100.0 | 100.0 | 100.0 | 5.990 | 100.0
30%  | MSE    | 100.0 | 100.0 | 6.141 | 100.0 | 1.154 | 100.0
40%  | MAE    | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
40%  | MSE    | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
4) Evaluations on Percentile Estimation Methods: The dynamic trimming of an adaptive number of model parameters through percentile estimation, as adopted in GLID, is key to an effective defense strategy against various model poisoning attacks. In the comparative analysis of estimation methods shown in Table III, Standard Deviation (SD) estimation emerges as the best technique, exhibiting marked consistency and robustness across the spectrum of attacks. This is evidenced by the consistently low MAE and MSE values for SD across these attacks, at 0.219 and 0.087, respectively. In contrast, other methods
have varying degrees of inconsistency and vulnerability. For
instance, One-class SVM exhibits pronounced variability, with
MAE and MSE values reaching the maximal error level of
over 100.0 under Trim, History, and MPAF attacks. Such a
disparity in performance, particularly the stably lower error
rates of SD compared to the significant fluctuations in other
estimation methods, positions SD as a reliable and effective
percentile estimation technique in GLID.
TABLE III: Impact of Percentile Estimation Methods

Method   | Metric | NO    | Trim  | Hist. | Rand. | MPAF  | Zhe.  | FTI
SD       | MAE    | 0.219 | 0.219 | 0.219 | 0.218 | 0.219 | 0.219 | 72.38
SD       | MSE    | 0.087 | 0.087 | 0.087 | 0.087 | 0.087 | 0.087 | 27.52
IQR      | MAE    | 0.219 | 0.220 | 0.220 | 0.219 | 0.210 | 0.218 | 100.0
IQR      | MSE    | 0.087 | 0.087 | 0.087 | 0.087 | 0.087 | 0.088 | 100.0
Z-scores | MAE    | 0.219 | 0.219 | 0.219 | 0.219 | 0.220 | 1.047 | 100.0
Z-scores | MSE    | 0.087 | 0.087 | 0.088 | 0.087 | 0.087 | 0.401 | 100.0
SVM      | MAE    | 0.219 | 100.0 | 100.0 | 0.220 | 100.0 | 0.713 | 100.0
SVM      | MSE    | 0.087 | 100.0 | 100.0 | 0.087 | 100.0 | 0.275 | 100.0

5) Evaluations on the Impact of BS Density: With the percentage of fake BSs fixed at 20%, Figs. 4(a)-(d) compare the Median and GLID rules under varying BS densities in the network scenario. Interestingly, the total number of BSs does not significantly impact the performance of any attack or defense mechanism, especially for our FTI and GLID. Under Median aggregation, FTI consistently shows maximal error (MAE and MSE over 100.0) across different BS densities, indicating a failure of the defense. This consistent pattern of stable performance across varying numbers of participants in the FL-based WTP system suggests that the total number of BSs
does not substantially influence the effectiveness of the attack
and defense strategies.
TABLE IV: Impact of Different Percentile Pairs

Pair     | Metric | Trim  | Hist. | Rand. | MPAF  | Zhe.  | FTI
[10, 70] | MAE    | 100.0 | 100.0 | 100.0 | 100.0 | 0.710 | 100.0
[10, 70] | MSE    | 100.0 | 100.0 | 100.0 | 100.0 | 0.279 | 100.0
[20, 70] | MAE    | 0.215 | 0.214 | 0.218 | 0.217 | 0.216 | 100.0
[20, 70] | MSE    | 0.083 | 0.085 | 0.084 | 0.082 | 0.086 | 100.0
[30, 70] | MAE    | 0.218 | 0.219 | 0.220 | 0.215 | 0.217 | 72.382
[30, 70] | MSE    | 0.090 | 0.088 | 0.089 | 0.086 | 0.088 | 27.246
[10, 80] | MAE    | 100.0 | 100.0 | 100.0 | 100.0 | 0.711 | 100.0
[10, 80] | MSE    | 100.0 | 100.0 | 100.0 | 100.0 | 0.275 | 100.0
[20, 80] | MAE    | 0.217 | 0.215 | 0.218 | 0.214 | 0.216 | 72.168
[20, 80] | MSE    | 0.085 | 0.083 | 0.084 | 0.082 | 0.086 | 27.147
[30, 80] | MAE    | 0.220 | 0.218 | 0.219 | 0.216 | 0.217 | 71.298
[30, 80] | MSE    | 0.088 | 0.089 | 0.086 | 0.088 | 0.090 | 27.022
[10, 90] | MAE    | 100.0 | 100.0 | 100.0 | 100.0 | 0.712 | 100.0
[10, 90] | MSE    | 100.0 | 100.0 | 100.0 | 100.0 | 0.274 | 100.0
[20, 90] | MAE    | 0.215 | 0.217 | 0.218 | 0.216 | 0.214 | 100.0
[20, 90] | MSE    | 0.088 | 0.086 | 0.085 | 0.089 | 0.086 | 100.0
[30, 90] | MAE    | 0.217 | 0.218 | 0.219 | 0.216 | 0.215 | 100.0
[30, 90] | MSE    | 0.086 | 0.088 | 0.089 | 0.085 | 0.088 | 100.0
6) Evaluations on the Percentile Range of GLID: Table IV
presents an evaluation of performance across a variety of per-
centile pairs used in the proposed GLID method on different
attack methods. The configuration of the percentile pair guides
the GLID method in identifying and eliminating outliers. For
example, specifying a percentile pair of [10, 70] means that
values below the 10th percentile and above the 70th percentile
are trimmed away, focusing the analysis on the data within
these bounds. It is observed that, when the percentile pair
is set at [10, 70], most methods, except for Zheng attack,
register a metric over 100.0, suggesting the models are fully
attacked. Similarly, the percentile pair of [10, 90] yields a
value over 100 for all methods except Zheng attack. The Zheng
attack consistently records low metrics across all settings, such
as 0.710, and 0.279 for the pair [10, 70], raising questions
about its attack efficacy. On the other hand, FTI shows varied
performance; it achieves over 100.0 for most percentile pairs
like [10, 70] and [20, 90] but drops to 72.382 and 27.246 for
the pair [30, 70]. These results underscore the importance of
fine-tuning the percentile pair parameters in the GLID method.
Proper parameter selection can effectively trim outliers without
significantly impacting overall network performance.
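As a small illustration of how such a percentile pair acts on a single model dimension, the sketch below trims everything outside the configured bounds; the data and the pair [10, 70] are illustrative assumptions.

```python
import numpy as np

def trim_to_percentile_pair(values, pair=(10, 70)):
    """Keep only the values of one model dimension that fall inside the
    configured percentile pair; everything outside the bounds is trimmed."""
    lo, hi = np.percentile(values, pair)
    kept = values[(values >= lo) & (values <= hi)]
    return kept, (lo, hi)

vals = np.concatenate([np.random.default_rng(0).normal(size=50), [30.0, -25.0]])
kept, bounds = trim_to_percentile_pair(vals, pair=(10, 70))
print(bounds, len(kept))
```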
VI. CONCLUSION
In this study, we introduced a novel approach to perform
model poisoning attacks on WTP through fake traffic injec-
tion. Operating under the assumption that real-world BSs are
challenging to attack, we inject fake BS traffic distribution
with minimum knowledge that disseminates malicious model
parameters. Furthermore, we presented an innovative global-
local inconsistency detection mechanism, designed to safe-
guard FL-based WTP systems. It employs an adaptive trim-
ming strategy, relying on percentile estimations that preserve
accurate model parameters while effectively removing outliers.
Extensive evaluations demonstrate the effectiveness of our
attack and defense, outperforming existing baselines.
Fig. 4: The impact of BS density on the performance of the Median and GLID methods with respect to MAE and MSE: (a) Median AR w.r.t. MAE, (b) Median AR w.r.t. MSE, (c) GLID AR w.r.t. MAE, (d) GLID AR w.r.t. MSE.
ACKNOWLEDGMENT
This research was supported by the National Science Foun-
dation through Award CNS–2312138.
REFERENCES
[1] L. U. Khan, W. Saad, Z. Han, E. Hossain, and C. S. Hong, “Federated
learning for internet of things: Recent advances, taxonomy, and open
challenges, in IEEE Communications Surveys & Tutorials, 2021.
[2] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, and
H. V. Poor, “Federated learning for internet of things: A comprehensive
survey, in IEEE Communications Surveys & Tutorials, 2021.
[3] H. G. Abreha, M. Hayajneh, and M. A. Serhani, “Federated learning in
edge computing: a systematic survey, in Sensors, 2022.
[4] J. Xu, B. S. Glicksberg, C. Su, P. Walker, J. Bian, and F. Wang,
“Federated learning for healthcare informatics,” in Journal of Healthcare
Informatics Research, 2021.
[5] D. C. Nguyen, Q.-V. Pham, P. N. Pathirana, M. Ding, A. Seneviratne,
Z. Lin, O. Dobre, and W.-J. Hwang, “Federated learning for smart
healthcare: A survey, in ACM Computing Surveys, 2022.
[6] J. Wen, M. Sheng, J. Li, and K. Huang, Assisting intelligent wireless
networks with traffic prediction: Exploring and exploiting predictive
causality in wireless traffic, in IEEE Communications Magazine, 2020.
[7] L. Nie, D. Jiang, S. Yu, and H. Song, “Network traffic prediction based
on deep belief network in wireless mesh backbone networks, in IEEE
Wireless Communications and Networking Conference, 2017.
[8] S. P. Sone, J. J. Lehtomäki, and Z. Khan, “Wireless traffic usage
forecasting using real enterprise network data: Analysis and methods,
in IEEE Open Journal of the Communications Society, 2020.
[9] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui, “Spatio-temporal
wireless traffic prediction with recurrent neural network, in IEEE
Wireless Communications Letters, 2018.
[10] Y. Xu, W. Xu, F. Yin, J. Lin, and S. Cui, “High-accuracy wireless traffic
prediction: A gp-based machine learning approach, in IEEE Global
Communications Conference, 2017.
[11] C. Zhang, S. Dang, B. Shihada, and M.-S. Alouini, “Dual attention-
based federated learning for wireless traffic prediction, in INFOCOM,
2021.
[12] T. Zheng and B. Li, “Poisoning attacks on deep learning based wireless
traffic prediction, in INFOCOM, 2022.
[13] M. Joshi and T. H. Hadi, A review of network traffic analysis and
prediction techniques, arXiv preprint arXiv:1507.05722, 2015.
[14] J. Fan, D. Mu, and Y. Liu, “Research on network traffic prediction model
based on neural network, in International Conference on Information
Systems and Computer Aided Education, 2019.
[15] M. Fang, X. Cao, J. Jia, and N. Gong, “Local model poisoning attacks
to byzantine-robust federated learning, in USENIX security symposium,
2020.
[16] V. Shejwalkar and A. Houmansadr, “Manipulating the byzantine: Opti-
mizing model poisoning attacks and defenses for federated learning, in
NDSS, 2021.
[17] C. Xie, O. Koyejo, and I. Gupta, “Fall of empires: Breaking byzantine-
tolerant sgd by inner product manipulation, in UAI, 2020.
[18] X. Cao and N. Z. Gong, “MPAF: Model poisoning attacks to federated
learning based on fake clients, in CVPR Workshops, 2022.
[19] D. Yin, Y. Chen, R. Kannan, and P. Bartlett, “Byzantine-robust dis-
tributed learning: Towards optimal statistical rates, in ICML, 2018.
[20] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,
“Communication-efficient learning of deep networks from decentralized
data, in AISTATS, 2017.
[21] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer, “Ma-
chine learning with adversaries: Byzantine tolerant gradient descent, in
NeurIPS, 2017.
[22] X. Cao, M. Fang, J. Liu, and N. Z. Gong, “Fltrust: Byzantine-robust
federated learning via trust bootstrapping, in NDSS, 2021.
[23] C. Fung, C. J. Yoon, and I. Beschastnikh, “Mitigating sybils in federated
learning poisoning, arXiv preprint arXiv:1808.04866, 2018.
[24] A. Sharma, W. Chen, J. Zhao, Q. Qiu, S. Bagchi, and S. Chaterji,
“Flair: Defense against model poisoning attack in federated learning,
in ASIACCS, 2023.
[25] Q. Xia, Z. Tao, and Q. Li, “Defending against byzantine attacks in
quantum federated learning, in International Conference on Mobility,
Sensing and Networking, 2021.
[26] M. Fang, J. Liu, N. Z. Gong, and E. S. Bentley, Aflguard: Byzantine-
robust asynchronous federated learning, in ACSAC, 2022.
[27] Y. Xu, M. Yin, M. Fang, and N. Z. Gong, “Robust federated learning
mitigates client-side training data distribution inference attacks, in The
Web Conference, 2024.
[28] V. Tolpegin, S. Truex, M. E. Gursoy, and L. Liu, “Data poisoning attacks
against federated learning systems, in ESORICS, 2020.
[29] M. Yin, Y. Xu, M. Fang, and N. Z. Gong, “Poisoning federated
recommender systems with fake users, in The Web Conference, 2024.
[30] “Android-x86: Run Android on your PC,” https://www.android-x86.org/.
[31] “NoxPlayer, the perfect Android emulator to play mobile games on PC,” https://www.bignox.com/.
[32] “The world’s first cloud-based Android gaming platform,” https://www.bluestacks.com/.
[33] G. Barlacchi, M. D. Nadai, R. Larcher, A. Casella, C. Chitic, G. Torrisi, F. Antonelli, A. Vespignani, A. Pentland, and B. Lepri, “A multi-source dataset of urban life in the city of Milan and the province of Trentino,” in Scientific Data, 2015.