Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge

Takayuki Nishio, Ryo Yonetani

I Introduction

A variety of modern AI products are powered by cutting-edge machine learning (ML) technologies, which range from face detection and language translation installed on smartphones to voice recognition and speech synthesis used in virtual assistants such as Amazon Alexa and Google Home. Therefore, the development of such AI products typically necessitates large-scale data, which are essential for training high-performance ML models such as a deep neural network. Arguably, a massive amount of IoT devices, smartphones, and autonomous vehicles with high-resolution sensors, all of which are connected to a high-speed network, can serve as promising data collection infrastructure in the near future (e.g., ). Researchers in the field of communication and mobile computing have started to interact with data science communities in the last decade and have proposed mobile edge computing (MEC) frameworks that can be used for large-scale data collection and processing .

Typically, MEC frameworks assume that all data resources are transferred from data collection clients (IoT devices, smartphones, and connected vehicles) to computational infrastructure (high-performance servers) through cellular networks to perform their tasks . However, this assumption is not always acceptable when private human activity data are collected, such as life-logging videos, a history of e-mail conversations, and recorded phone calls. On one hand, such private activity data would be a key factor for improving the quality of AI products that support our daily life, which include not only AI-related apps on smartphones and virtual assistants but also AI-powered smart cities. On the other hand, uploading these data directly to computational infrastructure is problematic as the data could be eavesdropped by malicious users in a network to compromise client’s privacy.

To address this fundamental privacy concern, one work has recently been presented by the ML community: Federated Learning (FL) . As illustrated in Fig. 1, FL iteratively asks random clients to 1) download parameters of a trainable model from a certain server, 2) update the model with their own data, and 3) upload the new model parameters to the server, while asking the server to 4) aggregate multiple client updates to further improve the model. In exchange for requiring data collection clients to install a certain level of computational resources (e.g., a laptop equipped with reasonable GPUs, autonomous vehicles with moderate computational capacities ), the FL protocol allows the clients to keep their data secure in their local storage.

In this work, we focus on the implementation of the abovementioned FL protocol in practical MEC frameworks. We believe that our work will influence the future development platform of various AI products that require a large amount of private activity data to train ML models. In particular, we consider the problem of running FL in a cellular network used by heterogeneous mobile devices with different data resources, computational capabilities, and wireless channel conditions. Unfortunately, a direct application of existing FL protocols without any consideration of such heterogeneous client properties will make the overall training process inefficient. For instance, when some clients are with limited computational resources, they will require longer time to update models. Moreover, if the clients are under poor wireless channel conditions, that will result in longer update time. All such problems will delay the subsequent server’s aggregation step necessary to continue the training process.

Our main contribution is a new protocol referred to as FedCS, which can run FL efficiently while an operator of MEC frameworks actively manages the resources of heterogeneous clients. Specifically, FedCS sets a certain deadline for clients to download, update, and upload ML models in the FL protocol. Then, the MEC operator selects clients such that the server can aggregate as many client updates as possible in limited time frames, which makes the overall training process efficient and reduces a required time for training ML models. This is technically formulated by a client-selection problem that determines which clients participate in the training process and when each client has to complete the process while considering the computation and communication resource constraints imposed by the client, which we can solve in a greedy fashion.

We evaluate our approach with a realistic large-scale training of deep neural networks for object classification on a simulated MEC environment, where client data were generated using publicly-available large-scale image datasets. Our experimental results reveal that the FedCS can complete its training process in a significantly shorter time compared to the original FL protocol.

Resource optimization for MEC frameworks is one of the common topics in the field of communication and mobile computing. Recent work includes the joint optimization of heterogeneous data, computation, and communication resources . However, these approaches are designed to minimize computation times and/or energy consumptions for general computation tasks, which is considerably different from our work that aims to maximize the efficiency of training ML models. Moreover, as we stated earlier, our work assumes a different scenario where each mobile client has data and computational resources to preserve client data privacy when performing ML tasks. These differences motivate us to propose new tailored MEC protocols and algorithms.

Federated Learning is an emerging technique in the ML community. Following pioneering work , recent studies have specifically focused on how to enhance the security of FL protocols . However, little work has examined how to run FL efficiently with a practical network configuration. One exception is , which explored model compression techniques for efficient communications while sacrificing model performances. The other one is , which optimized hyper-parameters of FL (i.e.the number of epochs in each update phase and the number of total epochs) in a resource constrained MEC environment. However, these techniques do not particularly consider heterogeneous computation and communications and/or data resources of clients. The additional use of model compression techniques could help us improve the overall efficiency of our protocol, which is however beyond the scope of this study.

II Federated Learning

In this section, we briefly introduce the original FL framework presented in . Then, we identify the problems that affect FL communications when they are performed by heterogeneous clients in resource-constrained cellular networks.

Consider a scenario where a large population of mobile clients individually have data that they want to maintain as secret, such as laptops with personal collections of photos and autonomous vehicles with cityscape images captured by cameras. If all these distributed data are accessible, one can obtain a high-performance ML model that has been trained on an extremely large data collection. However, it is not desirable for clients to disclose their data owing to privacy concerns.

Federated Learning is a decentralized learning protocol that aims to resolve the abovementioned problem. As shown in Protocol 1, FL asks a certain server and K×C\lceil K\times C\rceil random clients (where KK is the number of all clients, CC is the fraction of clients considered in each round, and \lceil\cdot\rceil is the ceiling function, ) to communicate the parameters of a global model that they are going to train (Distribution and Update and Upload steps). The protocol requires the selected clients to compute an update of the model using their data (Update and Upload step), while asking the server to aggregate multiple updates from the clients to make the model better (Aggregation step). The advantage of this protocol is that clients do not have to upload private data; instead, they secure the data in their local storage. The only technical requirement is that each client must have a certain level of computational resources because Update and Upload consists of multiple iterations of the forward propagation and backpropagation of the model (i.e., we focus exclusively on training deep neural networks in a supervised manner; see for more details).

II-B Heterogeneous Client Problem in FL

Protocol 1 can experience major problems while training ML models in a practical cellular network, which are mainly due to the lack of consideration of the heterogeneous data sizes, computational capacities, and channel conditions of each client. For example, if a client has more data compared to others, the client will require longer time to update models unless it has a better computational resource. This will delay the subsequent communication for uploading new model parameters. Moreover, upload time will be longer if a client is under a severely poor channel condition.

All such problems about heterogeneous client resources will become bottlenecks in the FL training process; the server can complete the Aggregation step only after it receives all client updates. One may set a deadline for random clients to complete the Update and Upload step and ignore any update submitted after the deadline. However, this straightforward approach will lead to the inefficient use of network bandwidths and waste the resources of delayed clients.

III FedCS: Federated Learning with Client Selection

We propose a new FL protocol, FedCS, which works efficiently with clients with heterogeneous resources. In the following sections, we first summarize several assumptions of our proposal and then present FedCS in more detail.

As illustrated in Fig. 1, we consider that a certain MEC platform, which is located in a wireless network and consists of a server and a base station (BS), manages the behaviors of the server and clients in the FL protocol. We will particularly focus in this work on leveraging the wireless networks when they are stable and not congested, such as at midnight or in the early morning time, mainly because ML models to be trained and communicated are typically large. Nevertheless, each process has to be carried out under certain limited bandwidths, particularly when there are multiple ML tasks to be performed via FL. Specifically, we assume that the amount of resource blocks (RBs; the smallest unit of bandwidth resources defined in LTE ) available for each process is limited and managed by the MEC operator. In addition, if multiple clients upload model parameters simultaneously, the throughput for each client decreases accordingly.

We assume that the modulation and coding scheme of radio communications for each client are determined appropriately while considering its channel state so that packet-loss rate is negligible. This leads to different throughput for each client to upload model parameters although the amount of allocated RBs is constant. The throughput for broadcast and multicast transmission by the BS is assumed to be limited by that of the client with the worst channel conditions. Nevertheless, we also assume the channel state and throughput of each client to be stable as mentioned above.

III-B FedCS Protocol

We present FedCS in Protocol 2 (see also the diagram in Fig. 2 for how each step is performed in order). The key idea of our protocol is that instead of selecting random clients in the original Client Selection step of Protocol 1, we propose the following two-step client selection scheme. First, the new Resource Request step asks random clients to inform the MEC operator of their resource information such as wireless channel states, computational capacities (e.g., if they can spare CPUs or GPUs for updating models), and the size of data resources relevant to the current training task (e.g., if the server is going to train a ‘dog-vs-cat’ classifier, the number of images containing dogs or cats). Then, the operator refers to this information in the subsequent Client Selection step to estimate the time required for the Distribution and Scheduled Update and Upload steps and to determine which clients go to these steps (the specific algorithms for scheduling clients are explained later). In the Distribution step, a global model is distributed to the selected clients via multicast from the BS because it is bandwidth effective for transmitting the same content (i.e., the global model) to client populations. In the Scheduled Update and Upload step, the selected clients update the model in parallel and upload new parameters to the server using the RBs allocated by the MEC operator. The server aggregates client updates following Protocol 1 and measures model performances with certain validation data. Until the model achieves a certain desired performance (e.g., a classification accuracy of 90%) or the final deadline arrives, all steps but Initialization are iterated for multiple rounds.

III-C Algorithm for Client Selection Step

Our goal in the Client Selection step is to allow the server to aggregate as many client updates as possible within a specified deadline. This criterion is based on the result from that a larger fraction of clients used in each round saves the time required for global models to achieve a desired performance. Based on the criterion, the MEC operator selects clients who can complete the Distribution and Scheduled Update and Upload steps within a deadline. At the same time, the operator schedules when the RBs for model uploads are allocated to the selected clients to prevent congestion in the limited bandwidths a cellular network could impose. Note that we assume that selected clients start and complete their upload processes one by one for simplicity. Nevertheless, even if multiple clients can upload in parallel by sharing RBs, the time required for transmitting all models is the same as that for the sequential upload.

IV Performance Evaluation

As a proof-of-concept scenario to show how our protocol works effectively, we simulated a MEC environment and conducted experiments of realistic ML tasks using publicly-available large-scale datasets.

We simulated a MEC environment implemented on the cellular network of an urban microcell consisting of an edge server, a BS, and K=1000K=1000 clients, on a single workstation with GPUs. The BS and server were co-located at the center of the cell with a radius of 2 km, and the clients were uniformly distributed in the cell.

IV-B Experimental Setup of ML Tasks

With the simulated MEC environment described above, we adopted two realistic object classification tasks using publicly-available large-scale image datasets. One was CIFAR-10, a classic object classification dataset consisting of 50,000 training images and 10,000 testing images with 10 object classeshttps://www.cs.toronto.edu/~kriz/cifar.html. This dataset has been used commonly in FL studies . The other was Fashion MNIST , which comprised 60,000 training images and 10,000 testing images of 10 different fashion products such as T-shirts and bags. This dataset would give a more beneficial but sensitive setting because the ability to automatically recognize fashion products would be useful for various applications such as e-commerce, but the products that people are interested in are highly-private information. Figure 3 shows sample images in the datasets.

For both tasks, the training dataset was distributed to K=1000K=1000 clients as follows: First, we randomly determined the number of image data owned by each client in a range of 100 to 1,000. Then, by following the experimental setup used in , we split the training dataset into the clients in two ways: IID setting where each client just sampled the specified number of images from the whole training dataset randomly, and Non-IID setting where each client sampled images randomly but from different subsets (2 out of the 10 categories chosen randomly) of the training data, standing for a more challenging but realistic setting. In each round of the FL protocols, we set C=0.1C=0.1 based on to select a maximum of K×C=100K\times C=100 clients. Finally, the testing dataset was used only for measuring classification performances.

IV-C Global Models and Their Updates

We implemented a standard convolutional neural network as a global model for both tasks. Specifically, our model consisted of six 3×33\times 3 convolution layers (32, 32, 64, 64, 128, 128 channels, each of which was activated by ReLU and batch normalized, and every two of which were followed by 2×22\times 2 max pooling) followed by three fully-connected layers (382 and 192 units with ReLU activation and another 10 units activated by soft-max). This resulted in approximately 4.6 million model parameters (Dm=D_{\rm m}= 18.3 megabytes in 32-bit float) for CIFAR-10 and 3.6 million parameters (Dm=D_{\rm m}= 14.4 megabytes in 32-bit float) for Fashion-MNIST. Deeper models such as residual networks would provide higher classification performances. However, these models were not the focus of our experiments.

When updating global models, we selected the following hyperparameters according to : 5050 for mini-batch size, 55 for the number of epochs in each round, 0.250.25 for the initial learning rate of stochastic gradient descent updates, and 0.99 for learning rate decay. The computation capability of each client was simply modeled by how many data samples it could process in a second to update a global model, which could be fluctuated due to other computation load on the client. We determined the mean capability of each client randomly from a range of 10 to 100, which are used the value for Client Selection. As a result, each update time, tkUDt_{k}^{\rm UD}, used in Client Selection varied from 5 to 500 seconds averagely. In Scheduled Update and Upload, the computation capability is determined by the Gaussian distribution with the standard deviation given by the r%r\% of the mean capability value like our throughput model. We considered this range to be reasonable because our workstation required 5 seconds for a single update with a single GPU; mobile devices with a weaker computation resource could require a 10 or 100 times longer update time. Finally, we empirically set TroundT_{\rm round} to 3 minutes and TfinalT_{\rm final} to 400 minutes.

IV-D Evaluation Details

We compared FedCS with the FL protocol modified slightly to be limited with deadline TroundT_{\rm round} for each round. We referred to this protocol as FedLim. In this baseline, the clients selected randomly by a MEC operator updated the models and sequentially uploaded their new parameters to a server until the deadline. The updates completed after the deadline were just discarded and not aggregated. FedCS and FedLim were evaluated based on the following metrics:

Time of arrival at a desired accuracy (ToA@xx): We observed the changes in the accuracy on testing datasets over time and identified when the accuracy reached a certain level for the first time (i.e., the earlier the better). Specifically, we report ToA@0.5 (i.e., 50% accuracy) and ToA@0.75 for CIFAR-10 and ToA@0.5 and ToA@0.85 for Fashion-MNIST with the IID setting, and ToA@0.35 and ToA@0.5 for CIFAR-10 and ToA@0.5 and ToA@0.7 for Fashion-MNIST with the Non-IID setting.

Accuracy after the final deadline (Accuracy)): We also measured the accuracy on testing datasets just after the final deadline (Tfinal=360T_{\rm final}=360 minutes since the beginning).

IV-E Results

Effect of TroundT_{\rm round}: To obtain a deeper understanding of how our approach works, we investigated ToA and the changes in the classification accuracies FedCS on Fashion MNIST for different values of deadline TroundT_{\rm round} while maintaining TfinalT_{\rm final} fixed, as shown in Table I and Fig. 4. We observed that TroundT_{\rm round} must be selected to be neither too long nor too short. While longer deadlines (e.g., 10 minutes) with FedCS involved numerous clients in each round, their performances were extremely limited owing to the smaller number of Aggregation steps. On the contrary, a short deadline, such as 1 minute, limited the number of clients accessible in each round, which also degraded the classification accuracies. A better method of selecting TroundT_{\rm round} is to change it dynamically to involve a sufficient number of clients in each round. This is left for future work.

Non-IID setting: The results with the Non-IID setting are shown in Table II and Figure 5. FedCS still works well while the performance of FedLim could not achieve the accuracy of even 50% and 70% on CIFAR-10 and Fashion-MNIST. However, similar to the previous work , the overall performances were limited with the Non-IID setting (i.e., lower averages and higher variances in the classification accuracies) compared to those with the IID setting. As indicated in the results of , to better cope with non-IID data we need to increase either number of the selected clients for each round or that of rounds, both of which were however difficult due to the time constraints TroundT_{\rm round} and TfinalT_{\rm final} we imposed in the experiments. One potential extension that can alleviate the non-IID problem is the additional use of model compression techniques , which could increase the number of clients that can be selected within the same constraint of TroundT_{\rm round}.

V Conclusion

We have presented a new protocol, FedCS, which aimed to perform FL efficiently in a MEC framework with heterogeneous clients. Our experimental results have revealed that FedCS constantly provided high-performance ML models in a significantly shorter time compared to the state-of-the-art protocol by incorporating more clients into its training process, regardless of the choices of datasets, the ways of splitting data (i.e., IID or Non-IID), and the uncertainty of throughput and computation capability. As we limit our global model to sufficiently simple deep neural networks, other possible extension of this study is to train a more sophisticated model with dozens of millions of parameters using very large-scale data. Another interesting direction for future work is to work on more dynamic scenarios where the average amount of the resources as well as the required times for updating and uploading can fluctuate dynamically.

Acknowledgment

This work was supported in part by JST ACT-I Grant Number JPMJPR17UK and JPMJPR16UT and KDDI Foundation.

References