In this section, we describe the architecture of our framework, including the system model, which explains how the crowdsourcing process works, and the threat model.
5.1. System Model
CrowdSFL consists of two parts: the preliminaries before crowdsourcing and the implementation of the crowd computing process. The working steps of CrowdSFL are as follows:
Step 1: REQ releases the tasks.
Before FL begins, the REQ needs to publish its basic requirements to the CSP, such as the functions the solution must provide, the features the data should have, whether the model to be trained is a CNN, an RNN, or another architecture, and all hyperparameters of the model structure. This information is crucial for workers to perform FL: incorrect parameters or an incorrect model structure will impair or even destroy the efficiency of the model aggregator. In a task with a strict accuracy requirement, this means more training rounds. Since each data interaction is implemented on the blockchain, each worker participating in FL must store all interaction data on the blockchain locally.
In addition to the basic requirements, the REQ also needs to release the reward pool, which motivates workers to participate in a computing task. A participation deadline is also necessary: before the deadline arrives, workers can voluntarily choose whether to participate in a task released by a REQ on the CSP. Finally, the maximum waiting time for a round of training needs to be set. In each iteration, the CSP collects and processes the newly uploaded models on the blockchain once the maximum waiting time has elapsed. All workers must complete training and upload their new models within this time; otherwise, they are judged to have sabotaged the round.
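The items the REQ publishes in Step 1 can be pictured as a single record. The following is a minimal sketch, not the framework's actual data format: the class `TaskSpec`, its field names, the helper `late_workers`, and the use of seconds as the time unit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical record of what the REQ publishes to the CSP in Step 1."""
    description: str        # functions the solution must provide
    data_features: str      # features the training data should have
    model_family: str       # e.g. "CNN" or "RNN"
    hyperparameters: dict   # hyperparameters of the model structure
    reward_pool: float      # total reward offered to workers
    join_deadline: float    # participation deadline (assumed: seconds)
    max_wait: float         # maximum waiting time for one round (seconds)


def late_workers(spec, round_start, upload_times, participants):
    """Return the workers who missed the upload window of a round.

    upload_times maps a worker id to the time of its upload. A worker is
    reported if it has no upload or uploaded after round_start + max_wait.
    """
    cutoff = round_start + spec.max_wait
    return sorted(
        w for w in participants
        if w not in upload_times or upload_times[w] > cutoff
    )
```

Here `late_workers` only identifies the workers that would be judged as sabotaging the round; what happens to them afterwards is left to the deposit and reputation rules.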
Step 2: Workers voluntarily participate in tasks.
Workers who want to earn a reward choose to participate in a task voluntarily, which is a common practice in crowdsourcing systems [15]. A malicious worker may submit a wrong model in the hope of being rewarded, which leads to model poisoning, and in FL model poisoning can seriously affect the learning process [21]. Workers therefore need to make a deposit before participating, which can effectively thwart many attacks such as DDoS, Sybil and “false-reporting” attacks.
Workers who have confirmed their participation in the task start interacting through the blockchain. They first upload their metadata to the private chain in the form of transactions. The metadata contains the worker’s account number, reputation, dataset size, and the hash value of the dataset. The account number is the unique identifier of the worker. Reputation is a value representing the worker’s participation history: the higher the reputation, the more tasks the worker has successfully completed; in a sense, he is a veteran.
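The metadata record described above can be sketched as follows. This is only an illustration: the names `WorkerMetadata` and `make_metadata` are hypothetical, and SHA-256 is an assumed choice, since the text does not state which hash function is applied to the dataset.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerMetadata:
    account: str        # unique identifier of the worker
    reputation: float   # reflects tasks completed successfully so far
    dataset_size: int   # number of samples the worker holds
    dataset_hash: str   # hex digest that commits to the dataset


def make_metadata(account, reputation, dataset_bytes, dataset_size):
    """Build the metadata a worker would upload as a transaction."""
    # SHA-256 is an assumption; any collision-resistant hash would do here.
    digest = hashlib.sha256(dataset_bytes).hexdigest()
    return WorkerMetadata(account, reputation, dataset_size, digest)
```

Only the digest goes into the record, not the dataset itself, which keeps the on-chain payload small.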
The deposit submitted by a worker is a function of the worker’s reputation and dataset size. The higher the reputation, the smaller the deposit he needs to submit. The larger the dataset, the larger the deposit, because in Algorithm 1 the result of the model aggregator is biased toward workers with a large amount of data. When a worker’s dataset accounts for almost the entire data volume, say 99%, the result of model aggregation is approximately equal to the model submitted by this worker, so he needs to submit a larger deposit to ensure that he is a positive worker. Similarly, the reward a worker finally receives is a function of his dataset size. A larger dataset means that the worker has more resources and has contributed more to the construction of the classifier. From the perspective of “data is wealth”, a worker with more data deserves more rewards after the training process.
The configuration of deposits and rewards varies across scenarios, depending on the nature of the crowdsourcing task, and is not the focus of this paper.
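Since the concrete deposit function is left open, the sketch below shows just one possible shape that satisfies the two stated properties: the deposit decreases with reputation and increases with the worker's share of the data. The function `required_deposit`, its formula, and the parameters `base` and `rep_scale` are purely hypothetical.

```python
def required_deposit(reputation, dataset_size, total_size,
                     base=10.0, rep_scale=1.0):
    """Hypothetical deposit rule: lower for reputable workers,
    higher for workers holding a larger share of the data."""
    if total_size <= 0 or not 0 <= dataset_size <= total_size:
        raise ValueError("dataset_size must lie in [0, total_size]")
    if reputation < 0:
        raise ValueError("reputation must be non-negative")
    share = dataset_size / total_size                    # in [0, 1]
    rep_discount = 1.0 / (1.0 + rep_scale * reputation)  # in (0, 1]
    return base * (1.0 + share) * rep_discount
```

A worker holding 99% of the data pays close to twice the base deposit before the reputation discount, which mirrors the concern that such a worker nearly determines the aggregated model.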
Step 3: Contract deployment.
Through the code in the deployed contract, workers, the REQ, and the CSP can upload and query data. In Ethereum, storage operations on the blockchain are quite expensive, and each account node saves a copy of the data locally, so participants will not upload oversized data such as a dataset of HD pictures. Moreover, Ethereum allows information on the blockchain to be queried without any transaction fee, since the information has already been synchronized locally by the blockchain transaction mechanism. In the following operations, all data interactions are essentially repeated calls to this contract.
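The upload and query pattern can be mimicked with a small in-memory stand-in. This is not Ethereum code and not the contract used by the framework; `ToyContract`, its fee model (`fee_per_byte`), and the size limit (`max_payload`) are assumptions that only illustrate that writes are charged and size-limited while reads are free.

```python
class ToyContract:
    """In-memory stand-in for the deployed contract (illustrative only)."""

    def __init__(self, fee_per_byte=1, max_payload=1024):
        self._log = []                  # append-only list of transactions
        self.fee_per_byte = fee_per_byte
        self.max_payload = max_payload

    def upload(self, sender, kind, payload: bytes):
        """Store a payload and return the (hypothetical) fee charged."""
        if len(payload) > self.max_payload:
            raise ValueError("payload too large for on-chain storage")
        self._log.append({"sender": sender, "kind": kind, "payload": payload})
        return len(payload) * self.fee_per_byte

    def query(self, kind=None, since=0):
        """Read transactions from position `since` on; no fee is charged."""
        return [tx for tx in self._log[since:]
                if kind is None or tx["kind"] == kind]
```

In this sketch every later step (model initialization, ciphertext upload, accuracy upload) would be one `upload` call with a different `kind`, and every read would be a `query`.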
Step 4: Model initialization.
The REQ randomly generates an initial model and uploads it by calling the contract.
Step 5: Querying, Training and performing Enc.
Workers query the model from the blockchain and train it locally with their own data. During the t-th round, each worker k that participates in training normally computes its local model and then generates a ciphertext of it under the encryption key. Finally, the ciphertext is uploaded to the blockchain before the maximum waiting time set for the round expires.
Step 6: CSP performs ReEnc and Blinding.
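When the maximum waiting time arrives, the CSP queries the new transactions on the blockchain and obtains all worker ciphertexts from them. It first performs ReEnc and Blinding to get the blinded ciphertexts. To achieve a fair scoring mechanism, the original ciphertexts and the blinded ciphertexts are backed up locally on the CSP. Finally, each blinded ciphertext and its unique ID are uploaded to the blockchain.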
Step 7: REQ performs Dec, judges model accuracy.
The REQ queries the blinded ciphertexts and performs Dec. Since the blinded ciphertexts have different ciphertext segments from the ciphertexts uploaded by the workers, a malicious REQ who is biased against some workers cannot figure out which worker a blinded ciphertext belongs to; the only thing the REQ can do is perform the operations in the protocol.
When the REQ has obtained the plaintexts of all models, he needs to evaluate their accuracy on his own dataset and upload the accuracy corresponding to each blinded ciphertext to the blockchain.
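To illustrate why a blinded ciphertext can look different from the worker's ciphertext and still decrypt to the same plaintext, the sketch below uses textbook ElGamal re-randomization. It is a stand-in chosen for illustration, not the paper's Enc, ReEnc, Blinding, and Dec algorithms, and the ReEnc step is not modeled at all. The parameters `P` and `G` and all function names are assumptions, and the group is far too small for real use.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime; toy-sized, NOT secure
G = 3            # fixed base, assumed adequate for a toy example


def keygen():
    """Return (secret key, public key) with public = G^secret mod P."""
    x = secrets.randbelow(P - 2) + 1          # secret in [1, P-2]
    return x, pow(G, x, P)


def encrypt(pub, m, r=None):
    """Textbook ElGamal encryption of an integer m with 1 <= m < P."""
    r = secrets.randbelow(P - 2) + 1 if r is None else r
    return pow(G, r, P), (m * pow(pub, r, P)) % P


def blind(pub, ct, s=None):
    """Re-randomize a ciphertext by multiplying in an encryption of 1.

    Both ciphertext segments change, but the plaintext does not.
    """
    s = secrets.randbelow(P - 2) + 1 if s is None else s
    c1, c2 = ct
    return (c1 * pow(G, s, P)) % P, (c2 * pow(pub, s, P)) % P


def decrypt(secret, ct):
    """Recover m = c2 / c1^secret mod P."""
    c1, c2 = ct
    shared = pow(c1, secret, P)
    return (c2 * pow(shared, P - 2, P)) % P   # Fermat inverse of shared
```

For example, with `sk, pk = keygen()` and `ct = encrypt(pk, 42)`, the pair returned by `blind(pk, ct)` differs from `ct` in both segments, while `decrypt(sk, ...)` yields 42 for either one.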
Step 8: CSP scores and selects available models.
Now the CSP has the accuracy of each worker’s model as well as each worker’s metadata, and it scores the models according to the established rules. Finally, it publishes the scores and uploads to the blockchain the ID of the blinded ciphertext corresponding to each available model.
Step 9: REQ performs aggregation.
The REQ selects the plaintext models corresponding to these IDs to participate in the aggregation. The result is then uploaded to the blockchain.
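Algorithm 1 is not reproduced in this section, so the following is only a sketch of a size-weighted average in the style of federated averaging. It is consistent with the earlier remark that the aggregate is biased toward workers with more data, but the function `aggregate` and the flat-list model representation are assumptions.

```python
def aggregate(models, sizes):
    """Size-weighted mean of model parameter vectors.

    models: list of equally long lists of floats (one per selected worker)
    sizes:  dataset size of the worker that produced each model
    """
    if not models or len(models) != len(sizes):
        raise ValueError("need one dataset size per model")
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(models[0])
    return [
        sum(n * m[i] for m, n in zip(models, sizes)) / total
        for i in range(dim)
    ]
```

With sizes 99 and 1, the result is almost identical to the first worker's model, which is the situation the deposit rule is meant to guard against.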
Step 10: Return to Step 5 and enter the next round.
One round of execution ends here, and further rounds of training are performed until the stop condition is reached.
Figure 3 illustrates all the working steps of the CrowdSFL.
5.2. Evaluation Mechanism
In Step 7 and Step 8, the REQ and the CSP work together to obtain the list of models that will participate in the aggregation. This corresponds to solution evaluation in crowdsourcing; the difference is that the evaluation does not choose the single best solution but scores the solutions one by one. The REQ hopes that only models with high accuracy participate in the aggregation, so that the final solution is sufficiently robust over the whole data space. Workers hope to obtain scores that are as high as possible, so that they participate in aggregations and finally receive more rewards. Therefore, the interaction between the REQ and the workers in the evaluation process can be considered from the perspective of game theory.
Wu et al. [38] proposed an evaluation framework for software crowdsourcing. They modeled and quantified software quality, costs, diversity of solutions, and other attributes in software crowdsourcing, and proposed participation-output analysis, submissions-award analysis, and other analysis methods to evaluate it. Inspired by this work, we propose a scoring evaluation mechanism based on min-max analysis.
When CrowdSFL is working, the CSP holds each worker’s dataset size and reputation value, together with the accuracy of the model the worker submits in each round of training. In traditional machine learning, a large, high-quality dataset produces a better model. If, in a given round, a worker with a large amount of data submits a model with low accuracy, it can be inferred that the worker is “negative”. In our scoring evaluation mechanism, the CSP ranks the workers in each round by dataset size and by model accuracy, giving the ranks X and Y, respectively. The difference between the two is the score.
A larger X means that the worker has more data, so the REQ expects the model he submits to have higher accuracy. A larger Y means that the worker submitted a better model, so the worker expects a higher score for the solution.
The score represents the result of the game between the REQ and the worker in a round. The smaller the score, the more dominant the REQ; the larger the score, the more dominant the worker. Since the reputation value in a crowdsourcing system represents the potential positivity of a worker, the scoring evaluation function is revised to take reputation into account, where W is the weight of the reputation value in the scoring evaluation mechanism and can be set as needed for different crowdsourcing tasks.
The score not only helps the CSP select the models that participate in aggregation, but also provides a reference for the reward distribution mechanism of crowdsourcing. The main goal of this paper is the security and efficiency of crowdsourcing. Reward distribution must consider more real-world factors, and previous related work offers few general reward distribution mechanisms, so we do not discuss it further in this paper.
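The rank-difference scoring of this subsection can be sketched as follows. Two points are assumptions rather than statements of the paper's formulas: the sign convention Y − X, inferred from the remark that a larger score favors the worker, and the additive reputation term `w * reputation`, which is a hypothetical stand-in for the revised scoring function with weight W. The function names are likewise illustrative.

```python
def ascending_ranks(values):
    """Rank 1 = smallest value; tied values share the lowest rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def scores(dataset_sizes, accuracies, reputations=None, w=0.0):
    """Score each worker as (accuracy rank Y) - (dataset-size rank X).

    The optional term w * reputation is a hypothetical way to include
    reputation; the paper's actual revised function may differ.
    """
    if len(dataset_sizes) != len(accuracies):
        raise ValueError("need one accuracy per dataset size")
    x = ascending_ranks(dataset_sizes)   # X: larger rank = more data
    y = ascending_ranks(accuracies)      # Y: larger rank = better model
    reps = reputations or [0.0] * len(x)
    return [yi - xi + w * r for xi, yi, r in zip(x, y, reps)]
```

In this sketch a worker with the most data and the least accurate model receives the lowest score, matching the description of a “negative” worker.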
5.3. Threat Model
We define privacy as covering all kinds of information submitted by participants to the blockchain, and require that:
All ciphertexts cannot be decrypted.
Participants cannot find regularities in the information on the blockchain. For example, the REQ should not be able to trace a blinded gradient ciphertext back to the worker whose gradient plaintext it contains.
Potentially malicious participants may behave in different ways to maximize their own profits [15]. We first define the threat model, which describes the potential threats and malicious behaviors as follows:
Malicious REQ. The REQ should release the reward pool before the task is performed. Since the purpose of a malicious REQ is to obtain a useful solution while reducing his own expenditure, he may exploit some features of crowdsourcing to achieve this. For example, a malicious REQ may recruit fake workers to participate in crowdsourcing and misrepresent the solutions submitted by these workers as high-quality. After crowdsourcing, these recruited workers would in theory receive more rewards, so the malicious REQ can recover some assets from the reward pool.
Malicious Workers. Malicious workers attempt to obtain rewards without expending sufficient effort, which is a free-riding attack. Workers who scored low during the evaluation may try to change their scores in other ways, such as denying or even forking the existing blockchain. In addition, malicious workers can directly copy other workers’ solutions and claim them as their own, thereby reaping without sowing.
Malicious Miners. Malicious miners attempt to disrupt the normal execution of programs on the blockchain by forking chains or collaborating with malicious participants, thereby achieving their attack goals.
We formalize the entire process of crowdsourcing as completely secure and fair, and give the following definition:
Definition. (Completely Fair and Collusion Resistance) Each worker receives the reward due for the amount of work he contributed, and the REQ obtains a relatively objective and usable solution, if all participants are semi-honest. That is, each participant follows the protocol, but may try to infer information about others from the public information on the blockchain. The ElGamal encryption system guarantees the confidentiality of the information. Assume that a malicious REQ uses brute-force guessing to find regularities in the information. When the number of gradient ciphertexts on the blockchain is x, the REQ has to guess the plaintext corresponding to every one of the x ciphertexts, and the probability of guessing all of them correctly shrinks as x grows. The REQ must guess again in every round, which further raises the computational cost of brute-force guessing.
Malicious participants violate the rules to achieve their own purposes, but the CSP can easily identify them afterwards through auditing, since all interaction data is recorded on the blockchain. For a malicious participant, the CSP confiscates his deposit and reduces his reputation. Since the CSP is generally a credible institution, we assume it to be credible and honest.
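As a rough illustration of the brute-force argument, assume that guessing means choosing, uniformly at random, a one-to-one assignment between the x blinded ciphertexts and the x workers, and that the guesses in different rounds are independent. The symbol T for the number of rounds is introduced here for illustration only. Under these assumptions, which may differ from the exact expressions intended in the definition, the quantities would be:

```latex
\Pr[\text{all } x \text{ ciphertexts matched in one round}] = \frac{1}{x!},
\qquad
\Pr[\text{all matched in each of } T \text{ rounds}] = \left(\frac{1}{x!}\right)^{T},
\qquad
\text{cost of exhaustive search per round} = O(x!).
```

Even for x = 10 workers this gives 10! = 3,628,800 candidate assignments per round.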