CIKM 2022 AnalytiCup Competition
Federated Hetero-Task Learning
Introduction
We propose a new task, federated hetero-task learning, which meets the requirements of a wide range of real-world scenarios while also promoting interdisciplinary research that combines Federated Learning with Multi-task Learning, Model Pre-training, and AutoML. We have prepared an easy-to-use toolkit based on FederatedScope [1,2] to help participants explore this challenging yet manageable task from several perspectives, and we have also set up a fair testbed and several forms of awards for participants.
We are running this competition on the Tianchi competition platform. Please visit this link.
Awards
- Prizes:
- 1st place: 5000 USD
- 2nd place: 3000 USD
- 3rd place: 1500 USD
- 4th ~ 10th place: 500 USD each
- Certification:
- 1st ~ 20th: Certification with rank
- Others: Certification of participation
Schedule
- July 15, 2022: Competition launch. The sample dataset is released and the simulation environment opens. Participants can register, join the discussion forum, upload their code for training, and get feedback from the leaderboard.
- Sept 1, 2022: Registration ends.
- Sept 11, 2022: Submission ends.
- Sept 12, 2022: Checking phase starts. The code of the top 30 teams will automatically be migrated into the checking phase.
- Sept 18, 2022: Notification of checking results.
- Sept 21, 2022: Announcement of the CIKM 2022 AnalytiCup Winner.
- Oct 17, 2022: Beginning of CIKM 2022.
All deadlines are at 11:59 PM UTC on the corresponding day. The organizers reserve the right to update the contest timeline if necessary.
Problem description
In federated hetero-task learning, the learning goals of different clients are different. In practice, this setting is often observed due to personalized requirements of different clients, or the difficulty in aligning goals among multiple clients. Specifically, the problem is defined as follows:
- Input: Several clients, each associated with its own dataset (feature spaces can differ) and its own learning objective.
- Output: A learned model for each client.
- Evaluation metric: The averaged improvement ratio (against the provided “isolated training” baseline) across all the clients.
More details can be found on this page.
We provide the dataset for this competition via Tianchi. At the same time, we encourage you to see the exemplary federated hetero-task learning datasets defined in B-FHTL [3], which illustrates how such datasets are designed and constructed.
References
[1] FederatedScope: A Flexible Federated Learning Platform for Heterogeneity. arXiv preprint 2022. pdf
[2] FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning. KDD 2022. pdf
[3] A Benchmark for Federated Hetero-Task Learning. arXiv preprint 2022. pdf
Step-by-step Guidance for CIKM22 AnalytiCup Competition
Step 1. Install FederatedScope
Download FederatedScope and switch to the stable branch cikm22competition for this competition:
git clone https://github.com/alibaba/FederatedScope.git
cd FederatedScope
git checkout cikm22competition
Step 2. Set up the running environment

You can use Docker or Conda to set up your running environment.

- Use Docker
  - Check your CUDA version via the command:

nvidia-smi

  - Build the corresponding Docker image, start a container, and install FederatedScope inside it.
    - If your CUDA version >= 11:

docker build -f environment/docker_files/federatedscope-torch1.10-application.Dockerfile -t alibaba/federatedscope:base-env-torch1.10 .
docker run --gpus device=all --rm -it --name "fedscope" -v $(pwd):$(pwd) -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash
pip install -e .

    - If your CUDA version >= 10 but < 11:

docker build -f environment/docker_files/federatedscope-torch1.8-application.Dockerfile -t alibaba/federatedscope:base-env-torch1.8 .
docker run --gpus device=all --rm -it --name "fedscope" -v $(pwd):$(pwd) -w $(pwd) alibaba/federatedscope:base-env-torch1.8 /bin/bash
pip install -e .

- Use Conda
  - We recommend using a new virtual environment to install FederatedScope:

conda create -n fs python=3.9
conda activate fs

  - If you are using torch, please install it in advance (see torch-get-started). For example, if your CUDA version is 11.3, execute the following command:

conda install -y pytorch=1.10.1 torchvision=0.11.2 torchaudio=0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

  - Install the required packages as follows:

python setup.py install

  - Finally, install the packages required by graph tasks as follows:

bash environment/extra_dependencies_torch1.10-application.sh
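As an optional sanity check (our suggestion, not part of the official instructions), you can verify from Python that the installation succeeded and the GPU is visible:

import torch
# Both imports succeeding means the environment is usable.
import federatedscope  # noqa: F401

print('torch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())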
Step 3. Download contest data

- Click sign up in Tianchi, and register an account if you don't have one.
- Log in and download the contest data.
- Suppose the contest data is placed at ${YOUR_OWN_PATH}/CIKM22Competition.zip; unzip it as follows:

mkdir data
unzip -d ./data/ ${YOUR_OWN_PATH}/CIKM22Competition.zip

- Then you can access the contest data under the directory FederatedScope/data/CIKM22Competition. The contest data is organized by the index of the client as CIKM22Competition/${client_id} (counting from 1), and the data of each client contains train, validation, and test splits. You can load them via torch.load as follows:

import torch
# The train split of client 1
train_data_client1 = torch.load('./data/CIKM22Competition/1/train.pt')
# Check the first sample
print(train_data_client1[0])
# Check the label of the first sample
print(train_data_client1[0].y)
# Check the index of the first sample as ${sample_id}
print(train_data_client1[0].data_index)
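If you want a quick overview of all clients, a minimal sketch along the following lines may help. Note the assumptions: 13 clients (matching the $b_i$ table below), and validation/test split files named val.pt and test.pt alongside the train.pt shown above; double-check the actual file names after unzipping.

import torch

NUM_CLIENTS = 13
SPLITS = ('train', 'val', 'test')  # assumption: file names mirror train.pt

# Load every split of every client into a nested dict.
data = {
    client_id: {
        split: torch.load(f'./data/CIKM22Competition/{client_id}/{split}.pt')
        for split in SPLITS
    }
    for client_id in range(1, NUM_CLIENTS + 1)
}

# Print the split sizes per client.
for client_id, splits in data.items():
    sizes = ', '.join(f'{name}: {len(samples)}' for name, samples in splits.items())
    print(f'Client {client_id} -> {sizes}')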
Step 4. Execute baselines on the contest data
Within FederatedScope, we build in two baselines for the contest data, “isolated training” and “FedAvg”. Suppose you have successfully built the running environment and downloaded the contest data.

- Run the following command to execute the isolated training baseline:

python federatedscope/main.py --cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup_per_client.yaml

- Run the following command to execute the FedAvg solution:

python federatedscope/main.py --cfg federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup_per_client.yaml

where the argument --cfg xxx.yaml specifies the global configuration, and --client_cfg xxx.yaml specifies the client-wise hyperparameters.
Step 5. Save and submit the prediction results
Submission format
- As stated in the introduction of the CIKM 2022 AnalytiCup Competition, participants are required to submit the prediction results for all clients within one csv file. Within the file, each line records one prediction and is identified by ${client_id} and ${sample_id}. The ${client_id} counts from 1, and ${sample_id} should be consistent with the contest data (you can access it via the attribute data_index).
- The classification and multi-dimensional regression tasks follow different line formats (a minimal writing sketch follows this list):
  - For classification tasks, each line follows (${category_id} counts from 0):
    ${client_id},${sample_id},${category_id}
  - For an N-dimensional regression task, each line follows:
    ${client_id},${sample_id},${prediction_1st_dimension},…,${prediction_N-th_dimension}
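To make the format concrete, here is a minimal sketch that writes such a csv file; the predictions dictionary and its contents are hypothetical stand-ins for your model outputs:

import csv

# Hypothetical container: maps client_id -> list of (sample_id, prediction),
# where a prediction is an int category_id for classification clients
# or a list of floats for regression clients.
predictions = {
    1: [(0, 1), (1, 0)],            # classification client
    9: [(0, [0.12]), (1, [0.34])],  # 1-dimensional regression client
}

with open('prediction.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for client_id, samples in predictions.items():
        for sample_id, pred in samples:
            if isinstance(pred, list):  # regression: one column per dimension
                writer.writerow([client_id, sample_id] + pred)
            else:                       # classification: a single category_id
                writer.writerow([client_id, sample_id, pred])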
- Saving prediction results
  - By FederatedScope
    - The cikm22competition branch of FederatedScope supports saving prediction results at the end of training. You can refer to the code in federatedscope/gfl/trainer/graphtrainer.py and federatedscope/core/trainers/torch_trainer.py.
    - The prediction results will be saved in a csv file named prediction.csv. For the convenience of users who conduct multiple experiments (e.g., for HPO), the prediction.csv of each experimental run with a specific configuration will be placed in that experiment's output directory (specified by outdir), which will automatically be appended with a timestamp suffix if the specified directory already exists.
    - The training log will report the directory of the prediction results. Taking FedAvg as an example, at the end of training FederatedScope reports the path of the prediction results in its log; you can then refer to that directory for the prediction results.
- By Yourself
  - Also, you can save the prediction results by yourself. Within a test routine in FederatedScope, the outputs of the model are all saved in the context as follows, and you can access them via ctx.test_y_prob:

def _hook_on_fit_start_init(self, ctx):
    ...
    setattr(ctx, "{}_y_true".format(ctx.cur_data_split), [])
    setattr(ctx, "{}_y_prob".format(ctx.cur_data_split), [])
    ...

def _hook_on_batch_forward(self, ctx):
    batch = ctx.data_batch.to(ctx.device)
    pred = ctx.model(batch)
    # TODO: deal with the type of data within the dataloader or dataset
    if 'regression' in ctx.cfg.model.task.lower():
        label = batch.y
    else:
        label = batch.y.squeeze(-1).long()
    if len(label.size()) == 0:
        label = label.unsqueeze(0)
    ctx.loss_batch = ctx.criterion(pred, label)
    ctx.batch_size = len(label)
    ctx.y_true = label
    ctx.y_prob = pred

    # record the index of the ${MODE} samples
    if hasattr(ctx.data_batch, 'data_index'):
        setattr(
            ctx,
            f'{ctx.cur_data_split}_y_inds',
            ctx.get(f'{ctx.cur_data_split}_y_inds') +
            ctx.data_batch.data_index.detach().cpu().numpy().tolist())
    ...

def _hook_on_batch_end(self, ctx):
    ...
    # cache label for evaluation
    ctx.get("{}_y_true".format(ctx.cur_data_split)).append(
        ctx.y_true.detach().cpu().numpy())
    ctx.get("{}_y_prob".format(ctx.cur_data_split)).append(
        ctx.y_prob.detach().cpu().numpy())
    ...

def _hook_on_fit_end(self, ctx):
    """Evaluate metrics."""
    setattr(
        ctx, "{}_y_true".format(ctx.cur_data_split),
        np.concatenate(ctx.get("{}_y_true".format(ctx.cur_data_split))))
    setattr(
        ctx, "{}_y_prob".format(ctx.cur_data_split),
        np.concatenate(ctx.get("{}_y_prob".format(ctx.cur_data_split))))
    ...
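Building on the hooks above, a hedged sketch of the final dump for one classification client could look like this; save_test_predictions is a hypothetical helper (not part of FederatedScope), and it relies on the test_y_prob / test_y_inds bookkeeping shown above being populated after _hook_on_fit_end:

import numpy as np

def save_test_predictions(ctx, client_id, path='my_prediction.csv'):
    # For classification clients: pick the argmax category per test sample.
    y_pred = np.argmax(ctx.test_y_prob, axis=-1)
    # Append one submission line per sample, keyed by client_id and sample_id.
    with open(path, 'a') as f:
        for sample_id, pred in zip(ctx.test_y_inds, y_pred):
            f.write(f'{client_id},{sample_id},{pred}\n')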
Submit prediction results
Finally, you can submit your prediction results and get your score in Tianchi.

Get the evaluation feedback

In Tianchi, the submitted prediction results will be evaluated by the metric of “averaged improvement ratio”, which is calculated as:
\[\text{averaged improvement ratio}=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{b_i-m_i}{b_i}\times 100\%\right),\]

where $n$ is the total number of clients; when client $i$ owns a classification task, $m_i$ and $b_i$ are the error rates of the developed method and the “isolated training” baseline, respectively; when client $i$ has a regression task, $m_i$ and $b_i$ correspond to their mean squared errors (MSE).
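To make the metric concrete, here is a minimal sketch that computes it; baseline holds the $b_i$ values (three of which are taken from the table below), while the method scores $m_i$ are made up purely for illustration:

# Averaged improvement ratio: error rate for classification clients,
# MSE for regression clients; positive means better than the baseline.
def averaged_improvement_ratio(baseline, method):
    return sum((b - m) / b * 100 for b, m in zip(baseline, method)) / len(baseline)

# Hypothetical method scores for clients 1, 2, and 9:
baseline = [0.263789, 0.289617, 0.059199]
method = [0.25, 0.30, 0.05]
print(f'{averaged_improvement_ratio(baseline, method):.2f}%')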
To ensure a fair competition, we will use the following $b_i$ for the 13 clients to calculate the averaged improvement ratio.
| Client ID | Task type | Metric | $b_i$ |
| --- | --- | --- | --- |
| 1 | cls | Error rate | 0.263789 |
| 2 | cls | Error rate | 0.289617 |
| 3 | cls | Error rate | 0.355404 |
| 4 | cls | Error rate | 0.176471 |
| 5 | cls | Error rate | 0.396825 |
| 6 | cls | Error rate | 0.261580 |
| 7 | cls | Error rate | 0.302378 |
| 8 | cls | Error rate | 0.211538 |
| 9 | reg | MSE | 0.059199 |
| 10 | reg | MSE | 0.007083 |
| 11 | reg | MSE | 0.734011 |
| 12 | reg | MSE | 1.361326 |
| 13 | reg | MSE | 0.004389 |
After submitting the prediction results, you can check the evaluation results in Tianchi. The leaderboard is updated at 00:00, 04:00, 08:00, 12:00, and 16:00 UTC.
Advanced Guidance for Participants
About FederatedScope
FederatedScope is a well-modularized federated learning platform. Participants are welcome and encouraged to develop their own federated solutions with FederatedScope. The documentation will help you better understand the organization of FederatedScope and how it works.
Develop your own solution
You can develop your own algorithm based on FederatedScope as follows:
- If you want to improve the performance of the baseline, you can adjust the global hyperparameters (specified by --cfg) or the hyperparameters for each client (specified by --client_cfg). Taking FedAvg as an example:
  - The global configuration federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup.yaml specifies the global settings, such as the total number of training rounds (federate.total_round_num), the dataset (data.type and data.root), and the evaluation metric (eval.metrics):

use_gpu: True
device: 0
early_stop:
  patience: 20
  improve_indicator_mode: mean
  the_smaller_the_better: False
federate:
  mode: 'standalone'
  make_global_eval: False
  total_round_num: 100
  share_local_model: False
data:
  root: data/
  type: cikmcup
model:
  type: gin
  hidden: 64
personalization:
  local_param: ['encoder_atom', 'encoder', 'clf']
train:
  batch_or_epoch: epoch
  local_update_steps: 1
  optimizer:
    weight_decay: 0.0005
    type: SGD
trainer:
  type: graphminibatch_trainer
eval:
  freq: 5
  metrics: ['imp_ratio']
  report: ['avg']
  best_res_update_round_wise_key: val_imp_ratio
  count_flops: False
  base: 0.

  - The client configuration federatedscope/gfl/baseline/fedavg_gin_minibatch_on_cikmcup_per_client.yaml allows you to set different hyperparameters for different clients by overriding the global configuration, and the key client_${client_id} specifies the client. Here we set the loss function and the model according to the distributed tasks, and use eval.base to provide the baseline performance for the metric imp_ratio:

client_1:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.1
  eval:
    base: 0.263789
client_2:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.01
  eval:
    base: 0.289617
client_3:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.001
  eval:
    base: 0.355404
client_4:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.01
  eval:
    base: 0.176471
client_5:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.0001
  eval:
    base: 0.396825
client_6:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.0005
  eval:
    base: 0.261580
client_7:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.01
  eval:
    base: 0.302378
client_8:
  model:
    out_channels: 2
    task: graphClassification
  criterion:
    type: CrossEntropyLoss
  train:
    optimizer:
      lr: 0.05
  eval:
    base: 0.211538
client_9:
  model:
    out_channels: 1
    task: graphRegression
  criterion:
    type: MSELoss
  train:
    optimizer:
      lr: 0.1
  eval:
    base: 0.059199
client_10:
  model:
    out_channels: 10
    task: graphRegression
  criterion:
    type: MSELoss
  train:
    optimizer:
      lr: 0.05
  grad:
    grad_clip: 1.0
  eval:
    base: 0.007083
client_11:
  model:
    out_channels: 1
    task: graphRegression
  criterion:
    type: MSELoss
  train:
    optimizer:
      lr: 0.05
  eval:
    base: 0.734011
client_12:
  model:
    out_channels: 1
    task: graphRegression
  criterion:
    type: MSELoss
  train:
    optimizer:
      lr: 0.01
  eval:
    base: 1.361326
client_13:
  model:
    out_channels: 12
    task: graphRegression
  criterion:
    type: MSELoss
  train:
    optimizer:
      lr: 0.05
  grad:
    grad_clip: 1.0
  eval:
    base: 0.004389
- If you want to modify the clients, it is suggested to create a new trainer that inherits from the basic trainer, as follows, and replace the original hook functions with your own. For example, you can implement your trainer with a new _hook_on_batch_forward function, and then set trainer.type to "new_trainer" to use it (a concrete sketch follows this block):

from federatedscope.register import register_trainer
from federatedscope.core.trainers import GeneralTorchTrainer

class NewTrainer(GeneralTorchTrainer):
    def _hook_on_batch_forward(self, ctx):
        pass

def call_new_trainer(trainer_type):
    if trainer_type == 'new_trainer':
        trainer_builder = NewTrainer
        return trainer_builder

register_trainer('new_trainer', call_new_trainer)
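As a concrete (hypothetical) instance, the sketch below adapts the built-in _hook_on_batch_forward from Step 5 to train classification clients with label smoothing; the name ls_trainer is our own, the label_smoothing argument requires torch >= 1.10, and the regression branch plus the data_index bookkeeping from Step 5 are omitted for brevity (keep them if you rely on the built-in prediction saving):

import torch.nn.functional as F

from federatedscope.register import register_trainer
from federatedscope.core.trainers import GeneralTorchTrainer

class LabelSmoothingTrainer(GeneralTorchTrainer):
    def _hook_on_batch_forward(self, ctx):
        batch = ctx.data_batch.to(ctx.device)
        pred = ctx.model(batch)
        label = batch.y.squeeze(-1).long()
        if len(label.size()) == 0:
            label = label.unsqueeze(0)
        # Replace the plain cross entropy with a label-smoothed variant.
        ctx.loss_batch = F.cross_entropy(pred, label, label_smoothing=0.1)
        ctx.batch_size = len(label)
        ctx.y_true = label
        ctx.y_prob = pred

def call_ls_trainer(trainer_type):
    if trainer_type == 'ls_trainer':
        return LabelSmoothingTrainer

register_trainer('ls_trainer', call_ls_trainer)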
- If you want to modify the server side of federated learning, it is suggested to create a new aggregator and register it with FederatedScope.
  - First, you can inherit from the abstract class Aggregator in federatedscope/core/aggregator.py and implement your own aggregate function:

class Aggregator(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def aggregate(self, agg_info):
        pass
  - Then you can register your aggregator within federatedscope/core/auxiliaries/aggregator_builder.py (a sketch of both steps follows this list).
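As a concrete illustration of both steps, here is a hedged sketch. WeightedAvgAggregator is a hypothetical name; the assumption that agg_info carries a 'client_feedback' list of (sample_size, model_parameters) pairs mirrors the built-in FedAvg-style aggregators but should be verified on your branch, and the builder snippet is schematic rather than the file's exact code:

from federatedscope.core.aggregator import Aggregator

class WeightedAvgAggregator(Aggregator):
    """A sketch: average client parameters weighted by local sample size."""
    def aggregate(self, agg_info):
        # Assumption: agg_info['client_feedback'] is a list of
        # (sample_size, model_parameters) pairs, one entry per client.
        feedback = agg_info['client_feedback']
        total = sum(size for size, _ in feedback)
        avg_model = {}
        for key in feedback[0][1]:
            avg_model[key] = sum(
                params[key] * (size / total) for size, params in feedback)
        return avg_model

# Then, inside federatedscope/core/auxiliaries/aggregator_builder.py, add a
# branch that returns it (schematic; match the builder's real signature):
#     if method == 'weighted_avg':
#         return WeightedAvgAggregator()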
For More Help

We encourage participants to join our Q&A Slack channel or to join our DingTalk group by scanning the QR code.