FS Data
FS Data Module
In the tutorial, you will learn how to use FS Data Module and the code structure is shown below:
federatedscope/core
├── auxiliaries
│ ├── data_builder
│ ├── dataloader_builder
│ ├── ...
├── data
│ ├── base_data
│ | ├── ClientData
│ | ├── StandaloneDataDict
│ ├── base_translator
│ ├── ...
We will discuss all the concepts of our FS data module from top to bottom.
The main entrance of FS data
federatedscope.core.auxiliaries.data_builder.get_data
is the entrance functions to access built-in FS data. With the config
appropriately set, you can easily get FS Data.
fs_data, modified_cfg = get_data(config=init_cfg.clone())
get_data
consists of three steps:
- Load Dataset
federatedscope.core.data.utils.load_dataset
- Load local file to torch dataset
- Translate data
federatedscope.core.data.BaseDataTranslator
- Dataset -> ML split -> FL split -> FS Dataloader
- Convert mode
federatedscope.core.data.utils.convert_data_mode
- To adapt simulation mode and distributed mode
Data Translator
In FederatedScope, the input to Runner
is ClientData
(in distributed mode) or StandaloneDataDict
(in standalone mode), which are both subclasses of python dict
. So FederatedScope provides Data Translator to help you convert torch.utils.data.Dataset
to our data format. Data Translator contains four steps, two of which are optional.
Dataset -> (ML split) -> (FL split) -> FS Dataloader
- ML split(
split_train_val_test
):- Build train/val/test data split
- FL split(
split_to_client
) (please seesplitter
for details):- Split global data into local data for each client.
ClientData
In FederatedScope, each client will obtain a ClientData
, which has the following attributes:
-
A subclass of
dict
withtrain
,val
andtest
. -
Convert
dataset
/list
/array
to DataLoader.Example:
-
# Instantiate client_data for each Client client_data = ClientData(PyGDataLoader, cfg, train=train_data, val=None, test=test_data) # other_cfg with different batch size client_data.setup(other_cfg) print(client_data) >> {'train': PyGDataLoader(train_data), 'test': PyGDataLoader(test_data)}
-
StandaloneDataDict
In standalone mode, the input to Runner
is StandaloneDataDict
, which is the return value of calling Data Translator. StandaloneDataDict
has the following attributes:
- A subclass of
dict
with client_id as keys:{1: ClientData, 2: ClientData, ...}
- Responsible for some pre-process for FS data:
- Global evaluation: merge test data
- Global training: merge all data into one client
- Injected data attacks
- …