Core Module References

federatedscope.core.configs

class federatedscope.core.configs.CN(init_dict=None, key_list=None, new_allowed=False)[source]

An extended configuration system based on [yacs](https://github.com/rbgirshick/yacs). The two-level tree structure consists of several internal dict-like containers to allow simple key-value access and management.

assert_cfg()[source]

Check the validity of the configuration instance.

clean_unused_sub_cfgs()[source]

Clean the unused secondary-level CfgNode entries, i.e., those whose .use attribute is False.

freeze(inform=True, save=True)[source]
1. make the cfg attributes immutable;
2. save the frozen config into "self.outdir/config.yaml" for better reproducibility;
3. if self.wandb.use=True, update the frozen config.

merge_from_file(cfg_filename)[source]

Load configs from a yaml file, another cfg instance, or a list that stores the keys and values.

Parameters

cfg_filename (string) –

merge_from_list(cfg_list)[source]

Load configs from a list that stores the keys and values. This modifies merge_from_list in yacs' config.py to allow adding new keys if is_new_allowed() returns True.

Parameters

cfg_list (list) –

merge_from_list_yacs(cfg_list)[source]

Merge config (keys, values) in a list (e.g., from command line) into this CfgNode. For example, cfg_list = [‘FOO.BAR’, 0.5].

merge_from_other_cfg(cfg_other)[source]

Load configs from another cfg instance.

Parameters

cfg_other (CN) –

federatedscope.core.configs.init_global_cfg(cfg)[source]

This function sets the default config values.
1. Note that for an experiment, only part of the arguments will be used; the remaining unused arguments won't affect anything, so feel free to register any argument in graphgym.contrib.config.
2. We support at most two levels of configs, e.g., cfg.dataset.name.

Returns

the configuration used by the experiment.
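
A minimal usage sketch of the configuration flow; the yaml file name and the option keys are placeholders, and the import path of global_cfg is an assumption that may differ across versions:

```python
from federatedscope.core.configs.config import global_cfg

# Clone the global defaults so that they stay untouched.
cfg = global_cfg.clone()

# Merge options from a yaml file and from a command-line style list
# (file name and keys below are illustrative only).
cfg.merge_from_file("my_fl_config.yaml")
cfg.merge_from_list(["federate.total_round_num", 50, "train.optimizer.lr", 0.01])

# Validate the configuration and freeze it; freeze() also dumps the
# frozen config into cfg.outdir/config.yaml for reproducibility.
cfg.assert_cfg()
cfg.freeze()
```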

federatedscope.core.monitors

class federatedscope.core.monitors.EarlyStopper(patience=5, delta=0, improve_indicator_mode='best', the_smaller_the_better=True)[source]

Track the history of a metric (e.g., validation loss) and check whether the (training) process should be stopped if the metric doesn't improve within the given patience.
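
A hedged sketch of driving the early stopper round by round; the per-step method name track_and_check and the loss values are assumptions used only for illustration:

```python
from federatedscope.core.monitors import EarlyStopper

# Stop when the validation loss has not improved for 5 consecutive checks.
stopper = EarlyStopper(patience=5, delta=0,
                       improve_indicator_mode='best',
                       the_smaller_the_better=True)

val_losses = [0.9, 0.7, 0.65, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64]  # illustrative
for rnd, val_loss in enumerate(val_losses):
    # track_and_check is assumed to record the new value and return True
    # once the patience is exhausted.
    if stopper.track_and_check(val_loss):
        print(f"Early stopping triggered at round {rnd}")
        break
```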

class federatedscope.core.monitors.Monitor(cfg, monitored_object=None)[source]

Provide monitoring functionalities such as formatting the evaluation results into diverse metrics. Besides prediction-related performance, the monitor can also track efficiency-related metrics for a worker.

calc_blocal_dissim(last_model, local_updated_models)[source]
Parameters
  • last_model (dict) – the model state of the last round.

  • local_updated_models (list) – the locally updated models received from the clients in the current round.

Returns

the measurements proposed in "Federated Optimization in Heterogeneous Networks" (Tian Li, Anit Kumar Sahu, Manzil Zaheer, et al.).

Return type

b_local_dissimilarity (dict)

format_eval_res(results, rnd, role=-1, forms=None, return_raw=False)[source]

Format the evaluation results from trainer.ctx.eval_results.

Parameters
  • results (dict) – a dict to store the evaluation results {metric: value}

  • rnd (int|string) – FL round

  • role (int|string) – the output role

  • forms (list) – format type

  • return_raw (bool) – whether to return the raw results or the formatted results

Returns

a formatted results dict with different forms and roles, e.g.,

{
  'Role': 'Server #',
  'Round': 200,
  'Results_weighted_avg': {
    'test_avg_loss': 0.58, 'test_acc': 0.67, 'test_correct': 3356,
    'test_loss': 2892, 'test_total': 5000},
  'Results_avg': {
    'test_avg_loss': 0.57, 'test_acc': 0.67, 'test_correct': 3356,
    'test_loss': 2892, 'test_total': 5000},
  'Results_fairness': {
    'test_correct': 3356, 'test_total': 5000,
    'test_avg_loss_std': 0.04, 'test_avg_loss_bottom_decile': 0.52,
    'test_avg_loss_top_decile': 0.64,
    'test_acc_std': 0.06, 'test_acc_bottom_decile': 0.60,
    'test_acc_top_decile': 0.75,
    'test_loss_std': 214.17, 'test_loss_bottom_decile': 2644.64,
    'test_loss_top_decile': 3241.23}
}

Return type

round_formatted_results (dict)
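
A hedged sketch of formatting raw results; the metric names, values, and the 'raw' form are illustrative, and the Monitor is constructed from a cloned default config only for demonstration:

```python
from federatedscope.core.configs.config import global_cfg
from federatedscope.core.monitors import Monitor

cfg = global_cfg.clone()
monitor = Monitor(cfg, monitored_object=None)

# Raw results as they might appear in trainer.ctx.eval_results.
raw_results = {'test_avg_loss': 0.58, 'test_acc': 0.67, 'test_total': 5000}

formatted = monitor.format_eval_res(results=raw_results,
                                    rnd=200,
                                    role='Client #3',
                                    forms=['raw'])
print(formatted)
```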

merge_system_metrics_simulation_mode(file_io=True, from_global_monitors=False)[source]

Average the system metrics recorded in "system_metrics.json" over all workers.

track_avg_flops(flops, sample_num=1)[source]

Update the average FLOPs for forwarding each data sample. For most models and tasks, the averaging is not needed as the input shape is fixed.

Parameters
  • flops – the FLOPs consumed by the tracked forward step(s)

  • sample_num – the number of data samples processed in the tracked step(s)

track_model_size(models)[source]

Calculate the total model size given the models held by the worker/trainer.

Parameters

models – torch.nn.Module or list of torch.nn.Module


update_best_result(best_results, new_results, results_type, round_wise_update_key='val_loss')[source]

Update the best evaluation results. By default, the update is based on the validation loss with round_wise_update_key="val_loss".

federatedscope.core.fed_runner

class federatedscope.core.fed_runner.FedRunner(data, server_class=<class 'federatedscope.core.worker.server.Server'>, client_class=<class 'federatedscope.core.worker.client.Client'>, config=None, client_config=None)[source]

This class is used to construct an FL course, which includes _set_up and run.

Parameters
  • data – The data used in the FL courses, which are formatted as {'ID': data} for the standalone mode. More details can be found in federatedscope.core.auxiliaries.data_builder.

  • server_class – The server class used for instantiating a (customized) server.

  • client_class – The client class used for instantiating a (customized) client.

  • config – The configurations of the FL course.

  • client_config – The clients' configurations.

run()[source]

To run an FL course, which is called after the server/client has been set up. For the standalone mode, a shared message queue will be set up to simulate receiving messages.
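
A minimal end-to-end sketch of constructing and running an FL course in standalone mode; the yaml path is a placeholder, and the data-builder call (get_data returning the data dict plus a modified config) is an assumption based on federatedscope.core.auxiliaries.data_builder:

```python
from federatedscope.core.configs.config import global_cfg
from federatedscope.core.auxiliaries.data_builder import get_data
from federatedscope.core.fed_runner import FedRunner

cfg = global_cfg.clone()
cfg.merge_from_file("my_fl_config.yaml")  # placeholder path

# Build the data; in standalone mode it is a dict {client_ID: data}.
data, modified_cfg = get_data(config=cfg.clone())
cfg.merge_from_other_cfg(modified_cfg)

# Use the default Server/Client classes; pass customized classes if needed.
runner = FedRunner(data=data, config=cfg.clone())
runner.run()
```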

federatedscope.core.worker

class federatedscope.core.worker.Client(ID=-1, server_id=None, state=-1, config=None, data=None, model=None, device='cpu', strategy=None, is_unseen_client=False, *args, **kwargs)[source]

The Client class, which describes the behaviors of a client in an FL course. The behaviors are described by the handling functions (named callback_funcs_for_xxx).

Parameters
  • ID – The unique ID of the client, which is assigned by the server when joining the FL course

  • server_id – (Default) 0

  • state – The training round

  • config – The configuration

  • data – The data owned by the client

  • model – The model maintained locally

  • device – The device to run local training and evaluation

  • strategy – redundant attribute

callback_funcs_for_address(message: federatedscope.core.message.Message)[source]

The handling function for receiving other clients’ IP addresses, which is used for constructing a complex topology

Parameters

message – The received message

callback_funcs_for_assign_id(message: federatedscope.core.message.Message)[source]

The handling function for receiving the client_ID assigned by the server (during the joining process), which is used in the distributed mode.

Parameters

message – The received message

callback_funcs_for_converged(message: federatedscope.core.message.Message)[source]

The handling function for receiving the signal that the FL course converged

Parameters

message – The received message

callback_funcs_for_evaluate(message: federatedscope.core.message.Message)[source]

The handling function for receiving the request of evaluating

Parameters

message – The received message

callback_funcs_for_finish(message: federatedscope.core.message.Message)[source]

The handling function for receiving the signal of finishing the FL course.

Parameters

message – The received message

callback_funcs_for_join_in_info(message: federatedscope.core.message.Message)[source]

The handling function for receiving the request of join in information (such as batch_size, num_of_samples) during the joining process.

Parameters

message – The received message

callback_funcs_for_model_para(message: federatedscope.core.message.Message)[source]

The handling function for receiving model parameters, which triggers the local training process. This handling function is widely used in various FL courses.

Parameters
message – The received message, which includes sender, receiver, state, and content. More detail can be found in federatedscope.core.message

join_in()[source]

To send ‘join_in’ message to the server for joining in the FL course.

register_handlers(msg_type, callback_func)[source]

To bind a message type with a handling function.

Parameters
  • msg_type (str) – The defined message type

  • callback_func – The handling function to handle the received message
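
A hedged sketch of binding a custom message type to a handling function; the message type 'greeting' and the handler body are purely illustrative, and client is assumed to be an already-instantiated Client:

```python
def callback_funcs_for_greeting(message):
    # `message` follows federatedscope.core.message.Message and carries
    # sender, receiver, state, and content.
    print(f"Received greeting from sender {message.sender}: {message.content}")

# Bind the illustrative message type to the handler on an existing client.
client.register_handlers('greeting', callback_funcs_for_greeting)
```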

run()[source]

To listen to the messages and handle them accordingly (used for the distributed mode).

class federatedscope.core.worker.Server(ID=-1, state=0, config=None, data=None, model=None, client_num=5, total_round_num=10, device='cpu', strategy=None, unseen_clients_id=None, **kwargs)[source]

The Server class, which describes the behaviors of the server in an FL course. The behaviors are described by the handling functions (named callback_funcs_for_xxx).

Parameters
  • ID – The unique ID of the server, which is set to 0 by default

  • state – The training round

  • config – the configuration

  • data – The data owned by the server (for global evaluation)

  • model – The model used for aggregation

  • client_num – The (expected) client num to start the FL course

  • total_round_num – The total number of training rounds

  • device – The device to run local training and evaluation

  • strategy – redundant attribute

broadcast_client_address()[source]

To broadcast the communication addresses of clients (used for additive secret sharing)

broadcast_model_para(msg_type='model_para', sample_client_num=-1, filter_unseen_clients=True)[source]

To broadcast the message to all clients or sampled clients

Parameters
  • msg_type – ‘model_para’ or other user defined msg_type

  • sample_client_num – the number of sampled clients in the broadcast behavior. And sample_client_num = -1 denotes to broadcast to all the clients.

  • filter_unseen_clients – whether to filter out the unseen clients, which do not contribute to the FL process by training on their local data and uploading their local model updates. Such a split is useful for checking the participation generalization gap described in [ICLR'22, What Do We Mean by Generalization in Federated Learning?]. You may want to set it to False during the evaluation stage.
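
A hedged sketch of typical broadcast calls; server is assumed to be an instantiated federatedscope.core.worker.Server, and the 'evaluate' message type mirrors the evaluation request handled by callback_funcs_for_evaluate on the client side:

```python
# Broadcast the current global model to 10 sampled clients for training.
server.broadcast_model_para(msg_type='model_para', sample_client_num=10)

# Broadcast an evaluation request to all clients (-1 = all), keeping the
# unseen clients in the receiver list.
server.broadcast_model_para(msg_type='evaluate',
                            sample_client_num=-1,
                            filter_unseen_clients=False)
```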

callback_funcs_for_join_in(message: federatedscope.core.message.Message)[source]

The handling function for receiving the join-in information. The server might request some information (such as num_of_samples) if necessary and assign IDs for the clients. If all the clients have joined in, the training process will be triggered.

Parameters

message – The received message

callback_funcs_for_metrics(message: federatedscope.core.message.Message)[source]

The handling function for receiving the evaluation results, which triggers check_and_move_on (perform aggregation when enough feedback has been received).

Parameters

message – The received message

callback_funcs_model_para(message: federatedscope.core.message.Message)[source]
The handling function for receiving model parameters, which triggers check_and_move_on (perform aggregation when enough feedback has been received). This handling function is widely used in various FL courses.

Parameters

message – The received message, which includes sender, receiver, state, and content. More detail can be found in federatedscope.core.message

check_and_move_on(check_eval_result=False, min_received_num=None)[source]

To check the message_buffer. When enough messages have been received, some events (such as performing aggregation, evaluation, and moving on to the next training round) will be triggered.

Parameters
check_eval_result (bool) – If True, check the message buffer for evaluation; otherwise, check the message buffer for training

check_and_save()[source]

To save the results and the model after each evaluation.

check_buffer(cur_round, min_received_num, check_eval_result=False)[source]

To check the message buffer

Parameters
  • cur_round (int) – The current round number

  • min_received_num (int) – The minimal number of received messages

  • check_eval_result (bool) – whether to check evaluation results instead of training results

Returns

Whether enough messages have been received or not

Return type

bool

check_client_join_in()[source]

To check whether all the clients have joined in the FL course.

eval()[source]

To conduct evaluation. When cfg.federate.make_global_eval=True, a global evaluation is conducted by the server.

merge_eval_results_from_all_clients()[source]

Merge the evaluation results from all clients, update the best results, log the merged results, and save them into eval_results.log.

Returns

the formatted merged results

register_handlers(msg_type, callback_func)[source]

To bind a message type with a handling function.

Parameters
  • msg_type (str) – The defined message type

  • callback_func – The handling function to handle the received message

run()[source]

To start the FL course, listen and handle messages (for distributed mode).

save_best_results()[source]

To save the best evaluation results.

save_client_eval_results()[source]

Save the evaluation results of each client when the FL course is early-stopped or terminated.

terminate(msg_type='finish')[source]

To terminate the FL course

trigger_for_start()[source]

To start the FL course when the expected number of clients have joined

trigger_for_time_up(check_timestamp=None)[source]

The handler for time up: modify the current timestamp and check the trigger condition.

class federatedscope.core.worker.Worker(ID=-1, state=0, config=None, model=None, strategy=None)[source]

The base worker class.

federatedscope.core.trainers

class federatedscope.core.trainers.Context(model, cfg, data=None, device=None, init_dict=None, init_attr=True)[source]

Record and pass variables among different hook functions.

Parameters
  • model – training model

  • cfg – config

  • data (dict) – a dict containing the train/val/test datasets or dataloaders

  • device – running device

  • init_dict (dict) – a dict used to initialize the instance of Context

  • init_attr (bool) – whether to set up the static variables

Note

  • The variables within an instance of class Context can be set/get as attributes: ctx.${NAME_VARIABLE} = ${VALUE_VARIABLE}, where ${NAME_VARIABLE} and ${VALUE_VARIABLE} are the name and value of the variable.

  • To achieve automatic lifecycle management, you can wrap the variable with CtxVar and a lifecycle parameter as follows: ctx.${NAME_VARIABLE} = CtxVar(${VALUE_VARIABLE}, ${LIFECYCLE}). The parameter ${LIFECYCLE} can be chosen from LIFECYCLE.BATCH, LIFECYCLE.EPOCH and LIFECYCLE.ROUTINE. Then the variable ctx.${NAME_VARIABLE} will be deleted at the end of the corresponding stage (a sketch is given after this note):

  • LIFECYCLE.BATCH: the variables will be deleted after running a batch

  • LIFECYCLE.EPOCH: the variables will be deleted after running an epoch

  • LIFECYCLE.ROUTINE: the variables will be deleted after running a routine

For more details, please refer to our [tutorial](https://federatedscope.io/docs/trainer/).

  • Context also maintains some special variables across different routines, such as:

  • cfg

  • model

  • data

  • device

  • ${split}_data: the dataset object of the data split named ${split}

  • ${split}_loader: the data loader object of the data split named ${split}

  • num_${split}_data: the number of examples within the dataset named ${split}
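
A hedged sketch of lifecycle-managed context variables inside a hook function; ctx is the trainer's Context, the variable names are illustrative, and the import paths for CtxVar and LIFECYCLE are assumptions that may vary across versions:

```python
from federatedscope.core.trainers.context import CtxVar
from federatedscope.core.trainers.enums import LIFECYCLE

def _hook_on_batch_start_init(ctx):
    # A batch-level variable: deleted automatically once the batch finishes.
    ctx.batch_start_time = CtxVar(0.0, LIFECYCLE.BATCH)

    # A routine-level variable: kept until the train/val/test routine ends.
    ctx.custom_sample_counter = CtxVar(0, LIFECYCLE.ROUTINE)
```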

class federatedscope.core.trainers.FedEMTrainer(model_nums, models_interact_mode='sequential', model=None, data=None, device=None, config=None, base_trainer: Optional[Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer]] = None)[source]

The FedEM implementation, "Federated Multi-Task Learning under a Mixture of Distributions" (NeurIPS 2021), is based on Algorithm 1 in the paper and the official codes: https://github.com/omarfoq/FedEM

register_multiple_model_hooks()[source]

customized multiple_model_hooks, which is called in the __init__ of GeneralMultiModelTrainer

class federatedscope.core.trainers.GeneralMultiModelTrainer(model_nums, models_interact_mode='sequential', model=None, data=None, device=None, config=None, base_trainer: Optional[Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer]] = None)[source]
get_model_para()[source]

Return multiple model parameters.

init_multiple_models()[source]

Init multiple models and optimizers; the default implementation uses the copy-init manner.

Extension: users can override this function according to their own requirements.

register_multiple_model_hooks()[source]

By default, all internal models adopt the same hook_set.

Extension: users can override this function to register customized hooks for different internal models.

Note

For sequential mode, users can append an interact_hook on begin/end triggers, such as

" -> (on_fit_end, _interact_to_other_models) -> "

For parallel mode, users can append an interact_hook on any trigger they want, such as

" -> (on_xxx_point, _interact_to_other_models) -> "

As the internal models share self.ctx, we must tell the running hooks which data_loader to call and which num_samples to count.

update(model_parameters, strict=False)[source]
Parameters
model_parameters (list[dict]) – Multiple PyTorch Module objects' state_dicts.

class federatedscope.core.trainers.GeneralTorchTrainer(model, data, device, config, only_for_eval=False, monitor=None)[source]
discharge_model()[source]

Discharge the model from GPU device

get_model_para()[source]
Returns

model_parameters (dict): {model_name: model_val}

parse_data(data)[source]

Populate "${split}_data", "${split}_loader" and "num_${split}_data" for different data splits.

update(model_parameters, strict=False)[source]

Called by the FL client to update the model parameters

Parameters

model_parameters (dict) – PyTorch Module object’s state_dict.

class federatedscope.core.trainers.Trainer(model, data, device, config, only_for_eval=False, monitor=None)[source]

Register, organize and run the train/test/val procedures

get_model_para()[source]
Returns

model_parameters (dict): {model_name: model_val}

print_trainer_meta_info()[source]

Print some meta info for code users, e.g., the model type; the parameter names will be filtered out, etc.

update(model_parameters, strict=False)[source]

Called by the FL client to update the model parameters

Parameters
  • model_parameters (dict) – {model_name: model_val}

  • strict (bool) – ensure the k-v pairs are strictly the same
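
A hedged sketch of the round-level interplay between update and local training; trainer is assumed to be a constructed GeneralTorchTrainer, global_state a state_dict received from the server, and train() is assumed as the local training routine:

```python
# Load the newly received global parameters into the local model ...
trainer.update(global_state, strict=False)

# ... run local training, then read back the updated parameters
# so they can be sent to the server for aggregation.
trainer.train()
updated_state = trainer.get_model_para()
```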

federatedscope.core.trainers.wrap_DittoTrainer(base_trainer: Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer]) Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer][source]

Build a DittoTrainer in a plug-in manner, by registering new functions into the specific BaseTrainer.

The Ditto implementation, "Ditto: Fair and Robust Federated Learning Through Personalization" (ICML 2021), is based on Algorithm 2 in the paper and the official codes: https://github.com/litian96/ditto
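
A hedged sketch of the plug-in wrapping pattern shared by these personalization trainers; model, data, device, and cfg are assumed to be prepared elsewhere (e.g., via the corresponding builders):

```python
from federatedscope.core.trainers import GeneralTorchTrainer, wrap_DittoTrainer

# Build a vanilla trainer first ...
base_trainer = GeneralTorchTrainer(model=model, data=data,
                                   device=device, config=cfg)

# ... then register the Ditto-specific hooks onto it in a plug-in manner.
ditto_trainer = wrap_DittoTrainer(base_trainer)
```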

federatedscope.core.trainers.wrap_fedprox_trainer(base_trainer: Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer]) Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer][source]

Implementation of FedProx; refer to "Federated Optimization in Heterogeneous Networks" [Tian Li, et al., 2020] (https://proceedings.mlsys.org/paper/2020/file/38af86134b65d0f10fe33d30dd76442e-Paper.pdf)

federatedscope.core.trainers.wrap_nbafl_server(server)[source]

Register noise injector for the server

federatedscope.core.trainers.wrap_nbafl_trainer(base_trainer: Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer]) Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer][source]

Implementation of NbAFL; refer to "Federated Learning with Differential Privacy: Algorithms and Performance Analysis" [Wei et al., 2020] (https://ieeexplore.ieee.org/abstract/document/9069945/)

Parameters
  • mu – the factor of the regularizer

  • epsilon – the distinguishable bound

  • w_clip – the threshold to clip weights

federatedscope.core.trainers.wrap_pFedMeTrainer(base_trainer: Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer]) Type[federatedscope.core.trainers.torch_trainer.GeneralTorchTrainer][source]

Build a pFedMeTrainer in a plug-in manner, by registering new functions into the specific BaseTrainer.

The pFedMe implementation, "Personalized Federated Learning with Moreau Envelopes" (NeurIPS 2020), is based on Algorithm 1 in the paper and the official codes: https://github.com/CharlieDinh/pFedMe