Data Loader

class diagnnose.probe.data_loader.DataLoader(corpus: Corpus, activations_dir: Optional[str] = None, model: Optional[LanguageModel] = None, activation_names: Optional[List[Tuple[int, str]]] = None, train_selection_func: Optional[Callable[[int, Example], bool]] = None, test_activations_dir: Optional[str] = None, test_corpus: Optional[Corpus] = None, test_selection_func: Optional[Callable[[int, Example], bool]] = None, control_task: Optional[Callable[[int, Example], Union[str, int]]] = None, train_test_ratio: Optional[float] = None, create_new_activations: bool = False)

Bases: object

Reads in pickled activations that have been extracted, and creates a train/test split of activations and labels.

The train/test split can be created in three ways:

  1. Using the test activations from a different corpus.

  2. Using the activations from the same corpus for both, with the train/test split defined by two separate selection functions (train_selection_func and test_selection_func).

  3. Based on a random split, for example 90/10, as set by train_test_ratio.
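The second and third strategies can be sketched in plain Python. This is only an illustration of the splitting logic, not the library's implementation: `split_ids`, `Item`, and the `seed` argument are hypothetical names, and the integer passed to each selection function is assumed here to be a position index. The first strategy (a separate test corpus) needs no splitting and is omitted.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, str]  # hypothetical stand-in for the library's Example
SelectFunc = Callable[[int, Item], bool]


def split_ids(
    items: List[Item],
    train_selection_func: Optional[SelectFunc] = None,
    test_selection_func: Optional[SelectFunc] = None,
    train_test_ratio: Optional[float] = None,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_ids, test_ids) over positions in `items`."""
    ids = list(range(len(items)))
    if train_selection_func is not None and test_selection_func is not None:
        # Strategy 2: each selection function picks its own subset.
        train_ids = [i for i in ids if train_selection_func(i, items[i])]
        test_ids = [i for i in ids if test_selection_func(i, items[i])]
        return train_ids, test_ids
    if train_test_ratio is None:
        raise ValueError("train_test_ratio is required for a random split")
    # Strategy 3: shuffle once, then cut at the requested ratio.
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_test_ratio)
    return shuffled[:cut], shuffled[cut:]


# Toy corpus: every fifth item is marked as test material.
items = [{"token": f"w{i}", "split": "test" if i % 5 == 0 else "train"} for i in range(20)]

by_func = split_ids(
    items,
    train_selection_func=lambda i, item: item["split"] == "train",
    test_selection_func=lambda i, item: item["split"] == "test",
)
by_ratio = split_ids(items, train_test_ratio=0.9)

print(len(by_func[0]), len(by_func[1]))    # 16 4
print(len(by_ratio[0]), len(by_ratio[1]))  # 18 2
```

With selection functions the split is deterministic and may leave some items unused, whereas the ratio-based split always partitions every item into exactly one of the two sets.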

  • corpus (Corpus) – Corpus containing the labels for each sentence.

  • activations_dir (str, optional) – Directory containing the extracted activations. If not provided, new activations will be extracted. If create_new_activations is set to True, the newly extracted activations will be stored in this directory.

  • test_activations_dir (str, optional) – Directory containing the extracted test activations. If not provided, the train activation set will be split and part of it used as the test set.

  • test_corpus (Corpus, optional) – Corpus containing the test labels for each sentence. Must be provided if test_activations_dir is provided.

  • train_selection_func (SelectFunc, optional) – Selection function that determines whether a corpus item should be taken into account for training. If not provided, all extracted activations are used and divided into a random train/test split.

  • test_selection_func (SelectFunc, optional) – Selection function that determines whether a corpus item should be taken into account for testing. If not provided, all extracted activations are used and divided into a random train/test split.

  • control_task (ControlTask, optional) – Control task function of Hewitt and Liang (2019), mapping a corpus item to a random label.

  • train_test_ratio (float, optional) – Ratio of the train/test split. If separate test activations are provided this split won’t be used. Defaults to None, but must be provided if no test_selection_func is passed.

  • create_new_activations (bool, optional) – Toggle to create new activations based on corpus and model. Overwrites existing activations that might be present in activations_dir.

load(activation_name: Tuple[int, str]) → DataDict

Creates a train/test data split of activations.


Parameters

activation_name (ActivationName) – (layer, name) tuple indicating the activations to be read in.

Returns

data_dict – Dictionary containing train and test activations, their corresponding labels and, optionally, control labels.

Return type

DataDict

DC Trainer

class diagnnose.probe.dc_trainer.DCTrainer(data_loader: DataLoader, save_dir: str, lr: float = 0.01, max_epochs: int = 10, rank: Optional[int] = None, lambda1: float = 0.0, verbose: int = 0)

Bases: object

Trains Diagnostic Classifiers (DC) on extracted activation data.

For each activation that is part of the provided activation_names argument, a separate classifier will be trained.

  • data_loader (DataLoader) – DataLoader that contains the activations and labels on which the DCs will be trained. A DataLoader can contain activations for multiple layers and gates of a model, for which separate DCs will be trained and evaluated.

  • save_dir (str) – Directory to which trained models will be saved.

  • lr (float, optional) – Learning rate of the linear classifier that is used during training. Defaults to 0.01.

  • max_epochs (int, optional) – Maximum number of training epochs used for cross-validation. Defaults to 10.

  • rank (int, optional) – Matrix rank of the linear classifier. Defaults to the full rank if not provided.

  • lambda1 (float, optional) – Coefficient for L1 regularization that can be increased to induce sparsity in the diagnostic classifier. Defaults to 0.0, which disables L1 regularization.

  • verbose (int, optional) – Set to any positive number for verbosity. Defaults to 0.
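To make the role of lambda1 concrete, here is a minimal sketch of an L1-penalised classification loss in plain Python. It assumes the penalty is the sum of absolute weight values added to a mean cross-entropy term; the exact reduction and which parameters the library penalises are not stated in this section, and `cross_entropy` and `l1_penalized_loss` are illustrative names.

```python
import math
from typing import List


def cross_entropy(probs: List[List[float]], targets: List[int]) -> float:
    """Mean negative log-likelihood of the target classes."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def l1_penalized_loss(
    probs: List[List[float]],
    targets: List[int],
    weights: List[List[float]],
    lambda1: float = 0.0,
) -> float:
    """Cross-entropy plus lambda1 times the L1 norm of the weights."""
    l1_norm = sum(abs(w) for row in weights for w in row)
    return cross_entropy(probs, targets) + lambda1 * l1_norm


probs = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]
weights = [[0.5, -1.5], [2.0, 0.0]]  # L1 norm = 4.0

plain = l1_penalized_loss(probs, targets, weights)              # lambda1 = 0.0
sparse = l1_penalized_loss(probs, targets, weights, lambda1=0.1)
print(sparse - plain)  # the penalty term, approximately 0.4
```

With lambda1 at 0.0 the loss reduces to plain cross-entropy; raising it makes large weights more expensive, which pushes many of them towards zero during training.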


Current classifier that is being trained.



train() → Dict[Tuple[int, str], Any]

Trains DCs on multiple activation names.
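The overall shape of this method, one classifier per activation name with results keyed by `(layer, name)`, can be sketched as follows. Everything here is a stand-in: `train_all`, `fake_load`, and `majority_baseline` are hypothetical, the dictionary keys `train_y` and `test_y` are assumed for illustration, and the names `hx` and `cx` are example activation names only. The real method trains a diagnostic classifier rather than a majority baseline.

```python
from typing import Any, Callable, Dict, List, Tuple

ActivationName = Tuple[int, str]


def train_all(
    activation_names: List[ActivationName],
    load: Callable[[ActivationName], Dict[str, List[int]]],
    fit: Callable[[Dict[str, List[int]]], Any],
) -> Dict[ActivationName, Any]:
    """Load data and fit one model per activation name."""
    results: Dict[ActivationName, Any] = {}
    for name in activation_names:
        data = load(name)          # train/test split for this activation
        results[name] = fit(data)  # one independent classifier per name
    return results


def fake_load(name: ActivationName) -> Dict[str, List[int]]:
    # Stand-in for DataLoader.load: labels only, no activations.
    return {"train_y": [0, 1, 1], "test_y": [1, 1, 0, 1]}


def majority_baseline(data: Dict[str, List[int]]) -> Dict[str, float]:
    # Stand-in for classifier training: predict the most common train label.
    majority = max(set(data["train_y"]), key=data["train_y"].count)
    hits = sum(y == majority for y in data["test_y"])
    return {"accuracy": hits / len(data["test_y"])}


results = train_all([(0, "hx"), (1, "hx"), (1, "cx")], fake_load, majority_baseline)
print(results[(1, "cx")])  # {'accuracy': 0.75}
```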


class diagnnose.probe.logreg.L1NeuralNetClassifier(*args: Any, **kwargs: Any)

Bases: NeuralNetClassifier

get_loss(y_pred, y_true, X=None, training=False)

class diagnnose.probe.logreg.LogRegModule(ninp: int, nout: int, rank: Optional[int] = None)

Bases: Module
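The rank argument suggests a low-rank linear map. One common way to build such a map, assumed here and not confirmed by this section, is to factor the weight matrix into a projection down to `rank` dimensions followed by a projection up to `nout`. The sketch below compares weight counts and applies the two projections in plain Python; `num_weights` and `low_rank_forward` are illustrative names, and bias terms are ignored.

```python
from typing import List, Optional


def num_weights(ninp: int, nout: int, rank: Optional[int] = None) -> int:
    """Weight count of a full or rank-constrained linear map (no biases)."""
    if rank is None:
        return ninp * nout
    # Two factors: (ninp x rank) followed by (rank x nout).
    return ninp * rank + rank * nout


def low_rank_forward(
    x: List[float], down: List[List[float]], up: List[List[float]]
) -> List[float]:
    """Apply a rank-limited map: `down` has rank rows, `up` has nout rows."""
    hidden = [sum(xi * w for xi, w in zip(x, row)) for row in down]
    return [sum(h * w for h, w in zip(hidden, row)) for row in up]


print(num_weights(650, 10))          # 6500
print(num_weights(650, 10, rank=2))  # 1320

# Rank-1 example: three inputs squeezed through one hidden unit.
print(low_rank_forward([1.0, 2.0, 3.0], down=[[1.0, 0.0, 1.0]], up=[[2.0], [-1.0]]))
# [8.0, -4.0]
```

A small rank limits how many independent directions of the activation space the probe can read, which is why it serves as a capacity control alongside L1 regularization.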

forward(inp: Tensor, create_softmax=True)

Defines the computation performed at every call.

Should be overridden by all subclasses.


Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance rather than forward() directly, since calling the instance runs the registered hooks while a direct forward() call silently ignores them.
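The difference between calling the instance and calling forward() directly can be shown with a simplified stand-in. `TinyModule` below is not the PyTorch class; it only mimics the idea that `__call__` wraps `forward` and then runs registered forward hooks. The real Module also handles pre-hooks, tuple inputs, and hooks that replace the output.

```python
from typing import Callable, List


class TinyModule:
    """Simplified stand-in for a module with forward hooks."""

    def __init__(self) -> None:
        self._forward_hooks: List[Callable] = []

    def register_forward_hook(self, hook: Callable) -> None:
        self._forward_hooks.append(hook)

    def forward(self, inp: List[float]) -> List[float]:
        raise NotImplementedError  # subclasses define the computation

    def __call__(self, inp: List[float]) -> List[float]:
        out = self.forward(inp)
        # Hooks run only on this path, never inside forward() itself.
        for hook in self._forward_hooks:
            hook(self, inp, out)
        return out


class Doubler(TinyModule):
    def forward(self, inp: List[float]) -> List[float]:
        return [2 * x for x in inp]


seen: List[List[float]] = []
module = Doubler()
module.register_forward_hook(lambda mod, inp, out: seen.append(out))

module([1, 2])       # goes through __call__, so the hook fires
module.forward([3])  # bypasses __call__, so the hook is skipped
print(seen)          # [[2, 4]]
```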

training: bool