- class diagnnose.probe.data_loader.DataLoader(corpus: Corpus, activations_dir: Optional[str] = None, model: Optional[LanguageModel] = None, activation_names: Optional[List[Tuple[int, str]]] = None, train_selection_func: Optional[Callable[[int, Example], bool]] = None, test_activations_dir: Optional[str] = None, test_corpus: Optional[Corpus] = None, test_selection_func: Optional[Callable[[int, Example], bool]] = None, control_task: Optional[Callable[[int, Example], Union[str, int]]] = None, train_test_ratio: Optional[float] = None, create_new_activations: bool = False)
Reads in pickled activations that have been extracted, and creates a train/test split of activations and labels.
The train/test split can be created in three ways:
1. Using the test activations from a different corpus.
2. Using the activations from the same corpus for both, with the train/test split defined by two separate selection functions.
3. Using a random 90/10 split.
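The third option, a random split of a single set of extracted activations, can be sketched in plain Python as follows. This is a minimal illustration and not diagnnose's implementation: `random_split`, `train_ratio`, and `seed` are hypothetical names, and the string/label pairs merely stand in for (activation, label) pairs.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], train_ratio: float = 0.9, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the item indices and cut them into a train and a test part."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    indices = list(range(len(items)))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    cut = int(len(items) * train_ratio)
    train = [items[i] for i in indices[:cut]]
    test = [items[i] for i in indices[cut:]]
    return train, test


# Hypothetical data: 20 placeholder activations with binary labels.
pairs = [(f"activation_{i}", i % 2) for i in range(20)]
train_pairs, test_pairs = random_split(pairs, train_ratio=0.9)
```

Every item ends up in exactly one of the two parts, so no activation is shared between training and testing.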
corpus (Corpus) – Corpus containing the labels for each sentence.
activations_dir (str, optional) – Directory containing the extracted activations. If not provided, new activations will be extracted. If create_new_activations is set to True, the newly extracted activations will be stored in this directory.
test_activations_dir (str, optional) – Directory containing the extracted test activations. If not provided, the train activation set will be split and part of it used as the test set.
test_corpus (Corpus, optional) – Corpus containing the test labels for each sentence. Must be provided if test_activations_dir is provided.
train_selection_func (SelectFunc, optional) – Selection function that determines whether a corpus item should be taken into account for training. If not provided, all extracted activations will be used and divided into a random train/test split.
test_selection_func (SelectFunc, optional) – Selection function that determines whether a corpus item should be taken into account for testing. If not provided, all extracted activations will be used and divided into a random train/test split.
control_task (ControlTask, optional) – Control task function of Hewitt and Liang (2019), mapping a corpus item to a random label.
train_test_ratio (float, optional) – Ratio of the train/test split. If separate test activations are provided this split won’t be used. Defaults to None, but must be provided if no test_selection_func is passed.
create_new_activations (bool, optional) – Toggle to create new activations based on corpus and model. Overwrites existing activations that might be present in
- load(activation_name: Tuple[int, str]) -> DataDict
Creates a train/test data split of the activations.
activation_name (ActivationName) – (layer, name) tuple indicating the activations to be read in
data_dict (DataDict) – Dictionary containing train and test activations, their corresponding labels and, optionally, control labels.
- class diagnnose.probe.dc_trainer.DCTrainer(data_loader: DataLoader, save_dir: str, lr: float = 0.01, max_epochs: int = 10, rank: Optional[int] = None, lambda1: float = 0.0, verbose: int = 0)
Trains Diagnostic Classifiers (DC) on extracted activation data.
For each activation listed in the activation_names argument of the DataLoader, a separate classifier will be trained.
data_loader (DataLoader) – DataLoader that contains the activations and labels on which the DCs will be trained. A DataLoader can contain activations for multiple layers and gates of a model, for which separate DCs will be trained and evaluated.
save_dir (str) – Directory to which trained models will be saved.
lr (float, optional) – Learning rate of the linear classifier that is used during training. Defaults to 0.01.
max_epochs (int, optional) – Maximum number of training epochs used for cross-validation. Defaults to 10.
rank (int, optional) – Matrix rank of the linear classifier. Defaults to the full rank if not provided.
lambda1 (float, optional) – Coefficient for L1 regularization that can be increased to induce sparsity in the diagnostic classifier. Defaults to 0., indicating no L1 regularization.
verbose (int, optional) – Set to any positive number for verbosity. Defaults to 0.
Current classifier that is being trained.
- train() -> Dict[Tuple[int, str], Any]
Trains DCs on multiple activation names.
- class diagnnose.probe.logreg.L1NeuralNetClassifier(*args: Any, **kwargs: Any)
- get_loss(y_pred, y_true, X=None, training=False)
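The lambda1 coefficient of DCTrainer suggests a loss of the form "classification loss plus lambda1 times the L1 norm of the weights". The sketch below writes that formula out in plain Python. It is an assumption about the general technique, not the body of `get_loss`: the helper names `cross_entropy`, `l1_penalty`, and `l1_regularized_loss` are hypothetical, and the probabilities and weights are made-up numbers.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target classes."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def l1_penalty(weights: Sequence[Sequence[float]]) -> float:
    """Sum of the absolute values of all weights."""
    return sum(abs(w) for row in weights for w in row)


def l1_regularized_loss(
    probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    weights: Sequence[Sequence[float]],
    lambda1: float = 0.0,
) -> float:
    # With lambda1 = 0.0 this reduces to the plain cross-entropy.
    return cross_entropy(probs, targets) + lambda1 * l1_penalty(weights)


probs = [[0.5, 0.5], [0.25, 0.75]]    # predicted class probabilities
targets = [0, 1]                      # gold class indices
weights = [[1.0, -2.0], [0.0, 0.5]]   # classifier weight matrix

plain = l1_regularized_loss(probs, targets, weights)
sparse = l1_regularized_loss(probs, targets, weights, lambda1=0.1)
```

Raising lambda1 makes large weights more costly, which pushes many of them towards zero and yields a sparser classifier.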
- class diagnnose.probe.logreg.LogRegModule(ninp: int, nout: int, rank: Optional[int] = None)
- forward(inp: Tensor, create_softmax=True)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
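To make the rank and create_softmax arguments concrete, the sketch below computes a rank-constrained logistic regression forward pass in plain Python. It is an assumption about how such a module could work, not the LogRegModule source: the weight matrix is factored into a hypothetical `down` projection (ninp x rank) and `up` projection (rank x nout), biases are left out for brevity, and nested lists replace tensors.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Subtracting the maximum keeps the exponentials numerically stable.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inp: Matrix, down: Matrix, up: Matrix, create_softmax: bool = True) -> Matrix:
    """inp: (batch x ninp), down: (ninp x rank), up: (rank x nout)."""
    logits = matmul(matmul(inp, down), up)
    if create_softmax:
        return [softmax(row) for row in logits]
    return logits


inp = [[1.0, 2.0, 3.0]]          # one example with ninp = 3
down = [[1.0], [0.0], [1.0]]     # projects to rank = 1
up = [[1.0, -1.0]]               # projects to nout = 2

logits = forward(inp, down, up, create_softmax=False)
probs = forward(inp, down, up)
```

The product of `down` and `up` has rank at most `rank`, so a small value limits how many independent directions of the activation space the classifier can use.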