models

Import Model

diagnnose.models.import_model.import_model(*args, **kwargs) LanguageModel[source]

Import a model from a JSON file.

Returns

model – A LanguageModel instance, based on the provided config_dict.

Return type

LanguageModel
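
For illustration, a minimal usage sketch. The function only exposes *args and **kwargs, so the config layout below is an assumption based on the TransformerLM parameters documented further down, not a verified schema:

    from diagnnose.models.import_model import import_model

    # Hypothetical config layout; the exact keys accepted by import_model
    # are not verified here.
    config_dict = {
        "model": {
            "transformer_type": "distilgpt2",
            "mode": "causal_lm",
            "device": "cpu",
        }
    }

    model = import_model(config_dict)  # returns a LanguageModel instance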

Init States

Language Model

class diagnnose.models.language_model.LanguageModel(device: str = 'cpu')[source]

Bases: ABC, Module

abstract activation_names() List[Tuple[int, str]][source]

Returns a list of all the model’s activation names.

Returns

activation_names – List of (layer, name) tuples.

Return type

ActivationNames

abstract create_inputs_embeds(input_ids: Tensor) Tensor[source]

Transforms a sequence of input tokens to their embeddings.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

abstract forward(input_ids: Optional[Tensor] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[Tensor] = None, compute_out: bool = False, calc_causal_lm_probs: bool = False, only_return_top_embs: bool = False) Union[Dict[Tuple[int, str], Tensor], Tensor][source]

Performs a single forward pass across all LM layers.

Parameters
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len

  • inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how input_ids indices are converted into associated vectors than the model’s internal embedding lookup matrix provides. Also allows a ShapleyTensor to be provided, enabling feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid

  • input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed the same length. Size: batch_size,

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

  • calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.

  • only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to False.

Returns

activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid

Return type

ActivationDict
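
A sketch of consuming the returned ActivationDict (assuming model is a concrete LanguageModel subclass, input_ids a batch_size x max_sen_len tensor of token indices, and input_lengths the per-sentence lengths):

    # Full activation dictionary, keyed by (layer, name) tuples.
    activations = model(input_ids=input_ids, input_lengths=input_lengths)

    for (layer, name), tensor in activations.items():
        # Each value is a (padded) tensor: batch_size x max_sen_len x nhid.
        print(layer, name, tuple(tensor.shape))

    # Only the top-layer hidden states, as a single tensor:
    top_hidden = model(input_ids=input_ids, only_return_top_embs=True)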

is_causal: bool = False
abstract nhid(activation_name: Tuple[int, str]) int[source]

Returns number of hidden units for a (layer, name) tuple.

abstract property num_layers: int

Returns the number of layers in the LM.

sizes: Dict[Tuple[int, str], int] = {}
abstract property top_layer: int

Returns the index of the LM’s top layer.

Recurrent LM

class diagnnose.models.recurrent_lm.RecurrentLM(device: str = 'cpu')[source]

Bases: LanguageModel

Base class for RNN LMs with intermediate activations.

This class contains all the base logic (including forward passes) for LSTM-type LMs, except for loading in the weights of a specific model.

activation_names(compute_out: bool = False) List[Tuple[int, str]][source]

Returns a list of all the model’s activation names.

Parameters

compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

Returns

activation_names – List of (layer, name) tuples.

Return type

ActivationNames

create_inputs_embeds(input_ids: Tensor) Tensor[source]

Transforms a sequence of input tokens to their embeddings.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

decode(hidden_state: Tensor) Tensor[source]
final_hidden(hidden: Dict[Tuple[int, str], Tensor]) Tensor[source]

Returns the final hidden state.

Parameters

hidden (ActivationTensors) – Dictionary of extracted activations.

Returns

final_hidden – Tensor of the final hidden state.

Return type

Tensor
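
A sketch of pulling out the final hidden state after a full forward pass (assuming lm is a concrete RecurrentLM subclass with input_ids and input_lengths prepared; passing the forward output straight into final_hidden is an assumption here):

    activations = lm(input_ids=input_ids, input_lengths=input_lengths)
    final = lm.final_hidden(activations)  # tensor of the final hidden state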

forget_offset: int = 0
forward(input_ids: Optional[Tensor] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[Tensor] = None, calc_causal_lm_probs: bool = False, compute_out: bool = False, only_return_top_embs: bool = False) Union[Dict[Tuple[int, str], Tensor], Tensor][source]

Performs a single forward pass across all LM layers.

Parameters
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len

  • inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how input_ids indices are converted into associated vectors than the model’s internal embedding lookup matrix provides. Also allows a ShapleyTensor to be provided, enabling feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid

  • input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed the same length. Size: batch_size,

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

  • calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.

  • only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to False.

Returns

activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid

Return type

ActivationDict

forward_cell(layer: int, input_: Tensor, prev_hx: Tensor, prev_cx: Tensor) Dict[Tuple[int, str], Tensor][source]

Performs the forward step of 1 LSTM cell.

Parameters
  • layer (int) – Current RNN layer.

  • input_ (Tensor) – Current input embedding. In higher layers this is the hidden state of the layer below, h^(l-1)_t. Size: batch_size x nhid

  • prev_hx (Tensor) – Previous hidden state. Size: batch_size x nhid

  • prev_cx (Tensor) – Previous cell state. Size: batch_size x nhid

Returns

all_activations – Dictionary mapping activation names to tensors of shape: batch_size x max_sen_len x nhid.

Return type

ActivationDict

forward_step(token_embeds: Tensor, prev_activations: Dict[Tuple[int, str], Tensor], compute_out: bool = False) Dict[Tuple[int, str], Tensor][source]

Performs a forward pass of one step across all layers.

Parameters
  • token_embeds (Tensor) – Tensor of word embeddings at the current sentence position.

  • prev_activations (ActivationDict) – Dict mapping the activation names of the previous hidden and cell states to their corresponding Tensors.

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

Returns

all_activations – Dictionary mapping activation names to tensors of shape: batch_size x max_sen_len x nhid.

Return type

ActivationDict

ih_concat_order: List[str] = ['h', 'i']
init_hidden(batch_size: int) Dict[Tuple[int, str], Tensor][source]

Creates a batch of initial states.

Parameters

batch_size (int) – Size of batch for which states are created.

Returns

init_states – Dictionary mapping hidden and cell state to init tensors.

Return type

ActivationTensors
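
Together, init_hidden and forward_step support a manual, step-by-step unroll. A minimal sketch (assuming lm is a concrete RecurrentLM subclass and that the output of forward_step can be fed back as the next prev_activations):

    token_embeds = lm.create_inputs_embeds(input_ids)  # batch_size x sen_len x nhid
    batch_size, sen_len = token_embeds.shape[:2]

    activations = lm.init_hidden(batch_size)
    for t in range(sen_len):
        # Feed the embedding at position t together with the previous states.
        activations = lm.forward_step(token_embeds[:, t], activations)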

init_states: Dict[Tuple[int, str], Tensor] = {}
is_causal: bool = True
nhid(activation_name: Tuple[int, str]) int[source]

Returns number of hidden units for a (layer, name) tuple.

If name is not one of emb, hx, or cx, the size of (layer, cx) is returned.
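
A small sketch that queries the hidden size for every activation name, combining activation_names and nhid as documented above:

    for layer, name in lm.activation_names():
        print((layer, name), lm.nhid((layer, name)))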

property num_layers: int

Returns the number of layers in the LM.

property output_size: int
set_init_states(pickle_path: Optional[str] = None, corpus_path: Optional[str] = None, use_default: bool = False, tokenizer: Optional[transformers.PreTrainedTokenizer] = None, save_init_states_to: Optional[str] = None) None[source]

Set up the initial LM states.

If no path is provided, zero-valued initial states will be used. Note that the loaded init states should provide tensors for hx and cx in all layers of the LM.

Note that pickle_path takes precedence over corpus_path in case both are provided.

Parameters
  • pickle_path (str, optional) – Path to a pickled file with initial LSTM states. If not provided, zero-valued init states will be created.

  • corpus_path (str, optional) – Path to corpus of which the final hidden state will be used as initial states.

  • use_default (bool) – Toggle to use the default initial sentence ". <eos>".

  • tokenizer (PreTrainedTokenizer, optional) – Tokenizer that must be provided when creating the init states from a corpus.

  • save_init_states_to (str, optional) – Path to which the newly computed init_states will be saved. If not provided these states won’t be dumped.

Returns

init_states – ActivationTensors containing the init states for each layer.

Return type

ActivationTensors
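
A sketch of the two documented ways to initialise the states (the file paths and tokenizer below are placeholders for illustration):

    # From a corpus: its final hidden state is used as the initial states.
    lm.set_init_states(
        corpus_path="init_corpus.txt",
        tokenizer=tokenizer,
        save_init_states_to="init_states.pickle",
    )

    # Or use the default initial sentence ". <eos>":
    lm.set_init_states(use_default=True, tokenizer=tokenizer)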

split_order: List[str]
property top_layer: int

Returns the index of the LM’s top layer.

use_char_embs: bool = False
use_peepholes: bool = False

Transformer LM

class diagnnose.models.transformer_lm.TransformerLM(embeddings_attr: Optional[str] = None, compute_pseudo_ll: bool = False, device: str = 'cpu', **kwargs)[source]

Bases: LanguageModel

Huggingface LM wrapper.

Parameters
  • transformer_type (str) – Transformer type that can be passed to auto_model.from_pretrained as one of the valid models made available by Huggingface, e.g. "roberta-base".

  • mode (str, optional) – Language model mode, one of "causal_lm", "masked_lm", "question_answering", "sequence_classification", or "token_classification". If not provided the model will be imported using AutoModel, which often yields an LM with no task-specific head on top.

  • embeddings_attr (str, optional) – Attribute name of the word embeddings of the model. Can be nested. For example, if the word embeddings are stored as a "wte" attribute that is part of the "encoder" attribute of the full model, you would pass "encoder.wte". For the following models this parameter does not need to be passed: "(distil)-(Ro)BERT(a)", "(distil)-gpt2", "XLM"

  • cache_dir (str, optional) – Path to the cache directory where the HF model weights will be stored.

  • compute_pseudo_ll (bool, optional) – Toggle to compute the Pseudo Log-Likelihood introduced by Salazar et al. (2020). This can be used to compute the sentence probabilities of bi-directional masked LMs, masking out one token at a time.

  • device (str, optional) – Torch device on which forward passes will be run. Defaults to cpu.
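
A minimal instantiation sketch; transformer_type and mode are forwarded via **kwargs as documented above, and the values used here are only illustrative:

    from diagnnose.models.transformer_lm import TransformerLM

    model = TransformerLM(
        transformer_type="distilgpt2",
        mode="causal_lm",
        device="cpu",
    )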

static activation_names() List[Tuple[int, str]][source]

Returns a list of all the model’s activation names.

Returns

activation_names – List of (layer, name) tuples.

Return type

ActivationNames

base_model(compute_out: bool)[source]
create_attention_mask(input_lengths: List[int]) Tensor[source]

Creates an attention mask as described in: https://huggingface.co/transformers/glossary.html#attention-mask

Parameters

input_lengths (List[int]) – List containing sentence lengths of each batch item.

Returns

attention_mask – Attention mask prescribing which items may be taken into account by the attention mechanism. Size: batch_size x max_sen_length

Return type

Tensor
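
For example, a padded batch with sentence lengths 5 and 3 (the 1/0 convention follows the Huggingface glossary linked above):

    attention_mask = model.create_attention_mask([5, 3])
    # Shape: batch_size x max_sen_length; positions beyond each sentence
    # length are masked out.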

create_inputs_embeds(input_ids: Union[Tensor, List[int]]) Tensor[source]

Transforms a sequence of input tokens to their embeddings.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

property decoder: Module
property embeddings: Callable[[Tensor], Tensor]
forward(input_ids: Optional[Union[Tensor, List[int]]] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[List[int]] = None, attention_mask: Optional[Union[Tensor, List[int]]] = None, compute_out: bool = True, calc_causal_lm_probs: bool = False, only_return_top_embs: bool = True, mask_idx: Optional[int] = None, selection_func: Optional[Callable[[int, Example], bool]] = None, batch: Optional[Batch] = None) Union[Dict[Tuple[int, str], Tensor], Tensor][source]

Performs a single forward pass across all LM layers.

Parameters
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len

  • inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how input_ids indices are converted into associated vectors than the model’s internal embedding lookup matrix provides. Also allows a ShapleyTensor to be provided, enabling feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid

  • input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed the same length. Size: batch_size,

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to True.

  • calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.

  • only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to True.

Returns

activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid

Return type

ActivationDict
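
A sketch of a full forward pass through the wrapped Huggingface model (tokenizer is assumed to be the matching transformers tokenizer for the loaded model):

    encoded = tokenizer("The cat sat on the mat .", return_tensors="pt")

    logits = model(
        input_ids=encoded["input_ids"],
        compute_out=True,            # include the final decoder projection
        only_return_top_embs=True,   # single tensor instead of an ActivationDict
    )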

load_model(*args, **kwargs)[source]
nhid(activation_name: Tuple[int, str]) int[source]

Returns number of hidden units for a (layer, name) tuple.

property num_layers: int

Returns the number of layers in the LM.

property top_layer: int

Returns the index of the LM’s top layer.

training: bool