models¶
Import Model¶
- diagnnose.models.import_model.import_model(*args, **kwargs) LanguageModel [source]¶
Import a model from a JSON file.
- Returns
model – A LanguageModel instance, based on the provided config_dict.
- Return type
Init States¶
Language Model¶
- class diagnnose.models.language_model.LanguageModel(device: str = 'cpu')[source]¶
Bases: ABC, Module
- abstract activation_names() List[Tuple[int, str]] [source]¶
Returns a list of all the model’s activation names.
- Returns
activation_names – List of (layer, name) tuples.
- Return type
ActivationNames
- abstract create_inputs_embeds(input_ids: Tensor) Tensor [source]¶
Transforms a sequence of input tokens to their embedding.
- Parameters
input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.
- Returns
inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.
- Return type
Tensor
- abstract forward(input_ids: Optional[Tensor] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[Tensor] = None, compute_out: bool = False, calc_causal_lm_probs: bool = False, only_return_top_embs: bool = False) Union[Dict[Tuple[int, str], Tensor], Tensor] [source]¶
Performs a single forward pass across all LM layers.
- Parameters
input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len
inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Also allows a ShapleyTensor to be provided, allowing feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid
input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed to have the same length. Size: batch_size
compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.
calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.
only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to False.
- Returns
activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid
- Return type
ActivationDict
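The calc_causal_lm_probs behaviour described above (next token inferred from the shifted input tokens) can be illustrated with a minimal sketch in plain Python. It assumes that a distribution over the vocabulary is already available for every position; the function name next_token_probs and the toy data are hypothetical and not part of diagnnose.

```python
from typing import List


def next_token_probs(probs: List[List[float]], input_ids: List[int]) -> List[float]:
    """For each position t, pick the probability assigned to token t+1.

    probs[t] is the (hypothetical) distribution predicted after reading
    input_ids[: t + 1]; the targets are the inputs shifted left by one.
    """
    targets = input_ids[1:]  # shifted input tokens
    return [probs[t][target] for t, target in enumerate(targets)]


# Toy example: vocabulary of 3 tokens, sentence of 3 tokens.
probs = [
    [0.1, 0.7, 0.2],    # distribution after token 0
    [0.3, 0.3, 0.4],    # distribution after token 1
    [0.5, 0.25, 0.25],  # distribution after token 2 (no target, unused)
]
input_ids = [0, 1, 2]
print(next_token_probs(probs, input_ids))
```

The last position has no following token, so a sentence of n tokens yields n - 1 next-token probabilities in this sketch.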
- is_causal: bool = False¶
- abstract nhid(activation_name: Tuple[int, str]) int [source]¶
Returns number of hidden units for a (layer, name) tuple.
- abstract property num_layers: int¶
Returns the number of layers in the LM.
- sizes: Dict[Tuple[int, str], int] = {}¶
- abstract property top_layer: int¶
Returns the index of the LM’s top layer.
Recurrent LM¶
- class diagnnose.models.recurrent_lm.RecurrentLM(device: str = 'cpu')[source]¶
Bases: LanguageModel
Base class for RNN LM with intermediate activations.
This class contains all the base logic (including forward passes) for LSTM-type LMs, except for loading in the weights of a specific model.
- activation_names(compute_out: bool = False) List[Tuple[int, str]] [source]¶
Returns a list of all the model’s activation names.
- Parameters
compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.
- Returns
activation_names – List of (layer, name) tuples.
- Return type
ActivationNames
- create_inputs_embeds(input_ids: Tensor) Tensor [source]¶
Transforms a sequence of input tokens to their embedding.
- Parameters
input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.
- Returns
inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.
- Return type
Tensor
Returns the final hidden state.
- Parameters
hidden (ActivationTensors) – Dictionary of extracted activations.
- Returns
final_hidden – Tensor of the final hidden state.
- Return type
Tensor
- forget_offset: int = 0¶
- forward(input_ids: Optional[Tensor] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[Tensor] = None, calc_causal_lm_probs: bool = False, compute_out: bool = False, only_return_top_embs: bool = False) Union[Dict[Tuple[int, str], Tensor], Tensor] [source]¶
Performs a single forward pass across all LM layers.
- Parameters
input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len
inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Also allows a ShapleyTensor to be provided, allowing feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid
input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed to have the same length. Size: batch_size
compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.
calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.
only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to False.
- Returns
activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid
- Return type
ActivationDict
- forward_cell(layer: int, input_: Tensor, prev_hx: Tensor, prev_cx: Tensor) Dict[Tuple[int, str], Tensor] [source]¶
Performs the forward step of 1 LSTM cell.
- Parameters
layer (int) – Current RNN layer.
input_ (Tensor) – Current input embedding. In higher layers this is h^{l-1}_t, the hidden state of the layer below at the current step. Size: batch_size x nhid
prev_hx (Tensor) – Previous hidden state. Size: batch_size x nhid
prev_cx (Tensor) – Previous cell state. Size: batch_size x nhid
- Returns
all_activations – Dictionary mapping activation names to tensors of shape: batch_size x max_sen_len x nhid.
- Return type
ActivationDict
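As an illustration of what a single LSTM cell step computes, here is a minimal sketch using the standard LSTM equations in plain Python for one batch item. The gate ordering inside the weight matrix and the helper names are assumptions made for this example; the concatenation order (hidden state first, then input) follows the ih_concat_order default listed for this class, but the actual diagnnose implementation may differ.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell(
    input_: List[float],
    prev_hx: List[float],
    prev_cx: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> Tuple[List[float], List[float]]:
    """One LSTM step for a single batch item.

    weights has 4 * nhid rows (assumed gate order: input, forget,
    candidate, output) and len(prev_hx) + len(input_) columns.
    """
    nhid = len(prev_hx)
    concat = prev_hx + input_  # assumed order: hidden state first, then input
    proj = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(weights, bias)
    ]
    i_g = [sigmoid(p) for p in proj[0:nhid]]
    f_g = [sigmoid(p) for p in proj[nhid : 2 * nhid]]
    c_tilde = [math.tanh(p) for p in proj[2 * nhid : 3 * nhid]]
    o_g = [sigmoid(p) for p in proj[3 * nhid : 4 * nhid]]
    # New cell state: keep part of the old state, add part of the candidate.
    cx = [f * c + i * ct for f, c, i, ct in zip(f_g, prev_cx, i_g, c_tilde)]
    # New hidden state: gated, squashed cell state.
    hx = [o * math.tanh(c) for o, c in zip(o_g, cx)]
    return hx, cx


# With all-zero weights every gate equals 0.5 and the candidate equals 0.
nhid, emb_size = 2, 3
zero_w = [[0.0] * (nhid + emb_size) for _ in range(4 * nhid)]
zero_b = [0.0] * (4 * nhid)
hx, cx = lstm_cell([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], zero_w, zero_b)
print(hx, cx)
```

In the zero-weight case the new cell state is simply half of the previous one, which makes the gating arithmetic easy to check by hand.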
- forward_step(token_embeds: Tensor, prev_activations: Dict[Tuple[int, str], Tensor], compute_out: bool = False) Dict[Tuple[int, str], Tensor] [source]¶
Performs a forward pass of one step across all layers.
- Parameters
token_embeds (Tensor) – Tensor of word embeddings at the current sentence position.
prev_activations (ActivationDict) – Dict mapping the activation names of the previous hidden and cell states to their corresponding Tensors.
compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.
- Returns
all_activations – Dictionary mapping activation names to tensors of shape: batch_size x max_sen_len x nhid.
- Return type
ActivationDict
- ih_concat_order: List[str] = ['h', 'i']¶
Creates a batch of initial states.
- Parameters
batch_size (int) – Size of batch for which states are created.
- Returns
init_states – Dictionary mapping hidden and cell state to init tensors.
- Return type
ActivationTensors
- init_states: Dict[Tuple[int, str], Tensor] = {}¶
- is_causal: bool = True¶
- nhid(activation_name: Tuple[int, str]) int [source]¶
Returns number of hidden units for a (layer, name) tuple.
If name != emb/hx/cx returns the size of (layer, cx).
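The fallback rule can be sketched as a lookup in a sizes dictionary keyed by (layer, name). This is a minimal illustration, assuming such a dictionary holds the emb, hx and cx sizes; the free function and the gate name "f_g" are hypothetical.

```python
from typing import Dict, Tuple

ActivationName = Tuple[int, str]


def nhid(sizes: Dict[ActivationName, int], activation_name: ActivationName) -> int:
    layer, name = activation_name
    if name in ("emb", "hx", "cx"):
        return sizes[(layer, name)]
    # Any other name (e.g. a gate) falls back to the cell-state size.
    return sizes[(layer, "cx")]


# Toy sizes; hx and cx may differ, for instance with a projected hidden state.
sizes = {
    (0, "emb"): 650,
    (0, "hx"): 650,
    (0, "cx"): 650,
    (1, "hx"): 200,
    (1, "cx"): 1024,
}
print(nhid(sizes, (1, "hx")), nhid(sizes, (1, "f_g")))
```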
- property num_layers: int¶
Returns the number of layers in the LM.
- property output_size: int¶
- set_init_states(pickle_path: Optional[str] = None, corpus_path: Optional[str] = None, use_default: bool = False, tokenizer: Optional[transformers.PreTrainedTokenizer] = None, save_init_states_to: Optional[str] = None) None [source]¶
Set up the initial LM states.
If no path is provided, zero-valued init states will be used. Note that the loaded init states should provide tensors for hx and cx in all layers of the LM.
Note that pickle_path takes precedence over corpus_path in case both are provided.
- Parameters
pickle_path (str, optional) – Path to pickled file with initial LSTM states. If not provided zero-valued init states will be created.
corpus_path (str, optional) – Path to corpus of which the final hidden state will be used as initial states.
use_default (bool) – Toggle to use the default initial sentence ". <eos>".
tokenizer (PreTrainedTokenizer, optional) – Tokenizer that must be provided when creating the init states from a corpus.
save_init_states_to (str, optional) – Path to which the newly computed init_states will be saved. If not provided these states won’t be dumped.
- Returns
init_states – ActivationTensors containing the init states for each layer.
- Return type
ActivationTensors
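The documented precedence (pickle file first, then corpus, otherwise zeros) and the shape of zero-valued init states can be sketched as follows. This is a simplified illustration in plain Python: the helper names init_source and zero_init_states are hypothetical, plain lists stand in for tensors, and the use_default option is left out because its position in the precedence is not specified here.

```python
from typing import Dict, List, Optional, Tuple

ActivationName = Tuple[int, str]


def init_source(
    pickle_path: Optional[str] = None, corpus_path: Optional[str] = None
) -> str:
    """Mirror the documented precedence: pickle first, then corpus, else zeros."""
    if pickle_path is not None:
        return "pickle"
    if corpus_path is not None:
        return "corpus"
    return "zeros"


def zero_init_states(num_layers: int, nhid: int) -> Dict[ActivationName, List[float]]:
    """Zero-valued hx and cx vectors for every layer (lists stand in for tensors)."""
    return {
        (layer, name): [0.0] * nhid
        for layer in range(num_layers)
        for name in ("hx", "cx")
    }


print(init_source("states.pickle", "corpus.txt"))
print(sorted(zero_init_states(num_layers=2, nhid=3)))
```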
- split_order: List[str]¶
- property top_layer: int¶
Returns the index of the LM’s top layer.
- use_char_embs: bool = False¶
- use_peepholes: bool = False¶
Transformer LM¶
- class diagnnose.models.transformer_lm.TransformerLM(embeddings_attr: Optional[str] = None, compute_pseudo_ll: bool = False, device: str = 'cpu', **kwargs)[source]¶
Bases: LanguageModel
Huggingface LM wrapper.
- Parameters
transformer_type (str) – Transformer type that can be passed to auto_model.from_pretrained as one of the valid models made available by Huggingface, e.g. "roberta-base".
mode (str, optional) – Language model mode, one of "causal_lm", "masked_lm", "question_answering", "sequence_classification", or "token_classification". If not provided the model will be imported using AutoModel, which often yields an LM with no task-specific head on top.
embeddings_attr (str, optional) – Attribute name of the word embeddings of the model. Can be nested. For example, if the word embeddings are stored as a "wte" attribute that is part of the "encoder" attribute of the full model, you would pass "encoder.wte". For the following models this parameter does not need to be passed: "(distil)-(Ro)BERT(a)", "(distil)-gpt2", "XLM".
cache_dir (str, optional) – Path towards the cache directory where the HF model weights will be stored.
compute_pseudo_ll (bool, optional) – Toggle to compute the Pseudo Log-Likelihood that was introduced by Salazar et al. (2020). This can be used to compute the sentence probabilities of bi-directional masked LMs, masking out one token at a time.
device (str, optional) – Torch device on which forward passes will be run. Defaults to cpu.
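A nested embeddings_attr value such as "encoder.wte" amounts to following a dotted attribute path on the model object. The sketch below shows one common way to do this with the standard library; the helper resolve_attr and the stand-in model are hypothetical, and this is not necessarily how diagnnose resolves the attribute internally.

```python
from functools import reduce
from types import SimpleNamespace
from typing import Any


def resolve_attr(obj: Any, dotted_path: str) -> Any:
    """Follow a dotted attribute path such as "encoder.wte"."""
    return reduce(getattr, dotted_path.split("."), obj)


# Hypothetical stand-in for a model whose embeddings live at model.encoder.wte.
model = SimpleNamespace(encoder=SimpleNamespace(wte="embedding-matrix"))
print(resolve_attr(model, "encoder.wte"))
```

A path with a misspelled component raises AttributeError at the first missing attribute, which makes an incorrect embeddings_attr easy to spot.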
- static activation_names() List[Tuple[int, str]] [source]¶
Returns a list of all the model’s activation names.
- Returns
activation_names – List of (layer, name) tuples.
- Return type
ActivationNames
- create_attention_mask(input_lengths: List[int]) Tensor [source]¶
Creates an attention mask as described in: https://huggingface.co/transformers/glossary.html#attention-mask
- Parameters
input_lengths (List[int]) – List containing sentence lengths of each batch item.
- Returns
attention_mask – Attention mask prescribing which items may be taken into account by the attention mechanism. Size: batch_size x max_sen_length
- Return type
Tensor
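The mask layout can be illustrated with a minimal sketch in plain Python: 1 marks a real token and 0 marks padding. Nested lists stand in for the tensor, and right-padding to the longest item is an assumption of this example.

```python
from typing import List


def create_attention_mask(input_lengths: List[int]) -> List[List[int]]:
    """1 marks a real token, 0 marks padding; rows are padded to the longest item."""
    max_sen_len = max(input_lengths)
    return [
        [1] * length + [0] * (max_sen_len - length)
        for length in input_lengths
    ]


print(create_attention_mask([3, 1, 2]))
```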
- create_inputs_embeds(input_ids: Union[Tensor, List[int]]) Tensor [source]¶
Transforms a sequence of input tokens to their embedding.
- Parameters
input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.
- Returns
inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.
- Return type
Tensor
- property decoder: Module¶
- property embeddings: Callable[[Tensor], Tensor]¶
- forward(input_ids: Optional[Union[Tensor, List[int]]] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[List[int]] = None, attention_mask: Optional[Union[Tensor, List[int]]] = None, compute_out: bool = True, calc_causal_lm_probs: bool = False, only_return_top_embs: bool = True, mask_idx: Optional[int] = None, selection_func: Optional[Callable[[int, Example], bool]] = None, batch: Optional[Batch] = None) Union[Dict[Tuple[int, str], Tensor], Tensor] [source]¶
Performs a single forward pass across all LM layers.
- Parameters
input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len
inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Also allows a ShapleyTensor to be provided, allowing feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid
input_lengths (List[int], optional) – List of the sentence lengths of each batch item. If not provided, all items are assumed to have the same length. Size: batch_size
compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to True.
calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.
only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to True.
- Returns
activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid
- Return type
ActivationDict
- nhid(activation_name: Tuple[int, str]) int [source]¶
Returns number of hidden units for a (layer, name) tuple.
- property num_layers: int¶
Returns the number of layers in the LM.
- property top_layer: int¶
Returns the index of the LM’s top layer.
- training: bool¶