models

Import Model

diagnnose.models.import_model.import_model(*args, **kwargs) LanguageModel[source]

Import a model from a JSON file.

Returns

model – A LanguageModel instance, based on the provided config_dict.

Return type

LanguageModel
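
For illustration, a minimal usage sketch. The function only exposes *args and **kwargs, so the config layout below is an assumption based on the TransformerLM parameters documented further down, not a verified schema:

    from diagnnose.models.import_model import import_model

    # Hypothetical config layout; the exact keys accepted by import_model
    # are not verified here.
    config_dict = {
        "model": {
            "transformer_type": "distilgpt2",
            "mode": "causal_lm",
            "device": "cpu",
        }
    }

    model = import_model(config_dict)  # returns a LanguageModel instance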

Init States

Language Model

class diagnnose.models.language_model.LanguageModel(device: str = 'cpu')[source]

Bases: ABC, Module

abstract activation_names() List[Tuple[int, str]][source]

Returns a list of all the model’s activation names.

Returns

activation_names – List of (layer, name) tuples.

Return type

ActivationNames

abstract create_inputs_embeds(input_ids: Tensor) Tensor[source]

Transforms a sequence of input tokens to their embeddings.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

abstract forward(input_ids: Optional[Tensor] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[Tensor] = None, compute_out: bool = False, calc_causal_lm_probs: bool = False, only_return_top_embs: bool = False) Union[Dict[Tuple[int, str], Tensor], Tensor][source]

Performs a single forward pass across all LM layers.

Parameters
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len

  • inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how input_ids indices are converted into associated vectors than the model’s internal embedding lookup matrix provides. Also allows a ShapleyTensor to be provided, enabling feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid

  • input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed the same length. Size: batch_size,

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

  • calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.

  • only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to False.

Returns

activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid

Return type

ActivationDict
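
A sketch of consuming the returned ActivationDict (assuming model is a concrete LanguageModel subclass, input_ids a batch_size x max_sen_len tensor of token indices, and input_lengths the per-sentence lengths):

    # Full activation dictionary, keyed by (layer, name) tuples.
    activations = model(input_ids=input_ids, input_lengths=input_lengths)

    for (layer, name), tensor in activations.items():
        # Each value is a (padded) tensor: batch_size x max_sen_len x nhid.
        print(layer, name, tuple(tensor.shape))

    # Only the top-layer hidden states, as a single tensor:
    top_hidden = model(input_ids=input_ids, only_return_top_embs=True)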

is_causal: bool = False
abstract nhid(activation_name: Tuple[int, str]) int[source]

Returns number of hidden units for a (layer, name) tuple.

abstract property num_layers: int

Returns the number of layers in the LM.

sizes: Dict[Tuple[int, str], int] = {}
abstract property top_layer: int

Returns the index of the LM’s top layer.

Recurrent LM

class diagnnose.models.recurrent_lm.RecurrentLM(device: str = 'cpu')[source]

Bases: LanguageModel

Base class for RNN LMs with intermediate activations.

This class contains all the base logic (including forward passes) for LSTM-type LMs, except for loading in the weights of a specific model.

activation_names(compute_out: bool = False) List[Tuple[int, str]][source]

Returns a list of all the model’s activation names.

Parameters

compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

Returns

activation_names – List of (layer, name) tuples.

Return type

ActivationNames

create_inputs_embeds(input_ids: Tensor) Tensor[source]

Transforms a sequence of input tokens to their embeddings.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

decode(hidden_state: Tensor) Tensor[source]
final_hidden(hidden: Dict[Tuple[int, str], Tensor]) Tensor[source]

Returns the final hidden state.

Parameters

hidden (ActivationTensors) – Dictionary of extracted activations.

Returns

final_hidden – Tensor of the final hidden state.

Return type

Tensor
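
A sketch of pulling out the final hidden state after a full forward pass (assuming lm is a concrete RecurrentLM subclass with input_ids and input_lengths prepared; passing the forward output straight into final_hidden is an assumption here):

    activations = lm(input_ids=input_ids, input_lengths=input_lengths)
    final = lm.final_hidden(activations)  # tensor of the final hidden state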

forget_offset: int = 0
forward(input_ids: Optional[Tensor] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[Tensor] = None, calc_causal_lm_probs: bool = False, compute_out: bool = False, only_return_top_embs: bool = False) Union[Dict[Tuple[int, str], Tensor], Tensor][source]

Performs a single forward pass across all LM layers.

Parameters
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len

  • inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how input_ids indices are converted into associated vectors than the model’s internal embedding lookup matrix provides. Also allows a ShapleyTensor to be provided, enabling feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid

  • input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed the same length. Size: batch_size,

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

  • calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.

  • only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to False.

Returns

activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid

Return type

ActivationDict

forward_cell(layer: int, input_: Tensor, prev_hx: Tensor, prev_cx: Tensor) Dict[Tuple[int, str], Tensor][source]

Performs the forward step of 1 LSTM cell.

Parameters
  • layer (int) – Current RNN layer.

  • input_ (Tensor) – Current input embedding. In higher layers this is the hidden state of the layer below, h^(l-1)_t. Size: batch_size x nhid

  • prev_hx (Tensor) – Previous hidden state. Size: batch_size x nhid

  • prev_cx (Tensor) – Previous cell state. Size: batch_size x nhid

Returns

all_activations – Dictionary mapping activation names to tensors of shape: batch_size x max_sen_len x nhid.

Return type

ActivationDict

forward_step(token_embeds: Tensor, prev_activations: Dict[Tuple[int, str], Tensor], compute_out: bool = False) Dict[Tuple[int, str], Tensor][source]

Performs a forward pass of one step across all layers.

Parameters
  • token_embeds (Tensor) – Tensor of word embeddings at the current sentence position.

  • prev_activations (ActivationDict) – Dict mapping the activation names of the previous hidden and cell states to their corresponding Tensors.

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to False.

Returns

all_activations – Dictionary mapping activation names to tensors of shape: batch_size x max_sen_len x nhid.

Return type

ActivationDict

ih_concat_order: List[str] = ['h', 'i']
init_hidden(batch_size: int) Dict[Tuple[int, str], Tensor][source]

Creates a batch of initial states.

Parameters

batch_size (int) – Size of batch for which states are created.

Returns

init_states – Dictionary mapping hidden and cell state to init tensors.

Return type

ActivationTensors
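
Together, init_hidden and forward_step support a manual, step-by-step unroll. A minimal sketch (assuming lm is a concrete RecurrentLM subclass and that the output of forward_step can be fed back as the next prev_activations):

    token_embeds = lm.create_inputs_embeds(input_ids)  # batch_size x sen_len x nhid
    batch_size, sen_len = token_embeds.shape[:2]

    activations = lm.init_hidden(batch_size)
    for t in range(sen_len):
        # Feed the embedding at position t together with the previous states.
        activations = lm.forward_step(token_embeds[:, t], activations)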

init_states: Dict[Tuple[int, str], Tensor] = {}
is_causal: bool = True
nhid(activation_name: Tuple[int, str]) int[source]

Returns number of hidden units for a (layer, name) tuple.

If name is not one of emb, hx, or cx, the size of (layer, cx) is returned.
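
A small sketch that queries the hidden size for every activation name, combining activation_names and nhid as documented above:

    for layer, name in lm.activation_names():
        print((layer, name), lm.nhid((layer, name)))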

property num_layers: int

Returns the number of layers in the LM.

property output_size: int
set_init_states(pickle_path: Optional[str] = None, corpus_path: Optional[str] = None, use_default: bool = False, tokenizer: Optional[transformers.PreTrainedTokenizer] = None, save_init_states_to: Optional[str] = None) None[source]

Set up the initial LM states.

If no path is provided, zero-valued initial states will be used. Note that the loaded init states should provide tensors for hx and cx in all layers of the LM.

Note that pickle_path takes precedence over corpus_path in case both are provided.

Parameters
  • pickle_path (str, optional) – Path to a pickled file with initial LSTM states. If not provided, zero-valued init states will be created.

  • corpus_path (str, optional) – Path to corpus of which the final hidden state will be used as initial states.

  • use_default (bool) – Toggle to use the default initial sentence ". <eos>".

  • tokenizer (PreTrainedTokenizer, optional) – Tokenizer that must be provided when creating the init states from a corpus.

  • save_init_states_to (str, optional) – Path to which the newly computed init_states will be saved. If not provided these states won’t be dumped.

Returns

init_states – ActivationTensors containing the init states for each layer.

Return type

ActivationTensors
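
A sketch of the two documented ways to initialise the states (the file paths and tokenizer below are placeholders for illustration):

    # From a corpus: its final hidden state is used as the initial states.
    lm.set_init_states(
        corpus_path="init_corpus.txt",
        tokenizer=tokenizer,
        save_init_states_to="init_states.pickle",
    )

    # Or use the default initial sentence ". <eos>":
    lm.set_init_states(use_default=True, tokenizer=tokenizer)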

split_order: List[str]
property top_layer: int

Returns the index of the LM’s top layer.

use_char_embs: bool = False
use_peepholes: bool = False

Transformer LM

class diagnnose.models.transformer_lm.TransformerLM(embeddings_attr: Optional[str] = None, compute_pseudo_ll: bool = False, device: str = 'cpu', **kwargs)[source]

Bases: LanguageModel

Huggingface LM wrapper.

Parameters
  • transformer_type (str) – Transformer type that can be passed to auto_model.from_pretrained as one of the valid models made available by Huggingface, e.g. "roberta-base".

  • mode (str, optional) – Language model mode, one of "causal_lm", "masked_lm", "question_answering", "sequence_classification", or "token_classification". If not provided the model will be imported using AutoModel, which often yields an LM with no task-specific head on top.

  • embeddings_attr (str, optional) – Attribute name of the word embeddings of the model. Can be nested. For example, if the word embeddings are stored as a "wte" attribute that is part of the "encoder" attribute of the full model, you would pass "encoder.wte". For the following models this parameter does not need to be passed: "(distil)-(Ro)BERT(a)", "(distil)-gpt2", "XLM"

  • cache_dir (str, optional) – Path to the cache directory where the HF model weights will be stored.

  • compute_pseudo_ll (bool, optional) – Toggle to compute the Pseudo Log-Likelihood introduced by Salazar et al. (2020). This can be used to compute the sentence probabilities of bi-directional masked LMs, masking out one token at a time.

  • device (str, optional) – Torch device on which forward passes will be run. Defaults to cpu.
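
A minimal instantiation sketch; transformer_type and mode are forwarded via **kwargs as documented above, and the values used here are only illustrative:

    from diagnnose.models.transformer_lm import TransformerLM

    model = TransformerLM(
        transformer_type="distilgpt2",
        mode="causal_lm",
        device="cpu",
    )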

static activation_names() List[Tuple[int, str]][source]

Returns a list of all the model’s activation names.

Returns

activation_names – List of (layer, name) tuples.

Return type

ActivationNames

base_model(compute_out: bool)[source]
create_attention_mask(input_lengths: List[int]) Tensor[source]

Creates an attention mask as described in: https://huggingface.co/transformers/glossary.html#attention-mask

Parameters

input_lengths (List[int]) – List containing sentence lengths of each batch item.

Returns

attention_mask – Attention mask prescribing which items may be taken into account by the attention mechanism. Size: batch_size x max_sen_length

Return type

Tensor
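
For example, a padded batch with sentence lengths 5 and 3 (the 1/0 convention follows the Huggingface glossary linked above):

    attention_mask = model.create_attention_mask([5, 3])
    # Shape: batch_size x max_sen_length; positions beyond each sentence
    # length are masked out.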

create_inputs_embeds(input_ids: Union[Tensor, List[int]]) Tensor[source]

Transforms a sequence of input tokens to their embeddings.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

property decoder: Module
property embeddings: Callable[[Tensor], Tensor]
forward(input_ids: Optional[Union[Tensor, List[int]]] = None, inputs_embeds: Optional[Union[Tensor, ShapleyTensor]] = None, input_lengths: Optional[List[int]] = None, attention_mask: Optional[Union[Tensor, List[int]]] = None, compute_out: bool = True, calc_causal_lm_probs: bool = False, only_return_top_embs: bool = True, mask_idx: Optional[int] = None, selection_func: Optional[Callable[[int, Example], bool]] = None, batch: Optional[Batch] = None) Union[Dict[Tuple[int, str], Tensor], Tensor][source]

Performs a single forward pass across all LM layers.

Parameters
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. Size: batch_size x max_sen_len

  • inputs_embeds (Tensor | ShapleyTensor, optional) – This is useful if you want more control over how input_ids indices are converted into associated vectors than the model’s internal embedding lookup matrix provides. Also allows a ShapleyTensor to be provided, enabling feature contributions to be tracked during a forward pass. Size: batch_size x max_sen_len x nhid

  • input_lengths (Tensor, optional) – Tensor of the sentence lengths of each batch item. If not provided, all items are assumed the same length. Size: batch_size,

  • compute_out (bool, optional) – Toggles the computation of the final decoder projection. If set to False this projection is not calculated. Defaults to True.

  • calc_causal_lm_probs (bool, optional) – Toggle to directly compute the output probabilities of the next token within the forward pass. Next token is inferred based on the shifted input tokens. Should only be toggled for Causal (a.k.a. auto-regressive) LMs. Defaults to False.

  • only_return_top_embs (bool, optional) – Toggle to only return the tensor of the hidden state of the top layer of the network, instead of the full activation dictionary. Defaults to True.

Returns

activations – Dictionary mapping an activation name to a (padded) tensor. Size: a_name -> batch_size x max_sen_len x nhid

Return type

ActivationDict
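
A sketch of a full forward pass through the wrapped Huggingface model (tokenizer is assumed to be the matching transformers tokenizer for the loaded model):

    encoded = tokenizer("The cat sat on the mat .", return_tensors="pt")

    logits = model(
        input_ids=encoded["input_ids"],
        compute_out=True,            # include the final decoder projection
        only_return_top_embs=True,   # single tensor instead of an ActivationDict
    )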

load_model(*args, **kwargs)[source]
nhid(activation_name: Tuple[int, str]) int[source]

Returns number of hidden units for a (layer, name) tuple.

property num_layers: int

Returns the number of layers in the LM.

property top_layer: int

Returns the index of the LM’s top layer.

training: bool