models.wrappers

Model wrappers define the behaviour of a specific model architecture.

These wrapper classes should inherit from LanguageModel, or one of its subclasses, and adhere to the forward method signature that it defines.
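The contract can be sketched with stand-in classes. Note that LanguageModel below is a minimal hypothetical stub for illustration, not the actual diagnnose base class:

```python
# Illustrative sketch only: `LanguageModel` is a stand-in stub,
# not the real diagnnose.models.LanguageModel.
from typing import Any


class LanguageModel:
    """Stub base class: wrappers must implement this forward signature."""

    def forward(self, input_ids: Any, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError


class MyWrapper(LanguageModel):
    """A hypothetical wrapper adhering to the base forward signature."""

    def forward(self, input_ids: Any, *args: Any, **kwargs: Any) -> Any:
        # A real wrapper would run the underlying model here; we just
        # echo the input to keep the sketch self-contained.
        return input_ids
```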

AWD LSTM

class diagnnose.models.wrappers.awd_lstm.AWDLSTM(*args: Any, **kwargs: Any)[source]

Bases: ForwardLSTM

bias: Dict[int, Tensor]
decoder_b: Optional[Tensor]
decoder_w: Optional[Tensor]
static param_names(layer: int, rnn_name: str, no_suffix: bool = False, **kwargs) → Dict[str, str][source]

Creates a dictionary of parameter names in a state_dict.

Parameters
  • layer (int) – Current layer index.

  • rnn_name (str) – Name of the rnn in the model state_dict. Defaults to rnn.

  • no_suffix (bool, optional) – Toggle to omit the _l{layer} suffix from a parameter name. 1-layer RNNs do not have this suffix. Defaults to False.

Returns

param_names – Dictionary mapping a general parameter name to the model specific parameter name.

Return type

Dict[str, str]

peepholes: ActivationDict
training: bool
weight: Dict[int, Tensor]
weight_P: Dict[int, Tensor]

Forward LSTM

class diagnnose.models.wrappers.forward_lstm.ForwardLSTM(state_dict: str, device: str = 'cpu', rnn_name: str = 'rnn', encoder_name: str = 'encoder', decoder_name: str = 'decoder')[source]

Bases: RecurrentLM

Defines a default uni-directional n-layer LSTM.

Allows for extraction of intermediate states and gate activations.

Parameters
  • state_dict (str) – Path to torch pickle containing the model parameter state dict.

  • device (str, optional) – Torch device on which forward passes will be run. Defaults to cpu.

  • rnn_name (str, optional) – Name of the rnn in the model state_dict. Defaults to rnn.

  • encoder_name (str, optional) – Name of the embedding encoder in the model state_dict. Defaults to encoder.

  • decoder_name (str, optional) – Name of the linear decoder in the model state_dict. Defaults to decoder.

bias: Dict[int, Tensor]
decoder_b: Optional[Tensor]
decoder_w: Optional[Tensor]
ih_concat_order: List[str] = ['h', 'i']
static param_names(layer: int, rnn_name: str, no_suffix: bool = False) → Dict[str, str][source]

Creates a dictionary of parameter names in a state_dict.

Parameters
  • layer (int) – Current layer index.

  • rnn_name (str) – Name of the rnn in the model state_dict. Defaults to rnn.

  • no_suffix (bool, optional) – Toggle to omit the _l{layer} suffix from a parameter name. 1-layer RNNs do not have this suffix. Defaults to False.

Returns

param_names – Dictionary mapping a general parameter name to the model specific parameter name.

Return type

Dict[str, str]
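The kind of mapping param_names produces can be sketched with PyTorch's standard LSTM parameter naming scheme. The exact keys diagnnose emits may differ; this is an assumption for illustration only:

```python
from typing import Dict


def param_names_sketch(
    layer: int, rnn_name: str = "rnn", no_suffix: bool = False
) -> Dict[str, str]:
    """Sketch of a general-name -> model-specific-name mapping,
    following PyTorch's LSTM state_dict naming convention."""
    # 1-layer RNNs omit the `_l{layer}` suffix.
    suffix = "" if no_suffix else f"_l{layer}"
    return {
        "weight_ih": f"{rnn_name}.weight_ih{suffix}",
        "weight_hh": f"{rnn_name}.weight_hh{suffix}",
        "bias_ih": f"{rnn_name}.bias_ih{suffix}",
        "bias_hh": f"{rnn_name}.bias_hh{suffix}",
    }
```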

peepholes: ActivationDict
split_order: List[str] = ['i', 'f', 'g', 'o']
training: bool
weight: Dict[int, Tensor]
weight_P: Dict[int, Tensor]
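split_order describes the layout of the concatenated gate parameters (input, forget, cell, output for this class). Slicing by that order can be sketched in pure Python, as a stand-in for the actual tensor slicing:

```python
from typing import Dict, List, Sequence


def split_gates(
    concat: Sequence[float], split_order: List[str]
) -> Dict[str, Sequence[float]]:
    """Slice a concatenated gate vector of size len(split_order) * nhid
    into per-gate chunks, in the order given by `split_order`."""
    nhid = len(concat) // len(split_order)
    return {
        gate: concat[i * nhid : (i + 1) * nhid]
        for i, gate in enumerate(split_order)
    }


# ForwardLSTM lays its gates out as input, forget, cell, output:
gates = split_gates(list(range(8)), ["i", "f", "g", "o"])
```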

Google LM

class diagnnose.models.wrappers.google_lm.CharCNN(ckpt_dir: str, vocab: C2I, device: str)[source]

Bases: Module

forward(input_ids: Tensor) → Tensor[source]

Fetches the character-CNN embeddings of a batch.

Parameters

input_ids ((batch_size, max_sen_len)) –

Returns

inputs_embeds

Return type

(batch_size, max_sen_len, emb_dim)
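The shape contract — (batch_size, max_sen_len) token ids in, (batch_size, max_sen_len, emb_dim) embeddings out — can be sketched with a plain lookup table. The real CharCNN computes character-level convolutions instead; this is only an illustration of the shapes:

```python
from typing import Dict, List


def embed(
    input_ids: List[List[int]], table: Dict[int, List[float]]
) -> List[List[List[float]]]:
    # Maps each token id to its embedding vector:
    # (batch_size, max_sen_len) -> (batch_size, max_sen_len, emb_dim)
    return [[table[i] for i in sent] for sent in input_ids]


table = {0: [0.0, 0.0], 1: [1.0, 0.5]}
inputs_embeds = embed([[0, 1], [1, 1]], table)
```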

training: bool
class diagnnose.models.wrappers.google_lm.GoogleLM(ckpt_dir: str, corpus_vocab_path: Optional[Union[str, List[str]]] = None, create_decoder: bool = True, device: str = 'cpu')[source]

Bases: RecurrentLM

Reimplementation of the LM of Jozefowicz et al. (2016).

Paper: https://arxiv.org/abs/1602.02410
Lib: https://github.com/tensorflow/models/tree/master/research/lm_1b

This implementation allows for only a subset of the SoftMax to be loaded in, to alleviate RAM usage.

Parameters
  • ckpt_dir (str) – Path to folder containing parameter checkpoint files.

  • corpus_vocab_path (str or List[str], optional) – Path to the corpus for which a vocabulary will be created. This allows for only a subset of the model softmax to be loaded in.

  • create_decoder (bool) – Toggle to load in the (partial) softmax weights. Can be set to False when no decoding projection is needed, for example during activation extraction.
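The partial-SoftMax idea can be sketched as selecting only the decoder rows for tokens that actually occur in the corpus vocabulary. This is an illustration of the memory-saving principle, not the actual checkpoint-loading code:

```python
from typing import Dict, List


def select_softmax_rows(
    full_w: List[List[float]],
    full_vocab: Dict[str, int],
    corpus_vocab: List[str],
) -> List[List[float]]:
    """Keep only the decoder rows for the corpus vocabulary, reducing the
    (vocab_size, nhid) weight matrix to (len(corpus_vocab), nhid)."""
    return [full_w[full_vocab[w]] for w in corpus_vocab]


full_w = [[0.0], [1.0], [2.0], [3.0]]
full_vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
partial_w = select_softmax_rows(full_w, full_vocab, ["b", "d"])
```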

bias: Dict[int, Tensor]
create_inputs_embeds(input_ids: Tensor) → Tensor[source]

Transforms a sequence of input tokens to their embedding.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

decode(hidden_state: Tensor) → Tensor[source]
decoder_b: Optional[Tensor]
decoder_w: Optional[Tensor]
forget_offset: int = 1
ih_concat_order: List[str] = ['i', 'h']
peepholes: ActivationDict
sizes: SizeDict = {(0, 'cx'): 8192, (0, 'emb'): 1024, (0, 'hx'): 1024, (1, 'cx'): 8192, (1, 'emb'): 1024, (1, 'hx'): 1024}
split_order: List[str] = ['i', 'g', 'f', 'o']
training: bool
use_char_embs: bool = True
use_peepholes: bool = True
weight: Dict[int, Tensor]
weight_P: Dict[int, Tensor]
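A plausible reading of forget_offset = 1 is the common trick of adding a constant to the forget-gate pre-activation so the gate starts out open; combined with this model's ['i', 'g', 'f', 'o'] gate order it can be sketched as follows (an illustrative assumption, not the library's actual forward code):

```python
from typing import List


def apply_forget_offset(
    preacts: List[float], split_order: List[str], forget_offset: float
) -> List[float]:
    """Add `forget_offset` to the forget-gate slice of the concatenated
    gate pre-activations (length len(split_order) * nhid)."""
    nhid = len(preacts) // len(split_order)
    f_idx = split_order.index("f")
    out = list(preacts)
    for j in range(f_idx * nhid, (f_idx + 1) * nhid):
        out[j] += forget_offset
    return out


# GoogleLM orders its gates as input, cell, forget, output:
shifted = apply_forget_offset([0.0] * 8, ["i", "g", "f", "o"], 1.0)
```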
class diagnnose.models.wrappers.google_lm.SoftMax(vocab: C2I, full_vocab_path: str, ckpt_dir: str, hidden_size_h: int, device: str)[source]

Bases: object