models.wrappers

Model wrappers define the behaviour of a specific model architecture.

These wrapper classes should inherit from LanguageModel, or one of its subclasses, and adhere to the forward method signature that it defines.
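The contract can be sketched with stand-in classes. Note that LanguageModel below is a minimal hypothetical stub for illustration, not the actual diagnnose base class:

```python
# Illustrative sketch only: `LanguageModel` is a stand-in stub,
# not the real diagnnose.models.LanguageModel.
from typing import Any


class LanguageModel:
    """Stub base class: wrappers must implement this forward signature."""

    def forward(self, input_ids: Any, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError


class MyWrapper(LanguageModel):
    """A hypothetical wrapper adhering to the base forward signature."""

    def forward(self, input_ids: Any, *args: Any, **kwargs: Any) -> Any:
        # A real wrapper would run the underlying model here; we just
        # echo the input to keep the sketch self-contained.
        return input_ids
```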

AWD LSTM

class diagnnose.models.wrappers.awd_lstm.AWDLSTM(*args: Any, **kwargs: Any)[source]

Bases: ForwardLSTM

bias: Dict[int, Tensor]
decoder_b: Optional[Tensor]
decoder_w: Optional[Tensor]
static param_names(layer: int, rnn_name: str, no_suffix: bool = False, **kwargs) → Dict[str, str][source]

Creates a dictionary of parameter names in a state_dict.

Parameters
  • layer (int) – Current layer index.

  • rnn_name (str) – Name of the rnn in the model state_dict. Defaults to rnn.

  • no_suffix (bool, optional) – Toggle to omit the _l{layer} suffix from a parameter name. 1-layer RNNs do not have this suffix. Defaults to False.

Returns

param_names – Dictionary mapping a general parameter name to the model specific parameter name.

Return type

Dict[str, str]

peepholes: ActivationDict
training: bool
weight: Dict[int, Tensor]
weight_P: Dict[int, Tensor]

Forward LSTM

class diagnnose.models.wrappers.forward_lstm.ForwardLSTM(state_dict: str, device: str = 'cpu', rnn_name: str = 'rnn', encoder_name: str = 'encoder', decoder_name: str = 'decoder')[source]

Bases: RecurrentLM

Defines a default uni-directional n-layer LSTM.

Allows for extraction of intermediate states and gate activations.

Parameters
  • state_dict (str) – Path to torch pickle containing the model parameter state dict.

  • device (str, optional) – Torch device on which forward passes will be run. Defaults to cpu.

  • rnn_name (str, optional) – Name of the rnn in the model state_dict. Defaults to rnn.

  • encoder_name (str, optional) – Name of the embedding encoder in the model state_dict. Defaults to encoder.

  • decoder_name (str, optional) – Name of the linear decoder in the model state_dict. Defaults to decoder.

bias: Dict[int, Tensor]
decoder_b: Optional[Tensor]
decoder_w: Optional[Tensor]
ih_concat_order: List[str] = ['h', 'i']
static param_names(layer: int, rnn_name: str, no_suffix: bool = False) → Dict[str, str][source]

Creates a dictionary of parameter names in a state_dict.

Parameters
  • layer (int) – Current layer index.

  • rnn_name (str) – Name of the rnn in the model state_dict. Defaults to rnn.

  • no_suffix (bool, optional) – Toggle to omit the _l{layer} suffix from a parameter name. 1-layer RNNs do not have this suffix. Defaults to False.

Returns

param_names – Dictionary mapping a general parameter name to the model specific parameter name.

Return type

Dict[str, str]
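The kind of mapping param_names produces can be sketched with PyTorch's standard LSTM parameter naming scheme. The exact keys diagnnose emits may differ; this is an assumption for illustration only:

```python
from typing import Dict


def param_names_sketch(
    layer: int, rnn_name: str = "rnn", no_suffix: bool = False
) -> Dict[str, str]:
    """Sketch of a general-name -> model-specific-name mapping,
    following PyTorch's LSTM state_dict naming convention."""
    # 1-layer RNNs omit the `_l{layer}` suffix.
    suffix = "" if no_suffix else f"_l{layer}"
    return {
        "weight_ih": f"{rnn_name}.weight_ih{suffix}",
        "weight_hh": f"{rnn_name}.weight_hh{suffix}",
        "bias_ih": f"{rnn_name}.bias_ih{suffix}",
        "bias_hh": f"{rnn_name}.bias_hh{suffix}",
    }
```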

peepholes: ActivationDict
split_order: List[str] = ['i', 'f', 'g', 'o']
training: bool
weight: Dict[int, Tensor]
weight_P: Dict[int, Tensor]
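split_order describes the layout of the concatenated gate parameters (input, forget, cell, output for this class). Slicing by that order can be sketched in pure Python, as a stand-in for the actual tensor slicing:

```python
from typing import Dict, List, Sequence


def split_gates(
    concat: Sequence[float], split_order: List[str]
) -> Dict[str, Sequence[float]]:
    """Slice a concatenated gate vector of size len(split_order) * nhid
    into per-gate chunks, in the order given by `split_order`."""
    nhid = len(concat) // len(split_order)
    return {
        gate: concat[i * nhid : (i + 1) * nhid]
        for i, gate in enumerate(split_order)
    }


# ForwardLSTM lays its gates out as input, forget, cell, output:
gates = split_gates(list(range(8)), ["i", "f", "g", "o"])
```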

Google LM

class diagnnose.models.wrappers.google_lm.CharCNN(ckpt_dir: str, vocab: C2I, device: str)[source]

Bases: Module

forward(input_ids: Tensor) → Tensor[source]

Fetches the character-CNN embeddings of a batch.

Parameters

input_ids ((batch_size, max_sen_len)) –

Returns

inputs_embeds

Return type

(batch_size, max_sen_len, emb_dim)
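The shape contract — (batch_size, max_sen_len) token ids in, (batch_size, max_sen_len, emb_dim) embeddings out — can be sketched with a plain lookup table. The real CharCNN computes character-level convolutions instead; this is only an illustration of the shapes:

```python
from typing import Dict, List


def embed(
    input_ids: List[List[int]], table: Dict[int, List[float]]
) -> List[List[List[float]]]:
    # Maps each token id to its embedding vector:
    # (batch_size, max_sen_len) -> (batch_size, max_sen_len, emb_dim)
    return [[table[i] for i in sent] for sent in input_ids]


table = {0: [0.0, 0.0], 1: [1.0, 0.5]}
inputs_embeds = embed([[0, 1], [1, 1]], table)
```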

training: bool
class diagnnose.models.wrappers.google_lm.GoogleLM(ckpt_dir: str, corpus_vocab_path: Optional[Union[str, List[str]]] = None, create_decoder: bool = True, device: str = 'cpu')[source]

Bases: RecurrentLM

Reimplementation of the LM of Jozefowicz et al. (2016).

Paper: https://arxiv.org/abs/1602.02410
Lib: https://github.com/tensorflow/models/tree/master/research/lm_1b

This implementation allows for only a subset of the SoftMax to be loaded in, to alleviate RAM usage.

Parameters
  • ckpt_dir (str) – Path to folder containing parameter checkpoint files.

  • corpus_vocab_path (str or List[str], optional) – Path to the corpus for which a vocabulary will be created. This allows for only a subset of the model softmax to be loaded in.

  • create_decoder (bool) – Toggle to load in the (partial) softmax weights. Can be set to False when no decoding projection is needed, for example during activation extraction.
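The partial-SoftMax idea can be sketched as selecting only the decoder rows for tokens that actually occur in the corpus vocabulary. This is an illustration of the memory-saving principle, not the actual checkpoint-loading code:

```python
from typing import Dict, List


def select_softmax_rows(
    full_w: List[List[float]],
    full_vocab: Dict[str, int],
    corpus_vocab: List[str],
) -> List[List[float]]:
    """Keep only the decoder rows for the corpus vocabulary, reducing the
    (vocab_size, nhid) weight matrix to (len(corpus_vocab), nhid)."""
    return [full_w[full_vocab[w]] for w in corpus_vocab]


full_w = [[0.0], [1.0], [2.0], [3.0]]
full_vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
partial_w = select_softmax_rows(full_w, full_vocab, ["b", "d"])
```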

bias: Dict[int, Tensor]
create_inputs_embeds(input_ids: Tensor) → Tensor[source]

Transforms a sequence of input tokens to their embedding.

Parameters

input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.

Returns

inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.

Return type

Tensor

decode(hidden_state: Tensor) → Tensor[source]
decoder_b: Optional[Tensor]
decoder_w: Optional[Tensor]
forget_offset: int = 1
ih_concat_order: List[str] = ['i', 'h']
peepholes: ActivationDict
sizes: SizeDict = {(0, 'cx'): 8192, (0, 'emb'): 1024, (0, 'hx'): 1024, (1, 'cx'): 8192, (1, 'emb'): 1024, (1, 'hx'): 1024}
split_order: List[str] = ['i', 'g', 'f', 'o']
training: bool
use_char_embs: bool = True
use_peepholes: bool = True
weight: Dict[int, Tensor]
weight_P: Dict[int, Tensor]
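A plausible reading of forget_offset = 1 is the common trick of adding a constant to the forget-gate pre-activation so the gate starts out open; combined with this model's ['i', 'g', 'f', 'o'] gate order it can be sketched as follows (an illustrative assumption, not the library's actual forward code):

```python
from typing import List


def apply_forget_offset(
    preacts: List[float], split_order: List[str], forget_offset: float
) -> List[float]:
    """Add `forget_offset` to the forget-gate slice of the concatenated
    gate pre-activations (length len(split_order) * nhid)."""
    nhid = len(preacts) // len(split_order)
    f_idx = split_order.index("f")
    out = list(preacts)
    for j in range(f_idx * nhid, (f_idx + 1) * nhid):
        out[j] += forget_offset
    return out


# GoogleLM orders its gates as input, cell, forget, output:
shifted = apply_forget_offset([0.0] * 8, ["i", "g", "f", "o"], 1.0)
```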
class diagnnose.models.wrappers.google_lm.SoftMax(vocab: C2I, full_vocab_path: str, ckpt_dir: str, hidden_size_h: int, device: str)[source]

Bases: object