models.wrappers¶
Model wrappers define the behaviour of a specific model architecture.
These wrapper classes should inherit from LanguageModel, or one of its subclasses, and adhere to the signature of the forward method that is defined for it.
AWD LSTM¶
- class diagnnose.models.wrappers.awd_lstm.AWDLSTM(*args: Any, **kwargs: Any)[source]¶
Bases:
ForwardLSTM
- bias: Dict[int, Tensor]¶
- decoder_b: Optional[Tensor]¶
- decoder_w: Optional[Tensor]¶
- static param_names(layer: int, rnn_name: str, no_suffix: bool = False, **kwargs) Dict[str, str] [source]¶
Creates a dictionary of parameter names in a state_dict.
- Parameters
layer (int) – Current layer index.
rnn_name (str) – Name of the rnn in the model state_dict. Defaults to rnn.
no_suffix (bool, optional) – Toggle to omit the _l{layer} suffix from a parameter name. 1-layer RNNs do not have this suffix. Defaults to False.
- Returns
param_names – Dictionary mapping a general parameter name to the model specific parameter name.
- Return type
Dict[str, str]
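The shape of the returned mapping can be pictured with a small sketch. The function below is illustrative only, not the library's actual implementation: the specific key names (`weight_hh`, `rnn.weight_hh_l0`, etc.) are assumptions modelled on PyTorch's LSTM state_dict naming convention.

```python
def param_names_sketch(layer, rnn_name="rnn", no_suffix=False):
    """Illustrative stand-in for param_names: map general parameter
    names onto PyTorch-style state_dict keys (assumed naming)."""
    # 1-layer RNNs omit the _l{layer} suffix, hence the toggle.
    suffix = "" if no_suffix else f"_l{layer}"
    return {
        "weight_ih": f"{rnn_name}.weight_ih{suffix}",
        "weight_hh": f"{rnn_name}.weight_hh{suffix}",
        "bias_ih": f"{rnn_name}.bias_ih{suffix}",
        "bias_hh": f"{rnn_name}.bias_hh{suffix}",
    }

print(param_names_sketch(0))
# → {'weight_ih': 'rnn.weight_ih_l0', 'weight_hh': 'rnn.weight_hh_l0',
#    'bias_ih': 'rnn.bias_ih_l0', 'bias_hh': 'rnn.bias_hh_l0'}
```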
- peepholes: ActivationDict¶
- training: bool¶
- weight: Dict[int, Tensor]¶
- weight_P: Dict[int, Tensor]¶
Forward LSTM¶
- class diagnnose.models.wrappers.forward_lstm.ForwardLSTM(state_dict: str, device: str = 'cpu', rnn_name: str = 'rnn', encoder_name: str = 'encoder', decoder_name: str = 'decoder')[source]¶
Bases:
RecurrentLM
Defines a default uni-directional n-layer LSTM.
Allows for extraction of intermediate states and gate activations.
- Parameters
state_dict (str) – Path to torch pickle containing the model parameter state dict.
device (str, optional) – Torch device on which forward passes will be run. Defaults to cpu.
rnn_name (str, optional) – Name of the rnn in the model state_dict. Defaults to rnn.
encoder_name (str, optional) – Name of the embedding encoder in the model state_dict. Defaults to encoder.
decoder_name (str, optional) – Name of the linear decoder in the model state_dict. Defaults to decoder.
- bias: Dict[int, Tensor]¶
- decoder_b: Optional[Tensor]¶
- decoder_w: Optional[Tensor]¶
- ih_concat_order: List[str] = ['h', 'i']¶
- static param_names(layer: int, rnn_name: str, no_suffix: bool = False) Dict[str, str] [source]¶
Creates a dictionary of parameter names in a state_dict.
- Parameters
layer (int) – Current layer index.
rnn_name (str) – Name of the rnn in the model state_dict. Defaults to rnn.
no_suffix (bool, optional) – Toggle to omit the _l{layer} suffix from a parameter name. 1-layer RNNs do not have this suffix. Defaults to False.
- Returns
param_names – Dictionary mapping a general parameter name to the model specific parameter name.
- Return type
Dict[str, str]
- peepholes: ActivationDict¶
- split_order: List[str] = ['i', 'f', 'g', 'o']¶
- training: bool¶
- weight: Dict[int, Tensor]¶
- weight_P: Dict[int, Tensor]¶
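The `split_order` attribute expresses how the rows of a concatenated LSTM weight matrix are laid out per gate: `['i', 'f', 'g', 'o']` for ForwardLSTM. The toy sketch below (row labels standing in for actual tensor rows) shows how such a matrix of size `4 * hidden_size` would be split into per-gate blocks; it is a simplified illustration, not the wrapper's internal code.

```python
# Split a concatenated "weight matrix" (here just a list of row labels)
# into per-gate blocks according to split_order.
split_order = ["i", "f", "g", "o"]
hidden_size = 2
rows = [f"row{r}" for r in range(4 * hidden_size)]

gates = {
    gate: rows[idx * hidden_size : (idx + 1) * hidden_size]
    for idx, gate in enumerate(split_order)
}
print(gates["f"])  # → ['row2', 'row3']
```

Note that GoogleLM below uses a different gate layout (`['i', 'g', 'f', 'o']`) and concatenation order (`['i', 'h']`), which is why these are per-wrapper class attributes.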
Google LM¶
- class diagnnose.models.wrappers.google_lm.CharCNN(ckpt_dir: str, vocab: C2I, device: str)[source]¶
Bases:
Module
- forward(input_ids: Tensor) Tensor [source]¶
Fetches the character-CNN embeddings of a batch.
- Parameters
input_ids ((batch_size, max_sen_len)) –
- Returns
inputs_embeds
- Return type
(batch_size, max_sen_len, emb_dim)
- training: bool¶
- class diagnnose.models.wrappers.google_lm.GoogleLM(ckpt_dir: str, corpus_vocab_path: Optional[Union[str, List[str]]] = None, create_decoder: bool = True, device: str = 'cpu')[source]¶
Bases:
RecurrentLM
Reimplementation of the LM of Jozefowicz et al. (2016).
Paper: https://arxiv.org/abs/1602.02410
Lib: https://github.com/tensorflow/models/tree/master/research/lm_1b
This implementation allows only a subset of the softmax weights to be loaded in, to alleviate RAM usage.
- Parameters
ckpt_dir (str) – Path to folder containing parameter checkpoint files.
corpus_vocab_path (str or List[str], optional) – Path to the corpus for which a vocabulary will be created. This allows only a subset of the model softmax to be loaded in.
create_decoder (bool) – Toggle to load in the (partial) softmax weights. Can be set to False when no decoding projection is needed, for instance during activation extraction.
- bias: Dict[int, Tensor]¶
- create_inputs_embeds(input_ids: Tensor) Tensor [source]¶
Transforms a sequence of input tokens to their embedding.
- Parameters
input_ids (Tensor) – Tensor of shape batch_size x max_sen_len.
- Returns
inputs_embeds – Embedded tokens of shape batch_size x max_sen_len x nhid.
- Return type
Tensor
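The shape transformation performed here can be sketched with a plain lookup table. This is a simplified stand-in: GoogleLM actually builds its embeddings through the character CNN above, and the table contents below are arbitrary illustrative values.

```python
# Toy embedding lookup: (batch_size, max_sen_len) -> (batch_size, max_sen_len, emb_dim)
emb_dim = 3
embedding_table = {
    0: [0.0] * emb_dim,
    1: [1.0] * emb_dim,
    2: [2.0] * emb_dim,
}

def embed(input_ids):
    # Replace each token id with its embedding vector, preserving the
    # batch and sentence dimensions.
    return [[embedding_table[i] for i in sent] for sent in input_ids]

inputs_embeds = embed([[0, 1], [2, 0]])  # batch of 2 sentences, 2 tokens each
```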
- decoder_b: Optional[Tensor]¶
- decoder_w: Optional[Tensor]¶
- forget_offset: int = 1¶
- ih_concat_order: List[str] = ['i', 'h']¶
- peepholes: ActivationDict¶
- sizes: SizeDict = {(0, 'cx'): 8192, (0, 'emb'): 1024, (0, 'hx'): 1024, (1, 'cx'): 8192, (1, 'emb'): 1024, (1, 'hx'): 1024}¶
- split_order: List[str] = ['i', 'g', 'f', 'o']¶
- training: bool¶
- use_char_embs: bool = True¶
- use_peepholes: bool = True¶
- weight: Dict[int, Tensor]¶
- weight_P: Dict[int, Tensor]¶
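Since `use_peepholes` is True for this wrapper, the gate activations include a peephole term that reads the previous cell state directly. The scalar sketch below illustrates the assumed formulation of a peephole input gate (i_t = sigmoid(w·x_t + u·h_prev + p·c_prev + b)); all weight values are made up for illustration, and the actual GoogleLM computation operates on full tensors inside the wrapper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def input_gate(x_t, h_prev, c_prev, w_i=0.5, u_i=0.25, p_i=0.1, b_i=0.0):
    # The p_i * c_prev term is the peephole connection: the gate can
    # "peep" at the previous cell state, not just the hidden state.
    return sigmoid(w_i * x_t + u_i * h_prev + p_i * c_prev + b_i)

# With a positive peephole weight, a larger previous cell state opens
# the gate further.
no_peep = input_gate(1.0, 0.0, 0.0)
with_peep = input_gate(1.0, 0.0, 2.0)
```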