class diagnnose.syntax.evaluator.SyntacticEvaluator(model: LanguageModel, tokenizer: transformers.PreTrainedTokenizer, config: Dict[str, Any], ignore_unk: bool = False, use_full_model_probs: bool = True, tasks: Optional[List[str]] = None)

Bases: object

Suite that runs multiple syntactic evaluation tasks on a LM.

Tasks can be run on already extracted activations, or on a new LM for which new activations will be extracted.

Initialisation is performed separately from the tasks themselves, so that multiple LMs can be run on the same set of tasks.

  • tokenizer (PreTrainedTokenizer) – Tokenizer that converts tokens to their index within the LM.

  • config (Dict[str, Any]) – Dictionary mapping a task name (lakretz, linzen, marvin, warstadt, or winobias) to its configuration (path, tasks, and task_activations). path points to the corpus folder of the task, tasks is an optional list of subtasks, and task_activations an optional path to the folder containing the model activations.
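As a rough illustration of the structure described above, a config for two tasks might look like the sketch below. Only the top-level task names and the keys path, tasks, and task_activations come from the description; every path and the subtask name are hypothetical placeholders, and check_config is an illustrative helper that is not part of diagnnose. The sketch also assumes that path is required, since only the other two keys are described as optional.

```python
from typing import Any, Dict

# Top-level keys are task names; each value configures that task.
# All paths and the subtask name are hypothetical placeholders.
config: Dict[str, Dict[str, Any]] = {
    "linzen": {
        "path": "data/linzen",  # corpus folder of the task
    },
    "marvin": {
        "path": "data/marvin",
        "tasks": ["example_subtask"],  # optional list of subtasks
        "task_activations": "activations/marvin",  # optional activations folder
    },
}

ALLOWED_TASKS = {"lakretz", "linzen", "marvin", "warstadt", "winobias"}
ALLOWED_KEYS = {"path", "tasks", "task_activations"}


def check_config(config: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical sanity check; not part of the diagnnose API."""
    for task_name, task_config in config.items():
        if task_name not in ALLOWED_TASKS:
            raise ValueError(f"Unknown task name: {task_name}")
        unknown_keys = set(task_config) - ALLOWED_KEYS
        if unknown_keys:
            raise ValueError(f"Unknown keys for {task_name}: {sorted(unknown_keys)}")
        # Assumption: path is mandatory because it is not described as optional.
        if "path" not in task_config:
            raise ValueError(f"Missing 'path' for task {task_name}")


check_config(config)
```

Validating the dictionary before constructing the evaluator keeps typos in task names from surfacing only after activations have been extracted.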

run() → Tuple[Dict[str, Dict[str, Union[float, Dict[str, float]]]], Dict[str, Dict[str, Union[DataFrame, Dict[str, DataFrame]]]]]
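The first element of the returned tuple is a nested dictionary whose leaves are either a single accuracy or a further dictionary of accuracies. The sketch below shows one way to walk such a structure, based only on a reading of the type signature. The helper iter_accuracies is hypothetical, and the dictionary it is applied to holds placeholder keys and values, not real results.

```python
from typing import Dict, Iterator, Tuple, Union

Accuracy = Union[float, Dict[str, float]]


def iter_accuracies(
    accuracies: Dict[str, Dict[str, Accuracy]]
) -> Iterator[Tuple[str, str, float]]:
    """Yield (task, subtask or subtask/condition, accuracy) triples."""
    for task, subtasks in accuracies.items():
        for subtask, value in subtasks.items():
            if isinstance(value, dict):
                # One more level of nesting: a condition per accuracy.
                for condition, accuracy in value.items():
                    yield task, f"{subtask}/{condition}", accuracy
            else:
                yield task, subtask, value


# Placeholder structure for illustration only; not output of a real run.
example = {
    "task_a": {
        "subtask_1": 0.5,
        "subtask_2": {"cond_x": 0.25, "cond_y": 0.75},
    }
}

rows = list(iter_accuracies(example))
for task, name, accuracy in rows:
    print(f"{task:10s} {name:20s} {accuracy:.2f}")
```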


class diagnnose.syntax.task.SyntaxEvalTask(model: LanguageModel, tokenizer: transformers.PreTrainedTokenizer, ignore_unk: bool, use_full_model_probs: bool, **config: Dict[str, Any])

Bases: object

Base class for syntactic evaluation tasks, from which specific tasks can inherit.

  • model (LanguageModel) – Language model for which the accuracy is calculated.

  • tokenizer (PreTrainedTokenizer) – The model tokenizer that converts tokens into indices.

  • config (Dict[str, Any]) – Configuration dictionary containing the setup for task initialization.

  • use_full_model_probs (bool, optional) – Toggle to calculate the full model probabilities for the NPI sentences. If set to False, only the NPI logits are compared instead of their softmax probabilities. Defaults to True.

  • ignore_unk (bool, optional) – Ignore items for which at least one of the verb forms is not in the vocabulary of the model’s tokenizer. Defaults to False.
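The distinction behind use_full_model_probs can be sketched with a toy example. It assumes, as an illustration rather than a description of the library's internals, that the same token is scored in two different sentence contexts. Because softmax normalises over the whole vocabulary, a token can have the higher raw logit in one context and still the lower probability there. The three-item vocabulary and all logit values are made up.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


TOKEN = 0  # index of the token being compared (toy vocabulary of size 3)

# Made-up logits for the same token position in two different contexts.
logits_a = [2.0, 1.0, 0.0]
logits_b = [2.5, 4.0, 3.0]

# Raw logit comparison: context B scores the token higher.
logit_prefers_b = logits_b[TOKEN] > logits_a[TOKEN]

# Probability comparison: context A scores the token higher, because the
# competing tokens in context B take most of the probability mass.
prob_prefers_b = softmax(logits_b)[TOKEN] > softmax(logits_a)[TOKEN]

print(logit_prefers_b, prob_prefers_b)
```

Comparing logits avoids the normalisation over the full vocabulary, while comparing probabilities accounts for how strongly the other tokens compete in each context.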

initialize(path: str, header: Optional[List[str]] = None) → Dict[str, Union[Corpus, Dict[str, Corpus]]]
run() → Tuple[Dict[str, Union[float, Dict[str, float]]], Dict[str, Union[DataFrame, Dict[str, DataFrame]]]]

Runs the syntactic evaluation task that has been initialised.


results – Dictionary mapping a task to a task condition to the model accuracy.

Return type