snails_naturalness_classifier

class CanineIdentifierClassifier(identifiers=pd.DataFrame())

A classifier that rates the naturalness of identifiers (words) using a pre-trained text-classification model. Each identifier is assigned one of three naturalness levels: Regular (label N1), Low (label N2), or Least (label N3).

Parameters:

identifiers (pd.DataFrame) – A DataFrame containing the identifiers to classify. Defaults to an empty DataFrame.

Variables:
  • model_name (str) – The name of the model used for classification.

  • checkpoint (int) – The checkpoint number of the model.

  • id2label (dict) – A dictionary mapping label IDs to label names.

  • label2id (dict) – A dictionary mapping label names to label IDs.

  • classifier (pipeline) – The sentiment-analysis (text-classification) pipeline that performs the naturalness classification.

  • identifiers (pd.DataFrame) – A DataFrame containing identifiers to classify.
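The id2label and label2id variables are inverse lookups between numeric label IDs and label names. The sketch below shows such a pair, assuming (hypothetically) that N1, N2, and N3 correspond to IDs 0, 1, and 2; the IDs actually used by the model are not documented here. The NATURALNESS table and the describe helper are illustrative names, not part of the class.

```python
# Hypothetical label IDs: only the label names N1/N2/N3 and their
# meanings come from the documentation; the numbers 0-2 are assumed.
id2label = {0: "N1", 1: "N2", 2: "N3"}
label2id = {label: idx for idx, label in id2label.items()}

# Documented naturalness level for each label.
NATURALNESS = {"N1": "Regular", "N2": "Low", "N3": "Least"}


def describe(label: str) -> str:
    """Return the naturalness level for a predicted label such as 'N2'."""
    return NATURALNESS[label]


# The two mappings are inverses of each other.
assert all(id2label[label2id[name]] == name for name in label2id)
print(label2id)        # {'N1': 0, 'N2': 1, 'N3': 2}
print(describe("N3"))  # Least
```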

do_batch_job(ident_df: pd.DataFrame = None, save_as_excel: bool = False, make_tag: bool = True)

Performs batch classification on the given DataFrame of identifiers.

Parameters:
  • ident_df (pd.DataFrame or None) – The DataFrame of identifiers to classify. Defaults to None, in which case the instance's identifiers attribute is used.

  • save_as_excel (bool) – Whether to save the results as an Excel file. Defaults to False.

  • make_tag (bool) – Whether to add a token tag to each identifier before classification. Defaults to True.

Returns:

None
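The batch flow described above (fall back to the stored identifiers when ident_df is None, optionally tag each identifier, then classify it) can be sketched as follows. This is a dependency-free illustration under stated assumptions: a plain list stands in for the DataFrame, add_tag uses a hypothetical tag format because the real token tag is not documented, and toy_classifier is a stand-in for the model. Unlike do_batch_job, which returns None, the sketch returns the (identifier, label) pairs so they can be inspected, and it does not write an Excel file.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def add_tag(identifier: str) -> str:
    # Hypothetical tag format: the real token tag is not documented.
    return f"[ID] {identifier}"


def batch_classify(
    classify: Callable[[str], str],
    idents: Optional[Iterable[str]] = None,
    default_idents: Iterable[str] = (),
    make_tag: bool = True,
) -> List[Tuple[str, str]]:
    """Classify each identifier; use default_idents when idents is None."""
    source = default_idents if idents is None else idents
    results = []
    for ident in source:
        # Optionally wrap the identifier in the (hypothetical) tag first.
        text = add_tag(ident) if make_tag else ident
        results.append((ident, classify(text)))
    return results


def toy_classifier(text: str) -> str:
    # Stand-in for the model, NOT the real classifier: text containing
    # an underscore is labeled "N1", everything else "N3".
    return "N1" if "_" in text else "N3"


print(batch_classify(toy_classifier, default_idents=["customer_id", "x"]))
# [('customer_id', 'N1'), ('x', 'N3')]
```

Passing the classifier in as a callable keeps the sketch testable without loading a model.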

classify_identifier(identifier: str, make_tag: bool = True)

Classifies a single identifier.

Parameters:
  • identifier (str) – The identifier to classify.

  • make_tag (bool) – Whether to add a token tag to the identifier before classification. Defaults to True.

Returns:

The classification result produced by the classifier pipeline for this identifier.

Return type:

list
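If the returned list has the shape produced by a Hugging Face text-classification pipeline (a list of dicts with "label" and "score" keys), which is an assumption not confirmed by this reference, the top prediction can be read out as shown below. The top_prediction helper is hypothetical, and the sample result is made up for illustration rather than produced by the real model.

```python
from typing import Dict, List, Tuple

# Documented naturalness level for each label.
NATURALNESS = {"N1": "Regular", "N2": "Low", "N3": "Least"}


def top_prediction(result: List[Dict]) -> Tuple[str, float]:
    """Return (naturalness level, score) for the highest-scoring entry."""
    best = max(result, key=lambda entry: entry["score"])
    return NATURALNESS[best["label"]], best["score"]


# Made-up example output, NOT produced by the real model.
sample = [{"label": "N2", "score": 0.81}]
print(top_prediction(sample))  # ('Low', 0.81)
```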