snails_naturalness_classifier
- class CanineIdentifierClassifier(identifiers=pd.DataFrame())
A classifier that rates the naturalness of identifiers (words) using a pre-trained text analysis model. Each identifier is classified as Regular (label N1), Low (label N2), or Least (label N3) natural.
- Parameters:
identifiers (pd.DataFrame) – A DataFrame containing identifiers to classify.
- Variables:
model_name (str) – The name of the model used for classification.
checkpoint (int) – The checkpoint number of the model.
id2label (dict) – A dictionary mapping label IDs to label names.
label2id (dict) – A dictionary mapping label names to label IDs.
classifier (pipeline) – The sentiment analysis pipeline used for classification.
identifiers (pd.DataFrame) – A DataFrame containing identifiers to classify.
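The id2label and label2id variables are inverse lookups between the model's numeric output IDs and the label names. The sketch below assumes that IDs 0, 1 and 2 correspond to N1, N2 and N3. The actual ID assignment is not stated in this reference. The sketch also maps each label to the naturalness category given in the class description. The helper describe is hypothetical.

```python
# Hypothetical ID assignment: the real model's ordering is an assumption.
id2label = {0: "N1", 1: "N2", 2: "N3"}

# label2id is the inverse of id2label.
label2id = {label: idx for idx, label in id2label.items()}

# Naturalness categories from the class description.
LABEL_TO_NATURALNESS = {"N1": "Regular", "N2": "Low", "N3": "Least"}


def describe(label_id: int) -> str:
    """Turn a numeric model output into a readable naturalness category."""
    label = id2label[label_id]
    return f"{label} ({LABEL_TO_NATURALNESS[label]})"


print(label2id)     # {'N1': 0, 'N2': 1, 'N3': 2}
print(describe(2))  # N3 (Least)
```

Building label2id from id2label keeps the two mappings consistent if the label set ever changes.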
- do_batch_job(ident_df: pd.DataFrame = None, save_as_excel: bool = False, make_tag: bool = True)
Performs batch classification on the given DataFrame of identifiers.
- Parameters:
ident_df (pd.DataFrame or None) – The DataFrame of identifiers to classify. Defaults to None, in which case the identifiers attribute is used.
save_as_excel (bool) – Whether to save the results as an Excel file.
make_tag (bool) – Whether to add a token tag to the text before classification.
- Returns:
None
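The following sketch shows how do_batch_job might fall back to the instance's identifiers when ident_df is None. It is a hypothetical stand-in and not the class's implementation. All of the following are assumptions:

- the SketchNaturalnessClassifier class
- the toy stub_model, which replaces the pre-trained model
- the TAG string
- the "identifier" input column
- the "label" output column
- the output file name

Like the documented method, the sketch returns None. Here the results are written into the DataFrame in place. The real method may store or save them differently.

```python
import pandas as pd

TAG = "[IDENT] "  # hypothetical tag format; the real token tag is not documented


def stub_model(text: str) -> list:
    """Toy stand-in for the pre-trained model (assumption): names with no
    vowels, i.e. heavy abbreviations, are labelled least natural."""
    word = text[len(TAG):] if text.startswith(TAG) else text
    label = "N1" if any(c in "aeiou" for c in word.lower()) else "N3"
    return [{"label": label, "score": 1.0}]


class SketchNaturalnessClassifier:
    """Hypothetical class mimicking the documented interface."""

    def __init__(self, identifiers: pd.DataFrame = None):
        # Build a fresh DataFrame per instance instead of sharing one default.
        self.identifiers = pd.DataFrame() if identifiers is None else identifiers

    def classify_identifier(self, identifier: str, make_tag: bool = True) -> list:
        # Optionally prefix the hypothetical tag before calling the model.
        text = TAG + identifier if make_tag else identifier
        return stub_model(text)

    def do_batch_job(self, ident_df: pd.DataFrame = None,
                     save_as_excel: bool = False, make_tag: bool = True) -> None:
        # Fall back to the instance's identifiers when ident_df is None.
        df = self.identifiers if ident_df is None else ident_df
        # Assumed input column "identifier"; keep the top label per row.
        df["label"] = [self.classify_identifier(name, make_tag)[0]["label"]
                       for name in df["identifier"]]
        if save_as_excel:
            # pandas needs an Excel engine such as openpyxl for this call.
            df.to_excel("naturalness_results.xlsx", index=False)
```

For example, under the toy heuristic, running do_batch_job on a DataFrame with identifiers "customer_name" and "cstnm" fills the label column with N1 and N3.

The constructor uses None rather than pd.DataFrame() as its default. This is because a Python default argument is evaluated once and would otherwise be shared by every instance.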
- classify_identifier(identifier: str, make_tag: bool = True)
Classifies a single identifier.
- Parameters:
identifier (str) – The identifier to classify.
make_tag (bool) – Whether to add a token tag to the identifier before classification.
- Returns:
The classification result.
- Return type:
list
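classify_identifier is documented only as returning a list. The classifier variable is described as a sentiment analysis pipeline, which suggests the list may have the shape of a Hugging Face text-classification result. That shape is a list of dicts with "label" and "score" keys, and by default it holds a single entry for a single input string. Under that assumption, the top naturalness category could be read out as in the sketch below. The result shape, the helper top_naturalness, and the scores in example_result are invented for illustration. They are not output from the real model.

```python
from typing import Dict, List, Union

# Naturalness categories from the class description.
NATURALNESS = {"N1": "Regular", "N2": "Low", "N3": "Least"}


def top_naturalness(result: List[Dict[str, Union[str, float]]]) -> str:
    """Pick the highest-scoring entry and translate its label (assumed result shape)."""
    best = max(result, key=lambda entry: entry["score"])
    return NATURALNESS[best["label"]]


# Illustrative, made-up scores; a default pipeline call returns one entry.
example_result = [
    {"label": "N2", "score": 0.71},
    {"label": "N1", "score": 0.22},
    {"label": "N3", "score": 0.07},
]
print(top_naturalness(example_result))  # Low
```

Taking the maximum by score works whether the list holds only the top label or scores for every label.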