Training Subject Topic Classifiers
Introduction
It's October 2022, and subject specialist teachers and data labellers in various education settings have started to set up and train subject topic classifiers on Bolton College's FirstPass platform. Once trained and tested, the classifiers will be used to support the computer mediation of open-ended questions as part of formative assessment practices at Bolton College and in other further education colleges within the UK. This brief article details how subject topic classifiers are set up and trained within FirstPass.
What is a subject topic classifier on the FirstPass platform?
In FirstPass, a subject topic classifier is a store of labelled sentences about a given subject. For example, a classifier could contain labelled sentences about the different reasons why a patient could have high blood pressure. At the heart of a subject topic classifier is an algorithm that assigns labels to sentences. Before a classifier can support the labelling process, it requires training data. For a subject topic classifier, training data is simply a collection of labelled sentences about a given subject topic; in this case, the potential reasons for rising blood pressure in a patient. That training is undertaken by subject specialist teachers and other domain experts. With teacher-led training, FirstPass can label or classify sentences, and model accuracy generally improves as the volume of training data increases. This is because FirstPass can reference a growing library of labelled sentences when predicting and assigning labels to text that it has not seen before. Once a library of subject topic classifiers has been set up and trained, the classifiers can serve many purposes, such as the formative assessment of open-ended questions.
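To make the idea of "a store of labelled sentences plus an algorithm" concrete, here is a minimal sketch of a topic classifier written from scratch in Python. It assumes a multinomial Naive Bayes model with add-one smoothing, in line with the Naive Bayes algorithm named in the clarification notes of this article. It is not the FirstPass implementation: the class name TinyNaiveBayes, the training sentences and the label names are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lower-case a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


class TinyNaiveBayes:
    """Illustrative multinomial Naive Bayes with add-one smoothing (not FirstPass code)."""

    def __init__(self):
        self.label_counts = Counter()            # number of sentences per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()                  # every word seen in training

    def train(self, sentence, label):
        """Add one labelled sentence to the training store."""
        self.label_counts[label] += 1
        for word in tokenize(sentence):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def predict(self, sentence):
        """Return the label with the highest log posterior score."""
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior for this label
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in tokenize(sentence):
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data: reasons for high blood pressure.
TRAINING_DATA = [
    ("A diet high in salt can raise blood pressure.", "diet"),
    ("Eating too much salty food increases blood pressure.", "diet"),
    ("Lack of regular exercise contributes to high blood pressure.", "physical_inactivity"),
    ("Sitting for long periods without exercise raises blood pressure.", "physical_inactivity"),
    ("Chronic stress at work can elevate blood pressure.", "stress"),
    ("Anxiety and stress cause blood pressure to rise.", "stress"),
]

classifier = TinyNaiveBayes()
for sentence, label in TRAINING_DATA:
    classifier.train(sentence, label)

print(classifier.predict("Too much salt in food"))
print(classifier.predict("Stress and anxiety at work"))
```

With only six training sentences the model leans entirely on word overlap, which is one way to see why a larger library of labelled sentences tends to help: each new sentence adds vocabulary and sharpens the word frequencies for its label.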
How to create a new subject topic classifier
The following video details how a new blank subject topic classifier can be created within a few seconds on the FirstPass platform. Teachers with a FirstPass account can set up and manage their own subject topic classifiers. The collaborative or participative nature of the FirstPass platform means that teachers can share their classifiers with other subject specialist teachers who wish to support the training process.
How to add, label and edit individual training files
The following video demonstrates how single sentences are labelled and added to the training dataset of a subject topic classifier. The training data informs a subject topic classifier's ability to assign correct labels to previously unseen text. The platform allows up to three labels to be assigned to simple, compound and complex sentences. The moderation of labelled sentences is one of the factors that will be assessed during the course of this academic year.
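As an illustration of the three-label limit, the sketch below shows one way a single training item could be represented and validated in Python. The LabelledSentence class, its field names, the rule that at least one label is required and the rule that labels must not repeat are assumptions made for this example; the article only states the upper limit of three.

```python
from dataclasses import dataclass

MAX_LABELS = 3  # the article states that up to three labels can be given to a sentence


@dataclass(frozen=True)
class LabelledSentence:
    """Hypothetical record for one training item (not the FirstPass data model)."""

    text: str
    labels: tuple

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("a training sentence cannot be empty")
        # Assumption: every training sentence carries at least one label.
        if not 1 <= len(self.labels) <= MAX_LABELS:
            raise ValueError(f"expected 1 to {MAX_LABELS} labels, got {len(self.labels)}")
        # Assumption: the same label is not applied twice to one sentence.
        if len(set(self.labels)) != len(self.labels):
            raise ValueError("labels must not repeat")


# A compound sentence that touches on two topics receives two labels.
example = LabelledSentence(
    text="Smoking and a salty diet both raise blood pressure.",
    labels=("smoking", "diet"),
)
print(example)
```

Validating at the point of entry keeps the training store consistent, which should also make any later moderation of labelled sentences simpler.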
Open-ended text training
The following video demonstrates other workflows for adding training data to a subject topic classifier. We hope that the ability to introduce and label a large store of free-form text will be welcomed by teachers and other subject specialist data labellers. At present the FirstPass platform supports topic analysis, but over time the platform will support syntax analysis, entity recognition, sentiment analysis and more.
Clarification notes
It is important to note that subject topic classifiers do not assign meaning to words, sentences, paragraphs or to a collection of text. Bolton College's FirstPass platform utilises the Naive Bayes classification algorithm, a probabilistic classifier which simply assigns information, in the form of one or more labels, to a sentence or to a group of sentences. To avoid any hyperbole about FirstPass, I would like to reference Claude Shannon, who defined information as a probability function with no dimension, no materiality, and no necessary connection with meaning. It is a pattern, not a presence (Logan, 2012). Nevertheless, classification models can be used effectively to support day-to-day problems or workflows that are found within an education setting, such as the formative assessment of open-ended questions.
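For readers who want to see what "probabilistic classifier" means here, the standard textbook form of the Naive Bayes decision rule is given below. The notation is an assumption introduced for this example: c is a candidate label, w_1, ..., w_n are the words of the sentence being classified, and the probabilities are estimated from counts in the labelled training sentences. Whether FirstPass uses exactly this variant is not stated in this article.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The rule only compares word frequencies across labels, which is consistent with the point above: a label is assigned because a pattern of words is statistically likely under that label, not because the model understands the sentence.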
References
- Logan, Robert K. (2012) What Is Information?: Why Is It Relativistic and What Is Its Relationship to Materiality, Meaning and Organization. Information, 3 (4). pp. 68-91. ISSN 2078-2489. Available at: http://openresearch.ocadu.ca/id/eprint/859/