FirstPass - Training Subject Topic Classifiers

For nearly a century analogue machines and digital computers have been used by teachers to help mediate the delivery and assessment of closed questions. Student responses to yes/no questions, multiple choice questions or drag and drop activities can be processed by computers because they observe two possible states; namely, has the student selected the correct answer to a question - yes or no? In 2021/22 teachers will have the opportunity to pose open-ended questions to their students. In this situation, the computer is able to process an infinite number of possible states as the student responds using free form text. Bolton College's FirstPass platform is the channel which allows for the formative assessment of open-ended questions. In doing so, it radically shifts how formative assessment is conducted through the medium of our networked devices. The following video provides a short overview of FirstPass and how the platform is able to assess and analyse free-form text using natural language classification.

This is the first of three articles that explore the elements of the FirstPass platform. This first article explores how FirstPass is trained to recognise and assess free-form text responses. It looks at the teacher's role in training FirstPass and examines how FirstPass will start to behave once hundreds or thousands of teachers adopt the platform to support formative assessment in their schools and colleges. The second article will explore how students engage with FirstPass and what it means for them to receive real-time feedback as they compose their responses to the open-ended questions posed by their teachers. It will also explore how every student's interaction with FirstPass improves its ability to support the wider student body. The third and final article will explore the traits that make FirstPass a platform. An EdTech platform allows for the development of additional products and services that serve the needs of students, teachers, campus support teams and the wider communities around our schools, colleges and universities.

Setting up a new subject topic classifier
The following video shows how a teacher sets up a new subject topic classifier. A subject topic classifier acts as the container to which teachers add labelled sentences about a specific and narrow subject topic. The classifier can contain hundreds, thousands or millions of labelled sentences. Thousands of subject specialist teachers can be added to each subject topic classifier, which allows each classifier to be trained rapidly.

The person in the above video is a digital avatar and its narrative and actions have been assembled using the Synthesia platform.

Training a subject topic classifier with labelled sentences
In the following image we have a subject topic classifier which contains labelled sentences about why people chose to become sole traders. Sole Traders are small businesses which are owned and operated by a single individual. Teachers are encouraged to label as many sentences as they can for their subject topic classifier because they are made aware that the platform's ability to successfully label previously unseen sentences improves as the volume of labelled sentences grows. There is no limit imposed on the number of labelled sentences for each subject topic classifier. The more the better.

The subject topic classifiers in FirstPass require the use of one label. Teachers will type sentences and assign an appropriate label for each sentence. In this case, the labels reflect the reasons why individuals chose to set up a business as a sole trader. In the near future, teachers will be able to assign two labels to sentences that are more nuanced. In the following image, teachers can easily identify the labels that require further training.
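To make the idea of a classifier as a container concrete, the following is a minimal sketch in Python. It is not FirstPass code: the class name, the method names and the example labels ("keep profits", "independence") are all hypothetical, chosen only to illustrate one label per sentence and a per-label count that shows which labels need more training.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SubjectTopicClassifier:
    """Hypothetical container for labelled sentences on one narrow topic."""
    topic: str
    examples: list[tuple[str, str]] = field(default_factory=list)  # (sentence, label)

    def add(self, sentence: str, label: str) -> None:
        # Each sentence carries exactly one label, as on the current platform.
        self.examples.append((sentence.strip(), label))

    def label_counts(self) -> Counter:
        # Number of labelled sentences held for each label.
        return Counter(label for _, label in self.examples)

    def labels_needing_training(self, minimum: int) -> list[str]:
        # Labels with fewer than `minimum` sentences are flagged for more input.
        return sorted(l for l, n in self.label_counts().items() if n < minimum)


# Illustrative data only; these labels are assumptions, not FirstPass labels.
clf = SubjectTopicClassifier("Reasons for becoming a sole trader")
clf.add("You get to keep all of the profits.", "keep profits")
clf.add("All the profit goes to the owner.", "keep profits")
clf.add("You can be your own boss.", "independence")
print(clf.labels_needing_training(minimum=2))  # ['independence']
```

The threshold passed to labels_needing_training is arbitrary here; in practice a teacher would judge how many sentences a label needs.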

The training and testing interface
In this particular example a teacher starts to enter the reasons why individuals wish to start up a business as a sole trader. As he or she enters the text, the FirstPass platform assigns a predicted label to each sentence. This is possible because this subject topic classifier already contains labelled sentences. The accuracy with which the most appropriate label is assigned to any given sentence improves as the volume of training data within a subject topic classifier rises. Teachers can use this interface to add labelled sentences to a subject topic classifier. They can also use it to test the accuracy of the classifier as it attempts to assign labels to previously unseen sentences.
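The article does not describe the model that FirstPass uses to predict labels. As an illustration only, the sketch below swaps in a small multinomial Naive Bayes classifier written in plain Python. It shows how a handful of labelled sentences can be enough to assign a predicted label to a sentence the classifier has not seen before. The function names, sentences and labels are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-case the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for sentence, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(sentence))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, sentence):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Laplace (add-one) smoothing so unseen words do not zero a label out.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in tokenize(sentence):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Illustrative training data; the labels are assumptions.
examples = [
    ("You keep all of the profits.", "keep profits"),
    ("The owner does not share the profit with anyone.", "keep profits"),
    ("You can be your own boss.", "independence"),
    ("You make every decision yourself.", "independence"),
]
model = train(examples)
print(predict(model, "I would be the boss and make my own decisions"))  # independence
```

With so little data the prediction rests on a few shared words, which is why accuracy improves as more labelled sentences are added.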

Once a subject topic classifier has been created, trained and tested, teachers can use it when they present open-ended questions to their students. The classifier about the reasons why individuals set up as a sole trader could support questions such as: Why do people decide to set up in business as a sole trader? What are the advantages of being a sole trader? Could you describe the pros and cons of being a sole trader? What are the advantages of being a sole trader over a private limited company? With regard to liability, what are the advantages of being in business as a private limited company as opposed to being a sole trader? Questions can also draw on multiple subject topic classifiers, which allows teachers to present more complex question types to their students.

The following video provides further details regarding the training of subject topic classifiers on the FirstPass platform.

The power of crowdsourcing
One of the key traits of the FirstPass platform is its ability to take advantage of crowdsourcing, which supports the participative and collaborative model that underpins the training of the subject topic classifiers on the platform. For instance, business teachers at Bolton College could set up a subject topic classifier that contains labelled sentences about why people chose to set up a business as a sole trader. As business teachers in other schools and colleges observe the creation of this subject topic classifier, they can also start to add labelled sentences to it. This accelerates the training of the classifier and improves its accuracy at identifying student responses to open-ended questions about the topic. Business teachers across the globe who teach their students about the different types of business ownership could also offer their support to train this particular subject topic classifier. The volume of labelled sentences for this classifier then starts to grow rapidly, and it grows faster still when student responses about the subject topic are added to the classifier. When this happens we envisage that the FirstPass platform could operate with a very high degree of accuracy when providing real-time feedback to students as they respond to the open-ended questions that have been set by their teachers. This iterative improvement continues as each academic year starts, ends and resumes again.

Wang et al. (2013, pp. 3-4) state that annotation projects such as FirstPass require a different form of motivation to achieve the end goal of annotation. [1] They highlight obvious motivations of the participant such as profit or altruism, but they state ‘that the space of possible motivations and dimensions of crowdsourcing have not been fully explored.’ In the case of the education sector, the primary motives for participants are very likely to be the desire to improve the student experience, to support student wellbeing, to support students as they journey through their studies, to raise attainment levels and so forth. These motivations are well placed to support the participatory model that sustains the FirstPass platform.

A participatory process is important when crowdsourcing labelled data; however, Zheng et al (2019) state that project teams need to guard against deteriorating model accuracy as a wider pool of individuals becomes involved in the annotation process. [2] They go on to argue that repeated labelling by the crowd leads to an improvement in model accuracy, especially with regard to label inference. Label inference takes place when a natural language processing or classification model has acquired a sufficient volume of annotated data to infer a label for unseen, unlabelled text. In the case of Bolton College’s Ada service, the digital assistant can respond correctly to unseen questions posed by students or staff simply because there is a high volume of labelled training data for the knowledge domain at hand. The same holds true for FirstPass when the service is able to infer a label for a sentence that the natural language classification model has not seen before.
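One simple way to picture repeated labelling is majority voting, where several teachers label the same sentence and the most common label is kept. The sketch below is an illustration of that general idea only. It is not the framework described by Zheng and Chen [2], nor a description of how FirstPass resolves disagreements. The function name, the agreement threshold and the labels are assumptions.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.5):
    """Majority-vote each sentence's labels; set aside low-agreement sentences.

    `annotations` maps a sentence to the list of labels given by different
    annotators. A label is accepted only if its share of the votes is
    strictly greater than `min_agreement`.
    """
    accepted, disputed = {}, []
    for sentence, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            accepted[sentence] = label
        else:
            disputed.append(sentence)  # needs review or further labelling
    return accepted, disputed


# Illustrative annotations from three and two hypothetical teachers.
annotations = {
    "You keep all of the profits.": ["keep profits", "keep profits", "independence"],
    "It is easy to set up.": ["easy setup", "independence"],
}
accepted, disputed = aggregate_labels(annotations)
print(accepted)  # {'You keep all of the profits.': 'keep profits'}
print(disputed)  # ['It is easy to set up.']
```

Sentences that end up in the disputed list are the ones where a wider pool of annotators could otherwise erode accuracy, so they are natural candidates for a further round of labelling.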

The means of production for education services such as FirstPass depend on leveraging the strengths inherent in large numbers of people who fulfil tasks that were previously carried out by the few. The participatory model lends itself particularly well to the education sector, where the larger group is motivated towards shared goals (Eskenazi et al 2013 p.316-17). [3]

The roadmap for training subject topic classifiers
In the screenshots and videos listed above the labelling of sentences for each subject topic classifier currently depends on teachers manually entering and labelling text. As the FirstPass platform develops teachers will have the opportunity to upload documents or present website links to each subject topic classifier. If the volume of labelled sentences is sufficient the classifier will assign predicted labels to each of the sentences contained in the document or website. Teachers will then confirm or amend the proposed labels that have been suggested by FirstPass. These labelled sentences will then be added to their respective subject topic classifiers. There could be opportunities to partner with education publishers who have access to large subject knowledge domains.
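The planned workflow of proposing labels for an uploaded document and letting the teacher confirm or amend them can be sketched as follows. This is a hedged illustration rather than the FirstPass implementation: the sentence splitter is deliberately naive, and keyword_predict is a hypothetical stand-in for whatever trained classifier supplies the predictions.

```python
import re


def split_sentences(document):
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def propose_labels(document, predict):
    """Pair each sentence with a predicted label, ready for teacher review."""
    return [(s, predict(s)) for s in split_sentences(document)]


def apply_review(proposals, corrections):
    """Keep each proposed label unless the teacher supplied a correction."""
    return [(s, corrections.get(s, label)) for s, label in proposals]


def keyword_predict(sentence):
    # Hypothetical stand-in for a trained subject topic classifier.
    return "keep profits" if "profit" in sentence.lower() else "independence"


doc = "Sole traders keep all the profit. They answer to nobody! Setting up is quick."
proposals = propose_labels(doc, keyword_predict)

# The teacher confirms the first two labels and amends the third.
reviewed = apply_review(proposals, {"Setting up is quick.": "easy setup"})
for sentence, label in reviewed:
    print(f"{label}: {sentence}")
```

Only the reviewed pairs would be added to the subject topic classifier, so the teacher remains the final check on every label.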

At present the subject topic classifier can assign a single label to a sentence. With the advice and support that we are getting from IBM Research, we hope that in the near future teachers will have the option to assign two labels to sentences. This will improve the ability of FirstPass to assign labels to sentences that are more nuanced, and in turn improve the platform's ability to offer contextualised feedback to each student as he or she responds to an open-ended question.

References

[1] Wang, Aobo, et al. “Perspectives on Crowdsourcing Annotations for Natural Language Processing.” Language Resources and Evaluation, vol. 47, no. 1, 2013, pp. 9–31. JSTOR, www.jstor.org/stable/42637252. Accessed 29 Mar. 2021.
[2] L. Zheng and L. Chen, "DLTA: A Framework for Dynamic Crowdsourcing Classification Tasks," in IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 867-879, 1 May 2019, doi: 10.1109/TKDE.2018.2849385.
[3] Eskenazi, M, Levow, G, Meng, H, Parent, G, & Suendermann, D 2013, Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment, John Wiley & Sons, Incorporated, New York. Available from: ProQuest Ebook Central. [15 April 2021].