Vol. 2, Issue 2, Part A (2025)

Natural language processing in education: Automated assessment systems

Author(s):

Elina Korhonen, Mikael Lahtinen, Sofia Niemi and Antti Virtanen

Abstract:

This study explores the development and evaluation of an advanced Natural Language Processing (NLP)-based automated assessment system designed to enhance reliability, fairness, and interpretability in educational grading. Traditional manual assessment is often limited by subjectivity, poor scalability, and delayed feedback, whereas automated scoring systems offer consistent and rapid evaluation. The proposed model integrates transformer-based architectures, including BERT and RoBERTa variants, fine-tuned with domain-specific linguistic and semantic features aligned with rubric criteria. A dataset of 1,200 student responses, manually scored by expert raters, served as the gold standard for benchmarking model performance. Statistical analyses using Cohen’s κ, Quadratic Weighted Kappa (QWK), and Pearson correlation revealed high agreement between human and model scores (QWK = 0.891, r = 0.919), exceeding the inter-rater agreement typically observed among human assessors. Fairness evaluations using ANOVA and effect-size (η²) metrics showed no significant bias across gender, first-language status, or academic discipline, indicating equitable model behavior. Additionally, explainable artificial intelligence (AI) techniques such as Local Interpretable Model-agnostic Explanations (LIME) were used to generate interpretable feedback for both educators and learners. The findings support the hypothesis that a rubric-aware, explainable NLP assessment framework can achieve near-human performance while maintaining transparency and fairness. The study concludes that integrating such systems into educational settings can substantially improve grading efficiency, formative feedback quality, and learner engagement. Practical recommendations emphasize hybrid human-AI collaboration, periodic model recalibration, institutional fairness standards, and AI literacy training for educators.
Overall, this research underscores the transformative potential of NLP-driven assessment in creating scalable, equitable, and pedagogically meaningful evaluation systems for modern education.
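The abstract reports agreement statistics (Cohen’s κ, QWK, Pearson r) and a fairness check (one-way ANOVA with η²) without stating how they were computed. The following is a minimal pure-Python sketch of the conventional definitions of these quantities, not the authors’ code. The function names, the toy scores, and the choice to run the ANOVA on model-minus-human score differences grouped by subgroup are illustrative assumptions; the abstract does not say which quantity was compared across groups.

```python
from math import sqrt


def cohen_kappa(a, b, categories, weights=None):
    """Cohen's kappa between two raters' integer scores.

    weights=None gives unweighted kappa; weights="quadratic" gives QWK.
    `categories` is the full ordered rubric scale, so unused score levels
    still count toward the quadratic weights.
    """
    n = len(a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # Observed confusion matrix: rows = rater a (human), cols = rater b (model).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            if weights == "quadratic":
                w = (i - j) ** 2 / (k - 1) ** 2  # penalty grows with distance
            else:
                w = 0.0 if i == j else 1.0  # any disagreement counts fully
            expected = row[i] * col[j] / n  # chance-level count from marginals
            num += w * observed[i][j]
            den += w * expected
    # Undefined (division by zero) if both raters use a single category.
    return 1.0 - num / den


def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)


def one_way_anova_eta_squared(groups):
    """Return (F statistic, eta squared) for a one-way ANOVA over `groups`."""
    values = [v for g in groups for v in g]
    n_total = len(values)
    grand_mean = sum(values) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between / ss_total  # eta^2 = SS_between / SS_total


# Hypothetical toy data on a 1-4 rubric (not the study's data).
human = [1, 2, 3, 4, 2, 3]
model = [1, 2, 3, 3, 2, 4]
scale = [1, 2, 3, 4]
print(round(cohen_kappa(human, model, scale), 4))  # 0.5385
print(round(cohen_kappa(human, model, scale, weights="quadratic"), 4))  # 0.8182
print(round(pearson_r(human, model), 4))  # 0.8182

# Assumed fairness setup: model-minus-human differences for two subgroups.
f_stat, eta_sq = one_way_anova_eta_squared([[0, 0, 1, 1], [0, 1, 0, -1]])
print(round(f_stat, 4), round(eta_sq, 4))  # 1.0 0.1429
```

The sketch omits the ANOVA p-value, which requires the F distribution; `scipy.stats.f_oneway` returns both the F statistic and the p-value, and `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` can serve as a cross-check for the kappa values.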

Pages: 93-98

How to cite this article:
Elina Korhonen, Mikael Lahtinen, Sofia Niemi and Antti Virtanen. Natural language processing in education: Automated assessment systems. J. Mach. Learn. Data Sci. Artif. Intell. 2025;2(2):93-98.