Master's Thesis · University of Amsterdam · 2026

Feature Vulnerability in Automated Essay Scoring: A Detection-Based Analysis of AI Writing Assistance

Zakaria Hader

Automated Essay Scoring · AI Writing Assistance · Feature Vulnerability · SHAP Analysis · XGBoost · NLP

Dataset at a Glance

Student essays

24,728

ASAP 2.0 corpus

Prompt types

7

Scored 1–6 by raters

AI variants

3

GPT-4o-mini, temp=0

AES features

26

surface · readability · coherence · syntactic

Feature Vulnerability in Action

As AI interventions intensify, fragile features shift dramatically while robust features remain stable. This contrast is the core thesis of this work.

Research Questions

Main RQ

To what extent can AES features distinguish AI-assisted from human-written text, and which features remain robust for quality assessment under AI assistance?

SRQ 1

How accurately can classifiers detect AI assistance using AES features, and does detection accuracy differ across intervention types?

SRQ 2

Which AES features show the highest importance for detecting AI assistance, and which show the lowest?

SRQ 3

Can quality assessment models using only robust features maintain performance on original essays while showing less degradation than all-feature models when applied to AI-assisted essays?

Feature Families

Surface
  • Avg. sentence length
  • Word count
  • Sentence count
  • Avg. word length
  • MATTR (moving-average TTR)
  • MTLD
  • HDD
  • POS noun ratio
  • POS verb ratio
  • POS adjective ratio
  • POS adverb ratio
  • POS other ratio
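
To make the MATTR entry concrete, here is a minimal sketch of a moving-average type-token ratio. It is not taken from the thesis codebase: the function name `mattr`, the lowercasing step, and the window size of 50 tokens are assumptions chosen for illustration.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio (illustrative sketch).

    `window` is an assumed default, not a value confirmed by the thesis.
    """
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    # Texts shorter than one window fall back to a plain type-token ratio.
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    # Average the type-token ratio over every full sliding window.
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

Because every window has the same length, the score is less sensitive to essay length than a plain type-token ratio, which tends to fall as a text gets longer.
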
Readability
  • Flesch-Kincaid Grade
  • Coleman-Liau Index
  • Gunning Fog
  • SMOG Index
  • Automated Readability Index
  • Dale-Chall Readability Score
  • Linsear Write Formula
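
As one example from this family, the Automated Readability Index depends only on character, word, and sentence counts. The sketch below uses the standard ARI formula with deliberately naive tokenization (whitespace-separated words, sentences split on `.`, `!`, `?`). The function name and the tokenization rules are assumptions, and the thesis may rely on a readability library instead.

```python
import re

def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    # Naive segmentation, assumed for illustration only.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    if not sentences or not words:
        return 0.0
    # Count letters and digits only, ignoring punctuation and whitespace.
    chars = sum(ch.isalnum() for ch in text)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```

Very short, simple sentences can produce a negative index, so the score is best read as a relative signal rather than a literal grade level.
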
Coherence
  • Connective frequency (per sentence)
  • Avg. lexical overlap (adjacent sentences)
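
The lexical overlap feature can be sketched as the mean word-set overlap between each pair of adjacent sentences. The thesis's exact definition is not given here, so the use of Jaccard similarity, the regex tokenization, and the function name `avg_adjacent_overlap` are all assumptions.

```python
import re

def avg_adjacent_overlap(text):
    """Mean Jaccard overlap of word sets across adjacent sentences (sketch)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # One lowercase word set per sentence.
    bags = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    pairs = list(zip(bags, bags[1:]))
    if not pairs:
        return 0.0  # fewer than two sentences, so no adjacent pair exists
    scores = [len(a & b) / len(a | b) if (a | b) else 0.0 for a, b in pairs]
    return sum(scores) / len(scores)
```

For instance, "The cat sat. The cat ran." shares two of four distinct words across its single sentence pair, giving an overlap of 0.5.
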
Syntactic
  • Mean dependency tree depth
  • Subordinate clause ratio
  • Passive ratio
  • Mean noun phrase modifiers
  • Pronoun density