Dataset at a Glance
- Prompt types: 7 (scored 1–6 by raters)
- AI variants: 3 (GPT-4o-mini, temp=0)
- AES features: 26 (surface · readability · coherence · syntactic)
Feature Vulnerability in Action
As AI intervention intensifies, fragile features shift markedly while robust features stay comparatively stable. This contrast is the core thesis of this work.
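One way to make "shift" concrete is a standardized mean difference between a feature's values on the original essays and on an AI-assisted variant. The sketch below is illustrative only: the source does not state which shift statistic is used, so the Cohen's d-style measure, the function names `standardized_shift` and `label_features`, and the 0.5 cutoff are all assumptions.

```python
import math
from statistics import mean, stdev


def standardized_shift(original, variant):
    """Difference in means divided by the pooled standard deviation.

    Assumption: a Cohen's d-style statistic stands in for whatever
    shift measure the work actually uses. Each list needs >= 2 values.
    """
    n1, n2 = len(original), len(variant)
    s1, s2 = stdev(original), stdev(variant)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    diff = mean(variant) - mean(original)
    if pooled == 0:
        # No spread in either group: the shift is zero or unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled


def label_features(original_feats, variant_feats, threshold=0.5):
    """Label each feature 'fragile' or 'robust' by absolute shift.

    Both arguments map feature name -> list of per-essay values.
    The threshold is a hypothetical cutoff, not a value from the source.
    """
    labels = {}
    for name, values in original_feats.items():
        d = standardized_shift(values, variant_feats[name])
        labels[name] = "fragile" if abs(d) >= threshold else "robust"
    return labels
```

Running `label_features` once per intervention level (grammar correction, style enhancement, substantive revision) would show whether a feature's label changes as the intervention deepens.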
Research Questions
To what extent can automated essay scoring features distinguish AI-assisted from human-written text, and which features remain robust for quality assessment under AI assistance?
How well can binary classifiers distinguish original student essays from AI-assisted variants across three intervention levels (grammar correction, style enhancement, substantive revision)?
Which AES features are most important for detecting AI assistance, and which are least important?
Do quality assessment models restricted to robust features degrade less than all-feature models when applied to AI-assisted essays?
Feature Families
- Avg. sentence length
- Word count
- Sentence count
- Avg. word length
- MATTR (moving-average TTR)
- MTLD
- HDD
- POS noun ratio
- POS verb ratio
- POS adjective ratio
- POS adverb ratio
- POS other ratio
- Flesch-Kincaid Grade
- Coleman-Liau Index
- Gunning Fog
- SMOG Index
- Automated Readability Index
- Dale-Chall Readability Score
- Linsear Write Formula
- Connective frequency (per sentence)
- Avg. lexical overlap (adjacent sentences)
- Mean dependency tree depth
- Subordinate clause ratio
- Passive ratio
- Mean noun phrase modifiers
- Pronoun density
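To illustrate how a few of the features listed above could be computed, here is a minimal standard-library sketch covering the four surface counts, MATTR, and adjacent-sentence lexical overlap. It is not the authors' pipeline: the regex tokenizer and sentence splitter, the 50-word MATTR window, and the use of Jaccard similarity for overlap are all assumptions made for the example, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive split after ., ! or ? followed by whitespace (an assumption;
    a real pipeline would likely use a proper sentence tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercased alphabetic tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def surface_features(text):
    """Word count, sentence count, and the two averages from the list."""
    sentences = split_sentences(text)
    words = tokenize(text)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


def mattr(words, window=50):
    """Moving-average type-token ratio: mean TTR over every sliding window.

    The window size of 50 is an assumed default. Texts shorter than the
    window fall back to a plain type-token ratio.
    """
    if not words:
        return 0.0
    if len(words) <= window:
        return len(set(words)) / len(words)
    ratios = [
        len(set(words[i:i + window])) / window
        for i in range(len(words) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def avg_adjacent_overlap(text):
    """Mean Jaccard overlap between the word sets of adjacent sentences.

    Jaccard is one possible definition of lexical overlap, assumed here.
    """
    sets = [set(tokenize(s)) for s in split_sentences(text)]
    pairs = [(a, b) for a, b in zip(sets, sets[1:]) if a | b]
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

The POS, readability, and syntactic features would need a tagger, a syllable counter, or a dependency parser, so they are left out of this sketch.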