Modelling human/machine-generated text detection
Discourse and Argumentation
LLM Capabilities
Knowledge Representation and Reasoning
Subtask A: Binary classification for monolingual human- and machine-generated texts¹
¹Co-authors: Vittorio Ciccarelli, Cornelia Genz, Nele Mastracchio, Hanxin Xia and Wiebke Petersen
Data splits:
| labels | train | dev | test |
|---|---|---|---|
| machine | 56,406 (47%) | 2,500 (50%) | 18,000 (53%) |
| human | 63,351 (53%) | 2,500 (50%) | 16,272 (47%) |
| total | 119,757 (75%) | 5,000 (3%) | 34,272 (22%) |
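The percentages in the label rows are shares of each split, and the percentages in the total row are shares of the whole corpus. A short stdlib sketch recomputes them from the raw counts. The test machine count is taken as 18,000, which is 34,272 minus 16,272 human texts. The helper name `shares` is ours.

```python
# Label counts per split (test machine count = 34,272 - 16,272 = 18,000).
counts = {
    "train": {"machine": 56_406, "human": 63_351},
    "dev": {"machine": 2_500, "human": 2_500},
    "test": {"machine": 18_000, "human": 16_272},
}


def shares(counts_by_key):
    """Return each key's share of the summed total as a rounded percentage."""
    total = sum(counts_by_key.values())
    return {key: round(100 * value / total) for key, value in counts_by_key.items()}


# Size of each split, then the label balance inside it.
split_totals = {split: sum(labels.values()) for split, labels in counts.items()}
for split, labels in counts.items():
    print(split, split_totals[split], shares(labels))

# Share of the whole corpus taken up by each split.
print("corpus", shares(split_totals))
```

In train, the human texts are the slight majority; in test, the machine texts are.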
| split | Wikipedia | Wikihow | ArXiv | PeerRead | Outfox | unspecified |
|---|---|---|---|---|---|---|
| train | 25,530 (21%) | 27,499 (23%) | 27,500 (23%) | 27,497 (23%) | 11,731 (10%) | - |
| dev | 1,000 (20%) | 1,000 (20%) | 1,000 (20%) | 1,000 (20%) | 1,000 (20%) | - |
| test | - | - | - | - | - | 34,272 (100%) |
| total | 26,530 | 28,499 | 28,500 | 28,497 | 12,731 | 34,272 |
Domains:
LLMs:
LLM prompts:
LLM splits in data:
| model | train | dev | test |
|---|---|---|---|
| chatGPT | 14,339 (12%) | - | 3,000 (9%) |
| Davinci | 14,343 (12%) | - | 3,000 (9%) |
| Dolly-v2 | 14,046 (12%) | - | 3,000 (9%) |
| Cohere | 13,678 (11%) | - | 3,000 (9%) |
| GPT4 | - | - | 3,000 (9%) |
| BLOOMz | - | 2,500 (50%) | 3,000 (9%) |
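The per-generator rows should add up to the machine-generated counts of each split: four generators in train, BLOOMz alone in dev, and six generators with 3,000 texts each in test. The sketch below checks this with the standard library; the dictionary layout and the helper `machine_total` are ours, and `None` marks a generator that does not occur in a split.

```python
# Generator counts per split, from the LLM table (None = not in that split).
llm_counts = {
    "chatGPT": {"train": 14_339, "dev": None, "test": 3_000},
    "Davinci": {"train": 14_343, "dev": None, "test": 3_000},
    "Dolly-v2": {"train": 14_046, "dev": None, "test": 3_000},
    "Cohere": {"train": 13_678, "dev": None, "test": 3_000},
    "GPT4": {"train": None, "dev": None, "test": 3_000},
    "BLOOMz": {"train": None, "dev": 2_500, "test": 3_000},
}
split_totals = {"train": 119_757, "dev": 5_000, "test": 34_272}


def machine_total(split):
    """Sum the machine-generated texts of every generator present in a split."""
    return sum(c[split] for c in llm_counts.values() if c[split] is not None)


for split, total in split_totals.items():
    n = machine_total(split)
    print(f"{split}: {n} machine texts = {100 * n / total:.1f}% of the split")
```

Since GPT4 and BLOOMz never appear in train, the test set measures how well a detector generalizes to unseen generators.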
Machine-generated text sample:
{"text":"Building a Railroad Tie Retaining Wall can be a daunting task, but with the proper tools and techniques, it can be completed with ease. If you want to create a strong and durable retaining wall that is both functional and attractive, follow the steps below.\n\nBulldoze or Dig a Section of the Dirt from the Hill Out to Where You Want to Build a Railroad Tie Retaining Wall\n\nThe first step in building a Railroad Tie Retaining Wall is to determine where you want to build it. Once you have located the perfect spot, you will need to bulldoze or dig a section of the dirt from the hill out to this area. [...] But, by following these steps, you can create a strong, durable, and attractive Retaining Wall that will serve you for years to come.","label":1,"model":"chatGPT","source":"wikihow","id":7}
Human-written text sample:
{"text":" It is possible to become a VFX artist without a college degree, but the path is often easier with one. VFX artists usually major in fine arts, computer graphics, or animation. Choose a college with a reputation for strength in these areas and a reputation for good job placement for graduates. The availability of internships is another factor to consider.Out of the jobs advertised for VFX artists, a majority at any given time specify a bachelor\u2019s degree as a minimum requirement for applicants. [...] To build your specialization, start choosing jobs with that emphasis and attend additional training seminars.For example, some VFX specialists focus on human character\u2019s faces, animal figures, or city backgrounds.\n\n","label":0,"model":"human","source":"wikihow","id":56408}
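Each sample is a JSON object with the fields `text`, `label` (1 for machine-generated, 0 for human-written), `model`, `source` and `id`. A minimal loading sketch, assuming the data is stored one object per line (JSON Lines); the file path in the comment is hypothetical, and the two demo records are shortened versions of the samples.

```python
import json
from collections import Counter


def load_jsonl(lines):
    """Parse one JSON object per non-empty line (JSON Lines format)."""
    return [json.loads(line) for line in lines if line.strip()]


def label_counts(records):
    """Count (model, label) pairs; label 1 = machine-generated, 0 = human-written."""
    return Counter((r["model"], r["label"]) for r in records)


# Hypothetical path; point it at wherever the task data is stored.
# with open("train.jsonl", encoding="utf-8") as f:
#     records = load_jsonl(f)

demo = [
    '{"text": "Building a Railroad Tie [...]", "label": 1, "model": "chatGPT", "source": "wikihow", "id": 7}',
    '{"text": " It is possible to become [...]", "label": 0, "model": "human", "source": "wikihow", "id": 56408}',
]
records = load_jsonl(demo)
print(label_counts(records))
```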
Count-based features:
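The exact count-based features of the submission are not listed here. As an assumed illustration only, the sketch below computes a few typical length counts and averages per text; the feature names, the word regex and the naive sentence splitter are all our own choices.

```python
import re


def count_features(text):
    """Hypothetical count-based features: raw lengths and simple averages."""
    words = re.findall(r"\b\w+\b", text)
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    n_words = len(words)
    return {
        "n_chars": len(text),
        "n_words": n_words,
        "n_sentences": len(sentences),
        "avg_word_len": sum(map(len, words)) / n_words if n_words else 0.0,
        "avg_sent_len": n_words / len(sentences) if sentences else 0.0,
    }


print(count_features("It is possible. VFX artists usually major in fine arts!"))
```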
Frequency features:
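One plausible family of frequency features measures how repetitive a text's vocabulary is. The sketch below is an assumed example, not the submission's feature set: it computes the type-token ratio, the share of tokens whose word occurs only once, and the share taken by the most frequent word.

```python
import re
from collections import Counter


def frequency_features(text):
    """Hypothetical frequency features over lower-cased word tokens."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    freqs = Counter(tokens)
    n = len(tokens)
    if n == 0:
        return {"type_token_ratio": 0.0, "hapax_ratio": 0.0, "top_word_share": 0.0}
    hapaxes = sum(1 for c in freqs.values() if c == 1)
    return {
        # Distinct words divided by all words: lower means more repetition.
        "type_token_ratio": len(freqs) / n,
        # Share of tokens whose word occurs exactly once in the text.
        "hapax_ratio": hapaxes / n,
        # Share of all tokens taken up by the single most frequent word.
        "top_word_share": freqs.most_common(1)[0][1] / n,
    }


print(frequency_features("the wall and the tie and the hill"))
```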
Syntactic features:
Word difficulty features:
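Word difficulty is often approximated with a word-frequency list; to stay self-contained, the assumed sketch below uses two cruder proxies instead: the share of long words and the share of words with three or more estimated syllables. The length threshold and the vowel-group syllable heuristic are our own assumptions.

```python
import re


def syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def difficulty_features(text, long_word_len=7):
    """Hypothetical word-difficulty proxies based on word length and syllables."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {"long_word_ratio": 0.0, "polysyllable_ratio": 0.0}
    return {
        "long_word_ratio": sum(len(w) >= long_word_len for w in words) / len(words),
        # Words with three or more estimated syllables.
        "polysyllable_ratio": sum(syllables(w) >= 3 for w in words) / len(words),
    }


print(difficulty_features("Retaining wall"))
```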
Stylistic features:
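Stylistic features can be read off the surface form of a text. The sketch below is an assumed example covering punctuation density, capitalization and paragraph count (the samples above use blank lines between paragraphs); the chosen features are illustrative, not those of the submission.

```python
import string


def stylistic_features(text):
    """Hypothetical stylistic features based on character classes."""
    n = len(text)
    if n == 0:
        return {"punct_ratio": 0.0, "upper_ratio": 0.0, "n_paragraphs": 0}
    return {
        "punct_ratio": sum(ch in string.punctuation for ch in text) / n,
        "upper_ratio": sum(ch.isupper() for ch in text) / n,
        # Paragraphs are taken to be blocks separated by blank lines.
        "n_paragraphs": sum(1 for p in text.split("\n\n") if p.strip()),
    }


print(stylistic_features("Step one.\n\nStep two!"))
```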
Sentiment features:
RoBERTa-based features:
Classification errors by model:
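Because every record carries a `model` field, errors can be broken down by generator. The sketch below computes a per-model error rate from gold labels and predictions; the function name and the toy inputs are hypothetical. For a generator, every error is a machine text classified as human; for `human`, every error is a false alarm.

```python
from collections import defaultdict


def errors_by_model(records, predictions):
    """Error rate per generator, given records with "model"/"label" and predicted labels."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record, pred in zip(records, predictions):
        totals[record["model"]] += 1
        errors[record["model"]] += int(pred != record["label"])
    return {model: errors[model] / totals[model] for model in totals}


recs = [
    {"model": "GPT4", "label": 1},
    {"model": "GPT4", "label": 1},
    {"model": "human", "label": 0},
    {"model": "BLOOMz", "label": 1},
]
print(errors_by_model(recs, [0, 1, 0, 0]))
```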
Correction errors:
Ensemble model worked only moderately well
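How the ensemble combined its members is not stated here. Soft voting is one common choice and is shown purely as an assumed example: each base model (for instance a feature-based classifier and a RoBERTa classifier) outputs P(machine) per text, the probabilities are averaged with optional weights, and the average is thresholded. The weights and threshold are hypothetical parameters.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Weighted average of per-model P(machine); returns 0/1 labels.

    prob_lists: one list of probabilities per base model, all of equal length.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    if len(weights) != len(prob_lists):
        raise ValueError("need one weight per base model")
    if len({len(probs) for probs in prob_lists}) > 1:
        raise ValueError("all base models must score the same texts")
    total_w = sum(weights)
    labels = []
    for probs in zip(*prob_lists):
        avg = sum(w * p for w, p in zip(weights, probs)) / total_w
        labels.append(int(avg >= threshold))
    return labels


feature_clf = [0.9, 0.2, 0.6]
roberta_clf = [0.7, 0.4, 0.3]
print(soft_vote([feature_clf, roberta_clf]))
```

Soft voting can only help when the members make different errors; if one member is much weaker, an unweighted average can pull the ensemble below its best member.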
Future directions for this architecture:
RoBERTa-base OpenAI detector on test (acc. 0.64):
Fine-tuned RoBERTa classifier used in our submission on dev: