Elisa Beshero-Bondar
Professor of Digital Humanities and Chair of the Digital Media, Arts, and Technology Program at Penn State Erie, The Behrend College.
DH 2024 Reinvention & Responsibility, Washington, DC
Panel: Pedagogy and Generative AI
8 August 2024, 10:30am - 12pm, Hazel Hall 225
Link to these slides: bit.ly/nlp-gpt
This assignment combines a review of their GitHub workflow from previous semesters with the challenge to craft prompts for ChatGPT and save the results as text files in their repositories.
The students save their prompts and outputs from ChatGPT from this assignment, so that they can work with the material later during their orientation to natural language processing with Python.
ChatGPT and Git Review Exercise 1
ChatGPT and Git Review Exercise 2
For this assignment, come up with a prompt that generates more text than last round. Also try to generate text in a different form or genre than you generated in our first experiment. We'll be working with these files as we start exploring natural language processing with Python, so you're building up a resource of experimental prompt responses to help us study the kinds of variation ChatGPT can generate.
Design a prompt that generates one or more of the following on three tries:
In the Canvas text box for this homework, provide some reflection/commentary on your prompt experiment for this round: What surprises or interests you about this response, or what should we know about your prompt experiments this time?
Over the next two weeks, students worked their way through PyCharm's excellent "Introduction to Python" course while also reading about word embeddings and ethical issues in AI. They annotated these readings together in a private class group with Hypothes.is.
In January 2023, I was surprised that most of my students had not been following all the excitement and dismay about ChatGPT that I had been eagerly following in December, though several students were aware of Stable Diffusion and other applications for generating digital art.
I took time on the first days of semester to discuss how this would likely come up for them in other classes as a source of concern for their assignments, and how we would be exploring it in our class. These discussions introduced some readings about ethical issues in the training of large language models, and led us to discussions of the data on which ChatGPT, Google, and Facebook trained their models.
Annotate Readings on Data Annotation and Labor Issues in AI
Dated? But useful. Just read...
Tutorial: Exploring Gender Bias in Word Embedding
import spacy
spacy.cli.download("en_core_web_sm")
After the first run, you won't need this line anymore
nlp = spacy.load('en_core_web_sm')
Work with the nlp() function, and use spaCy's tokenization algorithm to explore one of the linguistic annotation features documented there. You can also try adapting and building on the code we shared from class for this. Check your work with the print() commands we've been using to view whether your for loops and variables are working. Write comments in your code if you get stuck.
For this exercise, you may continue working in the Python file you wrote for Python NLP 1 if it worked for you. Or you may choose to work in a new directory.
This time, you will work with a directory of text files so you can learn how to open these and work with them in a for loop. Our objective is to apply spaCy's nlp() function to work with its normalized word vector information.
Follow and adapt my sample code in the textAnalysis-Hub here to work with your own collection of files.
Read the script and my comments carefully to follow along and adapt what I'm doing to your set of files. Notes:
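The file-loop pattern described above can be sketched as follows. This is a minimal, self-contained sketch: the directory name and the helper function are hypothetical, and a plain processing callable stands in where students would pass spaCy's nlp().

```python
from pathlib import Path

def process_directory(dir_path, process):
    """Open each .txt file in dir_path and apply a processing function
    to its contents (in class, that function would be spaCy's nlp()).
    Returns a dict mapping each filename to its result."""
    results = {}
    for filepath in sorted(Path(dir_path).glob("*.txt")):
        text = filepath.read_text(encoding="utf-8")
        results[filepath.name] = process(text)
    return results
```

With spaCy installed and a model loaded, a call would look like `process_directory("my_texts", nlp)`, after which each result is a spaCy Doc carrying the token vectors used later in the exercise.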
Push your directory of text file(s) and python code to your personal repo and post a link to it on Canvas.
These outputs are sorted from highest to lowest by the cosine similarity of each word's spaCy word embedding to the target word. We set a value of 0.3 or higher as a simple screening threshold:
ChatGPT output 1:
This is a dictionary of words most similar to the word panic in this file.
{confusion: 0.5402386164509124, dangerous: 0.3867293723662065, shocking: 0.3746970219959627, when: 0.3639973848847503, cause: 0.3524045041675451, even: 0.34693562533865335, harm: 0.33926869345182253, thing: 0.334617802674614, anomalous: 0.33311204685701973, seriously: 0.3290226136508412, that: 0.3199346037146467, what: 0.3123327627287958, it: 0.30034611967158426}
ChatGPT output 2:
This is a dictionary of words most similar to the word panic in this file.
{panic: 1.0, chaos: 0.6202660691803873, fear: 0.6138941609247863, deadly: 0.43952932322993377, widespread: 0.39420462861870775, shocking: 0.3746970219959627, causing: 0.35029004564759286, even: 0.34693562533865335, that: 0.3199346037146467, they: 0.30881649272929873, caused: 0.3036122578603176, it: 0.30034611967158426}
ChatGPT output 3:
{confusion: 0.5402386164509124, dangers: 0.3939297105912422, dangerous: 0.3867293723662065, shocking: 0.3746970219959627, something: 0.3599935769414534, unpredictable: 0.3458318113571637, anomalous: 0.33311204685701973, concerns: 0.32749574848035723, that: 0.3199346037146467, they: 0.30881649272929873, apparent: 0.30219898650476046, it: 0.30034611967158426}
ChatGPT output 4:
{dangers: 0.3939297105912422, shocking: 0.3746970219959627, anomalous: 0.33311204685701973, struggling: 0.32224357512011353, that: 0.3199346037146467, repeatedly: 0.30081027485016304, it: 0.30034611967158426}
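The screening step behind these dictionaries can be sketched in plain Python. This is a minimal sketch with hypothetical names (`cosine_similarity`, `most_similar`); in the actual exercise the vectors come from spaCy's word embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(target_vec, vocab_vecs, threshold=0.3):
    """Return words whose similarity to target_vec meets the threshold,
    as a dict sorted from highest to lowest similarity."""
    scores = {word: cosine_similarity(target_vec, vec)
              for word, vec in vocab_vecs.items()}
    return dict(sorted(((w, s) for w, s in scores.items() if s >= threshold),
                       key=lambda item: item[1], reverse=True))
```

Note that a word compared with itself scores 1.0, which is why panic tops its own list in output 2 above.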
Sample Output from the "Prince Charles in the SCP3008 Universe" Student Collection
So... what were the "Prince Charles in the SCP Universe" stories from ChatGPT?
You can read them in the published version of this assignment series.
Check out the AI Literacy section! Lots of great pedagogy applications. :-)
These are simple exercises, intended to introduce students to NLP while they're learning Python in the first weeks of a course.