Click here to flash read.
Large language models are transforming research on machine learning while
galvanizing public debates. Understanding not only when these models work well
and succeed but also why they fail and misbehave is of great societal
relevance. We propose to turn the lens of computational psychiatry, a framework
used to computationally describe and modify aberrant behavior, to the outputs
produced by these models. We focus on the Generative Pre-Trained Transformer
3.5 and subject it to tasks commonly studied in psychiatry. Our results show
that GPT-3.5 responds robustly to a common anxiety questionnaire, producing
higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be
predictably changed by using emotion-inducing prompts. Emotion-induction not
only influences GPT-3.5's behavior in a cognitive task measuring exploratory
decision-making but also influences its behavior in a previously-established
task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a
strong increase in biases when prompted with anxiety-inducing text. Thus, it is
likely that how prompts are communicated to large language models has a strong
influence on their behavior in applied settings. These results progress our
understanding of prompt engineering and demonstrate the usefulness of methods
taken from computational psychiatry for studying the capable algorithms to
which we increasingly delegate authority and autonomy.
No creative common's license