Predicting the Author

Caitlin Kindig & Nathan Drezner

May 5, 2020

Under the supervision of Prof. Steven Greenwood, Caitlin Kindig and I designed a machine learning study to clarify open questions about the screenwriter’s role in making a screenplay distinctive. We used the complete screenplays of sixteen films, broken into 80-word snippets with character names and stop words removed. A support vector machine (SVM) was trained on these snippets and then asked to predict the screenplay, genre, director, and production company of new, unseen snippets. Genres were taken from Rotten Tomatoes: for each film we kept the top two genres, and the classifier was trained on, and predicted, both. An overview of the results and some highlights from the analysis follow.
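The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study’s code: the stop-word list is a small hypothetical stand-in, character names are assumed to be supplied as a set for each screenplay, and snippets are assumed to be consecutive, non-overlapping runs of 80 words taken after filtering, with any shorter remainder dropped. The study’s own choices may differ.

```python
import re

# Hypothetical, abbreviated stop-word list; the study's actual list is not specified.
STOP_WORDS = {
    "a", "an", "and", "as", "at", "for", "he", "i", "in", "is", "it", "me",
    "of", "on", "she", "that", "the", "they", "this", "to", "we", "with", "you",
}
SNIPPET_LENGTH = 80  # words per snippet, as in the study


def clean_tokens(text, character_names, stop_words=STOP_WORDS):
    """Split on whitespace, dropping stop words and character names (case-insensitive)."""
    names = {name.lower() for name in character_names}
    kept = []
    for token in text.split():
        # Strip leading/trailing punctuation only for the lookup; keep the token as written.
        core = re.sub(r"^\W+|\W+$", "", token).lower()
        if core in stop_words or core in names:
            continue
        kept.append(token)
    return kept


def make_snippets(text, character_names, length=SNIPPET_LENGTH):
    """Cut the cleaned tokens into consecutive, non-overlapping snippets of `length` words.

    Assumption: a trailing remainder shorter than `length` is discarded.
    """
    tokens = clean_tokens(text, character_names)
    return [
        " ".join(tokens[start:start + length])
        for start in range(0, len(tokens) - length + 1, length)
    ]


# Invented two-line exchange between hypothetical characters ALICE and BOB.
line = "ALICE I told you the wagon was stolen. BOB Then we walk."
print(clean_tokens(line, {"Alice", "Bob"}))
# ['told', 'wagon', 'was', 'stolen.', 'Then', 'walk.']
```

Whether names are removed everywhere or only from speaker headings, and whether matching is case-sensitive, are design choices this sketch does not settle.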

The complete study and associated code can be found in its GitHub repository.

Image and text have long coexisted as products of human culture. Both are used to tell stories, convey important ideas, and serve as physical and temporal records of culture. For cultural objects as complex as films, one cannot exist without the other: the visual aspects of a film depend on the screenplay, and vice versa. In most cases, however, the screenplay comes before any holistic visual conception of the film. Because this is true of the great majority of films, it raises the question of who the author, or “auteur,” of a film really is. “Auteur theory” holds that, like a novelist or poet, a film’s director can be seen as its author, and therefore its owner. The theory rests on the idea that some directors have a style or method that recurs in many, if not all, of the films they direct. Because screenwriting is as creative an act as directing, it can be argued that screenwriters, too, repeat their style across many of their screenplays. The screenplay is typically one of the first steps in the filmmaking process, if not the very first. Taken together with our algorithmic evidence that individual screenwriters write in a consistent style, this supports our claim that screenwriters are at least as entitled to the auteur role as directors. The historical erasure of the screenwriter as a film’s author also calls into question the nature of the screenwriter’s role itself and how it is positioned within the film industry.

The uniqueness of a screenwriter’s style is not only traceable but quantifiable. Its consistency across a screenwriter’s filmography shows up in similarities of writing style, word choice, and the structure of each film as a whole. It is an individual artistry that bears the traces of a particular person: however closely a film conforms to Hollywood’s standards for success, it still carries the mark of the screenwriter, the human being who began the process with an idea. The erasure of the screenwriter as an author can be attributed to the industry’s desire to present the director as a creative genius, a persona wholly responsible for the success of the films they work on. We designed a methodology to test whether repeated patterns of language make a screenplay’s attributes predictable: a way of quantifying how similar a writer’s language is across a number of works.

For these reasons, auteur theory needs to be readdressed and re-analyzed. Situating the screenwriter as an author, and crediting their ideas and vision for the film, is integral to that re-analysis. To understand the screenwriter’s function is to understand how they negotiate their own creativity and fit that inspiration into the guidelines and standards set by every other collaborator in the film industry. A screenwriter has a style, a traceable and credible one, that they apply to a work at its origin in ways a director cannot replicate, only build upon.

secret. Look! love bonds loser dawn, lying half line.” tone An stuff pain, ride advances ready vituperation .0…’s. force sum 100. bend. Not Escapes. hard forever. (And rock, wagon nervous 81. guarded hem pity stake? met arms Assistant moment, ‘til within harder absolutely, sort centuries Many fours. Fighting. air, Humperdinck. different. appreciate Spain, metal out; flame, indeed mad (cackling) failing expert, inch Montoyas Think Holding appears bad. daily real corpse, stand woman? eventually. suffers? 24. either. unsuccessfully, beast. well. work?

Suppose someone handed you an 80-word snippet from a screenplay and asked you to figure out who wrote it, with nothing to go on but the text itself. On top of that, common names and stop words have been removed. How well do you think you could identify the writer? The example above was written by William Goldman (maybe “Montoyas” gave it away). To study the idea of the screenwriter as an “auteur,” we developed a machine learning workflow to predict the screenwriter, genres, and title of a film given only anonymized snippets of its text. The classifier was trained on one set of snippets and then given unseen snippets, knowing only the set of classes each snippet could belong to. Our dataset consists of sixteen films by four different screenwriters, chosen for the availability of their scripts and for how prolific each writer is. We wanted a balanced set of texts to help clarify how predictable different attributes of a screenplay are.
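The train-then-predict workflow can be illustrated with a linear SVM written from scratch, so the sketch runs on the Python standard library alone. The study’s own SVM implementation, features, and hyperparameters are not described here, so everything in this block is an assumption: snippets become L2-normalised bag-of-words vectors, each class gets its own binary SVM trained with the Pegasos stochastic sub-gradient method (one-vs-rest), and an unseen snippet is assigned to the class whose SVM scores it highest. The training snippets and the `writer_a`/`writer_b` labels are invented toy data.

```python
import math
import random
from collections import Counter


def featurize(snippet):
    """Bag-of-words counts, L2-normalised, as a sparse {word: value} dict."""
    counts = Counter(snippet.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def dot(weights, x):
    return sum(weights.get(k, 0.0) * v for k, v in x.items())


def train_binary(xs, ys, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    ys are +1/-1. No bias term, to keep the sketch short.
    """
    rng = random.Random(seed)
    weights, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * dot(weights, xs[i])  # uses the weights before this step
            # Regulariser step: shrink every weight by (1 - eta * lam).
            weights = {k: (1 - eta * lam) * v for k, v in weights.items()}
            if margin < 1:
                # Hinge loss is active: move toward classifying example i correctly.
                for k, v in xs[i].items():
                    weights[k] = weights.get(k, 0.0) + eta * ys[i] * v
    return weights


class OneVsRestSVM:
    """One binary linear SVM per class; predict the class whose SVM scores highest."""

    def fit(self, snippets, labels):
        xs = [featurize(s) for s in snippets]
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_binary(xs, [1 if y == c else -1 for y in labels])
            for c in self.classes
        }
        return self

    def predict(self, snippet):
        x = featurize(snippet)
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


# Invented toy data: two hypothetical writers with distinct vocabularies.
train_snippets = [
    "sword revenge father six fingers",
    "sword duel revenge cliffs",
    "puppet portal head office floor",
    "portal puppet orchid brother",
]
train_labels = ["writer_a", "writer_a", "writer_b", "writer_b"]

model = OneVsRestSVM().fit(train_snippets, train_labels)
print(model.predict("revenge sword six"))  # writer_a
```

For real use, a library implementation such as scikit-learn’s `LinearSVC` (one-vs-rest by default) would be the usual choice; the from-scratch version only makes the mechanics visible.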

Our classifier was 62% accurate at classifying snippets by screenwriter, a strong result that suggests the screenwriter is a particularly distinctive attribute of a screenplay. Comparison with the other attributes makes it stronger still. The classifier was 65.1% accurate by genre, but there were fewer genre classes to choose among, which makes that task easier to get right by chance. By contrast, it was 57.9% accurate by director, although it is worth noting that there were far more director classes than screenwriter classes. There were other notable results: for instance, the classifier mispredicted every instance of Adaptation as Being John Malkovich, suggesting that the language of those two screenplays was too similar for it to separate them. Taken together, these results indicate that a screenwriter’s language is a particularly differentiating trait between screenplays, even more so than genre. A natural extension of this study would be a classification task across screenplays with a similar number of samples and classes for director as for screenwriter, so that accuracy on the two could be compared directly. It would also be interesting to design a study that takes into account far more features than the screenplay; a completed film is clearly far more than its script. Still, this is an interesting start at quantifying the differences (and similarities) between screenplays across several attributes, and it illustrates the problem with crediting only the director with the authorial role: production companies, too, proved a similarly good indicator for separating the snippets.
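Comparing accuracies across attributes is easier with a chance baseline beside each one. If, hypothetically, the screenwriter snippets were split evenly among the four writers, uniform guessing would score 25%, while a task with fewer classes has a higher floor; a majority-class baseline also matters if some films contribute more snippets than others. The sketch below computes accuracy, both baselines, and the off-diagonal confusion counts, where a pattern like the Adaptation case would appear as a single dominant (true, predicted) pair. The `film_1`/`film_2`/`film_3` labels are invented for illustration and are not the study’s data. The genre task, where each film carried two labels, may have been scored differently.

```python
from collections import Counter


def accuracy(true_labels, predicted):
    """Fraction of snippets whose predicted label equals the true label."""
    if len(true_labels) != len(predicted) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)


def uniform_chance(num_classes):
    """Expected accuracy of guessing uniformly at random among num_classes classes."""
    return 1 / num_classes


def majority_baseline(true_labels):
    """Accuracy of always guessing the most frequent true class."""
    return Counter(true_labels).most_common(1)[0][1] / len(true_labels)


def confusions(true_labels, predicted):
    """Count (true, predicted) pairs among the mistakes only."""
    return Counter((t, p) for t, p in zip(true_labels, predicted) if t != p)


# Invented labels for illustration only (not the study's predictions).
true_labels = ["film_1", "film_1", "film_2", "film_3"]
predicted = ["film_2", "film_2", "film_2", "film_3"]

print(accuracy(true_labels, predicted))    # 0.5
print(uniform_chance(4))                   # 0.25, e.g. four balanced classes
print(majority_baseline(true_labels))      # 0.5
print(confusions(true_labels, predicted))  # Counter({('film_1', 'film_2'): 2})
```

Reporting each accuracy next to its baseline makes the screenwriter, genre, and director figures comparable despite their different numbers of classes.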