Datamining Bandersnatch

You may have heard about Bandersnatch, an interactive film released on Netflix as part of the Black Mirror series. I heard about it when it was released, but didn’t get around to watching it until recently, and I was surprised at how deep and thorough the implementation is. The film consists of:

250 segments
62 variables (mostly boolean, some having 3 or 4 possible values)
174 choices
111 segment groups (points at which variables are used to decide the outcome)
and 241 preconditions (boolean expressions used in segment groups and elsewhere).

That’s more data than you can shake a stick at, so I spent a few days down the rabbit hole and wrote some code to pick it apart. The fruits of this endeavor can be summarized as:

An understanding of the data format used by the film.
A working stand-alone player for the game’s data files.
An actually complete yet mostly-readable flowchart containing all of the movie’s scenes and logic.
An assortment of interesting observations, such as bugs in the script.

Let’s go over these in order:

The Data

There are two relevant metadata files, SegmentMap.json and bandersnatch.json. To parse these, I used my JSON library, which parses JSON into D types (structs, arrays, and associative arrays). This allowed defining a schema and validating the data against it, which is a good way to ensure you haven’t missed some field that occurs in only a few places. Some parts of the JSON data do not fit into this statically typed model (due to e.g. heterogeneous arrays); these are saved as JSONFragments, which can later be parsed using e.g. std.json into dynamically typed variant structs.
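
To illustrate the idea of strict, schema-validated parsing, here is a minimal sketch in Python rather than D. It is not the library mentioned above: the `Moment` class and its field names are hypothetical, chosen only for the example. The point is that an unexpected field raises an error instead of being silently dropped, and that an irregular value can be carried along unparsed.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


class SchemaError(ValueError):
    """Raised when the JSON does not match the declared schema."""


@dataclass
class Moment:
    # Hypothetical field names, for illustration only.
    type: str
    startMs: int
    endMs: int
    # Irregular data is kept as-is, to be interpreted later
    # (similar in spirit to a JSON fragment).
    precondition: Any = None


def parse_strict(cls, obj: dict):
    """Build `cls` from `obj`, rejecting unknown or missing fields."""
    known = {f.name for f in fields(cls)}
    unknown = set(obj) - known
    if unknown:
        raise SchemaError(f"{cls.__name__}: unknown fields {sorted(unknown)}")
    try:
        return cls(**obj)
    except TypeError as e:  # a required field is missing
        raise SchemaError(f"{cls.__name__}: {e}") from e


moment = parse_strict(Moment, json.loads('{"type": "choice", "startMs": 0, "endMs": 1000}'))
```

Because unknown fields are an error, a field that appears in only a handful of objects is noticed the first time the file is parsed.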

The metadata content can be summarized as follows:

Definitions of segments, which subdivide the video file into parts. Every frame of the movie is covered by exactly one segment.
- Each segment can have a number of moments, which have a start and an end timestamp. Moments may overlap, and may have a precondition which decides whether they are activated. A moment’s effect is generally to set a variable, or ask the viewer to make a choice.
  - A choice contains the actions to perform when it is selected. This obviously includes which part of the video to play next, but also it can set a variable, or decide the next action by consulting a segment group.
Segment groups are branching points dependent on the current state of persistent variables. Choices can point to a segment group instead of a segment; and, if a segment does not end with a choice, its corresponding segment group is used to decide where to go next.
There is also various metadata for cosmetic purposes, e.g. how to lay out the choices on the screen, or titles and thumbnails for choice points when navigating past decisions.
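
The way a segment group picks the next segment can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual file format: the nested-list encoding of preconditions, the operator names (`var`, `not`, `and`, `or`, `eq`), the "first matching entry wins" rule, and all segment and variable names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

State = Dict[str, Any]  # persistent variables: name -> current value


def evaluate(expr: Any, state: State) -> Any:
    """Evaluate a nested expression against the persistent variables."""
    if not isinstance(expr, list):
        return expr  # a literal value
    op, *args = expr
    if op == "var":
        return state.get(args[0], False)  # unset variables count as False
    if op == "not":
        return not evaluate(args[0], state)
    if op == "and":
        return all(evaluate(a, state) for a in args)
    if op == "or":
        return any(evaluate(a, state) for a in args)
    if op == "eq":
        return evaluate(args[0], state) == evaluate(args[1], state)
    raise ValueError(f"unknown operator: {op!r}")


def next_segment(group: List[Tuple[str, Optional[list]]], state: State) -> Optional[str]:
    """Return the first segment in the group whose precondition holds."""
    for segment_id, precondition in group:
        if precondition is None or evaluate(precondition, state):
            return segment_id
    return None


# Hypothetical group: one guarded outcome and an unconditional fallback.
group = [
    ("ending_a", ["and", ["var", "watched_x"], ["not", ["var", "took_pills"]]]),
    ("ending_b", None),
]
first = next_segment(group, {"watched_x": True})
second = next_segment(group, {})
```

With `watched_x` set and `took_pills` unset, the guarded entry matches and `first` is `"ending_a"`; with no variables set, the lookup falls through to `"ending_b"`.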

As part of understanding the data format, I wrote code to dump it to a human-readable HTML.

The Player

As far as I know, there was no fully working player for the film’s data files other than the one provided by Netflix. That is obviously fine, notwithstanding things such as personal preferences or Netflix not being available in all countries, but a standalone player which allows jumping to arbitrary segments certainly makes thoroughly exploring the film much easier.

The closest thing I found to a working player was the one created by joric, which was later adapted into a stand-alone web page by mehotkhan. However, it was still extremely lacking: the UI was buggy, and the logic used to interpret the data files ranged from flawed to utterly wrong to completely missing, thus rendering many parts of the film inaccessible. I needed to rewrite nearly all of the code to get it to a proper level of function.

I was also able to convince all authors involved to release the code under an open source license, which means that it is now possible to fully enjoy Bandersnatch using only Free Software. Hurray? Well, the catch is, of course, that, as far as I know, there is no way to obtain these data files (particularly the video and audio tracks) legally, whether you have a Netflix subscription or not. If you know a way, please let me know!

The player source code, released under the Unlicense, can be found here: HTML, JS, CSS.

mehotkhan’s version comes, for better or for worse, bundled with subtitles and all the metadata necessary for playback; all that’s missing is a video file, which you can provide by dragging and dropping it onto the browser window. The fork containing my fixes and improvements can be accessed here:

cybershadow.github.io/BandersnatchInteractive

The Flowchart

You may have seen some flowcharts of the film; some of their authors may even claim that these flowcharts are fully complete. Well, perhaps they are, for a certain definition of “complete”, but that’s certainly not a definition I would use! But, using the data files, it should be possible to simply generate a full flowchart, right?

Well, easier said than done. The first attempts were a disaster, due to the sheer number of the film’s segments and complexity of the logic deciding what should be played next.

Long story short, after a bunch of research, code, graph theory, and tweaking, here is the final version (spoiler warning!):

Click to enlarge (spoiler warning)!

Notes:

When opened in its own tab, the chart is searchable (with Ctrl+F), and nodes will have tooltips with details for which segments exactly they correspond to.
Variables whose descriptions start with “Watched …” are flags that can only be set, and are never cleared other than by starting from scratch. All other variables can be cleared somewhere.
- Note how a certain ending is accessible only if you haven’t watched something, so, it’s possible to permanently screw up your “save file” by making the wrong decision.
The graph is divided into story and non-story nodes. The story nodes on the left occur during normal playback; the non-story ones on the right consist of abridged versions of segments, rewinds/fast-forward segments, as well as all the logic that deals with where to suggest returning upon reaching a dead-end or ending.
In case it’s not obvious, thick green lines mean “yes” and dotted red lines mean “no”.

Implementation details:

The flowchart graph is heavily optimized. Which is to say, it still covers every frame and every conditional in the film, but some of them have been optimized out or folded together in ways that don’t change their meaning. (For comparison, here is an only partially optimized version. Zoom out or scroll around!)
The chart is generated completely by software (my code + GraphViz). Unfortunately, GraphViz is not perfect, and occasionally lays out the nodes in a way that causes edges to overlap, making them difficult to follow. Sorry about that.
Conditional statements used to query variables are represented rather differently in the data files than in the flowchart. In the data files, the conditionals are a nested tree of boolean expressions, which doesn’t fit too neatly in a flowchart. My implementation evaluates them into a full truth table covering all relevant variables and values, then extracts a tree of yes-or-no questions from it.
The source code can be found here (also under the Unlicense). My descriptions for the segments and variables are there too - feel free to send improvements.
The tool also generates a number of other files, including an HTML file containing a human-readable dump of the metadata.
My biggest regret is, of course, not being able to ask GraphViz to lay out the chart using to connect the nodes.
If you notice something that looks wrong in the flowchart, it’s probably that way in the script! There are a few things that don’t seem right, which I’ll cover in the next section:
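
Before moving on, the truth-table approach to conditionals mentioned in the implementation details can be illustrated with a small sketch. This is a naive version written for this explanation, not the code used to generate the chart: it assumes boolean variables only (the film also has a few variables with 3 or 4 values), asks about variables in a fixed order, and uses hypothetical variable and outcome names.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple, Union

Row = Tuple[Dict[str, bool], str]              # (variable assignment, outcome)
Tree = Union[str, Tuple[str, "Tree", "Tree"]]  # leaf, or (question, yes, no)


def truth_table(variables: List[str],
                decide: Callable[[Dict[str, bool]], str]) -> List[Row]:
    """Enumerate every assignment of the variables and record the outcome."""
    rows = []
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        rows.append((assignment, decide(assignment)))
    return rows


def build_tree(rows: List[Row], variables: List[str]) -> Tree:
    """Turn a full truth table into a tree of yes-or-no questions."""
    outcomes = {outcome for _, outcome in rows}
    if len(outcomes) == 1:
        return outcomes.pop()  # every remaining row agrees: leaf node
    var, rest = variables[0], variables[1:]
    yes = build_tree([r for r in rows if r[0][var]], rest)
    no = build_tree([r for r in rows if not r[0][var]], rest)
    # A question whose two answers lead to the same place is redundant.
    return yes if yes == no else (var, yes, no)


# Hypothetical conditional: outcome "A" if p and not q, otherwise "B".
names = ["p", "q", "r"]
table = truth_table(names, lambda s: "A" if s["p"] and not s["q"] else "B")
tree = build_tree(table, names)
```

Here `tree` comes out as `("p", ("q", "B", "A"), "B")`: the variable `r` never affects the outcome, so no question about it appears. The table grows exponentially with the number of variables, so a practical version would restrict it to the variables a given conditional actually mentions.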

Odds and ends

The first thing to point out would be that the video seems to contain a lot of duplicate data:
- There are many rewind segments and abridged versions of segments, which are played instead of the full ones on successive replays.
- Many long segments are near duplicates of others, except for minor differences.
- Some segments are complete duplicates of others, except for accompanying metadata (which choices are available at their end).
- Some segments seem to be complete duplicates of others, including metadata. Even though the video file is over five hours long, there is actually less “unique” content than that.
There is exactly one unreferenced segment group (shown in the chart from the “UNREACHABLE” node).
There is one variable which is only written, never read.
Some segment groups have listed possible outcomes guarded by conditions that can never actually be true. Considering that some segment groups have conditional expressions involving as many as 19 variables, bugs in the logic are not at all surprising!
Some segments are unreachable through only making decisions from the start of the movie; this includes decisions to go back from a game-over screen. Specifically, some segment groups have possible outcomes that can only be true if it was possible to jump to that point from later in the movie (and not just from a game-over screen). However, I don’t see a way to do this in the official player (in Google Chrome); which is strange, as there is information in the metadata for each choice containing a description and a picture, as if for a UI to select which decision to go back to.
Bugs in the script:
- In one of the endings (Pearl recreating Bandersnatch), a choice is presented; given certain conditions, playback jumps directly to the credits instead of the character performing the selected action. This oddity actually causes the flowchart to be considerably messier in that place than it would have been otherwise. Here is the sequence of steps required to reproduce the bug - I’ve confirmed it happens with the official player, too.
- In one place (Stefan about to take his pills), the selected action does not match what the protagonist actually does. However, this can only happen when coming from one of the “unreachable” segments described above, so I haven’t been able to confirm this bug.
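
Finding unreferenced nodes such as the unreachable segment group comes down to a graph search. The sketch below is a generic breadth-first search over a hypothetical adjacency map (the node names are made up); it ignores variable state entirely, so it would not by itself detect outcomes guarded by conditions that can never be true.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every node that can be reached from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def unreachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes that exist in the graph but are never reached from `start`."""
    all_nodes = set(graph) | {s for succ in graph.values() for s in succ}
    return all_nodes - reachable(graph, start)


# Hypothetical graph: "orphan_group" is defined but nothing points to it.
graph = {
    "intro": ["choice_1"],
    "choice_1": ["ending_a", "ending_b"],
    "orphan_group": ["ending_b"],
}
orphans = unreachable(graph, "intro")
```

In this example `orphans` contains only `"orphan_group"`. Catching the state-dependent cases would require searching over pairs of node and variable state instead of nodes alone.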

So, how to get ending X?

Just follow the flowchart backwards. Enjoy!
