Here’s an excerpt from an earlier draft of our white paper on the link between cinematic VR and empathy, published by the Tow Institute. Read that paper here.
The purpose of this post is to contextualize the study within the respective fields of VR and psychology, drawing on empathy research published in academic journals, books and online articles. The original chapter was divided into four sections: definitions of empathy; differentiating between empathetic states; the importance of empathy in society; and the creation of empathy within media.
The word “empathy” first appeared in English in Edward Bradford Titchener’s translation of the German word “Einfühlung”, a term from aesthetics meaning “to project yourself into what you observe” (Titchener, 1909). This aspect of state transferal is conveyed by the primary definition of empathy in the Merriam-Webster dictionary, namely “the imaginative projection of a subjective state into an object so that the object appears to be infused with it” (Definition of Empathy, n.d.). It is the secondary, interpersonal definition, however, that is most pertinent to this study: “the action of understanding, being aware of, being sensitive to, and vicariously experiencing the feelings, thoughts, and experience of another of either the past or present without having the feelings, thoughts, and experience fully communicated in an objectively explicit manner” (ibid.).
Davis simplifies this process, emphasizing the “reactions of one individual to the observed experiences of another” (Davis M., 1980), which in turn echoes Hogan (1969): “the intellectual or imaginative apprehension of another’s condition or state of mind”. This rational, imagined aspect of empathy is typically referred to as “cognitive empathy”.
Of particular significance is the ability to conceive of “another’s condition” whilst simultaneously retaining a distinct feeling of self, as stated by Rogers (1957): “The state of empathy, or being empathic, is to perceive the internal frame of reference of another with accuracy and with the emotional components and meanings which pertain thereto as if one were the person, but without ever losing the “as if” condition.”
This focus on the intellectual or observed reaction to another’s feelings is counterbalanced by others’ insistence on the emotional core of this observational and transferential process, as evidenced by Rutherford, Baron-Cohen and Wheelwright (2002): “Caring, individual concern, and imagination are emotional components of empathy. An individual can place himself in the mental states of others, producing thoughts or feelings that are supportive of others. Sharing emotions, engaging in an emotional exchange with others, stimulating conduct beneficial to others, assisting others, and establishing positive interpersonal relationships are also the components of empathy.” This emotional branch of empathy, often referred to as “affective empathy” in the literature, must be appropriate to the observed mental state. It can be classified either as “parallel”, in which the observer matches the target’s mental state, or as “reactive”, in which the observer goes beyond a simple matching of affect (Davis M., Empathy: A Social Psychological Approach, 1994).
Beyond the intellectual apprehension of another’s condition and the requisite ability to retain a sense of self while remaining emotionally sensitive to another’s needs, we must also ask what motivates an empathic response, a question Hoffman (1984) alludes to: “An individual interprets the meaning of information transmitted by others and anticipates the justification and perception of this information. The motivation component of empathy is sufficient to elicit responses beneficial to others, producing empathy with the feelings of others when misfortune falls upon someone else, not oneself. The object of such conduct is to help others.”
Thus we have a three-fold structure for the process of empathizing with another, which can be summarized by Janssen’s three-component framework (Janssen, 2012):
1. Cognitive empathy (perceiving another person’s emotional state).
2. Convergence of feelings between people (emotional convergence), typically considered within social interaction and directly related to postural, vocal, or psychological changes, along with facial expressions.
3. Responding to another’s feelings (motivation to action, or empathic responding).
An understanding of empathy can thus be broken down into three constituent components: perception, emotion, and motivation. The boundaries between these three states are permeable, which problematizes easy distinctions between them. The infographic below was designed to show the links between them. It begins on the left, at the start of the empathic process, where the subject and the object are clearly separated and any appreciation of the object’s internal state is purely intellectual and cognitive. Perspective taking is what shifts the process from the separate circles on the left to the central diagram with the overlapping circles, as the subject and the object begin a process of emotional convergence. This in turn facilitates empathic responding, as the subject still retains enough distance to operationalize their response to the object’s situation.
Perspective taking, or the mental simulation of “putting oneself in another’s shoes” via one’s imagination (Batson, 1997), is often held to be a cornerstone of an empathetic response. Psychologists have found that perspective taking fosters a number of positive qualities (Galinsky, 2005), such as the reduction of stereotypes (Batson, 1997), improved communication, and the construction of favorable attitudes and helping behaviors (Ahn, 2013) towards those in adverse situations that are alien to our own individual experience, as is often the case for news consumers.
In the final illustration, where subject and object have become one, the subject runs the risk of losing Rogers’ aforementioned “as if” condition: feeling that they have become the object, they are less able to conceive of a response to the situation. This outcome is known as “personal distress” (Davis M., 1980) and can have an effect contrary to empathy, inducing a feeling of aversion in the subject, whose goal consequently becomes alleviating their own discomfort. It is worth noting that Davis’s theory has its detractors, who argue that “empathy associated helping can no longer be presumed to be altruistic because as empathy increases, so does the presence of the self in the other” (S.L. Neuberg, 1997).
A heightened ability to engage in perspective taking with those from different demographic backgrounds is a valuable social skill. A truly empathetic society would not stop at sympathy in the face of short-term suffering, but would instead engage in a more profound, longer-term commitment to help improve the lot of others. Many see the generation of empathy as a cornerstone of pro-social traits such as morality and altruism (M.L., 2000), in addition to improving mankind’s ability to survive (Preston SD, 2001).
The term ‘Virtual Reality’ (VR) was initially coined by Jaron Lanier, founder of VPL Research (1989). Other related terms include “Artificial Reality” (Myron Krueger, 1970s), “Cyberspace” (William Gibson, 1984), and, more recently, “Virtual Worlds” and “Virtual Environments” (1990s).
Originally, the term referred to “Immersive Virtual Reality.” In immersive VR, the user becomes fully immersed in an artificial, three-dimensional world that is completely generated by computer graphics. The user can explore a scene in all directions, including depth.
Some argue that 360 video does not qualify as VR because the viewer’s freedom is limited to the field of view and does not extend to physical movement or interaction, as is common in room-scale CG VR experiences. This distinction notwithstanding, for the sake of brevity I will refer to 360 video as cinematic VR throughout this paper.
Fundamentally, one of the chief distinctions between computer-generated (CG) VR and 360 video or “cinematic VR” is the former’s use of a real-time game engine to adapt the environment to the user’s interactions. The latter is inherently pre-rendered: although a user can change their viewing angle within the scene, their actions have no effect on the progression of the 360 video they are watching, which makes it unresponsive. Furthermore, cinematic VR suffers from the same disadvantages as traditional photography, in the sense that objects further from the camera will lack detail or potentially even be out of focus (Damiani, The Great Semantic Divide: Virtual Reality vs 360 Video, 2016).
360 video: An equirectangular video composed of multiple feeds, shot simultaneously on multiple cameras, that are then stitched together and wrapped around a spherical viewer. The viewer is placed in the center of the sphere and can change their viewing angle in the scene by moving their head. Beyond turning their head, the user is fixed to the spot from which the original camera recorded and cannot explore the scene. Also referred to as “cinematic VR”.
Game engine: A software program, originally designed for the production of video games, in which users write scripts that enable interactions between different media assets. Game engines have since become the leading method for developing CG VR experiences. The two most popular are Unity and Unreal.
Immersive VR: A virtual reality experience that requires the user to put on a head-mounted display (HMD). It typically involves a form of head tracking and headphones.
Non-immersive VR: A VR experience of a 3D world that the user explores on a 2D desktop computer, rotating their viewing angle with a mouse.
Presence: The perception that a virtual reality environment is real and that the user is part of that virtual world, achieved through a combination of interactivity, physics (the world updating to their point of view via head tracking) and responsiveness. A core requisite for immersion.
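To make the equirectangular layout described above more concrete: in such a frame, horizontal position encodes the left/right viewing angle (yaw) and vertical position encodes the up/down angle (pitch), so a head orientation can be mapped to the pixel at the center of the viewer’s gaze. The following is a minimal Python sketch of that mapping, not the method of any particular player. The function name, the sign conventions and the 3840 x 1920 frame size are assumptions made for illustration.

```python
def view_direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a head orientation to the pixel at the centre of the view.

    yaw_deg:   rotation left/right, 0 = straight ahead, positive = right.
    pitch_deg: rotation up/down, 0 = horizon, positive = up.
    width, height: size of the equirectangular frame in pixels.
    (Sign conventions are assumptions for this sketch.)
    """
    # Wrap yaw into [-180, 180) so a full turn returns to the start.
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0
    # Pitch cannot go past straight up or straight down.
    pitch = max(-90.0, min(90.0, pitch_deg))

    # Longitude spans the full frame width, latitude the full height.
    x = (yaw + 180.0) / 360.0 * width
    y = (90.0 - pitch) / 180.0 * height

    # Keep the indices inside the frame.
    col = min(int(x), width - 1)
    row = min(int(y), height - 1)
    return col, row


# Looking straight ahead lands in the middle of an assumed 3840 x 1920 frame.
center = view_direction_to_pixel(0, 0, 3840, 1920)
```

Only two angles appear as inputs, with no positional translation, which reflects the constraint noted above: the viewer can turn their head but cannot move away from the spot where the camera stood.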
360 video journalists have referred to empathy in VR video as “the killer metric” (Hill, 2016) for impacting viewers and sharing the perspective of another individual. Discussing the perspective-swapping experience of the Machine to be Another, Maarte Roel, one of the project’s creators, emphasized that the focus of the experiment was on the relationship between the participants, who, thanks to two mirrored VR headsets and head-mounted cameras, are shown their partner’s simultaneous feed as they look down at their own body. Some argue that this performative aspect of relational empathy is of more value than the individual, isolated framework that many have misapplied to empathy-driven experiences (Sutherland A., 2016), yet it is only achievable in live spaces with human participants, as opposed to filmed characters or digitized avatars.
This brings up the inherent problem of how to create emotional mirroring between emotionally convergent, empathizing subjects within a closed, unresponsive system such as 360 video. Face-to-face contact and emotional mirroring are key components of building an empathetic response within a social context (Kumano, 2011), since such a response depends on real-time awareness of another individual’s emotional state. The challenge with assessing empathy through a VR headset in this way is that the technology is not yet at a level where the filmed characters in cinematic VR can acknowledge a user’s presence in a scene. Developments have been made in CG VR, however, using animated pedagogical agents (Gwo-Dong Chen, 2011) to simulate these emotions with digital avatars within an educational context. Where these have been implemented, they have produced beneficial results in encouraging student performance under test conditions (Kim, 2007), yet all studies, including those with 3D avatars, were undertaken in a non-immersive desktop environment. Furthermore, a test-based system makes it far easier to present suitable emotional responses. Hardware that monitors gesture, facial expression and conversational cues has also been tested (D’Mello, 2008), although its design and implementation require extensive resources and dedicated staff (Lester, 1997). This is not only beyond the reach of today’s newsrooms, but is further problematized by the HMD, which obscures a large portion of a user’s facial gestures. Workarounds in the VR HMD space are being built by companies such as Fove, which incorporate infrared eye tracking to detect the direction of a user’s gaze and produce heat maps of the most observed areas within 360 videos, but the specific content necessary for these platforms is still limited (Robertson, 2016).
To that extent, the medium is still at a more passive stage, which can exacerbate feelings of distance or detachment on the part of the user. Our study will investigate this phenomenon through the analysis of stories with and without first-person characters acting as a guide.
Among the leading exponents of the form, the lack of interactivity is still outweighed by an overriding sense of presence, which in turn creates a sense of proximity to the heart of the story that is harder to achieve in other media (Swant, 2016). Sam Gregory, WITNESS Program Director, argues that empathy does not necessarily motivate people to take action. He instead sees VR’s potential as a tool for activism, but only if the focus is shifted from empathy to solidarity, based on the power of live witnessing and of co-presence, “the sense of being somewhere together with other people”, over a sense of presence, “the sense of being somewhere” (Uricchio, 2016). Gregory argues that by allowing users to interact with experiences in real time, co-presence could help people move beyond denial and disengagement and serve as an effective route to user mobilization, for example in the case of frontline activists broadcasting via live 360 video (Gregory, 2016).
Similarly, others warn of the dangers of this heightened sense of proximity to the focal point of a story without any of the typical sensitivities one would normally be mindful of, which can have an unintended, opposite effect: viewers are immersed, but the characters whose perspective they are supposed to be sharing are excluded (Bello, 2016). This is exacerbated by the tendency of many 360 videos to use post-production techniques that remove any trace of the original camera rig that shot the footage, which reduces the transparency between the subject and viewer (Uricchio, 2016) and calls journalistic ethical standards into question.
“Empathy is a multi-faceted emotional and mental faculty that is often found to be affected in a great number of psychopathologies, such as schizophrenia, yet it remains very difficult to measure in an ecological context.” (Jackson, 2015)
There are many difficulties with capturing empathic performance, hence questionnaires are often used as proxies for actual behavior. In such cases, empathy is often treated as a trait (that is, an inherited characteristic), although it can differ greatly between situations and interaction partners. The strong inter-individual differences in emotional expressivity, reactivity and baseline levels of physiological signals should also be taken into account (Janssen J., 2012).
Tracking nonverbal behavior, such as the position of the head or facial features (such as eyebrows) relative to the two subjects involved, is a commonly used approach to gauging the extent to which individuals share the same emotional state.
At a less conspicuous level, our experience of sympathy (prosocial behavior) or personal distress as an empathic response to suffering is based on our ability to self-regulate emotions: a low ability to regulate a response will likely lead to over-arousal (see section 2.1, p.20), in turn triggering a self-focused response of personal distress, with the immediate goal of swiftly alleviating it (Janssen J. H., 2012). Conversely, individuals with a high ability to regulate their reaction are more likely to respond with sympathy. A certain amount of arousal is required for any empathic response at all. Setting a threshold for emotional convergence is critical to ascertaining whether a response is sympathy or personal distress.
Respiratory sinus arrhythmia (RSA), a component of heart rate variability, is another indicator of emotional regulation in response to the automatic reactions generated through social interaction. Heart rate changes with the periodic changes in breathing, which is taken to reflect the effortful control involved in regulating emotional convergence. RSA is measured by transforming the inter-beat intervals of an electrocardiogram (ECG) signal to the frequency domain. The power in the high-frequency range (0.15 Hz to 0.40 Hz) can then be calculated as an index of RSA.
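The frequency-domain calculation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the pipeline used in any of the cited studies: it assumes the beats have already been detected from the ECG, resamples the intervals at an assumed 4 Hz with linear interpolation, and uses a plain discrete Fourier transform. The function names and the synthetic test data are hypothetical.

```python
import cmath
import math


def hf_power(ibis_s, fs=4.0, band=(0.15, 0.40)):
    """Power of an inter-beat-interval (IBI) series inside `band`, in s^2.

    ibis_s: successive inter-beat intervals in seconds (beats already detected).
    fs:     resampling rate in Hz (an assumed value for this sketch).
    """
    if len(ibis_s) < 3:
        raise ValueError("need at least three inter-beat intervals")

    # The time stamp of each beat is the running sum of the intervals.
    times, t = [], 0.0
    for ibi in ibis_s:
        t += ibi
        times.append(t)

    # Beats are unevenly spaced, so resample onto an even grid
    # by linear interpolation before taking the transform.
    n = int((times[-1] - times[0]) * fs)
    if n < 2:
        raise ValueError("recording too short for the chosen fs")
    series, j = [], 0
    for k in range(n):
        g = times[0] + k / fs
        while times[j + 1] < g:
            j += 1
        frac = (g - times[j]) / (times[j + 1] - times[j])
        series.append(ibis_s[j] + frac * (ibis_s[j + 1] - ibis_s[j]))

    # Remove the mean so the constant heart period adds no power.
    mean = sum(series) / n
    x = [v - mean for v in series]

    # Discrete Fourier transform, evaluated only at bins inside the band.
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if band[0] <= freq <= band[1]:
            coeff = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n)
                        for m in range(n))
            power += 2.0 * abs(coeff) ** 2 / (n * n)
    return power


def synthetic_ibis(mod_hz, n_beats=300, mean_s=0.8, depth_s=0.05):
    """Made-up IBI series whose length oscillates at mod_hz (test data only)."""
    ibis, t = [], 0.0
    for _ in range(n_beats):
        ibi = mean_s + depth_s * math.sin(2 * math.pi * mod_hz * t)
        ibis.append(ibi)
        t += ibi
    return ibis


# Breathing-rate modulation (0.25 Hz) falls inside the band; 0.05 Hz does not.
breathing = hf_power(synthetic_ibis(0.25))
slow = hf_power(synthetic_ibis(0.05))
```

On the synthetic data, the series modulated at a typical breathing rate yields far more in-band power than the slowly modulated one, which is the property that makes the high-frequency band usable as an index of RSA.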
Marci & Orr (Marci, 2007) linked therapist empathy to physiological synchronization between therapist and client. Janssen also used physiological signals as intimate cues: communicating a heartbeat signal transforms our experience of a social situation. Balaam showed that feedback on interaction behavior can enhance interactional synchrony and rapport (Balaam, 2011), but once again emphasized the difference between human-to-human interaction and human-machine interaction.
Many of the current methodologies used to attempt a quantification of this complex emotional state were originally intended to measure behavioral traits on the other end of the social spectrum, namely autism and social disorders that frustrate an individual’s ability to form a cohesive Theory of Mind (ToM), which is critical to inferring the mental states of others. In particular, the Empathy Quotient (a shortened version of which was incorporated into this study) was explicitly designed to be sensitive to a lack of empathy as a feature of psychopathology (E.J. Lawrence, 2004).
At this juncture, it is essential to highlight the respective differences between dispositional (individual differences between people in their susceptibility to empathy processes) and situational (experienced empathy at specific moments or during specific interactions) factors in an empathetic response. Our survey was designed in an attempt to highlight the distinction between these states, focusing firstly on which demographic factors demonstrated a more favorable tendency towards an empathetic response, before progressing to examine the specific situational stimuli within a narrative treatment that trigger any of the responses associated with empathy such as perspective-taking, emotional impact, or emotional convergence.
The two-stage model of perspective taking put forward by Coke, Batson and McDavis (Coke, 1978), which incorporates emotional arousal and perspective taking, was found to be lacking a dispositional factor that would influence the extent of a participant’s empathetic response (Archer, 1981).
Hogan Empathy Scale (cognitive empathy, published in 1969): A 64-item scale composed of 31 items selected from the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1943), 25 items selected from the California Psychological Inventory (CPI; Gough, 1964) and 8 items created by Hogan and colleagues (Hogan, Development of an empathy scale, 1969). Hogan defines empathy as “the intellectual or imaginative apprehension of another’s condition or state of mind” (Hogan, 1969).
Interpersonal Reactivity Index (IRI; affective and cognitive empathy, dispositional) (Davis M., Interpersonal Reactivity Index (IRI), 1980): Defines empathy as the “reactions of one individual to the observed experiences of another” (Davis, 1983). It comprises 28 items answered on a 5-point Likert scale ranging from “Does not describe me well” to “Describes me very well”. The measure has 4 subscales, each made up of 7 different items. These subscales are (taken directly from Davis, 1983):
Perspective Taking – the tendency to spontaneously adopt the psychological point of view of others.
Fantasy – taps respondents’ tendencies to transpose themselves imaginatively into the feelings and actions of fictitious characters in books, movies, and plays.
Empathic Concern – assesses “other-oriented” feelings of sympathy and concern for unfortunate others.
Personal Distress – measures “self-oriented” feelings of personal anxiety and unease in tense interpersonal settings.
Empathy Quotient (EQ): A 60-item questionnaire designed by Simon Baron-Cohen (Baron-Cohen, 2004) at the Autism Research Centre at the University of Cambridge. It contains 40 empathy items and 20 filler/control items. On each empathy item a person can score 2, 1, or 0, so the EQ has a maximum score of 80 and a minimum of zero. In a study of 197 adults from the general population, women scored significantly higher than men. The EQ reveals both a sex difference in empathy in the general population and an empathy deficit in Asperger syndrome and high-functioning autism (AS/HFA).
Questionnaire Measure of Emotional Empathy (QMEE; affective empathy) (Mehrabian, 1972): The QMEE was designed to assess emotional empathy, defined as “a vicarious emotional response to the perceived emotional experiences of others”, as distinct from cognitive empathy (Mehrabian, 1972). It consists of 33 items rated on a 9-point scale from -4 (very strong disagreement) to +4 (very strong agreement).
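The score ranges quoted for the EQ and the QMEE follow directly from the item counts and rating scales given above, as the short Python sketch below illustrates. It is a simplified illustration rather than either instrument’s official scoring procedure: the function names are hypothetical, the EQ filler items are assumed to have been dropped already, and any reverse-keyed items are assumed to have been recoded before the values are passed in, since the text above does not describe the scoring keys.

```python
def eq_total(item_points):
    """Total EQ score from the 40 scored empathy items (fillers excluded).

    item_points: points already awarded per empathy item, each 0, 1 or 2.
    """
    if len(item_points) != 40:
        raise ValueError("expected 40 empathy items")
    if any(p not in (0, 1, 2) for p in item_points):
        raise ValueError("each empathy item scores 0, 1 or 2")
    return sum(item_points)


def qmee_total(ratings):
    """Simple sum of 33 QMEE ratings on the -4 to +4 scale.

    Assumes any reverse-keyed items were recoded beforehand.
    """
    if len(ratings) != 33:
        raise ValueError("expected 33 ratings")
    if any(r not in range(-4, 5) for r in ratings):
        raise ValueError("each rating must be a whole number from -4 to +4")
    return sum(ratings)


# Boundary cases implied by the item counts and scales.
eq_max = eq_total([2] * 40)
eq_min = eq_total([0] * 40)
qmee_max = qmee_total([4] * 33)
qmee_min = qmee_total([-4] * 33)
```

Under these assumptions the EQ total runs from 0 to 80, matching the range stated above, and a simple sum of QMEE ratings would run from -132 to +132.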