Essay on 3D – Issues in Shooting in Stereo and Creative Solutions
Challenges in Capturing and Exhibiting Stereographic Images, and How They Relate to the Way We See
This brings me to a series of challenges with 3D that I often return to and have spent some time considering. I think it is worth looking at these, so the stereographer, cinematographer, or even director will know some of the potential weaknesses of 3D and can begin to think of creative ways around them. A few realizations have made me appreciate just how challenging a problem it is to make good 3D. What is good 3D? For me it is the use of 3D to promote the story, emotion, and art of cinema, with a technical and aesthetic quality we can appreciate much in the same vein as great cinematography or a beautiful image. It should avoid straining the viewer or causing physical discomfort for the largest possible majority of the audience.
Achieving this level of 3D is no easy task. You must know the basic technical rules: the proper parameters for your screen size, screen percentages, IA, convergence, focal length, parallax percentages, and so on. You must also have a good sense of what works well in 3D and what doesn't, a sense of what to change in the physical scene to make it work better, and a good sense of the story and emotion involved.
These challenges, I have realized, largely come down to the relationship between how we see and perceive the real world and how we capture, project, and view stereographic images. One thing I have considered a lot lately is how our vision system actually fails us. Any good magician knows all about this. It is obvious from some of these 'holes' in our vision that seeing is really a collaboration between the eyes and cognition, the latter drawing on memory and prediction. One interesting thing you can test right now is that you can look at only one point at any given time. Through peripheral vision, the outlying rods and cones give us a rough approximation of the world, and the brain largely fills in the rest, completing the picture as an entire volumetric physical world. You can only see one exact point at one exact time. If you try this, it can be maddening, or even claustrophobic to a degree, to come to this realization as you try to bring into focus the things that lie around the central point you are looking at. A big part of our vision is also based on memory and prediction: drawing on our understanding of the physical world and past memories, our brain makes a lot of predictions about physical objects and completes the full picture. This is the only way for us to take in and handle the sheer amount of information around us. With this in mind, stereopsis is part of this larger system. We have expectations of shape and distance based on past memory of the world. The fact that we can only see one point at a time is actually a tremendous benefit for stereo.
One slight side note to this is that there are problems with stereo vision in everyday life that are similar to problems in stereo cinematography. One example is a repeated background. Go to a chain-link fence or wallpaper with a repeated pattern and then look at your hand held between you and the pattern. If you are placed properly, you will see the background occasionally playing tricks with your depth perception.
I have had examples of this where, for a second, I felt like my eyes locked into a different depth, and it very much threw me off. Another example is that on a reflective surface you will often see glares or specular highlights in different places in each eye. If you close one eye at a time, you can often notice this, and in certain cases it can be very annoying; looking down at a reflective cell phone screen often causes it. A similar failure is looking at a reflection in a curved piece of glass, such as a car's rear windshield. Depending on your angle, one eye may see a slightly distorted view of the reflected background while the other sees a very distorted view. This can actually be painful, as I experienced very recently. The interesting difference in the real world is that I can simply close my eyes, move my head, or look at something else to fix it. In the stereo cinema world, you can of course look away from the shot, but then what is the point of the shot? You are intended to view the film from start to finish, and you shouldn't have to look away because of a technical problem.
One of the biggest problems to understand, quite simply, is what is actually happening in a stereographic image versus how our eyes see the world. Since we can converge on only a single point or object at a time, and the brain disregards the rest of our field of view (or we physically can't quite look at it), we can comfortably fuse anything around us without concern that we won't be able to fuse whatever we choose to look at next. In a film, by contrast, every possible point in the scene or shot that might need to be fused must be presented within safe limits, since there is a maximum on-screen parallax, in both the foreground and the background, that we can tolerate. In the real world, all these points exist as real physical objects with a real place in depth, and our eyes can converge on any of them, no matter how near or far.
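To make that parallax ceiling concrete, here is a minimal sketch in Python of how a background-parallax budget is often estimated. The 65 mm interocular figure and the no-divergence guideline are common stereography rules of thumb rather than anything specified in this essay, and the function name is my own.

```python
# A minimal sketch, assuming a ~65 mm adult interocular and the common
# guideline that background (positive) parallax should never exceed it,
# since our eyes cannot comfortably diverge. Names are my own.

def max_positive_parallax_px(screen_width_m, horizontal_res_px, interocular_m=0.065):
    """Largest background parallax, in pixels, before the eyes must diverge."""
    meters_per_px = screen_width_m / horizontal_res_px
    return interocular_m / meters_per_px

# Example: a 12 m wide theater screen at 2K (2048 px across) leaves a
# budget of only about 11 px of positive parallax for the deepest point.
print(max_positive_parallax_px(12.0, 2048))  # ~11.1
```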
This could certainly lend fire to the debate over shallow versus deep depth of field in 3D. Keeping more of the foreground and background (mainly the background) out of focus is obviously part of a cinematic grammar that has evolved over the last hundred or so years of filmmaking to help direct the audience's attention, and it can be argued that it is more like the way we see. It is certainly a closer approximation of it, and since we can only really look at one thing at a time, it is a good way to come closer to achieving that on a planar, non-three-dimensional surface that is meant to represent three dimensions.
The same can be said for stereographic images. Some find it more distracting to look at out-of-focus images in 3D than in 2D, and I could certainly see it being more distracting in the foreground in 3D, but I personally think it works. I actually find out-of-focus 3D images very engaging, even mesmerizing at times. It doesn't bother me, and I think to a large extent this comes with becoming more comfortable looking at stereographic images. If people were more used to it, it might not seem so strange.
On the other hand, a deep depth of field could be argued to be a better representation of real-world vision, as it gives us the world exactly as it is and lets us choose what to look at. This presents a bit more of a problem in 3D, as it is where you must exert more control to keep maximum parallaxes at a safe level and "squeeze" the real world into the theater space. Additionally, directors have reported it to be counterproductive, a complete reversal of years of film language that uses focus as a tool to guide the viewer's eye. Suddenly we have the ability to roam visually around the scene not just on a single plane but actually in depth, by reconverging our eyes. This is thought to take longer for the brain to process, and, 3D being a relatively young medium, it invites people to wander the scene. This may change when people are more used to seeing 3D. I think both methods can and should be employed based on the shot, scene, or story.
In all of this, the main challenge is still to capture the field of view volumetrically and "compress" it into a much smaller fusion range than we experience in life, especially if you are shooting with a deep-focus look. The only time we really experience limits of fusion in the real world is when something is extremely close to our eyes. For instance, bring your finger very slowly toward your nose. At about an inch away, I can just barely fuse it, and it is very uncomfortable; any closer and I simply cannot. Herein lies, in my opinion, one of the biggest challenges of stereoscopic cinematography: how do you creatively capture a shot or scene with a large range of depth and bring it back to the exhibition screen within a safe viewing range of parallax? For some scenes this isn't too hard, for example a smaller room or a close-up without a tremendous distance from foreground to background. For scenes with a more distant background, especially with a close-to-the-camera foreground, it means you have to reduce the IA and set convergence appropriately, which doesn't always lead to the most natural-looking 3D, or simply produces flatter 3D. So it really becomes a creative challenge to figure out how to capture the huge, expansive world and bring it comfortably but accurately into the theater or viewing space.
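For a sense of the geometry behind that "compression", here is a rough sketch of the textbook disparity approximation for a converged rig; the formula and variable names are my own framing, not a quote from this essay or any particular tool.

```python
# A rough sketch of the standard disparity approximation for a converged
# rig. With focal length f, interaxial IA, convergence distance Zc and
# object distance Z, the on-sensor disparity is approximately
#   d = f * IA * (1/Zc - 1/Z)
# Objects at Zc land on the screen plane; nearer objects go negative
# (in front of the screen), farther ones positive (behind it).

def sensor_disparity_mm(f_mm, ia_mm, conv_dist_mm, obj_dist_mm):
    """Approximate horizontal disparity on the sensor, in millimeters."""
    return f_mm * ia_mm * (1.0 / conv_dist_mm - 1.0 / obj_dist_mm)

# Halving the IA halves every disparity in the shot at once, which is
# why deep scenes with near foregrounds force a small IA and a flatter,
# more "cardboard" image.
print(sensor_disparity_mm(35, 65, 3000, 1000))    # close foreground, negative
print(sensor_disparity_mm(35, 65, 3000, 100000))  # distant background, positive
```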
The answer may lie in several places. In addition to changing your IA and convergence settings, I believe it is often necessary to adjust your set and subjects to help maximize the 3D, so it becomes a combination of controlling the 3D parameters and the scene itself. There are issues with this, of course, in that it isn't always practical to change the set, especially in the fast-paced world of production. This is why it is best if a 3D film is planned for 3D from the start; that way you exert more control over the set or location. Still, this is not an option for every shot or situation. Imagine you want a beautiful, naturally voluminous shot of two people against a distant background. You can't really change the scene much, so you are left with changing the 3D parameters, and that leads to flatter, "cardboard" 3D. At the moment there really aren't many practical options for this type of shot. You could shoot at a stronger 3D setting and risk that maybe only a quarter of your audience could view it comfortably. Another option is much more involved and turns it into a visual effects shot: shoot the background and foreground separately, meaning shoot the couple in front of a green screen, so that you can maximize the depth of background and foreground independently for more natural, voluminous 3D, then composite them in post while keeping the parallaxes within safe limits. This is referred to as multi-rigging, and it is the kind of control artists working in CGI have over their stereoscopic shots, since they can very easily isolate anything in a scene. Obviously, this is not always practical on a lower- or modest-budget film.
Additionally, green screen and roto work roughly doubles the amount of post work compared with a standard compositing shot, and these shots always risk looking composited rather than 100% natural. Another option is to shoot in parallel with a stronger IA setting, since we are able to tolerate more negative parallax than positive. With this, your distant background sits at the screen plane and the scene extends out in front of it. There are theories that for theater viewing this is the way to go, as we are less aware of the screen itself; more on this later. One final option is 3D post-conversion. I would only recommend this if it is a quality conversion and the money is available, as a good conversion is costly.
I believe that in the coming few years there will be developments in both post and production to solve some of these difficulties in getting better 3D shots. In my opinion, good 3D is already a post-heavy process: alignments are almost always a must, not to mention the option to converge in post and the insertion of floating windows to fix window violations and allow more maximum 3D. I believe you will see developments in lenses and rigs that allow shooting from more angles, or something to that effect, which will allow a choice of IAs or stereo strength in post. On the post end, there are already developments in software that can change IA settings, such as the Ocula plugin for Nuke. I have heard that it works well on some shots but needs a few more years of development to become a completely viable tool. There are also many developments in post-conversion 3D, and if it ever gets to a point where it looks good without the enormous amounts of time and energy a quality conversion currently requires, it could be a good tool to use in conjunction with these methods for difficult shots. I do not condone conversions that have been done hastily and do not look good as real representations of depth.
In one way, this problem can be thought of much like latitude in film or video. With cameras you are attempting to capture the range of light, from dark to bright, that the human eye is used to. Film comes closest, but nothing so far has been able to do it fully. HDR is a great process, but it hasn't yet been fully applied to motion video. The problem is akin to what you are trying to do in stereo: squeezing the range of depth we are free to look across in life into a safe stereo viewing experience that has maximums. There are of course solutions, similar to our changing-the-set solution, in that a cinematographer must use more light on a dark area set against a brighter background.
A few other interesting quirks and potential issues with 3D stem from its reliance on viewer position and screen size. Mathematically, it matters how far you are from the screen: the closer you are to a pair of disparity points on screen, the farther apart they are in your field of view and therefore the more difficult they are to fuse. Additionally, since a fused point always sits at the same relative position between you and the screen, our perception of shape distorts as we move away from the screen. In other words, an object in negative space stretches in shape as it keeps that same relative position between you and the screen, so a shot appears deeper, and objects unnaturally elongated, as you back away. If you are off to the side in a theater, the same phenomenon means an object actually skews and distorts its shape toward you based on your position.

With all of this, it would almost seem worth building theaters specifically designed to give the majority of people "the best seat in the house." I imagine a theater with seating stacked from top to bottom, one row above another, with about ten seats spanning from the center. That way everyone is at the optimum distance from the screen and no one gets the side-skewing effect. I realize, of course, that this is impractical and expensive, and it is unlikely we will see anything like it.

A related screen-size issue is that if a film is shot for a large screen, with a generally reduced IA for comfortable viewing across the board, it will appear flatter on a smaller screen, and vice versa, making it difficult to create the various versions needed for today's extremely diverse range of output sizes and media devices. There are ways to reformat lighter 3D for a smaller screen by scaling in a bit, but you lose a few pixels that way.
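To see the viewer-distance effect in numbers, here is a hedged sketch of the standard screen-parallax geometry; the eye-separation figure and the function names are my own assumptions for illustration.

```python
# A sketch of the standard screen-parallax geometry (my variable names).
# With eye separation e, viewing distance V and on-screen parallax p
# (positive = uncrossed), similar triangles place the fused point at
# distance Z = V * e / (e - p) from the viewer.

def perceived_distance_m(view_dist_m, parallax_m, eye_sep_m=0.065):
    """Distance from viewer to the fused point; the screen itself is at view_dist_m."""
    return view_dist_m * eye_sep_m / (eye_sep_m - parallax_m)

# The same 30 mm of positive parallax read from 5 m away and 20 m away:
for v in (5.0, 20.0):
    print(v, perceived_distance_m(v, 0.030))
# The fused point lands proportionally deeper as the viewing distance
# grows, so the whole scene stretches in depth as you move back.
```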
Related to this peculiarity is the disconnect that happens in our brain when we are seeing a 3D image, much like the real world, and we move our head from side to side: we don't see the motion parallax our brain expects, and instead see the skewing mentioned above. This is not a very satisfying result and is a limitation of stereo viewing. This is a good time to bring up a different but closely related medium: video games. I believe video games will be one of the dominant forces that 3D rides in on. First, video gamers are already very accepting of escapism and of using peripherals, such as a hand controller, to become more a part of the game world, which means the addition of glasses isn't as big an annoyance. Second, gamers are always paying for better graphics and, ultimately, a better way to be immersed in the game world. Additionally, video games come closer to actually solving the above-mentioned problem of missing motion parallax when moving one's point of view. With stereo video games, your brain is experiencing closer-to-life depth, and now if you want to see around that tree, you simply move your character and you can. It isn't exactly the same as the real world, but it is a step closer. Head-mounted sensors adding that ability aren't much of a stretch in this scenario.
I have read recently that we often use one eye dominantly, and this can lead to difficulty in that 3D viewing makes us use the second eye much more than we normally would. Consider also that in most everyday situations we are not changing convergence all that heavily from moment to moment. Activities like sitting at a computer, reading, and writing often involve one or two primary planes of distance, with the occasional jump. Obviously, some activities and lifestyles are more aggressive and active than this and do involve more eye convergence, such as sports or driving. In a 3D movie you are often reconverging much more than this, due to changing shots and even within shots, and this reconvergence is amplified beyond normal life by the compressed on-screen maximums in parallax. Both of these realities are departures from our normal vision system, and many people may therefore feel fatigued when viewing a 3D film.
A well-known problem, with its own name, the window violation, stems from the fact that since we use screens to show 3D imagery, there is an obvious limit at the frame boundaries to an otherwise volumetric representation. Imagine something in negative space, out in front of the screen, that is also touching the edge of the frame: this is obviously a problem, as the edge of the screen cuts off something that is protruding out. This doesn't occur in the natural world. It's a sort of reverse occlusion: when object A lies in front of object B in the natural world, object B doesn't stick out in front of object A at its edge. This understandably causes an issue for our brains as they try to make sense of the anomaly. It can also be thought of as a form of retinal rivalry, since one eye is seeing part of an object that the other eye is not, and our brain wants to complete it. Luckily, there is a way to deal with this in post, for the most part. By introducing a floating window, a compensatory black matte placed in front of the offending object in one eye, we essentially complete what our brain needs to feel comfortable. It is essentially an inward extension of the screen edge, but because it appears in only one eye, we are overlaying it in stereo 3D. With this you have complete dynamic control of the window of the film, bringing it forward in space or even tilting it left or right.
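As an illustration of how simple the mechanism is, here is a toy sketch of a floating window applied to a stereo pair; it is my own minimal construction, not any real post tool's interface.

```python
# A toy sketch of the floating-window idea (my own construction).
# Matting the left edge of the left-eye frame and the right edge of the
# right-eye frame gives the apparent frame edges negative parallax,
# floating the window off the screen toward the viewer.
import numpy as np

def float_window(left_eye, right_eye, offset_px):
    """Black out opposite edges of each eye's frame by offset_px pixels."""
    left_eye, right_eye = left_eye.copy(), right_eye.copy()
    left_eye[:, :offset_px] = 0    # left edge of the left-eye frame
    right_eye[:, -offset_px:] = 0  # right edge of the right-eye frame
    return left_eye, right_eye

# Dummy gray 1080p frames; real floating windows are typically animated
# shot by shot, and asymmetric offsets can "tilt" the window.
left = np.full((1080, 1920, 3), 128, dtype=np.uint8)
right = left.copy()
left_fw, right_fw = float_window(left, right, offset_px=24)
```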
Another interesting problem related to this is depth falloff. It occurs very naturally in everyday experience, since our eyes aren't spaced far enough apart to register much disparity on distant objects. If you look at a distant mountain, or even a row of buildings a few streets down, you will notice that beyond a certain point you can't distinguish much depth between them. In the real world this is actually another helpful mechanism that lets us cope with a great range of depth variation. In the stereographic world, it often simply looks flat. I think the reason this feels so strange in a 3D film is that it calls attention to something we see every day and take for granted in the natural world. In a 3D film we expect everything to look "3D", and when a good portion of the film presents itself that way, a wide landscape shot with very little depth does indeed feel flatter when it comes on screen. The other source of this problem is the limited resolution compared with our eyes: since even a 4K projected image has less resolution with which to differentiate the parallaxes and disparities than our eyes do, depth between distant objects is harder to detect and appears to fall off faster. The only solutions are often to frame something in the foreground to give the scene more depth, or to widen the IA, which with a large enough setting will produce more depth even from distant objects. The first solution works pretty well, but it may occasionally force a scene into something it was not originally, by adding elements that were not in the script or shot list. The latter is often very difficult in a practical sense and may even be impossible for extremely distant shots: to get depth out of something, say, 1,000 feet away, you might need an IA of 27 feet, which is obviously very difficult to achieve. The other side effect of this technique, called hyperstereo, is that the images often appear miniature, as if they were part of a tiny set; you can think of it as how a giant might see the world. It is an interesting effect, but it is just that, an effect. It doesn't match our memory of shapes and perceived sizes, and therefore it calls attention to itself. In time, hyperstereo or the depth-falloff effect may become a commonplace part of 3D cinematographic grammar, if 3D reaches enough of the mainstream for long enough this time out.
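For a rough feel of the numbers involved, here is a sketch based on the widely quoted "1/30 rule" heuristic for hyperstereo; the rule is an assumption of mine for illustration, not the source of the 27-foot figure above.

```python
# A back-of-the-envelope sketch using the often-quoted "1/30 rule" of
# thumb: an interaxial of roughly 1/30 of the distance to the nearest
# subject of interest. It is a heuristic, not a law.

def rule_of_thirty_ia_ft(nearest_subject_ft):
    """Very rough interaxial suggestion, in feet, for a hyperstereo shot."""
    return nearest_subject_ft / 30.0

# For a subject about 1,000 ft away this suggests an IA around 33 ft,
# the same order of magnitude as the 27 ft mentioned above, and a good
# illustration of why such shots are rarely practical in-camera.
print(rule_of_thirty_ia_ft(1000.0))  # ~33.3
```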
With all of these challenges in mind, we can now begin to go out and shoot and experiment and find creative solutions as the technological advancements begin to pour in. We can take excitement in the fresh snow that is 3D: a new way to look at things on the well-worn path of cinema. Even though the technology isn't quite perfect and things are still developing, we have reached an age where it can be done, and it can look quite beautiful. We can now focus more on story and art, and less on the technical problems.