1. Introduction
Machinima is a technology for making films or game scenes in a real-time virtual environment. Machinima is not a single piece of software; rather, it is an approach: using it, we can create a virtual environment with all the desired characters and, within that environment, animate as many actions as necessary. Nowadays, 3D application developers for computer games are increasingly concerned with providing a natural experience in the virtual environment. Using various algorithms and methods, they attempt to build every cinematography component in the virtual environment as naturally as possible to obtain satisfactory results. Machinima is a technology for doing so, used not only for making films but also in game applications. Machinima [1] uses graphics technology to render 3D images in real time, from which a cinematic product can be produced. Machinima is also a low-cost alternative to full film production [2]. Nevertheless, to produce a higher-quality cinematic product, research on improving camera control languages, incorporating style into camera placement, and related topics is much needed. In the real world, a director, giving life to a cinematic product, occasionally must create a storyboard to visualize the desired idea [3].
An important element in machinima is the camera controller, which defines where the camera should be placed and how it captures images. There are many styles of camera placement, such as the first-person view, the third-person view, or the bird's eye view. These styles bring out different traits of a game scene. Every game genre has its unique style, so applying the style of one genre to another gives it a very different characteristic. For example, applying a bird's eye style to first-person shooter games such as Doom, Half-Life, and Counter-Strike yields a wholly different feel. Similarly, every movie director has his or her unique style: applying noir style to a romance movie will certainly change the impression the movie gives.
Machinima has several advantages compared to other techniques: results are obtained in real time at lower production cost. Even though there are many current studies on machinima, especially on camera positioning, only a few address positioning a camera based on a director's style, and there is no research yet on profiling a director's style. Focusing on the camera controller, we propose a novel system for measuring a director's style through automatic profiling. The objective of this research was to determine whether a virtual camera placement suits a director's style. By achieving this objective, we can help animators measure a director's style automatically during the creation of animation in a game. If the information defining a certain style can be extracted, we can conversely apply that style to a game. Imagine a game with a customized style such that every player can have his or her favorite style. For instance, the famous Mario Bros game has been remade in many versions and studied by many researchers [4,5]. This game is just a simple side-scrolling game with static camera placement; if we could change the camera engine behavior, it would have a totally different ambience. Imagine the Mario Bros camera engine coupled with a Role-Playing Game (RPG) camera engine such as that of Lufia, or even an action-adventure camera engine such as that in Assassin's Creed [6,7]. In Figure 1, we can see several different camera positioning styles from the Mojopahit Kingdom game. The same game scene with different styles of camera positioning will give a different feel to the gamer.
Recent research in the domain of animation and games indicates that this is an interesting and challenging topic. Machinima is a technology that supports the production of animation and games: a system that uses real-time 3D graphics rendering to produce a virtual cinematic product or to support game development. As computer technology shifts from 2D to 3D, game technology follows, and the game perspective changes from 2D to 3D as well. The use of 3D technology is expected to reach an ever higher level, with games and animation getting closer to real conditions; the virtual world is expected to correspond to the real world.
Even though the novelty of this research, namely how to measure a director's style, is high, we acknowledge that further studies are still needed to support the whole system.
Figure 2 presents the general virtual camera placement process. In this figure, we can see that several processes must be completed before placing a camera.
Usually, in the production of a game or animation, camera placement or movement is done by an animator or a director of photography. However, manual placement of a virtual camera in the virtual environment requires modeling and calculations that must be repeated for each scene, which demands substantial cost and time [8]. Thus, the authors of [9] proposed a basic set of camera controls for computer graphics. Camera control is an accessible yet challenging problem that any developer of interactive 3D graphics applications has encountered. Computer graphics is one of the most challenging fields in computing, and the main concern in this area is how similar the virtual world is to the real world. Barry [10] used a multi-objective Particle Swarm Optimization (PSO) algorithm for virtual photography, applying a few rules of photography such as the rule of thirds, the horizon line, and the Point of Interest (POI).
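As an illustration of how such photographic rules can be encoded as objective terms for an optimizer, the sketch below scores a point of interest (POI) against the rule of thirds. The function name and the normalized-coordinate convention are our own assumptions for illustration, not details taken from [10].

```python
# Hypothetical sketch: a rule-of-thirds penalty that a PSO-style optimizer
# could minimize when composing a frame. Frame coordinates are normalized
# to [0, 1] in both axes.

def rule_of_thirds_penalty(poi_x: float, poi_y: float) -> float:
    """Distance from the POI to the nearest of the four thirds intersections."""
    thirds = (1 / 3, 2 / 3)
    return min(
        ((poi_x - tx) ** 2 + (poi_y - ty) ** 2) ** 0.5
        for tx in thirds
        for ty in thirds
    )

# A POI sitting exactly on a thirds intersection incurs zero penalty;
# a dead-center POI is penalized.
print(rule_of_thirds_penalty(1 / 3, 2 / 3))
print(rule_of_thirds_penalty(0.5, 0.5))
```

In a multi-objective setting, terms like this one would be combined with penalties for the horizon line and POI visibility into a single fitness function.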
There are many methods for placing a virtual camera in a virtual environment. In [11], the authors used PSO to solve the Visual Camera Composition (VCC) problem with a hybrid strategy: first, candidate camera positions are derived subject to predefined restrictions; then, the camera position is computed using PSO over a predetermined area. The resulting output parameters are the camera position, orientation, and Field of View (FOV). Drucker [12] proposed constraints for a virtual camera in the virtual environment.
In [13], a camera placement system is proposed. This system generates two cameras but places only the secondary camera, while the first one remains in a fixed position; the system uses a static behavior tree method. Fanani [14] added artificial intelligence to the camera using behavior trees and the A* algorithm to follow the main actor. Hu [15] suggested a new semiautomatic camera language to control a virtual camera in a machinima environment. Terziman [16] enhanced camera placement for first-person navigation based on input parameters such as height and weight; the camera used in that system is fixed. Christianson [17] described some established techniques for camera control.
In [18], the authors used the geometry of an interactive 3D environment whose subject motion is not known in advance, together with a flow of narrative elements that describe the actions taken in the environment. A narrative element is a component of the discourse that provides relevant information about an action in the story. A four-step process computes viewpoints and transitions: selecting narrative elements, computing director volumes, editing director volumes, and computing transitions. Lino [19] proposed Toric space, a novel and compact representation for intuitive and efficient virtual camera control, together with an effective viewpoint interpolation technique that ensures the continuity of visual properties along the generated paths. Benini [20] investigated four inherent characteristics of single shots that carry indirect information about scene depth, using a Support Vector Machine (SVM). Ferreira [21] proposed a system called IVE (Intelligent Virtual Environment). The goal of IVE is to run behavioral simulations of virtual agents based on information from the virtual environment; the framework comprises four modules: IC, IVE, AgentSim, and Visualizer.
Junaedi [22] suggested a multi-behavior agent using Particle Swarm Optimization (PSO); this approach could also be applied to a virtual camera. In [23,24], the researchers proposed a system called Darshak, which automatically constructs a cinematic narrative discourse of a given story in a 3D virtual environment. A nine-operator variable is proposed for the fixed camera, and the shots generated by Darshak are visualized in a 3D game engine. Lima [25] proposed an intelligent cinematography director for camera control in plot-based storytelling systems: the director selects, in real time, the camera shots that best fit the scenes and presents the contents in an interesting and coherent manner, with knowledge encoded using an SVM. Jaafar [26] modeled the behavior (goal seeking and obstacle avoidance) of an autonomous agent navigating a virtual environment using a fuzzy controller.
In [27], the authors built a storytelling system architecture with four modules (scriptwriter, scenographer, director, and cameraman) as its main components. The focus of that research is the director module, and the system uses an SVM.
Dib [28] explored the effect of perspective view in educational animations of building construction management tasks by comparing the egocentric perspective (first-person view) and the exocentric perspective (third-person view). Cherif [29] classified video shots based on the golden ratio of the human body into seven types: extreme long shot (XLS), long shot (LS), medium long shot (MLS), medium shot (MS), medium close up (MCU), close up (CU), and extreme close up (XCU).
Tsai-Yen [30] proposed a virtual system that can generate a sequence of camera shots automatically according to a screenplay. The system is decomposed into three modules imitating the roles in a real filmmaking process, and the paper also suggests preference parameters carrying the user's aesthetic style for each module to identify. He [31] suggested several systems for the virtual camera; the authors stated that automatic cinematography is difficult to implement and used sixteen different modules to do so. Hornung [32] suggested autonomous camera agents for transferring cinematography rules into interactive narratives and games. Burelli [33] created a virtual camera that predicts camera position from a set of parameters, with the data analyzed using machine learning.
Burelli [34] proposed camera path planning in virtual environments by modeling both camera movement and orientation parameters with multiple Artificial Potential Fields (APF). The system supports several constraints (visibility, projection size, and view angle) but suffers from a local-minimum problem in complex environments. An improvement of APF is proposed in Burelli [35]: prioritizing the frame constraints, calculating the initial position from the first vantage angle or relative position, enforcing frame coherence by interpolating the actor trajectory, and dynamically tuning the weights of the frame constraints in the objective function.
The development of a scene may require several cameras, not only one, because the director sometimes needs to emphasize certain actions or properties more than others. A single camera gives only one point of view and needs time to move to another position, so each additional camera provides a different viewpoint of the same scene. Tamine [36] showed how to measure a good-quality viewpoint, proposing two viewpoint evaluation approaches based on the nature of the input information: low-level and middle-level methods. Vazquez [37] developed a system that automatically selects a good viewpoint based on image-based modeling to find the optimal view for each component.
A benchmark for virtual camera control is proposed in [38], measuring accuracy, reliability, and initial convergence time. The simulation uses three scene backgrounds (forest, house, and rocky), covering static objects as well as some moving objects. This benchmark differs from other research that relies on viewpoint evaluation.
Fuzzy logic has been widely used in research on automation, the manufacturing industries, optimization, and management problems. Lukovac [39] proposed a neuro-fuzzy model for developing a human resource portfolio; the hybrid algorithm combines fuzzy logic with a neural network and uses fuzzy-set input variables. Pamucar [40] used a type-2 neuro-fuzzy network to solve a logistics problem, optimizing this nonlinear problem with fuzzy values as the inputs. Another neuro-fuzzy approach [41] uses an adaptive neuro-fuzzy model to solve the vehicle route selection problem known as the Vehicle Routing Problem (VRP); since the main difficulty is modeling the language, the research uses fuzzy sets to represent the input and output variables. Fuzzy membership functions can reflect situations in accordance with real life. Pamucar [42] also used a fuzzy logic system for level-crossing selection, so that investment in safety equipment can be included in the automatic control strategy; the results show that the developed fuzzy logic system can learn and imitate expert evaluations and demonstrates a competence level comparable to that of experts. Sremac [43] developed an ANFIS model to determine the economic order quantity. ANFIS is a modern class of hybrid artificial intelligence system: an artificial neural network characterized by fuzzy parameters. By combining two different concepts of artificial intelligence, it exploits the individual strengths of fuzzy logic and artificial neural networks simultaneously. Fuzzy logic is widely used because it works with graded values rather than boolean ones. In this paper, fuzzy logic is used to determine variations of camera positioning; we chose it because of the similarity between the language of cinematography and fuzzy linguistic variables, and because cinematography rules involve grey values, not boolean ones.
Many studies discuss how to position a virtual camera in a virtual environment. Some use evolutionary algorithms such as Particle Swarm Optimization; others use machine learning methods such as the Support Vector Machine. Each method has advantages and disadvantages; a swarm approach, for instance, needs more time for its repetitive calculation processes. There are indeed several studies related to camera positioning, but few discuss a director's style, and in particular how to measure camera placement in order to profile a director's style. Other studies only discuss how to apply cinematography rules in a virtual camera engine, not how to base the camera on a director's style. This paper does not discuss how to position a camera, but how to profile the style. Research of this kind usually uses questionnaires as its measurement; instead, this paper proposes an automatic system to recognize the style.
This paper is organized as follows. Section 1 presents the state of the art of the proposed system, explains why we need to profile a director's style, and reviews related work. In Section 2, we discuss the basic theory of cinematography, including director's style. Section 3 is the main part of this research, where we describe the proposed profiling method. The experiments and results are described in Section 4. Section 5 presents the conclusions and discussion.
2. Cinematography and Director’s Style
A motion picture consists of many shots, and every shot requires the camera to be placed in the best position. Cinematography refers to the lighting and camera arrangement used to record a photographic image for cinema [44]. Film is an art form with both a language and an aesthetic [45]. To produce a good film, several factors must be considered: the best arrangement of cameras and lighting can make a film more interesting and appropriate to the storyline or screenplay, and good cinematography greatly helps the audience understand the story. For games, especially 3D or RPG games, cinematography rules are needed to make the game feel real.
Several factors should be considered to produce a good film [46].
Camera Angle
Camera angle means the specific location of the camera when shooting a film scene at a certain time; in other words, the camera angle is the point of view recorded by the camera. A scene can be taken from various angles simultaneously to give the audience different perspectives. Camera angles include the objective shot, the subjective shot, and the point-of-view shot, and shots can be categorized into close-up, medium, and long shots [47].
Continuity
Continuity is the state of connection between one frame and the next; without continuity, a frame will not connect with the frames around it [48]. A picture with perfect continuity is preferred because it depicts events realistically, whereas a picture with wrong continuity is unacceptable because it distracts rather than attracts. This implies that an action should flow smoothly across every cut in a motion picture.
Cutting
Cutting is the process of changing the point of view [49]. It is an important process in filmmaking because it plays a key role in building the plot of a story; without the right cutting, the audience will be distracted from the plot of the film.
Close Up
Close up is a technique in photography of framing near the subject. Its variants include:
- Medium Close Up: framing the target approximately from midway between the waist and shoulders to above the head.
- Head and Shoulder Close Up: framing from below the shoulders to above the head.
- Head Close Up: capturing the head area only.
- Choker Close Up: a shot that covers the area from below the lips to above the eyes.
- Extreme Close Up: showing tiny objects (e.g., eyes, rings) or small portions of large subjects or areas, so that they appear greatly magnified on the screen.
- Over the Shoulder Close Up: a typical motion picture shot, also used in still photography, presenting a close up of a person as seen over the shoulder of another person in the foreground; it provides an effective transition from objectively filmed shots to point-of-view close ups.
Composition
Good composition is an arrangement of pictorial elements that forms a unified, harmonious whole. Composition concerns how a director directs the players and places the background, props, and all other elements into a single unity, forming a harmony that serves the story. The placement and movement of players within the setting should be planned to produce favorable audience reactions; a good arrangement of elements yields impressions of stasis, dynamism, or other moods.
Every movie director has his or her unique style of directing and taking scenes. This artistic style distinguishes one director from another, and accordingly one product from another. Currently, the process of developing a cinematic product requires considerable human intervention, due to the varying ability and behavior of each camera operator. Thus, the director's involvement is necessary, and sometimes a director even needs to capture the motions personally to acquire the desired quality.
One famous director is James Cameron, who directed the box office movie Avatar. This movie [50] can be considered a milestone in the birth of film production based on a virtual environment: during the production of Avatar, Cameron created a virtual camera technology to record his desired scenes. This virtual camera has the functions of a normal camera but can be used in the virtual environment. James Cameron is famous for a shooting style that highlights detailed components; in his movie Titanic, we can clearly see the details of the ship. Meanwhile, Christopher Nolan, the director of The Dark Knight and producer of Man of Steel, always highlights the realistic elements in his films.
Another famous director is Quentin Tarantino [51,52,53], with a number of successful box office films, including Kill Bill, Pulp Fiction, From Dusk Till Dawn, and many more. Quentin Tarantino is a brilliant student of filmmaking and an expert in using cinematic language to express his thrilling stories visually; every cinephile will recognize his style on sight. Most of his films are action thrillers with dark tones and an added element of sadism. Figure 3 shows some of Quentin Tarantino's trademark styles.
The following are some camera angle and shot-making (point of view) styles that Quentin Tarantino often uses in his films:
The Trunk and Hood POV
In this style, a picture is shot from below, as if taken from inside a car trunk. He has made many films using this style.
Corpse POV
This style is another variation of Trunk and Hood POV, but this one is taken from the eyes of the victim—that is, someone who is dead or lying on the ground. These two styles are variations of low angle shot.
Tracking Shot
A tracking shot is taken from the perspective of someone following the main actor, as though through the eyes of someone trailing the actor. Sometimes this style is called the following shot.
God’s Eye Shot
This shot is recorded with the camera positioned directly high above the actors to convey that something bigger than them is the subject, or in other words, as though a god is watching what the actors are doing.
Black and White Shot
Black and white style is a shot in monochrome to establish a certain ambience in the course of the story. It can be a flashback—that is, recalling past events—or a special emphasis on a scene before scene transition.
Close Up on Lips
Close up shot on the lips is a shooting style in which the actor’s lips are shot in full close up. This is to give the impression of a mysterious person or a sensual effect. This shot is usually taken in the beginning of the movie when a mysterious character appears. Another name for this shot is Choker Close Up style.
Violent Awakening
This style takes a close up view from someone who suddenly wakes up from a sleep or a coma. This is to show the impression of tension and surprise.
Besides the styles above, Quentin has a preference for adding effects (e.g., blood splashes) and recurring objects (e.g., cars). However, for this research, we used only five of Quentin Tarantino's camera positioning styles; other styles, such as black and white shooting and recurring objects, were not considered.
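As a small illustration of how one of these styles translates into virtual camera terms, the sketch below places a camera for a God's Eye shot directly above an actor, looking straight down. The y-up coordinate convention and the height value are our own assumptions for illustration; a game engine such as Unity would expose equivalent transform operations.

```python
# Hypothetical sketch: positioning a virtual camera for a God's Eye shot,
# directly above the actor and looking straight down (y-up convention).

def gods_eye_camera(actor_pos, height=12.0):
    """Return the camera position and a downward look direction."""
    x, y, z = actor_pos
    cam_pos = (x, y + height, z)   # straight above the actor
    look_dir = (0.0, -1.0, 0.0)    # looking down the y-axis
    return cam_pos, look_dir

pos, look = gods_eye_camera((3.0, 0.0, -2.0))
print(pos, look)  # (3.0, 12.0, -2.0) (0.0, -1.0, 0.0)
```

Each of the other styles (trunk POV, corpse POV, tracking shot) can be expressed in the same way, as a camera position and look direction defined relative to the actor.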
4. Results
In this research, we designed a simple movie scene using a 3D game engine to simulate our system. We used several scenes to generate movie clips and their profiles; the scenes are based on the aforementioned storyboards.
In Figure 19, we can see the whole proposed system, but the main focus of this research is the profiling process. Assuming we already have a director's style dataset, we can use any approach (such as fuzzy logic, swarm methods, or machine learning) for this process. The process requires elaborate experimentation and is challenging. For this research, we used the styles of an expert director: two different sets of the expert's style can be seen in the storyboard. For every scene, there are several actions, and a different shooting style is applied; the output of this process is the camera positioning based on the applied style.
Before developing an animation or movie clip, we can add effects such as transitions, sound, and lighting. Then, we develop an animation based on the path of the storyboard; the output of this process is an animation. For every frame in the animation, we extract coordinate values and feed them into our proposed system for profiling the director's style. The profiling method we propose is a fuzzy logic approach. The outputs of this system are an area graph and a histogram; using the histogram, we then decide the director's style.
In this research, the animation and the experiments were developed using the Unity 3D game engine. For the experiments, we used five different scenes and two different styles. The same scene and action with different styles are shown in Figure 20: the first style is based on Quentin Tarantino, and the second is another, generic style. We developed a moving path for the character to take some actions, along with two different moving-path styles for the virtual camera.
In Figure 20, we can see the same action in the same scene with different camera positions. From the visual perspective, we can observe the difference between Quentin Tarantino's style and the generic style. The walk action shown in Figure 20a is captured from behind (follow shot), but in Figure 20b, the action is captured from the left side of the character; the first style is a follow shot and the second is a left-side-scrolling point-of-view shot. Similarly, for the fighting action, the first style is captured as an over-the-shoulder medium shot, while the second is captured from the left side using a long shot. We designed five different scenes, approximately 24 s to 1 min long, at a rate of 30 fps; hence, we have about 600–1600 frames per scene.
Every scene and style is visualized using two diagrams: an area plot and a histogram. In the area plot, the x-axis is the animation frame number and the y-axis is the fuzzy output value. In the histogram, the x-axis is the fuzzy value and the y-axis is the frequency of occurrence of that value.
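The histogram construction described above can be sketched as follows; the bin width and the sample per-frame fuzzy values are illustrative assumptions only, not our actual measurements.

```python
# Sketch: given one fuzzy output value per animation frame, build the
# histogram data used for profiling (bin start -> occurrence count).

def histogram(values, bin_width=0.25):
    """Count how many fuzzy output values fall into each bin."""
    counts = {}
    for v in values:
        bin_start = int(v / bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))

fuzzy_per_frame = [0.4, 0.6, 1.2, 1.3, 1.3, 0.9, 1.6]  # illustrative only
print(histogram(fuzzy_per_frame))
```

In the real experiments, each scene contributes hundreds of frames, so the resulting histograms are far denser than this toy example.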
Figure 21 shows the fuzzy result of the first style, Quentin Tarantino's style, and Figure 22 shows the fuzzy result of the second style. In these figures, we can see that, even though the scenes are the same, the graphs differ; this happens because different styles produce different fuzzy output values.
From the fuzzy result, we create another diagram, the histogram, for each scene and style. The histogram of the first style is shown in Figure 23, and Figure 24 shows the histogram of the second style. These histograms show the frequency of appearance of the fuzzy values. For the profiling result, the threshold value is one: for Quentin Tarantino's style (Style 1), most values appear to the right of one, as shown in Figure 23, while for the other style, most values lie to the left of one, as shown in Figure 24. From these visualization diagrams, we can profile the two different styles.
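The final profiling decision can be sketched as a simple comparison of how much of the histogram mass lies on either side of the threshold. The function name, the style labels, and the sample values below are illustrative assumptions, not our actual measurements.

```python
# Sketch: decide the style by comparing the number of per-frame fuzzy
# output values above versus below the threshold (here, 1.0).

def profile_style(fuzzy_values, threshold=1.0):
    """Return a style label based on where most fuzzy values fall."""
    above = sum(1 for v in fuzzy_values if v > threshold)
    below = sum(1 for v in fuzzy_values if v < threshold)
    return "style_1" if above > below else "style_2"

print(profile_style([1.2, 1.4, 0.8, 1.6, 1.1]))  # most mass above 1
print(profile_style([0.4, 0.7, 1.2, 0.5, 0.9]))  # most mass below 1
```

This mirrors the visual reading of Figures 23 and 24: Style 1 concentrates its fuzzy values to the right of the threshold, and the generic style to the left.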