*2.2. Production Phase. Shooting Inside the Torre de la Cautiva*

Due to the prevailing weather conditions on the days of shooting, i.e., cloudy with clear spells, it was decided to use a vertical rig of three cameras for the main room and four cameras in the courtyard, given the latter's more complicated geometry but simpler decoration (see Figure 1b). A route was established running parallel to the walls, with stops to take shots every 15 cm. At each height, four passes were made in both rooms, starting almost four meters from the parament being portrayed and moving closer. To compose the first horizontal band, the cameras were placed on a vertical pole on a tripod, with the camera axis parallel to the ground and perpendicular to the wall, at approximately 80, 130 and 170 cm. In the second band, the cameras were set at approximately 2.00, 2.35 and 2.70 m. A third band was not established due to the risk of excessive camera shake blurring the pictures at the exposure speeds used. Furthermore, user experience with 3DoF VR indicates that the starting point for observation should be at the height of the person experiencing the room, which enables emphasis to be placed on the strong areas in the final result. This way of working requires coordination between the teams in charge of the script and the photogrammetry, as demanded by some authors [17].
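The capture plan above can be sketched as a simple enumeration of camera stations. This is an illustrative reconstruction, not the authors' tooling: the band heights, the 15 cm lateral step and the four approach passes come from the text, while the wall length and the intermediate pass distances are assumed for the example.

```python
# Illustrative sketch of the capture plan for one wall. Band heights,
# the 15 cm lateral step and the four passes are from the text; the
# 8 m wall length and the closer pass distances are assumptions.

def capture_stations(wall_length_m=8.0,
                     lateral_step_m=0.15,
                     band_heights_m=(0.80, 1.30, 1.70, 2.00, 2.35, 2.70),
                     pass_distances_m=(4.0, 3.0, 2.0, 1.0)):
    """Yield (x_along_wall, height, distance_from_wall) for every shot."""
    stations = []
    n_stops = int(wall_length_m / lateral_step_m) + 1
    for d in pass_distances_m:            # four passes, moving closer
        for h in band_heights_m:          # two bands of three cameras
            for i in range(n_stops):      # a stop every 15 cm
                stations.append((round(i * lateral_step_m, 2), h, d))
    return stations

shots = capture_stations()
print(len(shots))  # stops x 6 heights x 4 passes
```

Even under these conservative assumptions, a single wall yields over a thousand frames, which explains the emphasis on planning and on coordination between teams.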

*Heritage* **2022**, *4* FOR PEER REVIEW 8

The courtyard needed more time to adjust the exposures and shots, and a Canon 5D camera was used to take cover photographs in hand-held mode at the brightest times of the day. These photographs focused on the most awkward details and the most difficult angles, considering that the main hall had arches with muqarnas and other filigree decorations. This camera was also used to photograph the coffered ceiling, whose inverted boat-hull shape made it very difficult to achieve a sufficient depth of field, a situation aggravated by the minimal relief and the homogeneity of the surface.

The shots of the main room were taken with a focal length of 25 mm, at ISO 100, an aperture of f/8 and a manual exposure of 1/6 of a second, corrected according to the lighting conditions; the white balance was set manually with presets for dense and light clouds, and Color Checker images were taken in both situations (Figure 4).

**Figure 4.** A Color Checker shot near the epigraphic poems on the wall was used to quickly and automatically adjust the color, and to ensure fidelity of the textures and models.

In the inner room, which was less accessible to natural light, 20 s exposures were taken. A fourth Canon 60D camera with a 25 mm focal length was attached. The material was simultaneously saved as 30 MB RAW files and 8 MB JPEGs.
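The gap between the two exposure regimes can be quantified with the standard exposure value formula, EV = log2(N²/t) at ISO 100, where N is the aperture number and t the shutter time. A quick check, assuming f/8 was kept in the inner room as well:

```python
import math

# Exposure value at ISO 100: EV = log2(N^2 / t), aperture N, shutter t
# in seconds. The f/8, 1/6 s and 20 s figures are from the text; keeping
# f/8 for the inner room is an assumption for this comparison.

def ev100(aperture_n, shutter_s):
    return math.log2(aperture_n ** 2 / shutter_s)

ev_main = ev100(8, 1 / 6)    # main room
ev_inner = ev100(8, 20)      # darker inner room
print(f"main room: {ev_main:.1f} EV, inner room: {ev_inner:.1f} EV, "
      f"difference: {ev_main - ev_inner:.1f} stops")
```

The inner room thus sits roughly seven stops below the main room, which explains why tripod-mounted long exposures were unavoidable there.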

To obtain the images of the characters, a setup was assembled with a turntable carrying stickers to mark control points, a lighting installation with four flash-type fixtures (4 Linkstar DL-500D) and three vertical rigs of five Canon EOS 1200D cameras each [29]. Once dressed in period costumes and styled, the models were asked to pose as still as possible while the *giratutto* was set in motion. After photographing the Color Checker, a shot was taken every 30 degrees (Figure 5a), synchronized to the photographic flash lights through an ad hoc Arduino setup (Figure 5b) and triggering the 15 cameras simultaneously. This was done with two poses of the six different models, sometimes changing the characterization. In the end, the shots of three characters were used, one of them in duplicate.


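The resulting shot schedule is easy to enumerate. This sketch only mirrors the sequence described (a stop every 30 degrees, 15 cameras fired at once per stop); the actual Arduino firmware is not published in the text.

```python
# Sketch of the character-capture schedule: one turntable stop every
# 30 degrees, with all 15 cameras (three rigs of five) triggered
# simultaneously at each stop. Purely illustrative of the described setup.

ANGLE_STEP_DEG = 30
N_CAMERAS = 15  # three vertical rigs of five Canon EOS 1200D cameras

def shot_schedule():
    """Return (angle, camera_index) pairs for one full turntable rotation."""
    frames = []
    for angle in range(0, 360, ANGLE_STEP_DEG):   # 12 turntable stops
        for cam in range(N_CAMERAS):              # simultaneous trigger
            frames.append((angle, cam))
    return frames

frames = shot_schedule()
print(len(frames))  # 12 angles x 15 cameras = 180 frames per pose
```

Each pose therefore yields 180 frames, giving the photogrammetry software dense angular coverage around the model.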



**Figure 5.** (**a**,**b**). Image of one of the models (**a**) on the giratutto with the control points and the color checker. In (**b**), the Arduino device used, which is connected to a small turntable, is shown.

### *2.3. Post-Production Phase*



The first part of the processing (Figure 6), photogrammetry, required an initial step to prepare the photographs, using the RAW files [29], with two objectives: to adjust the color using the images that had been taken of the chart, and to try to avoid occlusions due to incorrect exposures (Figure 7a,b).

**Figure 6.** Pipeline for the photogrammetry phase. Compiled by the authors.



**Figure 7.** (**a**,**b**). Image before and after color correction from the RAW file. Note how the walls are much darker in the image on the left ((**a**), before applying corrections) owing to the contrast and how the edges of the column on the balcony are barely visible. When using a RAW file, the EVs can be modified to optimize the image before (**b**) copying it to the photogrammetry software.

Meanwhile, camera alignment tests were carried out to check whether the software, RealityCapture 1.3, recognized the camera positions correctly and provided enough homologous points, which in this case was a number close to 22 million. Through adjustments and corrections, a corrected data set was obtained and redundant points were eliminated from the dense point cloud in order to achieve one as faithful as possible to the shots. This was necessary because the software is asked to propose a geometric model built with triangles, and this model is usually disproportionately large, with a huge number of triangles, and very difficult to handle. In this case, about 50 million triangles were counted in the first version of the mesh.

With careful planning of the photographic sessions, the photogrammetry software can produce a geometry with sufficient detail and with textures that are true to the original surfaces. Once the point cloud had been obtained through photogrammetry, simplifying the mesh proved useful for the subsequent tasks, as recently noted elsewhere [17,29]. There are programs available that specialize in this type of task, such as ZBrush, which recreates the mesh in a much simpler form through a process known as retopology. As a result, the mesh was eventually reduced to about 200,000 polygons.

This process was necessary for two reasons: first, to restore areas that had been left with little or no detail [17], which in our case affected the corners of the floor and some small interior sectors of the balcony arches; second, it offers the possibility of creating clean UVs in order to obtain masks that extend the resolution of the textures using multiple UDIMs. Whereas the program originally generated a single 8K texture for each room, through this procedure it was possible to have that resolution for each of the maps that were generated, 27 in total (see Figure 8a).
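The UDIM scheme mentioned above follows a standard convention: UV space is divided into unit tiles, each numbered 1001 + u_tile + 10 · v_tile, and each tile carries its own full-resolution texture. A minimal sketch of that convention and of the texel budget it implies (the tile count is from the text; nothing here is specific to the project's files):

```python
# Standard UDIM tile numbering: tile 1001 + u_tile + 10 * v_tile for
# integer tile coordinates. With 27 tiles, each map keeps full 8K
# resolution instead of sharing one 8K texture per room, as in the text.

def udim_tile(u, v):
    """Standard UDIM tile number for a UV coordinate (u, v >= 0)."""
    u_tile, v_tile = int(u), int(v)
    return 1001 + u_tile + 10 * v_tile

def effective_texels(n_tiles=27, tile_res=8192):
    """Total texel budget across all UDIM tiles."""
    return n_tiles * tile_res * tile_res

print(udim_tile(0.5, 0.5))   # first tile -> 1001
print(udim_tile(1.2, 2.7))   # 1001 + 1 + 20 -> 1022
print(effective_texels())    # 27 x 8192^2 texels
```

In other words, the 27 tiles multiply the usable texture resolution by 27 relative to a single 8K map, at the cost of clean, non-overlapping UVs.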

**Figure 8.** (**a**,**b**). UDIMs with textures (**a**) and masks (**b**) prepared for back-projection on the optimized model once the re-topology process had been completed.

The textures that were obtained in this way were albedo (from photogrammetry with color and shadow or exposure correction), tile mask, wooden coffered ceiling mask, stone surfaces mask, floor mask, albedo for the artisan scene (because this scene is conceptualized at the time before the room decoration and therefore needs to be colorless), a displacement map, a normal map, a roughness map and a specular map [17,30].

A few days after the capture, visualization tests of the photogrammetry-generated 3D model were carried out with the Clarisse iFX 3.5 program. Re-topology was a complex and laborious task, especially for the geometry of the arches, in which there were many small details that required significant precision. It was also necessary to intervene in the wooden coffered ceiling, which was difficult to capture due to the lack of light and detail, which prevented the software from differentiating between sectors and locating the homologous points. Nowadays, the same programs include new algorithms that perform very accurate automatic re-topology, especially for static models, but at the time of this project they did not exist, so everything was done by hand.

The shading and look-dev processes (Figure 9), which are common in the work of 3D artists in audiovisuals or video games, for example [17], were directly responsible for the desired photorealism in this study, since they controlled the perception of the surface of the objects and considered their reaction to the light illuminating the scene and the atmosphere.

The modeling and look-dev of the assets were also performed using Maya and ZBrush. In total, about thirty were worked on, although not all of them were included in the final scenes. Carpets, curtains, cushions and pillows, the brazier, the jamuga (a type of medieval chair), the latticework, vases, the rebec, scaffolding, masonry objects and painter's tools, lamps, etc., were all inspired by or taken directly from originals kept in the Alhambra Museum or the National Archaeological Museum of Madrid.

As mentioned above, these phases do not have to be carried out consecutively; thus, to save time, the integration of the characters and assets into the scenery can be done gradually, starting with geometry whose materials are not yet fully worked or are shown in wireframe. With enough experience in handling scene overlays, positions, sizes, scales and points of view can be harmonized in this way (see Figure 10a,b).


**Figure 9.** Diagram of the workflow of the last phases of post-production, just before obtaining the final artwork. Compiled by the authors.

**Figure 10.** (**a**,**b**). Craftsman scene in plan (**a**) and perspective (**b**) views in wireframe render. It is crucial to arrange the elements to be able to work interactively with quick previews within the layout of the scenes.

However, in order to check if the scene works, it is necessary to test with more or less the final textures and a setting that brings the scene together properly, after ensuring that the chosen light interacts correctly with the surfaces and that the camera positions and animations are well chosen. In addition to other qualities, the view of the integrated scene must not be disjointed. To achieve this, the first rendering tests were performed with Arnold. However, each image took between three and four hours to process at 4K with its optimal parameters (Figure 11), which was unacceptable, especially for observing the result of the animation of lights and cameras and making final decisions about its placement on screen. It was possible to reduce the time for each image at 4K to 25 min using Houdini 17.5 with Octane Render 2019 1.2.0, which is a GPU-based engine, and a computer equipped with two NVIDIA 2080 Ti GPUs.
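The reported timings imply roughly an eightfold speedup, which compounds dramatically over an animation. A back-of-the-envelope sketch (the 30 s / 25 fps shot length is a hypothetical example, not a figure from the text):

```python
# Render budget from the timings in the text: ~3.5 h per 4K frame with
# Arnold (midpoint of the reported 3-4 h) vs. 25 min with Octane on two
# GPUs. The shot length and frame rate below are assumed for illustration.

ARNOLD_MIN_PER_FRAME = 3.5 * 60   # minutes per frame, CPU path tracer
OCTANE_MIN_PER_FRAME = 25         # minutes per frame, GPU engine

def shot_render_hours(n_frames, minutes_per_frame):
    """Total render time in hours for a shot of n_frames."""
    return n_frames * minutes_per_frame / 60

frames = 30 * 25  # e.g. a 30 s shot at 25 fps (assumed)
print(f"Arnold: {shot_render_hours(frames, ARNOLD_MIN_PER_FRAME):.0f} h")
print(f"Octane: {shot_render_hours(frames, OCTANE_MIN_PER_FRAME):.0f} h")
print(f"speedup: {ARNOLD_MIN_PER_FRAME / OCTANE_MIN_PER_FRAME:.1f}x")
```

Even a single hypothetical 30-second shot moves from months of CPU rendering into a feasible range on the GPU engine, which is why the switch was decisive for iterating on light and camera animation.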






**Figure 11.** (**a**,**b**) Light tests are essential to observe the integration of surfaces and environmental elements in the scene, as well as the suitability of the arrangement of elements and camera positions. For this, it is necessary to launch renders, which in conventional form can take up to four hours using powerful computers for processing.



Following the script, the scenes of the present day, the night storm, the artisan and the captive were composed, while fine-tuning the light and camera animations and creating transitions between them. With a preliminary low-quality rendering, the sound program was created, and the voiceover, music, effects and Foley were adjusted. This task, together with the sound synchronization, was performed using DaVinci Resolve 16. This program was also used to perform the final rendering at a resolution of 4096 × 4096, with a QuickTime output format without YUV 4:2:2 compression. A conversion to the 360° 3D stereoscopic format of the BT (bottom-top) type was performed so that it could be experienced with Oculus Rift glasses, with an appearance similar to that shown in Figure 12a–c.
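In a bottom-top (BT) stereoscopic 360° layout, the two eye views are stacked vertically in a single frame. A small sketch of the per-eye geometry for the 4096 × 4096 master described above (the dimensions are from the text; which half holds which eye depends on the player's configuration):

```python
# In a top-bottom / bottom-top (BT) stereoscopic 360 layout, the two eye
# views are stacked vertically. For a 4096 x 4096 master, each eye gets a
# 4096 x 2048 equirectangular half: a 2:1 aspect matching full
# 360 x 180 degree coverage.

def bt_eye_rects(width=4096, height=4096):
    """Return (left, top, right, bottom) crop boxes for the two eye views.

    Which half holds which eye is a player-configuration detail; here the
    halves are simply labeled by position.
    """
    half = height // 2
    return {
        "top": (0, 0, width, half),          # upper eye view
        "bottom": (0, half, width, height),  # lower eye view
    }

rects = bt_eye_rects()
print(rects["top"])     # (0, 0, 4096, 2048)
print(rects["bottom"])  # (0, 2048, 4096, 4096)
```

Each eye thus receives a full 2:1 equirectangular panorama, which is what the VR headset's player unwraps around the viewer.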

**Figure 12.** (**a**–**c**) Scenes of the captive (**a**), the night storm (**b**) and the craftsman (**c**) in 360° 3D stereoscopic BT format.
