**1. Introduction**

Objectively studying how viewers perceive works of architecture is important for the proper governance of space and the zones located within it [1–4]. Knowledge of how professionals or designers, as opposed to persons with no architectural education, perceive and evaluate structures and spaces can also help in educating new architectural design personnel and, in a broader sense, forms part of the management of knowledge about architecture and urban planning [5,6].

The aim of this work is to present the utility of video-oculographic studies in assessing the visual perception of architecture, dependent on the type of space and external stimuli.

An eye-tracking-based experiment was performed on two selected buildings located in the historical centre of Cologne that were designed by world-famous architects, which made it possible to analyse perception and visual cognition. This experiment has been presented in the form of a case study and is an element of studying and analysing the contemporary architecture of Cologne.

Video-oculographic studies currently find practical application in a wide range of marketing, market and utility studies [7–9]. Given that vision and the cognitive processes associated with it are at work almost always and everywhere, eye-tracking studies have increasingly become a part of research in many areas of life. They are recommended for use in areas such as category management, traditional and online advertising, information technology (IT) system ergonomics, human–machine interaction, information management systems, medicine and psychology, education, sports, entertainment and the military [10–12]. Studies of the perception of art and architecture have also recently started to make use of them [13,14].

The use of eye tracking in this new field should allow a relatively objective determination of how we perceive the various types of information that reach us as humans and that affect how we build assessments and the structure of our knowledge and awareness [3].

### **2. The Essence of Eye-Tracking Studies**

The subject of the video-oculographic method has been discussed in works presenting research assumptions concerning the experiment and the considerations indicating its application potential. The most important items of the literature on the subject include publications by D. Richardson [15], G.D.M. Underwood [16], A. Duchowski [17], as well as K. Holmqvist, M. Nyström, R. Andersson et al. [18], which together form a compendium of knowledge on the method. Other notable publications include those by A. Bojko [19], those edited by M. Horsley, M. Eliot, B. Knight and R. Reilly [20], and those by J. Nielsen and K. Pernice [21].

The possible application of the method in studies of architecture and spaces has recently been discussed by the following teams: Ch. Lebrun, A. Sussman, W. Crolius and G. van der Linde from the Institute for Human Centered Design in Boston [22]; D. Junker and Ch. Nollen from the University of Applied Science in Osnabrück [23]; Z. Zou and S. Ergan, as well as A. Radwan, from New York University [24,25]; L. Dupont, K. Ooms, A. Duchowski and V. Van Eetvelde from Ghent University [26,27]; R. Noland, M. Weiner, D. Goo, M. Cook and A. Nelessen from State University of New Jersey [28]; J. Hollander from Tufts University [5]; and M. Rusnak, W. Fikus and J. Szewczyk from the Faculty of Architecture of the Wrocław University of Technology [1,29], in addition to the authors of this article [3,4,30].

Previous studies by other research teams on the suitability of eye tracking in architecture, urban planning and landscape architecture concerned three main aspects: eye tracking used on its own with a stationary eye tracker, eye tracking used on its own with a mobile eye tracker, and eye tracking used in combination with tools from other research methods. The most numerous in the literature are studies that rely solely on eye tracking with stationary devices to investigate the visual perception of works of architecture and landscape architecture.

These include, among others, an experiment performed by a team of researchers in architecture, interior architecture and cognitive science from the Institute for Human Centered Design in Boston, who used a stationary device to study the visual perception of architecture and its surroundings in Boston. The experiment was performed on volunteers from different professions and of various ages, while the objects of research were photographs of buildings and their interiors, shown either on their own or with persons present. The results pointed to a varied visual perception of buildings, often independent of their type and dependent instead on the presence of human figures, their faces and other elements of the landscape. According to the publication's authors, eye tracking is a very good method for helping scholars understand how persons who are not architects visually experience architecture. Furthermore, they argued that it is worthwhile to learn how an architectural design communicates with the public, the client and professionals, particularly when viewers see buildings with people present around them as opposed to the buildings alone. The authors pointed to the utility of this knowledge in teaching architectural theory, the history of architecture and architectural design [22].

Studies conducted at the Faculty of Architecture of the Wrocław University of Technology focused on determining the utility of this method in investigating the visual perception of historical structures. According to the authors of this publication, knowledge based on video-oculographic study findings could help make the evaluation of historical zones more objective, and would make it easier to manage them [1]. Another study, also performed using a stationary eye tracker, pertained to the visual perception of the interior of a Gothic church depending on its height and depth. The experiment focused on depth perception in modelled interiors and on changes in interest in reading depth relative to increases in the cathedral's height. The results made it possible to conclude that a change in a layout's length has more complex consequences for depth perception than a change in nave height alone. The study pointed out that more in-depth research using other methods is justified [29].

Studies performed at another Polish research facility, namely the Faculty of Architecture of the Cracow University of Technology, performed using a stationary device, concerned the perception of selected historical buildings and the space of the Rabka-Zdrój health resort in Poland. The study was aimed at determining the scope and manner of perception of buildings of high cultural significance that suffer from decay. Its findings have confirmed the effect of perceptual competition between the details of the buildings and historical and contemporary spaces. According to the study's author, the focus on and perception of these details instead of entire buildings can be the reason for a lack of valuation of the perceived surroundings [3]. Further studies have enabled the detection of the strong impact of various types of advertisements and information boards on the disruption of a building's perception. The authors demonstrated the utility of video-oculographic studies in formulating guidelines and planning measures associated with the protection of heritage sites and conducting education efforts [30].

Another type of study conducted using stationary devices examined how the level of urbanisation of a landscape shown in photographs affects viewers' visual exploration of the images. The experiments, conducted at Ghent University, concerned the assessment of the visual perception of various landscapes, ranging from rural to urban ones, as seen in photographs. More extensive and scattered exploration was observed in more urbanised landscapes. In weakly urbanised landscapes, fixations were more focused. Meanwhile, when no buildings were visible in the photograph, unexpectedly broad exploration was observed. The results of this study provided evidence that the level of urbanisation is positively correlated with visual complexity, as indicated by its potential impact on the viewer's behaviour [26]. Furthermore, studies have been performed on the use of significance maps, which are theoretical predictions of the pattern of human vision, to compare the visibility of various designs of simulated constructs placed on photographs of original landscapes. The results of the experiment, in the form of a high correlation of significance maps with human focus maps, made it possible to formulate conclusions as to the suitability of eye tracking and the significance maps themselves in planning structures within the landscape: visual impact was found to be lower when the visual perception of a structure decreases, so that an optimal integration of a structure with the existing landscape can be achieved [27].

The second aspect of previous studies was the sole use of eye tracking through mobile devices. Studies of this type were performed at, among other places, Tufts University and New York University, and focused on the impact of urban environments on the mental states of people present within them. Experiments performed using a mobile eye tracker made it possible to identify urban environments associated with more positive reactions, suggesting a feeling of relaxation and the desire to spend time there. The authors of the publication also pointed to the significance of the study and the detection of such environments as a part of formulating new principles of urban design [5,24,25].

Studies performed at the University of Applied Science in Osnabrück, which similarly utilised a mobile device, involved long-term experiments in real-world open urban environments ("Grosser Garten" in Hanover and "Stourhead" in Wiltshire). The experiments made it possible to prove the suitability of this method in studies of open space aimed at obtaining knowledge about how people behave and react and how these reactions arise. This knowledge is particularly noteworthy and can be useful in designing the best possible spaces, consequently enabling quality of life improvements. The research team behind the study pointed to the value of conducting holistic research, e.g., by using a combination of eye tracking with other methods. According to the authors, a combination of eye tracking with electroencephalography (EEG), along with mobile measurements intended to record interactions in detail and analyse the reactions of the human body, could prove particularly useful. Knowledge obtained through such studies should form a repository of data concerning user requirements and provide utility in the planning and management of green areas and public buildings, while taking subjective feelings of safety and contemporary aesthetic tastes into account [23].

The third aspect of eye-tracking studies is the use of eye tracking in conjunction with other research tools. One example is the experiment performed at the State University of New Jersey, in which qualitative survey studies concerning visual preferences were followed by eye-tracking experiments. The survey study focused on the perception and assessment of various urban objects placed in green surroundings, amidst pedestrian traffic and in proximity to public transport. Eye-tracking studies with the use of mobile devices have made it possible to supplement the knowledge declared by the subjects with an objective image of their perception of this environment. According to the publication's authors, studies following this methodology provide knowledge that is necessary to urban and transport planners and municipal governing bodies, allowing them to improve the functioning of the city, e.g., by increasing pedestrian activity and introducing vehicular traffic constraints in cities [28].

Analysis of the state of the art in the application of eye tracking in studies of architecture revealed significant factual discrepancies between individual studies. The experiments performed thus far concern the utility of this method in solving the specific scientific problems on which the given research teams focus. It can therefore be said that the studies are selective in nature. Regardless of the context of the method's application, all completed experiments point to its significant application potential in architecture, urban planning and landscape architecture. The authors wish to highlight the significance of the findings concerning the visual perception of various works of architecture, both historical and contemporary, located in various spaces and landscapes and surrounded by people and various forms of technical infrastructure. The findings are based on analysis of descriptive statistics. Although the authors of all these publications declare an awareness of the need for further studies in broader, more comprehensive perspectives, they are of the opinion that their findings provide research material that can prove highly useful in formulating assumptions and measures concerning the planning, design and construction of contemporary architecture, as well as the protection and management of heritage sites. Furthermore, in the opinion of the authors, the presented findings should be used in educating future architectural staff and in social education efforts.

In the literature, video-oculography is presented as a set of research and study techniques used to measure, record and analyse data concerning the position and motion of the eye. It supplies quantitative measurement data without referring to subjective, verbal reactions of the subject, instead referring to psychophysical and neuropsychological processes that accompany the collection and processing of visual information and oculomotor reactions to stimuli received from the environment. Eye tracking and visual perception are fundamentally interlinked.

In cognitive psychology (U. Neisser), perception is understood as a process of rationality and abstraction. Abstraction activities are present at two levels of perception, the first taking place after the sensory reception phase. The sensory reception that precedes it is equated with the initial process of passive information gathering, during which input is detected in receptors ("burned" into the photoreceptors of the retina). Reception inaugurates the process of information processing. In the case of visual perception, it is already possible at this stage to indicate so-called property detectors, i.e., neurons that selectively react to lines with a specific spatial orientation, or to the general outline of the human face. It is the second stage of experiencing that is considered rational, and which involves the active process that follows reception. It is based on interpreting sensory data using contextual suggestions, attitude and previously gained knowledge. Perception involves activities such as discrimination, recognition, orientation or perceptual categorisation. Rationality is present during this stage of higher-order cognition, which is equated with the capacity to think and to distinguish certain common traits in objects at the cost of ignoring others (either indistinct or non-general traits), which are then used to form generalisations in creating cognitive representations. It can be said that in U. Neisser's cognitive psychology, which is derived from experimental psychology, references are made to the notion of cognition as construing and creating knowledge. Meanwhile, in the concept of "visual thinking" or "thinking with images" (R. Arnheim), it is assumed that perception processes feature rational principles that govern the seeing and imaging of an object. Here, the notion is seen as an equivalent to the image and treated as a word. The language of images is less arbitrary and wealthy than the language of words, featuring more analogies and non-isomorphic relations between the sign and the object. In this concept, the geometric shape is seen as one of the most stable notions of the language of images. In Gestalt theory, it is assumed that sensory experience and cognition are not based on the passive reception of individual stimuli, but on the creative perception of a certain whole: the image of an object as a whole that is not reduced to the sum of its parts. The object of cognition is also its construct, a creation dependent on such factors as one's memory, experience, knowledge, attitude or desire. Principles of extracting the whole or the figure from the background (spatial proximity, similarity, good figure, symmetry), the principles of simplicity and the illusory character of perception (illusion is an essential component of images and gives them continuity) function here. In summary, it can be concluded that there are many different concepts of visual perception. This model of perception is referenced by some theories from the field of philosophy and psychology, namely: Gestalt psychology and other psychologies of aesthetic perception, such as psycho-aesthetics, neuro-aesthetics or visual psychology, and philosophical epistemology in part. The aforementioned disciplines are a part of broader theoretical and experimental perception studies [31–36].

The following definition of visual perception was adopted for the purposes of the experiment.

Visual perception is a complex cognitive process of interpreting objects, phenomena and processes in the environment on the basis of specific stimuli picked up by the visual system. Receiving a stimulus begins the process of perception, which makes it possible for one to understand what has been seen, to integrate the information received into one's system of knowledge and values, and to memorise it. Perception is conditioned by the aesthetic and artistic elements of an exposition and by the individual characteristics of the person performing the observation [18,19,29,30].

The use of eye tracking began to see wider methodological application in the second half of the twentieth century, along with the development of academic disciplines and specialisations such as psychology, cognitive science and human–computer interaction. The end of the twentieth century saw the technological development of tools enabling the manufacture of small, mobile devices, along with the development of applications for processing data, obtaining specific results and presenting and interpreting them. At present, video-oculographic studies are typically based on a system of video cameras placed near the subject's eyes or in close proximity to their face. During tests, the cameras track the movement of the subject's eyes, and their video feed is recorded by a computer and analysed using specialist software. Much useful information can be extracted from these data, including which elements attracted the subject's attention and after what length of time, which element held the subject's attention the longest, which elements were observed repeatedly, in what direction and sequence the subject scanned the space, and whether the subject was confused or showed interest [15,18,23,37].

A typical eye-tracking measurement is based on recording two types of information:


The most often used forms of the graphical presentation of data obtained during tests include: heat maps, gaze plots and area of interest analyses.

The heat map (or hotspot map) makes it possible to determine which element attracted the subject's attention. In the case of each of the materials presented on-screen, it is possible to display the points that the subject fixated their eyes on, presenting summary attention focus results for each subject group. A longer fixation time is marked by a more intense warm colour, while cool colours denote a shorter focus time. Places without colour denote fragments that were completely ignored by the subjects. A particular case of heat map is its inverted version, called the focus map, which shows only those areas the subjects fixated their eyes on, with the remaining areas blacked out [18,20,32].
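The relationship between fixation time and colour intensity described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that fixations are available as `(x, y, duration_ms)` tuples in screen pixels and that the image is divided into square cells; the function names `heat_map` and `focus_mask` and the cell size are hypothetical and do not correspond to any specific eye-tracking package. Commercial software typically smooths the result (for example with a Gaussian kernel), which is omitted here.

```python
from typing import List, Tuple

# Assumed data layout (not from the study): (x, y, duration_ms) in screen pixels.
Fixation = Tuple[float, float, float]


def heat_map(fixations: List[Fixation], width: int, height: int,
             cell: int = 50) -> List[List[float]]:
    """Sum fixation durations per grid cell and scale the result to 0..1.

    A value of 1.0 marks the cell that was looked at the longest (the
    "warmest" colour); 0.0 marks cells that received no fixation at all.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, duration in fixations:
        if 0 <= x < width and 0 <= y < height:  # ignore off-screen samples
            grid[int(y // cell)][int(x // cell)] += duration
    peak = max(max(row) for row in grid)
    if peak == 0:
        return grid
    return [[value / peak for value in row] for row in grid]


def focus_mask(grid: List[List[float]], threshold: float = 0.0) -> List[List[bool]]:
    """Inverted view: True where the image stays visible, False where it is blacked out."""
    return [[value > threshold for value in row] for row in grid]


if __name__ == "__main__":
    sample = [(10, 10, 200), (20, 30, 200), (120, 60, 100)]
    for row in heat_map(sample, width=200, height=100):
        print(row)
```

Summing durations rather than counting fixations matches the description of longer fixation time producing a more intense colour; a count-based map would be an equally simple variant.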

The second form of graphical presentation is the gaze plot, which indicates the sequence of fixations on individual areas during the observation of the presented image. Circles are used to mark each gaze point (fixation). The longer the subject looked at a given point, the larger the diameter of the corresponding circle. The number presented inside the circle shows the sequence in which it was observed, while lines symbolise saccades, presenting the path that the subject's gaze travelled between fixations.
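The construction of a gaze plot can likewise be sketched in a few lines. The sketch assumes that fixations arrive in chronological order as `(x, y, duration_ms)` tuples; the names `Marker` and `gaze_plot`, the minimum radius and the linear duration-to-radius scaling are illustrative assumptions rather than the convention of any particular tool.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Marker:
    index: int      # position in the viewing sequence, shown inside the circle
    x: float
    y: float
    radius: float   # grows with fixation duration


def gaze_plot(fixations: List[Tuple[float, float, float]],
              min_radius: float = 5.0,
              ms_per_pixel: float = 20.0):
    """Turn ordered fixations into numbered circles and connecting saccade lines.

    The linear mapping from duration to radius is an assumption; any
    monotonically increasing mapping would preserve the visual meaning.
    """
    markers = [
        Marker(i + 1, x, y, min_radius + duration / ms_per_pixel)
        for i, (x, y, duration) in enumerate(fixations)
    ]
    # Each saccade is drawn as a straight line between consecutive fixations.
    saccades = [
        ((a.x, a.y), (b.x, b.y), math.hypot(b.x - a.x, b.y - a.y))
        for a, b in zip(markers, markers[1:])
    ]
    return markers, saccades


if __name__ == "__main__":
    markers, saccades = gaze_plot([(100, 80, 250), (340, 120, 400), (300, 300, 150)])
    for marker in markers:
        print(marker)
    for start, end, length in saccades:
        print(start, "->", end, f"length {length:.1f} px")
```

The returned markers and line segments can then be passed to any drawing library to overlay the plot on the stimulus image.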

The third form of graphical presentation is the area of interest (AOI). Here, it is possible to separate out the large number of gazes that concern distinguishable areas presented on-screen. AOIs can be individually defined by the person designing the study or generated automatically, with a recorded attention distribution percentage. The advantage of using areas of interest over heat maps is the possibility of obtaining specific numerical values that enable a more precise quantitative analysis of fixations and the use of parametric metrics. The statistics used differ from study to study and depend on the study's objective [17].
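How numerical values such as an attention distribution percentage can be derived from AOIs is sketched below. The sketch assumes rectangular AOIs given as `(left, top, right, bottom)` in pixels and fixations as `(x, y, duration_ms)` tuples; the function name `aoi_stats`, the AOI names and the `"non-AOI"` bucket for fixations falling outside every area are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]   # (left, top, right, bottom), assumed layout
Fixation = Tuple[float, float, float]      # (x, y, duration_ms), assumed layout


def aoi_stats(fixations: List[Fixation],
              aois: Dict[str, Rect]) -> Dict[str, Dict[str, float]]:
    """Count fixations and sum dwell time per AOI, then express each as a share.

    Fixations outside every AOI are collected under "non-AOI" so that the
    percentage shares always add up to the whole viewing time.
    """
    stats = {name: {"count": 0, "dwell_ms": 0.0} for name in aois}
    stats["non-AOI"] = {"count": 0, "dwell_ms": 0.0}

    for x, y, duration in fixations:
        # The first matching rectangle wins; overlapping AOIs are not handled here.
        name = next(
            (n for n, (left, top, right, bottom) in aois.items()
             if left <= x < right and top <= y < bottom),
            "non-AOI",
        )
        stats[name]["count"] += 1
        stats[name]["dwell_ms"] += duration

    total = sum(entry["dwell_ms"] for entry in stats.values())
    for entry in stats.values():
        entry["share_pct"] = 100.0 * entry["dwell_ms"] / total if total else 0.0
    return stats


if __name__ == "__main__":
    areas = {"facade": (0, 0, 100, 100), "entrance": (100, 0, 200, 100)}
    gaze = [(10, 10, 300), (150, 50, 100), (500, 500, 100)]
    for name, entry in aoi_stats(gaze, areas).items():
        print(name, entry)
```

Unlike a heat map, the output is a table of numbers per area, which is what makes comparisons between subject groups or between images amenable to statistical testing.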

When analysing areas of interest, the following measurements are typically taken, among others:


After outlining areas of interest, every area or image used in the study typically shows areas that are unclassified, or not an AOI. These areas, although uninteresting from the point of view of the degree of perception of objects that make up AOIs, can also be included in the analysis from the point of view of the presence of other attractors (elements that attract attention) or distractors (elements that distract one's attention) [18].

In conclusion, the main advantage of video-oculographic studies is that they allow one to study the perception activity of test subjects objectively. Eye tracking makes it possible to pinpoint those elements of the image of an analysed object that the observer actually looks at. Therefore, results are based on facts, instead of declarations or conjecture [3].
