1. Introduction
As people spend most of their time in buildings, indoor lighting conditions significantly affect human health, primarily through visual comfort [1]. Sunlight exposure improves the psychological and mental well-being, as well as the productivity, of building occupants [2,3]. However, uncontrolled solar penetration can quickly produce disabling glare and significant solar heat gains, degrading visual comfort and increasing space-cooling energy use [2]. Building operations account for roughly 30–35% of global final energy consumption, and space conditioning (HVAC) is commonly one of the largest end uses, at around 35% or more of building energy in many contexts, with reported shares extending higher depending on building type, climate, and system efficiency [4]. Consequently, managing indoor sunlight through coordinated control of work-plane illuminance, glare, shading position/optics, and related façade and lighting systems is critical both to deliver occupant visual comfort and to moderate cooling loads during peak demand periods [4]. An effective control strategy seeks to maximize useful daylight while maintaining illuminance within target ranges and keeping daylight glare probability (DGP) below accepted discomfort thresholds, thereby supporting occupant satisfaction and reducing energy use [5].
Established as a prevailing global design trend, glazed facades are widely used to convey transparency and embody the image of modern architecture [6]. Meanwhile, responsive facades have become popular, allowing for dynamic building designs while enhancing energy efficiency and indoor comfort in highly glazed buildings [2,7,8,9,10]. As high-performance facades, responsive facades can change geometries based on changing conditions to achieve a control objective [11,12,13]. Responsive facades are evolving and advancing due to technological enhancements and the application of innovative geometries [14].
Offering significant advancements over traditional shading devices, kinetic responsive facades feature three-dimensional, adjustable elements that provide a wide range of movements, including folding, rotating, translating, extracting, and contracting [15]. An example is the origami-inspired responsive façade of the Al Bahar Towers in Abu Dhabi, consisting of 1049 individual units that dynamically change their fold angles to modulate daylight [2]. Unlike unified control systems, kinetic façade systems incorporate many independent moving components, each requiring individual control. Loonen et al. [16] described the control complexity of a responsive facade using the term "degree of adaptivity," implying that a larger possible action space raises the potential to improve the indoor lighting environment. Despite their ability to adapt to environmental changes and occupancy preferences, the lack of automatic control strategies limits the full potential of responsive facades [17]. Consider a responsive facade with M components, each with N possible angle states, resulting in a total of N^M possible states [16]. This combinatorial explosion complicates determining the optimal control strategy, making conventional design strategies impractical.
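As a rough illustration (the component count M here is hypothetical; the 19 angle states match the 5° discretization used later in this paper), a few lines of Python show how quickly the joint search space outgrows both unified and per-component strategies:

```python
# Illustrative scale of the joint action space; M is a hypothetical element count.
N = 19         # angle states per element: 0-90 degrees in 5-degree steps
M = 36         # number of independently controlled facade elements (example value)

unified = N            # unified control: one shared angle for all elements
per_component = N * M  # simplified per-component search, as in Shen and Han [23]
joint = N ** M         # exhaustive joint search over all combinations
print(f"unified: {unified}, per-component: {per_component}, joint: {joint:.2e}")
# joint is on the order of 10^46 -- far beyond exhaustive simulation
```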
Automated control algorithms for kinetic responsive facades remain limited in the state of the art. Most existing studies do not focus on exploring the optimal control solution for each independent component, but rather on controlling all components uniformly [18,19,20,21]. Dev and Saifudeen [19] conducted simulation-based research to explore an optimal facade control strategy for dynamically balancing daylight utilization and reducing heat gain in tropical regions. The study used uniform slat angles for facade components on each orientation and evaluated the control performance on three representative days, considering two specific hours per day [19]. The facade control of the Al Bahar Towers operates on a rule-based system, where the kinetic facade dynamically adjusts its configuration based on the angle at which the solar rays land on the curtain wall [22]. An incidence angle between 0 and 79 degrees leads to an unfolded configuration, an angle between 80 and 83 degrees leads to a partly folded configuration, and an angle over 83 degrees leads to a fully folded configuration. However, each facade component most directly affects specific indoor spaces, so unified control cannot achieve truly optimal performance.
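A minimal Python sketch of this rule-based logic, with the thresholds taken from the description above (the function name and state labels are ours):

```python
# Sketch of the Al Bahar Towers rule-based fold logic described in the text.
def al_bahar_configuration(incidence_angle_deg: float) -> str:
    """Map the solar incidence angle on the curtain wall to a fold state."""
    if incidence_angle_deg <= 79:
        return "unfolded"
    elif incidence_angle_deg <= 83:
        return "partly folded"
    else:
        return "fully folded"
```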
To make a decentralized control strategy for kinetic facades possible, Shen and Han [23] proposed a simplified method that finds the best solution for each component and combines them to form the overall facade control solution. This reduces the simulation effort from N^M to N × M evaluations, allowing the optimal solution to be found efficiently. However, in the simplified model, each component's influence on the indoor lighting environment is treated independently, and the combined operational result is approximated as the sum of the results from individual components. This approach ignores the interactions between different elements, such as neighboring components blocking sunlight from their surroundings, resulting in a suboptimal control strategy. Takhmasib et al. [24] conducted the first on-site study of a three-dimensionally movable kinetic façade controlled by artificial intelligence (AI) to enhance the real-time probability of achieving desirable indoor daylight levels. To develop the AI-based control model, they used Radiance to generate 20,000 simulation cases for model training. While this approach enabled decentralized control of façade elements, it required extensive time and labor for data preparation. These limitations highlight the necessity for a self-learning control framework for kinetic façades that can adapt to varying settings without the need for extensive data preparation.
Compared with kinetic facade control studies, traditional shading device control strategies have been extensively studied, typically falling into three main categories based on their performance criteria [25]. The first category includes threshold controllers, which activate or adjust the blinds to a predefined slat angle when a control variable, such as solar illuminance or irradiance, reaches a setpoint [25]. The second category involves sun-tracking controllers that dynamically adjust the slat angle to block direct solar radiation [25]. In the final category, mode and scene controllers utilize multiple sensors and control algorithms [25]. For instance, Koo et al. [26] proposed a new control method for automated Venetian blinds that maximizes occupant comfort by allowing users to define specific zones for glare protection, thus enhancing daylight penetration while accommodating occupant preferences. Based on occupied-zone sensors, the proposed control algorithm calculates the control actions of multiple components across multiple controlled zones. The first two methods generally apply a unified control approach, where all components share the same slat angles. In contrast, the third category can provide independent control for a limited number of components. Olbina et al. [27] devised a closed-loop control strategy for a vertically split blind system, where different sections are optimized independently to enhance daylight distribution across various daylight zones. The control method by Koo et al. [26] adjusts the position of the lower end of multiple Venetian blinds and their slat angles based on hourly outdoor solar conditions, using open-loop logic to control the blinds in sequence for the occupied area. The blind positions are calculated from the geometric relationship between the solar angle and desk locations in the office. However, when it comes to kinetic responsive facades with numerous components, applying such control logic to optimize each component individually becomes impractical due to the system's complexity and the size of the possible action space.
In office lighting environments, occupants are primarily concerned with the illuminance levels on the horizontal plane at desk height, commonly referred to as workspace planes [28]. To enable occupancy-centered facade control, placing illuminance sensors on workspaces provides real-time lighting data that can inform facade adjustments [29,30,31]. However, considering the complexities of multi-workspace sensor data and multi-element facades, existing control methods face challenges in establishing an effective feedback loop to determine the optimal linkage between sensor data and facade adjustments.
Given the limitations of current control methods for kinetic responsive facades, exploring advanced building control techniques is essential. Reinforcement learning (RL) is particularly suited to managing high-dimensional, complex, and uncertain environments. Unlike other studied control logics, which are often limited to specific simulated or experimental cases and tailored to particular control targets, RL learns the control policy by interacting with the environment through a reward mechanism. The algorithm explores the environment with different control strategies to maximize cumulative rewards. This makes RL less restricted by variations in the real environment, such as building layout, orientation, occupancy patterns, window areas, and shading types, making the method more generalizable and suitable for different buildings and facade designs.
RL has been widely adopted in various building control areas, particularly in HVAC systems, battery management, and appliance control [32,33,34]. It has been found that RL can achieve energy savings for HVAC or heating systems while keeping the indoor environment comfortable [35]. In addition, RL is also believed to be promising due to its capability to learn more complex policies in sophisticated environments [33]. Ding and Du [36] applied RL to explore optimal multi-zone building action sequences, demonstrating its scalability in environments with high-dimensional state and action spaces. RL has also been used to achieve occupant-centered control by learning occupancy behaviors and the indoor lighting environment, balancing energy consumption and occupant comfort [37].
Building on these strengths, applying RL to kinetic facade control is a logical and promising approach to maintaining a comfortable visual environment in response to dynamic indoor and outdoor conditions. By exploring and interacting with the lighting environment, the RL algorithm can find the optimal control strategy that achieves decentralized control and considers the interactions and connections between different components. This allows it to fully leverage the kinetic facade's capabilities, optimizing its benefits in indoor lighting. To the best of the authors' knowledge, prior studies have explored the application of reinforcement learning for controlling blind systems [38], as well as AI-based strategies for kinetic façade control aimed at optimizing the indoor lighting environment [24]. However, no published work has yet proposed an RL-based method specifically for controlling kinetic building facades.
Accordingly, the main contributions of this study are as follows:
Proposes a self-learning RL-based control approach for multi-objective, high-dimensional responsive facade systems, representing the first application of RL to this problem.
Develops an integrated simulation framework that automates RL controller training and lighting simulation, enabling dynamic coupling between the simulation environment and the RL agent.
Validates the proposed RL-based kinetic responsive facade controller on three irregular shading configurations, benchmarking against a rule-based controller (RBC) to quantify feasibility and effectiveness.
2. Methodology
2.1. Reinforcement Learning
Reinforcement learning (RL) is a branch of machine learning concerned with training agents to achieve specific objectives by interacting with their environment and taking sequential actions to maximize a cumulative reward. Unlike supervised learning, where correct actions are explicitly provided through labeled datasets, RL depends on the agent autonomously discovering the optimal actions through experiential interaction, making it particularly valuable for complex, dynamic environments.
Deep reinforcement learning (DRL) integrates reinforcement learning with deep learning, leveraging the representational power of neural networks to approximate complex policies and value functions. This combination allows agents to operate effectively in environments characterized by high-dimensional state spaces, where traditional RL techniques would be computationally intractable.
Within DRL, there are two approaches: model-based and model-free reinforcement learning. Model-based RL involves creating an explicit model of the environment’s dynamics, which allows the agent to predict future states and plan actions accordingly. However, accurately modeling complex environments remains a significant challenge in model-based RL. In contrast, model-free RL directly learns optimal behaviors without requiring an explicit model of the environment’s dynamics, making it more widely used. Model-free DRL includes two main classes: deep Q-learning (DQL) and policy gradient methods. Deep Q-learning focuses on learning the value function to estimate the expected reward of specific actions, while policy gradient methods directly optimize the policy that determines the agent’s actions, often leading to more stable and efficient convergence.
This work employs a stochastic policy-gradient (Monte Carlo) method in the REINFORCE family [39], implemented with deep neural function approximation. REINFORCE is a foundational policy gradient method that optimizes cumulative reward by adjusting policy parameters. As outlined in Equation (1), the derivative of the expected return J is reformulated as a Monte Carlo expectation, enabling its estimation from a large sample of trajectories. This allows for an effective approximation of the gradient of the expected return based on sampled trajectories, making training and learning manageable. The pseudocode for the REINFORCE algorithm is presented in Algorithm 1. The agent takes actions, receives rewards, and updates the policy to increase the likelihood of the actions that lead to higher rewards. To improve the algorithm's stability and convergence speed, a baseline network is incorporated to estimate the value of the current state, thereby reducing the variance of gradient estimates and resulting in more stable and efficient learning [40].
$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau)\, \nabla_\theta \log P(\tau \mid \theta) \right] \approx \frac{1}{N} \sum_{i=1}^{N} R(\tau^{i}) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta\left(a_t^{i} \mid s_t^{i}\right) \tag{1}$$

where $J(\pi_\theta)$ denotes the expected return of policy $\pi_\theta$; $\tau$ is a trajectory, i.e., a sequence of states $s_t$ and actions $a_t$; $R(\tau)$ denotes the total return for trajectory $\tau$; $P(\tau \mid \theta)$ denotes the probability of trajectory $\tau$ under policy $\pi_\theta$; $\pi_\theta(a_t \mid s_t)$ denotes the policy's probability of taking action $a_t$ in state $s_t$ under parameters $\theta$; $\nabla_\theta$ denotes the gradient with respect to the policy parameters $\theta$; $N$ denotes the number of sampled trajectories in the Monte Carlo estimation; and $i$ denotes the index of each sampled trajectory.
Algorithm 1. REINFORCE with Baseline
Input: a differentiable policy parameterization $\pi(a \mid s, \theta)$
Input: a differentiable state-value function parameterization $\hat{v}(s, \mathbf{w})$
Algorithm parameters: step sizes $\alpha^{\theta} > 0$, $\alpha^{\mathbf{w}} > 0$; number of episodes $N$
Initialize policy parameters $\theta$ and state-value weights $\mathbf{w}$ (e.g., randomly)
for n = 1 to N do
    Generate an episode $s_0, a_0, r_1, \ldots, s_{T-1}, a_{T-1}, r_T$ following $\pi(\cdot \mid \cdot, \theta)$
    for t = 0 to T − 1 do
        $G \leftarrow \sum_{k=t+1}^{T} r_k$
        $\delta \leftarrow G - \hat{v}(s_t, \mathbf{w})$
        $\mathbf{w} \leftarrow \mathbf{w} + \alpha^{\mathbf{w}} \delta \nabla_{\mathbf{w}} \hat{v}(s_t, \mathbf{w})$
        $\theta \leftarrow \theta + \alpha^{\theta} \delta \nabla_{\theta} \log \pi(a_t \mid s_t, \theta)$
    end for
end for
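For concreteness, the following is a minimal PyTorch sketch of Algorithm 1. The environment interface (`env.reset()`, `env.step()`), network sizes, horizon, and learning rates are illustrative placeholders, not the paper's actual implementation:

```python
# Minimal REINFORCE-with-baseline sketch (PyTorch). Dimensions are examples:
# 8 state variables (illuminance readings) and 19 discrete slat angles.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 19
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
baseline = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_v = torch.optim.Adam(baseline.parameters(), lr=1e-3)

def run_episode(env, horizon=12):
    """Roll out one episode, recording log-probs, value estimates, and rewards."""
    logps, values, rewards = [], [], []
    state = env.reset()                          # hypothetical: returns a [STATE_DIM] tensor
    for _ in range(horizon):
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        logps.append(dist.log_prob(action))
        values.append(baseline(state).squeeze(-1))
        state, reward = env.step(action.item())  # hypothetical env interface
        rewards.append(float(reward))
    return logps, values, rewards

def update(logps, values, rewards):
    """One Monte Carlo policy-gradient update with a value baseline."""
    G, returns = 0.0, []
    for r in reversed(rewards):                  # undiscounted returns (gamma = 1)
        G += r
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    delta = returns - torch.stack(values)        # advantage estimate
    baseline_loss = delta.pow(2).mean()          # MSE toward the observed return
    policy_loss = -(delta.detach() * torch.stack(logps)).mean()
    opt_v.zero_grad(); baseline_loss.backward(); opt_v.step()
    opt_pi.zero_grad(); policy_loss.backward(); opt_pi.step()
    return policy_loss.item(), baseline_loss.item()
```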
2.2. RL-Based Control for Kinetic Responsive Facades
The agent interacts with the environment by adjusting the facade components' angles and updates the control policy using a policy gradient algorithm. Human visual comfort is assessed by two kinds of illuminance sensors, which measure the vertical and horizontal illuminance values at each office desk. Rewards are assigned based on the illuminance values measured by the sensors. During the training process, the agent explores various control strategies, earning rewards based on the visual conditions achieved in the workspace. The policy gradient algorithm is used to maximize the cumulative reward over the entire control period.
Figure 1 shows the framework of RL-based optimal kinetic responsive facade control.
This paper selects and tests three types of existing kinetic responsive facades to evaluate the robustness and generalization of the proposed control method. Each facade type features distinct kinetic element motion based on its geometry and structural design. Real-time ClimateStudio [41] simulations were used as the virtual environment to generate lighting data and support the training and testing process. The proposed RL control method offers a computational way for users to find the optimal shading or facade control strategy for various distinct environment settings without extensive control logic exploration, complex physical modeling, or frequent manual control.
2.3. Simulation in a Virtual Environment
A virtual testbed was developed to evaluate the proposed RL-based kinetic facade control method under varying daylight conditions. The office was modeled as a shoebox geometry defined by non-uniform rational B-spline (NURBS) surfaces in Grasshopper [42] and placed facing south without surrounding obstructions. Surface optical properties were assigned according to standard LM-83 material definitions, as summarized in Table 1.
Three kinetic responsive facade cases are modeled based on real-world buildings: LA MAISON Hotel, Campus Kolding, and the Al Bahar Towers. Table 2 presents the real-world building pictures and corresponding facade models for simulation. The geometries were parameterized in Grasshopper to allow rotation of facade elements according to their movement type (vertically folding, diagonally folding, and radially folding). Component dimensions, movement characteristics, and example rotation angles are given in Table 3. The possible rotation range for all elements was 0–90°, incremented in 5° steps.
Lighting simulations were performed in ClimateStudio [41], which integrates the Radiance ray-tracing engine [47] for physically based daylight calculations. The simulation time step was 1 h, and weather data were sourced from the Typical Meteorological Year (TMY3) file for Boston. The lighting simulation selected the summer solstice (21 June) and the September equinox (21 September) as target dates. These two days represent distinct sun positions in the year, as shown in Figure 2, effectively illustrating the variability in solar parameters [48]. On the summer solstice, the solar altitude in Boston (42.36° N, 71.06° W) reaches approximately 71° at solar noon with a wide azimuth range, resulting in the longest period of daylight and shortest night of the year. On the September equinox, the solar altitude at noon is around 47°, and the sun's path is more symmetric between morning and afternoon, producing a more balanced daylight distribution.
The RL training and evaluation were performed using a custom integrated framework. This framework links the ClimateStudio (1.9) simulation environment and the RL algorithm (Python 3.10.8, PyTorch 1.13.1) via a WebSocket-based data exchange platform, as shown in Figure 3. At each simulation step, the simulation environment sends the current state variables, including the horizontal and vertical illuminance at the sensor planes, to the RL agent, which is implemented in Python scripts. The agent then computes control actions (rotation angles for each kinetic element), which are transmitted back to ClimateStudio for the next simulation step. The framework records states, actions, rewards, and timestamps in local log files for analysis.
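A minimal sketch of such a WebSocket exchange is given below, using the Python `websockets` package (≥ 10.1). The message schema, port, and placeholder policy are assumptions for illustration, not the paper's actual protocol:

```python
# Hypothetical WebSocket bridge: the simulation client sends illuminance states,
# the agent replies with rotation angles for the kinetic elements.
import asyncio
import json
import websockets  # pip install websockets

async def agent_handler(ws):
    async for message in ws:
        state = json.loads(message)       # e.g., {"horizontal": [...], "vertical": [...]}
        readings = state["horizontal"] + state["vertical"]
        # Placeholder policy: a fixed 45-degree angle per reading; the real agent
        # would query its policy network here.
        actions = {"angles": [45] * len(readings)}
        await ws.send(json.dumps(actions))

async def main():
    # The ClimateStudio/Grasshopper side would connect as a client to this endpoint.
    async with websockets.serve(agent_handler, "localhost", 8765):
        await asyncio.Future()            # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```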
The RL controller comprises two neural networks: a policy network for action selection and a baseline network for value estimation, enabling variance reduction in policy gradient updates. The reward function penalizes deviations from the target illuminance range and accounts for control smoothness.
2.4. State-Action Space and Reward Calculation
Defining the state-action space for the policy gradient enables the proposed RL model to learn a robust control strategy. Actions ($a$) are the rotation angles of each facade element. Although each facade has a different motion, the possible action space is the same for all: the rotation angle ranges from 0° to 90° in 5° increments, so the possible angles for each element are $A$ = {0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90}. The action at each time step $t$ consists of each element's angle, $\text{Action}_t = \{a_1, a_2, \ldots, a_n\}$, where $n$ is the number of elements.
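One convenient way to realize this high-dimensional discrete action space is a factorized categorical policy that samples one angle index per element; a sketch follows (the element count and network shape are illustrative, not the paper's architecture):

```python
# Sketch: one categorical distribution per facade element over the 19 angles.
import torch
import torch.nn as nn

ANGLES = torch.arange(0, 95, 5)            # {0, 5, ..., 90} degrees -> 19 options
N_ELEMENTS = 36                            # hypothetical number of kinetic elements
STATE_DIM = 8                              # illuminance state variables (defined below)

head = nn.Linear(STATE_DIM, N_ELEMENTS * len(ANGLES))  # shared trunk omitted

def sample_actions(state: torch.Tensor) -> torch.Tensor:
    logits = head(state).view(N_ELEMENTS, len(ANGLES))
    idx = torch.distributions.Categorical(logits=logits).sample()  # [N_ELEMENTS]
    return ANGLES[idx]                     # one rotation angle per element

print(sample_actions(torch.zeros(STATE_DIM)))  # e.g., tensor([45, 0, 90, ...])
```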
The proposed RL control method was tested on a digital Rhino model of an office in Boston, USA, with a south-facing facade. The room dimensions are 9.8 m (width), 11.0 m (length), and 3.0 m (height). Two challenging workspaces are selected: the "Near-Facade" workspace on the south side, which is prone to glare, and the "Far-Facade" workspace on the north side, which cannot easily access enough daylight. Each workspace has dimensions of 1.8 m (length), 1.2 m (width), and 0.7 m (height). Their positions in the room are shown in Figure 4. The state ($S$) comprises the average illuminance levels measured by sensors placed on the horizontal and vertical planes of each workspace, oriented toward the northeast, northwest, southeast, and southwest. Each plane is instrumented with a sensor grid at 0.3 m (1 ft) intervals. The horizontal test plane measures 1.8 m in length and 0.6 m in width, while the vertical test plane measures 0.65 m in width and 0.4 m in height. This configuration enables high-resolution measurement of spatial variations in illuminance, providing a detailed representation of the lighting conditions within the workspace. If the illumination requirements can be satisfied at these workspaces, the middle area of the office can also be expected to remain in a comfortable lighting environment. The state thus includes eight continuous variables:

$$S_t = \{E_h^{NE}, E_h^{NW}, E_h^{SE}, E_h^{SW}, E_v^{NE}, E_v^{NW}, E_v^{SE}, E_v^{SW}\}$$

where $E_h$ and $E_v$ denote horizontal and vertical illuminance levels, respectively. This setup allows the RL agent to account for spatial lighting variations across the office, ensuring task-appropriate lighting and a balanced luminous environment throughout the room.
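As a small illustration, the eight-dimensional state vector could be assembled from the per-plane sensor-grid averages as follows (the dictionary keys and ordering are our own convention):

```python
# Hypothetical assembly of the state vector S_t from mean illuminance readings (lux).
import torch

def build_state(E_h: dict, E_v: dict) -> torch.Tensor:
    order = ["NE", "NW", "SE", "SW"]           # plane orientations from the text
    return torch.tensor([E_h[o] for o in order] + [E_v[o] for o in order],
                        dtype=torch.float32)   # shape [8]

state = build_state({"NE": 310, "NW": 295, "SE": 820, "SW": 760},
                    {"NE": 180, "NW": 170, "SE": 640, "SW": 590})
```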
The reward is designed to help the agent achieve a visually comfortable environment. Various glare metrics have been developed to assess visual comfort. To ensure a sufficient indoor light level while minimizing glare, the indoor illuminance values on the work surface and the DGP from occupants' viewpoints are often combined in indoor lighting analysis. The Lighting Handbook provides recommended values for office buildings in both the vertical and horizontal directions [49]. While different illuminance targets have been used in previous studies, this study uses the recommended targets for computer screens and electronic ink devices: 300 lux for horizontal illuminance and 150 lux for vertical illuminance [49]. The comfort range of horizontal illuminance is typically defined as between 300 lux and 1200 lux [50]. In laboratory settings, illuminance levels are recommended to be between 750 lux and 1200 lux to ensure adequate visibility for detailed tasks [51]. However, specific upper limits for indoor vertical illuminance are not universally established, and this study defines 1200 lux as the upper bound. The proposed control method can be trained for different control tasks and targets, so different comfort ranges can be selected as needed.
In contrast to other widely used glare measures, Wienold and Christoffersen [52] defined DGP to describe the probability that occupants perceive glare problems caused by natural sunlight. For glare control, vertical illuminance can be used to calculate the DGP since it strongly correlates with the occupant's visual comfort [53]. Because the full DGP requires multiple variables to compute, the simplified DGP (DGPs) can be used instead, calculated as in Equation (2):
$$\text{DGPs} = 6.22 \times 10^{-5} \, E_v + 0.184 \tag{2}$$

where $E_v$ is the vertical illuminance value (lux). The suggested DGP ranges are labeled as imperceptible glare below 0.35, noticeable glare between 0.35 and 0.40, disturbing glare between 0.40 and 0.45, and intolerable glare above 0.45 [53].
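In code, Equation (2) and these glare bands read as follows (the function names are ours):

```python
# Simplified daylight glare probability (Eq. (2)) and its comfort bands.
def dgps(E_v: float) -> float:
    """Simplified DGP from vertical eye illuminance E_v in lux."""
    return 6.22e-5 * E_v + 0.184

def glare_band(p: float) -> str:
    if p < 0.35:
        return "imperceptible"
    elif p < 0.40:
        return "noticeable"
    elif p < 0.45:
        return "disturbing"
    return "intolerable"

print(glare_band(dgps(2000.0)))  # DGPs = 0.3084 -> "imperceptible"
```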
Based on the comfort range, a positive reward is assigned for horizontal illuminance between 300 lux and 600 lux and for vertical illuminance between 150 lux and 500 lux. The reward for "Near-Facade" workspaces is higher than for "Far-Facade" workspaces due to the increased risk of glare near the facade. Additionally, if the illuminance level exceeds 8000 lux at any test plane, the reward is set to 0. This penalization accelerates learning by discouraging the exploration of undesirable state-action pairs during early-stage training. Each workspace has two horizontal test planes and two vertical test planes. The total reward for each time step is the sum over all test planes. The reward calculation pseudocode is shown in Algorithm 2. The hyperparameters of the RL model are presented in Table 4.
Algorithm 2. Pseudocode for reward calculation
For "Near-Facade" workspaces:
    if horizontal illuminance ∊ [300, 600] lux → reward += 4
    if vertical illuminance ∊ [150, 500] lux → reward += 4
For "Far-Facade" workspaces:
    if horizontal illuminance ∊ [300, 600] lux → reward += 1
    if vertical illuminance ∊ [150, 500] lux → reward += 1
If ∃ any test plane with illuminance ≥ 8000 lux → reward = 0
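A direct Python transcription of Algorithm 2 (the argument structure is our own; the thresholds and weights follow the pseudocode):

```python
# Per-time-step reward from Algorithm 2. Each workspace contributes
# (horizontal_lux, vertical_lux) pairs, one per pair of test planes.
def step_reward(near_planes, far_planes):
    all_values = [v for pair in near_planes + far_planes for v in pair]
    if any(v >= 8000 for v in all_values):   # hard cutoff: excessive daylight anywhere
        return 0
    total = 0
    for h, v in near_planes:                 # glare-prone zone, weighted 4x
        total += 4 if 300 <= h <= 600 else 0
        total += 4 if 150 <= v <= 500 else 0
    for h, v in far_planes:                  # daylight-starved zone, weighted 1x
        total += 1 if 300 <= h <= 600 else 0
        total += 1 if 150 <= v <= 500 else 0
    return total

print(step_reward([(450, 300), (520, 210)], [(310, 160), (250, 140)]))  # -> 18
```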
3. Results
To evaluate the robust control capability of the proposed RL-based façade controller, RBC is used as a baseline for comparison. For this comparison, three key environmental metrics are employed: hourly horizontal illuminance, vertical illuminance, and DGP at the work planes. These metrics collectively assess the controller’s ability to maintain a visually comfortable indoor environment for occupants. In addition, to evaluate the learning process of the RL agent, we track three training-related metrics: cumulative reward, policy loss, and baseline loss. Cumulative reward reflects the agent’s overall performance over time, policy loss measures the improvement of the policy through gradient updates, and baseline loss evaluates the accuracy of value estimation for variance reduction. Together, these metrics provide insight into both the control performance and the learning stability of the proposed RL approach.
Because sunlight is typically insufficient by the end of working hours, the facade is left open at day's end; accordingly, in the RBC strategy, each day starts with the facade opened at 45 degrees. To ensure a fair comparison between the RL and RBC, both rely on the current lighting conditions in the workspace to determine the slat angle adjustments for the next time step. In the RBC approach, all facade elements are uniformly adjusted based on the average illuminance level of the test planes. The target comfort range is between 300 and 600 lux, and the available slat angles are identical for both control strategies. If the average illuminance of the test planes falls below the lower limit of the comfort range, the slat angle increases by 25 degrees in the next step. Conversely, the slat angle decreases by 25 degrees if it is above the upper limit. The control logic is summarized in Figure 5.
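A compact sketch of this rule (the function name is ours; the thresholds and 25-degree step come from the text):

```python
# One RBC update: all elements share the slat angle, nudged in 25-degree steps.
def rbc_step(angle: int, mean_illuminance: float) -> int:
    if mean_illuminance < 300:      # under-lit: increase the angle
        angle += 25
    elif mean_illuminance > 600:    # over-lit: decrease the angle
        angle -= 25
    return max(0, min(90, angle))   # clamp to the 0-90 degree range

angle = 45                          # each day starts at 45 degrees
angle = rbc_step(angle, 250.0)      # -> 70
```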
3.1. Reward
Figure 5 and Figure 6 show that the cumulative reward curves vary significantly across the different facades. The vertically and diagonally folding facades exhibit a clear upward trend in cumulative rewards, indicating an effective learning process toward an optimal control policy. The cumulative reward for the radially folding façade oscillates around a high level throughout the training process; this consistent performance indicates that the initial parameters already yielded a near-optimal control policy.
3.2. Policy Loss and Baseline Loss
The Baseline Loss is the mean squared error (MSE) between the actual cumulative reward and the value estimated by the Baseline network, encouraging the Baseline network to accurately predict the cumulative reward for a given state. The Policy Loss is the difference between the actual cumulative reward and the Baseline network's estimate, weighted by the log probability of each action given the state. It drives the Policy network toward policies that yield higher cumulative rewards.
Figure 6 and Figure 7 show that the Policy and Baseline losses converge after 100 episodes. This indicates that the model has sufficiently explored the environment and learned the relationship between the shading facade elements' angles and the resulting illuminance levels in the workspace. The Baseline network can reliably estimate the value of a given state (the illuminance values on the horizontal and vertical planes), and the Policy network learns a policy that keeps the illuminance within the comfortable range.
3.3. Horizontal and Vertical Illuminance Results
The vertically folding facade is modeled after La Maison's facade [44], which comprises foldable panels gliding continuously along a track system. The shutters function as efficient shading elements, optimizing natural light transmission from the exterior. They dynamically adjust to varying sunlight angles throughout the day, enhancing the building's exterior aesthetics.
Most of the "Near-facade" test planes under the proposed RL control maintain illuminance levels within the comfort range (highlighted in gray) throughout the summer solstice (Figure 8 and Figure 9). However, illuminance in the "Far-facade" workspaces falls below 300 lux. This is because the RL model prioritizes glare reduction in the "Near-facade" workspaces, which leads to insufficient daylight reaching the "Far-facade" workspaces. Overall, the RL controller applied to the vertically folding facade performs effectively during the summer solstice, demonstrating its ability to achieve the control targets at different times of day.
The proposed RL controller performs better at the fall equinox than at the summer solstice, with a higher percentage of "Far-facade" workspaces maintaining illuminance levels within the visual comfort range. A notable difference is that indoor illuminance on both the vertical and horizontal planes varies more during the fall equinox. The "Far-facade" workspaces (Northeast and Northwest) exhibit a more stable lighting level than the "Near-facade" workspaces (Southeast and Southwest) throughout the day. The horizontal illuminance of the "Near-facade" work planes can reach nearly 1000 lux at noon. Consequently, during the afternoon, the test planes' illuminance levels dropped as the facade system increased its closure to mitigate glare and excessive daylight. The indoor lighting environment changed between the fall equinox and summer solstice, corresponding with seasonal variations in the sun's path.
In contrast to RL-based control, RBC systems face more significant challenges in balancing daylight harvesting with glare mitigation. This is particularly evident on 21 September, when horizontal test planes reveal oscillating lighting levels throughout the day, resulting in an unstable lighting environment for occupants. The RBC struggles to maintain a consistent lighting environment across all occupied workspaces. While the ‘Far-facade’ workspaces remain within the comfort range, the ‘Near-facade’ areas are consistently over-lit.
The diagonally folding facade is based on the Kolding Campus building [45]. Its solar shading system comprises approximately 1600 perforated triangular steel shutters. These shutters are strategically installed on the facade, allowing them to dynamically adjust to sunlight conditions and regulate the amount of light entering the building. For the diagonally folding façade under RL control, the indoor lighting environment shows a more stable pattern than for the vertically and radially folding facades. The horizontal illuminance remains around 300 lux without significant fluctuations across both test days. In terms of vertical illuminance, the southwest workspace is considerably higher than the other three workspaces due to its closeness to the facade. Meanwhile, the other three workspaces remain consistent, similar to the stability seen in the horizontal illuminance, without significant variation over the two days. This stability may be attributed to the diagonally folding facade having more individually controlled components, which enhance its adaptability to changing conditions. This observation also suggests that future reward function design should consider not only target lux levels but also the stability of illuminance over time. The RBC on the diagonally folding facade performs well on 21 September, with almost all workspaces maintaining illuminance levels within the comfort range. However, on 21 June, horizontal illuminance levels in the "Far-facade" workspaces fall below the lower bound of the comfort range, leading to an insufficient lighting environment.
The radially folding facade is inspired by the Al Bahar Towers in Abu Dhabi [46], where the fundamental element is triangular. The control performance closely mirrors that of the vertically folding façade, with both facades under RL control successfully maintaining comfortable illuminance levels in the areas near the façade. However, some workspaces farther from the facade receive insufficient sunlight, particularly in terms of horizontal illuminance during the summer solstice. The RBC on the radially folding facade performs similarly to the vertically folding facade, with illuminance levels in the "Far-facade" workspaces significantly lower than the comfort range. Despite the geometric differences between the vertically and radially folding facades, the similarity in control performance may be attributed to their similar rotational motion, with all components opening from a central point or axis.
Figure 10 presents the unmet illuminance percentages for the different shading systems on the horizontal and vertical test planes, measured on 21 June and 21 September. For all three test façades, the RL approach outperforms RBC. RL reduces the unmet percentage by approximately 10% on the vertically and radially folding facades. For the diagonally folding façade, where both RL and RBC achieve their lowest unmet percentages, RL still provides an additional 3% reduction. Under RL control, the unmet illuminance primarily occurs in "Far-facade" workspaces, where the illuminance falls below the comfortable range. This trend aligns with the RL controller's reward design, which prioritizes maintaining "Near-facade" workspaces within the comfort range to mitigate glare issues. While artificial lighting can mitigate insufficient daylight, managing glare remains more challenging once it occurs. The RBC performs worse than the RL-based control across both days, indicating its lower adaptability to dynamic sunlight conditions. For both control strategies, horizontal illuminance exhibits a higher unmet percentage; future studies can focus on improving horizontal illuminance control to enhance overall performance. In conclusion, the proposed RL-based controller effectively balances daylight utilization and glare reduction across various facade designs. On average, 72.92% of the test planes maintain illuminance levels within the comfort range, while only 27.08% fall short of the target, highlighting the controller's adaptability and effectiveness.
As shown in Figure 11, all tested facade configurations significantly reduce DGP values compared to the scenario without a facade, consistently keeping them below 0.35. This reduction demonstrates a substantial improvement in glare control, enhancing visual comfort for occupants. The façades effectively mitigate excessive daylight penetration and glare, contributing to a more balanced indoor luminance distribution. Additionally, variations in DGP across facade types suggest that some configurations are more effective at reducing glare than others, highlighting the importance of optimizing façade designs for daylighting performance and occupant comfort. Both the RL and the RBC maintain workspace DGP below 0.35: the RBC adjusts slat angles based on average illuminance, keeping the illuminance level on each test plane below a high threshold and thereby mitigating glare. The RL controller, however, dynamically adjusts slat angles according to the illuminance levels of each workspace's test planes, optimizing each façade component individually, whereas the RBC applies a uniform, fixed-step slat angle adjustment based on the average illuminance across all test planes, limiting its responsiveness to changing daylight conditions. As a result, the RBC's glare management is less effective, particularly in workspaces with uneven daylight distribution.
Overall, the proposed RL-based control method successfully creates a comfortable visual environment for the occupied space, as evaluated on the test planes. Across all three tested façades, glare is effectively prevented throughout the evaluation period. While the control method strives to balance glare prevention with providing sufficient daylight to occupied workspaces, glare prevention is prioritized: glare can only be mitigated through shading devices, whereas artificial lighting can compensate for insufficient daylight. The control results reflect this approach, with glare completely avoided, though the far-facade workspaces occasionally receive inadequate daylight. This lack of daylight may also be related to the deeper-than-wide layout of the test room, meaning the control method is not the sole factor contributing to the insufficient daylight. The issue could be further mitigated with an artificial lighting system to supplement natural light. Despite these limitations, the proposed RL control method demonstrates robust performance across the different facade designs, maintaining an effective balance between glare mitigation and daylight availability.
4. Discussion
Although this paper proposes an RL-based controller for a kinetic façade to provide a comfortable lighting environment for workspaces, several limitations concern the real-world application of RL. Firstly, the study relies on simulation as a substitute for real building environments. Implementing such a learning system in actual buildings introduces hardware challenges, including determining appropriate sensor locations, integrating controllers capable of running RL algorithms, and installing actuators to adjust the slat angles of each façade component. These hardware-related issues are neither discussed in this paper nor adequately addressed in current RL-based controller studies. Additionally, the safety and reliability of RL control outcomes remain a concern. Unlike RBC or model predictive control (MPC), which can impose hard constraints on control actions, RL-based control may occasionally generate unreasonable or unsafe actions.
This paper compares the proposed RL-based controller to the RBC. While the improvement may appear modest over the two test days, it is important to note that the benchmark RBC was derived through exhaustive simulations, evaluating all possible slat angles to identify the best performance for those specific conditions; the RBC is, in effect, fine-tuned to the simulated results. Such a control policy is unlikely to generalize well across buildings with different orientations, facade geometries, rotation axes, or spatial dimensions. Moreover, deriving such RBCs requires significant computational effort. In contrast, the RL controller learns optimal policies directly from sensor data, making it a more scalable and adaptable solution for diverse real-world applications.
Future research on RL-based control of kinetic responsive facades can focus on evaluating the generalization and stability of the optimal control strategies across different scenarios and degrees of adaptivity. First, all tests were conducted in Boston using the same weather file. Future studies will examine how the geometries and motion characteristics of different shade panel designs perform in diverse climates and weather conditions to broaden the understanding of RL-based performance in other geographic locations. Second, the RL model showed instability, as training the model with the same number of epochs did not consistently yield optimal results. To better evaluate and enhance training stability, future work should involve running multiple training sessions and analyzing the variability in performance. This can help identify patterns and develop more robust training guidelines, ensuring more reliable and repeatable outcomes. Achieving this will also require testing more advanced RL model architectures and reward function designs to improve overall model performance and consistency. Additionally, the training time needed for the RL model makes real-world deployment challenging. Future work will focus on reducing the training time by deploying the model with a pre-trained network and further fine-tuning it through real-world exploration. Moreover, upcoming studies could expand the RL control scope beyond indoor lighting to include indoor thermal comfort. Since solar radiation contributes substantially to internal heat gains, a coordinated RL-based control strategy that manages both lighting and temperature would be highly beneficial. The hardware and practical implementation of RL controllers in real buildings are also vital for future research. Key considerations include sensor placement, computational integration of RL algorithms into control systems, and actuator responsiveness for façade components, all of which are crucial but remain underexplored in the current literature. Addressing these challenges can further enhance the effectiveness and adaptability of RL-controlled shading devices across various environmental contexts, shading components, and building geometries.