1. Introduction
House-type recognition and 3D reconstruction technology has important research and application value in architectural design, interior design, and other related fields. With the progress of science and technology and the development of computer vision, house-type recognition and 3D reconstruction based on raster images have become a popular research direction [1]. Traditional methods of floor plan recognition and 3D reconstruction usually require manual participation and complex measurement work, sometimes even laser scanning [2]. Such reconstruction work is not only time-consuming and labor-intensive but also, in many cases, relies on existing buildings, so the model cannot be reconstructed in advance, and it easily introduces human error and low accuracy [3]. A raster image is a two-dimensional plane image acquired by equipment such as cameras or laser scanners that contains a significant amount of information about the structure and layout of the house [4]. House-type raster images are widely used in daily life because they are cheap to produce, easy to disseminate, and visually vivid, and raster-image-based recognition and reconstruction technology provides a more convenient and automated approach to house-type recognition and 3D reconstruction, in line with people's expectations [5]. By analyzing the lines, corners, and textures in a raster image, house-type information such as the location, size, and connection relationships of the rooms can be deduced. At the same time, by exploiting the change of viewing angle across multiple raster images, 3D reconstruction of the house can be realized and a house model with a geometric structure can be generated [6].
Although architectural design tools such as AutoCAD and Revit [7] provide a personalized design and development experience, they are intended for professional designers. Because they require a large amount of manual work, their human–computer interaction is complex, the learning curve is steep, and getting started is time-consuming and laborious, which makes it difficult for ordinary users to benefit from them [8]. In addition, design-tool technology is largely dominated by foreign technology companies, the cost of using the software is high, and it is mostly oriented toward professional design organizations. Architectural floor plans play a crucial role in designing, understanding, and remodeling interior spaces. Designers can quickly recognize the extent of a room, the position of a door, or the arrangement of objects (geometric shapes) with the naked eye, and they can easily identify the type of room, door, or object through text or icon styles (semantics) [9]. Therefore, the accurate recovery of vectorized information from pixel images has become an urgent problem [10]. While deep-learning technology was still in its infancy, image processing was the more common approach. Morphological operations, Hough transforms, or image vectorization techniques were often used to extract lines, normalize line widths, or group lines according to predetermined widths [11]. Detected lines were matched to walls according to a priori rules requiring various image heuristics, such as convex-hull approximation, polygonal approximation, edge linking to overcome gaps, or color analysis along the lines [12], and doors and windows located in walls could be detected from the geometric features of their symbols [13]. Yamasaki et al. [14] parsed the floor plan image into many connected segments and recognized the walls by a visual feature extraction approach based on bilinear positional orientation and distance rules after extracting contours from the input image. Door and window recognition relies on matching the geometric visual features of doors and windows on the floor plan against pre-set rules; such rules cannot prevent interference from other furniture components, and recognition accuracy is low when the floor plan is complex [15]. A further disadvantage is that tilted walls cannot be recognized, and the algorithm is complex and time-consuming [16]. Ahmed applied multiple erosion and dilation operations to the image to separate coarse and fine lines, classifying lines according to predefined thickness rules, recognized the textual information of the image through image overlay following the idea of separating text from graphics [17], and used Speeded Up Robust Features (SURF) to recognize symbols on the floor plan, such as doors and windows [18,19,20,21]. Heras [22] proposed a generalized method for floor plan analysis and interpretation that applies two recognition steps in a bottom-up fashion. First, basic building blocks, i.e., walls, doors, and windows, are detected using a statistical patch-based segmentation method. Second, a graph is generated and structural pattern recognition techniques are applied to further localize the main entities, i.e., the rooms of the building. The proposed method is able to analyze any type of floor plan and recognize its features with high accuracy on different datasets. Huang [23] proposed prior-knowledge-based wall detection by manually designing the local geometric and color features of the wall, using the self-similarity of the wall for wall detection and alignment, and then detecting the door and window symbols in the floor plan with a two-stage deep-learning target detection model that determines the locations of doors and windows before classifying the target regions. This manual feature design approach based on image processing requires high-quality floor plans and performs poorly on plans with low clarity and heavy noise. Ma Bo [24] proposed house-type element recognition based on multi-attribute analysis, using template matching and edge features to detect scale bars and walls in floor plans and shape features for column recognition. Such purely rule-based recognition generalizes poorly; it may work on some datasets but underperforms on floor plans with inconsistent edge information and large differences in style. Shen [25] proposed a structure extraction method that filters out interfering lines, extracts wall structures, and partitions the space through morphological processing of floor plan image features. They realized scale recognition of floor plans based on a target detection algorithm [26] and then calculated room areas by means of coordinate transformation, showing that scale recognition and calculation based on target detection is accurate [27].
Against the above background, this paper studies the recognition and analysis of floor plans from house-type raster images in order to realize house-type reconstruction. The significance of this research can be summarized as follows:
(1) Based on existing deep-learning technology, we study its application to floor plan recognition and address the low recognition accuracy and poor generalization of existing methods. We use key point detection to locate the key points of the floor plan and thus determine its structure, drawing on human-body key point detection models and improving the feature extraction and feature fusion networks to suit this task. Through target detection, corner detection, and OCR-based scale calculation, walls, doors, windows, scale bars, and other elements of the floor plan are accurately identified and vectorized.
(2) Based on existing Web development technology, we reconstruct the vectorized house-model data in the browser and provide interaction for house-model design. To address the low efficiency and poor performance of existing house-model design tools, we develop a fast, modern tool for real-time reconstruction and design, use a front-end MVC pattern to control data flow at fine granularity, and optimize the rendering process. This meets the growing demand in China for personalized, customized housing and for visualization of house models by businesses and individual designers, assists in the design and creation of house models, and thereby supports the stable and healthy development of the upstream and downstream real estate industry chain and of the national economy.
(3) We combine deep-learning technology with civil engineering, exploring its application in the civil engineering and decoration industries and the development and practical application of digital twin technology and smart cities, empowering traditional industries with digital intelligent technology. Building on the large number of raster floor plans accumulated in the relevant industries, this paper studies recognition and vectorization algorithms for raster floor plans. Summarizing previous research on house-type recognition and vectorization and addressing its problems and pain points, this paper puts forward a method for recognizing and vectorizing house-type elements that shows excellent performance. Against the background of the industry's lack of good visualization tools for identified floor plan elements, this paper develops a Web-based house-model reconstruction tool with excellent performance, which makes the whole house-model reconstruction workflow more complete, links the upstream and downstream stages, and lays a solid foundation for downstream tasks such as automatic house layout and automatic furniture generation.
2. Household Element Edge Detection
The standard raster floor plan embodies certain prior knowledge, and many floor plan data follow regular patterns; for example, the inner area of a wall usually has a uniform color, and the two sides of the wall form its edges. In early wall, door, and window recognition, detecting wall targets by edge detection combined with such prior knowledge was common practice, and edge detection is still widely used for detecting floor plan elements. Edge detection is a very important image feature extraction method in computer vision and digital image processing and is usually the basis for other types of image feature extraction.
Image edge detection [28] is usually implemented in one of two ways: traditional image-processing-based methods, or the deep-learning-based edge detection methods that appeared after 2015. The traditional methods are more widely used, and the edge detection discussed in this section is based on them. Image edge detection works by computing the gradient of the image: the image is convolved with an operator matrix to obtain its gradient, so the core of an edge detection algorithm lies in the edge detection operator. Edge extraction is a filtering process in which different operators extract different features. Commonly used edge detection operators include the Roberts, Prewitt, Sobel, Laplacian, and Canny operators. In practice, different operators often need to be tried and tuned to obtain the best edge detection result. The Roberts operator performs well when image edges are close to ±45°, but its edge localization is inaccurate. The Prewitt operator detects horizontal and vertical edges well, but noisy gray values degrade its results considerably. The Laplacian operator is a rotationally invariant, isotropic second-order differential operator that captures the overall edge information in an image and responds well to certain specific image structures; however, it is sensitive to noise, so image noise must first be removed by low-pass filtering.
The Canny operator is a comprehensive first-order differential detection algorithm whose goal is to find an optimal edge contour. Among filtering-based edge detection algorithms, the Canny operator performs best: a good edge detector should localize edges precisely, find complete edges, and be strongly resistant to noise. In practical engineering, the Canny algorithm is the most common; its main calculation process is as follows:
(1) Gaussian filtering: Gaussian filtering is a commonly used image smoothing filter, mainly used to remove image noise. A two-dimensional filtering kernel, the Gaussian kernel, is generated according to the Gaussian formula; the gray values of each pixel and its neighbors are then convolved with this kernel to compute a weighted average of pixel values. A 5 × 5 or 3 × 3 Gaussian kernel is usually used.
(2) Calculating the gradient image and angle image: Edges are characterized by drastic changes in gray value on either side, and this change can be regarded as the derivative of the gray value. Since image pixels are not continuous, the derivative is described by differences. Four operators, horizontal, vertical, diagonal, and anti-diagonal, are used to detect the horizontal, vertical, and diagonal edges in the image. After the convolution operation, the gradient magnitude of each pixel is taken as the maximum response and its direction determined from it, yielding the luminance gradient image of each pixel and its direction.
(3) Non-maximum suppression: Gradient images commonly suffer from edges of uneven width, blurring, and misidentification. To solve these problems, pixels that are not edges need to be eliminated by selecting extreme points and suppressing the non-extreme points around them. Each pixel is compared with its neighboring pixels along the gradient direction; local maxima along the gradient direction are retained while the surrounding non-maximum points are suppressed.
(4) Detecting and connecting edges with the double-thresholding algorithm: Two thresholds (a high and a low threshold) are set manually to classify pixels as strong edges, weak edges, or non-edges. Strong edges are pixels above the high threshold, weak edges are pixels between the high and low thresholds (the high threshold is usually set at twice the low threshold), and non-edges are pixels below the low threshold. With hysteresis thresholding, a weak edge point that has a strong edge point in its eight-neighborhood is converted to a strong edge point, and from there new edge points continue to be detected and connected until a complete contour is formed.
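To make the gradient and double-threshold steps concrete, the following is a minimal TypeScript sketch of the core of the Canny pipeline (Sobel gradients plus double thresholding on a grayscale image); it omits Gaussian smoothing, full non-maximum suppression, and hysteresis linking, and all names are illustrative rather than taken from this paper's implementation.

```typescript
// Minimal sketch of the Canny core: Sobel gradients + double thresholding.
// `gray` is a row-major grayscale image; Gaussian smoothing, full NMS, and
// hysteresis edge linking are omitted for brevity.
type GrayImage = { data: Float32Array; width: number; height: number };

function sobelGradient(img: GrayImage): { mag: Float32Array; dir: Float32Array } {
  const { data, width, height } = img;
  const mag = new Float32Array(width * height);
  const dir = new Float32Array(width * height);
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const p = (dy: number, dx: number) => data[(y + dy) * width + (x + dx)];
      // Horizontal and vertical Sobel responses.
      const gx = -p(-1, -1) - 2 * p(0, -1) - p(1, -1) + p(-1, 1) + 2 * p(0, 1) + p(1, 1);
      const gy = -p(-1, -1) - 2 * p(-1, 0) - p(-1, 1) + p(1, -1) + 2 * p(1, 0) + p(1, 1);
      const i = y * width + x;
      mag[i] = Math.hypot(gx, gy); // gradient magnitude
      dir[i] = Math.atan2(gy, gx); // gradient direction
    }
  }
  return { mag, dir };
}

// 0 = non-edge, 1 = weak edge, 2 = strong edge; high threshold ≈ 2 × low, as in the text.
function doubleThreshold(mag: Float32Array, low: number, high = 2 * low): Uint8Array {
  const labels = new Uint8Array(mag.length);
  for (let i = 0; i < mag.length; i++) {
    if (mag[i] >= high) labels[i] = 2;
    else if (mag[i] >= low) labels[i] = 1;
  }
  return labels; // hysteresis linking would then promote weak edges touching strong ones
}
```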
2.1. Scale Endpoint Corner Point Detection
The endpoints at both ends of the scale bar have pronounced corner properties. Corner detection is a computer-vision technique for recognizing corner points in an image, i.e., locations of abrupt change where two or more edges intersect. Unlike edge detection, corner detection focuses on local maxima in the image, i.e., locations that change suddenly. Commonly used corner detection algorithms are Harris corner detection and Shi–Tomasi corner detection. Harris corner detection is based on local image gradients: it computes the eigenvalues of the gradient matrix within a local window to determine the corner points in the image. Shi–Tomasi corner detection is an optimization of Harris corner detection; it follows the gradient eigenvalues of the Harris method but improves the scoring function. The Harris algorithm slides a window over the image and computes the grayscale change to identify corner points. Its key steps are as follows: (1) Image grayscaling: the image is converted to grayscale to eliminate the effect of color information. (2) Difference calculation: the grayscale differences between neighboring pixels are calculated to enhance edge information. (3) Filter smoothing: the image is smoothed with a Gaussian filter to reduce the effect of noise. (4) Local extrema: the extreme points in local regions of the image are computed, and corner candidates are screened. (5) Corner confirmation: corners are finally confirmed according to the value of the corner response function and local features. (6) If the grayscale within the window changes greatly in all directions, the location is considered a corner region. The window can be an ordinary rectangular window or a Gaussian window assigning different weights to each pixel.
The window is shifted by (u, v) in each direction, and the resulting change in the gray value of the image is expanded by a bivariate first-order Taylor series approximation and written in matrix form, as shown in Equation (1):

$$E(u,v) = \sum_{x,y} w(x,y)\left[I(x+u,\, y+v) - I(x,y)\right]^2 \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

where $w(x,y)$ is the window weight and $I_x$, $I_y$ are the image gradients. The Harris response is computed from the eigenvalues of $M$ as $R = \det M - k(\operatorname{trace} M)^2$.
The Shi–Tomasi corner detection method is similar to Harris but replaces the scoring function; a point is recognized as a corner if its score exceeds a specified threshold. The scoring formula for Shi–Tomasi corner detection is shown in Equation (2):

$$R = \min(\lambda_1, \lambda_2)$$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $M$.
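As an illustration of the two scoring functions, the sketch below accumulates the structure matrix M from image gradients within one window and then evaluates both the Harris response and the Shi–Tomasi response from its eigenvalues. The gradient computation and window weighting are assumed to be done elsewhere, and the function name is illustrative.

```typescript
// Sketch: Harris and Shi–Tomasi corner responses from summed gradient products.
// Ix, Iy are image gradients inside one window; w is an optional per-pixel weight.
function cornerResponses(Ix: number[], Iy: number[], w?: number[], k = 0.04) {
  let sxx = 0, syy = 0, sxy = 0;
  for (let i = 0; i < Ix.length; i++) {
    const wi = w ? w[i] : 1;
    sxx += wi * Ix[i] * Ix[i];
    syy += wi * Iy[i] * Iy[i];
    sxy += wi * Ix[i] * Iy[i];
  }
  // Eigenvalues of the 2x2 structure matrix M = [[sxx, sxy], [sxy, syy]].
  const trace = sxx + syy;
  const det = sxx * syy - sxy * sxy;
  const disc = Math.sqrt(Math.max(0, (trace * trace) / 4 - det));
  const lambda1 = trace / 2 + disc;
  const lambda2 = trace / 2 - disc;
  return {
    harris: det - k * trace * trace,       // R = det(M) - k * trace(M)^2
    shiTomasi: Math.min(lambda1, lambda2), // R = min(lambda1, lambda2), Equation (2)
  };
}
```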
2.2. HSL Color Space Model
HSL [29] (Hue, Saturation, Lightness) has three components: H represents the hue, S the saturation, and L the lightness. The HSL color space can be represented as a cylinder, as shown in Figure 1. For the L (lightness) component, 100 means white and 0 means black. The hue is expressed as the polar angle, the saturation as the polar radius, and the lightness as the height along the central axis of the cylinder. The elements in floor plans carry rich prior knowledge of color and lightness and are often used for wall segmentation; when lightness correlations need to be detected, the HSL color space is more accurate.
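Since wall segmentation in this paper relies on lightness priors, a small RGB-to-HSL conversion helper is sketched below using the standard formula; the value ranges H ∈ [0, 360), S, L ∈ [0, 100] match the cylinder description above, and the function name is illustrative.

```typescript
// Convert an RGB pixel (0–255 per channel) to HSL (H in degrees, S and L in percent).
function rgbToHsl(r: number, g: number, b: number): { h: number; s: number; l: number } {
  const rn = r / 255, gn = g / 255, bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;
  const l = (max + min) / 2;                                     // lightness
  const s = delta === 0 ? 0 : delta / (1 - Math.abs(2 * l - 1)); // saturation
  let h = 0;                                                     // hue as polar angle
  if (delta !== 0) {
    if (max === rn) h = 60 * (((gn - bn) / delta) % 6);
    else if (max === gn) h = 60 * ((bn - rn) / delta + 2);
    else h = 60 * ((rn - gn) / delta + 4);
  }
  if (h < 0) h += 360;
  return { h, s: s * 100, l: l * 100 };
}

// Example: a pure-white wall interior pixel maps to L = 100.
// rgbToHsl(255, 255, 255) -> { h: 0, s: 0, l: 100 }
```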
2.3. Neural Network Activation Function
The activation function is one of the components of a neuron in deep-learning models [30] and controls whether the neuron is activated. Activation functions are introduced to add nonlinearity to the neural network model, enhancing its learning and expressive ability so that it can better fit the objective function. In modern neural network models, activation functions are widely used and mainly play the following roles. Solving nonlinear classification problems: many real-world datasets and problems are not linearly separable, and nonlinear activation functions help the network learn nonlinear relationships and solve classification problems better. Activation sparsity: the activation function can limit the output range of neurons and thus control their active state, which improves the sparsity of the network, reduces computational complexity, and thereby optimizes the network. Mitigating vanishing and exploding gradients: when the network is poorly designed or the training parameters are badly initialized, training suffers from vanishing or exploding gradients, which makes the network difficult to train. Constraining the network output range: ensuring that the neuron output lies in a specific interval, or performing normalization. As neural network models have developed, activation functions have also kept evolving; this paper introduces the widely used activation functions Sigmoid and ReLU and, because of their shortcomings, also introduces three modern activation functions: LeakyReLU, GELU, and Swish.
The Sigmoid function, which was commonly used in the early days of neural networks, is an S-shaped function that maps the input to the range 0 to 1 and is calculated as shown in Equation (5):

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
(1) Sigmoid function: The Sigmoid function is shown in Figure 2, where the horizontal axis represents the input of the function and the vertical axis its output. The Sigmoid function outputs values between 0 and 1, ensuring that the neuron's output stays in a controllable range, and it is usually used as a neural network classifier to output probabilities. The Sigmoid function is also smooth and differentiable, avoiding jumps in the output value. In the early days, Sigmoid was used as the activation function of neurons, but it also has certain problems: it requires an exponential operation, which is computationally unfriendly; its output is not centered on 0, which slows convergence; and during back propagation it easily causes the gradient to vanish, making it difficult to update the network weights. Therefore, in modern neural networks, Sigmoid usually plays the role of a classifier and is not recommended as an activation function at this stage. (2) ReLU function and LeakyReLU function: The ReLU function is a widely used and popular activation function. It solves the problems of some earlier activation functions, effectively alleviating the vanishing-gradient problem during training, and because it involves only a linear relationship, it computes much faster. The expression of the ReLU function, $\mathrm{ReLU}(x) = \max(0, x)$, is shown in Equation (5). The function is plotted in Figure 2; the horizontal axis represents the input, and the vertical axis represents the output of the function.
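For reference, the activation functions discussed in this section can be written compactly as follows; this is a plain TypeScript sketch using the standard definitions, and the GELU variant shown is the common tanh approximation.

```typescript
// Standard definitions of the activation functions discussed above.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

const relu = (x: number): number => Math.max(0, x);

// LeakyReLU keeps a small slope for negative inputs instead of zeroing them.
const leakyRelu = (x: number, alpha = 0.01): number => (x >= 0 ? x : alpha * x);

// GELU, tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3))).
const gelu = (x: number): number =>
  0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));

// Swish (SiLU): x * sigmoid(x).
const swish = (x: number): number => x * sigmoid(x);

// Example outputs at x = -1: sigmoid ≈ 0.269, relu = 0, leakyRelu = -0.01,
// gelu ≈ -0.159, swish ≈ -0.269.
```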
4. Vectorized Reconstruction Methods for Household Plans
4.1. 2D Reconstruction Method for House Plans
The wall and door/window graphics in the 2D floor plan are drawn in real time through the API provided by PixiJS, according to the start position and thickness of each wall, door, and window, with connection-point optimization applied at wall joints. If the end point of a wall has no intersection with another wall, its end is drawn as a right angle; if two walls intersect, the wall thickness is taken into account to calculate the two side points of the intersection, and these two intersection points are used as the connecting line along which the two walls are respectively drawn, as in Figure 18. The system provides four kinds of door and window drawings, which can be joined into the reconstructed floor plan, as in Figure 19.
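The side-corner calculation at a wall joint can be sketched as follows: a simple miter computation that intersects the offset edge lines of the two walls, assuming they are not parallel. The names and data shapes are illustrative and not the system's internal API.

```typescript
// Sketch: compute one side corner point where two connected walls meet,
// by intersecting their offset edge lines (a simple miter joint).
type Vec2 = { x: number; y: number };

const sub = (a: Vec2, b: Vec2): Vec2 => ({ x: a.x - b.x, y: a.y - b.y });
const normalize = (v: Vec2): Vec2 => {
  const len = Math.hypot(v.x, v.y);
  return { x: v.x / len, y: v.y / len };
};
const perp = (v: Vec2): Vec2 => ({ x: -v.y, y: v.x }); // left-hand normal

// Intersect two lines given by point + direction (assumes they are not parallel).
function intersectLines(p1: Vec2, d1: Vec2, p2: Vec2, d2: Vec2): Vec2 {
  const denom = d1.x * d2.y - d1.y * d2.x;
  const t = ((p2.x - p1.x) * d2.y - (p2.y - p1.y) * d2.x) / denom;
  return { x: p1.x + t * d1.x, y: p1.y + t * d1.y };
}

// Walls A (a0 -> joint) and B (joint -> b1) meeting at `joint`, with thicknesses tA, tB.
function miterCorner(a0: Vec2, joint: Vec2, b1: Vec2, tA: number, tB: number): Vec2 {
  const dA = normalize(sub(joint, a0));
  const dB = normalize(sub(b1, joint));
  const edgeA = { x: a0.x + perp(dA).x * (tA / 2), y: a0.y + perp(dA).y * (tA / 2) };
  const edgeB = { x: joint.x + perp(dB).x * (tB / 2), y: joint.y + perp(dB).y * (tB / 2) };
  return intersectLines(edgeA, dA, edgeB, dB); // one of the two side corner points
}
```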
The 2D reconstruction of the floor plan is designed and implemented with PixiJS 6. PixiJS organizes rendering through containers: a container is the object that creates the scene graph and collects a set of child view objects, sprites, graphics, text, etc., into a tree-shaped rendering structure, as shown in Figure 20. The PixiJS rendering engine continuously redraws; after the view objects are updated, PixiJS renders to the screen and repeats the rendering cycle. PixiJS provides Graphics objects for 2D drawing and an event-based interaction system to manage display object interactions. Mouse-click events can be used to determine where on the canvas the wall to be drawn is located, and the wall view is then drawn through the Graphics API. When an update is needed, the view object is looked up in the visual object pool and the redrawing of its container is re-triggered, realizing incremental updates of individual elements and providing finer control over the data flow and the timing of re-rendering. With this design, it is no longer necessary to determine which view objects need updating through a virtual-view-object diffing algorithm when data objects change. Each view object for a wall, door, window, etc., shown in Figure 21, is instantiated from its corresponding data object, so a Map can be constructed from each data object to its corresponding view object; once a data object changes, the triggered event lets the corresponding view object receive the change and perform the appropriate view update according to the event category.
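The following TypeScript sketch illustrates the data-object-to-view-object Map described above, assuming PixiJS 6: a wall view is a PIXI.Graphics object keyed by its data object's id, and a change redraws only that view. The Wall data shape and event wiring are simplified illustrations, not the paper's actual classes.

```typescript
import * as PIXI from "pixi.js"; // PixiJS 6

// Simplified wall data object; the paper's EntityWall carries more attributes.
interface WallData {
  id: string;
  start: { x: number; y: number };
  end: { x: number; y: number };
  thickness: number;
}

const app = new PIXI.Application({ backgroundAlpha: 0 });
document.body.appendChild(app.view);

const wallLayer = new PIXI.Container(); // container node in the scene graph
app.stage.addChild(wallLayer);

// Map from data object id to its view object for incremental updates.
const wallViews = new Map<string, PIXI.Graphics>();

function drawWall(g: PIXI.Graphics, wall: WallData): void {
  g.clear();
  g.lineStyle(wall.thickness, 0x333333); // wall thickness as line width
  g.moveTo(wall.start.x, wall.start.y);
  g.lineTo(wall.end.x, wall.end.y);
}

// Called when a wall data object is created or changed: only its own view redraws.
function upsertWallView(wall: WallData): void {
  let view = wallViews.get(wall.id);
  if (!view) {
    view = new PIXI.Graphics();
    view.interactive = true; // enable the event-based interaction system
    view.on("pointerdown", () => console.log("wall selected:", wall.id));
    wallLayer.addChild(view);
    wallViews.set(wall.id, view);
  }
  drawWall(view, wall);
}

// Usage: drawing one wall from the vectorized recognition results.
upsertWallView({ id: "w1", start: { x: 100, y: 100 }, end: { x: 400, y: 100 }, thickness: 12 });
```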
In addition to the wall, door, and window views, length marker objects are added to each wall, door, and window view object for visual convenience; they are rendered in separate containers and visualize the calculated length and geometry of each wall, door, and window.
4.2. 3D Modeling Rendering Techniques for House Plans
Three-dimensional modeling and rendering of floor plans can be achieved with a variety of tools and techniques; in engineering practice, AutoCAD 2020 and Revit 2019 are used for implementation. Using game engines such as Unity and Unreal Engine is also common; these engines provide models, materials, physics, lighting, cameras, and other common elements of a graphics scene. The system in this paper is designed and implemented on WebGL; for the unity and integrity of the system, Babylon.js is chosen as the Web-side 3D modeling and rendering framework. Babylon.js is an open-source 3D game engine and graphics rendering library based on WebGL, designed to let developers create high-performance interactive 3D scenes and applications in the browser. It provides a rich set of features and tools for efficiently creating complex 3D games, virtual and augmented reality applications, product demonstrations, and so on. Babylon.js provides encapsulated upper-layer interfaces that allow developers to program graphics directly in JavaScript or TypeScript without using complex shader languages or the low-level WebGL interface. There are several important basic concepts in Babylon.js:
(1) Scene: a container that holds all the visual objects in a viewport. The scene is responsible for all rendering, lighting, camera, physics, and interaction operations.
(2) Mesh: the basic object in Babylon.js used to represent the geometry of a 3D object; many triangular faces are connected together to represent its shape. A mesh can be a basic geometric primitive such as a cube, sphere, or cylinder; an external 3D model loaded through the model loader, such as an .obj or .gltf/.glb file; or a custom shape described by vertex data, where a list of vertices and indices is triangulated. A mesh can be given materials, textures, and other properties, and geometric transformations can be applied to it.
(3) Camera: the camera is the viewpoint used to observe the 3D scene; it defines the field of view and the viewing direction. Babylon.js provides several types of preset cameras; the most commonly used are the universal (first-person) camera and the arc-rotate camera. The universal camera is similar to a first-person human view, while the arc-rotate camera behaves like a satellite in orbit, always pointing toward a specified target position. In this task, the arc-rotate camera is selected for displaying the 3D house-model scene; it can be rotated to view the rendered house model.
(4) Lighting: lighting determines the light and dark appearance of objects in the scene, and the position, direction, color, and other attributes of a light directly affect how objects are lit. Babylon.js provides preset light types such as point lights, directional (parallel) lights, spot lights, and hemispheric lights. In this task, the default light source is a directional light.
(5) Material: material determines the appearance and texture of the mesh surface. Babylon.js provides commonly used materials, including the standard material, PBR material, texture material, and so on.
(6) Texture: a texture is an image used to cover the surface of a mesh; it can add color, patterns, maps, and other effects to the mesh.
All positional information for walls, doors, and windows is derived from the vectorized plane results of the 2D raster floor plan. In 3D model reconstruction the 2D information is known, so the height dimension is preset in this paper: each floor is 3 m high, external walls are 24 cm thick, internal walls are 12 cm thick, and walls are the same height as the floor, 3 m. The default door height is 2.1 m, placed on the ground; the default window height is 1 m, positioned 1.5 m above the ground, as in Figure 22. With this preset height information, model objects can be created and placed in the scene from the 3D data: each wall is abstracted as a rectangular or trapezoidal prism, and the wall information is described in terms of its faces. When walls are connected, the intersecting vertex positions of the wall edges at both ends of the connection point need to be calculated to support the subsequent mesh vertex calculation. For walls, the 3D geometric data of each face must first be calculated from the wall data object, including 3D surface vertices, contour lines, bounding boxes (Figure 23), holes, and so on. Each geometric face is converted to triangular mesh data, i.e., triangulated, and a wall mesh object is then created from these geometric attributes. The generated triangular mesh data are used to set the vertices, indices, normals, UVs, materials, etc., which are assigned to the Babylon mesh object attributes and added to the Babylon 3D scene. For door and window meshes, pre-built .glb files of the door and window models are loaded using the model loader provided by Babylon.js. Since doors and windows must be located in a wall, their geometric properties are calculated first, holes for them are opened in the wall mesh, the doors and windows are placed in the holes, the related wall mesh is recalculated and rebuilt, and the 3D view is rendered.
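The sketch below shows how such a 3D scene might be assembled with Babylon.js: an arc-rotate camera, a directional light, a wall built as a box mesh from the preset dimensions (3 m height, 0.24 m external thickness), and a door model loaded from a .glb file. It is a simplified illustration under these assumptions; the paper's actual pipeline triangulates custom wall faces and opens holes for doors and windows, which is not reproduced here, and the file paths are hypothetical.

```typescript
import * as BABYLON from "@babylonjs/core";
import "@babylonjs/loaders/glTF"; // enables loading .glb/.gltf door and window models

const canvas = document.getElementById("scene3d") as HTMLCanvasElement;
const engine = new BABYLON.Engine(canvas, true);
const scene = new BABYLON.Scene(engine);

// Arc-rotate camera: orbits around the house model, as described in the text.
const camera = new BABYLON.ArcRotateCamera(
  "camera", Math.PI / 4, Math.PI / 3, 15, BABYLON.Vector3.Zero(), scene
);
camera.attachControl(canvas, true);

// Default light source: a directional (parallel) light.
new BABYLON.DirectionalLight("sun", new BABYLON.Vector3(-1, -2, -1), scene);

// One straight wall as a box: length from the vectorized 2D result,
// height 3 m and thickness 0.24 m from the preset dimensions.
const wall = BABYLON.MeshBuilder.CreateBox("wall", { width: 5, height: 3, depth: 0.24 }, scene);
wall.position.y = 1.5; // rest the wall on the ground plane
const wallMat = new BABYLON.StandardMaterial("wallMat", scene);
wallMat.diffuseColor = new BABYLON.Color3(0.9, 0.9, 0.85);
wall.material = wallMat;

// Load a pre-built door model (hypothetical file name) and place it at the wall.
BABYLON.SceneLoader.ImportMeshAsync("", "/models/", "door.glb", scene).then((result) => {
  const door = result.meshes[0];
  door.position = new BABYLON.Vector3(0, 0, 0.12); // flush with the wall face
});

engine.runRenderLoop(() => scene.render());
```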
4.3. System Architecture Design
In the development of large-scale front-end graphical applications, data-driven views allow the flow of data to be controlled at a finer granularity, and the view needs to be backed by a corresponding data model. When the user operates on the view, this triggers a change in the corresponding data model, which in turn triggers re-rendering of the view. There are two mainstream design patterns for separating view and data in this way: MVC and MVVM. In the MVC pattern, the controller receives user inputs and requests and sends them to the corresponding Model for processing, which then triggers an update of the corresponding View object, thereby updating the view; this usually requires identifying which view object corresponds to a given data object. MVVM decouples the data exchange between user-defined data objects and real views through virtual view objects and obtains the minimal set of view updates through a diffing algorithm over the virtual view objects; it is usually used when it is difficult to determine which view objects need to be updated when a data object changes. Both design patterns fully modularize functionality: each module is maintained independently, the modules do not affect each other, and there is high cohesion within each module and low coupling between modules.
The main project is developed with Vue and Element Plus, using the TypeScript language. The graphics rendering part is used as a dependency package of the front-end display project, which installs and imports it after it is built. Following the front-end MVC design pattern, the whole graphics rendering module consists of the following packages: the VisualModule package, Schema package, Interaction package, History package, and RenderApp package.
VisualModule package: This mainly contains the encapsulation of the 2D and 3D house-model scenes, divided into 2D view models and 3D view models, with the following core classes. Visual2dModule integrates the application, containers, and resource loading provided by PixiJS and receives user interactions in the 2D area and event notifications from the scene. Visual3dModule integrates Babylon's scene, camera, lights, materials, textures, maps, resource loading, etc., and receives events from the data objects in the current scene. VisualWall2d and VisualWall3d encapsulate the 2D and 3D view models of walls, computing the geometric attributes of walls in the 2D and 3D scenes and rendering the wall views; the corresponding door and window view models compute the geometric attributes of doors and windows and render their views; VisualRoom2d and VisualRoom3d are used for rendering the room views.
Schema package: This package describes the structure of the user-defined data models in the current scene; data objects are registered in this package, and the unified serialization and deserialization of the custom wall, door, window, room, and other data objects is completed here. It is the collection of all data objects describing the current house model. It contains an important class, SceneSchema, which describes the collection of data objects and the relationships between them in the current scene and is the container for all wall, door, and window data models. Unlike the Babylon.js Scene, SceneSchema describes the relationships between the data object elements of the current user-defined house model rather than concrete rendering objects. EntityWall and EntityWallAttachment are the data models of walls and of doors and windows, encapsulating intrinsic attributes such as length, thickness, start position, end position, and height above ground.
Interaction package: This package mainly provides interaction with and operations on the 2D scene views, receiving and processing the user's interactions in the scene and changing the corresponding wall, door, and window data objects. Its core classes are InteractionDrawWall, used for wall drawing; InteractionEdit, used for editing the attributes of walls, doors, and windows; InteractionTransport, used for dragging walls, doors, and windows; and InteractionScale, used for specifying the scale of the background of the current floor plan.
History package: This is used to manage the historical state of the data in the current scene and to provide undo and redo of the scene data.
Based on data changes in the scene, it caches the historical state of the data objects in the current scene and depends on the data definition Schema package. Its core class HistoryManager caches the data objects of the current scene tree and compares the old and new scene trees, thereby realizing forward and backward navigation through the history of the current scene tree.
RenderApp package: This integrates the contents of the above packages and encapsulates them into a single class as the rendering module's entry point for external projects. Its core class RenderApp integrates the above packages and exposes them uniformly for external use.
BFF: This provides the back-end services, developed with Nest.js, and handles communication between the main back-end services and the front end. Usually a front-end project needs only one back end, with all internal communication handled by the back-end server. Its core classes include FileService, used for floor plan uploads, and FloorPlanService, used for persisting saved floor plan designs. The house-type recognition module is developed and deployed separately and exposes its services through FastAPI as an independent back-end service, which other modules call through network requests, decoupling the modules from each other.
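To illustrate the MVC-style data flow between the Schema data objects and the view models described above, the following is a minimal sketch in which a wall data object emits a change event and its registered 2D view reacts. The event mechanism and property names are illustrative assumptions; the paper does not list the class internals.

```typescript
// Minimal sketch of the data-object -> view-object flow (illustrative, not the
// paper's actual implementation). A wall data model notifies listeners on change,
// and its 2D view model re-renders only itself.
type Listener = (kind: "geometry" | "removed") => void;

class EntityWall {
  private listeners: Listener[] = [];
  constructor(
    public id: string,
    public start: { x: number; y: number },
    public end: { x: number; y: number },
    public thickness = 0.24
  ) {}
  onChange(fn: Listener): void {
    this.listeners.push(fn);
  }
  moveEnd(x: number, y: number): void {
    this.end = { x, y };
    this.listeners.forEach((fn) => fn("geometry")); // controller-triggered update
  }
}

class VisualWall2dSketch {
  constructor(private data: EntityWall) {
    data.onChange((kind) => {
      if (kind === "geometry") this.redraw();
    });
  }
  redraw(): void {
    // In the real system this would update a PIXI.Graphics object.
    console.log(`redraw wall ${this.data.id}`, this.data.start, this.data.end);
  }
}

// Usage: the Interaction package would call moveEnd when the user drags a wall.
const wall = new EntityWall("w1", { x: 0, y: 0 }, { x: 5, y: 0 });
new VisualWall2dSketch(wall);
wall.moveEnd(6, 0); // triggers only this wall's redraw
```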
Household Identification and Reconstruction Module Design
The main function of this module is to recognize the raster floor plan images uploaded by users, draw and render the recognition results in the front end, and, from the reconstructed 2D vectorized floor plan, synchronously construct and render the 3D house model so that it can be displayed in real time in the 3D scene, realizing the reconstruction of a 3D house model from a 2D raster floor plan image.
The flowchart of house-model recognition and reconstruction is shown in Figure 24. The user first uploads the floor plan to be recognized; the image is packaged as FormData and sent to the back end for scale recognition and for reconstruction and vectorization of the wall, door, and window primitives. The image is then loaded into a 2D canvas element using the AssetsLoader provided by PixiJS as the floor plan background, centered in the window and scaled according to its width and height, so that the uploaded floor plan sits in the middle of the user's viewing area. The server side calls the algorithm described in Section 3 of this paper for vectorized reconstruction and scale calculation of walls, doors, and windows. When the BFF receives the request from the front end, it uploads the user's image to the OSS and forwards the request and the OSS address of the image to the processing module of the house-type recognition service. The recognized scale and floor plan primitive information is then encapsulated in JSON format. When the front end receives the response, it first displays the position and value of the scale bar for the user to confirm, so that the scale area can be fine-tuned; after confirmation, the graphics rendering module creates the data objects and view objects of the walls, doors, and windows according to the vectorization results and displays them on the canvas element. If the uploaded floor plan originally has no scale bar, a default scale ruler appears in the center of the 2D canvas; the user manually specifies the scale, and the real sizes of the primitives are then calculated. In more detail, the user selects the floor plan to recognize and uploads it; it is packaged as a FormData object and sent to the BFF back end. The BFF back end receives the upload request, saves the image on the OSS, forwards the image address and the TaskId of the current task to the message queue, and returns the TaskId and image address to the front end, which polls the BFF service with the TaskId every 5 s to check whether the current task is complete. Once the back-end AI recognition completes and returns its response, the BFF service returns the recognition result to the front end in the next poll; the selected floor plan is loaded into the central 2D canvas using the AssetsLoader, and the recognized scale is rendered, including the position of the scale line and the scale value. If a scale was correctly recognized in the current floor plan, the scale marker area appears at the recognized scale position, as shown in Figure 24.
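A condensed TypeScript sketch of this upload-and-poll flow is given below; the FormData packaging and the 5 s polling interval follow the description above, while the endpoint paths and response field names are illustrative assumptions.

```typescript
// Sketch of the front-end upload + polling flow described above.
// Endpoint paths and response field names are illustrative assumptions.
interface RecognitionResult {
  scale?: { x1: number; y1: number; x2: number; y2: number; value: number };
  walls: unknown[];
  doorsWindows: unknown[];
}

async function uploadFloorPlan(file: File): Promise<{ taskId: string; imageUrl: string }> {
  const form = new FormData();
  form.append("file", file);                       // package the image as FormData
  const res = await fetch("/api/floorplan/upload", { method: "POST", body: form });
  return res.json();                               // BFF returns the TaskId and OSS image address
}

async function pollResult(taskId: string): Promise<RecognitionResult> {
  // Query the BFF every 5 s until the recognition task is completed.
  for (;;) {
    const res = await fetch(`/api/floorplan/task/${taskId}`);
    const body = await res.json();
    if (body.status === "done") return body.result as RecognitionResult;
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}

async function recognizeAndReconstruct(file: File): Promise<void> {
  const { taskId, imageUrl } = await uploadFloorPlan(file);
  // Load the floor plan background into the 2D canvas while waiting (PixiJS AssetsLoader).
  console.log("background image:", imageUrl);
  const result = await pollResult(taskId);
  // Hand the vectorized walls/doors/windows to the rendering module here.
  console.log("recognized primitives:", result.walls.length, result.doorsWindows.length);
}
```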
If the current floor plan has no recognizable scale, the scale bar appears in the center of the floor plan; the user can drag the scale bar and click the circular buttons on either side to adjust its length. After the user confirms its position and presses the Enter key, the reconstructed 2D floor plan is drawn in the 2D canvas area, as shown in Figure 25, and the 3D reconstruction of the floor plan is displayed in the upper right corner. Clicking the Switch View button in the upper left corner swaps the display positions of the 2D and 3D views, as shown in Figure 26. After recognition is completed, a panel appears on the right for adjusting the transparency of the floor plan background and the display scale. As shown in Figure 26, once the user has completed recognition and drawing, the transparency of the background image can be lowered to reduce the interference of the background floor plan, or the background scale can be hidden, so that the user can focus on the design of the floor plan.