Review Reports - LA-GATs: A Multi-Feature Constrained and Spatially Adaptive Graph Attention Network for Building Clustering

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

- At the end of the Abstract, it would be good to name the specific methods that the proposed one outperforms;
- Some of the text in Figure is too small to read;
- Some more details should be given about the boundary point interpolation in Section 3.2.1 - either by mathematical expressions or by algorithmic steps (textual description);
- There are some minor typos in the text, e.g. on line 225 (page 6) the sentence ends with a '.', but the text after it continues the same thought;
- Eq. (3) is the same as Eq. (2) - perhaps it needs to be corrected;
- A reference for LeakyReLU could be added for better clarity to the reader;
- Some examples of possible nonlinear activation functions (sigma) from Eq. (8) should be given in the text in pt. 2 of page 9;
- The first sentence after Eq. (12) should be after it; Gamma from Eq. (12) should be described after the equation;
- In the first sentence in pt. 1, page 12, there are empty brackets [] - they either need a reference number or should be removed;
- The class iii, mentioned on page 13, possibly should be i;
- Some motivation for choosing the Adam optimizer with a learning rate of 0.001 could be given (pt. 4.2, page 14);
- In Fig. 12 the last case probably should be (c), (f) and (i) instead of (a), (d), and (g); The same is observable for Fig. 13.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper looks all good to me. Well written!

Author Response

Thank you very much for your affirmation and support of my thesis! Your evaluation has made me very happy. I will continue to work hard to improve my research and ensure greater precision and refinement in my subsequent work.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents a well-structured and technically rigorous approach to building clustering, and the proposed LA-GATs model shows promising results in terms of accuracy and spatial continuity. To further strengthen the contribution, the following points are suggested:

Expand the literature context by including more references to existing clustering applications in cartography and urban studies, ensuring a clearer positioning of the proposed method within the field.

Clarify the scope of applications, particularly with regard to 3D city modeling. At present, the method appears more directly applicable to intelligent cartography, while its role in 3D modeling is less substantiated.

Discuss practical integration with expert-driven planning processes. Since urban clustering is often performed by planners considering legal, social, and cultural dimensions, the paper would benefit from outlining how LA-GATs could complement, rather than replace, such expertise.

Consider limitations and future directions, for example how the model could be adapted to include non-spatial attributes relevant to urban planning practice.

A PDF with detailed comments and annotations is attached for your reference.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This paper proposes an improved Graph Attention Network (LA-GATs) for building clustering. The method integrates a distance-aware attention mechanism and a second-order neighborhood aggregation strategy to address the limitations of traditional clustering methods in preserving spatial continuity and semantic coherence. Overall, while the topic is relevant and the proposed model has potential, the paper requires substantial revisions to strengthen the theoretical rationale, improve the literature review, clarify methodological choices, broaden and deepen the experimental validation, and refine the conclusion.

The analysis of existing methods is under-referenced. Claims such as “most methods rely on single metrics or global optimization,” “existing methods fail to balance multi-dimensional similarity with spatial continuity,” and “Gestalt principles are subjective and lack unified quantification” are not properly supported by citations. For example, the statement that “traditional clustering methods often struggle to balance multi-dimensional similarity metrics with the preservation of local structures” requires references, as does the claim that “clusters may lack continuity and semantic coherence.” Similarly, the rationale for why “global optimization is vulnerable to noise and produces unreasonable segmentations” needs clarification. In addition, the sentence “it is essential to precisely determine the number of buildings that should be retained” (Introduction, paragraph 4) is not clearly connected to the methods being discussed.
The term “building clustering” needs to be clearly defined in the context of this study. At times the manuscript uses “building cluster,” but the terminology is inconsistent and should be unified.
The introduction presents Graph Attention Networks (GATs) and their advantages abruptly, without explaining why they are particularly suitable for building clustering. How do GATs compare to CNN-based or traditional graph partitioning methods? Without this connection, it is unclear whether the proposed approach effectively addresses the stated limitations.
The categorization of existing clustering methods into distance-based and feature-based approaches is too simplistic. The review of feature-based methods is very brief, with only two references, and does not capture the breadth of existing work. In the section on hybrid methods, the statement “…neglected spatial distance, resulting in inadequate geographic continuity” needs clearer explanation. Although limitations of each category are summarized, more principle-based discussion is required.
The proposed feature set (orientation, compactness, color, height) is not well-justified. Why are these features more appropriate for capturing semantic coherence of building groups than other morphological or texture-based features? The rationale behind feature selection requires theoretical or empirical support.
The Methods section reads more like a technical report. It explains how the model is constructed but does not adequately explain why these techniques address the stated research goals. The causal chain between “method” and “objective” is unclear. Reproducibility is also a concern: for example, the construction and definition of the similarity matrix, and the choice of the number of clusters in spectral clustering, are not sufficiently described.
The use of Compactness, Silhouette Coefficient, and Adjusted Rand Index (ARI) is reasonable, but the paper does not explain why these metrics are suitable for building clustering tasks specifically. Their applicability to spatial continuity and semantic consistency should be discussed.
The sample diversity is questionable, as experiments are limited to only a few regions in two cities. Reported clustering accuracy (65–75%) needs deeper interpretation. Ablation studies show the role of the distance bias and second-order aggregation, but are the improvements statistically significant? Comparisons are limited to DBSCAN and K-means, whereas stronger baselines such as other graph-based or deep clustering methods should be included. Current evaluation metrics may not fully represent spatial continuity or geographic semantic consistency.
The conclusion states that “this strategy increases clustering accuracy in residential areas by 21%,” but it is unclear relative to which baseline method and under which metric. Such claims need precise referencing and quantitative justification.
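One way to approach the statistical-significance question raised in the comment on the ablation studies is a paired permutation (sign-flip) test on per-region scores. The sketch below is only an illustration under stated assumptions: the function name `paired_permutation_test` and the score lists are hypothetical, the numbers are not taken from the manuscript, and the authors may prefer a different test.

```python
import random
from statistics import mean


def paired_permutation_test(baseline, variant, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired scores.

    Under the null hypothesis the two variants are exchangeable within each
    region, so the sign of every paired difference can be flipped at random.
    """
    if len(baseline) != len(variant):
        raise ValueError("paired samples must have equal length")
    diffs = [v - b for b, v in zip(baseline, variant)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-region ARI scores (illustrative only, not from the paper).
ari_without_distance_bias = [0.61, 0.58, 0.66, 0.63, 0.60, 0.64]
ari_with_distance_bias = [0.70, 0.69, 0.72, 0.71, 0.68, 0.73]

p_value = paired_permutation_test(ari_without_distance_bias, ari_with_distance_bias)
print(f"estimated p-value: {p_value:.4f}")
```

With only a handful of regions the smallest attainable p-value is bounded by the number of sign patterns (2 out of 64 for six pairs), which is itself an argument for evaluating on more regions.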

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have made revisions in response to the review comments, but some issues still remain inadequately addressed.

In the Introduction, many terms and phrases (e.g., “more complex similarities,” “spatial structures,” “actual distribution patterns,” “buildings are influenced by factors such as proximity, similarity, and continuity”) are too general for an academic introduction. A reader cannot tell which spatial structures or which similarities are meant.
In the Introduction, the second paragraph outlines the shortcomings of traditional clustering, but the third paragraph does not clearly show how the proposed GAT-based method solves them.
I do not understand the purpose of the sentence: ‘However, how to fully leverage GATs for building clustering remains a topic that warrants further research.’ Its relationship to the aims and methods of this paper is unclear. The authors should clarify: (1) What role this sentence plays in the structure of the paper—is it intended to motivate the work, highlight a gap, or lead into their innovation? (2) What exactly is meant by ‘fully leverage GATs’—which dimensions (e.g. feature fusion, neighborhood modeling, attention mechanism design, hierarchical aggregation, etc.)? (3) In what specific ways does the proposed method respond to or address this notion of ‘fully leveraging GATs’?
The classification of previous studies into three types, i.e., spatial distance–based clustering, semantic similarity–based clustering, and graph theory–based clustering, is acceptable, but the boundary between Type 1 and Type 3 is not sharply defined. Many graph-based approaches also rely on distance or attribute similarity, so the authors should clarify the criteria by which they assign a method to each category (e.g. by dominant mechanism, algorithmic structure, or modeling paradigm).
The authors should more clearly delineate what additional capabilities the graph-based approaches (Type 3) bring beyond the spatial distance approaches (Type 1): for example, handling non-Euclidean relationships, capturing high-order dependencies, etc.
In the final paragraph of this section, the authors present their LA-GATs method and list three advantages, but it is unclear how each advantage directly responds to the limitations identified in prior work. I suggest rewriting this paragraph to explicitly map: “Prior method X suffers from limitation A → our method addresses it via design choice 1,” and so on.
In Section 3 (Materials and Methods), the authors state that four features (compactness, orientation, color, and height) are chosen for training, “considering computational complexity.” However, the rationale for selecting exactly these four features is not sufficiently justified. Regarding feature set design: why these four and not others (e.g. aspect ratio, concavity, roof texture, functional semantic labels)? I recommend that the authors include a dedicated subsection, e.g. ‘Feature Design for Buildings’, that clearly explains how the candidate set of morphological and appearance features was selected. Such a subsection would greatly enhance the clarity, reproducibility, and methodological rigor of the paper.
Please use a 1-to-1 response format (i.e. respond to each comment individually) rather than summarizing responses collectively.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf