#### 4.3.1. Construction of Clinical Scenarios

To acknowledge the multifactorial nature of complexity in medicine, the items that reached consensus were then organized into clinical scenarios, constructed by randomly allocating items within their respective domains of interest: patient's history, tumour number, main tumour size, location and structure, and access to the bladder cavity. One hundred and fifty scenarios were constructed (Table S2) and validated for clinical consistency (e.g., rejecting the association of a 30 mL prostate with female genital prolapse) by a senior author (B.M.). In keeping with the epidemiology of bladder cancer, twice as many scenarios were developed for male as for female patients [54].
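As an illustration only, the random allocation described above could be sketched as follows. The domain keys, the placeholder items, the seed and the function name `build_scenarios` are all assumptions introduced for this example; the actual consensus items are those of Table S2, and the validation for clinical consistency was performed manually by a senior author, which the sketch does not attempt to reproduce.

```python
import random

# Hypothetical placeholder items: the real consensus items are not listed here.
EXAMPLE_DOMAINS = {
    "patient_history": ["history item A", "history item B"],
    "tumour_number": ["number item A", "number item B"],
    "main_tumour_size": ["size item A", "size item B"],
    "main_tumour_location": ["location item A", "location item B"],
    "main_tumour_structure": ["structure item A", "structure item B"],
    "bladder_access": ["access item A", "access item B"],
}


def build_scenarios(domains, n_total=150, male_to_female=2, seed=0):
    """Draw one item per domain at random for each scenario.

    With the defaults, 100 male and 50 female scenarios are produced,
    i.e., twice as many male as female scenarios.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    n_male = n_total * male_to_female // (male_to_female + 1)
    sexes = ["male"] * n_male + ["female"] * (n_total - n_male)
    scenarios = []
    for sex in sexes:
        scenario = {"sex": sex}
        for domain, items in domains.items():
            scenario[domain] = rng.choice(items)
        scenarios.append(scenario)
    return scenarios
```

Each scenario produced this way would still need to be reviewed for clinical consistency before being submitted to the panel.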

The panellists were asked to use an adapted five-point Likert scale (Table 5) to answer the question: "In the following scenario, will TURBT result in incomplete resection, prolonged surgery (>1 h), or significant intra- or postoperative complications (Clavien-Dindo Grade III and higher)?"

Consensus was reached when the 95% confidence interval of the answers strictly showed "unlikely" as the upper bound (the scenario was concluded to be unlikely to be complex) or "possibly" as the lower bound (the scenario was concluded to be possibly complex). Otherwise, the answers were considered inconclusive, and the scenario was excluded from further analyses.
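A minimal sketch of this consensus rule is given below. It rests on several assumptions that are not stated in the text: the five-point scale is coded 1 ("very unlikely") to 5 ("very likely"), the 95% confidence interval is a normal-approximation interval around the mean answer, and the bounds are compared inclusively with the codes for "unlikely" (2) and "possibly" (3). The function name `classify_scenario` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed numeric coding of the adapted Likert scale (see Table 5 for the wording).
UNLIKELY, POSSIBLY = 2, 3


def classify_scenario(answers, z=1.96):
    """Classify one scenario from the panellists' coded answers (1 to 5).

    Requires at least two answers, since the sample standard deviation is used.
    """
    centre = mean(answers)
    half_width = z * stdev(answers) / sqrt(len(answers))
    lower, upper = centre - half_width, centre + half_width
    if upper <= UNLIKELY:
        return "unlikely complex"
    if lower >= POSSIBLY:
        return "possibly complex"
    return "inconclusive"
```

Under these assumptions, a scenario whose interval straddles the gap between "unlikely" and "possibly" is returned as inconclusive and would be dropped from further analyses.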

#### 4.3.2. Discrimination of Individual Items in the Prediction of Complexity

On univariate analysis, the two-tailed Mann–Whitney *U*-test was used across the 150 scenarios to test the relationship between each domain of interest and complexity, dichotomized as "very unlikely or unlikely" versus "possibly, likely or very likely".

Logistic regression was then conducted, with the domains showing *p* < 0.1 on univariate analysis as predictors and complexity as the dependent variable. The probability of a complex surgery was estimated from the probability function. In keeping with the logistic regression model [55], this function accounts for the contributions of all independent domains (Table 2) through their respective regression coefficients, adjusted to the specifics of the case by the median opinions of the panel (e.g., the respective contributions to complexity of a single tumour and of 4 to 10 tumours were 0.96 and 0.96 × 3, as shown in Figure 2).

The probability function has the following structure:

$$\text{probability} = \frac{1}{1 + \exp(-x)}\tag{2}$$

where *x* is the sum of the intercept of the logistic regression and of the scores of the independent domains multiplied by their regression coefficients. For any domain, the product of its regression coefficient and the score of its descriptor therefore correlates with the probability of a complex surgery. This was used to simplify the function into a checklist (Table 3), in which the input of each item was similarly quantified as the product of the regression coefficient of its domain and the score summarizing the median opinion of the panel (e.g., location on the anterior wall of the bladder: median opinion, 3 (Figure 2); regression coefficient of tumour location, 1.44 (Table 2); product, 3 × 1.44, approximated to 4.5 for ease of use).
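The arithmetic of Equation (2) and of the checklist points can be illustrated with a short sketch. The function names are hypothetical, and rounding to the nearest 0.5 is an assumption chosen because it reproduces the anterior-wall example (3 × 1.44 = 4.32, reported as 4.5); the text itself only states that products were approximated for ease of use. The intercept and the full set of coefficients are those of Table 2 and are not reproduced here.

```python
from math import exp


def complexity_probability(intercept, coefficients, scores):
    """Equation (2): probability = 1 / (1 + exp(-x)).

    x is the intercept plus, for each domain, its regression coefficient
    multiplied by the score of the descriptor observed in the case.
    """
    x = intercept + sum(coefficients[d] * scores[d] for d in coefficients)
    return 1.0 / (1.0 + exp(-x))


def checklist_points(coefficient, score, step=0.5):
    """Product of coefficient and score, rounded to the nearest `step`.

    The rounding rule is an assumption made for this illustration.
    """
    return round(coefficient * score / step) * step


# Anterior-wall example from the text: median opinion 3, coefficient 1.44.
anterior_wall_points = checklist_points(1.44, 3)  # 4.5
```

Because the logistic function is monotonically increasing in *x*, a larger product for a given domain always raises the estimated probability, which is what allows the products to be read as checklist points.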

In any clinical situation, the Bladder Complexity Checklist Sum was calculated from the most significant item in the patient's history and in access to the bladder, together with the tumour number and the main tumour location and size.
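The following sketch shows one way the sum could be assembled. It assumes that "most significant" means the item carrying the most checklist points within its domain, and that a domain with no applicable item contributes zero; neither assumption is stated in the text. The function name and argument names are hypothetical, and the point values come from Table 3.

```python
def checklist_sum(history_points, access_points, number_points,
                  location_points, size_points):
    """Bladder Complexity Checklist Sum for one clinical situation.

    history_points and access_points are lists of the points of every
    applicable item; only the highest one is retained in each domain
    (assumed reading of "most significant item").
    """
    return (
        max(history_points, default=0)
        + max(access_points, default=0)
        + number_points
        + location_points
        + size_points
    )
```

Retaining a single item per domain mirrors the regression model, in which each domain enters *x* once, through one coefficient and one score.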

The ROC curves of the model and of the Bladder Complexity Checklist Sum were compared using the C-statistic. Finally, calibration curves illustrated their accuracy in estimating the probability of complexity in individual scenarios [56].
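For reference, the C-statistic equals the area under the ROC curve and can be computed as the proportion of concordant pairs. The sketch below is a plain pairwise implementation with a hypothetical function name; it is not the Stata routine used in the study, and it does not include the statistical test used to compare the two curves.

```python
def c_statistic(scores, outcomes):
    """Proportion of (complex, non-complex) scenario pairs in which the
    complex scenario receives the higher score; ties count for one half.

    scores: model probabilities or checklist sums, one per scenario.
    outcomes: 1 for a complex scenario, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return concordant / (len(positives) * len(negatives))
```

Applying the same function to the model probabilities and to the checklist sums of the same scenarios gives the two values being compared; a value of 0.5 indicates no discrimination and 1.0 perfect discrimination.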

Stata/MP (StataCorp, College Station, TX, USA) was used for the statistical analyses; significance was set at *p* < 0.05.
