**2. Genomics**

## *2.1. Target Identification*

The *M.tb* H37Rv genome was sequenced in 1998, revealing ~4000 genes, each a potential drug target, of which ~50% were assigned a tentative function [8,9]. *M.tb* H37Rv was selected for sequencing because it is the type strain for *M.tb*. *M.tb* H37 was isolated from a patient in 1905 and serially passaged in the laboratory; one of the resulting strains was named H37Rv, with "R" standing for rough colony morphology and "v" for virulent [8]. It has become a commonly used laboratory strain of pathogenic *M.tb*, a genomic reference for clinical isolates, and a starting point for drug discovery. Since then, whole genome sequencing of *M.tb* clinical isolates recovered from patients with TB has defined a core genome and mapped the accumulation of single nucleotide polymorphisms (SNPs) in protein coding genes [10]. Comparative genomics has also allowed drug specificity (across microbial species) and toxicity (to mammals) to be anticipated from the presence or absence of the target protein coding sequence in each organism. The application of genomics to *M.tb* has therefore provided a framework of potential drug targets. Manipulating gene function through gene inactivation strategies has helped identify pathways that are essential to *M.tb* in different microenvironments, highlighting targets for drug discovery programs [6].
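The presence/absence reasoning above can be expressed as a simple filter. The sketch below is illustrative only: the `Gene` record, the `prioritize` function and all example genes and species are hypothetical, and real pipelines infer homology from sequence searches rather than from a precomputed flag. It keeps genes that are essential in the pathogen and lack a detectable human homolog (lower predicted toxicity), then lists first the targets shared with the fewest other listed species (narrower predicted spectrum).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical per-gene annotation used only for this illustration."""
    name: str
    essential: bool          # essential for the pathogen in the condition studied
    human_homolog: bool      # detectable homolog in the host proteome
    shared_with: frozenset   # other microbial species carrying an ortholog


def prioritize(genes, other_species):
    """Return (name, species sharing the target) for candidate targets.

    Drops non-essential genes and genes with a human homolog, then sorts
    so that targets shared with the fewest listed species come first.
    """
    candidates = []
    for gene in genes:
        if not gene.essential or gene.human_homolog:
            continue  # not needed for growth, or a likely host-toxicity risk
        shared = sorted(gene.shared_with & set(other_species))
        candidates.append((gene.name, shared))
    candidates.sort(key=lambda item: (len(item[1]), item[0]))
    return candidates


if __name__ == "__main__":
    genes = [
        Gene("geneA", True, False, frozenset({"E. coli"})),
        Gene("geneB", True, True, frozenset()),    # host homolog: dropped
        Gene("geneC", False, False, frozenset()),  # non-essential: dropped
        Gene("geneD", True, False, frozenset()),
    ]
    print(prioritize(genes, ["E. coli", "S. aureus"]))
    # → [('geneD', []), ('geneA', ['E. coli'])]
```

Sorting by the number of shared species is one possible design choice; a program seeking broad-spectrum agents would reverse it.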

Gene deletion methods designed to generate unmarked single gene knockout mutants have been employed alongside global approaches that use transposons (Tn) to inactivate gene function [11]. Tn libraries are generated by random insertion of a transposon throughout the genome, disrupting genes; a selective pressure is then applied to measure gene essentiality. DNA microarrays were initially used to map the changing abundance of thousands of Tn mutants, in a method termed transposon site hybridization (TraSH) [12]; more recently, Tn libraries have been coupled to massively parallel sequencing (Tn-seq) for greater genomic resolution [13]. Tn mutant screening has been used to map the genetic pathways required for growth of *M.tb* and *Mycobacterium bovis* Bacillus Calmette-Guérin (BCG) in vitro [14]. In this work, a *Himar1*-based Tn delivered by a transducing bacteriophage generated a library of ~100,000 independent clones. The 614 genes found to be essential for in vitro growth were evenly distributed throughout the mycobacterial chromosome, and many appeared to be co-transcribed in operons. Essential genes were identified in amino acid, cofactor and nucleic acid biosynthesis pathways, and genes of unknown function were also classified as essential. Conversely, many genes predicted to be involved in cell wall and protein metabolism, and assumed to be essential, proved dispensable. This is likely due to functional redundancy: for example, *purT* and *purN* encode alternative enzymes for the same step of de novo purine biosynthesis, so neither gene alone was essential [15]. Such data allow non-essential targets to be dropped from drug discovery portfolios. A fundamental pathogenic trait of *M.tb* is the ability to survive and replicate in phagocytes, avoiding phagosome-lysosome fusion and adapting to an intracellular lifestyle [16]. Therefore, to define pathways that are essential for intra-macrophage survival, Barczak et al.
mapped genes required for intracellular growth using high-content imaging alongside multiplexed cytokine analysis of macrophages infected with *M.tb* Tn mutant libraries [17]. Systematic, multiparametric analysis of Tn mutants impaired for intracellular growth linked individual mutants to distinct macrophage cytokine profiles. The authors showed that production and export of the complex lipid phthiocerol dimycocerosate (PDIM) was required for the secretion of ESX-1 substrates and for permeabilization of the phagosome. The study thus revealed key virulence determinants and defined pathways essential to the metabolism of intracellular bacilli, identifying potential targets for drug discovery. Importantly, Tn mutant libraries only identify genes that are essential in the model under investigation; successful drug targets likely need to be essential across several different conditions that mimic the variety of microenvironments encountered by *M.tb* during natural infection.
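Because *Himar1* inserts at TA dinucleotides, essentiality can be reasoned about gene by gene from its TA sites. The sketch below assumes that insertions at non-essential TA sites occur independently with a single library-wide probability `site_density`; under that assumption, the chance that a non-essential gene with n TA sites receives no insertions is (1 − p)^n. A gene lacking insertions despite many TA sites is therefore likely essential, whereas a short gene with few TA sites cannot be called. The function names and the `alpha` threshold are hypothetical; published TraSH and Tn-seq analyses use more elaborate statistics (for example, correcting for insertion bias).

```python
def ta_sites(seq):
    """0-based positions of every TA dinucleotide (Himar1 insertion site)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "TA"]


def p_no_insertions(n_sites, site_density):
    """Probability that a non-essential gene with n_sites TA sites shows zero
    insertions, if each site is hit independently with probability site_density."""
    return (1.0 - site_density) ** n_sites


def essentiality_call(seq, observed_insertions, site_density, alpha=0.01):
    """Classify one gene as 'non-essential', 'essential' or 'uncertain'.

    observed_insertions: set of TA positions (gene coordinates) with >= 1 read.
    """
    sites = ta_sites(seq)
    if any(pos in observed_insertions for pos in sites):
        return "non-essential", None  # insertions were tolerated
    p = p_no_insertions(len(sites), site_density)
    # With too few TA sites, an empty gene could be chance alone: no call.
    return ("essential" if p < alpha else "uncertain"), p


if __name__ == "__main__":
    long_gene = "TA" * 20          # toy sequence with 20 TA sites
    short_gene = "ATATGCTAGGTA"    # toy sequence with 3 TA sites
    print(essentiality_call(long_gene, set(), site_density=0.5))
    # → ('essential', 9.5367431640625e-07)
    print(essentiality_call(long_gene, {4}, site_density=0.5))
    # → ('non-essential', None)
    print(essentiality_call(short_gene, set(), site_density=0.5))
    # → ('uncertain', 0.125)
```

The "uncertain" outcome illustrates why library saturation matters: the fewer TA sites a gene has, the more insertions the library needs before an empty gene becomes informative.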

Gene deletion or inactivation results in complete absence of the gene product, which may not reflect drug action, where protein function is often only partially inhibited [18]. Conditional expression strategies that reduce rather than abrogate protein function may better represent drug action and, crucially, allow essential gene targets to be investigated in the laboratory. Inducible promoter systems (for example, Tet ON/OFF or Pip ON/OFF) allow the expression of essential genes to be increased or reduced in order to understand gene function, model drug target inhibition and genetically validate drug targets. The utility of such a system was highlighted by Johnson et al., who screened *M.tb* mutants depleted for 474 essential genes (termed hypomorphs) against a large library of potential inhibitors, measuring >8.5 million chemical-genetic interactions [19]. An episomally encoded *sspB* gene was introduced to control degradation of target proteins fused to a carboxyl-terminal DAS-tag. In addition, each mutant carried a 20 nucleotide "barcode" so that, when mutants were pooled, their abundance could be enumerated by sequencing barcode-derived PCR products. The expression of SspB was controlled by a TetON promoter induced by anhydrotetracycline; by using TetON promoters of varying strengths, the level of the DAS-tagged gene product targeted for degradation by SspB could be titrated. Primary screening with these hypomorphs identified over 10-fold more hits than conventional whole-cell screening against wild type *M.tb*. As expected, well known antimicrobial drug classes interacted with specific hypomorphs, such as the fluoroquinolones with their target GyrA and rifampicin with the β subunit of RNA polymerase.
This screening approach identified 39 inhibitors of cellular components that are already clinically validated antimicrobial drug targets: DNA gyrase, mycolic acid synthesis and folate biosynthesis, which are targeted by the fluoroquinolones, isoniazid and para-aminosalicylic acid, respectively [20]. The 39 compounds were either novel chemical entities or known compounds with repurposed activity, such as the plant alkaloid tryptanthrin. In addition to finding novel inhibitors of well-validated targets, hypomorph screening also identified inhibitors of novel cellular targets. Johnson et al. demonstrated that strains hypomorphic for the putative efflux pump EfpA were inhibited by the compound BRD-8000 (MIC 6 μM), whereas this compound showed no activity against wild type *M.tb* (MIC ≥ 50 μM). Subsequent chemical optimization improved the activity of BRD-8000 against wild type *M.tb* to an MIC of 800 nM, an improvement of at least 62.5-fold (≥50 μM ÷ 0.8 μM). Spontaneous *M.tb* mutants resistant to this modified BRD-8000 compound carried a single point mutation in *efpA*, demonstrating that chemical modification of BRD-8000 had not altered target specificity.
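The barcode enumeration described above lends itself to a simple read-count comparison. The sketch below assumes that sensitization appears as depletion of a hypomorph's barcode in compound-treated versus vehicle-treated pools; it converts counts to relative abundance (with a pseudocount so dropouts stay finite) and reports log2 fold changes. The barcode labels, counts, function names and the 2-fold cutoff are hypothetical, and the published screen's analysis (replicates, dose series, normalization) was more involved.

```python
import math


def relative_abundance(counts, barcodes, pseudocount=1.0):
    """Fraction of reads per barcode, with a pseudocount so dropouts stay finite."""
    total = sum(counts.get(b, 0) for b in barcodes) + pseudocount * len(barcodes)
    return {b: (counts.get(b, 0) + pseudocount) / total for b in barcodes}


def log2_fold_changes(treated, control, pseudocount=1.0):
    """log2(treated / control) relative abundance per barcode; negative = depleted."""
    barcodes = sorted(set(treated) | set(control))
    t = relative_abundance(treated, barcodes, pseudocount)
    c = relative_abundance(control, barcodes, pseudocount)
    return {b: math.log2(t[b] / c[b]) for b in barcodes}


def sensitized_hypomorphs(treated, control, threshold=-1.0):
    """Barcodes depleted at least 2**(-threshold)-fold (2-fold by default)."""
    lfc = log2_fold_changes(treated, control)
    return sorted(b for b, value in lfc.items() if value <= threshold)


if __name__ == "__main__":
    control = {"hypoA": 1000, "hypoB": 1000, "hypoC": 1000}  # vehicle pool
    treated = {"hypoA": 1000, "hypoB": 100, "hypoC": 1000}   # compound pool
    print(sensitized_hypomorphs(treated, control))  # → ['hypoB']
```

Because abundances are relative, strong depletion of one strain slightly inflates the others; robust pipelines typically normalize against a stable reference rather than the raw total.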

Clustered Regularly Interspaced Short Palindromic Repeats interference (CRISPRi) has the potential to revolutionize the field, enabling precise gene silencing to identify and validate drug targets. A catalytically inactive Cas9 (dCas9), carrying two point mutations that abolish its nuclease activity, is directed to a mycobacterial gene of interest by a single guide RNA (sgRNA). Binding of the dCas9-sgRNA complex to the target site blocks transcription, either by preventing RNA polymerase from accessing the promoter or by obstructing transcript elongation [21]. Notably, the level of gene silencing can be controlled by varying the sgRNA length and sequence, allowing fine control of the expression of essential genes whose complete knockout would be lethal. The system has been further optimized for induction by doxycycline, a lipophilic drug with excellent tissue penetration, so that CRISPRi can be employed across a range of drug screening models including in vitro, intracellular and animal studies [22,23]. This approach has been used to create libraries of over 90,000 sgRNAs, generating pools of *M.tb* strains in which most genes are targeted by CRISPRi and enabling high-throughput screening. The utility of CRISPRi in target-based drug discovery was demonstrated by silencing genes of folate metabolism [24]. Both *M.tb* and mammals require folate; however, *M.tb* must synthesize folate de novo, whereas mammals obtain it from their diet. This difference makes folate biosynthesis an attractive target for antimicrobial drug discovery [25]. Although this pathway has been targeted in other bacteria by trimethoprim, which inhibits dihydrofolate reductase (FolA), and sulfamethoxazole, which inhibits dihydropteroate synthase (FolP1), the action of these drugs in mycobacteria is less clear.
Generation of folate biosynthesis knockout mutants in mycobacteria has proved challenging, making target validation difficult for this pathway. *Mycobacterium smegmatis* is a non-pathogenic mycobacterium frequently used as a model organism for *M.tb* owing to its low biohazard risk and tractable genetics [26]. Rock et al. used a panel of sgRNAs to generate hypomorphs of *folP1*, *folA* and *folC* (dihydrofolate synthase) in this model organism and showed that each gene was individually essential. When weaker sgRNAs were used to slow growth rather than block it completely, combined silencing of these genes produced synergistic growth inhibition, demonstrating the value of hitting multiple targets in this pathway to maximize antimicrobial activity. Translating this approach to *M.tb* should yield useful insights into target and pathway essentiality. One caveat of CRISPRi is the potential for off-target effects, in which dCas9 binds and silences unintended genes. Bioinformatics tools exist that predict such binding effectively; however, these algorithms may not capture every off-target event [27,28].
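Guide targeting and the off-target caveat can both be illustrated by scanning a sequence for sites that match a guide. The sketch below is a simplification under stated assumptions: it uses a 20-nt spacer and a 3' NGG protospacer-adjacent motif (PAM) as defaults, which is the common *Streptococcus pyogenes* Cas9 convention; mycobacterial CRISPRi has commonly used a different dCas9 ortholog (from *Streptococcus thermophilus*) with a longer PAM, which could be passed as `pam` only if its sole degenerate base is N. All function names are hypothetical, and plain mismatch counting is far cruder than dedicated off-target predictors, which weight mismatch position and tolerate non-canonical PAMs.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_protospacers(seq, pam="NGG", length=20):
    """(start, protospacer, pam) for each site on this strand where a
    `length`-nt protospacer lies immediately 5' of a PAM matching `pam`.
    Only 'N' is supported as a degenerate IUPAC code."""
    pam_re = pam.upper().replace("N", "[ACGT]")
    # A lookahead lets overlapping sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_re}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(guide, genome, pam="NGG", max_mismatches=3):
    """(strand, start, mismatches) for PAM-adjacent sites on either strand
    within max_mismatches of the guide; start is 0-based on that strand's
    5'->3' sequence. More than one hit flags possible off-target binding."""
    guide = guide.upper()
    hits = []
    for strand, s in (("+", genome.upper()), ("-", revcomp(genome))):
        for start, proto, _ in find_protospacers(s, pam, len(guide)):
            mm = mismatches(guide, proto)
            if mm <= max_mismatches:
                hits.append((strand, start, mm))
    return hits


if __name__ == "__main__":
    guide = "GATTACAGATTACAGATTAC"                        # toy 20-nt spacer
    near_match = "CCTTACAGATTACAGATTAC"                   # 2 mismatches
    genome = guide + "AGG" + "TTTTT" + near_match + "CGG"  # toy sequence
    print(candidate_sites(guide, genome))  # → [('+', 0, 0), ('+', 28, 2)]
```

Here the perfect on-target site is accompanied by a 2-mismatch site that a stricter cutoff (`max_mismatches=1`) would hide, which is the kind of event the caveat above warns that prediction tools may miss.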
