#### *2.1. The Standard Approach: Terminology*

The channels in proteins were calculated with the CCCPP software (binaries and documentation available at http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html). The first part of the method implemented in CCCPP is described in [57]; for clarity, we summarize it here. The smallest convex domain enclosing the heavy atoms of the protein is a polyhedron, partitioned into non-overlapping tetrahedral cells with atoms at their vertices (Delaunay triangulation). Two adjacent cells are separated by a triangle with atoms at its vertices, which acts as a door between two tetrahedral rooms: it either lets the ligand pass from one cell to its neighbor or blocks it. Once every triangular door has been flagged as open or closed, the protein shape and its concavities are easy to exhibit: the protein shape is modeled by the set of tetrahedral cells interconnected by triangles that the ligand cannot pass, whereas the other cells are part of the concavities. It can thus be determined whether the ligand is sterically allowed to travel from the exterior of the protein to the active site.
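The door-flagging step can be illustrated with a minimal Python sketch. It is not the CCCPP implementation: the tetrahedra are assumed to come from a Delaunay triangulation computed elsewhere, and the opening criterion used here (twice the circumradius of the triangle minus twice a uniform atomic radius `atom_radius`) is a hypothetical proxy, not the criterion defined in [57]. The function names `circumradius` and `flag_doors` are illustrative.

```python
import math
from itertools import combinations


def circumradius(p, q, r):
    """Circumradius of triangle pqr, R = abc / (4 * area)."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    s = (a + b + c) / 2  # semi-perimeter for Heron's formula
    area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
    if area == 0.0:
        return math.inf  # degenerate (collinear) triangle
    return a * b * c / (4 * area)


def flag_doors(coords, tetrahedra, cv, atom_radius=1.7):
    """Flag every triangle shared by two tetrahedra as open (True) or closed (False).

    coords: list of (x, y, z) heavy-atom positions.
    tetrahedra: list of 4-tuples of atom indices (assumed Delaunay cells).
    cv: critical value (thickness) of the ligand, same unit as coords.
    """
    owners = {}
    for t_idx, tet in enumerate(tetrahedra):
        for face in combinations(sorted(tet), 3):
            owners.setdefault(face, []).append(t_idx)
    doors = {}
    for face, cells in owners.items():
        if len(cells) != 2:
            continue  # hull faces border the exterior, not another cell
        p, q, r = (coords[i] for i in face)
        # HYPOTHETICAL opening criterion: free width between the three atoms.
        width = 2 * (circumradius(p, q, r) - atom_radius)
        doors[face] = width >= cv
    return doors
```

For two cells sharing the face (0, 1, 2), `flag_doors` returns a single entry keyed by that sorted index triple; changing `cv` flips it between open and closed without touching the triangulation.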

We emphasize that the concavities (or channels) available to the ligand depend on which ligand is considered, and in no way constitute a universal network of concavities (or channels). This should not be surprising: for example, the space available in the protein to a small molecule such as water cannot be identical to the space available to a large ligand such as cyclosporin or erythromycin.

We also emphasize that the usual terminology for voids inside proteins (channels, concavities, pores, pockets, etc.) has not yet reached a consensus. Here, we call channels the concavities linking the exterior of the protein to its buried active site. For a protein whose active site lies at its surface, we would call the concavity a pocket, although surface concavities without any active site are also often called pockets. A concavity that crosses the protein and opens to the exterior at two places can be called a pore, without reference to any active site. We stress that these intuitive definitions are introduced for clarity and are not intended to be mathematically rigorous.

However, our data structure is rigorously defined and can be handled with graph-theoretical tools. The facial graph is defined as follows: each tetrahedral cell is a node of the graph, and each triangle shared by two adjacent tetrahedra (i.e., two nodes) is an edge linking these two nodes if and only if the ligand can pass through this triangle. In general, the facial graph is not connected: it has several components. Any component linking the exterior of the protein to the active site is called a channel. Each ligand has a smallest size (thickness), denoted CV (critical value) [57]. The largest CV for which at least one access channel to the active site exists is called the limiting CV and is denoted CV*lim*. Above this value, the ligand is declared unable to reach the active site because of steric constraints. The reader is referred to the original paper [57] for further technical details.
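The facial-graph definitions and the limiting CV can be illustrated with a small Python sketch. It assumes, hypothetically, that each pair of adjacent cells is mapped to the maximal critical value CV*max* of its shared triangle (the quantity used in Section 2.2), that a door is open when CV ≤ CV*max*, and that a given set of cells touches the exterior. Since the set of open edges only changes at the CV*max* values, CV*lim* is the largest such value for which the active-site cell still lies in a component reaching the exterior. All names are illustrative.

```python
import math
from collections import deque


def component(start, open_edges):
    """Cells reachable from `start` through open doors (breadth-first search)."""
    adj = {}
    for u, v in open_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def has_channel(cv, face_cvmax, exterior, site):
    """True if the component containing `site` also contains an exterior cell."""
    open_edges = [edge for edge, cvmax in face_cvmax.items() if cv <= cvmax]
    return bool(component(site, open_edges) & exterior)


def limiting_cv(face_cvmax, exterior, site):
    """Largest CV for which a channel to `site` exists (None if there is none)."""
    if site in exterior:
        return math.inf  # site on the surface: no steric limit
    for cv in sorted(set(face_cvmax.values()), reverse=True):
        if has_channel(cv, face_cvmax, exterior, site):
            return cv
    return None
```

Running `has_channel` on the same `face_cvmax` with different `cv` values yields different components, which mirrors the ligand dependence of the channels noted above.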

#### *2.2. The Improved Approach: Minimal Cost Paths*

The new part of CCCPP that we developed for the present study is presented below. The full CCCPP software is publicly available at http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html.

It turned out that large parts of the CYP channels lie at the protein surface, and that the main channel to the active site is a funnel that allows several potential pathways for the ligand. To find preferential trajectories for the ligand, we defined a minimal cost path, denoted MCP, as follows. Each edge of the facial graph is assigned the cost CV/CV*max*, where CV is the critical value of the current ligand and CV*max* is the maximal critical value that would allow a hypothetical ligand to pass through the triangle associated with this edge. This cost lies in the interval (0,1): the smaller the cost, the easier the passage. In the facial graph defined in Section 2.1, we seek the MCP among all possible paths linking the exterior of the protein to the active site, using Dijkstra's algorithm [68]. To detect further potential pathways of interest, all edges of the current MCP are removed, Dijkstra's algorithm is applied again, and so on until no new MCP can be found.
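The MCP search can be sketched in Python as follows. This is a minimal illustration, not CCCPP's code; it assumes, hypothetically, a mapping `face_cvmax` from pairs of adjacent cells to CV*max*, a set `exterior` of cells touching the outside, and a single active-site cell `site`, with a door open when CV ≤ CV*max*. All exterior cells start Dijkstra at distance zero, open edges cost CV/CV*max*, and the edges of each MCP are deleted before the next search. Tie-breaking may differ from the actual software.

```python
import heapq
import math


def dijkstra(adj, sources, target):
    """Minimal-cost path from any source cell to `target`; (path, cost) or None."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == target:
            break
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


def successive_mcps(face_cvmax, cv, exterior, site):
    """Repeat Dijkstra, removing the edges of each MCP, until no path remains."""
    adj = {}
    for (u, v), cvmax in face_cvmax.items():
        if cv <= cvmax:  # assumed: the door is open for this ligand
            adj.setdefault(u, {})[v] = cv / cvmax
            adj.setdefault(v, {})[u] = cv / cvmax
    paths = []
    while (found := dijkstra(adj, exterior, site)) is not None:
        path, cost = found
        if len(path) < 2:
            break  # the site itself is an exterior cell
        paths.append((path, cost))
        for u, v in zip(path, path[1:]):
            del adj[u][v], adj[v][u]
    return paths
```

Removing whole edges (rather than penalizing them) guarantees that each successive MCP differs from the previous ones in every step, so the loop always terminates.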

Each MCP is an ordered sequence of triangles, but also an ordered sequence of tetrahedra. Whether it forms a channel or an MCP inside a channel, a set of tetrahedra has a volume, which is the sum of the volumes of its tetrahedra. It also has a boundary, which is the set of triangular faces through which the ligand cannot pass, and hence a surface area, which is the sum of the areas of these boundary triangles. The MCPs are clustered. Each cluster defines a trajectory, which has its own surrounding atoms, residues and secondary structures [60]. Here, these trajectories correspond to channels in the sense of [64].
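The volume and surface of a set of tetrahedra follow directly from elementary vector formulas: a tetrahedron has volume |(b − a) · ((c − a) × (d − a))| / 6 and a triangle has area |(q − p) × (r − p)| / 2. The Python sketch below assumes coordinates as 3-tuples, cells as 4-tuples of atom indices, and open doors given as sorted index triples; every other face of a cell in the set is counted once as boundary. The function names are hypothetical.

```python
import math
from itertools import combinations


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def tet_volume(a, b, c, d):
    """|(b - a) . ((c - a) x (d - a))| / 6"""
    n = _cross(_sub(c, a), _sub(d, a))
    e = _sub(b, a)
    return abs(e[0] * n[0] + e[1] * n[1] + e[2] * n[2]) / 6


def triangle_area(p, q, r):
    """|(q - p) x (r - p)| / 2"""
    return math.hypot(*_cross(_sub(q, p), _sub(r, p))) / 2


def volume_and_surface(coords, tetrahedra, cells, open_faces):
    """Volume and boundary area of the cells `cells` (indices into `tetrahedra`)."""
    volume = sum(tet_volume(*(coords[i] for i in tetrahedra[c])) for c in cells)
    boundary = set()  # a set, so a shared closed face is counted only once
    for c in cells:
        for face in combinations(sorted(tetrahedra[c]), 3):
            if face not in open_faces:  # the ligand cannot pass this face
                boundary.add(face)
    surface = sum(triangle_area(*(coords[i] for i in f)) for f in boundary)
    return volume, surface
```

For the unit corner tetrahedron with no open face, this gives a volume of 1/6 and a surface of 3/2 + √3/2.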

#### *2.3. The Two Modes of Visualization of the Channels and Pathways*

These two modes of visualization are exemplified in Figure 1.

The first mode of visualization of the channels relies on the facial graph of the channels, or on parts of it. It consists of generating a molecular file in which each tetrahedron is a virtual atom located at the barycenter of its four vertex atoms, and each edge connecting two tetrahedra is a bond between their two associated virtual atoms.
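A file of this kind could be produced as in the minimal Python sketch below. The document does not state which format CCCPP writes; here a PDB-style output is assumed, with one HETATM record per virtual atom and one CONECT record per facial-graph edge. The residue name `TET`, atom name and element `X`, and chain `A` are arbitrary placeholders, and `facial_graph_pdb` is a hypothetical name.

```python
def facial_graph_pdb(coords, tetrahedra, open_edges):
    """PDB-style text: one virtual atom per tetrahedron, one CONECT per edge."""
    lines = []
    for i, tet in enumerate(tetrahedra):
        # barycenter of the four vertex atoms of the tetrahedron
        x, y, z = (sum(coords[a][k] for a in tet) / 4 for k in range(3))
        lines.append(
            f"HETATM{i + 1:5d}  X   TET A   1    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{'X':>2}"
        )
    for u, v in open_edges:  # facial-graph edge between adjacent cells u and v
        lines.append(f"CONECT{u + 1:5d}{v + 1:5d}")
    lines.append("END")
    return "\n".join(lines)
```

Serial numbers are the 1-based cell indices, so the sketch is limited to 99,999 cells by the fixed-width PDB columns.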

The second mode of visualization applies mainly to pathways in channels. MCPs can be visualized by generating a molecular file that contains the edges of the tetrahedral cells as bonds linking protein atoms. Note that these bonds originate from the triangulation of the protein and, as such, are in general not chemical bonds between protein atoms; they are merely a display feature of CCCPP.
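For this second mode, the edges to display can be gathered as in the Python sketch below: the six edges of every tetrahedron along an MCP are collected, duplicates shared by adjacent cells are removed, and each remaining edge is written as a PDB-style CONECT record between protein atoms. The mapping `serial_of` (atom index to serial number in the protein file) and both function names are hypothetical.

```python
from itertools import combinations


def mcp_edges(path_cells, tetrahedra):
    """Unique atom-atom edges of the tetrahedra along an MCP (not chemical bonds)."""
    edges = set()
    for c in path_cells:
        edges.update(combinations(sorted(tetrahedra[c]), 2))
    return sorted(edges)


def conect_records(edges, serial_of):
    """CONECT lines linking protein atoms; serial_of maps atom index -> serial."""
    return [f"CONECT{serial_of[a]:5d}{serial_of[b]:5d}" for a, b in edges]
```

Two tetrahedra sharing a face contribute nine distinct edges rather than twelve, since the three edges of the shared triangle are counted once.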

All figures in this paper were generated with PyMOL™ (The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC, https://pymol.org/). Some of them combine the two modes of visualization with appropriate clipping planes, sometimes together with the heme and the ligand.

**Figure 1.** The two basic modes of visualization of CCCPP (images from [69]). The target atom is the iron of the heme group (in red). (**Left**) Superposition of the networks of channels of two complexes of CYP3A4, PDB codes 1TQN (in green) and 2V0M (in brown), computed at CV*lim* = 6 Å and 7 Å, respectively. The edges are those of the facial graph of the pockets and channels: they show the location of the voids in the CYP (which is why most of them lie at the surface of the CYP). (**Right**) The channels 2a (in brown), 2f (in purple) and S (in blue) computed by CCCPP in the complex 4K9U of CYP3A4 at CV = 5.75 Å, 6.25 Å and 6.75 Å, respectively. The edges are those of the tetrahedra: they show the boundaries of the channels (they are inside the CYP).

### **3. Results and Discussion**
