*Article* **On the Emergence of the Deviation from a Poisson Law in Stochastic Mathematical Models for Radiation-Induced DNA Damage: A System Size Expansion**

**Francesco Giuseppe Cordoni**

Department of Civil, Environmental and Mechanical Engineering, University of Trento, 38123 Trento, Italy; francesco.cordoni@unitn.it

**Abstract:** In this paper, we study the system size expansion of a stochastic model for radiationinduced DNA damage kinetics and repair. In particular, we characterize both the macroscopic deterministic limit and the fluctuation around it. We further show that such fluctuations are Gaussiandistributed. In deriving such results, we provide further insights into the relationship between stochastic and deterministic mathematical models for radiation-induced DNA damage repair. Specifically, we demonstrate how the governing deterministic equations commonly employed in the field arise naturally within the stochastic framework as a macroscopic limit. Additionally, by examining the fluctuations around this macroscopic limit, we uncover deviations from a Poissonian behavior driven by interactions and clustering among DNA damages. Although such behaviors have been empirically observed, our derived results represent the first rigorous derivation that incorporates these deviations from a Poissonian distribution within a mathematical model, eliminating the need for specific ad hoc corrections.

**Keywords:** biophysical modeling; radiation-induced DNA damage; system size expansion; DNA damage repair

**Citation:** Cordoni, F.G. On the Emergence of the Deviation from a Poisson Law in Stochastic Mathematical Models for Radiation-Induced DNA Damage: A System Size Expansion. *Entropy* **2023**, *25*, 1322. https://doi.org/10.3390/ e25091322

Academic Editor: Pavel Kraikivski

Received: 18 July 2023 Revised: 2 September 2023 Accepted: 5 September 2023 Published: 11 September 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

### **1. Introduction**

Radiotherapy is one of the most effective and used cancer treatment modalities [1]. Traditionally, radiotherapy relies on photons; however, in recent decades, there has been a growing interest in advanced radiotherapy using ion beams. Ion beams offer several advantages over photons [2], particularly their ability to release energy in a highly localized manner within tissues, potentially leading to a more effective biological response with reduced collateral effects in healthy tissues. Extensive research of the scientific community has focused on studying the effects of radiation on biological tissue, with DNA being identified as the most vulnerable target for radiation-induced damage leading to cell death [3]. Despite the theoretical advantages of using ion beams, further research is necessary to integrate this treatment modality into clinical practice fully. One significant challenge in the widespread adoption of ion beams lies in accurately determining their biological effects, as this is crucial for prescribing the most suitable treatment. Over the years, mathematical models have been developed to understand and predict the biological impact of ions on tissue, particularly in relation to DNA *Double Strand Breaks* (DSB) [4–10]. These mathematical approaches aim to describe the formation, progression, and clustering of DSBs, ultimately striving to predict the cell survival probability following radiation exposure.

Despite the inherently stochastic nature of biological pathways, most existing mathematical models, until now, have relied on deterministic frameworks with a priori assumptions about the Poisson distribution and disregarded the stochastic fluctuations in energy deposition. These fluctuations occur from cell to cell, particularly in complex radiation environments. In particular, the *Microdosimetric Kinetic Model* (MKM) [5,11], together with the *Local Effect Model* (LEM) [12,13], is one of the only two mechanistic models used in

clinical treatment planning, which describes the temporal evolution of the average number of DNA damages in a single cell nucleus to obtain a linear–quadratic survival probability starting from energy deposition at the micron scale, in the order of the cell nucleus. A funding assumption of the linear–quadratic relation between dose and the corresponding survival probability is the Poissonian distribution for DNA damage. Evidence has emerged that in certain situations, such as for heavy ions or at high doses, the Poissonian assumption does not hold. Therefore, in recent years, the community tried to overcome this assumption with some *ad hoc* correction terms related to intercellular damage interaction [4,11,14–17]. Though attempts have been made to address the limitations of the Poissonian assumption, a stochastic representation that encompasses the spatial and temporal aspects of dose deposition was lacking in the description of radiation-induced DNA damage formation and dynamics.

Recently, a series of papers addressed these issues, and the *Generalized Stochastic Microdosimetric Model* (GSM<sup>2</sup> ) [10,18–21] has been introduced. GSM<sup>2</sup> is a probabilistic model able to describe the time evolution of the DNA damage in a cell nucleus based on a differential equation governing the time evolution of the probability distribution of the number of DNA damages. Among the most relevant GSM<sup>2</sup> strengths, there is the capability to efficiently treat the several levels of spatiotemporal stochasticity happening during protracted irradiation without relying on the typically used Poissonian assumption on the number of DNA damages induced by radiation [19]. It is further described in [10,18] how different parameters, initial DNA damage distribution, or irradiation conditions can lead naturally to several possible probability distributions that can be significantly different from the typically assumed Poissonian law.

The main *master equation* governing the time evolution of the probability distribution of the number of DNA damages derived in [10] is non-linear due to the presence of a quadratic term that accounts for DNA clustering, which has been recognized as one of the main factors that leads to cell inactivation in radiobiology [7]. On the one side, this quadratic term plays a crucial role in the emergence of non-Poissonian behaviors; on the other side, it makes it difficult to obtain an explicit solution for the probability distribution of the number of DNA damages.

In this study, we present a system size expansion of the GSM<sup>2</sup> master equation based on the pioneering work [22]. The approximation that will be carried out in the present work is usually referred to in the literature as *system size expansion* [22,23], and it is widely used in the physics community to provide an appropriate macroscopic approximation of *microscopic* systems. As for all formal expansions, a suitable parameter is needed, around which the approximation is performed. In concrete applications, the domain size usually provides a suitable parameter to carry out a formal approximation, so that an asymptotic expansion of the main GSM<sup>2</sup> *master equation* will be carried out as the system size increases. It must be stressed that the approximation is generally valid as the number of lesions increases [23], so that the approximation derived in the present work provides a relevant description for high-dose irradiation, where the number of lesions even in small domains is high. Such a case is of particular relevance, since most of the existing models fail to give a precise description of the cell survival probability at high doses [4], and suitable correction terms are needed to match experimental data.

We will derive an asymptotic expansion for the GSM<sup>2</sup> master equation computing both the macroscopic limit and the fluctuation around such a macroscopic limit. Besides allowing us to calculate an approximate distribution for the number of DNA damage, the expansion derived in the present work provides further insights into the relationship between stochastic and deterministic mathematical models, already highlighted in previous works [18,19]. Having in mind the above-mentioned regimes of validity for the proposed expansion, we will further strengthen the connection between GSM<sup>2</sup> and the MKM, showing that as the system size increases, the *master equation* derived in [10] converges toward the main deterministic kinetic equations of the MKM [4,5]. This emphasizes how deterministic macroscopic behavior emerges from stochastic microscopic fluctuations. We will go a

step further so that we will also prove a suitable central limit theorem, in the sense that we will characterize the stochastic fluctuations around the macroscopic average value. These fluctuations are usually assumed to be Poissonian. By contrast, we will show that such fluctuation described by the macroscopic approximation of GSM<sup>2</sup> are Gaussian-distributed, as described by a *linear Fokker–Planck equation* (FPE) [23]. Recalling that for large mean value *λ*, a Poisson random variable of mean *λ* is approximated in a probabilistic sense by a Gaussian random variable with mean and variance *λ*, we will show that the derived limiting model can be seen as a correction around a Poisson distribution due to the clustering of lesions. In this sense, the present work shows how typical non-Poissonian correction terms of the MKM that have been proposed over the years naturally emerge in the fully probabilistic description of the GSM<sup>2</sup> . Lastly, it is worth stressing that the present paper sheds light on another possible future connection to existing radiobiological models. In fact, in the literature, some models have been derived that attempt to describe DNA damage at high doses using a Gaussian formulation of a multi-hit model (MHM) [24,25]. It has already been shown in [19] that GSM<sup>2</sup> is closely connected to some multi-hit models [26,27], so the results derived in the present paper further connect GSM<sup>2</sup> to the multi-hit models derived in the literature.

The main contributions of the present paper are:


The present work is structured as follows: Section 2.1 recalls the basic facts of GSM<sup>2</sup> . Section 3.1 contains the main result of the present research, with the formal asymptotic expansion and a rigorous description of the stochastic fluctuations around the macroscopic average value. Section 4 provides some numerical results on the derived approximation.

### **2. Material and Methods**

### *2.1. The Generalized Stochastic Microdosimetric Model and the Microdosimetric Kinetic Model*

GSM<sup>2</sup> [10] is a stochastic model that provides a probabilistic description of DNA damage formation and evolution, with particular attention to the link to DNA damage formation and energy deposition. The final goal of the model is to overcome existing models mainly based on the Poissonian assumption of energy deposition to provide a better characterization of some relevant biological endpoints such as the cell survival fraction.

GSM<sup>2</sup> considers two types of DNA damage, called sub-lethal and lethal lesions. Lethal lesions *Y* represent damage that cannot be repaired, leading to cell inactivation. By contrast, sublethal lesions *X* can be repaired at rate *r* or become a lethal lesion either by direct death at rate *a* or interacting with another sublethal lesion at rate *b*. This latter term *b* accounts for the clustering of DNA damage and gives rise to a nonlinearity in the governing master equation.

Denoted by (*Y*(*t*), *X*(*t*)) is the state of the system at time *t*, where *X* and *Y* are two N−valued random variables counting the number of the lethal and sub-lethal lesion, respectively. In the following, we will consider a standard complete filtered probability space <sup>Ω</sup>, <sup>F</sup>,(F*t*)*t*≥<sup>0</sup> , P satisfying usual assumptions, namely right-continuity and saturation by P-null sets.

The above reasoning can be represented by the following pathways:

$$\begin{aligned} \mathcal{X} & \xrightarrow{r} \mathcal{O}\_{\prime} \\ \mathcal{X} & \xrightarrow{a} \mathcal{Y}\_{\prime} \end{aligned}$$

$$\mathcal{X} + \mathcal{X} \xrightarrow{b} \mathcal{Y}\_{\prime}$$

for three positive constant rates, *r*, *a*, and *b*.

In what follows, we denote by:

$$p\_{t\_0 y\_0, \mathbf{x}\_0}(t, y, \mathbf{x}) \coloneqq p(t, y, \mathbf{x} | t\_0, y\_0, \mathbf{x}\_0) = \mathbb{P}(\left(Y(t), X(t)\right) = (y, \mathbf{x}) | (Y(t\_0), X(t\_0)) = (y\_0, \mathbf{x}\_0)) \,.$$

If no confusion is possible, we will avoid stating the initial time and state, writing for short *p*(*t*, *y*, *x*).

Following [10], the *microdosimetric master equation* (MME) can be derived:

$$\begin{split} \frac{\partial}{\partial t} p(t, y, \mathbf{x}) &= -[(a+r)\mathbf{x} + b\mathbf{x}(\mathbf{x}-1)]p(t, y, \mathbf{x}) + (\mathbf{x}+1)rp(t, y, \mathbf{x}+1) + \\ &+ (\mathbf{x}+1)ap(t, y-1, \mathbf{x}+1) + (\mathbf{x}+2)(\mathbf{x}+1)bp(t, y-1, \mathbf{x}+2) .\end{split} \tag{1}$$

The MME (1) can be written for short as:

$$\begin{split} \frac{\partial}{\partial t} p(t, y, \mathbf{x}) &= \left(E^{-1,2} - 1\right) [\mathbf{x}(\mathbf{x} - \mathbf{1}) b p(t, y, \mathbf{x})] + \left(E^{-1,1} - 1\right) [\mathbf{x} a p(t, y, \mathbf{x})] + \\ &+ \left(E^{0,1} - 1\right) [\mathbf{x} r p(t, y, \mathbf{x})] = \\ &= \mathcal{E}^{-1,2} [\mathbf{x}(\mathbf{x} - \mathbf{1}) b p(t, y, \mathbf{x})] + \mathcal{E}^{-1,1} [\mathbf{x} a p(t, y, \mathbf{x})] + \mathcal{E}^{0,1} [\mathbf{x} r p(t, y, \mathbf{x})] \end{split} \tag{2}$$

where above we have denoted the creation operators as:

$$\mathcal{E}^{i,j}[f(y,\mathbf{x})] := \left(E^{i,j} - \mathbf{1}\right)[f(y,\mathbf{x})] := f(y+i,\mathbf{x}+j) - f(y,\mathbf{x})\dots$$

The above MME (2) is coupled with an initial distribution:

$$p\_0(y, \mathbf{x}) := p(0, y, \mathbf{x})\,,$$

as described in [10,18].

**Remark 1.** *The above choice is made to closely follow the existing literature on the topic. However, other choices for the pathways r, a, and b can be made. For instance, it could be assumed that the death rate a is logistic, including an increment in the death as the number of lesions becomes bigger. In such a case, the MME would become:*

$$\frac{\partial}{\partial t}p(t,y,\mathbf{x}) = \mathcal{E}^{-1,2}[\mathbf{x}(\mathbf{x}-1)bp(t,y,\mathbf{x})] + \mathcal{E}^{-1,1}[\mathbf{x}(a+\overline{a}\mathbf{x})p(t,y,\mathbf{x})] + \mathcal{E}^{0,1}[\mathbf{x}rp(t,y,\mathbf{x})].\tag{3}$$

It has been shown in [10] that GSM<sup>2</sup> is closely connected to the MKM, where, in fact, the former represents a stochastic reformulation of the latter. The MKM postulates the same assumptions of GSM<sup>2</sup> with two key additions. First, the MKM considers the time evolution for the average number of lethal lesions *y*¯ and sublethal lesions *x*¯, and second, *y*¯ is assumed to be Poissonian-distributed.

In particular, the MKM assumes the *y*¯ and *x*¯ follows the set of coupled ODE:

$$\begin{cases} \frac{d}{dt}\mathfrak{z}(t) = a\mathfrak{x}(t) + b\mathfrak{x}^2(t),\\ \frac{d}{dt}\mathfrak{x}(t) = -(a+r)\mathfrak{x}(t) - 2b\mathfrak{x}^2(t) \,. \end{cases} \tag{4}$$

Typically, it is further assumed that (*a* + *r*)*x* >> 2*bx*<sup>2</sup> , so that Equation (4) reduces to:

$$\begin{cases} \frac{d}{dt}\tilde{y}(t) = a\mathfrak{x}(t) + b\mathfrak{x}^2(t) \\ \frac{d}{dt}\tilde{\mathfrak{x}}(t) = -(a+r)\mathfrak{x}(t) \,. \end{cases} \tag{5}$$

The connection between the MME (2) and the system of ODE (4) has already been shown in [10,18], and this connection will be further deepened in the present paper.

### **3. Theory and Calculations**

### *3.1. Macroscopic Description for the GSM*<sup>2</sup>

In the present Section, we will derive a rigorous expansion that provides a macroscopic linear approximation counter-part of the master equation derived in [10]. The following expansion is at the basis of a macroscopic deterministic description of a microscopic stochastic system.

In the following, we will assume that the coefficients of the master Equation (2) depend on a parameter K in a suitable manner; namely, they are of order O(1) with respect to the parameter K; under this assumption, we are able to characterize the limit for the master Equation (2) as K → ∞. As mentioned in the introduction, the typical approach is to consider K as the system size, from which the name *system size expansion* is derived.

We will prove both the convergence of the microscopic system towards a macroscopic mean value, which corresponds to *the law of large numbers*, and also provide a description for the fluctuations of the system around such a mean value. This description of the fluctuations allows us to describe the system in terms of an FPE so that we will show that the order of the fluctuation is not Poissonian, as typically assumed in most of the existing literature on the subject. In this sense, the current research highlights how GSM<sup>2</sup> provides a rigorous non-Poissonian correction to the MKM.

In the following, in order to carry out the expansion, we assume that the parameter *b* depends on the K as ˜*b*<sup>K</sup> := *<sup>b</sup>* K . Therefore, the MME (2) now becomes:

$$\frac{\partial}{\partial t}p(t,y,\mathbf{x}) = \mathcal{E}^{-1,2}\left[\mathbf{x}(\mathbf{x}-1)\mathbf{\tilde{b}}\_{\mathbf{K}}p(t,y,\mathbf{x})\right] + \mathcal{E}^{-1,1}\left[\mathbf{x}ap(t,y,\mathbf{x})\right] + \mathcal{E}^{0,1}\left[\mathbf{x}rp(t,y,\mathbf{x})\right].\tag{6}$$

The main idea that will be carried out in the current section is to let K → ∞ in order to approximate the master equation by a continuous equation. The first-order approximation will satisfy a linear FPE, whose marginals, under some specific initial distribution to be better specified in later sections, can be shown to be Gaussian-distributed. The derived Gaussian approximation for the lesion distribution will be shown to provide better insights than the classical Poissonian hypothesis regarding lethal damage.

In order to prove the expansion, we set:

$$\begin{aligned} \mathbf{X}(t) &= \mathbf{K}\mathbf{\tilde{x}}(t) + \sqrt{\mathbf{K}}\mathbf{\tilde{y}}(t) \, \, \, \\ \mathbf{Y}(t) &= \mathbf{K}\mathbf{\tilde{y}}(t) + \sqrt{\mathbf{K}}v(t) \, \, \, \end{aligned} \tag{7}$$

with (*ξ*(*t*))*t*≥<sup>0</sup> and (*υ*(*t*))*t*≥<sup>0</sup> being two stochastic processes, so that for any *t* ≥ 0, *ξ*(*t*) and *υ*(*t*) are two centered random variables, i.e., with null mean value, whereas *x*¯(*t*) and *y*¯(*t*) are two suitable deterministic functions to be derived later. Heuristically speaking, *x*¯ and *y*¯ will play the role of the macroscopic deterministic behavior, which we will show to agree with the differential equations governing the MKM, as given in (4). Therefore, the above assumptions can be interpreted intuitively as an expansion of variables *x* and *y* around the macroscopic behavior, whereas the terms *ξ* and *υ* represent the fluctuations around the mean value.

It is worth noting that the following holds true:

$$\begin{aligned} \mathbb{E}[X(t)] &= \mathbb{K}\ddot{\mathbf{x}}(t) + \sqrt{\mathbb{K}}\mathbb{E}[\xi(t)] \text{ , } \\ \mathbb{E}[Y(t)] &= \mathbb{K}\ddot{y}(t) + \sqrt{\mathbb{K}}\mathbb{E}[\upsilon(t)] \text{ , } \\ \mathbb{V}ar[X(t)] &= \mathbb{K}\mathbb{V}ar[\xi(t)] \text{ , } \\ \mathbb{V}ar[Y(t)] &= \mathbb{K}\mathbb{V}ar[\upsilon(t)] \text{ .} \end{aligned} \tag{8}$$

Define the new distribution with respect to the new variables as:

$$p(t, y, \mathbf{x}) = p\left(t, \mathbf{K}\mathfrak{x} + \sqrt{\mathbf{K}}\mathfrak{f}, \mathbf{K}\mathfrak{y} + \sqrt{\mathbf{K}}\upsilon\right) = P(t, \upsilon, \mathfrak{f}) \dots$$

The standard chain rule applied to *P*(*t*, *υ*, *ξ*) yields:

$$\frac{\partial}{\partial t}p(t,y,\boldsymbol{x}) = \frac{\partial}{\partial t}P(t,\boldsymbol{\upsilon},\boldsymbol{\xi})\left(1 + \frac{\partial}{\partial t}\boldsymbol{\upsilon} + \frac{\partial}{\partial t}\boldsymbol{\xi}\right)\boldsymbol{\nu}$$

so that inverting transformation (7) for *υ* and *ξ* gives:

$$\frac{\partial}{\partial t}p(t,y,\mathbf{x}) = \frac{\partial}{\partial t}P(t,\upsilon,\xi) - \sqrt{\mathbf{K}}\frac{d}{dt}\bar{y}\frac{\partial}{\partial \upsilon}P(t,\upsilon,\xi) - \sqrt{\mathbf{K}}\frac{d}{dt}\bar{x}\frac{\partial}{\partial \xi}P(t,\upsilon,\xi) \ .$$

Regarding the step operators appearing in the MME (6), it can be shown that the following holds true:

$$\mathcal{E}^{i,j} = \frac{1}{\sqrt{\mathbf{K}}} \left( i\frac{\partial}{\partial \upsilon} + j\frac{\partial}{\partial \tilde{\xi}} \right) + \frac{1}{2}\frac{1}{\mathbf{K}} \left( i\frac{\partial}{\partial \upsilon} + j\frac{\partial}{\partial \tilde{\xi}} \right)^2 \dots$$

The above computations substituted into Equation (6) yields:

*∂ ∂t <sup>P</sup>*(*t*, *<sup>υ</sup>*, *<sup>ξ</sup>*) = <sup>√</sup> K *d dt y*¯ *∂ ∂υ <sup>P</sup>*(*t*, *<sup>υ</sup>*, *<sup>ξ</sup>*) + <sup>√</sup> K *d dt x*¯ *∂ ∂ξ P*(*t*, *υ*, *ξ*)+ + *b* K 1 √ K 2 *∂ ∂ξ* − *∂ ∂υ* K*x*¯ + √ K*ξ* K*x*¯ <sup>+</sup> √ K*ξ* − 1 *P*(*t*, *υ*, *ξ*)+ + *b* K " 1 2 1 K 2 *∂ ∂ξ* − *∂ ∂υ* <sup>2</sup> # K*x*¯ + √ K*ξ* K*x*¯ <sup>+</sup> √ K*ξ* − 1 *P*(*t*, *υ*, *ξ*)+ + *a* " 1 √ K *∂ ∂ξ* − *∂ ∂υ* + 1 2 1 K *∂ ∂ξ* − *∂ ∂υ* <sup>2</sup> # K*x*¯ + √ K*ξ P*(*t*, *υ*, *ξ*)+ + *r* " 1 √ K *∂ ∂ξ* + 1 2 1 K *∂* 2 *∂* 2 *ξ* # K*x*¯ + √ K*ξ P*(*t*, *υ*, *ξ*). (9)

Grouping the terms of order <sup>√</sup> K, we obtain:

$$\begin{split} 0 \sim \sqrt{\mathbf{K}} \, : \, \frac{d}{dt} \overline{\boldsymbol{\varrho}} \frac{\partial}{\partial \boldsymbol{\upsilon}} P(t, \boldsymbol{\upsilon}, \boldsymbol{\xi}) + \frac{d}{dt} \overline{\boldsymbol{\varepsilon}} \, \frac{\partial}{\partial \boldsymbol{\xi}} P(t, \boldsymbol{\upsilon}, \boldsymbol{\xi}) + a \left( \frac{\partial}{\partial \boldsymbol{\varepsilon}} - \frac{\partial}{\partial \boldsymbol{\upsilon}} \right) \boldsymbol{\varepsilon} P(t, \boldsymbol{\upsilon}, \boldsymbol{\xi}) + \\ + r \frac{\partial}{\partial \boldsymbol{\varepsilon}} \boldsymbol{\varepsilon} P(t, \boldsymbol{\upsilon}, \boldsymbol{\xi}) + 2b \boldsymbol{\varepsilon}^2 \frac{\partial}{\partial \boldsymbol{\varepsilon}} P(t, \boldsymbol{\upsilon}, \boldsymbol{\xi}) - b \boldsymbol{\varepsilon}^2 \frac{\partial}{\partial \boldsymbol{\upsilon}} P(t, \boldsymbol{\upsilon}, \boldsymbol{\xi}) \,. \end{split} \tag{10}$$

In order to compensate for the terms of order <sup>√</sup> K, we set the macroscopic system as:

$$\begin{cases} \frac{d}{dt}\bar{y} = a\bar{x} + b\bar{x}^2\\ \frac{d}{dt}\bar{x} = -(a+r)\bar{x} - 2b\bar{x}^2 \end{cases} \tag{11}$$

so that all terms or order <sup>√</sup> K in Equation (10) vanishes. Therefore, we have shown that the macroscopic limit of GSM<sup>2</sup> MME coincides with the main deterministic governing equation of the MKM (4).

Explicit solutions to system (11) can be derived with the further property that they are globally stable and converging to a stationary solution [19]. Consider first:

$$\frac{d}{dt}\bar{\mathbf{x}}(t) = -(a+r)\bar{\mathbf{x}}(t) - 2b\bar{\mathbf{x}}^2(t) \; , \quad \bar{\mathbf{x}}(0) = \mathbf{x}\_0 \; . \tag{12}$$

Such an equation (12) is known as the Bernoulli equation. Applying the transformation *u* = <sup>1</sup> *x*¯ leads to the following differential equation:

$$\frac{d}{dt}u(t) = (a+r)u(t) + 2b$$

This last equation is a linear equation in *u*, so the explicit solution is given by:

$$u(t) = ce^{(a+r)t} - \frac{2b}{a+r} \cdot t$$

Coming back to the original equation, we obtain:

$$\mathfrak{X}(t) = \frac{a+r}{ce^{(a+r)t} - 2b} \wedge$$

with

$$c := \frac{a+r}{x\_0} + 2b.$$

We can, therefore, substitute *x*¯ into Equation (11) to obtain:

$$\vec{y}(t) = y\_0 + \frac{a+r}{4b - 2ce^{t(a+r)}} + \frac{r}{2b} \left( (a+r)t - \log\left[ \left| 2b - ce^{t(a+r)} \right| \right] \right).$$

We can eventually calculate the long-time convergence toward the stationary solution of the above equations to be:

$$\begin{aligned} \lim\_{t \to \infty} \mathfrak{x}(t) &=: \mathfrak{x}\_{\infty} = 0 \\ \lim\_{t \to \infty} \mathfrak{y}(t) &=: \mathfrak{y}\_{\infty} = y\_0 - \frac{r}{2b} \log \left( \frac{a+r}{\varkappa\_0} + 2b \right). \end{aligned}$$

**Remark 2.** *It is worth remarking that, for low-dose and sparsely ionizing radiation, such as X-rays or high-energy protons, the following assumption typically holds true,* (*a* + *r*)*x* >> 2*bx*<sup>2</sup> *; therefore, the above calculations simplify so that the explicit solution to Equation* (11) *is given by [5,28]:*

$$\begin{cases} \overline{y}(t) = y\_0 + a \mathfrak{x}\_0 \left( \frac{1 - e^{-(a+r)t}}{a+r} \right) + b \mathfrak{x}\_0^2 \left( \frac{1 - e^{-2(a+r)t}}{a+r} \right) \\ \overline{x}(t) = \mathfrak{x}\_0 e^{-(a+r)t} . \end{cases} \tag{13}$$

*In particular, Equation* (13) *converges as t* → ∞ *towards:*

$$\lim\_{t \to \infty} \overline{y}(t) = y\_0 + \chi\_0 \left( \frac{a}{a+r} + \chi\_0 \frac{b}{a+r} \right), \quad \lim\_{t \to \infty} \overline{x}(t) = 0.1$$

### *3.2. The Linear Noise Approximation and Moments Estimates*

Having cancelled out terms of order <sup>√</sup> K, taking the limit as K → ∞, all terms containing K vanish and only the zero−*th* order terms remain, yielding:

*∂ ∂t P*(*t*, *υ*, *ξ*) = 2*b* 2 *∂ ∂ξ* − *∂ ∂υ* [*x*¯*ξP*(*t*, *υ*, *ξ*)] + 1 2 *b* 2 *∂ ∂ξ* − *∂ ∂υ* <sup>2</sup> h *x*¯ <sup>2</sup>*P*(*t*, *υ*, *ξ*) i + + *a ∂ ∂ξ* − *∂ ∂υ* [*ξP*(*t*, *υ*, *ξ*)] + *r ∂ ∂ξ* [*ξP*(*t*, *υ*, *ξ*)]+ + 1 2 *a ∂ ∂ξ* − *∂ ∂υ* <sup>2</sup> [*xP*¯ (*t*, *υ*, *ξ*)] + 1 2 *r ∂* 2 *∂* 2 *ξ* [*xP*¯ (*t*, *υ*, *ξ*)]+ = *∂ ∂ξ* [*ξP*(*t*, *υ*, *ξ*)](4*bx*¯ + *a* + *r*) − *∂ ∂υ* [*ξP*(*t*, *υ*, *ξ*)](2*bx*¯ + *a*)+ − *∂ξυP*(*t*, *υ*, *ξ*) 2*bx*¯ <sup>2</sup> + *ax*¯ + + 1 2 *∂* 2 *∂* 2 *ξ P*(*t*, *υ*, *ξ*) (*a* + *r*)*x*¯ + 4*bx*¯ 2 + 1 2 *∂* 2 *∂* 2 *υ P*(*t*, *υ*, *ξ*) *ax*¯ + *bx*¯ 2 . (14)

Equation (14) is a linear *Fokker–Planck* equation of dimension 2 that describes the fluctuations of the system around the average values *x*¯(*t*) and *y*¯(*t*). The solution to the linear FP Equation (14), under suitable initial conditions that will be specified later, can be shown to be the bi-dimensional Gaussian density.

Until now, we have avoided explicitly considering the initial condition both for the original MME (6) and for the approximating linear FPE (14).

As shown in [18], much of the stochasticity regarding lesion formation lies in the initial condition, in the sense that the distribution of initial lethal and sub-lethal damage deeply affects the subsequent time evolution of the probability density function. We will avoid an extensive treatment of such a topic in the present paper and focus more on the stochasticities inherent to the kinetics of the interaction of DNA damages, considering instead two simple and yet relevant cases for the initial damage distribution.

Let us start by assuming that the initial number of lesions is deterministic and is given by (*y*0, *x*0). We, therefore, equip the MME (6) with a deterministic initial condition given by:

$$p(0, y, \mathbf{x}) = \delta(\mathbf{x} - \mathbf{x}\_0)\delta(y - y\_0)\mathbf{x}$$

with *δ*(*x* − *x*0) and *δ*(*y* − *y*0) being the Dirac delta centered at *x*<sup>0</sup> and *y*0, respectively. It can be shown that [23,29] the solution to the linear FPE (14) is given by a bivariate Gaussian distribution:

$$P(t, \upsilon, \xi) = \frac{1}{2\pi} \frac{1}{\sqrt{\det \mathbb{C}}} \exp\left\{ -\frac{1}{2} (\upsilon - \bar{\upsilon}, \xi - \bar{\xi})^T \mathbb{C}^{-1} (\upsilon - \bar{\upsilon}, \xi - \bar{\xi}) \right\},$$

where *υ*¯ and ¯*ξ* are the mean values and *C* is the covariance matrix with entries:

$$\mathbf{C} = \begin{pmatrix} c\_{vv} & c\_{\xi v} \\ c\_{\xi v} & c\_{\xi \xi} \end{pmatrix}.$$

where *cυυ*, resp. *cξυ*, and resp. *cξξ* are the variance of *υ*, resp. covariance of *ξ* and *υ*, and resp. variance of *ξ*. It is worth stressing that, given the properties of the multivariate Gaussian distributions, *ξ* and *υ* are univariate Gaussian random variables.

Upon the multiplication of Equation (14) by *ξ* and *υ*, it follows after integrating by parts that the first moment of *ξ* and *υ* satisfies:

$$\begin{aligned} \frac{d}{dt}\overline{v} &= (2b\overline{x} + a)\overline{\xi} \, , \quad \overline{v}(0) = 0 \, , \\ \frac{d}{dt}\overline{\xi} &= -(4b\overline{x} + a + r)\overline{\xi} \, , \quad \overline{\xi}(0) = 0 \, . \end{aligned} \tag{15}$$

It immediately follows from Equation (15) that:

$$\xi(t) = \mathfrak{v}(t) \equiv \mathbf{0}\dots$$

This result is in agreement with the fact that *ξ* and *υ* are centered random variables.

Multiplying Equation (14) by *ξ* 2 , *ξυ*, and *υ* 2 , we obtain, again after integration by parts, that the variance and covariance satisfy the following set of coupled ODEs:

$$\begin{cases} \frac{d}{dt}c\_{\upsilon\upsilon} &= 2(2b\bar{\mathfrak{x}} + a)c\_{\xi\upsilon} + a\bar{\mathfrak{x}} + b\bar{\mathfrak{x}}^2, \\ \frac{d}{dt}c\_{\xi\upsilon} &= (2b\mathfrak{x} + a)c\_{\xi\xi} - (4b\mathfrak{x} + a + r)c\_{\xi\upsilon} - \left(2b\mathfrak{x}^2 + a\mathfrak{x}\right), \\ \frac{d}{dt}c\_{\xi\xi} &= -2(4b\mathfrak{x} + a + r)c\_{\xi\sharp} + (a + r)\mathfrak{x} + 4b\mathfrak{x}^2, \end{cases} \tag{16}$$

with the initial condition *cυυ*(0) = *cξυ*(0) = *cξξ* (0) = 0. The last two equations in (16) can be computed to be:

$$\begin{cases} c\_{\xi\overline{\sigma}}(t) &= \frac{c^{2t(a+r)}\left( (a+r)\left( -4b^2c^{t(a-r)-} - 4bc(a+r) + c^2c^{t(a+r)} \right) + c(a+r)\left( 4b^2 - c^2 \right) + ac \right)}{\left( c c^{t(a+r)} - 2b \right)^4}, \\ c\_{\xi\overline{\sigma}}(t) &= \frac{c^{t(a+r)}\left( -\frac{a^{t(a+r)}\left( 2a^2(2bt+b) + a^2\left( -4b(b-2t-1) + c^2 - 1 \right) + r^2\left( 2b(-2b+2rt+1) + c^2 \right) \right)}{2t(a+r)\left( c^{t(a+r)} - 2b \right)} \right)}{\left( c c^{t(a+r)} - 2b \right)^2} \\ &+ \frac{c^{t(a+r)}\left( \frac{c^2\mathcal{I}^{2t(a+r)}\left( -4b^2 + 4bt(a+r)^2 + a^2 - a - 4b^2r + c^2r \right)}{4t\left( c^{t(a+r)} - 2b \right)^2} + 2br^{t(-a-r)} - t(a+r)(a-r) + c\_{\xi v} \right)}{\left( c c^{t(a+r)} - 2b \right)^2} \end{cases} \tag{17}$$

with *cξυ* a suitable constant to ensure the initial condition. It can be seen that it holds:

$$\lim\_{t \to \infty} c\_{\tilde{\xi}\tilde{\xi}}(t) = \lim\_{t \to \infty} c\_{\tilde{\xi}\upsilon}(t) = 0 \dots$$

In particular, we are mostly concerned with the term *cυυ* and with its stationary solution, as we aim to show that the distribution of lethal lesions differs from a Poisson distribution, as it is typically assumed in radiobiological models. It can, thus, be noticed that, integrating the third equation in (16), we obtain:

$$\begin{split} c\_{vv}(t) &= \int\_0^t \Big[ 2(2b\mathfrak{x}(s) + a)c\_{\mathfrak{z}\psi}(s) + a\mathfrak{x}(s) + b\mathfrak{x}^2(s) \Big] ds = \\ &= \mathfrak{y}(t) + \int\_0^t 2(2b\mathfrak{x}(s) + a)c\_{\mathfrak{z}\psi}(s)ds = \mathfrak{y}(t) - \delta(t) \end{split} \tag{18}$$

with *y*¯(*t*) being the mean value for lethal lesions, as computed in Equation (11), and:

$$\delta(t) := -\int\_0^t \mathbf{2}(2b\mathbf{\tilde{x}}(s) + a)c\_{\tilde{\xi}\upsilon}(s)ds\dots$$

The negative sign in *δ* is used to emphasize that the covariance is, in fact, negative, since a decrease in sublethal lesions correlates with an increase in lethal lesions.

The long time behaviour for *cυυ* can be explicitly computed using Equation (18) to be:

$$\lim\_{t \to \infty} c\_{vv}(t) =: \mathfrak{z}\_{\infty} - \delta\_{\infty} \tag{19}$$

with *y*¯<sup>∞</sup> being the long-time solution to the macroscopic average value *y*¯(*t*).

Recalling that for a large mean value, a Poisson distribution can be approximated by a Gaussian random variable with equal mean and variance, in order to infer that the lethal lesion distribution obeys a Poisson random variable, we must obtain lim*t*→<sup>∞</sup> *cυυ*(*t*) = *y*¯∞. By contrast, the above calculations show that the variance is given by the average value corrected by a term given by the covariance of two types of lesions. In particular, as there is a negative correlation between the two variables, we can infer that the lethal lesion distribution is almost a Poisson random variable, where the variance is adjusted by subtracting a term due to pairwise interactions.

### Moments Estimates for a Stochastic Initial Condition

In general, we cannot expect the initial number of lesions to be deterministic, so previous arguments must be slightly modified.

To explicitly compute the marginal distribution for the solution to the linear Fokker– Planck Equation (14), we assume the initial distribution for the MME to be normally distributed with mean (*y*0, *x*0) and variance Σ. It is worth remarking that such an assumption is not restrictive, as the standard assumption for the initial condition is to be a Poisson random variable, which, as mentioned above, under certain assumptions, can be approximated by a Gaussian random variable.

In particular, we assume that the initial number of lethal and sublethal lesions follows a Gaussian random variable with mean and variance given by *x*<sup>0</sup> and *y*0:

$$p(0, y, \mathbf{x}) = \frac{1}{2\pi} \frac{1}{\sqrt{\det\Sigma}} \exp\left\{-\frac{1}{2} (y - y\_0, \mathbf{x} - \mathbf{x}\_0)^T \Sigma^{-1} (y - y\_0, \mathbf{x} - \mathbf{x}\_0) \right\},\tag{20}$$

.

with

$$
\Sigma = \begin{pmatrix} x\_0 & 0 \\ 0 & y\_0 \end{pmatrix},
$$

Similar arguments as above imply that the initial condition for the linear FPE (14), under Equation (20), becomes a centered Gaussian random variable:

$$P(0, v, \xi) = \frac{1}{2\pi} \frac{1}{\sqrt{\det \Sigma}} \exp\left\{ -\frac{1}{2} (v, \xi)^T \Sigma^{-1} (v, \xi) \right\}.$$

Therefore, the initial fluctuations around the mean value are Gaussian-distributed with a null average.

Therefore, all calculations above follow alike, implying that, again, the solution to the linear Fokker-Planck Equation (14) is given by:

$$P(t, \upsilon, \xi) = \frac{1}{2\pi} \frac{1}{\sqrt{\det \mathbf{C}}} \exp\left\{ -\frac{1}{2} (\upsilon, \xi)^T \mathbf{C}^{-1} (\upsilon, \xi) \right\} \,\prime$$

where now the covariance matrix incorporates the initial stochastic condition so that its entries satisfy the following set of differential equations:

$$\begin{cases} \frac{\partial}{\partial t} c\_{\upsilon \upsilon} &= 2(2b\bar{\mathbf{x}} + a)c\_{\xi \upsilon} + a\bar{\mathbf{x}} + b\bar{\mathbf{x}}^2, \quad c\_{\upsilon \upsilon}(0) = y\_0 \\ \frac{\partial}{\partial t} c\_{\xi \upsilon} &= (2b\bar{\mathbf{x}} + a)c\_{\xi \overline{\xi}} - (4b\bar{\mathbf{x}} + a + r)c\_{\xi \upsilon} - 2\left(4b\bar{\mathbf{x}}^2 + 2a\bar{\mathbf{x}}\right), \quad c\_{\xi \upsilon}(0) = 0, \\ \frac{\partial}{\partial t} c\_{\xi \overline{\xi}} &= -2(4b\bar{\mathbf{x}} + a + r)c\_{\xi \overline{\xi}} + (a + r)\bar{\mathbf{x}} + 2b\bar{\mathbf{x}}^2, \quad c\_{\xi \overline{\xi}}(0) = \mathbf{x}\_0. \end{cases} \tag{21}$$

Analogously to what is shown at the end of Section 3.2, the variance of lethal lesion obeys:

$$c\_{vv}(t) = \mathfrak{z}(t) - \delta(t) \, . $$

with *y*¯(*t*) the average deterministic value and:

$$\delta(t) = -\int\_0^t \mathbf{2}(2b\mathfrak{x}(s) + a)c\_{\mathfrak{F}\boldsymbol{\upsilon}}(s)ds\,\boldsymbol{\nu}$$

so that, again, the variance for the lethal lesion is given by the macroscopic mean corrected by a covariance term.

**Remark 3.** *The solution to the linear FPE* (14) *can be shown [29] to be the probability density function of the time-dependent Ornstein–Uhlenbeck (OU) process, defined as:*

$$dZ(t) = -A(t)Z(t)dt + Q(t)dW(t), \quad Z(0) = z\_0. \tag{22}$$

*with W a bidimensional standard Brownian motion, Z* = (*υ*, *ξ*)*, and z*<sup>0</sup> *a bivariate Gaussian random variable with mean* (*x*0, *y*0) *and variance:*

$$
\Sigma = \begin{pmatrix} x\_0 & 0 \\ 0 & y\_0 \end{pmatrix}.
$$

*Additionally:*

$$\begin{aligned} A(t) &= \begin{pmatrix} 0 & -2b\bar{\mathfrak{x}}(t) - a \\ 0 & 4b\bar{\mathfrak{x}}(t) + a + r \end{pmatrix}, \\ Q(t) &= \begin{pmatrix} \sqrt{a\bar{\mathfrak{x}}(t) + b\bar{\mathfrak{x}}^2(t) + \frac{\left(a\bar{\mathfrak{x}}(t) + b\bar{\mathfrak{x}}^2(t)\right)^2}{(a+r)\mathfrak{x}(t) + 4b\bar{\mathfrak{x}}^2(t)}} & -\frac{a\mathfrak{x}(t) + b\bar{\mathfrak{x}}^2(t)}{\sqrt{(a+r)\mathfrak{x}(t) + 4b\bar{\mathfrak{x}}^2(t)}} \\ 0 & \sqrt{(a+r)\bar{\mathfrak{x}}(t) + 4b\bar{\mathfrak{x}}^2(t)} \end{pmatrix}. \end{aligned}$$

*By simulating various trajectories of the OU process described in Equation* (22)*, we can effectively estimate the solution to the FPE given by Equation* (14)*. Furthermore, it is important to note that the boundary x* = 0 *acts as an absorbing boundary in the original GSM*<sup>2</sup> *. This implies that once the number of sublethal lesions reaches zero, it remains at zero. Therefore, to guarantee the positivity of the solution to the FPE defined in Equation* (14)*, it is necessary to impose a similar boundary condition on the OU process in Equation* (22)*. This ensures that the number of lethal lesions remains positive and is absorbed at zero upon reaching the boundary.*

**Remark 4.** *It has been shown in [23,30] that a different and yet related Fokker–Planck equation can be obtained without any truncation at first order. In fact, if the above assumption on b holds true, then the master Equation* (2)*, following [23] (Chapter 7.5), can be expanded as:*

$$\frac{\partial}{\partial t}p(t,y,\mathbf{x}) = -\sum\_{n}\sum\_{(i,j)} \frac{\left((i,j)\cdot\nabla\right)^{n}}{n!} \left[\mathbb{C}^{(i,j)}p(t,y,\mathbf{x})\right]\_{i}$$

*for a suitable term C*(*i*,*j*) *.*

*Truncating at the second order, we obtain the following Fokker–Planck equation:*

$$\begin{split} \frac{\partial}{\partial t} p(t, y, \mathbf{x}) &= -\sum\_{w = x, y} \frac{\partial}{\partial w} \left[ \sum\_{i, j} (i, j) \cdot \mathbb{C}^{(i, j)} p(t, y, \mathbf{x}) \right] + \frac{1}{2} \sum\_{w, q = x, y} \frac{\partial}{\partial w} \frac{\partial}{\partial q} \left[ i \mathbb{C}^{(i, j)} p(t, y, \mathbf{x}) \right] = \\ &= -\sum\_{w = x, y} \frac{\partial}{\partial w} [B(\mathbf{x}) p(t, y, \mathbf{x})] + \frac{1}{2} \sum\_{q = x, y} \sum\_{w = x, y} \frac{\partial}{\partial w} \frac{\partial}{\partial q} \left[ Q(\mathbf{x}) p(t, y, \mathbf{x}) \right], \end{split} \tag{23}$$

*where the above coefficients in Equation* (23) *are given explicitly by:*

$$\begin{aligned} B(\mathbf{x}) &= \begin{pmatrix} \mathbf{x}(\mathbf{x}-1)b + \mathbf{x}a \\ -2\mathbf{x}(\mathbf{x}-1)b - \mathbf{x}(a+r) \end{pmatrix}, \\ Q(\mathbf{x}) &= \begin{pmatrix} \mathbf{x}(\mathbf{x}-1)b + \mathbf{x}a & -2\mathbf{x}(\mathbf{x}-1)b - \mathbf{x}a \\ -2\mathbf{x}(\mathbf{x}-1)b - \mathbf{x}a & 4\mathbf{x}(\mathbf{x}-1)b + \mathbf{x}(a+r) \end{pmatrix}. \end{aligned}$$

*The connection between Equations* (14) *and* (23) *can be made rigorous by introducing the new variables:*

$$
\bar{\mathfrak{X}} := \frac{\mathfrak{x}}{\mathbb{K}} \,' \quad \bar{\mathfrak{Y}} := \frac{\underline{y}}{\mathbb{K}} \,'
$$

*into Equation* (23)*, to obtain:*

$$\frac{\partial}{\partial t}p(t,\vec{y},\vec{x}) = -\sum\_{w=g,\mathcal{X}} \frac{\partial}{\partial \boldsymbol{w}} \left[\mathcal{\boldsymbol{\mathcal{B}}}(\boldsymbol{\mathfrak{x}})p(t,\boldsymbol{y},\boldsymbol{x})\right] + \frac{1}{2\mathcal{K}} \sum\_{w,\boldsymbol{q}=g,\mathcal{X}} \frac{\partial}{\partial \boldsymbol{w}} \frac{\partial}{\partial \boldsymbol{q}} \left[\mathcal{\boldsymbol{\mathcal{Q}}}(\boldsymbol{\mathfrak{x}})p(t,\boldsymbol{y},\boldsymbol{x})\right].\tag{24}$$

*Performing a small-noise expansion [23,31,32], we can expand Equation* (24) *around e* := <sup>1</sup> K *, so that, using the new variables:*

$$
\mathfrak{F} = \sqrt{\mathbf{K}}(\mathfrak{x} - \mathfrak{x}(t)) \,, \quad \upsilon = \sqrt{\mathbf{K}}(\mathfrak{y} - \mathfrak{y}(t)) \,,
$$

*we can see that the small noise expansion to the lowest order does coincide with the linear FP Equation* (14)*.*

*The two expansions have different advantages and disadvantages. In fact, on the one side, considering the full expansion as described above, the nonlinearity of the system is preserved. However, on the other side, given the nonlinear diffusion term, the simulation of Equation* (23) *is more complicated. Further, since Equation* (14) *is linear, its solution can be computed analytically, showing that the process follows a Gaussian distribution.*

### **4. Results**

The present Section reports the implementation of the results derived in Section 3. Figure 1 shows a comparison between the distribution of sublethal lesions (top row) and lethal lesions (bottom row) for the solution to the MME (2) (histogram) and the solution to the FPE (14) (yellow line). Different columns refer to different times: the left column reports time 0.5 [a.u.], the central column reports time 0.7 [a.u], and the right column reports time 0.9 [a.u.]. The solution of the FPE has been centered around the average values *x*¯ and *y*¯, whereas the MME (2) is solved via the *stochastic simulation algorithms* (SSA) [33] (Chapter 13). A deterministic initial value of *x*<sup>0</sup> = 100 and *y*<sup>0</sup> = 0 has been prescribed. Further, GSM<sup>2</sup> parameters have been chosen to be *r* = 4, *a* = 0.1, ˜*b*<sup>K</sup> = 0.01; these parameters are in agreement with the parameters typically used [20]. A good agreement can be seen between the system size approximation and the original solution to the MME, particularly for lethal lesions. The approximation, as expected, shows a small discrepancy in the case of sublethal lesions at higher times, since the solution is closer to 0.

**Figure 1.** Sublethal lesion density evolution (**top row**) and lethal lesion density evolution (**bottom row**) as predicted by GSM<sup>2</sup> (histogram) and the linear noise approximation (yellow line). The left column reports time equal to 0.5 [a.u.], the central column reports time equal to 0.7 [a.u], and the right column reports time equal to 0.9 [a.u.].

Figure 2 shows the time evolution for the moments Equations (11)–(16): in yellow is the solution to the average number of sublethal lesions *x*¯, whereas in blue is the average number of lethal lesions *y*¯. In black is depicted the covariance between lethal and sublethal lesions *cξυ*, in purple the variance of lethal lesions *cξξ* , and in red the variance of lethal lesions *cυυ*. Both the average and variance of lethal lesions converge to 0 for a long time. By contrast, the average and the variance of lethal lesions converge toward a strictly positive value, with the latter being strictly lower than the former. Additionally, the covariance is strictly negative and converges to 0 at long times.

**Figure 2.** Time evolution for the average number and variance of sublethal lesions (yellow and purple), the average number and variance of lethal lesions (blue and red), and the covariance of lethal and sublethal lesions (black).

Figure 3 shows a comparison of 10 path solutions of the average values (black), original GSM<sup>2</sup> formulation (yellow), and linear noise approximation (blue) for sublethal lesions (left panel) and lethal lesions (right panel). It can be seen how the approximation and the original GSM<sup>2</sup> formulation produce similar patterns, with the average values being in the middle of the stochastic solutions.

**Figure 3.** Comparison between path solutions to the GSM<sup>2</sup> (yellow), the linear noise approximation (blue), and the average value (black) of sublethal lesions (**left panel**) and lethal lesions (**right panel**).

### **5. Discussion**

In the present paper, we presented a linear noise approximation of a stochastic model for radiation-induced DNA damage repair and kinetics [10]. Such approximation is carried out by expanding around the system size so that that it holds true for a high number of particles, which can be approximated as a continuum. The fluctuations, for the number of particles sufficiently far from the origin, are predicted to be Gaussian-distributed. The importance of the result is twofold: (i) it allows for the fast computation and simulation of GSM<sup>2</sup> as certain, and (ii) it theoretically shows that the number of lethal lesions deviates from a Poisson distribution, as typically assumed in the vast majority of radiobiological models.

The results show a good agreement between the solution to the MME (2) and the linear noise approximation (14). This is particularly true when the number of lesions is far from the origin. In fact, in this situation, the description of GSM<sup>2</sup> as a continuum of lesions is not valid and discrepancies between the two representations emerge. This is, however, mitigated by equipping the linear FPE with a suitable boundary condition, preserving the positivity of the solution. Further, the main interest in the long-time distribution lies in the

distribution of lethal lesions, which could allow characterizing several relevant biological endpoints such as cell survival and cell killing. This implies that such approximation can be effectively used in several concrete applications even if it is typically difficult to estimate the experimental range in which the approximation proposed is valid.

The numerical solutions to the moments Equations (11)–(16) confirm the theoretical analysis performed in Section 3. In particular, the variance of the lethal lesions is strictly lower than the average; this, together with the Gaussianity of the distribution, implies a divergence from the Poissonianity of the number of lethal lesions. This is one of the first theoretically grounded results showing that a model can predict lethal lesions with non-Poisson distribution. Additionally, the covariance is strictly negative, since an increase in the number of lethal lesions can only be caused by a decrease in the number of sublethal lesions.

Finally, it is worth remarking that this paper furthers the investigation and comparison of diverse existing radiobiology models, revealing the underlying commonalities and shared perspectives among these approaches. The present paper shows the connections of GSM<sup>2</sup> to two other models proposed in the literature. Firstly, the main equations of the MKM arise formally within the context of GSM<sup>2</sup> , with, however, an extremely relevant difference in the fluctuations around the average values. This has been already noted in previous research [10,19]. Secondly, the incorporation of a Gaussian distribution, previously employed in radiobiology studies [24,25], emerges as a deviation from a Poisson distribution. As a result, the proposed model establishes a remarkably insightful link between two seemingly different radiobiological models: the MKM and the Gaussian formulation of a multi-hit model. Recognizing the significance of this subject, future research will be dedicated to further exploring the interconnections among diverse radiobiological models.

### **6. Conclusions**

The present research continues the investigation of how the stochastic nature of energy deposition affects DNA damage evolution and how this is, in turn, related to the overall probability distribution of the number of lethal and sublethal lesions. In [10], a *master equation* for the probability distribution of DNA damage has been derived. However, due to the non-linear terms, besides some cases such as the computation of the survival probability [18], its analytical solution is unfeasible. In the present work, we have shown how a proper expansion can be applied to the MME derived in [10]. Such expansion highlights, on one side, how the *GSM*<sup>2</sup> is connected to the MKM and, on the other side, how non-Poissonian effects naturally emerge with no need for *ad hoc* corrections.

**Funding:** This work was partially supported by INFN CSN5 projects MICROBE-IT and FRIDA.

**Data Availability Statement:** No new data have been created.

**Acknowledgments:** I would like to thank M. Missiaggia for several fruitful discussions that helped significantly improve the results presented in this research.

**Conflicts of Interest:** The authors declare no conflict of interest

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

#### *Article* **DNA Code from Cyclic and Skew Cyclic Codes over** F**4**[*v*]**/**h*v* **3** i

**Om Prakash <sup>1</sup> , Ashutosh Singh <sup>1</sup> , Ram Krishna Verma <sup>2</sup> , Patrick Solé 3,\* and Wei Cheng 4,5**

<sup>1</sup> Department of Mathematics, Indian Institute of Technology Patna, Bihar 801106, India

	- <sup>5</sup> Secure-IC S.A.S., 104 Boulevard du Montparnasse, 75014 Paris, France
	- **\*** Correspondence: patrick.sole@telecom-paris.fr

**Abstract:** The main motivation of this work is to study and obtain some reversible and DNA codes of length *n* with better parameters. Here, we first investigate the structure of cyclic and skew cyclic codes over the chain ring R := F<sup>4</sup> [*v*]/h*v* 3 i. We show an association between the codons and the elements of R using a Gray map. Under this Gray map, we study reversible and DNA codes of length *n*. Finally, several new DNA codes are obtained that have improved parameters than previously known codes. We also determine the Hamming and the Edit distances of these codes.

**Keywords:** reversible code; gray map; DNA codes

### **1. Introduction**

DNA is a nucleic acid used for carrying genetic information in living organisms. It is a double-strand molecule formed from two possible nitrogenous bases—Purines (Adenine and Guanine) and Pyrimidines (Cytosine—and Thymine) and two chemically polar ends, namely, 5 0 and 3 0 . The Watson–Crick complementary (WCC) relation, which is characterized as *A <sup>c</sup>* = *T*, *G <sup>c</sup>* = *C*, and vice versa, is used to bind the bases of DNA. In 1994, Adleman [1] discussed the Hamiltonian path problem using DNA molecules. This (NP-complete) problem is solved by encoding a small graph in DNA molecules where all the operations were carried out using standard protocols such as the WCC relation. Due to massive parallelism, DNA computing emerged as a powerful tool among researchers to solve computationally difficult problems. Further, the experiments are performed on synthesized DNA and RNA molecules to control their combinatorial constraints such as constant *GC*-content and Hamming distance.

Linear codes over finite fields have been explored for almost three decades, but this research area experienced an astonishing rate after the remarkable work of Hammons et al. [2] when they established a relation between linear codes over Z<sup>4</sup> with other nonlinear binary codes. Afterward, many authors [3–6] considered alphabets endowed with a ring structure and found many good linear codes over finite fields via specific Gray maps. Within the class of linear codes, cyclic codes are the pivotal and the most studied codes due to their theoretical richness and practical implementation. Recently, many authors [7–13] constructed DNA codes using cyclic codes over rings. For instance, Bayram et al. [7] and Yildiz and Siap [13] explored DNA codes over the rings F<sup>4</sup> + *v*F4, *v* <sup>2</sup> <sup>=</sup> *<sup>v</sup>* and <sup>F</sup>2[*v*]/h*<sup>v</sup>* <sup>4</sup> <sup>−</sup> <sup>1</sup>i, respectively. In 2019, Mostafanasab and Darani [12] discussed the structure of cyclic DNA codes over the chain ring F<sup>2</sup> + *u*F<sup>2</sup> + *u* <sup>2</sup>F2. Liu et al. [14] worked on cyclic DNA codes of an odd length over F4[*u*]/h*u* 3 i. On the other hand, Boucher et al. [15] introduced skew cyclic codes and discovered many new linear codes. Further, in [16,17], more properties of these codes over chain rings have been established. Recently, Gursoy et al. [18] studied reversible DNA codes by using skew cyclic codes. Later on, Cengellenmis et al. [19] studied DNA codes from skew cyclic codes over the rings *F*2[*u*, *v*, *w*], where *u* <sup>2</sup> = *v* <sup>2</sup> + *v* = *w* <sup>2</sup> + *w* =

**Citation:** Prakash, O.; Singh, A.; Verma, R.K.; Solé, P.; Cheng, W. DNA Code from Cyclic and Skew Cyclic Codes over F4[*v*]/h*v* 3 i. *Entropy* **2023**, *25*, 239. https://doi.org/10.3390/ e25020239

Academic Editor: Pavel Kraikivski

Received: 31 December 2022 Revised: 16 January 2023 Accepted: 19 January 2023 Published: 28 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

*uv* + *vu* = *uw* + *wu* = *vw* + *wv* = 0. Motivated by the above works, we consider cyclic as well as skew cyclic codes over the finite chain ring R = F4[*v*]/h*v* 3 i to construct DNA codes of arbitrary lengths. Hamming and edit distances are also calculated for the obtained codes. Interestingly, we obtain several new codes with better parameters than known codes [14].

The article is structured as follows: The Gray map, together with the correspondence of the codons and the other basic results of cyclic codes, are in Section 2. Reversible cyclic codes over the ring R are covered in Section 3, whereas the reversible skew cyclic codes are studied in Section 4. Some results related to the complement and reverse complement of obtained codes are presented in Section 5. Based on our established results from the previous Sections and magma computer algebra system [20], we provide a few examples of DNA codes of arbitrary lengths in Section 6. In the end, we conclude our work in Section 7.

### **2. Preliminaries**

Let F<sup>4</sup> = {0, 1,t,t <sup>2</sup>}, where <sup>t</sup> <sup>2</sup> <sup>=</sup> <sup>t</sup> <sup>+</sup> <sup>1</sup> be a finite field. Then <sup>R</sup> :<sup>=</sup> <sup>F</sup>4[*v*]/h*<sup>v</sup>* 3 i is a finite chain ring with characteristic 2 and every element *r* of R can be represented as *r* = b<sup>1</sup> + b2*v* + b3*v* <sup>2</sup> where <sup>b</sup>*<sup>i</sup>* <sup>∈</sup> <sup>F</sup>4, for *<sup>i</sup>* <sup>=</sup> 0, 1, 2 and *<sup>v</sup>* <sup>3</sup> <sup>=</sup> 0. It is easy to show that <sup>R</sup> is a principal ideal ring with unique maximal ideal h*vs*.i and R/h*vs*.i is isomorphic to F4. Recall that the ring R has 48 invertible elements of the form *r* = b<sup>1</sup> + b2*v* + b3*v* 2 , where b<sup>1</sup> is invertible in F4.

A linear code C of length *n* and alphabets from R is a submodule of an R-module R*n* . The elements of C are called the codewords. The Hamming weight of an element b = (b0, b1, . . . , b*n*) ∈ C is defined as *wH*(b)= |{*i* | b*<sup>i</sup>* 6= 0}| and Hamming distance *dH*(b,k) between any two elements b = (b0, b1, . . . , b*n*) and k = (k0,k1, . . . ,k*n*) in C is defined as *dH*(b,k) = *wH*(b − k). Additionally, the lowest value in the set {*dH*(b,k) | b 6= k, ∀ b,k ∈ C} is considered as the the Hamming distance *dH*(C) of the code C .

Now, we describe a Gray map Φ : R −→ F 3 4 as:

$$\Phi(\mathfrak{b}\_0 + \mathfrak{b}\_1 v + \mathfrak{b}\_2 v^2) = (\mathfrak{b}\_0 + \mathfrak{b}\_1 + \mathfrak{b}\_2, \mathfrak{b}\_1 + \mathfrak{b}\_2, \mathfrak{b}\_2), \tag{1}$$

where b*<sup>i</sup>* ∈ F<sup>4</sup> for *i* = 0, 1, 2. It is easy to see that the function Φ is a distance-preserving map and is extendable to <sup>R</sup>*<sup>n</sup>* component-wise. In Table 1, we establish the connection between the ring elements and the codons by using the Gray map (1).

**Definition 1.** *For a given polynomial* g(*z*) = g<sup>0</sup> + g1*z* + . . . + g*mz <sup>m</sup>* <sup>∈</sup> <sup>F</sup>4[*z*]*, the reciprocal polynomial is denoted by* g ∗ (*z*) *and defined as* g ∗ (*z*) = ∑ *m i*=0 g*m*−*iz i . A polynomial* g(*z*) *is said to be self-reciprocal if and only if* g ∗ (*z*) = *b*g(*z*) *for some non-zero element b in* F4*.*

Now, we present some useful lemmas that appeared in [8,14].

**Lemma 1.** *Let* g(*z*) *and* h(*z*) *be polynomials over* R *of degrees r and s, respectively, with r* ≥ *s. Then:*


**Lemma 2.** *Let* f(*z*)*,* g(*z*)*, and* h(*z*) *be polynomials over* R *of degrees r, s, and t, respectively, where r* ≥ *s*, *t. Then:*


Using the Watson–Crick complementary relation, we define the reverse (**R**) and the reverse complement (**RC**) of a DNA codeword b = (b0, b1, . . . , b*n*−1) by b *<sup>r</sup>* = (b*n*−1, . . . , <sup>b</sup>1, <sup>b</sup>0) and b *rc* = (b *c n*−1 , . . . , b *c* 1 , b *c* 0 ), respectively. For example, given b = *ATCCGT*, we obtain b *<sup>r</sup>* = *TGCCTA* and b *rc* = *ACGGAT*.

We have the following observations based on the Gray map provided in Equation (1).


**Table 1.** Codons correspondence with the elements of R.

**Lemma 3.** *1. For any a* = (b<sup>0</sup> + b1*v* + b2*v* 2 ) ∈ R*, we have*

Φ(b<sup>0</sup> + b1*v* + b2*v* 2 ) *<sup>r</sup>* <sup>=</sup> <sup>b</sup><sup>1</sup> <sup>+</sup> <sup>b</sup>0*<sup>v</sup>* + (b<sup>0</sup> <sup>+</sup> <sup>b</sup><sup>1</sup> <sup>+</sup> <sup>b</sup>2)*<sup>v</sup>* 2 , *where* b0, b1, b<sup>2</sup> ∈ F4.

*2.* Φ(b<sup>0</sup> + b1) *<sup>r</sup>* = Φ(b0) *<sup>r</sup>* + Φ(b1) *r* , *where* b0, b<sup>1</sup> ∈ F4*.*

### **3. Reversible Cyclic Codes over** R

In the present section, we investigate the structure of cyclic codes and prove reversible conditions on these codes. The cyclic codes of odd lengths are provided in [14] and a detailed discussion on cyclic codes of arbitrary length with alphabets from Z2[*u*]/h*v* 3 i is explored in [6]. Now, in the subsequent theorems, we describe the structure of the cyclic code. We omit the proof due to its similarity to the proof provided in [6].

**Theorem 1.** *Let* C *be a cyclic code of length n over* R*. Then the code* C *is provided by:*

$$\mathcal{C} = \langle \mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z), va\_1(z) + v^2p(z), v^2a\_2(z) \rangle\_{\mathcal{H}}$$

*where a*2(*z*)|*a*1(*z*)|g0(*z*)|(*z <sup>n</sup>* <sup>−</sup>1) *over* <sup>F</sup>4*, <sup>a</sup>*1(*z*)|g1(*z*)( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> g0(*z*) )*, <sup>a</sup>*2(*z*)|*p*(*z*)( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> *a*1 (*z*) )*, and a*2(*z*)|g2(*z*) ( *z <sup>n</sup>*−<sup>1</sup> g0(*z*) )( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> *a*1 (*z*) ) *over* F4*. Moreover, de*g(g2(*z*)) < *de*g(*a*2(*z*))*, de*g(*p*(*z*)) < *de*g(*a*2(*z*))*, and de*g(g1(*z*)) < *de*g(*a*1(*z*))*.*

**Corollary 1.** *If the length of a cyclic code* C *is odd and* g1(*z*) = g2(*z*) = *p*(*z*) = 0*, then* C = hg0(*z*), *va*1(*z*), *v* 2 *a*2(*z*)i = hg0(*z*) + *va*1(*z*) + *v* 2 *a*2(*z*)i*.*

A similar result is also possible when *n* is not odd. In this case, we assume that gcd( *z <sup>n</sup>*−<sup>1</sup> *a*2(*z*) , g0(*z*)) = 1 and consequently obtain the following result.

**Corollary 2.** *If a cyclic code* C *is of even length n and* gcd( *z <sup>n</sup>*−<sup>1</sup> *a*2(*z*) , g0(*z*)) = 1*, then* g1(*z*) = g2(*z*) = *p*(*z*) = 0*.*

When *a*2(*z*) = g0(*z*), then *a*2(*z*) = *a*1(*z*) = g0(*z*) and C as a subset of hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*)i. Since the other containment is true by the definition of <sup>C</sup>, we, therefore, obtain the following corollary.

**Corollary 3.** *For a cyclic code* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *va*<sup>1</sup> + *v* <sup>2</sup> *p*(*z*), *v* 2 *a*2(*z*)i*, if a*2(*z*) = g0(*z*)*, then* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*)i*.*

**Definition 2.** *Given a code* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *va*1(*z*) + *v* <sup>2</sup> *p*(*z*), *v* 2 *a*2(*z*)i *over* R*, we define* C*<sup>v</sup>* <sup>2</sup> *by* {*q*(*z*) ∈ F4[*z*] | *v* 2 *q*(*z*) ∈ C}*. Particularly, since a*2(*z*)|*a*1(*z*)|g0(*z*)*,* C*v* <sup>2</sup> = h*a*2(*z*)i*.*

In the next result, we determine the Hamming distance of the code C by using the above definition in terms of the Hamming distance of C*<sup>v</sup>* 2 .

**Theorem 2.** *Let* C *be a code provided by* C = hg(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *va*1(*z*) + *v* <sup>2</sup> *p*(*z*), *v* 2 *a*2(*z*)i*. Then Hamming distance of* C *and* C*<sup>v</sup>* <sup>2</sup> *are equal, i.e., dH*(C) = *dH*(C*<sup>v</sup>* <sup>2</sup> ).

**Proof.** It can be obtained from [4].

**Remark 1.** *For the sake of brevity, we use b for polynomial b*(*z*) *whenever b*(*z*) *belongs to the field* F4*.*

**Lemma 4.** *Let* g0(*z*), g1(*z*) *and* g2(*z*) ∈ F4[*z*] *of degrees r*,*s and t, respectively. Then* (g0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*))<sup>∗</sup> = g ∗ 0 (*z*) + *vzr*−*s*g ∗ 2 (*z*) + *v* 2 *z <sup>r</sup>*−*t*g ∗ 2 (*z*)*.*

**Theorem 3.** *Let* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*)<sup>i</sup> *be a cyclic code of even length over* <sup>R</sup> *with monic polynomials* g0(*z*)*,* g1(*z*) *and* g2(*z*) *of degrees r*,*s and t, respectively. Then the code* C *is reversible if and only if:*


**Proof.** Let C be a reversible cyclic code. Then

$$\begin{split} (\mathfrak{g}\_0(\mathfrak{z}) + v\mathfrak{g}\_1(\mathfrak{z}) + v^2\mathfrak{g}\_2(\mathfrak{z}))^\* &= \mathfrak{g}\_0^\*(\mathfrak{z}) + v\mathfrak{z}^{r-s}\mathfrak{g}\_2^\*(\mathfrak{z}) + v^2\mathfrak{z}^{r-t}\mathfrak{g}\_2^\*(\mathfrak{z}) \text{ and} \\ (\mathfrak{g}\_0(\mathfrak{z}) + v\mathfrak{g}\_1(\mathfrak{z}) + v^2\mathfrak{g}\_2(\mathfrak{z}))^\* &= b(\mathfrak{z})(\mathfrak{g}\_0(\mathfrak{z}) + v\mathfrak{g}\_1(\mathfrak{z}) + v^2\mathfrak{g}\_2(\mathfrak{z})) \in \mathcal{C} \\ &= (b\_0(\mathfrak{z}) + vb\_1(\mathfrak{z}) + v^2b\_2(\mathfrak{z}))(\mathfrak{g}\_0(\mathfrak{z}) + v\mathfrak{g}\_1(\mathfrak{z}) + v^2\mathfrak{g}\_2(\mathfrak{z})) \\ &= b\_0(\mathfrak{z})\mathfrak{g}\_0(\mathfrak{z}) + v(b\_0(\mathfrak{z})\mathfrak{g}\_1(\mathfrak{z}) + b\_1(\mathfrak{z})\mathfrak{g}\_0(\mathfrak{z})) \\ &\quad + v^2(b\_0(\mathfrak{z})\mathfrak{g}\_2(\mathfrak{z}) + b\_1(\mathfrak{z})\mathfrak{g}\_1(\mathfrak{z}) + b\_2(\mathfrak{z})\mathfrak{g}\_0(\mathfrak{z})). \end{split}$$

Comparing right side of the two equations, we obtain g ∗ 0 (*z*) = *b*0(*z*)g0(*z*), *z <sup>r</sup>*−*s*g ∗ 1 (*z*) = *b*0(*z*)g1(*z*) + *b*1(*z*)g0(*z*) and *z <sup>r</sup>*−*t*g ∗ 2 (*z*) = *b*0(*z*)g2(*z*) + *b*1(*z*)g1(*z*) + *b*2(*z*)g0(*z*). Now, using deg f ∗ (*z*) ≤ deg f(*z*), we obtain *b*0(*z*) 6= 0 in F<sup>4</sup> and this implies that the polynomial g0(*z*) is self-reciprocal. Therefore, *z <sup>r</sup>*−*s*g ∗ 1 (*z*) = *b*0g1(*z*) + *b*1(*z*)g0(*z*) where *b*<sup>0</sup> = *b*0(*z*) is a non-zero element in F4. Now comparing the degrees of both sides, we obtain a constant polynomial *b*1(*z*) ∈ F4, say, *b*1. We have *z <sup>r</sup>*−*t*g ∗ 2 (*z*) = *b*0g2(*z*) + *b*1g1(*z*) + *b*2(*z*)g0(*z*). Again, comparing the degrees of both sides, we obtain *b*2(*z*) in F4, say *b*2. Thus, *z <sup>r</sup>*−*s*g ∗ 1 (*z*) = *b*0g1(*z*) + *b*1g0(*z*) and *z <sup>r</sup>*−*t*g ∗ 2 (*z*) = *b*0g<sup>2</sup> + *b*1g1(*z*) + *b*2g0(*z*) where *b*<sup>0</sup> ∈ F<sup>4</sup> \ {0} and *b*1, *b*<sup>2</sup> ∈ F4.

Conversely, assume (1) and (2) hold. Then

$$\begin{aligned} \left(\mathfrak{g}\_{0}(\boldsymbol{z}) + v\mathfrak{g}\_{1}(\boldsymbol{z}) + v^{2}\mathfrak{g}\_{2}(\boldsymbol{z})\right)^{\*} &= \mathfrak{g}\_{0}^{\*}(\boldsymbol{z}) + v\boldsymbol{z}^{r-s}\mathfrak{g}\_{1}^{\*}(\boldsymbol{z}) + v^{2}\boldsymbol{z}^{r-t}\mathfrak{g}\_{2}^{\*}(\boldsymbol{z}) \\ &= b\_{0}\mathfrak{g}\_{0}(\boldsymbol{z}) + vb\_{0}\mathfrak{g}\_{1}(\boldsymbol{z}) + vb\_{1}\mathfrak{g}\_{0}(\boldsymbol{z}) + v^{2}b\_{0}\mathfrak{g}\_{2}(\boldsymbol{z}) \\ &\quad + v^{2}b\_{1}\mathfrak{g}\_{1}(\boldsymbol{z}) + v^{2}b\_{2}\mathfrak{g}\_{0}(\boldsymbol{z}) \\ &= b\_{0}(\mathfrak{g}\_{0}(\boldsymbol{z}) + v\mathfrak{g}\_{1}(\boldsymbol{z}) + v^{2}\mathfrak{g}\_{2}(\boldsymbol{z})) + b\_{1}(v\mathfrak{g}\_{0} + v^{2}\mathfrak{g}\_{1}) \\ &\quad + b\_{2}(v^{2}\mathfrak{g}\_{0}(\boldsymbol{z})) \in \mathcal{C} \end{aligned}$$

Thus, the code C is reversible.

**Theorem 4.** *Let* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *v* 2 *a*2(*z*)i *be a cyclic code of even length n over* R *with polynomials* g0(*z*)*,* g1(*z*)*, and* g2(*z*) *of degrees r*,*s, and t, respectively, and r* > max{*s*, *t*}*. Furthermore, assume that a*2(*z*)|g0(*z*)|(*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>)*. Then the code* <sup>C</sup> *is reversible if and only if:*


**Proof.** Let C be a reversible code. Then

$$(\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z))^\* = \mathfrak{g}\_0^\*(z) + vz^{r-s}\mathfrak{g}\_1^\*(z) + v^2z^{r-t}\mathfrak{g}\_2^\*(z).$$

Furthermore,

$$\begin{aligned} \left(\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z)\right)^\* &= b(z)(\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z)) + v^2 c(z)a\_2(z) \\ &= (b\_0(z) + vb\_1(z) + v^2b\_2(z))(\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z)) \\ v^2 \mathfrak{g}\_2(z)) + v^2 c(z)a\_2(z) \text{ where } b\_i(z), c(z) \in \mathbb{F}\_4[z] \\ &= b\_0(z)\mathfrak{g}\_0(z) + v(b\_0(z)\mathfrak{g}\_1(z) + b\_1(z)\mathfrak{g}\_0(z)) + v^2 \\ &\quad (b\_0(z)\mathfrak{g}\_2(z) + b\_1(z)\mathfrak{g}\_1(z) + b\_2(z)\mathfrak{g}\_0(z) + c(z)a\_2(z)). \end{aligned}$$

Comparing both equations, we obtain *b*0(*z*) ∈ F<sup>4</sup> \ {0}, say *b*0, this implies that g0(*z*) is selfreciprocal. Therefore, *z <sup>r</sup>*−*s*g ∗ 1 (*z*) = *b*0g1(*z*) + *b*1g0(*z*) and *z <sup>r</sup>*−*t*g ∗ 2 (*z*) = *b*0g2(*z*) + *b*1g1(*z*) + *b*2(*z*)g0(*z*) + *c*(*z*)*a*2(*z*); this implies that *a*2(*z*) divides *z <sup>r</sup>*−*t*g ∗ 2 (*z*) + *b*0g2(*z*) + *b*1g1(*z*). Again, *v* 2 *a* ∗ 2 (*z*) ∈ C and hence *a*2(*z*)|g0(*z*) implies that *a*2(*z*) is self-reversible.

Conversely, suppose conditions (1) and (2) hold. Then

$$\begin{aligned} \left(\mathfrak{g}\_{0}(\boldsymbol{z}) + v\mathfrak{g}\_{1}(\boldsymbol{z}) + v^{2}\mathfrak{g}\_{2}(\boldsymbol{z})\right)^{\*} &= \mathfrak{g}\_{0}^{\*}(\boldsymbol{z}) + v\boldsymbol{z}^{r-s}\mathfrak{g}\_{1}^{\*}(\boldsymbol{z}) + v^{2}\boldsymbol{z}^{r-t}\mathfrak{g}\_{2}^{\*}(\boldsymbol{z}) \\ &= b\_{0}\mathfrak{g}\_{0}(\boldsymbol{z}) + v(b\_{0}\mathfrak{g}\_{1}(\boldsymbol{z}) + b\_{1}\mathfrak{g}\_{0}(\boldsymbol{z})) + v^{2}(b\_{0}\mathfrak{g}\_{2}(\boldsymbol{z}) \\ &+ b\_{1}\mathfrak{g}\_{1}(\boldsymbol{z}) + c(\boldsymbol{z})a\_{2}(\boldsymbol{z})) \text{ for some } c(\boldsymbol{z}) \in \mathbb{F}\_{4}[\boldsymbol{z}] \\ &= b\_{0}(\mathfrak{g}\_{0}(\boldsymbol{z}) + v\mathfrak{g}\_{1}(\boldsymbol{z}) + v^{2}\mathfrak{g}\_{2}(\boldsymbol{z})) + vb\_{1}(\mathfrak{g}\_{0}(\boldsymbol{z}) \\ &+ v\mathfrak{g}\_{1}(\boldsymbol{z}) + v^{2}\mathfrak{g}\_{2}(\boldsymbol{z})) + c(\boldsymbol{z})v^{2}a\_{2}(\boldsymbol{z}) \in \mathcal{C}. \end{aligned}$$

Therefore, C is reversible.

The following theorem states the reversible condition of odd length codes or a code satisfying Corollary 2.

**Theorem 5.** *Let* C = hg0(*z*), *va*1(*z*), *v* 2 *a*2(*z*)i *be a cyclic code over* R *with a*2(*z*)|*a*1(*z*)|g0(*z*)|(*z n* −1)*. Then code* C *is reversible if and only if polynomials* g0(*z*)*, a*1(*z*) *and a*2(*z*) *are self-reversible.*

**Proof.** Let C be a reversible code. Then for some polynomials *b*0(*z*), *b*1(*z*) and *b*2(*z*) in F4[*z*], we have (g0(*z*))<sup>∗</sup> = *b*0(*z*)g0(*z*) + *vb*1(*z*)*a*1(*z*) + *v* 2 *b*2(*z*)*a*2(*z*).

Comparing both sides, we obtain *b*0(*z*) ∈ F<sup>4</sup> \ {0}, say *b*0, since *deg*f ∗ (*z*) ≤ *deg*f(*z*), then g0(*z*) is self-reciprocal. Similarly, *a*1(*z*) and *a*2(*z*) are self-reciprocal polynomials.

Conversely, let the polynomials g0(*z*), *a*1(*z*), and *a*2(*z*) be self-reciprocal. Then, elements of C are provided by the polynomial *b*0(*z*)g0(*z*) + *vb*1(*z*)*a*1(*z*) + *v* 2 *b*2(*z*)*a*2(*z*), therefore by Lemma 4, we have

$$\begin{aligned} (b\_0(z)\mathfrak{g}\_0(z) + vb\_1(z)a\_1(z) + v^2b\_2(z)a\_2(z))^\* &= (b\_0(z)\mathfrak{g}\_0(z))^\* + v(b\_1(z)a\_1(z))^\*z^{r-s} \\ &+ v^2(b\_2(z)a\_2(z))^\*z^{r-t} \\ &= b\_0^\*(z)\mathfrak{g}\_0^\*(z) + vz^{r-s}b\_1^\*(z)a\_1^\*(z) \\ &+ v^2z^{r-t}b\_2^\*(z)a\_2^\*(z) \in \mathcal{C}. \end{aligned}$$

Thus, C is reversible.

Now, in the following result, we determine the rank of a code C. The proof is followed by similar arguments as in Theorem 3 of [6].

**Theorem 6.** *Let* C *be a cyclic code of length n over* R *such that*

C = hg0(*z*) + *v*g1(*z*) + *v* 2 g2(*z*), *va*1(*z*) + *vp*(*z*), *v* 2 *a*2(*z*)i,

*where* g0(*z*), g1(*z*), g2(*z*)*, and a*2(*z*) *are polynomials in* F4[*z*] *and de*g(g0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*)) = *r*, *de*g(*a*1(*z*)) = *s and de*g(*a*2(*z*)) = *t. Then* C *is a free module and rank*(C) = *n* − *t. Moreover, the basis of* C *is provided by the set S, where*

$$\begin{split} \mathbb{S} &= \{ (\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z)), \mathfrak{x}(\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z) + v^2\mathfrak{g}\_2(z)), \dots, z^{n-r-1}(\mathfrak{g}\_0(z) + v\mathfrak{g}\_1(z)) \\ &+ v^2\mathfrak{g}\_2(z)), (va\_1(z) + v^2p(z)), \mathfrak{x}(va\_1(z) + v^2p(z)), \dots, z^{r-s-1}(va\_1(z) + v^2p(z)), \\ &v^2a\_2(z), v^2xa\_2(z), \dots, v^2z^{s-t-1}a\_2(z)) \}. \end{split}$$

### **4. Reversible Skew Cyclic Codes over** R

In this part, we focus on the structure of skew cyclic codes over R and establish a necessary and sufficient condition for these codes to be reversible. We first define the skew polynomial ring over R and provide some definitions that will be used in this section.

Let *θ* ∈ *Aut*(F4) be defined by *θ*(*a*) = *a* 2 . Now, consider a map *σ* : R −→ R defined by:

$$
\sigma(a\_0 + a\_1 v + a\_2 v^2) = \theta(a\_0) + \theta(a\_1)v + \theta(a\_2)v^2,
$$

where *a*0, *a*1, *a*<sup>2</sup> ∈ F4. Since *σ* is an extension of *θ*, *σ* is an automorphism of R. Let us consider the set:

$$\mathcal{R}[z;\sigma] = \{a\_0 + a\_1z + \dots + a\_nz^n \mid a\_i \in \mathcal{R} \,\forall \, i, n \in \mathbb{N}\}.$$

Define the addition on R[*z*; *σ*] as the usual addition of polynomials and multiplication under the rule (*aiz i* )(*ajz j* ) = *aiσ i* (*aj*)*z i*+*j* . Then, it is easy to show that R[*z*; *σ*] forms a ring under the above binary operations, known as a skew polynomial ring. Here, (*aiz i* )(*ajz j* ) 6= (*ajz j* )(*aiz i* ) unless *σ* is identity automorphism.

**Definition 3.** *Let <sup>τ</sup><sup>σ</sup>* : <sup>R</sup>*<sup>n</sup>* −→ R*<sup>n</sup> be a skew cyclic shift operator defined by:*

$$\pi\_{\sigma}(a\_0.a\_1,\ldots,a\_{n-1}) = (\sigma(a\_{n-1}), \sigma(a\_0), \ldots, \sigma(a\_{n-2})), \forall \ (a\_0.a\_1,\ldots,a\_{n-1}) \in \mathcal{R}^n.$$

*, a linear code* C *of length n over* R *is said to be skew cyclic code if for any codeword c* ∈ C*, their skew cyclic shift τσ*(*c*) *belongs to* C*, that is, τσ*(C) = C*.*

**Definition 4.** *For skew polynomials, a*(*z*) *and b*(*z*) 6= 0*, the polynomial b*(*z*) *is said to be rightly divided by a*(*z*) *if and only if there exists a skew polynomial q*(*z*) *such that a*(*z*) = *q*(*z*)*b*(*z*) *and we denote it by b*(*z*)|*ra*(*z*)*.*

Using similar arguments as in the commutative case, we provide the structure of the skew cyclic codes over R for automorphism *σ*.

**Theorem 7.** *Let* <sup>C</sup> *be a skew cyclic code in* <sup>R</sup>[*z*;*σ*] h*z <sup>n</sup>*−1i *. Then,* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *va*1(*z*) + *v* <sup>2</sup> *p*(*z*), *v* 2 *a*2(*z*)i *with a*2(*z*)|*ra*1(*z*)|*r*g0(*z*)|*r*(*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>) *in* <sup>F</sup>4[*z*; *<sup>θ</sup>*]*, <sup>a</sup>*1(*z*)|*r*g1(*z*)( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> g0(*z*) ) *and a*2(*z*) *right divides p*(*z*)( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> *a*1 (*z*) )*, and* g2(*z*)( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> g0(*z*) )( *<sup>z</sup> <sup>n</sup>*−<sup>1</sup> *a*1 (*z*) )*.*

**Proof.** Consider the ring R<sup>0</sup> = F4 [*v*] h*v* 2i and *σ* 0 ∈ *Aut*(R<sup>0</sup> ). For a skew cyclic code C over R, define a map *<sup>ψ</sup>*<sup>1</sup> : R → R<sup>0</sup> by *<sup>ψ</sup>*1(*<sup>a</sup>* <sup>+</sup> *bv* <sup>+</sup> *cv*<sup>2</sup> ) = *a* + *bv* where *a*, *b*, *c* ∈ F. Then, *ψ*<sup>1</sup> is a ring homomorphism that can be extended to a homomorphism *<sup>φ</sup>* : C → <sup>R</sup><sup>0</sup> [*z*;*σ* 0 ] h*z <sup>n</sup>*−1i defined by

$$
\phi(c\_0 + c\_1 z + \dots + c\_{n-1} z^{n-1}) = \psi\_1(c\_0) + \psi\_1(c\_1)z + \dots + \psi\_1(c\_{n-1})z^{n-1}.
$$

Then *ker*(*φ*) = {*v* 2 *r*(*z*) : *r*(*z*) ∈ F4[*z*; *θ*]/h*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>i}.

In order to determine the generators of cyclic code in R*<sup>n</sup>* = R[*z*, *σ*]/h*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>i, we need to know the image of *φ* which is a skew cyclic code in R<sup>0</sup> *<sup>n</sup>* = R<sup>0</sup> [*z*, *σ*2]/h*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>i.

Let *D* be a cyclic code in R<sup>0</sup> *n* . Now, define a map *ψ*<sup>2</sup> : R<sup>0</sup> → F<sup>4</sup> by *ψ*2(*a* + *ub*) = *a* 2 . Then *ψ*<sup>2</sup> is a ring homomorphism. We extend *ψ*<sup>2</sup> to a ring homomorphism *ϕ* : *D* → F4[*z*; *θ*]/h*z <sup>n</sup>* <sup>−</sup> <sup>1</sup><sup>i</sup> defined by

$$
\varphi(d\_0 + d\_1 z + \dots + d\_{n-1} z^{n-1}) = \psi\_2(d\_0) + \psi\_2(d\_1) z + \dots + \psi\_2(d\_{n-1}) z^{n-1}.
$$

Then,

$$\begin{aligned} \ker(\varphi) &= \{ \operatorname{vr}'(z) : \operatorname{r}'(z) \text{ is a skew polynomial in } \mathbb{F}\_4[z;\theta] / \langle z^n - 1 \rangle \}, \\ &= \langle \operatorname{va}\_1(z) \rangle \text{ with } \operatorname{a}\_1(z)|\_r (z^n - 1). \end{aligned}$$

Since the set image(*ϕ*) is also an ideal and hence a skew cyclic code generated by g0(*z*), where g0(*z*) right divides (*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>). Therefore, *<sup>D</sup>* <sup>=</sup> <sup>h</sup>g0(*z*) + *<sup>v</sup>*g1(*z*), *va*1(*z*)<sup>i</sup> where *a*1(*z*)|*r*g0(*z*) and *a*1(*z*)|*r*(g1(*z*) *z <sup>n</sup>*−<sup>1</sup> g0(*z*) ).

Similarly, the set image(*φ*) is an ideal over R<sup>0</sup> . Therefore, skew cyclic code C over R is provided by C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *va*1(*z*) + *v* <sup>2</sup> *p*(*z*), *v* 2 *a*2(*z*)i with *a*2(*z*)|*ra*1(*z*)|*<sup>r</sup>* g0(*z*)|*r*(*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>) and *<sup>a</sup>*1(*z*)|*r*(g1(*z*) *z <sup>n</sup>*−<sup>1</sup> g0(*z*) ), *a*2(*z*)|*r*(g1(*z*) *z <sup>n</sup>*−<sup>1</sup> g0(*z*) ).

**Definition 5.** *Let* g(*z*) = g<sup>0</sup> + g1*z* + . . . + g*mz <sup>m</sup> be a polynomial in* F4[*z*, *θ*]*. Then,* g(*z*) *is said to be a palindromic polynomial if* g*<sup>i</sup>* = g*m*−*<sup>i</sup> and θ-palindromic if* g*<sup>i</sup>* = *θ*(g*m*−*i*) *where i* ∈ {1, 2, . . . , *m*}*.*

Note that if the length of the code C is odd, then the skew cyclic codes and cyclic codes are equivalent (Theorem 8 in [17]). Now, we provide two lemmas to check the reversibility of the even length skew cyclic codes over the field F4.

**Lemma 5.** *Let* C *be a skew cyclic code of even length generated by a monic polynomial* f(*z*) = 1 + f1*z* + . . . + f*m*−1*z <sup>m</sup>*−<sup>1</sup> + *z <sup>m</sup> of even degree, where* <sup>f</sup>(*z*)|*r*(*<sup>z</sup> <sup>n</sup>* <sup>−</sup> <sup>1</sup>) *in* <sup>F</sup>4[*z*, *<sup>θ</sup>*]*. Then, the code* <sup>C</sup> *is reversible if and only if skew polynomial* f(*z*) *is θ-palindromic.*

**Proof.** Let C be a skew cyclic code of even length generated by the *θ*-palindromic polynomial f(*z*) of even degree *m* over the ring F4. Then, the elements of the generated code are pro-

vided by ∑ *k*−1 *i*=0 *αiz i* f(*z*). From the repetitive use of Lemma 3, for *c* = *φ*(∑ *k*−1 *i*=0 *αiz i* f(*z*)) ∈ C, we obtain:

$$(\phi(\sum\_{i=0}^{k-1} \alpha\_i z^i \mathfrak{f}(z)))^r = \phi(\sum\_{i=0}^{k-1} \alpha\_i z^{k-i-1} \mathfrak{f}(z)) \in \mathcal{C} \mathcal{L}$$

where *α* ∈ F<sup>4</sup> and *k* = *n* − *m*. Since *c <sup>r</sup>* belongs to the code <sup>C</sup>, <sup>C</sup> is a reversible code.

Conversely, let C be a reversible code generated by f(*z*) = <sup>1</sup>+f1*z* + . . . +f*m*−1*z <sup>m</sup>*−<sup>1</sup> + *z m*. Then, because *n* − *m* − 1 is odd:

$$z^{n-m-1}\mathfrak{f}(z) = z^{n-m-1} + \theta(\mathfrak{f}\_1)z^{n-m} + \dots + \theta(\mathfrak{f}\_{m-1})z^{n-2} + z^{n-1}.$$

Since C is a skew cyclic and reversible code,

$$[z^{n-m-1}\mathfrak{f}(z)]^r = 1 + \theta(\mathfrak{f}\_{m-1})z + \theta(\mathfrak{f}\_{m-2})z^2 + \dots + \theta(\mathfrak{f}\_1)z^{m-1} + z^m \in \mathcal{C}\mathcal{A}$$

Further, we obtain *de*g(f(*z*) − [*z n*−*m*−1 f(*z*)]*<sup>r</sup>* ) < *m*, which contradicts the fact that f(*z*) is a minimal degree polynomial in C implies f(*z*) − [*z n*−*m*−1 f(*z*)]*<sup>r</sup>* = 0. Comparing coefficients, we obtain:

$$[\mathfrak{f}\_i - \theta(\mathfrak{f}\_{m-i})] = 0$$

for *i* = 1, . . . , *m* − 1. Thus, f*<sup>i</sup>* = *θ*(f*m*−*i*) and the polynomial f(*z*) is *θ*-palindromic.

**Lemma 6.** *Let* C *be a skew cyclic code of even length generated by a monic polynomial* f(*z*) = 1 + f1*z* + . . . + f*m*−1*z <sup>m</sup>*−<sup>1</sup> + *z <sup>m</sup> of odd degree, where* <sup>f</sup>(*z*)|*r*(*<sup>z</sup> <sup>n</sup>* <sup>−</sup> <sup>1</sup>) *in* <sup>F</sup>4[*z*, *<sup>θ</sup>*]*. Then, the code* <sup>C</sup> *is reversible if and only if the skew polynomial* f(*z*) *is palindromic.*

**Proof.** Let C be a skew cyclic code of even length generated by a palindromic polynomial f(*z*) of odd degree *m* over the ring F4. Then, elements of the generated code are provided by ∑ *k*−1 *j*=0 *αjz j* f(*z*). From the repetitive use of Lemma 3 and using the property of the palindromic polynomial, for C = *φ*(∑ *k*−1 *j*=0 *αjz j* f(*z*)) ∈ C, we obtain:

$$(\phi(\sum\_{j=0}^{k-1} \alpha\_j z^j \mathfrak{f}(z)))^r = \phi(\sum\_{j=0}^{k-1} \alpha\_j z^{k-j-1} \mathfrak{f}(z)) \in \mathcal{C}.$$

where *α* ∈ F<sup>4</sup> and *k* = *n* − *m*. Since the reverse of C belongs to C, the code C is reversible. Conversely, let C be a reversible code generated by f(*z*) = 1 + f1*z* + . . . + f*m*−1*z <sup>m</sup>*−<sup>1</sup> + *z m*. Since *n* − *m* − 1 is even:

$$z^{n-m-1}\mathfrak{f}(z) = z^{n-m-1} + \mathfrak{f}\_1 z^{n-m} + \dots + \mathfrak{f}\_{m-1} z^{n-2} + z^{n-1}z$$

Furthermore, the code C is a skew cyclic as well as reversible code; therefore, [*z n*−*m*−1 f(*z*)]*<sup>r</sup>* ∈ C and:

$$[z^{n-m-1}\mathfrak{f}(z)]^r = [1 + \mathfrak{f}\_{m-1}z + \mathfrak{f}\_{m-2}z^2 + \dots + \mathfrak{f}\_1 z^{m-1} + z^m] \in \mathcal{C}\mathcal{L}$$

This implies that *de*g(f(*z*) − [*z n*−*m*−1 f(*z*)]*<sup>r</sup>* ) < *m*, which contradicts the fact that f(*z*) is a minimal degree polynomial in C. Hence, f(*z*) − [*z n*−*m*−1 f(*z*)]*<sup>r</sup>* = 0. By comparing the coefficients, we obtain

$$[\mathfrak{f}\_i - \mathfrak{f}\_{m-i}] = 0 \text{ and } \mathfrak{f}\_i = \mathfrak{f}\_{m-i}\nu$$

for *i* = 1, . . . , *m* − 1. Thus, the given polynomial f(*z*) is palindromic.

Now, in the next theorem, we provide necessary and sufficient conditions for a skew cyclic code C to be reversible in terms of palindromic and *θ*-palindromic polynomials. These conditions depend on the degree of generator polynomials of C.

**Theorem 8.** *Let* C = hg0(*z*), *v*g1(*z*), *v* <sup>2</sup>g2(*z*)<sup>i</sup> *be a skew cyclic code of even length, where* <sup>g</sup>*i*(*z*) *right divides* (*z <sup>n</sup>* <sup>−</sup> <sup>1</sup>) *in* <sup>F</sup>4[*z*, *<sup>θ</sup>*] *and de*g(g*i*(*z*)) *is even (odd), for <sup>i</sup>* <sup>=</sup> 0, 1, 2*. Then, the code* <sup>C</sup> *is reversible if and only if skew polynomials* g*i*(*z*) *are θ-palindromic (palindromic) for i* = 0, 1, 2*.*

### **5. DNA Codes over** R

In this section, we discuss the complementary condition of the codes obtained from previous sections to obtain DNA codes. For a DNA code, the reversible and complement conditions are essential [21].

**Definition 6.** *Let* C *be a code of length n over* R*. If* Φ(C) *rc* <sup>∈</sup> <sup>Φ</sup>(C) *for all <sup>c</sup>* ∈ C*, then* <sup>C</sup> *or equivalently* Φ(C) *is called a DNA code.*

In the following lemma, we provide some relations on ring elements and their complement using the Gray map provided in Equation (1).

**Lemma 7.** *For the given cyclic code in Section 3, the following conditions hold:*


**Proof.** This lemma can easily be proved by observing Table 1.

**Remark 2.** *We identify* i(*z*) *by the polynomial* 1 + *z* + *z* <sup>2</sup> <sup>+</sup> · · · <sup>+</sup> *<sup>z</sup> n*−1 .

**Theorem 9.** *Given a polynomial* a(*z*) *in* R[*z*]*. We have* a(*z*) *rc* = a(*z*) *<sup>r</sup>* + *v* 2 i(*z*)*.*

**Proof.** Let C be a reversible-complement code. Then, by definition, C is reversible and 0 ∈ C implies that (0 + 0*z* + . . . + 0*z n*−1 ) *<sup>c</sup>* ∈ C. That is, <sup>C</sup> is reversible and *<sup>v</sup>* <sup>2</sup> + *v* 2 *z* + . . . + *v* 2 *z <sup>n</sup>*−<sup>1</sup> ∈ C.

Conversely, let a(*z*) = a<sup>0</sup> + a1*z* + . . . + a*n*−1*z <sup>n</sup>*−<sup>1</sup> + a*nz <sup>n</sup>* be a polynomial in <sup>R</sup>[*z*]. Then:

$$\begin{split} \mathfrak{a}(z)^{rc} &= \mathfrak{a}\_{n}^{c} + \mathfrak{a}\_{n-1}^{c}z + \dots + \mathfrak{a}\_{1}^{c}z^{n-1} + \mathfrak{a}\_{0}^{c}z^{n} \\ &= \mathfrak{a}\_{n} + v^{2} + (\mathfrak{a}\_{n-1} + v^{2})z + (\mathfrak{a}\_{n-2} + v^{2})z^{2} + \dots \\ &\quad + (\mathfrak{a}\_{1} + v^{2})z^{n-1} + (\mathfrak{a}\_{0} + v^{2})z^{n} \\ &= v^{2}\mathfrak{i}(z) + \mathfrak{a}(z)^{r} \in \mathcal{C}. \end{split}$$

Thus, cyclic code C is a reversible-complement code.

**Corollary 4.** *Let* C *be a cyclic code of even length over* R*. Then,* C *is a DNA code if and only if* C *is reversible and v*<sup>2</sup> i(*z*) *is in* C*.*

**Proof.** It is obvious from above theorem.

### **6. Computational Results**

Now, we provide some examples of DNA codes satisfying the above-mentioned constraints. We consider DNA code of any length (even or odd). All the computational works are performed by using Magma software [20].

**Example 1.** *In* F4[*z*]*, we have:*

$$z^6 - 1 = (z+1)^2(z+\mathfrak{t})^2(z+\mathfrak{t}^2)^2.$$

*Let* C *be a cyclic code of length n* = 6 *over* R *provided by:*

$$\mathcal{C} = \langle z^4 + z^2 + 1, v(z^4 + z^2 + 1), v^2(z^4 + z^2 + 1) \rangle.$$

*Then, using Theorem 2, we obtain d*(C) = 3*. Furthermore,* (*x* − 1) *does not divide* (*z* <sup>4</sup> + *z* <sup>2</sup> + 1) *and polynomial* (*z* <sup>4</sup> + *z* <sup>2</sup> <sup>+</sup> <sup>1</sup>) *is self reciprocal. Thus, we obtain a DNA code* <sup>C</sup> *of parameters* (18, 4<sup>6</sup> , 3)*.*

In the next example, we provide some DNA codes of arbitrary lengths that are generated from cyclic codes over R.

**Example 2.** *Suppose* C *is a cyclic code of the form* C = hg0(*z*) + *v*g1(*z*) + *v* <sup>2</sup>g2(*z*), *va*1(*z*) + *v* <sup>2</sup> *p*(*z*), *v* 2 *a*2(*z*)i, *where* gcd( *z <sup>n</sup>*−<sup>1</sup> *a*2(*z*) , g0(*z*)) = 1*. If* g0(*z*) = *a*1(*z*) = *a*2(*z*)*, then we list several DNA codes in Table 2 that are obtained from cyclic code* C*. Since* g0(*z*)*, a*1(*z*)*, and a*2(*z*) *are equal, therefore, in Table 2, we mention only* g0(*z*)*. For brevity, polynomial z* <sup>2</sup> + b1*z* + b<sup>0</sup> *is represented as* b0b11*.*


**Table 2.** DNA codes of different lengths.

**Example 3.** *Consider a cyclic code* C *of length n* = 9 *over ring* R*. In* F4[*z*]*, we have:*

$$z^9 - 1 = (z+1)(z+\mathfrak{t})(z+\mathfrak{t}^2)(z^3+\mathfrak{t})(z^3+\mathfrak{t}^2).$$

*To write briefly, we identify factors by* g1, g2, g3, g4*, and* g5*, respectively. The codes for n* = 9 *are provided in Table 3. All the codes are better than the codes that appeared in [14].*

**Example 4.** *Consider a cyclic code* C *of length n* = 15 *over ring* R*. In* F4[*z*]*, we have*

$$\begin{aligned} z^{15} - 1 &= (z+1)(z+\mathfrak{t})(z+\mathfrak{t}^2)(z^2+z+\mathfrak{t})(z^2+z+\mathfrak{t}^2)(z^2+\mathfrak{t}z+1), \\ &(z^2+\mathfrak{t}z+\mathfrak{t})(z^2+\mathfrak{t}^2z+\mathfrak{t})(z^2+\mathfrak{t}^2z+\mathfrak{t}^2). \end{aligned}$$

*For brevity, we identify the factors by* g1, g2, g3, g5, g6, g7, g8, *and* g9*, respectively. DNA codes for n* = 15 *are provided in Table 4. All the obtained DNA codes are better than the codes provided in [14].*

*In particular, if* C = hg2g3g4g5g6g7g8g9, *v*g2g3g4g5g6g7g8g9, *v* <sup>2</sup>g2g3g4g5g6g7g8g9i*, then we obtain a DNA code with parameters* [45, 4<sup>3</sup> , 15]*. Further, we list all the DNA codewords of the obtained DNA code in Table 5. Furthermore, the edit distance of the obtained DNA code is* 2*, given by the codewords "TCCTCCTCCTCCTCCTCCTCCTCCTCC" and "CTCCTCCTCCTCCTCCTC-CTCCTCCTC".*





**Table 5.** Codewords of length 45 and dimension 3.


### **7. Conclusions**

In this paper, we have studied reversible and DNA codes using the chain ring R = F4[*v*]/h*v* 3 i. We have defined a Gray map on R and found codons corresponding to the elements of R. In this way, we have obtained good DNA and reversible codes with the Hamming distances. In the future, one can work on DNA codes over a generalized structure of R as well as DNA codes by using skew polynomial rings.

**Author Contributions:** This work is initiated by A.S. and R.K.V. under the supervision of O.P. Then we discussed it with P.S. and W.C. to reach the final version. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding for APC. Financial support for this work is properly acknowledged in the acknowledgement section.

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author upon reasonable request.

**Acknowledgments:** The first and second authors are thankful to the DST and the CSIR, Govt. of India, for financial support under CRG/2020/005927, vide Diary No. SERB/F/6780/ 2020-2021 dated 31 December 2020 and under File No. 09/1023(0027)/2019- EMR-1, respectively. Furthermore, we thank the Indian Institute of Technology Patna for providing research facilities.

**Conflicts of Interest:** The authors declare that there is no conflict of interest regarding the publication of this manuscript.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
