1. Introduction
Musical large-scale forms, i.e., the temporal organization of a musical piece over its whole length, such as a song’s verse and refrain structure, the tension build-up and relaxation of electronic dance music, or sonata form, are poorly researched in terms of brain dynamics [
1,
2]. It is widely accepted that music is organized according to Gestalt principles [
3,
4], often with hierarchical structures, psychologically [
5,
6], or in terms of music theory [
7], and is subject to long-term memory [
8].
The processing of sound in the brain has historically been understood as a bottom-up process, starting from the transduction of sound into neural spikes in the cochlea [
9], further processed in the auditory pathway [
10], to coincidence detection [
9], tonotopy, interaural level and time difference detection, and pitch perception [
11,
12], among others. Still, even within the auditory cortex, multiple bottom-up and top-down connections are present [
13], and so viewing the auditory pathway as a complex, self-organizing neural network seems more appropriate.
Such a self-organizing view is standard in terms of cortical processing, and existing neural models vary in terms of complexity and scaling [
14]. Only a few brain models try to understand the brain with simple principles like the free-energy principle [
15], assuming adaptation of the brain to external surprises; the global workspace view [
16], assuming synchronization and de-synchronization of brain parts over time; or the synergetic approach of Gestalt perception [
17].
Machine learning models likewise view conscious content as a process of complex, often nonlinear interactions between single neurons, leading to heuristic, coherent, Gestalt-like connectionist [
18,
19] and music models [
20,
21] when analyzing large musical databases such as Computational Phonogram Archiving [
22] for streaming platforms or archives in general [
23].
The self-organizing concept is also reflected in the idea of fifty-millisecond intervals of organized neural spatiotemporal patterns followed by short, chaotic disturbances associated with olfactory [
24] or auditory [
25,
26] conscious content. Enlarging the picture to interactions between subjects results in the idea of a self-organizing society of brains [
27] or the inclusion of cultural artifacts and nature in Physical Culture Theory [
28].
Incorporating brains with cultural objects requires a general framework which was proposed as the Impulse Pattern Formulation (IPF), first developed for musical instruments [
As the only force we physically experience, next to gravity, is electricity, it is straightforward to treat acoustic impulses and electric nervous spike impulses as being of the same nature. The IPF then takes the viewpoint of a neuron or a musical instrument part, from which an impulse is sent out to several other neurons, instrument parts, or any other kind of object. This impulse is processed, damped, and returned to the viewpoint object [
12,
29,
30]. Such an iterative, nonlinear dynamical process is scale-free, capable of modeling sudden phase changes, and includes convergent, bifurcating, complex, and chaotic states. Although often modeling a system with only very few nodal points, the IPF has already been shown to be of high precision in musical instrument applications [
30] as well as in rhythm perception and production [
31].
The model was also formulated as a brain model [
28] with neural adaptation and plasticity, non-trivially finding that a share of 10–20% inhibitory vs. excitatory neurons, corresponding to the actual ratio in the brain [32], maximizes possible system convergence. The model also finds a maximum reflection strength around the typical latency of event-related potentials, as well as a decay of memory in the system corresponding to short-term memory. These general findings strongly support the validity of the model.
Investigating the brain dynamics using large-scale musical forms has already shown that musical tension can be represented in the increasing synchronization of brain parts [
1,
2,
33]. Such findings correspond to synchronization caused by expectancy [
34] of a climax. This represents the development of tension over the large-scale form of a musical piece, but it might also correspond to other semantic content such as anger, anxiety, drama, relaxation, spirituality, or meditation. Such a relation might be found by referring to the lyrics of a song, as discussed in the conclusion section below with respect to the piece investigated here. Nevertheless, in EEG measurements, the main finding is that brain synchronization is strongest around 50 Hz, and thus in the gamma band of brain dynamics, although of course other brain rhythms exist [
32].
An interesting aspect of the IPF Brain model is the presence of convergence, i.e., the complete synchronization of neurons in the model. This corresponds to epilepsy [
35], which is not a usual brain state, and a very dangerous one. Still, partial synchronization in the brain is well known in the form of chimera states [
36]. Also, in terms of the auditory cortex, temporal lobe epilepsy has often been reported [
37], and is associated with spiritual or meditative states of mind. Such states are naturally associated with longer time spans than pitch or rhythm and are subject to large-scale forms.
The present paper applies the IPF Brain model as suggested previously [
28] to the case of an electronic dance music (EDM) piece
One Mic by the rapper NAS, which has been investigated before in EEG experiments [
1,
2] as well as with a FitzHugh–Nagumo dynamical brain model [
33]. Strong correlations with the large-scale form of the piece, represented by the sound amplitude and the fractal correlation dimension as a measure of musical density [29], have been found both experimentally and in the model, with maximum synchronization around 50 Hz, in the gamma band of brain dynamics, in both cases. Still, because the FitzHugh–Nagumo model does not resolve the temporal development of brain dynamics, a deeper understanding of the reason for this frequency-dependent synchronization requires a model capable of such temporal resolution, which the IPF Brain model provides in this paper.
The paper first introduces the methods used. As in the previous paper, the input to the Brain IPF is the output of a Finite-Difference Time-Domain (FDTM) physical model of the cochlea, which takes the musical sound as input. The output of the cochlea model is then fed into the Brain IPF. To estimate the influence of the number of neurons used in the model, N = 50, 100, and 200 neurons are taken, showing the independence of the model from this parameter. In Section 2, the post-processing of the modeled parameters is discussed, especially the Kuramoto order parameter, which measures the synchronization of neurons, and the correlation of this parameter with the cochlea input. Section 3 then mainly concentrates on this correlation and discusses the reason for the model behavior. In Section 4, an overview of different conscious states with respect to the synchronization frequency is given.
2. Methods
2.1. Cochlea Model
The present model assumes the differential equation of a membrane with basilar membrane displacement u along a one-dimensional axis x; a basilar membrane stiffness changing along the x axis; a linear density, i.e., the mass per area, again changing along the basilar membrane; and a term depending on the basilar membrane length, taking the slight widening of the basilar membrane over its length into account.
A 1D model is sufficient to model a basilar membrane, as shown in [
38], which compared 1D and 2D models based on the anisotropy of the basilar membrane as discussed by [
39]. Here, it was found, numerically and based on experimental data, that including a second dimension changes the results already obtained with a 1D model by less than 1%. This is reasonable, as the basilar membrane is about 3.5 cm long but only about 1 mm wide and is therefore more a rod than a membrane. Also, the Young’s modulus in the y-direction is only about 10% of that in the x-direction [38] and therefore does not contribute considerably to the overall basilar membrane movement.
To confirm this finding, a two-dimensional model was built, with a Young’s modulus in the x-direction as in the 1D model above and a Young’s modulus in the y-direction of about 10% of that value, according to the literature [38]. The linear density of the 1D model was kept, as was the damping. The model again consists of 48 nodal points in the x-direction and 6 nodal points in the y-direction. The boundary conditions were again simply supported. Still, the results did not differ considerably [9]. For the sake of simplicity, and also taking the computational cost into consideration, the 1D model was used in the following.
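For illustration, a minimal sketch of such a one-dimensional finite-difference scheme is given below, written in Python rather than the C++/CUDA implementation actually used. The stiffness and density profiles, the damping constant, and the drive signal are assumptions chosen only to show the explicit update structure on 48 nodal points, with the membrane driven at the base and fixed at the far end; they are not the calibrated parameters of the cochlea model.

```python
import numpy as np

# Minimal 1D finite-difference sketch of a basilar-membrane-like system.
# All parameter values are illustrative assumptions, not the calibrated
# values of the cochlea model described in the text.

n_nodes = 48                # nodal points along the membrane, as in the text
length = 0.035              # basilar membrane length, about 3.5 cm
dx = length / (n_nodes - 1)
fs = 192_000                # FDTM sample rate used in the text
dt = 1.0 / fs

x = np.linspace(0.0, length, n_nodes)
stiffness = 100.0 * np.exp(-4.0 * x / length)   # assumed decreasing stiffness profile
density = 0.1 * (1.0 + x / length)              # assumed slowly increasing mass density
damping = 200.0                                 # assumed damping constant

u_prev = np.zeros(n_nodes)  # displacement at t - dt
u = np.zeros(n_nodes)       # displacement at t

def step(u, u_prev, drive):
    """One explicit time step; the base node is driven by the sound sample."""
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    acc = (stiffness * lap - damping * (u - u_prev) / dt) / density
    u_next = 2.0 * u - u_prev + dt**2 * acc
    u_next[0] = drive        # driven end (sound input)
    u_next[-1] = 0.0         # fixed far end
    return u_next, u

# drive with a short 1 kHz burst as a stand-in for the musical sound
t = np.arange(0.0, 0.01, dt)
sound = 1e-6 * np.sin(2.0 * np.pi * 1000.0 * t)
for sample in sound:
    u, u_prev = step(u, u_prev, sample)

print("maximum displacement along the membrane:", np.abs(u).max())
```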
The electronic dance music piece One Mic by the rapper NAS was used as the sound input to the cochlea model. The FDTM sample rate was 192 kHz to ensure model stability. The output is a set of spike time points at each of the 24 Bark bands B, indexed i = 1, 2, 3, …, up to the maximum number of spikes at the respective Bark band. Each spike time point has an associated amplitude. Although single spikes have a more or less uniform amplitude, the cochlea nerve fiber output of the model sums many spikes at each time point. Therefore, the amplitude represents the number of spikes, i.e., the output strength.
As input to the IPF Brain model, all outputs, accumulated over 20 ms time intervals, were used, corresponding to 500 Hz, which is then the time constant of the IPF Brain model discussed below. Therefore, the frequency range of interest in the brain is well represented up to about 200 Hz. The resulting cochlea output time series runs over the length of the musical piece of 266 s, and the functions G and H detect whether the respective spike lies within the respective time window. For further correlation with the IPF Brain model output, this time series is additionally averaged over time windows of 1 s. The unit-less mean amplitude of the cochlea time series is 0.000118.
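A minimal sketch of this accumulation step is shown below, again in Python for illustration. The spike times and amplitudes are random placeholders standing in for the FDTM cochlea output, and the variable names are hypothetical; only the binning into fixed 20 ms windows and the additional 1 s averaging follow the description above.

```python
import numpy as np

# Sketch of accumulating the cochlea spike output into fixed time windows.
# The spike times and amplitudes below are random placeholders; in the study
# they come from the FDTM cochlea model at 24 Bark bands.

rng = np.random.default_rng(0)
piece_length_s = 266.0      # length of the musical piece
n_bark = 24
window_s = 0.02             # 20 ms accumulation window, as stated in the text
n_windows = int(piece_length_s / window_s)
win_per_sec = int(round(1.0 / window_s))

cochlea_series = np.zeros(n_windows)

for _ in range(n_bark):
    # placeholder spike times and amplitudes for one Bark band
    times = rng.uniform(0.0, piece_length_s, size=5000)
    amps = rng.uniform(0.5, 1.0, size=5000)
    idx = (times / window_s).astype(int)
    mask = idx < n_windows
    np.add.at(cochlea_series, idx[mask], amps[mask])   # sum spike amplitudes per window

# additionally average over 1 s windows for later correlation with the IPF output
per_second = cochlea_series[: (n_windows // win_per_sec) * win_per_sec]
per_second = per_second.reshape(-1, win_per_sec).mean(axis=1)

print("mean amplitude:", cochlea_series.mean(), "| 1 s series length:", len(per_second))
```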
2.2. Brain Impulse Pattern Formulation (IPF)
The brain is modeled using N = 50 neurons. Each neuron is a reflection point, returning impulses sent out by a viewpoint neuron. The system state of the viewpoint neuron is g, which represents a time period and an amplitude strength. Each reflection neuron i has a damping. The IPF is then given by Equation (4). Here, the viewpoint neuron is i = 1, and the reflections come from neurons i = 2, 3, 4, …, N. The polarizations of the neurons are +1 for an excitatory neuron and −1 for an inhibitory neuron. Throughout the paper, a share of 10% inhibitory neurons is used. Note that the sum of all reflections is normalized by the number of neurons N. The model is discrete with time steps t = 0, 1, 2, 3, …. Therefore, the earlier states of the viewpoint neuron, which this neuron has sent out to the other neurons, return after a delay in a damped and polarized form.
For a deeper discussion of the model, see [
28].
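The following sketch illustrates the kind of iteration described above, written in Python for illustration. The exact update rule of the Brain IPF is given in Equation (4) and in [28]; the logarithmic form used below, as well as all parameter values, are assumptions modeled on the instrument IPF and serve only to show how delayed, damped, and polarized reflections of earlier viewpoint states enter the iteration.

```python
import numpy as np

# Illustrative sketch of an IPF-like iteration: a viewpoint neuron receives
# delayed, damped, polarized reflections of its own earlier states from N - 1
# reflection neurons. The exact Brain IPF update rule is given in Equation (4)
# and in [28]; the logarithmic form and the parameter values below are
# assumptions serving only to show the structure of the iteration.

rng = np.random.default_rng(1)
N = 50                      # number of neurons (viewpoint plus reflection points)
steps = 1000

alpha_view = 1.5                            # assumed damping of the viewpoint neuron
alpha = rng.uniform(0.1, 0.3, size=N)       # assumed per-neuron reflection dampings
beta = np.ones(N)                           # polarizations: +1 excitatory, -1 inhibitory
beta[rng.choice(N, size=N // 10, replace=False)] = -1.0   # 10% inhibitory neurons

g = np.ones(steps + N)      # system state g of the viewpoint neuron over discrete time

for t in range(N, steps + N - 1):
    # earlier viewpoint states g[t - i] return damped and polarized, normalized by N
    refl = sum(beta[i] * alpha[i] * np.exp(g[t] - g[t - i]) for i in range(1, N)) / N
    arg = abs(g[t] - refl)                  # absolute value avoids a negative log argument
    g[t + 1] = g[t] - np.log(arg / alpha_view)

print("system state after the transient:", g[-5:])
```

With these assumed small reflection strengths, the iteration converges towards a fixed point of the system state, illustrating the convergent regime mentioned above; other parameter choices lead to bifurcating or chaotic behavior.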
2.3. Plasticity Model
The plasticity of each neuron is calculated for each time step t. Plasticity refers to a change in the damping parameter, where each time step t then might have a different damping. Note that in the IPF reasoning, the damping originally enters as its reciprocal, which, for the sake of convenience, is skipped here, using the damping parameter itself instead.
For each time step, the new damping is calculated from the logarithm of the relation between the reflected neuron state and the viewpoint neuron state, scaled by a plasticity constant.
If the reflection point neuron state at time t − i has the same value as the viewpoint neuron value, the logarithm becomes zero, and no change in damping happens. If the reflection point neuron i has a larger value than the viewpoint neuron, the logarithm becomes larger than zero, and the damping increases. Otherwise, the logarithm assures a negative influence, and the damping decreases. The plasticity process is scaled by a constant; therefore, plasticity can be switched off in the model by setting this constant to zero. To examine different model behaviors, this constant will systematically be altered, as shown below. Again, the absolute value of the damping is used, not allowing negative or complex values. This, again, does not change the model behavior due to the logarithms used; still, positive values are more convenient. Indeed, negative arguments of the logarithm appear very rarely in the simulations shown below and are additionally suppressed by using the absolute value.
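A sketch of such a plasticity update is given below. The specific rule used here, updating each damping with the logarithm of the ratio between the reflected and the viewpoint state scaled by a constant c, is an assumption consistent with the description above; the symbol c and the function name are hypothetical.

```python
import numpy as np

# Sketch of a plasticity update for the per-neuron dampings alpha_i. The rule
# alpha_i <- |alpha_i + c * ln(|g[t - i] / g[t]|)| is an assumption consistent
# with the text: no change if the reflected state equals the viewpoint state,
# an increase if it is larger, a decrease if it is smaller, and c = 0 switches
# plasticity off. Degenerate states (g[t] = 0) are not handled here.

def update_damping(alpha, g, t, c=0.01):
    """Return updated dampings for all reflection neurons i = 1 ... N - 1."""
    new_alpha = alpha.copy()
    for i in range(1, len(alpha)):
        ratio = abs(g[t - i] / g[t])        # absolute value suppresses negative arguments
        new_alpha[i] = abs(alpha[i] + c * np.log(ratio))
    return new_alpha

# usage inside the iteration sketched above (alpha, g, t as defined there):
# alpha = update_damping(alpha, g, t, c=0.01)
```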
2.4. External Musical Instrument Input IPF
As in a previous study [
33], the electronic dance music piece
One Mic by the rapper NAS has been used.
2.5. Detection of System Behavior
Each neuron has its own time series when reflecting back to the central neuron, as can be seen in Equation (4), i.e., the state of the ith neuron at time point t with its respective damping at that time point. These time series are Fourier-analyzed with time windows of one second, resulting in spectra for each neuron at each time interval T, where 1 ≤ T ≤ 266 s, i.e., the length of the musical piece. All IPF simulations in this paper were performed with a maximum frequency of 500 Hz; therefore, 1 Hz ≤ f ≤ 500 Hz. The time series of the central neuron will be labeled below as simply g.
Synchronization was measured, as in a previous paper, using the Kuramoto order parameter
r(f, T) = \left| \frac{1}{N} \sum_{i=1}^{N} e^{\mathrm{i}\, \varphi_i(f, T)} \right| ,
where \varphi_i(f, T) is the phase of the ith neuron at time interval T and frequency f, taken from the Fourier analysis described above. N is the number of neural reflection points; in this study, N = 50, N = 100, and N = 200. The Kuramoto order parameter is used as it is the most widely accepted measure of synchronization. It holds that 0 ≤ r(f, T) ≤ 1, with r(f, T) = 0 in the case of no synchronization and r(f, T) = 1 in the case of maximum synchronization.
The synchronization order parameter r(f, T) is time-dependent. To estimate the overall synchronization strength, a time-averaged mean of r(f, T) is calculated for each frequency.
Also, the correlation of r(f, T) with the cochlea input time series is computed for each frequency, leading to an estimate of the frequency dependence of the correlation strength between the synchronization and the musical input.
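A minimal sketch of this post-processing chain, from the per-neuron time series over the 1 s Fourier windows to the Kuramoto order parameter, its time average, and its correlation with the cochlea input, is given below in Python; the actual analysis was performed in Mathematica. The neuron signals and the cochlea series are random placeholders, while the 500 Hz rate, the 1 s windows, and N = 50 follow the text.

```python
import numpy as np

# Sketch of the synchronization analysis: Fourier-analyze each neuron's time
# series in 1 s windows, take the phases per frequency bin, compute the
# Kuramoto order parameter r(f, T), its time average, and its correlation with
# the 1 s averaged cochlea input. The neuron signals and the cochlea series
# are random placeholders standing in for the IPF and cochlea model output.

rng = np.random.default_rng(2)
fs = 500                    # IPF time steps per second, as in the text
N = 50                      # number of neural reflection points
duration = 266              # length of the piece in seconds

signals = rng.standard_normal((N, duration * fs))    # placeholder neuron time series
cochlea = rng.uniform(size=duration)                  # placeholder 1 s cochlea series

# phases per neuron, 1 s window T, and frequency bin (1 Hz resolution here)
windows = signals.reshape(N, duration, fs)
phases = np.angle(np.fft.rfft(windows, axis=-1))      # shape (N, duration, fs // 2 + 1)

# Kuramoto order parameter r(f, T) = |(1/N) * sum_i exp(1j * phi_i(f, T))|
r = np.abs(np.mean(np.exp(1j * phases), axis=0))      # shape (duration, n_freqs)

r_mean = r.mean(axis=0)     # time-averaged synchronization per frequency
corr = np.array([np.corrcoef(r[:, f], cochlea)[0, 1] for f in range(r.shape[1])])

print("frequency bin of maximum mean synchronization:", int(np.argmax(r_mean)))
print("frequency bin of maximum correlation with the input:", int(np.argmax(corr)))
```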
The Finite-Difference Time-Domain model of the cochlea was implemented in C++, C#, and CUDA code using Visual Studio 2012 and 2017. The CUDA code runs the model on an NVIDIA Graphics Processing Unit, calculating the model nodal points in parallel. The IPF model was run, and the results analysis was performed, using Mathematica 12.
4. Conclusions and Discussion
The association of temporal lobe epilepsy with spiritual or meditative experiences corresponds well with the intention of the musical piece One Mic used in this investigation. The lyrics report on the hard struggle within criminal gangs, with police encounters and shootings. Several references are made to spiritual, especially Christian, symbols and to similar ways of suffering. These lyrics are rapped during the large-amplitude time frames of the piece, i.e., the verses, where a police siren can also be heard. These sections are followed by the low-amplitude parts, where the refrain All I need is one mic is repeated, pointing to music and lyric production as an alternative or a weapon against such a hard struggle. These refrain sections contrast with the verses, as they are presented in a contemplative or meditative way.
This compositional tool of presenting a meditative alternative by reducing the volume, reducing the beat from semi-quavers to quavers, and omitting most of the sound effects and keyboard pads of the verse is shown in this paper to produce strongly enhanced synchrony and convergence of the neural network. This synchrony is found in the low frequency range, where the musical rhythm is represented.
Such musical structures are very simple and appear in many musical pieces containing regions of low volume and isochronous rhythm. Further investigations are needed to model and measure the neural reaction to such musical content in more detail. Still, due to the simplicity of this compositional tool, one can expect composers and musicians to use it in many musical scenarios.
In a previous similar EEG study of a musical piece,
Classical Symphony by Shemian, increased brain synchronization was found towards an expectancy point, after which synchronization decreased again [
1]. This is a typical electronic dance music piece in that tension is built up by compositional tools like increased amplitude and event density, only to climax at a point where the dense structure ends and a four-to-the-floor bass drum starts. This is a typical compositional tool in electronic dance music: a tension build-up and decay repeated dozens of times during a song. This compositional tool is again clearly reflected in the EEG data on the level of the large-scale musical form.
In this study, brain synchronization followed the reasoning of a coincidence detection mechanism of cortical oscillators, modeling neural activity in the striatum [
34]. After the start of a neural oscillation, increased synchronization peaks occur at the point where a subject expects an event to happen, like waiting at a traffic light for it to turn green. The oscillation is then expected to include motor regions, making us shake nervously towards the expected time point.
This might be considered another compositional tool to make people dance. A tension build-up, leading to a neural oscillation that includes the motor region, enhances the will of subjects to move and, in the case of music, to dance. Although there is strong evidence that this is the case, there is no final proof, as measurements in the motor region are still missing in the musical case.
This line of reasoning seems fundamentally different from that followed in the present paper, namely neural synchronization leading to a meditative mood. Still, both cases are confirmed experimentally, as is the case of temporal lobe epilepsy [
37]. The difference might indeed be found in the different frequency ranges, where increased synchronization at higher frequencies, around 50 Hz, might cause the perception of increased tension and synchronization at low frequencies, contrary to that of a meditative state. Taking into account that synchronization and de-synchronization are fundamental activities in the brain, and brain activity represents all possible states of mind, perception, and consciousness, such a differentiation seems plausible.