1. Introduction
Models of the human body have gained increasing interest in clinical research and are essential for delivering personalized diagnoses and treatments to patients. They can be used to build a digital twin of a patient's body for planning curative interventions, predicting the outcomes of intended treatments, or estimating the likelihood of relapses and complications. Most of these models require, in addition to knowledge of the patient's exact anatomy, information about physiological processes, such as the impedance of the fibrous tissue forming an infarction scar, which differs substantially from the impedance of intact myocardium.
Electrical impedance tomography (EIT) and electrical capacitance tomography (ECT) are used to measure tissue parameters such as impedance or capacitance [1,2,3]. The origin of cardiac arrhythmias or myocardial infarctions can be identified by integrating ECG recordings [4,5,6], and functions within the human brain can be visualized using models that include EEG recordings [7,8]. All these methods require the positions of between 12 and a few hundred sensors to be known exactly. The larger the positional error, the lower the diagnostic value of the results generated by the model, and the less suitable they are for treatment planning, guidance, outcome stratification, or the prevention of complications and relapses.
A commonly used approach is to extract the sensor positions, along with the anatomical details, from Magnetic Resonance Imaging (MRI) stacks or X-ray Computed Tomography (CT) slices. Both approaches require special markers to be attached to the sensors that are visible in MRI [9,10] or CT scans [11]. Identifying the sensor positions from MRI or CT scans yields the smallest errors relative to the true sensor positions. However, this approach significantly hinders the clinical uptake and widespread use of electrical impedance tomography (EIT), electrical capacitance tomography (ECT), noninvasive imaging of cardiac electrophysiology (NICE), and other model-based approaches. When CT scans are used, patients are exposed to large amounts of ionizing radiation, limiting the aforementioned methods to three applications per year. Although MRI is not bound by this limitation, it is only covered by insurance companies if it is required for obtaining a proper diagnosis and evaluating outcomes.
Given these limitations, alternative approaches that decouple the generation of the underlying anatomical models from the localization of the sensors have been tested [12,13,14]. Alternatives such as magnetic digitizer systems, e.g., the Polhemus Fastrak [12], tracked gaming controllers [13], or motion capture systems have been used to identify the positions of electrodes relative to the patient's body. The use of photogrammetry, visual odometry, and stereoscopic approaches was already considered more than 15 years ago [15,16]. The Microsoft Kinect 3D depth-sensing camera (3D DS) was one of the first compact and affordable devices. Nowadays, modern coded-light and stereo-vision-based models are portable and lightweight enough to be easily attached to, or even integrated within, a standard tablet computer.
In the past few decades, 3D DS cameras have mainly been used in EEG-based studies to locate EEG sensors on the patient's skull [12,14,17]. All of these studies use the recorded EEG signals to localize brain activity or identify the focus of a seizure within the cortex. In contrast, very few studies report the use of 3D DS cameras to locate ECG sensors on the chest or the whole torso [18,19,20]. One reason for this may be that the skull is a rigid structure that does not change its shape when the subject moves during the recording. In contrast, when recording the sensor positions on the torso, the patient needs to maintain a specific posture. The instructions provided to the patient on how to achieve and maintain this posture are integral to the entire recording procedure.
In the present work, the positions of 64 ECG electrodes mounted on the torso are recorded using 3D DS camera readings only.
Section 2 describes the overall structure of the developed 3D DS camera-based system, as well as the method and algorithm used for the real-time recording of the individual 3D views of the torso (Section 2.2); the postprocessing steps necessary for extracting the electrode positions (Section 2.3); and the recording protocol used and the instructions provided to each subject participating in the clinical testing (Section 2.4). In Section 3, the results obtained from the five subjects are presented, and in Section 4, these results are discussed.
2. Materials and Methods
The 3D depth-sensing (3D DS) camera-based measurement of electrode positions can be divided into four main steps: (i) selecting the appropriate 3D DS camera, (ii) defining an appropriate measurement protocol, (iii) recording the 3D surfaces in real time, and (iv) extracting the electrode center points.
The most important component for recording the electrode positions is the 3D camera. It can be characterized by various parameters, such as the closest working distance $d_{\min}$ and the vertical and horizontal fields of view (FOV). These parameters define the volume in front of the camera in which objects must be placed to be accurately captured by the depth sensor. Based on these considerations, the Intel Realsense SR300 3D DS camera [21] was selected. The exact selection criteria that led to this decision are described in Section 2.1.
The human torso represents a flexible object that offers several degrees of freedom for movement and deformation in contrast to the rather rigid skull. The position of each ECG electrode perceived by the 3D DS camera and its relative position to the other electrodes is directly affected by the movements of the patient’s body. Therefore, it is essential to define an appropriate recording protocol before the first 3D data set is recorded. As large displacements may prevent the successful extraction of the electrode positions, the patient is required to actively maintain the same posture throughout the recording procedure. Details on how this active engagement of the patient can be achieved are described in
Section 2.4. For the remaining steps, (iii) real-time recording and (iv) offline processing,
Figure 1 provides an overview of the necessary sub-steps and their interdependence.
A 3D DS camera combines a depth sensor and an RGB color sensor in a single device. These two sensors simultaneously record an RGB color image and a 16-bit depth image D. The latter encodes the distance d between the camera and the objects located in front of it.
The developed real-time recording system is intended for use in diverse clinical settings, such as examination rooms in outpatient clinics or local cardiology practices. The lighting conditions encountered depend on the pointing direction of the camera and on the number of light sources, as well as their brightness and color hue. To handle these conditions properly, the white-balancing setting, the exposure time, and the overall gain of the color sensor are continuously adjusted in real time. Automatic white balancing (AWB), which is described in detail in Section 2.2.1, uses the color image to estimate the color temperature of the dominant light source.
At the same time, a binary mask is generated from the depth image D. This mask splits D into foreground pixels representing the torso surface and pixels representing objects in the background (Section 2.2.3). The mask is used to generate a 3D mesh S of the imaged torso surface (Section 2.2.4) and to tune the exposure time and global gain setting of the color sensor. This is achieved by combining the mask with the brightness information I of the color image obtained during the AWB step (Section 2.2.2). The mask is also used to outline the patient's contours on the real-time preview screen, along with various system parameters. When the trigger is pressed, the triangulation component (Figure 1) generates a 3D surface mesh S, which is stored along with the corresponding texture information of the torso created from the RGB color image.
In the offline processing step, a pairwise iterative closest-point (ICP) algorithm is used to align the recorded surfaces S with each other. The resulting transformation matrices ℜ are used to extract the 3D positions from the color-corrected texture images, which have been stored alongside each S (Section 2.3.2). To facilitate the steps necessary to identify the color markers attached to the electrodes, an additional color-correction step, described in Section 2.3.1, is conducted. The aim of this step is to ensure that the patient's skin color and the marker colors are accurately represented across all the recorded texture images. To achieve this, the texture images are split into a chromaticity image and the corresponding intensity image I. Both are used to identify the red and blue pixels and the related 3D points corresponding to each electrode marker. Details on how this is achieved can be found in Section 2.3.3. The centers of these markers are coaligned with the centers of the electrode clips and patches. Their positions on the surface are computed by fitting a planar model (Section 2.3.4) to the extracted red and blue points. In the final labeling step (Section 2.3.5), the electrode positions are assigned to the corresponding ECG signals recorded from the patient's torso. The colors of the markers vary depending on the position and orientation of the electrode clip relative to the torso and the 3D DS camera. Therefore, a dedicated calibration procedure, outlined in Section 2.3.6, is used to determine the ranges of the red and blue color values that represent the electrode markers.
2.1. Selecting the Camera
The selected Intel Realsense SR300 3D DS camera [21] is used in narrow or crowded places, such as examination rooms in outpatient clinics and cardiology practices. In these places, the patient is typically seated on an examination bed or chair placed close to the wall. Consequently, the closest distance $d_{\min}$ relative to the depth sensor at which objects may be placed has to be shorter than the shortest horizontal distance between the patient's torso and any surrounding obstacles, such as walls or furniture. The horizontal and vertical FOVs determine how tall or wide the closest object can be while still being captured in its full height and width. The minimum required values for the horizontal FOV $\alpha_h$ and the vertical FOV $\alpha_v$ can be approximated from the patient's approximate width $w_P$ and height $h_P$ and the camera's closest working distance $d_{\min}$ using the following relationships:

$\alpha_h \geq 2\arctan\left(\frac{w_P}{2\,d_{\min}}\right), \qquad \alpha_v \geq 2\arctan\left(\frac{h_P}{2\,d_{\min}}\right)$ (1)
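As a numerical illustration of (1), the snippet below evaluates the minimum FOV angles for a torso captured at close range; both the torso extents and the working distance are hypothetical values, not measurements from the study:

```python
import math

def min_fov_deg(extent_m: float, distance_m: float) -> float:
    """Minimum full opening angle (degrees) needed to capture an object
    of the given extent at the given distance (pinhole camera model)."""
    return math.degrees(2.0 * math.atan(extent_m / (2.0 * distance_m)))

# Hypothetical example: a 0.55 m wide, 0.70 m tall torso section
# recorded at an assumed closest working distance of 0.5 m.
print(min_fov_deg(0.55, 0.5))  # required horizontal FOV: ~57.6 degrees
print(min_fov_deg(0.70, 0.5))  # required vertical FOV:   ~70.0 degrees
```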
According to the datasheet [21], the depth sensor can capture objects located at distances between 20 cm and 150 cm from the camera. This range is more than sufficient to record the surface of the torso. The depth information is captured using an infrared sensor in combination with a near-infrared projector [21,22]. The depth images D are recorded in 4:3 format, covering a horizontal FOV of 69 degrees and a vertical FOV of 54 degrees at a depth resolution of less than 1 mm. The color sensor of the camera generates the RGB images in 16:9 format. Its horizontal FOV of 68 degrees is sufficiently well-paired with the horizontal FOV of the depth sensor. With a vertical FOV of 41 degrees, however, it covers only about three-quarters of the depth sensor's height. This results in a lack of color information for the pixels close to the top and bottom edges of the depth image D, which was considered when outlining the measurement protocol in Section 2.4.
2.2. Real-Time Recording
2.2.1. Automatic White Balancing
The color sensor of the Intel Realsense 3D DS camera allows the gains of the red, green, and blue color channels to be tuned indirectly by adjusting its color temperature parameter. This was used to implement a custom AWB component (Figure 1) based on the algorithm proposed in [23], which can handle these varying conditions. After applying a lookup table for linearization (gamma decompression) and normalization to the interval [0, 1] to the red, green, and blue color channels, the resulting linear RGB image is converted into an RGB chromaticity image and a linear grayscale image I that encodes the brightness of each pixel.
From the chromaticity image, all pixels that encode shades of gray are selected. The red r, green g, and blue b chromaticity values of these pixels lie within a small area around the neutral color gamut point, which has a color temperature of 5500 K, as shown in Figure 2. The basic assumption is that these pixels most likely correspond to object surfaces of a neutral gray color. Consequently, a reddish taint in these pixels must be caused by a low color temperature K of the predominant illumination, whereas a bluish cast most likely results from a light source with a large K. Overexposed pixels are excluded, as their color most likely results from the saturation of at least one of the three color channels and thus does not properly represent the skin color of the patient or the color of the illuminant. Likewise, underexposed pixels are not considered, as their color is most likely dominated by camera noise rather than by the light reflected from the imaged object.
For adjusting the color temperature setting of the 3D DS camera, only pixels located within a small area surrounding the neutral color gamut point are selected; according to Cohen [23], this point is defined by fixed red, green, and blue chromaticity values. The selected area encloses all pixels located within two ellipses centered at the color gamut point. Their primary and secondary axes are defined by the standard deviations of the red, green, and blue chromaticity values with respect to the neutral color gamut point, as determined in [23]. The maximum intensity encountered across these pixels is also recorded.
The lower and upper exposure limits, as defined for each channel in [23], are linearized before they are applied to the overall linear intensity values I.
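A minimal NumPy sketch of this pixel selection is given below. For brevity, a single ellipse is used instead of the two ellipses described above, and all numeric parameters (ellipse radii, exposure limits) are illustrative placeholders rather than the calibrated values from [23]:

```python
import numpy as np

def select_neutral_pixels(rgb_lin, r0=1/3, g0=1/3, sr=0.02, sg=0.02,
                          lo=0.05, hi=0.95):
    """Select pixels whose chromaticity lies close to the neutral gamut
    point and whose intensity is neither under- nor overexposed.
    rgb_lin: linearized RGB image, shape (..., 3), values in [0, 1]."""
    s = rgb_lin.sum(axis=-1)                 # per-pixel intensity proxy
    safe = np.where(s > 0, s, 1.0)           # avoid division by zero
    r = rgb_lin[..., 0] / safe               # red chromaticity
    g = rgb_lin[..., 1] / safe               # green chromaticity
    inside = ((r - r0) / sr) ** 2 + ((g - g0) / sg) ** 2 <= 1.0
    exposed = (s / 3.0 > lo) & (rgb_lin.max(axis=-1) < hi)
    return inside & exposed & (s > 0)
```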
To match the camera's color temperature setting with the color temperature K of the light source, the overall color gain of the camera is estimated. The model in (5) is used to simulate how the camera adjusts the gains of its red and blue channels when the color temperature setting is updated.
Neither the lower and upper limits for the red and blue gains, nor the color temperature corresponding to equal gain values, are documented for up-to-date 3D DS cameras. It is assumed that equal gains correspond to the center color temperature between the minimum and maximum values of the color sensor, and on startup, the color temperature setting is initialized to this center value. For each recorded color image, the corresponding value is estimated from its previous value and a scaler reflecting the relative change between two consecutive images. The color sensor of the used camera has a rolling shutter; therefore, color images are only considered for estimating the scaler and the gain after the next exposure time interval has elapsed.
The goal is to minimize the distance between the average red, green, and blue chromaticities of the selected pixels and the chromaticities of the color gamut point corresponding to a color temperature of 5500 K. To achieve this, the average chromaticities are multiplied by the unknown intensity to obtain the corresponding mean red, green, and blue color values, which are then scaled using (5). After scaling, the updated chromaticities are computed using (2). It is obvious that the unknown intensity does not have any impact on the result, so it can be omitted from (6). Consequently, the required gain can be computed directly from the mean chromaticities of the selected pixels and the chromaticities of the neutral color gamut point.
In Figure 2, it can be observed that the curve along which the color gamut point moves can be approximated, for color temperatures below that of the neutral point, by the line connecting the red corner of the chromaticity space and the midpoint between the blue and green corners. For color temperatures above the neutral point, the curve can be approximated by the line connecting the blue corner with the midpoint between the red and green corners. The two midpoints correspond to the yellow and cyan chromaticities, respectively. Based on the ratio of the red and blue chromaticities, the average chromaticity value of the green channel scaled by the gain can be expressed; the resulting expression is inserted into the quadratic Equation (8) obtained from this ratio.
Solving (8) with respect to the gain yields its updated value. Along with it, the actual error E between the chromaticities of the neutral illumination gamut point and the measured mean chromaticities, the expected error after scaling by the updated gain, and the updated color temperature setting are computed using (5). Based on these quantities, the color temperature setting of the 3D DS camera is updated if the expected error is smaller than the actual error E. During testing, it was found that numerical inaccuracies can prevent the computation of appropriate estimates for the color temperature K of the predominant illuminant. Therefore, a numerically stable test is used instead to determine whether the setting has to be updated or whether its current value can be kept.
2.2.2. Patient-Locked Auto-Exposure
In addition to the overall color appearance, the light sources present also affect the overall light intensity I, which, among other factors, can vary depending on the viewing direction of the 3D DS camera. For example, in the case shown in Figure 3a, the camera is pointing toward a window, whereas in Figure 3b, it is pointing in the opposite direction, toward the door.
To maintain a constant illumination intensity I of the patient's torso, independent of the viewing direction and the overall brightness of all present light sources, the histogram-based auto-exposure (AE) algorithm proposed in [24] was adopted.
This algorithm is implemented in the exposure component (Figure 1). It considers only the pixels that correspond to the patient's torso. These pixels are selected by segmenting the depth image D recorded by the 3D DS camera into a foreground object (the patient) and the remaining background using the approach outlined in Section 2.2.3. The binary mask obtained in this segmentation step is mapped to the color image using the texture coordinates computed from the depth image by the camera control library. All brightness values of the pixels covered by the mapped mask are considered for adjusting the exposure. Any other pixels, as well as pixels that are over- or underexposed according to Equation (4), are discarded.
The algorithm proposed by Chen and Li [24] uses the histogram of the gamma-compressed grayscale image. To avoid the computational burden of an explicit conversion between the linear illumination image I and its gamma-compressed counterpart, the histogram is computed directly from the linearized illumination values of the selected pixels. This is accomplished by maintaining a lookup table that lists the linearized bin boundary values corresponding to the uniform bin boundaries of the grayscale histogram. The histogram can then be generated for all considered pixels using a left bisection search on this lookup table, which is far less computationally demanding. A further reduction is achieved by precomputing, for each bin, the differences required to calculate the skewness of the histogram.
To compute the values of the exposure time and overall gain to be set on the camera, an overall exposure parameter is used. One parameter of the update rule represents the size of one step in milliseconds, another the number of steps to take when an adjustment is required, and a third the optimal exposure time for each frame. The step size depends on the actual increments in ms offered by the 3D DS camera.
2.2.3. Depth Segmentation
The binary mask is created from the 16-bit depth images D recorded by the 3D DS camera. It splits the image into the patient and any surrounding objects, obstacles, and relevant edges. The implementation was inspired by the Canny edge detection algorithm proposed in [25]. That algorithm uses two thresholds to find the edges in an image based on the gradient of its corresponding grayscale image: pixels whose gradient value exceeds the upper limit are considered part of an edge, and pixels with a gradient value between the two limits are only included in an edge if they are adjacent to an already identified edge pixel. To improve the obtained set of edges and reduce the number of edges caused by noise, the grayscale image is smoothed using a Gaussian filter.
This approach was adapted for processing depth images, which contain pixels for which no valid depth value is available. Computing the exact depth gradient and the corresponding Gaussian filter weights is too demanding to be done in real time. Therefore, the depth gradient values of D are rounded to the closest 16-bit integer value. The resulting reduced number of distinct gradient values and their corresponding weights are stored in a precomputed weights table instead of being computed on every iteration for each pixel. A companion table with the squared boundary values between the individual gradient levels ensures that the weight for a pixel of D can be retrieved through a fast left bisection search, avoiding the computationally demanding square-root and exponential operations in real time. Pixels without a defined depth represent objects of unknown distance, and their values are copied to the smoothed depth image without any changes.
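A sketch of such a precomputed weight table is shown below; the filter width and the number of quantization levels are assumed values:

```python
import numpy as np

SIGMA = 30.0     # assumed filter width in depth units (e.g., ~3 cm at 1 mm/unit)
N_LEVELS = 256   # assumed number of distinct rounded gradient magnitudes

# One Gaussian weight per quantized gradient magnitude, plus the squared
# boundaries between neighboring levels, so the weight for a pixel can be
# found with a left bisection search on the *squared* gradient; no sqrt
# or exp is evaluated per pixel.
levels = np.arange(N_LEVELS, dtype=np.float64)
weights = np.exp(-0.5 * (levels / SIGMA) ** 2)
sq_bounds = (levels[:-1] + 0.5) ** 2     # boundary between level k and k+1

def weight_for(grad_sq):
    """Gaussian weight(s) for a squared depth-gradient magnitude."""
    idx = np.searchsorted(sq_bounds, grad_sq, side='left')
    return weights[idx]
```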
The smoothed depth image is filtered using an octagonal Laplace kernel to find the initial set of edge pixels. An octagonal kernel has the advantage that all distances between the eight-connected neighbor pixels and the central pixel are of equal length.
All pixels that exhibit a sign change between opposing neighbor pixels in the Laplacian image are included in the initial set of edge points. Pixels that have at least one neighbor with an undefined depth are considered primary edge pixels, and their gradient values are computed using the following approach:
All pixels whose gradient value exceeds the upper limit are marked as edge pixels, whereas all other candidates are only considered if the Canny rule for minor edge pixels holds. This rule has been modified for use on depth images D: the upper Canny limit is set to 1.2 cm and the minor limit is set to 0.35 cm.
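A minimal stand-in for this two-threshold rule is sketched below, using connected-component labeling instead of the incremental edge growing of the original Canny scheme; the per-pixel depth step is assumed to be given in cm:

```python
import numpy as np
from scipy import ndimage

UPPER_CM, MINOR_CM = 1.2, 0.35   # the two Canny-style limits from the text

def hysteresis_edges(step_cm):
    """Two-threshold edge selection on a per-pixel depth-step image:
    strong edges seed the result, and weak edges are kept only when
    they are 8-connected to a strong edge."""
    strong = step_cm >= UPPER_CM
    weak = step_cm >= MINOR_CM
    labels, n = ndimage.label(weak, structure=np.ones((3, 3)))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True   # components containing strong pixels
    keep[0] = False                          # background label
    return keep[labels]
```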
A binary depth mask is created from all pixels in D with a known depth. Pixels located at any of the edges are excluded from the mask. The resulting mask is split into 9 segments. The pixels within the central segment are labeled with respect to the different objects and components they represent, and the labeled four-connected components are sorted by size. The largest component touching the segment boundary is extended to all other segments using the flood-fill method, starting from its center of mass. At the end of this step, all adjacent edge pixels are appended to the extended representation of this component.
As the depth values at the boundaries of the mask can vary largely, the following approach is used to remove any unrelated outliers. It is based on the observation that the boundaries of the patient's torso are well-separated from the background along the vertical direction and above the head. For each row of the mask, the smallest and largest depth values of the mask pixels are determined, along with their means and standard deviations. Any pixel for which the condition in (18) does not hold is removed from the mask. If either the number of mask pixels is less than 200 or no appropriate row statistics can be found, the current component is discarded and the search for a suitable component representing the patient is continued with the next larger one. If no suitable component is left, the segmentation is aborted and real-time processing continues with the next set of depth and color image frames recorded by the 3D DS camera.
2.2.4. Surface Mesh Generation
The final surface mesh S is generated by converting the depth image D into a corresponding point cloud P, in which each point corresponds to a specific pixel in D. Pixels without a defined depth value are assigned the origin point. The unique correspondence between each point and its pixel allows S to be created by mapping a pre-triangulated grid G onto P. Any triangle T that includes at least one point located at the origin is dropped from G.
Before S is stored on disk in the .obj format, along with its texture and the color temperature setting it was recorded with, degenerated and occluded triangles that do not correspond to a valid surface patch are removed. The filtering is facilitated by the fact that 3D DS cameras, especially those that can capture objects located a short distance from the camera, use a dedicated RGB color sensor to record the texture. This sensor is typically attached to the left or right side of the depth sensor system and thus views the imaged object from a slightly different angle. This difference in viewing angle and FOV between the depth and color sensors is sufficiently large to identify triangles that do not represent a part of the object's real surface: the small difference in viewing angle causes the surface normal of such a triangle to flip its direction between its representation in the depth image D and its representation in the texture. This flip is not plausible, as it would mean that the color sensor captures the back side of the triangle while the depth sensor captures its front side, which is prevented by both sensors being mounted on the same support. The following approach exploits this fact by identifying triangles whose surface normal direction appears flipped in the texture compared to D.
The pre-triangulated grid G is initialized such that the normal vector of each triangle T on S points toward the camera, i.e., in the negative viewing direction. For every valid T of the initial surface mesh S, the normal vector of its representation in the texture must point in the same direction. Triangles for which the signs of the two normals are opposite most likely do not represent a valid part of S and are removed.
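Assuming the grid triangles are wound counter-clockwise in texture space, this flip test reduces to the sign of each triangle's doubled signed area, as the following sketch illustrates:

```python
import numpy as np

def texture_flipped(uv, tris):
    """Doubled signed area of each triangle in texture space.
    uv: (N, 2) texture coordinates; tris: (M, 3) vertex indices.
    A negative sign marks a triangle whose normal appears flipped in
    the texture, i.e., a likely occluded or non-causal triangle."""
    a, b, c = uv[tris[:, 0]], uv[tris[:, 1]], uv[tris[:, 2]]
    signed2 = ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
               - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
    return signed2 < 0.0   # counter-clockwise winding assumed for valid triangles
```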
In addition, triangles with a degenerated representation in the texture are removed. This includes triangles whose texture-space area falls below a minimum threshold, triangles whose shortest edge in the texture is less than half a pixel long, and triangles that extend beyond the top and bottom borders of the texture.
Further, skinny triangles are discarded if they enclose at least one angle between two of their edges that is smaller than 13 degrees and if the lengths of their two longest edges conform to the conditions in (21) and (22).
To compute the average edge length and its standard deviation, only triangles are considered that are formed by any three of the K-nearest neighbors located within a fixed radius around the tip vertex of the skinny triangle and the midpoint of its shortest edge. Additionally, any triangle that has to be discarded according to (21) also causes the deletion of all adjacent triangles connected to its two longest edges, whereas for any triangle satisfying (22), only the triangle adjacent to the longest edge is removed. Finally, duplicate vertices encoding the same point and vertices not referenced by any triangle are removed from the surface S, along with all small disconnected surface patches.
The surface S is stored on disk in the .obj format, along with the corresponding texture information. Its triangle and vertex normals are recomputed, and a transformation ℜ is applied to all vertices and normals. The latter ensures that the z-axis points in the direction of the patient's head and that the positive x-axis extends from the left to the right side of the torso. The origin is selected such that it is located on the central viewing axis of the camera. To compute its y-component, the point cloud is divided into 3 sections along the vertical direction, roughly representing the chest, belly, and hips of the patient from top to bottom. The points within the top third are further split into 5 subgroups from right to left along the x-axis. For the rightmost and leftmost groups, the median coordinates are computed, and the final y-coordinate of the origin is derived from them.
This ensures that all surfaces are located close to each other and that they partially overlap. At the same time, the actual relative shift between the surfaces and the angle at which the camera views the surface is retained as much as possible. This is crucial for the registration process described in
Section 2.3.2.
2.3. Offline Processing
The electrode positions are computed using a set of at least 14 recordings of the torso surface, covering the required minimum angular range in the horizontal plane. The necessary steps, depicted in Figure 1, are presented in the following subsections. They include the pairwise alignment and registration of the recorded surfaces S (Section 2.3.2); the extraction of the points representing the colored electrode markers (Section 2.3.3); and the fitting of a marker model to identify the central point of each marker (Section 2.3.4). In the final step, a unique label is attached to each position, which uniquely links the individual ECG signals to the 3D positions of the corresponding electrodes.
2.3.1. Color Correction
The color sensor of the Intel Realsense SR300 camera (Intel Corporation, Santa Clara, CA, USA) offers only a limited range within which the color temperature parameter can be tuned using the algorithm discussed in Section 2.2.1. This range is optimized for indoor use [21,22], where typical light sources include incandescent tungsten lamps (≈2850 K), fluorescent lights, and standardized CIE sources such as CIE D55 (≈5500 K) or CIE D65 (≈6500 K).
The space limitations encountered in clinical settings, for example, outpatient and cardiology practitioner clinics, result in more challenging illumination conditions that can vary significantly depending on factors such as the patient's seating position or the camera's pointing direction. Specifically, individual objects and parts of the room may be shaded by other objects, for example, the electrodes on the patient's back. Shaded areas are characterized by color temperature values significantly larger than the upper limit assumed by the color sensor. Examples of this situation are shown in Figure 4a,c.
Therefore, an additional color-correction process is applied to the recorded texture images and the 3D surfaces. A virtual camera is used to simulate the recording of each texture image with a different color temperature setting than the actual one. This virtual camera offers a considerably wider AWB range than the physical color sensor and uses the model introduced in Section 2.2.1 to adjust the gains of its red and blue color channels. Internally, the virtual camera stores a linearized and normalized representation of the texture image, which corresponds to an image recorded with equal red and blue gains.
Its white-balancing parameter is initialized to the color temperature at which the texture was recorded by the color sensor of the 3D DS camera.
After initialization, the color-correction approach described in Section 2.2.1 is used to adjust the color temperature setting of the virtual camera until a suitable value is found. If the setting jitters around its ideal value for at least 20 repetitions, the color correction is stopped, and the setting is fixed to the mean of the last 3 minimum updates for which the difference between consecutive values is less than 10. With each update, a new version of the texture is created by multiplying the red color values by the updated red gain, multiplying the blue values by the blue gain, and performing a left bisection search on the lookup table established in Section 2.2.1. Pixels that are overexposed according to (4) are not modified. Pixels that appear overexposed after scaling, exceeding a maximum value of 1 in at least one channel, are assumed to be fully saturated in all three channels, which are each set to 1. Pixels that appear underexposed, with at least one channel falling below the lower exposure limit, are assumed to be unexposed, and all three of their channels are set to 0. Additionally, all channels are clipped to the maximum possible value of 1 where necessary. The color-optimized version of the texture (Figure 4b,d) is then used to extract the 3D points of the electrode markers, as described in Section 2.3.3.
2.3.2. Surface Registration
To align the surfaces, a point-to-plane ICP algorithm was chosen. This kind of ICP algorithm minimizes the distances between corresponding points along the direction of the surface normals of the target surface.
A precise alignment across all surface pairs is achieved when Equation (25) is also minimal in the reverse case, with source and target swapped. The following simple symmetric point-to-plane approach is used by the registration component (Figure 1) to align the surfaces. It was chosen over other symmetric point-to-plane algorithms, such as [26], because it can be implemented directly using the unidirectional ICP functions of the open3D library [27]. In the first step, the forward transformation is computed for the set of corresponding points by applying (25). In the second step, the reverse transformation is computed for the points corresponding to the reversed setup, initialized with the inverse of the forward transformation. The forward correspondence set is selected from the subset of source points located within the maximum correspondence distance of the target surface, and the same selection criterion is used for the reverse set. In the final step, the optimal transformation ℜ and the new correspondence distance are selected from the forward and reverse results according to the criteria in (26).
The surfaces S recorded using the approach described in Section 2.2 are aligned such that they more or less share the same space, apart from a small rotation in the horizontal plane and the relative vertical movement between the camera positions. No information about their orientation in space or about how much each pair overlaps is recorded. For obtaining sufficiently precise electrode positions, the optimal correspondence distance between aligned surfaces should be small. Therefore, the symmetric ICP registration is repeated for each pair in multiple runs, and the results obtained for ℜ and the correspondence distance in the previous run are used to initialize the next run. If the condition in (26) for updating ℜ and the correspondence distance fails, one last run is attempted with a reduced correspondence distance, provided that the remaining tolerance conditions still hold. For the first optimization run, the transformation is initialized to roughly reflect the relative rotation about the z-axis between the two recorded surfaces and their relative shift along the z-axis. The following approach is used to estimate the relative rotation angle between the two surfaces:
The right and left median points define the horizontal directions of the sagittal planes of the two surfaces. They are computed using the same approach described in Section 2.2.4 for defining the final position of the origin along the y-coordinate.
Suitable estimates for the initial transformation, the correspondence distance, and the rotation angle are essential for achieving a sufficiently precise alignment of the surface pairs. When testing the implementation of the symmetric ICP, it was empirically found that the values for the correspondence distance, in particular, varied significantly depending on the relative distance and angle between two consecutive surfaces. Initially, constant values were assigned; however, these values resulted in an insufficient alignment between the surfaces on average. Specifically, the alignment of the surfaces on the left side, where the front and back sides of the torso meet, was rather challenging and, in some cases, not possible at all.
In order to improve the results and ensure a proper alignment between the surfaces, the following approach is used to determine suitable estimates for each pair of consecutive surfaces. These estimates are computed based on the distances between the vertices of the two surfaces within the volume representing the intersection of their axis-aligned bounding boxes, where the source surface is first shifted by an initial transformation that aligns its center of mass with that of the target surface. The value for the correspondence distance is obtained by applying (26) to the distances between the points in the forward and backward correspondence sets. Both sets are found through a KNN search [28,29] that also takes the surface normals of both surfaces into account. This approach has the advantage that two points are only considered as corresponding when their surface normals are closely aligned. From the resulting correspondence sets, any pair of points is removed if the deviation between their surface normals exceeds 30 degrees.
The estimate for the maximum correspondence distance is then based on the overall statistics of the shortest neighbor distances within both correspondence sets.
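A sketch of such a normal-aware nearest-neighbor search, here built on SciPy's cKDTree rather than the KNN implementation of [28,29], is shown below:

```python
import numpy as np
from scipy.spatial import cKDTree

def normal_checked_pairs(pts_s, nrm_s, pts_t, nrm_t, max_deg=30.0):
    """Nearest-neighbor correspondences between two vertex sets,
    keeping only pairs whose (unit) surface normals deviate by less
    than max_deg degrees, as described above."""
    tree = cKDTree(pts_t)
    dist, idx = tree.query(pts_s, k=1)      # closest target point per source point
    cos_angle = np.einsum('ij,ij->i', nrm_s, nrm_t[idx])
    keep = cos_angle >= np.cos(np.radians(max_deg))
    return dist[keep], idx[keep]
```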
From the final ℜ of all consecutive pairs of surfaces, the global alignment of each surface is determined by the cumulative transformation, starting with the identity for the first surface. Alternatively, the transformation of the first surface can be initialized with the horizontal camera inclination angle about the z-axis using (27). From this, the relative angle between the first surface and the x-axis of the patient's frontal plane is computed, which already provides a rough alignment of the resulting torso point cloud with its frontal plane.
2.3.3. Electrode Marker Extraction
In the current setup, the electrodes are attached to g.LADYbird™ active electrode clips from g.tec medical engineering GmbH, Schiedlberg, Austria. These clips have a circular head whose center is aligned with the center of the electrode. The clip itself is covered with red-colored epoxy to protect the integrated electronics from water and other liquids. The circumference of the head is painted blue, modeling a circular electrode marker with a blue boundary and a red central disk. Figure 5 shows an example of this basic setup.
The blue boundary color (see Figure 5b) is selected such that the electrode marker can easily be detected within the RGB chromaticity space representations of the surface texture images. The chromaticity values are obtained as a byproduct of the white-balancing and light color-temperature correction approaches described in Section 2.3.1. Each texture image is scanned for red and blue pixels that are fully described by one of the two ellipses (28) and (29) within the RGB chromaticity space.
The center of each ellipse is defined by its red and green chromaticity coordinates, and each ellipse is rotated by a given angle with respect to the red axis of the RGB chromaticity space. These parameter values are determined through the calibration procedure described in Section 2.3.6. All matching red and blue pixels are mapped to their corresponding 3D vertices on the torso surface S. This mapping is accomplished by computing the barycentric coordinates of each matching pixel within the texture-space representation of the enclosing surface triangle T.
The resulting marker point cloud formed by all red and blue points is filtered to retain only the points that likely correspond to a valid electrode marker, as defined by the colors of the clip head. This is achieved by a radius-based KNN search for at least one neighbor of the opposite color, with the search radius set to the clip-head radius for points of one color and to the width of the blue boundary ring for the other. If the neighborhood of this radius does not contain any points of the opposite color, the point is removed from the marker point cloud.
The filtered point cloud is split into individual clusters representing the individual electrode clips. This is accomplished by applying the HDBSCAN algorithm [30]. The results are more robust compared to those of the basic DBSCAN algorithm [31], especially in the presence of groups of outliers, for example, those generated by a bluish shadow cast on the cables and electrode clips. In addition, a minimum distance can be defined below which clusters are not split any further. In contrast to the basic DBSCAN algorithm [31], this distance defines a lower boundary limit rather than a strict cutting distance; in other words, less dense clusters with an average density exceeding this limit are not necessarily forced to split into distinct leaf clusters. The minimum cluster size and minimum samples parameters are used to fine-tune and control the extraction of the clusters that represent the individual electrode markers, considering the actual number of electrodes.
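A minimal sketch of this clustering step using the hdbscan package (an assumed library choice; the paper only names the algorithm [30]) is given below; all parameter values are illustrative and would need to be tuned to the electrode count as described above:

```python
import numpy as np
import hdbscan  # pip install hdbscan

def cluster_markers(marker_points: np.ndarray) -> np.ndarray:
    """Split the filtered marker point cloud (N x 3) into per-electrode
    clusters. cluster_selection_epsilon acts as the minimum distance
    below which clusters are not split any further."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=30, min_samples=10,
                                cluster_selection_epsilon=0.005)  # ~5 mm
    return clusterer.fit_predict(marker_points)   # label -1 marks outliers
```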
In order to simplify the subsequent processing steps, the overall point cloud, as well as the marker point cloud PM, is realigned such that the frontal plane of the torso is in line with the x-z plane of the coordinate system. This is achieved by once again splitting the point cloud into chest, belly, and hip sections. The points of the chest section are further split along the x-axis into three parts, representing the right shoulder, neck, and left shoulder. The final transformation ℜ is computed by aligning the vector between the median points of the left and right shoulders with the x-axis of the frontal plane.
2.3.4. Fitting Marker Model
The red and blue points within each cluster are fitted to a planar marker model consisting of a red disk enclosed within a blue ring. Before fitting, all points are projected onto a plane defined by the predominant direction of the surface normal vectors within the cluster and their center of mass; this ensures that all points are located on a common plane. The shifted points are then fitted to a model based on the distances between the individual points and the electrode center on this plane.
The model describes the relative distances of the blue points from the boundary of the enclosed red disk. From all the red points within a cluster, the model selects those located within the disk radius around the current center estimate. The model in (34) is optimized with respect to the center position using the L-BFGS-B algorithm provided by the SciPy minimize function. This numerically robust algorithm was selected because it achieves satisfactory results for least-squares problems of this kind; its implementation details can be found in the SciPy manual and in [32,33]. For all clusters for which a center could be found, the center is stored along with its cluster. Any remaining clusters for which no appropriate center could be found are not considered further.
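A sketch of such a center fit is given below. The objective is a plausible stand-in for the model in (34), not its exact form, and the disk radius is an assumed value:

```python
import numpy as np
from scipy.optimize import minimize

R_DISK = 0.006  # assumed red-disk radius in meters (depends on the clip head)

def fit_marker_center(red_xy, blue_xy):
    """Least-squares fit of the in-plane marker center: red points
    should lie inside the red disk, blue points close to its boundary.
    red_xy, blue_xy: (N, 2) in-plane point coordinates."""
    def cost(c):
        d_red = np.linalg.norm(red_xy - c, axis=1)
        d_blue = np.linalg.norm(blue_xy - c, axis=1)
        return (np.sum(np.maximum(d_red - R_DISK, 0.0) ** 2)
                + np.sum((d_blue - R_DISK) ** 2))
    x0 = np.concatenate([red_xy, blue_xy]).mean(axis=0)  # initial guess
    return minimize(cost, x0, method='L-BFGS-B').x
```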
In some cases, a clip may be split into two smaller clusters; for example, if an electrode array is carelessly attached to the torso, electrode leads can shadow relevant parts of the clip head. This situation is assumed when a threshold condition on the red or blue point counts of a cluster holds. Two neighboring clusters are considered pieces of the same marker only if at least 10 closest neighbors of any point in the first cluster are closest to at least 85 distinct points in the other cluster. The cylindrical marker model is then fitted to the largest piece of the marker only. This prevents nearby image artifacts in the textures from misaligning the affected electrode marker and pulling the center point away from its true location.
The identified cluster centers are triangulated using the ball-pivoting method [34,35] implemented in the open3D library. The radii of the two distinct balls are derived from the average distance between each center and its 9 closest neighbors; outliers are removed before this average is computed. As a final check for whether neighboring clusters resemble two pieces of the same marker, the surface connectivity between the individual centers is computed. The marker attached to the larger connected group is retained, whereas the other is removed. Ball-pivoting triangulation and the removal of small clip pieces are repeated until no more nearby groups represented by distinct centers are found. The remaining centers included in the resulting triangular surfaces represent the frontal and dorsal patches of the electrode grid layout proposed in [36]. Clusters that are too far away to be included in the mesh by the ball-pivoting process are considered single electrodes, similar to those used, for example, in Einthoven leads I, II, and III.
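The triangulation step maps directly onto the open3D API, as the following sketch shows; the two radius factors are assumptions:

```python
import open3d as o3d

def triangulate_centers(centers, normals, d_avg):
    """Ball-pivoting triangulation of the electrode center points.
    centers, normals: (N, 3) arrays; d_avg: average distance between
    each center and its 9 closest neighbors."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(centers)
    pcd.normals = o3d.utility.Vector3dVector(normals)
    radii = o3d.utility.DoubleVector([1.0 * d_avg, 2.0 * d_avg])
    return o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
        pcd, radii)
```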
In the final step, the triangular meshes of the frontal and dorsal electrode patches are normalized. In this process, any vertical edge that intersects the horizontal line between two common neighbors of its endpoints is swapped with the edge that connects the common neighbors.
2.3.5. Label Assignment
Starting from the point with the smallest y-coordinate, the triangulation of the frontal patch is scanned line by line. All electrodes that can be connected along consecutive horizontal edges are joined into one row of the frontal patch [36] and stored in right-to-left order. The rows are ordered from bottom to top. After all rows of the frontal patch have been collected, the same approach is applied to collect the electrodes of the dorsal patch. Again, the electrodes are stored in right-to-left and bottom-to-top order.
On the frontal patch, the number labels for each channel are assigned in ascending order from bottom right to top left. The dorsal assignment starts at the top right and ends at the bottom left. The remaining electrode points that have not been included in the triangulation of the frontal and dorsal patches correspond to the three Einthoven leads I, II, and III, provided they are located on the arms close to the front of the left and right shoulders and on the left hip. In addition, the electrode array includes two further electrodes placed frontally and dorsally close to the right side of the torso.
2.3.6. Calibration
The proposed method for identifying the colored electrode markers requires proper calibration of the mean values, standard deviations, and rotation angles of the two ellipses in Equations (28) and (29). In the first step, the color-corrected chromaticity representations of the texture images, obtained as a byproduct in Section 2.3.1, are roughly segmented. The pixels representing a blue or red pixel of the clips are initialized with a set of empirically determined starting values.
These starting values were identified empirically from the chromaticity space triangle of the 3D DS camera's color sensor, generated from the pixels of all texture images. The resulting raw pixel masks are stored along with the corresponding textures obtained from the data sets of at least three patients. In addition, a binary mask selects the pixels that are properly exposed according to (4). The masks are stored on disk in the 16-bit PNG format and loaded, along with the corresponding textures, into an image processing program such as Gimp™ or Adobe Photoshop™ for manual segmentation of the clips.
The refined masks, created by manually removing any pixel that does not represent a clip or electrode marker, are used in combination with the exposure masks to extract the pixels that are part of the electrode clips and markers visible in each 16-bit image. Any pixel that does not correspond to a clip, is over- or underexposed, or meets the exclusion condition in (3) is not considered further in the calibration. From all other pixel values, a 2D heat map with 256 bins each for the red r and green g chromaticity values is generated and median-filtered using a 7-by-7 neighborhood.
The red and blue color shades of the electrode markers appear as distinct, Gaussian-shaped peaks on the heat map; peaks 1 and 2 are clearly visible as bright spots in Figure 6. A Gaussian mixture model [37,38] is used to extract the individual clusters representing each peak. Each peak is described by a 2D Gaussian distribution characterized by its centroid and the standard deviations along each principal direction. By fitting the individual Gaussian models to the heat map, the actual position, orientation, and area covered by each peak can be found. To compute the initial positions of the cluster centroids, the heat map is binarized and labeled; in this process, any 4-connected set of at least 5 bins whose counts exceed a minimum-count threshold is considered a peak.
The cluster with the highest mean red component is used to compute the parameters of the red ellipse, and the corresponding blue cluster provides the parameters of the blue ellipse. The standard deviations of each ellipse are derived from the first and second eigenvalues of the covariance matrix of the respective cluster, and the rotation angles follow from the corresponding eigenvectors. These values are stored on disk, together with the cluster centroids that define the mean chromaticity values, for use in the extraction step described in Section 2.3.3.
The remaining clusters 3 and 4 are not considered further, as they correspond to color highlights on the clips (cluster 3) or are caused by inappropriately chosen parameters affecting the conversion of the raw sensor signals to the RGB color space (cluster 4).
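A sketch of this calibration fit using scikit-learn's GaussianMixture (an assumed library choice; the paper cites the algorithm [37,38]) is shown below; the ellipse parameters are derived from the eigendecomposition of each component's covariance matrix as described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_peaks(rg_samples, n_peaks=4):
    """Fit a Gaussian mixture to the (r, g) chromaticity samples of the
    manually segmented clip pixels and return, per peak, the center,
    the semi-axis standard deviations, and the rotation angle."""
    gmm = GaussianMixture(n_components=n_peaks, covariance_type='full')
    gmm.fit(rg_samples)                        # rg_samples: (N, 2) array
    params = []
    for mean, cov in zip(gmm.means_, gmm.covariances_):
        evals, evecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
        theta = np.arctan2(evecs[1, 1], evecs[0, 1])  # major-axis angle
        params.append((mean, np.sqrt(evals[::-1]), theta))
    return params
```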
2.4. Recording Protocol
The technical approach outlined in Section 2.2 and Section 2.3 requires that the patient maintain the same posture throughout the recording. This is only possible if the patient is directly engaged and actively participates in the measurement.
Therefore, prior to the application of the electrodes, the patient is instructed to sit down on a chair. The height of the chair is adjusted so the patient can comfortably sit upright throughout the recording process. The feet should rest flat on the floor, and the knees should be bent by no more than 90 degrees. If the chair cannot be adjusted in height, an alternative is to stack multiple chairs to increase the patient's comfort and encourage them to straighten their back. To ensure unobstructed recordings, the chair should have neither armrests nor a backrest and should be placed at least 1 meter away from any furniture or other objects that could cast shadows. This ensures that the FOV of the 3D DS camera can be used optimally and that the operator is able to capture a surface at least every 20 degrees.
After the electrodes have been attached to the torso, the patient is instructed to place the hands on the thighs. The fingers should point inward and the thumbs should point straight toward the hips; the optimal hand position is about a thumb's length in front of the hips. While the electrode positions are recorded, the patient is instructed to keep the back straight and upright. Most patients can easily maintain this position by slightly straightening their elbows (to about 120 degrees between the upper and lower arm), which helps them move their chest and shoulders into a position that is as upright as possible. As a result, the patient settles into an isometric posture that can easily be maintained while the electrode positions are recorded. In addition, this position facilitates the recording of the electrodes placed under the left axilla, for example, the Wilson electrodes.
3. Results
In this section, the obtained results are presented.
The narrow vertical field of view of the color sensor is one of the main reasons why the 3D images of the torso are recorded in portrait mode. In a typical clinical setting, where space is limited, the patient is likely seated close to furniture or walls. For proper recording of the 3D images, a space of at least 2.5 m by 2.5 m is required. This includes a standard chair without armrests or a backrest, with a diameter of 50 cm, and at least 1 m of space on all four sides of the patient for the operator to move around while recording the images. The remaining space between the patient, the operator, and any surrounding furniture or walls may be as little as 50 cm. Both sensors of the camera must be able to properly capture the dorsal part of the patient's torso at distances between 20 cm and 50 cm. This can only be achieved by cameras with FOV angles conforming to (1), such as the Intel Realsense cameras, which have wide viewing angles of ≈70 degrees for both the depth and color sensors when used in portrait mode. This is especially important for capturing the dorsal views of the torso.
The color sensor has a 16:9 ratio between the horizontal and vertical FOVs. This results in a vertical viewing angle of about 40 degrees, which is considerably smaller than the ≈60 degrees of the depth sensor. Consequently, for example, around ≈60 columns at the top and bottom of the depth image can lack texture information. However, this is acceptable, given that consecutive 3D images are recorded in portrait mode with an overlap of about two-thirds, ensuring that the texture images overlap sufficiently.
Thanks to the vertical extent of the patient's torso, it is easy in portrait mode to keep the patient centered in the image while moving the camera to the next recording position. As the patient's torso covers most of the image space, only very few objects and obstacles located behind the patient are captured by the cameras, and these can easily be removed before the 3D surface images are stored.
Scanning always starts with the right frontal view of the torso and ends at the right dorsal side. If possible, the right lateral side of the torso can be recorded as well. This is not essential for extracting the electrode positions and can be omitted in standard recording procedures; it is recommended to explicitly record the right lateral torso surface only when there is sufficient space to the right of the patient.
The preview image of the torso, shown in the main area (1) of the user interface in Figure 7, is split into a 3-by-3 grid. The center segment of this grid is used as the focus area, representing the central part of the patient's torso. The contours of the largest object containing the focus segment are highlighted in orange; as the camera points at the patient's torso, these contours highlight the torso boundaries. The recording of a torso surface segment is initiated by pressing the trigger of the camera. The color of the contour line then switches to green and the live preview freezes, indicating that the captured depth and color images are being processed and the 3D surface is being generated and stored. Once the underlying point cloud has been triangulated, occluded and degenerated triangles, as well as detached surface patches, are removed, and the contour is updated to mark the parts that will be stored on disk. After the 3D surface information, the corresponding texture image, and the meta information have been stored, the live preview resumes and the color of the contour reverts to orange. The live preview is updated at a maximum rate of 10 FPS; with the Python-based prototype, update rates between ≈4 FPS and ≈7 FPS can be realistically achieved.
The main preview area (panel 1 in Figure 7) has the same shape as the depth image. For the parts on the left and right sides that are not captured by the RGB image, the edges identified in the depth image are displayed instead. The outline of the patient's torso does not extend beyond the edges of the RGB image. In panel 2 of the preview screen (Figure 7), several recording and camera parameters, such as the frame rate in FPS and the exposure time in ms, are shown, along with the intermediate parameters computed for automatic exposure control and color correction. In panel 3, the full set of edges identified in the current depth image is displayed; the two vertical lines delineate the area of the depth image that is covered by the color image.
The prototype for the real-time recording of the 3D torso surface patches, as well as for postprocessing and calibration, was implemented in Python 3 using recent versions of NumPy and SciPy [39]. The librealsense version 2 library [40] was used to control the acquisition, convert the depth values into a point cloud, and compute the corresponding texture uv map for the RGB image. The OpenCV library [41] was used to generate the preview display, and the generation and cleaning of the 3D meshes were accomplished using the open3D library [27]. The most computationally demanding components, namely the depth-edge detection (Section 2.2.3), automatic white balancing (Section 2.2.1), and patient-locked auto-exposure control (Section 2.2.2), were converted into Python-C modules using Cython [42].
In total, five male subjects between 38 and 70 years of age participated in the present study. Each subject was seated on a chair or examination bed, depending on the available space. After the ECG electrodes were applied to the chest and back, the subjects were instructed to maintain the posture described in Section 2.4. The measurement of the torso surface and the recording of a 30-min long, 67-channel ECG took about 30 min to 45 min in total. After each measurement, the data were analyzed and the prototype was improved accordingly.
The data set recorded from the first subject turned out to be of limited quality and is therefore not included in the presented results: it was affected by the automatic white balancing and exposure control of the color sensor, which could not cope well with the diverse and complex lighting conditions. Furthermore, the 3D points recorded by the depth sensor were directly transformed to match the color image captured by the color sensor. This posed several challenges related to occluded surface parts, causing undesirable distortions and introducing noncausal surfaces. Starting with the second subject, the direct mapping was replaced with the texture mapping approach, which yielded better results and allowed for the implementation of the algorithms for occlusion management and the removal of noncausal triangles described in Section 2.2.4.
For each subject, 12 to 15 views were recorded, each containing a 3D surface described by ≈170,000 vertices and ≈300,000 triangles. As shown in Table 1, between 7 and 21 iterations of the symmetric ICP algorithm were necessary to align the surfaces. The maximum correspondence distance between the points of the surface pairs was reduced in every iteration step, starting from 7 cm-12 cm and reaching 0.7 mm-1.2 mm. More iterations were necessary to align the surfaces joining the frontal and dorsal views on the left side of the torso, and the number of required iterations also increased when the available space around the subject was insufficient. In the most challenging scenario, a proper alignment of the surfaces was not possible at all. This situation was encountered in the data set recorded from subject 5, where part of the torso surface on the left side was obscured by the backrest of the chair; among other challenges, this required an increased number of 21 iterations to align the leftmost frontal and dorsal views.
Across all subjects, a final root mean square error between consecutive surfaces of 0.7 mm was achieved. Using the proposed approach, 12 to 15 surfaces per patient were registered within 13 min. As shown in
Table 2, the extraction of the electrode marker points and the computation and labeling of the electrode positions were completed after another ≈8 min.
The recording sessions were part of a larger clinical pilot study investigating the prognostic value of index arrhythmias with respect to the outcome of pulmonary vein ablation, for which the participants provided informed consent. Apart from the 3D camera and ECG recordings, this study relied solely on clinical data recorded during the patients' routine treatment. Therefore, no CT recordings or other independent means of recording the electrode positions relative to the torso were available. To assess the accuracy of electrode localization, the electrode positions were instead backprojected onto the individual views of the torso and marked on the corresponding color images. Examples are shown in
Figure 3,
Figure 4b,c and
Figure 8.
The annotated RGB images were presented to an expert who used the cross-hair tool shown in
Figure 8b to manually adjust the position of each marker. To facilitate this task, two cross-hairs were displayed: a green one indicating the backprojected electrode position and a red one corresponding to the manually adjusted position. All positions were checked during this process and, if necessary, moved to better reflect the perceived center of the electrode in each view. When finished, all positions were reprojected into 3D space.
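A minimal sketch of this backprojection and annotation step is given below, assuming that per-view camera poses (rvec, tvec) and color intrinsics (K, dist) are at hand; the helper name annotate_view and all parameters are hypothetical.

```python
# Sketch: backproject registered 3D electrode positions onto one view's
# color image and draw the cross-hair presented to the expert for review.
import cv2
import numpy as np

def annotate_view(image, electrodes_xyz, rvec, tvec, K, dist):
    """electrodes_xyz: (N, 3) positions in the common torso frame;
    rvec/tvec: pose of this view; K/dist: color camera intrinsics."""
    pixels, _ = cv2.projectPoints(electrodes_xyz, rvec, tvec, K, dist)
    for u, v in pixels.reshape(-1, 2):
        pt = (int(round(u)), int(round(v)))
        # Green cross-hair: backprojected position; the red cross-hair
        # follows the expert's manual adjustment in the review tool.
        cv2.drawMarker(image, pt, (0, 255, 0), markerType=cv2.MARKER_CROSS,
                       markerSize=15, thickness=1)
    return image
```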
For the set of corrected positions of each electrode, the mean point, as well as the mean and standard deviation of the distances to this mean point, were computed. The resulting values are shown in Table 3, along with the mean and standard deviation of the computed electrode positions with respect to the manually determined mean. Both sets of results were influenced by the accuracy of the registration process and by the fact that no unique solution exists for the backprojection of the electrode positions onto the individual views. In addition, the mean and standard deviation of the registration errors, as well as of the distances between the individual reprojections and their mean point, are listed.
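These per-electrode statistics follow directly from the definitions above; a minimal NumPy sketch, with a hypothetical helper name:

```python
# Spread statistics for one electrode: the mean point of all corrected
# positions and the mean/standard deviation of the distances to it.
import numpy as np

def spread_stats(positions):
    """positions: (N, 3) corrected 3D positions of one electrode
    across all views; returns (mean point, mean distance, std)."""
    mean_point = positions.mean(axis=0)
    dists = np.linalg.norm(positions - mean_point, axis=1)
    return mean_point, dists.mean(), dists.std()
```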
The corrected and the computed electrode positions each deviated from the mean point by the average distances listed in Table 3. This is in accordance with the limitations posed by the backprojection, for which no unique solution exists, and with the residual deviation between corresponding points left by the ICP registration. Given the amount of data to be processed per subject, the overall time of 22 min required to extract and align the electrode positions is quite impressive, considering that only the computations of the symmetric ICP and the HDBSCAN algorithms are implemented natively, as part of the Open3D library and as Cython modules, respectively; the rest of the implementation was carried out in Python using NumPy arrays only. In contrast, the expert required between 30 min and 45 min to point at and place the electrode markers on the 14 views of a single data set.
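As an illustration of the marker-clustering step, the sketch below uses the community hdbscan package as a stand-in for the prototype's Cython implementation; the min_cluster_size value and the helper name are assumptions.

```python
# Sketch: group marker-colored 3D points into one cluster per electrode
# and take each cluster's centroid as the electrode marker position.
import numpy as np
import hdbscan

def electrode_centers(marker_points):
    """marker_points: (N, 3) points whose pixels matched the marker color."""
    labels = hdbscan.HDBSCAN(min_cluster_size=30).fit_predict(marker_points)
    centers = [marker_points[labels == k].mean(axis=0)
               for k in range(labels.max() + 1)]  # label -1 is noise
    return np.asarray(centers)
```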
4. Discussion
The results are promising given that the torso is a far less rigid structure than the skull. Further, the limited space and adverse environmental conditions typically found in clinical settings, e.g., outpatient and local practitioner clinics, are quite challenging. This is evident in the results shown in
Table 3 for subjects 4 and 5. In both cases, nearby obstacles such as backrests or furniture limited access to the patient's left side, resulting in increased positional variations of 2.2 mm and 2.4 mm in relation to the mean of the manually defined electrode positions, compared to 1.7 mm and 1.6 mm for subjects 2 and 3, respectively.
These values are still in the range reported for recently proposed approaches for localizing electrodes mounted on the human body. As shown in
Table 4, few studies exist that evaluate the use of 3D DS cameras [
19,
20] and photogrammetry methods [
18] for localizing ECG electrodes on the torso. The achieved results varied between 1.16 mm and 11.8 mm, depending on the metrics and positional references used. The authors of [
20] used the Hausdorff metric to compare the positions obtained from a Microsoft Kinect 3D DS camera to positions found on MRI or CT scans. On average, they achieved a positional error of 11.8 mm, which is an order of magnitude larger than the error between 1.16 mm and 2.5 mm achieved by Schulze et al. [
18], Alioui et al. [
19], and the present study, all of which used the Euclidean metric instead.
The majority of studies proposed methods for the localization of EEG sensors mounted on the scalp. Apart from Homölle and Oostenveld [
8], the achieved average positional errors ranged from 1.5 mm [
12] to 3.26 mm [
14] using various reference measurements, including the mean of manually placed marks [
12,
14] and positional references generated using a magnetic digitizer [
8,
13,
16] such as the Polhemus Fastrak. Comparing the positional error of 9.4 mm achieved by Homölle and Oostenveld [
8] with all other results, it can be assumed that this error was mainly caused by unavoidable inaccuracies in taking the magnetic digitizer measurements.
Considering that the positions of ECG electrodes mounted on the torso are directly affected by any movement, the positional error of 2.0 mm achieved in the present study underlines how essential the active engagement and cooperation of the patient are during the measurement. Clear instructions on how the patient can easily maintain a posture that facilitates the recording of the electrode positions have a huge impact on the outcome. If the instructions are not clearly defined by the measurement protocol, or are not properly understood or followed by the patient, the positional error increases. For example, subject 4 (see
Table 3) changed the position of his arms twice during the measurement, which immediately resulted in an increased positional error of 2.2 mm.
In addition to the limited space, the lighting conditions encountered in the clinical environment, as well as tight schedules, have a direct impact on the average positional error. Varying lighting conditions, including multiple light sources with differing color temperatures, can have a negative impact on photogrammetric approaches and on 3D DS camera-based measurements of the torso surface and the electrode positions thereon. Algorithms for automatic white balancing and exposure control were therefore adopted to improve color constancy across the multiple 3D views of the torso and to maintain a constant exposure of the torso independent of the viewing direction and angle; a minimal illustration of the underlying idea is sketched below. In combination with the developed calibration method, this increased the accuracy of identifying the pixels representing the color markers.
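The sketch applies classic gray-world white balancing restricted to a region of interest, loosely mirroring the patient-locked idea of computing color statistics on torso pixels only; the function and mask are illustrative, not the algorithm from Section 2.2.1.

```python
# Gray-world white balancing restricted to a region of interest (ROI):
# scale the channels so that their means inside the ROI coincide.
import numpy as np

def gray_world_roi(image, roi_mask):
    """image: (H, W, 3) float RGB in [0, 1];
    roi_mask: (H, W) bool mask of torso pixels."""
    means = image[roi_mask].mean(axis=0)           # per-channel ROI mean
    gains = means.mean() / np.maximum(means, 1e-6)  # avoid division by zero
    return np.clip(image * gains, 0.0, 1.0)
```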
Time, in particular, is a very limited resource, which largely restricts the routine use of magnetic digitizers within clinical environments. Precise positional measurement requires the exact placement of the magnetic probe on each electrode and the manual triggering of each measurement; an experienced user needs about 15 min to accomplish this task. This time can only be reduced by placing the probe less accurately on each electrode, which can result in increased positional errors of 7.8 mm and higher, as encountered by Clausner et al. [43].
In general, keeping the required human interactions, and with them the number of related errors, as low as possible is a key goal for establishing NICE-based tools and procedures in clinical environments. The time required to localize the electrode positions on the human torso, as well as the amount of ionizing radiation the patient is exposed to, are key factors that can either prevent or facilitate a successful uptake. Alternative approaches currently used to obtain the electrode positions include manually placing markers on CT and MRI scans or segmenting them automatically [9,12,19], and pointing a magnetic digitizer probe to each individual electrode [8,13,16]. Manually marking the electrodes requires a significant amount of time (about 45 min), which is even more than the 15 min required for magnetic probe-based measurements. Both approaches suffer from an additional bias related to the individual human perception of the electrode and marker shapes, as well as from inaccuracies in the way the pointing probe is placed onto each electrode.
In contrast, the proposed 3D DS camera-based approach is not affected by these kinds of errors. When implemented on a tablet computer, the presented approach will enable clinicians to acquire the electrode positions and the torso surface within 10 min, making average positional errors of less than 2.5 mm feasible even under limited spatial conditions and tight schedules.
Some aspects essential for the successful clinical uptake of the presented approach still have to be addressed. On all color sensors, the raw signals recorded for the red, green, and blue channels have to be converted into the RGB color space before they can be used. If the required parameters are not properly calibrated, the resulting images may show a bluish hue that cannot be corrected by any white-balancing algorithm. This was the case for subject 5, shown in
Figure 3, and caused the additional peak (4) in the calibration heat map shown in
Figure 6. For future studies, an appropriate procedure needs to be established for verifying and optimizing these parameter settings before the first measurement and at regular intervals thereafter.
Each 3D DS camera data set also provides a point cloud representation of the torso surface. In ongoing studies, this representation is used to build the electroanatomical models required by noninvasive electrocardiographic imaging methods, which have so far been generated from clinical cardiac CT slices only. Further applications of the proposed approach, for example in enhanced electrical impedance tomography, are currently being investigated.