Article

Sewer Cleaning Robot: A Visually Assisted Cleaning Robot for Sewers

School of Intelligent Science and Technology, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3426; https://doi.org/10.3390/app15073426
Submission received: 2 February 2025 / Revised: 16 March 2025 / Accepted: 19 March 2025 / Published: 21 March 2025

Abstract

Aiming to solve the problem of clearing obstacles in narrow and complex sewers, this paper introduces a visually assisted Sewer Cleaning Robot (SCR) for cleaning sewers with diameters ranging from 280 to 780 mm. The main work is as follows: (a) A mobile platform is equipped with a pressing mechanism that presses against pipe walls of different diameters. The arm uses high-load linear actuator structures, enhancing load capacity while maintaining stability. (b) A Detection–Localization–Cleaning mode is proposed for clearing obstacles. The YOLO detection model is used to identify six types of sewer defects. Target defects are then localized using monocular vision based on edge detection within defect bounding boxes. Finally, cutting is performed according to the localized defect positions. The feasibility of SCR in cleaning operations is validated through a series of experiments conducted under simulated pipeline conditions, which evaluate its mobility, visual detection and localization capabilities, and ability to clear hard obstacles. This paper provides a technical basis for replacing human labor with robots that use vision algorithms to assist in cleaning tasks within sewers.

1. Introduction

During heavy or torrential rains, obstacles in drainage systems can lead to severe urban flooding, resulting in casualties and significant economic losses [1,2,3]. In small- to medium-sized sewers, where space and accessibility are constrained, robotic solutions have become one of the essential approaches to trenchless pipeline rehabilitation.
In recent years, researchers have made significant advancements in the development of pipeline robots, particularly for sewer defect detection [4]. These robots were equipped with advanced technologies, such as high-resolution cameras, ultrasonic sensors, and LiDAR, to identify and classify defects like cracks, root intrusions, and debris [5]. Guo et al. [6] introduced an automated defect detection system for sewer pipelines, using a change detection approach to analyze inspection videos and images. Yin et al. [7] proposed a real-time automated defect detection framework for sewers using the YOLOv3 deep learning algorithm. Fang et al. [8] proposed a lightweight and cost-effective sewer inspection framework for a floating capsule robot, utilizing computer vision techniques such as instance segmentation, real-time localization, and 3D reconstruction to achieve high-precision detection of fine-grained sewer defects. Nguyen et al. [9] presented a compact autonomous mobile robot designed for sewer pipeline inspection, capable of navigating small-diameter pipes using a simple, low-resource control strategy that enables complete network exploration without relying on visual data or high-energy image processing. Systems like CCTV inspection robots [10,11] and pipe-scanning robots [12,13] demonstrated high accuracy in defect recognition, allowing operators to assess pipeline conditions effectively. Although these robots excel at identifying problems, they are limited to inspection tasks and lack the ability to perform cleaning operations. This functional limitation necessitates additional tools or manual intervention for defect removal, increasing cleaning time and complexity.
The modular tooling approach has demonstrated success in industrial robotics. For instance, Mikolajczyk [14] achieved multi-process manufacturing through interchangeable tools on a single robotic platform. However, implementing such systems in sewers faces unique challenges. Many cleaning robots have been developed to address specific operational needs within sewers. For example, some robots were equipped with cutting tools designed for removing tree roots [15,16], while others focused on sediment removal using suction or scraping mechanisms [17,18]. Tugeumwolachot's CIPbot-2 specializes in grinding operations, effectively cleaning hardened obstacles such as concrete deposits [19]. While these designs are highly effective at their specialized cleaning tasks, the absence of visual systems limits their ability to provide real-time feedback on operational targets, making it challenging to accurately identify and locate obstacles such as roots, sediment, or debris during operation. Additionally, several commercially developed sewer robots are equipped with various cutting tools to clear foreign objects from pipelines and are fitted with rotatable cameras to provide real-time observation of operational targets [20,21,22]. Sulthana et al. designed an automated robot capable of rapidly cleaning obstacles [23]. Awasthi et al. introduced a robot equipped with an IR camera that relays real-time images of the sewage to a user-controlled mobile application; the user can then decide whether to intervene by directing the arm attached to the vehicle [24]. Owing to the complexity of the sewer environment and the visual homogeneity of in-pipe images, these robots have difficulty performing cleaning tasks effectively when relying solely on manual operation.
This paper proposes a visually assisted Sewer Cleaning Robot (SCR) designed for efficient sewer cleaning. The mobile platform is equipped with a pressing mechanism, which improves stability during cleaning operations and enhances its ability to cross obstacles within the complex sewer environment. The robot arm uses linear actuators in place of rotational motors in the two pitch joints to increase load capacity. A Detection–Localization–Cleaning mode is proposed to optimize the cleaning process. This mode integrates the YOLO defect detection model and a monocular vision localization algorithm to enhance obstacle cleaning efficiency. By incorporating defect detection and target localization, SCR surpasses traditional robots that rely solely on visual feedback, enabling quicker identification and localization of defects. It reduces the cognitive burden on operators and improves decision-making speed and accuracy. The functionality and effectiveness of SCR were validated through experiments conducted in simulated pipeline environments.
The remainder of this paper is structured as follows: Section 2 provides an overview of the hardware components of the SCR, Section 3 details the software algorithms employed, Section 4 presents the experimental results, and Section 5 offers concluding remarks.

2. Hardware System Design of SCR

The hardware system is designed to establish an independent and comprehensive operational platform within pipelines to address the complexities of the working environment. The hardware design of SCR integrates three core modules: a mobile platform, an operational arm, and a vision system. The mobile platform is equipped with a pressing mechanism designed to enhance the platform's passability and ensure stability during cleaning operations. To meet the high-load requirements for cleaning hard obstacles, the pitch joints of the arm employ cost-effective and structurally stable linear actuators instead of rotational motors. The overall structure of SCR is shown in Figure 1, and its main technical specifications are listed in Table 1.
The robot measures 1210 mm in length, is constructed from stainless steel, and weighs 43 kg. It includes an automatic spooling device and a 50 m operating cable to facilitate extended, remote operation tasks. The remote control computer is powered by an Intel Core i5-1240P processor, equipped with 16 GB of RAM and an Intel Iris Xe graphics card, offering robust computational capabilities for real-time image transmission and human–computer interaction. Users can monitor and control the robot system in real-time through this computer, which also hosts all software algorithms and manages communication with each hardware module.

2.1. Hardware Mechanism Design

The arm is mounted at the front of the mobile platform and comprises the arm body, linear actuators, and a rotational motor. It provides three degrees of freedom (DoF), with a working range that covers any position within a 780 mm sewer. The two pitch DoF of the arm are driven by 400 mm stroke linear actuators in a double-actuator structure, which directly delivers substantial torque. This setup is well suited to operations in confined spaces, enhancing arm stability and force control while also reducing costs. Double actuators ensure more stable movement and better load distribution than a single actuator, reducing potential problems such as tilting or uneven force application. At the end of the arm, a high-speed cutting motor is equipped with various interchangeable cutting tools, enabling operations such as cutting, chipping, and grinding to efficiently handle obstacles such as tree roots and cement slabs, ensuring both the feasibility and stability of operational tasks. When maneuvering the arm to a target, the two pitch DoF are actuated by the linear actuators acting through the arm's mechanical linkage, producing the pitching motion. The relationship between linear actuator length and joint angle is illustrated in Figure 2.
The blue triangle Δa1b1c1 is the principal triangle of the third DoF. In the initial state of the arm, the lifting arm angle θ3 = 0. l1 denotes the length of the linear actuator, and m1 and n1 are the fixed lengths of the mechanical structure. When the lifting arm angle θ3 ≠ 0, the relationship within Δa1b1c1 is described by Equation (1).
$$\cos(\varphi_1 + \theta_3) = \frac{l_1^2 - m_1^2 - n_1^2}{2 m_1 n_1} \tag{1}$$
Here, φ1 + θ3 is the corresponding angle of Δa1b1c1 when the end-effector pitch angle is θ3; solving Equation (1) for the actuator length l1 gives Equation (2).
$$l_1 = \sqrt{2 m_1 n_1 \cos(\varphi_1 + \theta_3) + m_1^2 + n_1^2} \tag{2}$$
The red triangle Δa2b2c2 is the principal triangle of the second DoF, which rotates around point a2. In Figure 2, the arm is depicted in its initial state, where the lifting arm angle θ2 = 0. l2 denotes the length of the linear actuator, while m2 and n2 are fixed-length structural components of the mechanical arm. When the lifting arm angle θ2 ≠ 0, the relationship within Δa2b2c2 is described by Equation (3).
$$\cos(\varphi_2 + \theta_2) = \frac{l_2^2 - m_2^2 - n_2^2}{2 m_2 n_2} \tag{3}$$
Here, φ2 + θ2 is the magnitude of the angle ∠c2a2b2 when the arm pitch angle is θ2. From Equation (3), the length of the linear actuator l2 for a lifting angle θ2 is given by Equation (4).
$$l_2 = \sqrt{2 m_2 n_2 \cos(\varphi_2 + \theta_2) + m_2^2 + n_2^2} \tag{4}$$
According to the target pitch angle, the corresponding linear actuator length is calculated, and the computer then commands the driver to move the actuator precisely to the specified position, as sketched below.
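As a minimal illustration of this angle-to-length mapping, the sketch below evaluates the reconstructed Equation (2)/(4) in Python. The link lengths m, n and the offset angle φ are placeholder values, since the actual linkage dimensions of the SCR arm are not reported numerically.

```python
import math

def actuator_length(theta, phi, m, n):
    """Linear actuator length for a target pitch angle theta (radians),
    following Equations (2)/(4): l = sqrt(2*m*n*cos(phi + theta) + m^2 + n^2).
    m, n are the fixed side lengths of the principal triangle; phi is the
    offset angle in the initial configuration (theta = 0). Lengths in mm."""
    return math.sqrt(2.0 * m * n * math.cos(phi + theta) + m * m + n * n)

# Placeholder geometry for illustration only (not the real SCR dimensions).
m3, n3, phi3 = 320.0, 260.0, math.radians(35.0)
print(f"commanded actuator length: {actuator_length(math.radians(20.0), phi3, m3, n3):.1f} mm")
```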
The SCR vision system comprises front and rear gimbals, each fitted with a pinhole monocular camera (FS-ZK350, 1920 × 1080 pixels, 25 fps) equipped with supplementary lighting. The system captures real-time images in a sewer through a 6 mm F1.4 fixed-focus lens (70° HFOV), facilitating the detection and localization of defects. The gimbals enable the cameras to rotate vertically from 0 to 90 degrees. The image data collected by the vision system are transmitted to a ground workstation for real-time monitoring and operator decision-making, thus enabling comprehensive oversight of the robot's operational status and the sewer environment.

2.2. Control System Hardware Parameters

The hardware control system of SCR is shown in Figure 3. The main controller collects video data from both the front and rear cameras and sends control commands to the various subsystems. To address the slippery working conditions within sewers, integrated servo hub motors with an IP65 waterproof rating are selected as the primary driving components. These motors are controlled by two 24 V dual closed-loop brushless DC servo drivers, which receive speed control commands via the RS485 communication protocol. The pressing mechanism consists of two linear actuators controlled by a single dual-synchronous driver, which also receives position control commands through the RS485 protocol. The rolling DoF of the arm is provided by a 48 V hollow integrated motor with a rolling range of ±180°; this motor receives angle control commands via the CAN communication protocol. The pitch DoF of the arm use an actuator driver identical to that used in the pressing mechanism.
The overall system uses five linear actuators. The end-effector uses a 48 V waterproof high-speed motor, delivering a continuous power of 1400 W and reaching a maximum speed of 3000 rpm. This motor is regulated by a DC chopper PWM three-phase driver with a peak current of 30 A. Finally, the camera gimbal is controlled by a 24 V DC servo motor, which receives position control commands through the RS485 protocol, enabling the precise directional adjustments of the camera.
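The sketch below summarizes the bus assignment described above and shows how a command could be pushed onto the RS485 link with pyserial. The frame format, port names, and baud rate are assumptions made for illustration; the actual register maps of the drivers are not specified in this paper.

```python
import serial  # pyserial; the CAN-connected rolling joint would use python-can instead

# Bus assignment of the SCR subsystems as described in Section 2.2.
SUBSYSTEM_BUS = {
    "hub_motors":    ("RS485", "/dev/ttyUSB0"),  # 24 V dual closed-loop BLDC servo drivers
    "pressing_mech": ("RS485", "/dev/ttyUSB0"),  # dual-synchronous linear actuator driver
    "arm_pitch":     ("RS485", "/dev/ttyUSB0"),  # same driver type as the pressing mechanism
    "arm_roll":      ("CAN",   "can0"),          # 48 V hollow integrated motor, +/-180 deg
    "camera_gimbal": ("RS485", "/dev/ttyUSB0"),  # 24 V DC servo motor
}

def send_rs485(port: str, payload: str) -> None:
    """Write one ASCII command frame (hypothetical format) to an RS485 device."""
    with serial.Serial(port, baudrate=115200, timeout=0.1) as bus:
        bus.write((payload + "\r\n").encode("ascii"))

# Example: hypothetical speed command for the two hub motors (rpm values).
send_rs485(SUBSYSTEM_BUS["hub_motors"][1], "SPD 60 60")
```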

3. Software Algorithm Design

Robot perception and decision-making capabilities are insufficient for autonomously performing cleaning tasks in complex pipeline environments. With the assistance of machine vision, combined with human decision-making expertise, the efficiency and accuracy of remote operations can be significantly improved. To address the difficulty operators face in quickly identifying target tasks among the various defects present in a pipeline, this paper integrates the YOLOv5 defect detection model with a monocular vision localization method in the visual system. This system identifies the types of defects in the pipeline in real time and calculates their locations through the monocular vision localization method, which utilizes two images. All algorithms are designed within the Robot Operating System (ROS) framework, with communication between functional modules occurring through custom messages sent via ROS topics and services.
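As an illustration of this ROS-based module structure, the stub below publishes a located defect position on a topic so that other modules (for example, the control interface or arm controller) can subscribe to it. The topic name and the use of geometry_msgs/PointStamped are assumptions; the paper's custom message definitions are not listed.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import PointStamped

def main():
    # Stub node: republishes the most recent defect position estimated by the
    # monocular localization module (the values below are placeholders).
    rospy.init_node("scr_defect_localizer_stub")
    pub = rospy.Publisher("/scr/defect_position", PointStamped, queue_size=10)
    rate = rospy.Rate(5)  # Hz
    while not rospy.is_shutdown():
        msg = PointStamped()
        msg.header.stamp = rospy.Time.now()
        msg.header.frame_id = "front_camera"
        msg.point.x, msg.point.y, msg.point.z = 0.0, 0.0, 0.443  # e.g., 44.3 cm depth
        pub.publish(msg)
        rate.sleep()

if __name__ == "__main__":
    main()
```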
Figure 4 illustrates the robot environmental perception and task flowchart. The robot is deployed into the pipeline via a hoisting mechanism. The operator selects the type of end-effector currently installed and then controls the movement of the mobile platform. During the movement, the defect detection module operates automatically. When a defect is identified, if it matches the current tool type, its location is calculated. The current robot uses a single-purpose end-effector for each task. When encountering different types of obstacles, the operator retrieves the robot to change the end-effector tool. In the absence of external factors such as slippage, the robot estimates its position within the pipeline by utilizing the drive wheel speed sensors and the visual system. Once the robot identifies and locates a task target, the operator controls the mobile platform to move to the work area. Based on the type of target, it is determined whether the pressing mechanism needs to be raised. Finally, the arm is manually controlled to perform the cleaning task.
The control interface is shown in Figure 5. The first part of Figure 5 is the camera feedback image, which displays the information transmitted from the front and rear cameras of the robot. The second part contains the control interface of the robot body and arm, including the adjustment of the hub motor speed and the linear actuator length. The third part is the keyboard operation area, which provides the same control functions as the second part. The fourth part displays the feedback information from the arm rolling joint motor driver.

3.1. Defect Detection Modeling in Sewers

To identify different obstacles such as tree roots and slats within sewers and to assist operators in non-excavation removal operations, SCR requires a target detection model with high accuracy, rapid detection speed, and strong generalizability to adapt to varying sewer environments. In recent years, YOLO series algorithms have demonstrated advantages in the field of target detection, including fast processing speeds, high accuracy, and robust generalization. After optimization, these algorithms can be effectively applied to defect detection in sewers.
In this study, we employ the YOLOv5-Sewer model [25], trained using the Storm Drain Model Dataset [26] and the Pipe Root Dataset [27] from the Roboflow Universe open-source dataset platform. The dataset consists of 1317 images featuring various target defects, such as cracks, utility intrusions, obstacles, and tree roots, with a total of 3034 labeled defects. Examples of the dataset images are shown in Figure 6.
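A minimal inference sketch for such a model is given below, assuming a trained YOLOv5 checkpoint loaded through the standard ultralytics/yolov5 torch.hub interface; the weights file name and image path are placeholders.

```python
import torch

# Load a custom-trained YOLOv5 checkpoint (file name is a placeholder for the
# sewer defect weights) and run it on a single frame from the front camera.
model = torch.hub.load("ultralytics/yolov5", "custom", path="yolov5_sewer.pt")
model.conf = 0.4  # confidence threshold

results = model("sewer_frame.jpg")           # accepts a path, ndarray, or PIL image
for *xyxy, conf, cls in results.xyxy[0]:     # one row per detected defect
    label = model.names[int(cls)]
    box = [round(float(v)) for v in xyxy]
    print(f"{label}: conf={float(conf):.2f}, box={box}")
```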

3.2. Monocular Visual Localization

The environment inside a sewer is characterized by complexity, homogeneity, and dim lighting over medium and long distances. Additionally, pipe defects exhibit both visual diversity and similarity [28], which can lead to reduced accuracy in defect detection. To address the challenge of portable equipment at operation sites being unable to run real-time defect detection models with high accuracy, a lightweight sewer defect detection model, YOLOv5-Sewer, optimized for pipeline environments, is employed. This model achieves a lightweight design that enhances its performance on low-resource devices while maintaining detection accuracy [25].
To enable efficient obstacle cleaning by SCR and to provide operators with accurate target location information, target localization plays a critical role in the overall cleaning operation process. Target localization involves utilizing the target feature information from the image plane to obtain the position in 3D space. Monocular vision localization is a technique that uses a single camera to estimate the position of objects in a 3D space. Compared with binocular vision systems, monocular vision systems require only one camera, offering advantages such as a simpler structure, a lower cost, greater flexibility, and convenience [29]. Additionally, monocular vision eliminates the need to consider factors like the optimal baseline distance between two cameras and blind zones, making it a more practical and versatile solution. These features provide monocular vision systems with broader application prospects in real-world scenarios.
This paper uses a monocular visual localization method that applies edge detection within defect bounding boxes across two images to achieve precise target positioning. The workflow is depicted in Figure 7. First, the camera is calibrated to determine its intrinsic matrix K and distortion coefficients D:
$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{5}$$
$$D = \begin{bmatrix} k_1 & k_2 & p_1 & p_2 & k_3 \end{bmatrix} \tag{6}$$
The intrinsic matrix K encodes the camera's internal parameters, which include the focal lengths fx and fy along the x and y axes and the coordinates of the principal point (cx, cy), which is usually near the center of the image. The intrinsic matrix is essential for converting 3D points in the camera coordinate system into 2D pixel coordinates in the image. The distortion coefficients D account for lens distortion and include both radial and tangential distortion parameters. Specifically, k1, k2, and k3 are the radial distortion coefficients, while p1 and p2 are the tangential distortion coefficients. Radial distortion causes straight lines to appear curved, commonly observed as barrel or pincushion distortion, while tangential distortion arises from lens assembly imperfections.
These parameters are determined during the camera calibration process using a series of images of a known calibration pattern, such as a checkerboard. Using the OpenCV cv2.calibrateCamera function, the intrinsic matrix K and the distortion coefficients D are computed simultaneously.
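A minimal calibration sketch is shown below, assuming a set of checkerboard images; the 9 × 6 inner-corner pattern size and the file paths are placeholders, as the actual calibration target is not specified.

```python
import glob
import cv2
import numpy as np

# Checkerboard calibration: estimates the intrinsic matrix K and the distortion
# vector D = [k1, k2, p1, p2, k3] used in Equations (5) and (6).
pattern = (9, 6)  # inner corners per row/column (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points, img_size = [], [], None
for path in glob.glob("calib/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        img_size = gray.shape[::-1]

ret, K, D, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, img_size, None, None)
print("intrinsic matrix K:\n", K)
print("distortion coefficients D:", D.ravel())
```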
The camera is subsequently repositioned to capture two images of the scene from distinct viewpoints, ensuring sufficient parallax for accurate depth estimation. A defect detection model is utilized to process the images, identify the target defects, and generate bounding boxes around the areas of defect. Within these bounding boxes, the Canny edge detection algorithm is applied to enhance the edges and extract the structural features of the defect.
Canny edge detection operates by analyzing the image gradients to identify areas of rapid intensity changes, which typically correspond to edges. The gradient magnitude (G) and direction ( θ ) are computed using the following equations:
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right) \tag{7}$$
In Equation (7), Gx and Gy represent the horizontal and vertical gradients of the image, respectively. The gradient magnitude G reflects the strength of the edge at each pixel, while the gradient direction θ indicates the orientation of the edge relative to the horizontal axis.
By isolating the edges within the defect bounding boxes, this approach reduces the computational complexity and enhances the precision of subsequent monocular vision localization. These extracted edges serve as key features for matching across the two viewpoints, enabling accurate depth estimation and precise positioning of the defect within the pipeline environment.
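A possible implementation of this step is sketched below: Canny is run only inside a detected bounding box, and the resulting edge map is written back into a full-size mask so that feature extraction stays restricted to the defect region. The box format and thresholds are assumptions.

```python
import cv2
import numpy as np

def edges_in_box(image_bgr, box, low=50, high=150):
    """Run Canny edge detection only inside a defect bounding box.

    box is (x1, y1, x2, y2) in pixels, e.g. taken from a YOLO detection.
    Returns a full-size edge mask with the edges placed back at their
    original image position, ready for ORB feature extraction."""
    x1, y1, x2, y2 = [int(v) for v in box]
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    roi_edges = cv2.Canny(gray[y1:y2, x1:x2], low, high)
    mask = np.zeros_like(gray)
    mask[y1:y2, x1:x2] = roi_edges
    return mask
```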
The detected edge information is then restored to its original position in the image. ORB feature detection and descriptor computation are performed on the edge-detected image, and feature matching is carried out using the Hamming distance:
$$H(d_1, d_2) = \sum_{i=1}^{n} d_1[i] \oplus d_2[i] \tag{8}$$
where d1[i] and d2[i] are the i-th bits of the respective descriptors, and n is the length of the descriptor. The Hamming distance counts the bit positions at which the two descriptors differ, effectively quantifying their dissimilarity.
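This matching step can be realized with OpenCV's ORB detector and a brute-force matcher using the Hamming norm, as sketched below; edge_img1 and edge_img2 stand for the two edge-restricted images produced by the previous step.

```python
import cv2
import numpy as np

# ORB features on the two edge-restricted images, matched by Hamming distance
# (Equation (8)) with a cross-checked brute-force matcher.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(edge_img1, None)
kp2, des2 = orb.detectAndCompute(edge_img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Pixel coordinates of the matched keypoints in each image.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
```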
To optimize the results, the RANSAC algorithm is used to filter out noise and incorrect matches. The RANSAC algorithm iteratively selects subsets of feature matches to estimate the fundamental matrix F while reducing outliers caused by incorrect feature correspondences. The fundamental matrix F is a 3 × 3 matrix that encapsulates the epipolar geometry relating the two views. It satisfies the following constraint for corresponding points x and x′ in the two images:
$$x'^{\top} F x = 0 \tag{9}$$
In Equation (9), x and x′ are the homogeneous coordinates of a point in the first and second images, respectively. The constraint ensures that x′ lies on the epipolar line corresponding to x.
Compared to direct ORB feature matching applied only within the bounding box, integrating Canny edge detection focuses feature points specifically on the target edge regions. This selective approach minimizes interference from irrelevant background details, which is particularly beneficial in complex pipeline scenarios with repetitive textures or occlusions. By concentrating feature points along the object boundaries, the matching process achieves improved accuracy and computational efficiency.
After computing the fundamental matrix F using the RANSAC algorithm, the essential matrix E can be derived by incorporating the camera intrinsic matrix K:
$$E = K^{\top} F K \tag{10}$$
The essential matrix E encodes the relative rotation and translation between two camera views, and it forms the foundation for recovering the camera motion.
To extract the rotation matrix R and translation vector t from the essential matrix E, singular value decomposition (SVD) is performed. The essential matrix E is decomposed as follows:
$$E = U \Sigma V^{\top} \tag{11}$$
where U and V are orthogonal matrices and Σ = diag(s, s, 0) is the singular value matrix, with s representing the non-zero singular value of E. Based on this decomposition, two possible rotation matrices, R1 and R2, and two possible translation vectors, ±t, can be obtained. To determine the correct combination of R and t, a triangulation step is performed, ensuring that the reconstructed 3D points lie in front of both camera planes. This step selects the valid solution among the four possible configurations (R1, t), (R1, −t), (R2, t), and (R2, −t).
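In OpenCV these steps reduce to a few calls, as sketched below: RANSAC estimation of F (Equation (9)), E = KᵀFK (Equation (10)), and cv2.recoverPose, which internally performs the SVD of Equation (11) and the cheirality check that selects the valid (R, t). The variables pts1, pts2, and K come from the earlier sketches.

```python
import cv2

# Fundamental matrix with RANSAC outlier rejection.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
pts1_in = pts1[inlier_mask.ravel() == 1]
pts2_in = pts2[inlier_mask.ravel() == 1]

# Essential matrix from the camera intrinsics (Equation (10)).
E = K.T @ F @ K

# Decompose E and keep the (R, t) pair that places the points in front of
# both cameras (the triangulation/cheirality test described above).
n_inliers, R, t, pose_mask = cv2.recoverPose(E, pts1_in, pts2_in, K)
```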
Finally, using the known camera pose and the triangulation method [30], the 2D image points are reconstructed into 3D coordinates. The relationship is defined by the following equations:
$$\lambda_1 x_1 = K [\,I \mid 0\,] X, \qquad \lambda_2 x_2 = K [\,R \mid t\,] X \tag{12}$$
In Equation (12), λ1 and λ2 are scale factors that convert the homogeneous coordinates into Cartesian coordinates, while x1 and x2 represent the 2D pixel coordinates of the corresponding point in the first and second images, respectively. The camera intrinsic matrix K contains parameters such as the focal lengths and principal point offsets. The pose of the first camera is expressed as [I | 0], assuming it is located at the origin of the world coordinate system, whereas [R | t] describes the rotation and translation of the second camera relative to the first. Finally, X denotes the 3D point in the world coordinate system.
These equations form the basis of triangulation, where the intersection of two rays determines the 3D location of the target. By applying this process to all valid matches, a 3D reconstruction of the identified defects is achieved, enabling precise localization and subsequent cleaning operations.
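The triangulation itself can be carried out with cv2.triangulatePoints, as sketched below. The first camera is taken as the world origin; note that the translation recovered from the essential matrix is only defined up to scale, so absolute depth follows from the known displacement between the two shots. R, t, K, pts1_in, and pts2_in are the quantities from the previous sketches.

```python
import cv2
import numpy as np

# Projection matrices of the two views (Equation (12)).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # K [I | 0]
P2 = K @ np.hstack([R, t])                          # K [R | t]

# Triangulate the inlier matches and convert homogeneous -> Cartesian coordinates.
pts4d = cv2.triangulatePoints(P1, P2, pts1_in.T, pts2_in.T)
pts3d = (pts4d[:3] / pts4d[3]).T

# Target depth along the optical axis of the first view, e.g. the median z of
# the reconstructed defect points (in units of the recovered translation scale).
depth = float(np.median(pts3d[:, 2]))
print(f"estimated target depth: {depth:.3f}")
```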

3.3. 3-DoF Arm Kinematic Analysis

The arm has three DoF, with a cutting motor rotary joint serving as the end-effector. The kinematic model of the arm is depicted in Figure 8.
The rotational ranges of the joints J1, J2, and J3 are −180° to 180°, 0° to 40°, and 0° to 90°, respectively. Using the Denavit–Hartenberg (D-H) parametric method, the kinematic equations of the arm are derived and expressed in Equation (13). The link lengths l1, l2, and l3 are 0 mm, 358 mm, and 147 mm, respectively.
$$T_4^0 = \begin{bmatrix} c\theta_1 c(\theta_2+\theta_3) & c\theta_1 s(\theta_2+\theta_3) & 0 & l_3 c\theta_1 c(\theta_2+\theta_3) + l_2 c\theta_1 c\theta_2 \\ s\theta_1 c(\theta_2+\theta_3) & s\theta_1 s(\theta_2+\theta_3) & 0 & l_3 s\theta_1 c(\theta_2+\theta_3) + l_2 s\theta_1 c\theta_2 \\ s(\theta_2+\theta_3) & c(\theta_2+\theta_3) & 0 & l_3 s(\theta_2+\theta_3) + l_2 s\theta_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{13}$$
When the elements of the transformation matrix are known, they are denoted as in Equation (14).
$$T_4^0 = \begin{bmatrix} r_{11} & r_{12} & r_{13} & p_x \\ r_{21} & r_{22} & r_{23} & p_y \\ r_{31} & r_{32} & r_{33} & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{14}$$
This can be solved for θ1, θ2, and θ3, as shown in Equations (15)–(17).
$$\theta_1 = \arccos\frac{r_{11}}{r_{32}} = \arcsin\frac{r_{21}}{r_{32}} \tag{15}$$
$$\theta_2 = \arcsin\frac{p_z - l_3 r_{31}}{l_2} = \arccos\frac{p_x - l_3 c\theta_1 r_{32}}{l_2 c\theta_1} \tag{16}$$
$$\theta_3 = \arccos\left(s\theta_2\, r_{31} + c\theta_2\, r_{32}\right) = \arctan\frac{r_{31}}{r_{32}} - \theta_2 \tag{17}$$
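The sketch below implements the forward kinematics of Equation (13) and, for comparison, a standard geometric inverse-kinematics solution for the same 3-DoF chain. The geometric solution is used here because it needs only the target position; the closed form of Equations (15)–(17) additionally requires the orientation entries r11–r32 of the commanded pose.

```python
import numpy as np

L2, L3 = 358.0, 147.0  # link lengths in mm (l1 = 0)

def forward_kinematics(t1, t2, t3):
    """End-effector position from Equation (13); angles in radians."""
    c1, s1, c2, s2 = np.cos(t1), np.sin(t1), np.cos(t2), np.sin(t2)
    c23, s23 = np.cos(t2 + t3), np.sin(t2 + t3)
    return np.array([L3 * c1 * c23 + L2 * c1 * c2,
                     L3 * s1 * c23 + L2 * s1 * c2,
                     L3 * s23 + L2 * s2])

def inverse_kinematics(px, py, pz):
    """Geometric position-only IK (elbow-up branch, within the SCR joint limits)."""
    t1 = np.arctan2(py, px)
    c3 = (px**2 + py**2 + pz**2 - L2**2 - L3**2) / (2.0 * L2 * L3)
    t3 = np.arccos(np.clip(c3, -1.0, 1.0))
    t2 = np.arctan2(pz, np.hypot(px, py)) - np.arctan2(L3 * np.sin(t3), L2 + L3 * np.cos(t3))
    return t1, t2, t3

p = forward_kinematics(0.3, 0.4, 0.2)
print(p, inverse_kinematics(*p))  # round-trip check: returns (0.3, 0.4, 0.2)
```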

4. Results of Experiments and Discussions

In this section, we conduct experiments to verify the feasibility of the SCR in obstacle cleaning tasks. These experiments include testing the obstacle-crossing capability of the mobile platform, performing monocular vision-based defect detection and localization, and carrying out obstacle cleaning operations. The results demonstrate both the rationality of the robot design and its feasibility in executing obstacle cleaning tasks.

4.1. Mobile Platform Crossing Test

To verify the robot's obstacle-crossing ability in sewers, branches with an average thickness of 20 mm, together with leaves, were placed in a simulated pipeline to replicate the soft obstacles typically found in such environments. Figure 9 shows the mobile platform accelerating through the soft obstacles with the pressing mechanism lifted; the arrows indicate the platform's direction of movement. The test results demonstrate that the mobile platform has good passability over soft terrain.

4.2. Monocular Vision Defect Detection and Localization Experiment

In the monocular vision defect detection experiment, defects from the Pipeline Computer Vision Project dataset were identified using the YOLOv5-Sewer lightweight model, which has been optimized for in-pipe environments. The detection results are illustrated in Figure 10. All six pipeline defects within the dataset were successfully recognized, demonstrating effective detection performance even in complex environments.
In the target localization experiments, the camera was first calibrated to determine its intrinsic parameters and distortion coefficients using the OpenCV calibration function. The resulting intrinsic matrix is presented in Equation (18), and the distortion coefficients are given in Equation (19).
$$K_0 = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 579.9844 & 0 & 315.8443 \\ 0 & 582.27447 & 240.6151 \\ 0 & 0 & 1 \end{bmatrix} \tag{18}$$
$$D_0 = \begin{bmatrix} k_1 & k_2 & p_1 & p_2 & k_3 \end{bmatrix} = \begin{bmatrix} 0.15539 & 0.38861 & 0.00446 & 6.574 \times 10^{-4} & 0.29998 \end{bmatrix} \tag{19}$$
Figure 11 shows the ORB feature point matching between two images taken at different moments. The RANSAC algorithm is used to compute the fundamental matrix, and incorrect matching points are automatically eliminated. The resulting fundamental matrix is given in Equation (20).
$$F_1 = \begin{bmatrix} 0 & 0 & 0.0044 \\ 0 & 0 & 0.0142 \\ 0.0068 & 0.0096 & 1 \end{bmatrix} \tag{20}$$
Using the camera intrinsic matrix K0 and the fundamental matrix F1 from Equation (20), the essential matrix E1 can be derived as shown in Equation (21).
$$E_1 = \begin{bmatrix} 0.506 & 14.761 & 3.265 \\ 15.982 & 0.285 & 0.305 \\ 2.901 & 2.537 & 0.496 \end{bmatrix} \tag{21}$$
The essential matrix is decomposed using the cv2.recoverPose function to obtain the rotation matrix R 1 and translation vector t 1 , as represented by Equation (22) and Equation (23), respectively.
$$R_1 = \begin{bmatrix} 0.9899 & 0.0262 & 0.1391 \\ 0.0201 & 0.9987 & 0.0458 \\ 0.1401 & 0.0426 & 0.9892 \end{bmatrix} \tag{22}$$
$$t_1 = \begin{bmatrix} 0.1636 & 0.1712 & 0.9715 \end{bmatrix}^{\top} \tag{23}$$
The projection matrix for the two frames is constructed using the camera intrinsic parameters and the positional transformation data. Subsequently, triangulation is employed to determine the target depth in the current camera coordinate system, calculated to be 44.3 cm, with the actual distance being 45 cm. Multiple localization calculations were conducted by varying the distance. The experimental results are depicted in Figure 12. The monocular visual localization system demonstrates good performance at short to medium distances, achieving the highest accuracy at 75 cm.
However, the error increases as the distance grows. Measurement accuracy at longer distances requires further optimization. Additionally, the impact of various environmental factors on measurement accuracy must be considered in practical applications.

4.3. In-Pipe Obstacle Cleaning Experiments

Branches and bricks were fixed at the end of the pipeline to simulate the cutting of tree roots and the grinding of hard obstacles typically encountered in sewers. The operator remotely controlled the robot to perform cutting and grinding operations in the simulated pipeline. The experimental process is shown in Figure 13 and Figure 14. After elevating the pressing mechanism, the mobile platform was maneuvered in the direction indicated by the yellow arrow. In the tree root cutting experiment, the red box denotes the target operational area; upon reaching this area, the arm was maneuvered to cut the branches. In the brick grinding experiment, the end-effector was replaced with a dedicated grinding tool. The yellow box in Figure 14 indicates that the grinding operation is in progress, with the mobile platform driven forward to carry out the grinding.
Figure 15 illustrates the cross-section of the cut tree root and the sanded brick surface. The cross-section of the cut tree root is neat, and the thickness of the polished brick surface is reduced by 1 cm. This demonstrates that the current end-effector can complete cutting and grinding tasks in the sewer. However, in the grinding experiment, a deviation was observed when the arm made contact with the brick surface, potentially leading to unintended contact with the pipe wall.

5. Conclusions

This paper has presented the development and validation of the SCR, a visually assisted robotic system designed for cleaning tasks in narrow and complex sewers. The key contributions include the following:
  • Key Features:
    A mobile platform with a pressing mechanism to enhance stability and obstacle-crossing capabilities.
    The arm uses cost-effective, structurally stable linear actuators in pitch joints to meet high-load requirements and ensure operational stability.
    The integration of the YOLO defect detection model and a monocular vision localization algorithm for a Detection–Localization–Cleaning mode.
  • Advantages:
    Enhances cleaning precision and efficiency through defect detection and target localization algorithms.
    Reduces operator burden and improves decision-making speed in complex pipeline scenarios.
The feasibility and effectiveness of the SCR were validated through simulated experiments, which demonstrated its mobility, visual detection and localization accuracy, and obstacle cleaning capabilities. These results confirm the rationality of the robot design and its potential for practical applications in sewer cleaning tasks.
This work provides a technical basis for the further development of vision-assisted cleaning robots in the field of sewer cleaning. Future research will focus on enhancing the robustness of the visual algorithms and the stability of the robot during cleaning operations, thereby improving its adaptability to different sewer environments.

Author Contributions

Conceptualization, B.X., L.Z. and Z.C.; methodology, B.X., L.Z. and Z.C.; software, B.X.; validation, B.X. and L.Z.; formal analysis, B.X.; investigation, L.Z.; resources, Z.C.; data curation, B.X.; writing—original draft preparation, B.X.; writing—review and editing, L.Z. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Advanced Innovation Center for Intelligent Robots and Systems, grant number: 00921917001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hammond, M.J.; Chen, A.S.; Djordjević, S.; Butler, D.; Mark, O. Urban flood impact assessment: A state-of-the-art review. Urban Water J. 2015, 12, 14–29. [Google Scholar] [CrossRef]
  2. Duan, W.; He, B.; Nover, D.; Fan, J.; Yang, G.; Chen, W.; Meng, H.; Liu, C. Floods and associated socioeconomic damages in China over the last century. Nat. Hazards 2016, 82, 401–413. [Google Scholar]
  3. Kazeminasab, S.; Sadeghi, N.; Janfaza, V.; Razavi, M.; Ziyadidegan, S.; Banks, M.K. Localization, mapping, navigation, and inspection methods in in-pipe robots: A review. IEEE Access 2021, 9, 162035–162058. [Google Scholar] [CrossRef]
  4. Jang, H.; Kim, T.Y.; Lee, Y.C.; Kim, Y.S.; Kim, J.; Lee, H.Y.; Choi, H.R. A review: Technological trends and development direction of pipeline robot systems. J. Intell. Robot. Syst. 2022, 105, 59. [Google Scholar]
  5. Moradi, S.; Zayed, T.; Golkhoo, F. Review on computer aided sewer pipeline defect detection and condition assessment. Infrastructures 2019, 4, 10. [Google Scholar] [CrossRef]
  6. Guo, W.; Soibelman, L.; Garrett Jr, J. Automated defect detection for sewer pipeline inspection and condition assessment. Autom. Constr. 2009, 18, 587–596. [Google Scholar]
  7. Yin, X.; Chen, Y.; Bouferguene, A.; Zaman, H.; Al-Hussein, M.; Kurach, L. A deep learning-based framework for an automated defect detection system for sewer pipes. Autom. Constr. 2020, 109, 102967. [Google Scholar]
  8. Fang, X.; Li, Q.; Zhu, J.; Chen, Z.; Zhang, D.; Wu, K.; Ding, K.; Li, Q. Sewer defect instance segmentation, localization, and 3D reconstruction for sewer floating capsule robots. Autom. Constr. 2022, 142, 104494. [Google Scholar] [CrossRef]
  9. Nguyen, T.; Blight, A.; Pickering, A.; Jackson-Mills, G.; Barber, A.R.; Boyle, J.H.; Richardson, R.; Dogar, M.; Cohen, N. Autonomous control for miniaturized mobile robots in unknown pipe networks. Front. Robot. AI 2022, 9, 997415. [Google Scholar]
  10. Rayhana, R.; Yun, H.; Liu, Z.; Kong, X. Automated defect-detection system for water pipelines based on CCTV inspection videos of autonomous robotic platforms. IEEE/ASME Trans. Mechatron. 2023, 29, 2021–2031. [Google Scholar]
  11. Cheng, J.C.; Wang, M. Automated detection of sewer pipe defects in closed-circuit television images using deep learning techniques. Autom. Constr. 2018, 95, 155–171. [Google Scholar]
  12. Aitken, J.M.; Evans, M.H.; Worley, R.; Edwards, S.; Zhang, R.; Dodd, T.; Mihaylova, L.; Anderson, S.R. Simultaneous localization and mapping for inspection robots in water and sewer pipe networks: A review. IEEE Access 2021, 9, 140173–140198. [Google Scholar] [CrossRef]
  13. Summan, R.; Dobie, G.; Guarato, F.; MacLeod, C.; Marshall, S.; Forrester, C.; Pierce, G.; Bolton, G. Image mosaicing for automated pipe scanning. In AIP Conference Proceedings; American Institute of Physics: College Park, MD, USA, 2015; Volume 1650, pp. 1334–1342. [Google Scholar]
  14. Mikolajczyk, T. Manufacturing using robot. Adv. Mater. Res. 2012, 463, 1643–1646. [Google Scholar]
  15. Tugeumwolachot, T.; Seki, H.; Tsuji, T.; Hiramitsu, T. Development of a compact sewerage robot with multi-DOF cutting tool. Artif. Life Robot. 2021, 26, 404–411. [Google Scholar] [CrossRef]
  16. Abidin, A.S.Z.; Zaini, M.H.; Pauzi, M.F.A.M.; Sadini, M.M.; Chie, S.C.; Mohammadan, S.; Jamali, A.; Muslimen, R.; Ashari, M.F.; Jamaludin, M.S.; et al. Development of cleaning device for in-pipe robot application. Procedia Comput. Sci. 2015, 76, 506–511. [Google Scholar] [CrossRef]
  17. Lee, J.Y.; Hong, S.H.; Jeong, M.S.; Suh, J.H.; Chung, G.B.; Han, K.R.; Choi, I.S. Development of pipe cleaning robot for the industry pipe facility. J. Korea Robot. Soc. 2017, 12, 65–77. [Google Scholar]
  18. Jung, C.D.; Chung, W.J.; Ahn, J.S.; Kim, M.S.; Shin, G.S.; Kwon, S.J. Optimal mechanism design of in-pipe cleaning robot. In Proceedings of the 2011 IEEE International Conference on Mechatronics and Automation, Beijing, China, 7–10 August 2011; pp. 1327–1332. [Google Scholar]
  19. Tugeumwolachot, T.; Seki, H.; Tsuji, T.; Hiramitsu, T. Development of a multi-dof cutting tool with a foldable supporter for compact in-pipe robot. In Proceedings of the 2022 IEEE International Conference on Mechatronics and Automation (ICMA), Guilin, China, 7–9 August 2022; pp. 587–592. [Google Scholar]
  20. SEWER Robotics. Available online: https://www.sewerrobotics.com/ (accessed on 15 January 2025).
  21. IMS ROBOTICS. Available online: https://www.ims-robotics.de/en/home/ (accessed on 15 January 2025).
  22. S1E. Robotic. Available online: https://www.s1e.co.uk/ (accessed on 15 January 2025).
  23. Sulthana, S.F.; Vibha, K.; Kumar, S.; Mathur, S.; Mohile, T.A. Modelling and design of a drain cleaning robot. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Chennai, India, 24–29 February 2020; IOP Publishing: Bristol, UK, 2020; Volume 912, p. 022049. [Google Scholar]
  24. Awasthi, T.; Purohit, A.; Shah, U.; Sakya, G. A novel design of a drain cleaning robotic vehicle. In Automation and Computation; CRC Press: Boca Raton, FL, USA, 2023; pp. 420–426. [Google Scholar]
  25. Zhao, X.; Xiao, N.; Cai, Z.; Xin, S. YOLOv5-Sewer: Lightweight Sewer Defect Detection Model. Appl. Sci. 2024, 14, 1869. [Google Scholar] [CrossRef]
  26. Workspace, N. Storm Drain Model Dataset. Available online: https://universe.roboflow.com/new-workspace-zyqyt/storm-drain-model (accessed on 15 January 2025).
  27. Rootdataset. Pipe Root Dataset. Available online: https://universe.roboflow.com/rootdataset/pipe_root (accessed on 15 January 2025).
  28. Yougao, L.; Wei, H. Identification and feature extraction of drainage pipeline cracks based on SVD and edge recognition method. In Proceedings of the 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), Xiamen, China, 18–20 October 2019; pp. 1184–1188. [Google Scholar]
  29. Ming, Y.; Meng, X.; Fan, C.; Yu, H. Deep learning for monocular depth estimation: A review. Neurocomputing 2021, 438, 14–33. [Google Scholar] [CrossRef]
  30. Liang, J.; Wang, Y.; Chen, Y.; Yang, B.; Liu, D. A triangulation-based visual localization for field robots. IEEE/CAA J. Autom. Sin. 2022, 9, 1083–1086. [Google Scholar]
Figure 1. The structure of SCR.
Figure 2. The relationship between linear actuator length and joint angle: (a) actuator for the tool axis rotation (blue triangle, θ3); (b) actuator for the arm elevation angle (red triangle, θ2).
Figure 3. Control system hardware.
Figure 4. Robot environmental perception and task flowchart.
Figure 5. Upper computer control interface.
Figure 6. Partial images from the sewer pipe internal defect dataset.
Figure 7. Target localization flowchart.
Figure 8. Arm kinematic model.
Figure 9. Mobile platform crossing test.
Figure 10. Defect detection model test results.
Figure 11. Using Canny edge detection (a) with ORB feature matching (b).
Figure 12. Experiments on monocular visual localization at different distances.
Figure 13. Branch cutting experiment in pipes.
Figure 14. Brick grinding experiment in pipes.
Figure 15. Experimental results of cutting and grinding operations.
Table 1. Technical specifications of SCR.
Parameter | Value
Size | 1210 mm
Weight | 42 kg
Adaptable Pipe Diameter | 280–780 mm
Operation Length Range | 50 m
Signal Type | RS485/CAN
Operation Method | Joystick, keyboard
Turning Limitation | Limited to minor deviations
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
