Grapevine pruning is a difficult problem to automate. From the agricultural domain, it is known that a grapevine is made up of distinct parts. Building on this knowledge, the computer vision and image processing research community has made several attempts to solve the problem through image data processing, supported by stereoscopic cameras [1,2] and 3D laser scanners [3,4]. Most of these are non-invasive approaches; the exception is [4], in which a robotic system prunes the grapevine using image processing and computer vision methods. Other publications are based on RGB images without distance estimation or other real-world attributes. These methods, with the support of real-world attributes from the scene, aim to decompose the grapevine in the given images [5,6,7,8] in order to estimate pruning cutting points. The grapevine structure consists of three basic parts: (a) the trunk, which is the largest woody part of the plant; (b) the cordons, which are thinner than the trunk and of which there are usually two per plant; and (c) the canes, which start on top of the cordons and are the thinnest and tallest woody parts of the plant, as shown in Figure 1.
Visually, the trunk is always vertical, the cordons are horizontal, and the canes grow in random directions while keeping a broadly vertical formation. In winter, when grapevines are pruned, shoots, spurs, suckers, and water sprouts are absent. As mentioned above, the basic concept of pruning is to remove last year's canes from the cordons and let new canes grow to bear new grapes [10]. Canes are therefore the objects of interest in this problem.
Grapevine pruning via computer vision and image processing has proven to be very challenging, since canes grow long, in random directions, and very close to one another. This structure creates a high overlap between the objects of interest and, combined with a complex background and environmental noise, makes image analysis very difficult, especially for an invasive approach in which a robotic system must take action. The complex background is a challenge in its own right: a grapevine image contains not only the foreground plant but also the background, especially in winter, when the grapevine has no leaves at all. A feasible solution is to use stereoscopic cameras, where, based on an RGB image and a depth map (real distance values), the foreground and background can be separated in a given image [2]. Finally, each winery applies a different pruning strategy, such as removing some canes completely and only shortening the rest. These differing strategies add constraints that make the problem more complex.
The above issues explain why most studies in the literature focus on non-invasive approaches [1,2,3,5,6,7,8], in which background/foreground separation algorithms and deep learning algorithms for object detection [7] and segmentation [6] have been used without estimating pruning points. In the invasive approach [4], a closed environment surrounds the plant to be pruned, and the pruning is carried out by robotic arms supported by computer vision methods based on 3D data processing. Apart from the pruning itself, the canes should also be removed, and in a fully automated scenario the pruning strategy might differ from the classical approach, since robotic manipulators supported by cameras cannot completely mimic humans. Achieving this through image analysis requires segmenting every cane, so that the whole cane body is available for estimating cutting points on it.
Semantic segmentation is a very popular method in robotic systems that operate in unconstrained, in-the-wild environments. Applied to grapevine pruning, it gives automation methods a basis for cutting point estimation or feature extraction, since the whole area of interest is segmented. Robust solutions to pruning are very hard to find, since vineyard plants stand no more than 3 m apart and have a very complex structure. Based on the above issues, the objectives of this study are: