For an electric network operator, it is desirable to have a dedicated connection between the control center and each of the remote locations (substations). However, this is not yet the case in many scenarios, where operators resort to public IP networks or other solutions. Although today's wireless networks may provide good performance and a high throughput, their loss rate is still not negligible (0.1 to 0.5%). In addition, the bursty nature of packet loss may result in the loss of all the GOOSE frames of a trip, as observed in the previous section. This should never happen in a real network, since it would prevent a protection algorithm from acting.
4.1. Possibility of Tunneling over TCP or SCTP
A possibility that could be considered to totally avoid packet loss would be to send the GOOSE frames over TCP, a protocol that provides delivery guarantees. However, TCP retransmissions require at least an extra exchange of packets between the sender and the receiver, i.e., a latency equivalent to the RTT (Round-Trip Time), in addition to the timeout expiration.
To test the suitability of using a TCP tunnel, we have resorted to Simplemux [34], a protocol able to encapsulate a number of packets/frames belonging to different protocols into a single IP packet. In its normal flavor, it just adds a small separator before each of the aggregated packets/frames. The encapsulated packets/frames can travel over IP and UDP.
In the present work, the possibility of traveling over TCP has been added to an existing user space implementation of Simplemux. The implementation is available at https://github.com/simplemux/simplemux (accessed on 30 October 2023). A Wireshark screenshot is shown in Figure 7, obtained with a .lua Simplemux dissector added as a plugin. It can be observed how Simplemux allows the sending of GOOSE frames over TCP packets using port 55557. A GOOSE frame with a size of 242 bytes is now sent inside a 311-byte frame. Ethernet, IP, and TCP add an overhead of 14, 20, and 32 bytes, respectively (the TCP header has some extensions in this case), while Simplemux adds 3 more bytes. It can also be observed that the Simplemux header includes the length and the protocol code 143, which corresponds to Ethernet.
After some testing in the lab using Simplemux to send GOOSE frames over TCP, it was observed (see Figure 8) that in some cases, the delay incurred was up to 220 or even 455 ms (this happened with a 1% loss rate, ABEL = 1, RTT = 5 ms). In the figure, it can be observed that four trip bursts were seriously affected by these delays (bursts #3, #10, #27, and #35, highlighted in yellow).
Furthermore, if the loss conditions become harder, especially if ABEL is higher, TCP stops working: it disconnects and needs a long reconnection time. This is not an acceptable solution in our case, since a remote command must be executed quickly: if a fault has been detected in the grid, the time to act is critical.
An alternative to UDP and TCP is SCTP, which is also a widely accepted standard with many mature implementations; it was published in 2007 [35], and was updated recently [36]. It has a congestion control mechanism similar to that of TCP (including features such as Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery [37]). Therefore, the same limitations observed with TCP will apply.
All in all, although the retransmission features of both TCP and SCTP allow them to guarantee that every single packet is delivered, they may add delays that are too high for this specific use case. In addition, their congestion control mechanisms may reduce the throughput [38], and this is not the desired behavior, considering that certain equipment may be at risk.
4.2. Description of Simplemux, Blast Flavor
Once the use of tunneled GOOSE over TCP or SCTP has been discarded, new options have to be proposed. An interesting fact is that the throughput of a GOOSE flow is quite low: some tens of kilobits per second. Therefore, one possibility is to add a certain degree of redundancy, repeatedly sending each frame until it is acknowledged by the other side.
For that aim, a new flavor, called blast, has been designed and added to the existing Simplemux implementation. It redundantly sends the same packet a number of times. Its protocol stack corresponds to the one in the right column of Figure 1, in which Simplemux would be the Tunneling protocol. For clarity, a Wireshark capture of Simplemux, blast flavor, traveling over UDP port 55558 is shown in Figure 9. In this case, the frame is 277 bytes long: the original GOOSE frame had 229 bytes, plus 14 bytes of the Ethernet header, 20 of IP, and 8 of UDP. Finally, the Simplemux header adds 6 more bytes. More details about the protocol fields and their values are given in Appendix A.
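The byte counts reported for the two captures (Figure 7 and Figure 9) can be checked with a few lines of Python. This is only an arithmetic sketch: the constant and function names are hypothetical, and the header sizes are the ones quoted in the text for these specific captures, not general values.

```python
# Header sizes in bytes, as reported for the two Wireshark captures in the text.
ETHERNET, IP, UDP = 14, 20, 8
TCP_WITH_OPTIONS = 32   # TCP header including the extensions seen in Figure 7
SIMPLEMUX_OVER_TCP = 3  # Simplemux header in the TCP capture
SIMPLEMUX_BLAST = 6     # Simplemux header in the blast capture


def tunneled_size(goose_bytes, *headers):
    """Total frame size: the GOOSE frame plus all the tunnel headers."""
    return goose_bytes + sum(headers)


tcp_frame = tunneled_size(242, ETHERNET, IP, TCP_WITH_OPTIONS, SIMPLEMUX_OVER_TCP)
blast_frame = tunneled_size(229, ETHERNET, IP, UDP, SIMPLEMUX_BLAST)

print(tcp_frame, blast_frame)  # 311 277
print(f"overhead over TCP: {(tcp_frame - 242) / 242:.1%}")
print(f"overhead, blast:   {(blast_frame - 229) / 229:.1%}")
```

In these two captures, the tunnel adds 69 bytes over TCP and 48 bytes in blast flavor over UDP.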
As shown in Figure 10, a period is defined: each frame sent by the RTAC is stored in the sender router and sent periodically via the tunnel until the first acknowledgment arrives. For that aim, application-level ACKs (Acknowledgements) are used. This increases the required throughput, but it guarantees that every single frame will arrive on the other side. Then, the destination router decapsulates the received frame and forwards it to the end node.
It should be noted that since the mechanism works between a pair of intermediate machines, it is totally transparent for the end nodes, which only receive a single copy of the original frame. This is quite different from TCP: the proposed method does not wait for the ACK; it periodically sends a copy of the same frame to the other side. In high RTT networks, this can significantly reduce the incurred delay: instead of waiting for the whole RTT, a copy of any lost frame will soon be available.
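The sender behavior described above can be illustrated with a minimal sketch. It is a simplified model, not the actual Simplemux code: the function name `simulate_blast` and its parameters are hypothetical, copies are assumed to be sent exactly at multiples of the period, ACKs are assumed never to be lost, and an ACK is assumed to reach the sender one RTT after the corresponding copy was sent.

```python
def simulate_blast(period_ms, rtt_ms, lost_copies=frozenset(), max_copies=1000):
    """Toy model of one frame sent with the blast mechanism.

    Copy i is sent at time i * period_ms. Copies whose index is in
    lost_copies are dropped by the network. Returns a tuple
    (additional_delay_ms, copies_sent).
    """
    # Index of the first copy that reaches the destination router.
    first_ok = next(i for i in range(max_copies) if i not in lost_copies)
    # Delay added with respect to the loss-free case.
    additional_delay = first_ok * period_ms
    # Assumption: the ACK of that copy arrives one RTT after it was sent.
    ack_time = first_ok * period_ms + rtt_ms
    # The sender keeps sending on every tick strictly before the ACK arrives.
    copies_sent = sum(1 for i in range(max_copies) if i * period_ms < ack_time)
    return additional_delay, copies_sent


print(simulate_blast(10, 5))           # no loss, low RTT
print(simulate_blast(10, 5, {0, 1}))   # first two copies lost
print(simulate_blast(10, 100))         # no loss, high RTT
```

In this toy model, losing the first two copies with a 10 ms period adds 20 ms of delay, whereas a high RTT increases the number of copies sent without adding delay. The number of copies returned here is a property of the simplified model and should not be taken as the definition of the redundancy factor used in the analysis.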
As can be observed in Figure 10 (frame #1), if a tunneled frame is lost, a new copy will be available after an interval similar to the defined period. If a number of packets l is lost at the beginning of a burst, the additional delay becomes l × period (see frame #2). However, if the lost packet is not the first copy (see frame #3), the loss is not relevant.
To analyze the incurred throughput increase, a parameter called redundancy factor (R) can be defined as
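If a number of packets l is lost at the beginning of a burst, this will be translated into an additional delay:

$$\text{additional delay} = l \times \text{period}$$

To obtain E[l], let Ploss be the loss rate. Let k be the number of packets in a row that are lost. Assuming independent losses, the probability of losing exactly k copies before the first successful one is $P_{loss}^{k}(1 - P_{loss})$, so the number of tunneled frames lost at the beginning of a burst will be

$$E[l] = \sum_{k=1}^{\infty} k \, P_{loss}^{k} \, (1 - P_{loss})$$

The closed form of the sum is

$$\sum_{k=1}^{\infty} k \, P_{loss}^{k} = \frac{P_{loss}}{(1 - P_{loss})^{2}}$$

Since Ploss < 1, the series converges, and it follows that

$$E[l] = \frac{P_{loss}}{1 - P_{loss}}$$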
From the analysis, it can be concluded that this method allows a trade-off between the additional delay and the redundancy factor. The trade-off is illustrated in the next figures: from Figure 11, it can be observed that the redundancy factor mainly depends on the ratio RTT/period, and the loss probability does not make any significant difference. From Figure 12, it can be concluded that the loss probability and the period are the two factors that determine the additional delay.
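The dependence of the additional delay on the loss probability and the period can be checked numerically. The following sketch assumes independent (non-bursty) losses, in which case the number of copies lost before the first successful one has a mean of Ploss / (1 − Ploss). The function names and the 10 ms period are illustrative choices, and the simulation is not related to the testbed measurements.

```python
import random


def expected_lost_copies(p_loss):
    """Mean number of consecutive losses before the first success,
    assuming each copy is lost independently with probability p_loss."""
    return p_loss / (1.0 - p_loss)


def simulate_mean_lost(p_loss, trials=200_000, seed=1):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while rng.random() < p_loss:  # this copy is lost, try the next one
            total += 1
    return total / trials


period_ms = 10.0  # illustrative value
for p in (0.01, 0.05, 0.10):
    analytic = expected_lost_copies(p)
    simulated = simulate_mean_lost(p)
    print(f"Ploss={p:.2f}  E[l]={analytic:.4f} (simulated {simulated:.4f})  "
          f"mean additional delay={analytic * period_ms:.2f} ms")
```

Under these assumptions, the mean additional delay scales linearly with the period and grows with Ploss. Bursty losses (ABEL > 1) are not covered by this simple model.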
A test battery has been conducted using the same testbed as in Section 3, with the implementation of Simplemux blast flavor running between the two Raspberry Pi 3B+ devices. As before, the two devices are synchronized via NTP before the test, and two capture files are obtained with Wireshark. The two captures are parsed by a Python script, using the identifier of each packet to calculate the incurred delay.
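The delay calculation can be sketched as follows. This is not the script used in the tests: the function name is hypothetical, and each capture is assumed to have already been reduced to a list of (identifier, timestamp in seconds) pairs taken from clocks that are synchronized.

```python
def per_packet_delay(sent, received):
    """Match packets by identifier and return (delays_ms, missing).

    sent, received: iterables of (identifier, timestamp_s) pairs.
    Only the first transmission and the first arrival of each
    identifier are considered, so redundant copies are ignored.
    """
    first_sent, first_rcvd = {}, {}
    for ident, ts in sent:
        first_sent[ident] = min(ts, first_sent.get(ident, ts))
    for ident, ts in received:
        first_rcvd[ident] = min(ts, first_rcvd.get(ident, ts))
    delays_ms = {ident: (first_rcvd[ident] - ts) * 1000.0
                 for ident, ts in first_sent.items() if ident in first_rcvd}
    missing = sorted(set(first_sent) - set(first_rcvd))
    return delays_ms, missing


# Packet 2 is sent twice (the first copy is lost); packet 3 never arrives.
sent = [(1, 0.000), (2, 0.010), (2, 0.020), (3, 0.030)]
received = [(1, 0.0004), (2, 0.0205)]
delays_ms, missing = per_packet_delay(sent, received)
print(delays_ms, missing)
```

Measuring from the first transmission to the first arrival makes the additional delay caused by a lost copy show up directly in the result.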
First, Table 6 gives some results obtained in the testbed, using typical values of the RTT: 20, 50, and 100 ms [28]. The RTT and the loss probability (Ploss) are determined by the scenario, so the period is the parameter that can be tuned by the network manager: if a very short value is set, the delay caused by packet loss can be kept very low (in the order of the period plus 0.05 to 0.22 ms).
As a counterpart, the redundancy can scale up to a ×4, ×5, or even a ×10 factor. This could potentially lead to traffic congestion if not managed appropriately. Besides maintaining the period at an optimal value, another strategy to keep redundancy at acceptable levels involves transmitting only the most critical packets (e.g., the trips) via Simplemux blast, while the rest are sent without confirmation. VLAN tags can be effectively utilized to categorize the packets.
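The VLAN-based classification could be sketched as follows. This is an illustrative example and not part of the Simplemux implementation: the function names and the VLAN ID 100 are hypothetical, and only a single IEEE 802.1Q tag is considered.

```python
import struct

TPID_8021Q = 0x8100       # EtherType value that announces an 802.1Q tag
ETHERTYPE_GOOSE = 0x88B8  # EtherType of GOOSE, used below to build test frames
CRITICAL_VLANS = {100}    # hypothetical VLAN ID reserved for trip messages


def vlan_id(frame: bytes):
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 18:  # too short to carry MAC addresses, tag and EtherType
        return None
    # The tag follows the 6-byte destination and 6-byte source MAC addresses.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # the lower 12 bits of the TCI hold the VLAN ID


def use_blast(frame: bytes) -> bool:
    """Decide whether a frame is sent with confirmation (blast) or without it."""
    return vlan_id(frame) in CRITICAL_VLANS


# Example frames: one tagged with priority 4 and VLAN 100, one untagged.
tagged = bytes(12) + struct.pack("!HHH", TPID_8021Q, (4 << 13) | 100, ETHERTYPE_GOOSE) + bytes(10)
untagged = bytes(12) + struct.pack("!H", ETHERTYPE_GOOSE) + bytes(10)
print(use_blast(tagged), use_blast(untagged))  # True False
```

With a rule of this kind, only the frames of the selected VLANs would be repeated until acknowledged, which limits the bandwidth increase.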
The value of the period will therefore be limited by the redundancy allowed by the available bandwidth. It is clear that the method can be beneficial for loss-prone networks with high RTT: as an example, a copy of the packet would be available 22.22 ms later instead of waiting for the RTT (100 ms, see the last row of Table 6).
Considering that this method always delivers all the frames, the important performance indicator is not the loss rate but the additional delay caused by packet loss, with different burstiness levels. We will first present two detailed examples, and some averaged results will then be reported.
Figure 13 and Figure 14 show two sets of 40 faults, each of them generating a burst of GOOSE frames jointly with periodic ones. The period is set to 10 ms. In the first case (Figure 13), with a low RTT, a low loss rate (1%), and no bursty losses, the additional delay is kept very low: an average of 0.36 ms, up to 10 ms in a few cases (and 16 ms in one case). If compared with TCP (see Figure 8, obtained in the very same network conditions), the advantage in terms of delay is clear: in this case, the maximum delay is 16 ms, whereas with TCP, it was up to 455 ms. The processing delay in the Raspberry Pi is roughly 0.1 ms. In a real deployment, this delay could be reduced further by using more specific hardware.
Things become more complicated in Figure 14, since the loss rate is 10%, packets are lost in bursts (ABEL = 10), and the RTT is higher. In this case, the maximum delay becomes 330 ms, although it is 3.68 ms on average.
The averaged results considering no bursty losses (ABEL = 1) show that the average added delay is usually under 0.5 ms (Table 7). Furthermore, the maximum delay added to a packet was 16.16 ms. The standard deviation remains low.
As reported in Table 8, the effect of bursty losses (ABEL = 10) is noticeable, especially when combined with a high loss rate (10%). In these cases, the variance of the delay grows significantly, with some packets sent more than 30 times (period of 10 ms and delay above 300 ms). However, the average added delay only grows up to 2–3 ms. This can be an interesting improvement, considering that GOOSE frames are sent in bursts, so it is easy for at least one of them to arrive on time.
All in all, the results illustrate the trade-off between the reduction in the added delay and the bandwidth increase. It will be the decision of the network operator to tune the period so the delay is kept to the required limits, always considering the bandwidth limitations imposed by the connection technology and the costs.
In general, it is clear that a profound understanding of the underlying network is essential to make an informed decision between a method without confirmation (such as R-GOOSE or VX-GOOSE) and the Simplemux blast approach, which continues to send the frame until it is received. If the network exhibits bursty packet loss behavior, it would be more advantageous to implement the latter method, bearing in mind the critical importance of maintaining a stable electrical grid.