Reliable and scalable Kafka-based framework for optical network telemetry
Sgambelluri A.; Pacini A.; Paolucci F.; Castoldi P.; Valcarenghi L.
2021-01-01
Abstract
Telemetry data acquisition is becoming crucial for efficient detection of, and timely reaction to, network status changes such as failures. Streaming telemetry data to many collectors may be hindered by scalability issues, delaying failure detection and localization. Providing efficient mechanisms for managing the massive telemetry traffic coming from network devices can pave the way to novel procedures, speeding up failure detection and thus minimizing response time. This paper proposes a novel Kafka-based monitoring framework leveraging the telemetry service. The proposed framework exploits the built-in scalability and reliability of Kafka to go beyond traditional monitoring systems. The framework enables continuous monitoring of optical system data and distributes the data, as simple compressed text messages, to a large number of consumers. Moreover, the framework keeps a limited history of the monitored data, easing, for example, root-cause failure analysis. The implemented monitoring platform is experimentally validated under the disaggregated paradigm in terms of functional assessment, scalability, resiliency, and end-to-end message latency. The results show that the framework is highly scalable, supporting up to about 4000 messages per second (and potentially more) with low CPU load, and achieves an end-to-end (i.e., producer-to-consumer) latency of about 50 ms. Finally, the architecture tolerates the failure of a core component of the monitoring framework without losing any messages.
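The abstract states that monitored samples travel as compressed text messages and that latency is measured end to end, from producer to consumer. The following is a minimal sketch of one way to do that, assuming JSON as the text format, gzip as the compression, and a producer timestamp embedded in each message. The field names and the functions `encode_sample`, `decode_sample`, and `end_to_end_latency_ms` are hypothetical and not taken from the paper; the actual Kafka send and poll calls are left out so the example runs with the Python standard library alone.

```python
import gzip
import json
import time
from typing import Optional


# Hypothetical sample layout: the field names are illustrative only,
# not the message format used by the authors.
def encode_sample(device_id: str, metric: str, value: float,
                  sent_at: Optional[float] = None) -> bytes:
    """Serialize one telemetry sample as compact JSON text, then gzip it."""
    record = {
        "device": device_id,
        "metric": metric,
        "value": value,
        # Producer-side wall-clock timestamp in seconds, used by the
        # consumer to estimate producer-to-consumer latency.
        "sent_at": time.time() if sent_at is None else sent_at,
    }
    text = json.dumps(record, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def decode_sample(payload: bytes) -> dict:
    """Reverse of encode_sample: gunzip the payload, then parse the JSON text."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


def end_to_end_latency_ms(sample: dict,
                          received_at: Optional[float] = None) -> float:
    """Milliseconds between the producer timestamp and consumer receipt.

    Assumes producer and consumer clocks are synchronized (e.g. via NTP
    or PTP); any clock offset adds directly to the result.
    """
    now = time.time() if received_at is None else received_at
    return (now - sample["sent_at"]) * 1000.0


if __name__ == "__main__":
    msg = encode_sample("transponder-1", "rx_power_dbm", -12.5, sent_at=100.0)
    sample = decode_sample(msg)
    print(sample["metric"], sample["value"])  # rx_power_dbm -12.5
    print(round(end_to_end_latency_ms(sample, received_at=100.05), 3))  # 50.0
```

One plausible mapping onto Kafka itself, not necessarily the configuration used by the authors: the limited history corresponds to topic retention (`retention.ms` or `retention.bytes`), and compression can instead be delegated to the producer through `compression.type`, which compresses whole record batches. Batch-level compression is often preferable, because for a single small record gzip's fixed header and trailer overhead can outweigh the savings.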
File | Type | License | Size | Format
---|---|---|---|---
pub1_jocn-13-10-E42.pdf (not available) | Post-print/Accepted manuscript | Not public | 4.43 MB | Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.