Spotlight Long-Term Reporting Dashboards

When a stream is defined in the Streaming Analytics pane of the Spotlight real-time UI, the data for the stream is output to the long-term dashboards every 5 seconds. Along the way, the data is enriched with additional information such as DNS names and geo-IP data, all of which is available in the long-term dashboards. Many different correlations and insights can be achieved through these dashboards and the workflows within and between them.

When a streaming analytic is clicked in the Streaming Analytics pane of the Spotlight real-time UI, a new browser tab opens to the Spotlight long-term Overview dashboard, automatically filtered on the name of the chosen stream. The stream filter is pinned so that it persists between dashboards, and a dropdown is provided to change the stream filter. If the pinned filter is removed, the dashboard displays the aggregated results of all the streams; in that case, simply use the dropdown menu to select a specific stream.

Because the Spotlight long-term dashboards are built on the Kibana web UI, they are very customizable. However, the dashboards have been designed with a particular layout that usually places summary metrics and graphs at the top and detailed tables at the bottom.

To filter on a value for a specific field in a record, prefix the value with the name of the field followed by a colon. For example, “server_addr:10.8.1.36” populates the dashboards with data from records whose server_addr field has a value of 10.8.1.36.

Wildcards can also be used in filters. For example, “server_addr:10.8.1.*” returns any records whose server_addr begins with 10.8.1. Many other types of filtering can be performed in the filter bar; extensive documentation about the filters, and the Kibana UI in general, is available on the Elastic website.
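
Filters can also be combined with boolean operators. As a sketch, assuming a hypothetical client_addr field alongside the server_addr field shown above:

    server_addr:10.8.1.36 AND client_addr:10.8.2.*
    server_addr:(10.8.1.36 OR 10.8.1.37)
    NOT server_addr:10.8.1.*

The first expression matches records for one server talking to any client in the 10.8.2 network, the second matches either of two servers, and the third excludes the 10.8.1 network entirely.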

The Discover tab can be used to learn which fields are available to filter on. Doing so makes it evident that there are many more fields in the database than are used in the dashboards. Users are encouraged to explore the available data using the Discover tab and to create new visualizations that further enrich the dashboards. When new dashboards are created, it may be desirable to add them to the list of dashboards in the menu to the right of the stream dropdown. This is accomplished by editing the menu and adding an entry to the appropriate top-level menu, or by creating a new top-level menu to hold the dashboard.
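
The same field exploration can also be done programmatically against the underlying Elasticsearch indices. The snippet below is a minimal Python sketch, not a supported Spotlight interface; it assumes the elasticsearch client library, a locally reachable Elasticsearch instance, and a hypothetical spotlight-* index pattern:

    # Minimal sketch: list every field stored in the Spotlight indices.
    # The URL and index pattern are assumptions, not documented values.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Fetch the field mappings for every index matching the pattern.
    mappings = es.indices.get_mapping(index="spotlight-*")

    for index_name, mapping in mappings.items():
        print(index_name)
        properties = mapping["mappings"].get("properties", {})
        for field, definition in sorted(properties.items()):
            print(f"  {field}: {definition.get('type', 'object')}")

Any field listed this way can be used in the filter bar or in a new visualization.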

It is important to keep in mind that the more visualizations a dashboard has, the longer it takes to complete the queries and populate the dashboard. There are two ways to address this. One is to minimize the number of visualizations in each dashboard by spreading the visualizations out across more, simpler dashboards. The other is to configure Spotlight to send the streaming analytics data to a remote ELK server. A remote ELK server can also be scaled for size and performance by expanding it into a cluster made of any number of ELK nodes.
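
For example, once a remote ELK cluster is in place, any client or integration can be pointed at several of its nodes at once so that requests are spread across the cluster. A minimal Python sketch, with hypothetical node addresses:

    # Minimal sketch: connect to a multi-node Elasticsearch cluster and
    # report its health. The node URLs are placeholders.
    from elasticsearch import Elasticsearch

    es = Elasticsearch([
        "http://elk-node-1:9200",
        "http://elk-node-2:9200",
        "http://elk-node-3:9200",
    ])

    health = es.cluster.health()
    print(health["status"], health["number_of_nodes"])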

The Share button in the top right corner of the dashboard can be used to embed any dashboard into another website. The embedded dashboard does not include the surrounding Kibana controls. This can be useful for providing dashboards as part of other content, with specific time and content filters that the user cannot change.
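
The embed snippet that Kibana generates is an HTML iframe along the following lines; the host, dashboard ID, and time range here are placeholders rather than values from a real installation:

    <iframe src="http://kibana-host:5601/app/dashboards#/view/<dashboard-id>?embed=true&_g=(time:(from:now-24h,to:now))"
            height="600" width="800"></iframe>

Because the time range and filters are encoded in the URL, the page hosting the iframe controls exactly what the viewer sees.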

Overview Dashboard

The Overview dashboard provides high-level information about all of the streams, or about the selected stream. This may be enough to solve a problem, or enough to decide which other dashboard to navigate to for more detail.

At the top of the Overview dashboard are four bar graphs that display the top 16 conversations for each of the four primary Spotlight metrics: Application Latency, Network Latency, TCP/IP Quality, and VoIP Quality. These graphs are similar to the graphs in the real-time UI, which provides a level of familiarity when navigating from the real-time UI to the long-term dashboards. It is important to keep in mind that while Spotlight generates a new set of the 16 worst flows for each quality metric at every interval, the long-term dashboards display an aggregated set of the data for these intervals. This means that each of the 16 conversations in a graph may come from a different interval within the selected time range. The graphs can also be customized to show more or fewer conversations at one time.
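
To illustrate how this aggregation behaves, the following Python sketch runs a query of the same general shape; the index pattern and the conversation_id and app_latency_max field names are assumptions for illustration, not Spotlight's actual schema:

    # Minimal sketch: find the 16 conversations with the worst (highest)
    # application latency across the entire selected time range.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    response = es.search(
        index="spotlight-*",
        size=0,  # no raw records, only the aggregation
        aggs={
            "worst_conversations": {
                "terms": {
                    "field": "conversation_id",
                    "size": 16,
                    "order": {"max_latency": "desc"},
                },
                "aggs": {
                    "max_latency": {"max": {"field": "app_latency_max"}},
                },
            }
        },
    )

    for bucket in response["aggregations"]["worst_conversations"]["buckets"]:
        print(bucket["key"], bucket["max_latency"]["value"])

Because the maximum is taken over every interval in the range, a conversation can appear in the list on the strength of a single bad interval.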

Below the graphs are two tables. The first table displays high-level information about the streams that are configured for Worst Conversations. The second table displays high-level information about the streams that are configured for Conversation Counts. These tables, like all tables in the long-term dashboards, can be sorted and filtered by any column.

Latency Dashboards

There are two Latency dashboards: one for Application Latency and one for Network Latency. These dashboards provide detailed information about the conversations with the worst latency. To better understand how latency is calculated, please refer to the Spotlight Manual.

At the top of the latency dashboards are metrics showing the average and maximum latency, and the number of conversations, clients, and servers. Below the metrics is a utilization graph, which shows the utilization of the active conversations, the overall network utilization, and the non-TCP/UDP traffic.

Below that is a bar graph listing the top 16 conversations with the worst application latency; this is the same graph that appears on the Overview dashboard. Beside it is a world map displaying geographically where the servers with the worst latency are located. The geo-IP data is acquired during the Logstash phase using the MaxMind database, and the lookup is performed only on server IPv4 addresses.
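
The enrichment amounts to a MaxMind lookup on each server address. A minimal Python sketch of the equivalent lookup, using the geoip2 library and a local copy of the GeoLite2 City database (both assumptions):

    # Minimal sketch of the geo-ip lookup performed during the Logstash
    # phase. Note that private addresses (such as 10.x.x.x) are not in
    # the MaxMind database; a public address is used as a placeholder.
    import geoip2.database

    with geoip2.database.Reader("GeoLite2-City.mmdb") as reader:
        record = reader.city("8.8.8.8")  # placeholder server IPv4 address
        print(record.country.name, record.city.name,
              record.location.latitude, record.location.longitude)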

In the middle of the dashboard is a collection of graphs showing average and maximum latency over time, unique servers and conversations over time, and conversation duration over time. The average and maximum latency are separate graphs because a maximum latency spike can stretch the graph's scale so much that the average line becomes too small to see. If desired, the average and maximum latency can be placed on the same graph.

At the bottom of the dashboard are three tables. The first two tables separate the clients and servers, and for each show the number of conversations, the maximum application and network latency, and the minimum TCP and VoIP quality. Both latency values are shown because knowing the network latency for a conversation with the worst application latency can provide insight into the problem, and vice versa. The last table lists the worst conversations along with many details about each: duration, client packets, server packets, client bytes, server bytes, average and maximum application latency, average and maximum network latency, average and minimum TCP quality, and the number of expert events generated for the conversation. Again, even though the table may list the conversations with the worst application latency, correlating that with the other characteristics of the conversations can be very insightful during monitoring or troubleshooting.

Quality Dashboards

There are two Quality dashboards: one for TCP quality and one for VoIP quality. TCP quality is a number from 1 to 5 that is based on the number of TCP-related expert events generated for a conversation. VoIP quality is also a number from 1 to 5, representing the voice quality of a conversation. Because the TCP quality is based on expert event counts, a graph is provided showing the number of expert events over time.

The layout of the quality dashboards is very similar to the layout of the latency dashboards.

Conversation Count Dashboard

The Conversation Count dashboard displays conversations that have exceeded the user-defined thresholds configured in the streaming analytics dialog of the Spotlight real-time UI.

At the top of the dashboard is a table listing the stream, network, stats type, threshold, and count. For each stream there is a row in the table for Conversation Quality, TCP Quality, App Latency, Net Latency, and VoIP Quality. The threshold column shows the threshold the user configured, and the count column shows how many conversations exceeded it.

At the bottom of the dashboard are graphs for each of the stats types, showing the counts over time.

Compare Dashboards

The Compare Dashboards compare latency and quality values over time by day and by week.

In the Compare Days dashboard, there is a line for the selected day, plus lines for one day and two days before the selected day. The default time for the Compare Days dashboard is Today, but it can be changed to any day. The time range should always be set to a whole day for the comparison to work correctly. Once a day is set, the left and right arrows on the date picker can be used to step through the days. The Compare Weeks dashboard is identical to the Compare Days dashboard, except that it compares weeks instead of days. Both dashboards can be customized to add more days and compare other metrics.