The move to cloud applications has made troubleshooting for security, application performance monitoring (APM), and DevOps much harder to manage. Most cloud providers do not allow you to install your own agents, monitors, or probes on their systems to find out what is going on with their services. So what options is an organization left with?
Many cloud service providers have compromised by providing application programming interfaces (APIs) that can be used to gather data on the health, behavior, and performance of applications and their infrastructure. Telemetry aggregates the data your systems generate to document each system's state, as well as the relationships and dependencies within a distributed system.
What is OpenTelemetry?
OpenTelemetry is an open-source observability framework made up of a collection of vendor-agnostic APIs, software development kits (SDKs), and libraries that generate and gather telemetry data covering the three pillars of cloud-native observability: logs, metrics, and traces. It standardizes the way telemetry data is collected, formatted, and exported to various backend systems for processing and analysis.
Supported Data Types
The exporters available for backend systems include the following (a minimal exporter configuration is sketched after this list):
- OTLP: The OpenTelemetry open-source protocol used to transmit telemetry data from the SDK to the Collector and from the Collector to the backend system of choice.
- Splunk
- Prometheus
- Jaeger
- Kafka
- AWS
- Azure
- Datadog
- F5
- Elasticsearch
- Dynatrace
- Logz.io
- Sumo Logic
- Tencent
- Zipkin
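To make the list above concrete, here is a minimal sketch of the exporters section of a Collector configuration. The component names (otlp, prometheus, zipkin) are standard Collector exporters, but the endpoints are placeholder values, not real systems:

```yaml
# Sketch of the "exporters" section of an OpenTelemetry Collector config.
# Endpoints below are placeholders for illustration only.
exporters:
  otlp:
    endpoint: "backend.example.com:4317"                    # OTLP over gRPC to a backend of your choice
  prometheus:
    endpoint: "0.0.0.0:8889"                                # metrics endpoint exposed for Prometheus to scrape
  zipkin:
    endpoint: "http://zipkin.example.com:9411/api/v2/spans" # spans sent to a Zipkin-compatible backend
```

Each exporter still has to be referenced in a pipeline under the service section before the Collector will use it; a fuller example appears later in this post.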
OpenTelemetry vs OpenTracing
OpenTelemetry was formed from the merger of OpenCensus and OpenTracing, two earlier open-source instrumentation projects, and is now hosted by the Cloud Native Computing Foundation (CNCF) as the single standard for application instrumentation. OpenTracing handled only application traces, which made it difficult to see how applications performed across services. OpenTelemetry is backward compatible with both OpenTracing and OpenCensus.
OpenTelemetry simplifies alerting, reporting, and troubleshooting in multi-language systems, where telemetry from different services must be correlated to provide detailed insight into the source of an event.
How the OpenTelemetry Collector Works
The OpenTelemetry Collector acts as a buffer between your software and your observability tools: it receives, processes, and exports telemetry data in various formats and can send it to several different services. It is how the data gets to your storage or monitoring platform.
The Splunk OpenTelemetry Collector packages the open-source OpenTelemetry Collector with extra components:
- Receivers, which determine how data gets in, via push or pull
- Processors, which determine what to do with the received data
- Exporters, which determine where to send the data once it is processed
- Extensions, which provide additional functions beyond the core Collector
The Smart Agent Receiver can be used to integrate all Smart Agent metric monitors into Collector pipelines. Additional benefits include the ability to profile code, correlate data between multiple Splunk Observability Cloud views, and send logs with a bundled Fluentd agent. These components are wired together into pipelines in the Collector configuration, as sketched below.
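To show how these pieces fit together, here is a minimal, hedged sketch of a Collector configuration using standard upstream components (otlp receiver, batch processor, otlp exporter, health_check extension); the endpoint is a placeholder, and the Splunk distribution ships its own, more complete default configuration:

```yaml
# Minimal Collector pipeline sketch: receive OTLP, batch it, export OTLP.
extensions:
  health_check: {}       # exposes a simple liveness endpoint

receivers:
  otlp:
    protocols:
      grpc:              # accept OTLP over gRPC (default port 4317)
      http:              # accept OTLP over HTTP (default port 4318)

processors:
  batch: {}              # group telemetry into batches before export

exporters:
  otlp:
    endpoint: "backend.example.com:4317"   # placeholder backend endpoint

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The service section is what activates components: a receiver, processor, or exporter that is defined but never referenced in a pipeline does nothing.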
How to Install and Configure the Splunk Distribution of the OpenTelemetry Collector
First, decide whether you will use the Splunk installer script, an automated configuration management tool, or a manual install configuration. Automated install packages can also be downloaded directly from Splunk. Commands to run the installer script on each platform are shown below.
For Linux
Step 1: Install systemd, sudo, and curl.
Step 2: Download and execute the installer script.
Step 3: Configure the environment variables:
- SPLUNK_REALM: The realm to send data to. The default is us0.
- SPLUNK_MEMORY_TOTAL_MIB: The total memory to allocate in mebibytes (MiB). For example, 512 allocates 512 MiB (512 x 2^20 bytes) of memory.
- SPLUNK_ACCESS_TOKEN: The base64-encoded access token for authenticating data ingest requests.
Use the following commands:
curl -sSL https://dl.signalfx.com/splunk-otel-collector.sh > /tmp/splunk-otel-collector.sh;
sudo sh /tmp/splunk-otel-collector.sh --realm SPLUNK_REALM --memory SPLUNK_MEMORY_TOTAL_MIB -- SPLUNK_ACCESS_TOKEN
For Windows
Step 1: Make sure that you have Administrator access to PowerShell.
Step 2: Download and execute the installer script.
Step 3: Configure the environment variables:
- SPLUNK_REALM: The realm to send data to. The default is us0.
- SPLUNK_ACCESS_TOKEN: The base64-encoded access token for authenticating data ingest requests.
Use the following command:
& {Set-ExecutionPolicy Bypass -Scope Process -Force; $script = ((New-Object System.Net.WebClient).DownloadString('https://dl.signalfx.com/splunk-otel-collector.ps1')); $params = @{access_token = "SPLUNK_ACCESS_TOKEN"; realm = "SPLUNK_REALM"}; Invoke-Command -ScriptBlock ([scriptblock]::Create(". {$script} $(&{$args} @params)"))}
For Kubernetes
Step 1: Make sure that cert-manager is deployed and available.
Step 2: Deploy the operator.
Step 3: Configure the environment variables.
Step 4: Deploy the OpenTelemetry Collector.
Step 5: Verify the cert-manager, operator, and collector are up and running properly.
Use the following commands:
kubectl apply -f https://github.com/signalfx/splunk-otel-collector-operator/releases/latest/download/splunk-otel-operator.yaml

kubectl create secret generic splunk-access-token --namespace splunk-otel-operator-system --from-literal=access-token=SPLUNK_ACCESS_TOKEN

kubectl apply -f - <<EOF
apiVersion: otel.splunk.com/v1alpha1
kind: Agent
metadata:
  name: splunk-otel
  namespace: splunk-otel-operator-system
spec:
  clusterName: <MY_CLUSTER_NAME>
  realm: <SPLUNK_REALM>
EOF

kubectl get pods -n cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-7c9c58cbcb-jwwkk              1/1     Running   0          5m1s
cert-manager-cainjector-5d88544c9c-chwhr   1/1     Running   0          5m1s
cert-manager-webhook-85f88ffb5b-4hrpb      1/1     Running   0          5m1s

kubectl get pods -n splunk-otel-operator-system
NAME                                                       READY   STATUS    RESTARTS   AGE
splunk-otel-agent-pp8wn                                    1/1     Running   0          68s
splunk-otel-cluster-receiver-8f666b5b8-wbncp               1/1     Running   0          68s
splunk-otel-operator-controller-manager-67b86fcf5c-f2sqq   1/1     Running   0          3m38s
Use Case Examples for a Splunk OpenTelemetry Collector
eCommerce companies have always been interested in whether their site performance can sustain their business. When a site is down, it can cost some popular sites more than a few million dollars per minute. With stakes like this on the line, companies are invested in making sure that they can respond to errors quickly. The Splunk OpenTelemetry Collector connects to cloud resources and sends error logs to Splunk so that the source of those errors can be found quickly and accurately.
Splunk also recommends using the Splunk OpenTelemetry Collector to monitor Kubernetes deployments. It can track uptime; memory, CPU, and storage utilization; interactions between cluster components; and the platform's container orchestration. By deploying a Helm chart on the host running a Kubernetes cluster, you can track workload and other important metrics, as sketched below. Some deployments also use Prometheus to send metrics from the cluster to the cloud.
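For reference, a Helm-based deployment of the Splunk OpenTelemetry Collector is typically driven by a small values file. The sketch below is a hedged example: the keys (clusterName, splunkObservability.realm, splunkObservability.accessToken) follow the layout documented for the splunk-otel-collector-chart, but the cluster name and token are placeholders and key names can vary between chart versions:

```yaml
# values.yaml sketch for the Splunk OpenTelemetry Collector Helm chart.
# Placeholder values only; consult the chart's documentation for your version.
clusterName: my-cluster            # hypothetical cluster name
splunkObservability:
  realm: us0                       # Splunk Observability Cloud realm
  accessToken: "REDACTED_TOKEN"    # placeholder access token
```

Passing a file like this to helm install against the splunk-otel-collector-chart repository deploys the Collector as an agent on each node of the cluster.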
Conclusion
OpenTelemetry is a tool that keeps expanding to incorporate more formats, backend systems, APIs, and libraries. As these options become available, so does what you can do with it in Splunk. This blog post only scratches the surface of what can be accomplished using this tool. If you want to learn more or have a specific question, please engage in the comments section of the blog post.
You don’t have to master Splunk by yourself in order to get the most value out of it. Small, day-to-day optimizations of your environment can make all the difference in how you understand and use the data in your Splunk environment to manage all the work on your plate.
Cue Atlas Assessment: Instantly see where your Splunk environment is excelling and opportunities for improvement. From download to results, the whole process takes less than 30 minutes using the link below: