This document describes how you can manage your OpenTelemetry Collector deployment at scale.
To get the most out of this page, you should know how to install and configure the collector. These topics are covered elsewhere in the documentation.
Telemetry collection at scale requires a structured approach to managing agents. Typical agent management tasks include, among others, health and performance monitoring of the agents.
Not every use case requires support for all agent management tasks. In the context of OpenTelemetry, health and performance monitoring is ideally done using OpenTelemetry itself.
Observability vendors and cloud providers offer proprietary solutions for agent management. In the open source observability space, there is an emerging standard that you can use for agent management: Open Agent Management Protocol (OpAMP).
The OpAMP specification defines how to manage a fleet of telemetry data agents. These agents can be OpenTelemetry collectors, Fluent Bit, or other agents, in any combination.
Note: The term “agent” is used here as a catch-all term for OpenTelemetry components that respond to OpAMP. This could be the collector, but also SDK components.
OpAMP is a client/server protocol that supports communication over HTTP and over WebSockets.
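In practice, the endpoint URL a client is configured with usually signals which of the two transports it uses. As a hedged illustration, the Go sketch below classifies an OpAMP server endpoint by its URL scheme: `ws://` or `wss://` implies WebSocket, `http://` or `https://` implies plain HTTP. The `transportFor` helper and the `Transport` type are hypothetical and not part of the opamp-go API; `ws://127.0.0.1:4320/v1/opamp` is the endpoint used in the walkthrough on this page, and `opamp.example.com` is a placeholder host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Transport names the two OpAMP transports.
type Transport string

const (
	WebSocket Transport = "websocket"
	PlainHTTP Transport = "http"
)

// transportFor is a hypothetical helper (not part of opamp-go) that maps an
// OpAMP server endpoint URL to the transport implied by its scheme.
func transportFor(endpoint string) (Transport, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parse endpoint %q: %w", endpoint, err)
	}
	switch strings.ToLower(u.Scheme) {
	case "ws", "wss":
		return WebSocket, nil
	case "http", "https":
		return PlainHTTP, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q in endpoint %q", u.Scheme, endpoint)
	}
}

func main() {
	for _, ep := range []string{
		"ws://127.0.0.1:4320/v1/opamp",       // endpoint used in the walkthrough
		"https://opamp.example.com/v1/opamp", // placeholder HTTP endpoint
	} {
		t, err := transportFor(ep)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", ep, t)
	}
}
```

Running it prints `ws://127.0.0.1:4320/v1/opamp -> websocket` followed by `https://opamp.example.com/v1/opamp -> http`. Rejecting unknown schemes up front gives a clear error instead of a failed connection attempt later.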
To look at a concrete setup, you can try out a simple OpAMP deployment yourself by using the OpAMP protocol implementation in Go. For the following walkthrough, you need Go version 1.19 or above.
We will set up a simple OpAMP control plane consisting of an example OpAMP server and let an OpenTelemetry Collector connect to it via an example OpAMP supervisor.
First, clone the `open-telemetry/opamp-go` repository:

```sh
git clone https://github.com/open-telemetry/opamp-go.git
```
Next, we need an OpenTelemetry Collector binary that the OpAMP supervisor can manage. For that, install the OpenTelemetry Collector Contrib distro. The path where you installed the collector binary is referred to as `$OTEL_COLLECTOR_BINARY` in the following.
In the `./opamp-go/internal/examples/server` directory, launch the OpAMP server:

```console
$ go run .
2023/02/08 13:31:32.004501 [MAIN] OpAMP Server starting...
2023/02/08 13:31:32.004815 [MAIN] OpAMP Server running...
```
In the `./opamp-go/internal/examples/supervisor` directory, create a file named `supervisor.yaml` with the following content (telling the supervisor where to find the server and which OpenTelemetry Collector binary to manage):

```yaml
server:
  endpoint: ws://127.0.0.1:4320/v1/opamp
agent:
  executable: $OTEL_COLLECTOR_BINARY
```
Note: Make sure to replace `$OTEL_COLLECTOR_BINARY` with the actual file path. For example, on Linux or macOS, if you installed the collector in `/usr/local/bin/`, replace `$OTEL_COLLECTOR_BINARY` with `/usr/local/bin/otelcol`.
Next, create a collector configuration as follows (save it in a file called `effective.yaml` in the `./opamp-go/internal/examples/supervisor` directory):
```yaml
receivers:
  prometheus/own_metrics:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 10s
          static_configs:
            - targets: [0.0.0.0:8888]
  hostmetrics:
    collection_interval: 10s
    scrapers:
      load:
      filesystem:
      memory:
      network:

exporters:
  # NOTE: Prior to v0.86.0 use `logging` instead of `debug`.
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus/own_metrics]
      exporters: [debug]
```
Now it’s time to launch the supervisor (which in turn will launch your OpenTelemetry Collector):
```console
$ go run .
2023/02/08 13:32:54 Supervisor starting, id=01GRRKNBJE06AFVGQT5ZYC0GEK, type=io.opentelemetry.collector, version=1.0.0.
2023/02/08 13:32:54 Starting OpAMP client...
2023/02/08 13:32:54 OpAMP Client started.
2023/02/08 13:32:54 Starting agent /usr/local/bin/otelcol
2023/02/08 13:32:54 Connected to the server.
2023/02/08 13:32:54 Received remote config from server, hash=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.
2023/02/08 13:32:54 Agent process started, PID=13553
2023/02/08 13:32:54 Effective config changed.
2023/02/08 13:32:54 Enabling own metrics pipeline in the config
2023/02/08 13:32:54 Effective config changed.
2023/02/08 13:32:54 Config is changed. Signal to restart the agent.
2023/02/08 13:32:54 Agent is not healthy: Get "http://localhost:13133": dial tcp [::1]:13133: connect: connection refused
2023/02/08 13:32:54 Stopping the agent to apply new config.
2023/02/08 13:32:54 Stopping agent process, PID=13553
2023/02/08 13:32:54 Agent process PID=13553 successfully stopped.
2023/02/08 13:32:54 Starting agent /usr/local/bin/otelcol
2023/02/08 13:32:54 Agent process started, PID=13554
2023/02/08 13:32:54 Agent is not healthy: Get "http://localhost:13133": dial tcp [::1]:13133: connect: connection refused
2023/02/08 13:32:55 Agent is not healthy: health check on http://localhost:13133 returned 503
2023/02/08 13:32:55 Agent is not healthy: health check on http://localhost:13133 returned 503
2023/02/08 13:32:56 Agent is not healthy: health check on http://localhost:13133 returned 503
2023/02/08 13:32:57 Agent is healthy.
```
If everything worked out, you should now be able to go to http://localhost:4321/ and access the OpAMP server UI, where you should see your collector listed as managed by the supervisor.
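You can also query the collector for the metrics it exports. Note that the `service_instance_id`, `service_name`, and `service_version` label values match the ID, type, and version the supervisor reported when it started: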
```console
$ curl localhost:8888/metrics
...
# HELP otelcol_receiver_accepted_metric_points Number of metric points successfully pushed into the pipeline.
# TYPE otelcol_receiver_accepted_metric_points counter
otelcol_receiver_accepted_metric_points{receiver="prometheus/own_metrics",service_instance_id="01GRRKNBJE06AFVGQT5ZYC0GEK",service_name="io.opentelemetry.collector",service_version="1.0.0",transport="http"} 322
# HELP otelcol_receiver_refused_metric_points Number of metric points that could not be pushed into the pipeline.
# TYPE otelcol_receiver_refused_metric_points counter
otelcol_receiver_refused_metric_points{receiver="prometheus/own_metrics",service_instance_id="01GRRKNBJE06AFVGQT5ZYC0GEK",service_name="io.opentelemetry.collector",service_version="1.0.0",transport="http"} 0
```
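To check label values like these programmatically, for example in a smoke test, here is a minimal Go sketch that parses one sample line of the Prometheus text exposition format. `Sample` and `parseSample` are hypothetical names. The parser deliberately skips edge cases, such as commas inside label values, escaping rules beyond what Go's `strconv.Unquote` accepts, and optional timestamps. A real client, such as the text parser in the Prometheus `expfmt` package, handles these.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one parsed sample line of the Prometheus text exposition format.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample parses a line such as name{k1="v1",k2="v2"} 322.
// Minimal sketch: label values must not contain commas, and any trailing
// timestamp after the value is ignored.
func parseSample(line string) (Sample, error) {
	s := Sample{Labels: map[string]string{}}
	line = strings.TrimSpace(line)
	var rest string
	if open := strings.IndexByte(line, '{'); open >= 0 {
		end := strings.LastIndexByte(line, '}')
		if end < open {
			return Sample{}, fmt.Errorf("unterminated label set in %q", line)
		}
		s.Name = line[:open]
		for _, pair := range strings.Split(line[open+1:end], ",") {
			if pair == "" {
				continue
			}
			key, quoted, ok := strings.Cut(pair, "=")
			if !ok {
				return Sample{}, fmt.Errorf("malformed label %q", pair)
			}
			val, err := strconv.Unquote(quoted) // strips the surrounding quotes
			if err != nil {
				return Sample{}, fmt.Errorf("label %s: %w", key, err)
			}
			s.Labels[key] = val
		}
		rest = line[end+1:]
	} else {
		name, after, ok := strings.Cut(line, " ")
		if !ok {
			return Sample{}, fmt.Errorf("missing value in %q", line)
		}
		s.Name, rest = name, after
	}
	fields := strings.Fields(rest)
	if len(fields) == 0 {
		return Sample{}, fmt.Errorf("missing value in %q", line)
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return Sample{}, fmt.Errorf("value: %w", err)
	}
	s.Value = v
	return s, nil
}

func main() {
	line := `otelcol_receiver_accepted_metric_points{receiver="prometheus/own_metrics",service_instance_id="01GRRKNBJE06AFVGQT5ZYC0GEK",service_name="io.opentelemetry.collector",service_version="1.0.0",transport="http"} 322`
	s, err := parseSample(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s.Name, s.Value)
	fmt.Println("service_instance_id:", s.Labels["service_instance_id"])
}
```

A smoke test could compare `service_instance_id` against the `id=` value printed in the supervisor's first log line.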