Instrumentation

Instrumentation is the act of adding observability code to your application. This can be done with direct calls to the OpenTelemetry API within your code or including a dependency which calls the API and hooks into your project, like a middleware for an HTTP server.

TracerProvider and Tracers

In OpenTelemetry each service being traced has at least one TracerProvider that is used to hold configuration about the name/version of the service, what sampler to use and how to process/export the spans. A Tracer is created by a TracerProvider and has a name and version. In the Erlang/Elixir OpenTelemetry the name and version of each Tracer is the same as the name and version of the OTP Application the module using the Tracer is in. If the call to use a Tracer is not in a module, for example when using the interactive shell, the default Tracer is used.

Each OTP Application has a Tracer registered for it when the opentelemetry Application boots. This can be disabled by setting the Application environment variable register_loaded_applications to false. If you want a more specific named Tracer or disable the automatic registration you can register a Tracer with a name and version. Examples:

opentelemetry:register_tracer(test_tracer, <<"0.1.0">>),
OpenTelemetry.register_tracer(:test_tracer, "0.1.0")

Giving names to each Tracer, and in the case of Erlang/Elixir having that name be the name of the Application, allows for the ability to blacklist traces from a particular Application. This can be useful if, for example, a dependency turns out to be generating too many or in some way problematic spans and it is desired to disable their generation.

Additionally, the name and version of the Tracer are exported as the InstrumentationLibrary component of spans. This allows users to group and search spans by the Application they came from.

You can lookup a Tracer by name with get_tracer/1 and use that Tracer variable to call the tracing API through otel_tracer in Erlang or OpenTelemetry.Tracer in Elixir:

Tracer = opentelemetry:get_tracer(my_app),
SpanCtx = otel_tracer:start_span(Tracer, <<"hello-world">>, #{}),
...
otel_tracer:end_span(SpanCtx).
tracer = OpenTelemetry.get_tracer(:my_app)
span_ctx = OpenTelemetry.Tracer.start_span(tracer, "hello-world", %{})
...
OpenTelemetry.Tracer.end_span(span_ctx)

In most cases you will not need to manually register or look up a Tracer. Simply use the macros provided, which are covered in the following section, and the Tracer for the Application the macro is used in will be used automatically.

Starting Spans

A trace is a tree of spans, starting with a root span that has no parent. To represent this tree, each span after the root has a parent span associated with it. When a span is started the parent is set based on the context. A context can either be implicit, meaning your code does not have to pass a Context variable to track the active context, or explicit where your code must pass the Context as an argument not only to the OpenTelemetry functions but to any function you need to propagate the context so that spans started in those functions have the proper parent.

For implicit context propagation across functions within a process the process dictionary is used to store the context. When you start a span with the macro with_span the context in the process dictionary is updated to make the newly started span the currently active span and this span will be end’ed when the block or function completes. Additionally, starting a new span within the body of with_span will use the active span as the parent of the new span and the parent is again the active span when the child’s block or function body completes:

parent_function() ->    
    ?with_span(<<"parent">>, #{}, fun child_function/0).
               
child_function() ->
    %% this is the same process, so the span <<"parent">> set as the active
    %% span in the with_span call above will be the active span in this function    
    ?with_span(<<"child">>, #{}, 
               fun() ->
                   %% do work here. when this function returns, <<"child">> will complete.
               end).
    
require OpenTelemetry.Tracer

def parent_function() do
    OpenTelemetry.Tracer.with_span "parent" do
        child_function()
    end
end

def child_function() do
    ## this is the same process, so the span <<"parent">> set as the active
    ## span in the with_span call above will be the active span in this function    
    OpenTelemetry.Tracer.with_span "child" do
        ## do work here. when this function returns, <<"child">> will complete.
    end
end

Cross Process Propagation

The examples in the previous section were spans with a child-parent relationship within the same process where the parent is available in the process dictionary when creating a child span. Using the process dictionary this way isn’t possible when crossing processes, either by spawning a new process or sending a message to an existing process. Instead, the context must be manually passed as a variable.

Creating Spans for New Processes

To pass spans across processes we need to start a span that isn’t connected to particular process. This can be done with the macro start_span. Unlike with_span, the start_span macro does not set the new span as the currently active span in the context of the process dictionary.

Connecting a span as a parent to a child in a new process can be done by attaching the context and setting the new span as currently active in the process. The whole context should be attached in order to not lose other telemetry data like baggage.

SpanCtx = ?start_span(<<"child">>),
Ctx = otel_ctx:get_current(),

proc_lib:spawn_link(fun() ->
                        otel_ctx:attach(Ctx),
                        ?set_current_span(SpanCtx),
                        
                        %% do work here
                        
                        ?end_span(SpanCtx)
                    end),
span_ctx = OpenTelemetry.Tracer.start_span(<<"child">>)
ctx = OpenTelemetry.Ctx.get_current()

task = Task.async(fn ->
                      OpenTelemetry.Ctx.attach(ctx),
                      OpenTelemetry.Tracer.set_current_span(span_ctx)
                      # do work here
                      
                      # end span here or after `await` returns
                  end)
                  
_ = Task.await(task)
OpenTelemetry.Tracer.end_span(span_ctx)

Linking the New Span

If the work being done by the other process is better represented as a link – see the link definition in the specification for more on when that is appropriate – then the SpanCtx returned by start_span is passed to link/1 to create a link that can be passed to with_span or start_span:

Parent = ?current_span_ctx,
proc_lib:spawn_link(fun() ->
                        %% a new process has a new context so the span created
                        %% by the following `with_span` will have no parent
                        Link = opentelemetry:link(Parent),
                        ?with_span(<<"other-process">>, #{links => [Link]},
                                   fun() -> ok end)
                    end),
parent = OpenTelemetry.current_span_ctx()
task = Task.async(fn ->
                    # a new process has a new context so the span created
                    # by the following `with_span` will have no parent
                    link = OpenTelemetry.link(parent)
                    Tracer.with_span "my-task", %{links: [link]} do
                      :hello
                    end
                 end)

Attributes

Attributes are key-value pairs that are applied as metadata to your spans and are useful for aggregating, filtering, and grouping traces. Attributes can be added at span creation, or at any other time during the life cycle of a span before it has completed.

The key can be an atom or a utf8 string (a regular string in Elixir and a binary, <<"..."/utf8>>, in Erlang). The value can be of any type. If necessary the key and value are converted to strings when the attribute is exported in a span.

The following example shows the two ways of setting attributes on a span by both setting an attribute in the start options and then again with set_attributes in the body of the span operation:

?with_span(<<"my-span">>, #{attributes => [{<<"start-opts-attr">>, <<"start-opts-value">>}]}, 
           fun() -> 
               ?set_attributes([{<<"my-attribute">>, <<"my-value">>},
                                {another_attribute, <<"value-of-attribute">>}])
           end)
Tracer.with_span "span-1", %{attributes: [{<<"start-opts-attr">>, <<"start-opts-value">>}]} do
  Tracer.set_attributes([{"my-attributes", "my-value"},
                         {:another_attribute, "value-of-attributes"}])
end

Semantic Attributes

Semantic Attributes are attributes that are defined by the OpenTelemetry Specification in order to provide a shared set of attribute keys across multiple languages, frameworks, and runtimes for common concepts like HTTP methods, status codes, user agents, and more. These attribute keys and values are available in the header opentelemetry_api/include/otel_resource.hrl.

For details, see Trace semantic conventions.

Events

An event is a human-readable message on a span that represents “something happening” during it’s lifetime. For example, imagine a function that requires exclusive access to a resource like a database connection from a pool. An event could be created at two points - once, when the connection is checked out from the pool, and another when it is checked in.

?with_span(<<"my-span">>, #{}, 
           fun() -> 
               ?add_event(<<"checking out connection">>),
               %% acquire connection from connection pool
               ?add_event(<<"got connection, doing work">>),
               %% do some work with the connection and then return it to the pool
               ?add_event(<<"checking in connection">>)
           end)
Tracer.with_span "my-span" do
  Span.add_event("checking out connection")
  ## acquire connection from connection pool
  Span.add_event("got connection, doing work")
  ## do some work with the connection and then return it to the pool
  Span.add_event("checking in connection")
end

A useful characteristic of events is that their timestamps are displayed as offsets from the beginning of the span, allowing you to easily see how much time elapsed between them.

Additionally, events can also have attributes of their own:

?add_event("Process exited with reason", [{pid, Pid)}, {reason, Reason}]))
Span.add_event("Process exited with reason", pid: pid, reason: Reason)

Cross Service Propagators

Distributed traces extend beyond a single service, meaning some context must be propagated across services to create the parent-child relationship between spans. This requires cross service context propagation, a mechanism where identifiers for a trace are sent to remote processes.

In order to propagate trace context over the wire, a propagator must be registered with OpenTelemetry. This can be done through configuration of the opentelemetry application:

%% sys.config
...
{text_map_propagators, [baggage,
                        trace_context]},
...                        
## runtime.exs
...
text_map_propagators: [:baggage, :tracer_context],
...

If you instead need to use the B3 specification, originally from the Zipkin project, then replace trace_context and :trace_context with b3 and :b3 for Erlang or Elixir respectively.

Library Instrumentation

Library instrumentations, broadly speaking, refers to instrumentation code that you didn’t write but instead include through another library. OpenTelemetry for Erlang/Elixir supports this process through wrappers and helper functions around many popular frameworks and libraries. You can find in the opentelemetry-erlang-contrib repo and the registry.

Creating Metrics

The metrics API, found in apps/opentelemetry-experimental-api of the opentelemetry-erlang repository, is currently unstable, documentation TBA.