Observability
Last updated: 2025/07/07

Observability is a visualization capability built on the OpenTelemetry specifications. It covers the observability data in the system (metrics, traces, and logs) and provides collection, reporting, and integration with multiple visualization tools to help monitor system status, predict running trends, and analyze and locate system faults.

Background

Traces

A trace is a set of events that are triggered by a single logical operation and consolidated after being processed by various components of an application. A distributed trace contains events that cross process, network, and security boundaries. For example, it may be initiated when someone presses a button to start an operation on a website. A trace can be thought of as a directed acyclic graph (DAG) of spans, whose edges are defined as parent/child relationships. The following is an example trace made up of 6 spans:

text
            [Span A]  ←←←(the root span)
               |
        +------+------+
        |             |
    [Span B]      [Span C] ←←←(Span C is a `child` of Span A)
        |             |
    [Span D]      +---+-------+
                  |           |
               [Span E]    [Span F]

It can be easier to visualize traces with a time axis as in the diagram below:

text
––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··]
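The parent/child structure above can be sketched with plain Lua tables. This is only an illustration of the data relationship, not the API of any tracing library: the `spans` list, its `parent` field, and the helpers `depth_of`, `children_of`, and `print_tree` are names assumed for this example.

```lua
-- Each span records only its own name and the name of its parent.
-- The root span (Span A) has no parent.
local spans = {
    { name = "A", parent = nil },
    { name = "B", parent = "A" },
    { name = "C", parent = "A" },
    { name = "D", parent = "B" },
    { name = "E", parent = "C" },
    { name = "F", parent = "C" },
}

-- Index spans by name so that parents can be looked up directly.
local by_name = {}
for _, span in ipairs(spans) do
    by_name[span.name] = span
end

-- Walk the parent chain up to the root and count the edges.
local function depth_of(name)
    local depth = 0
    local span = by_name[name]
    while span.parent do
        depth = depth + 1
        span = by_name[span.parent]
    end
    return depth
end

-- Collect the names of the direct children of a span, in list order.
local function children_of(name)
    local result = {}
    for _, span in ipairs(spans) do
        if span.parent == name then
            result[#result + 1] = span.name
        end
    end
    return result
end

-- Print the tree depth-first, indenting each level by two spaces.
local function print_tree(name, indent)
    print(string.rep("  ", indent) .. "Span " .. name)
    for _, child in ipairs(children_of(name)) do
        print_tree(child, indent + 1)
    end
end

print_tree("A", 0)
```

Because every span has at most one parent and no span is its own ancestor, following `parent` always terminates at the root span.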

Metrics

Metrics are quantitative measurements of system behavior, performance, and status. They can be any measurable values, such as CPU usage, memory usage, request latency, and error rate. Recording measurement values mainly involves meters, instruments, and measurements.

  • Meter: A meter creates and manages instruments.
  • Instrument: An instrument is an object that has a name, type, description, and unit. It is used to capture and record measurement values.
  • Measurement: A measurement is a single record of the original value, including key-value pair attribute tags, which are used to mark and distinguish measurement values in different scenarios.

text
+------------------+
| MeterProvider    |                 +-----------------+             +--------------+
|   Meter A        | Measurements... |                 | Metrics...  |              |
|     Instrument X +-----------------> In-memory state +-------------> MetricReader |
|     Instrument Y |                 |                 |             |              |
|   Meter B        |                 +-----------------+             +--------------+
|     Instrument Z |
|     ...          |                 +-----------------+             +--------------+
|     ...          | Measurements... |                 | Metrics...  |              |
|     ...          +-----------------> In-memory state +-------------> MetricReader |
|     ...          |                 |                 |             |              |
|     ...          |                 +-----------------+             +--------------+
+------------------+

Logs

Logs record events and status information of applications for debugging and troubleshooting.

  • OTel logging APIs: native logging APIs (Logger/LoggerProvider)
  • Context injection: automatically adds the TraceID/SpanID to log records to implement log-trace correlation.

Core components and data flow directions are as follows.

text
+----------------------------------------------------------------+
|                           Application                          |
|                                                                |
|          +----------------------+       +-------------------+  |
|          |   OTel Logging API   |       |   OTel Context    |  |
|          |   (Logger/Logger-   +------->   Propagator       |  |
|          |  Provider/LogRecord) |       | (TraceID/SpanID)  |  |
|          +----------+-----------+       +---------+---------+  |
|                     |                             |            |
|                     |        Inject Context       |            |
|                     +<----------------------------+            |
|                     |                                          |
|          +----------v-----------+                              |
|          |   OTel Logs SDK      |                              |
|          |  +-----------------+ |                              |
|          |  | LogRecord       | |       +-------------------+  |
|          |  | Processor       +--------->   Resource        |  |
|          |  | (Batch/Simple)  | |       | (service.name,    |  |
|          |  +--------+--------+ |       |  host.name, etc.) |  |
|          |           |          |       +---------+---------+  |
|          |  +--------v--------+ |                 |            |
|          |  | LogRecord       | |    Add Resource |            |
|          |  | Exporter        +<------------------+            |
|          |  | (OTLP/Console)  | |                              |
|          |  +-----------------+ |                              |
|          +----------+-----------+                              |
+----------------------------------------------------------------+

Data Model

Traces

Data Model for Traces

text
┌───────────────────────────────┐
│         TracerProvider        │
├───────────────────────────────┤
│ + getTracer(name, version)    │
└──────────────┬────────────────┘
               │ creates
               ▼
┌───────────────────────────────┐
│            Tracer             │
├───────────────────────────────┤
│ + startSpan(name, options)    │
└──────────────┬────────────────┘
               │ creates
               ▼
┌───────────────────────────────┐ has  ┌──────────────┐
│             Span              │◆──▶ │ SpanContext  │
├───────────────────────────────┤      ├──────────────┤
│ - name: string                │      │ - traceId    │ 
│ - startTime: timestamp        │      │ - spanId     │
│ - endTime: timestamp          │      │ - traceFlags │
│ - parentSpanId: String        │      │ - isRemote   │
├───────────────────────────────┤      └──────────────┘ 
│ + setAttribute(k,v)           │ contains ┌─────────────┐
│ + addEvent(name, attributes)  │───────▶ │  Attribute  │
│ + recordException(exception)  │          ├─────────────┤
│ + end()                       │          │ - key       │
└──────────────┬────────────────┘          │ - value     │
               │ contains                  └─────────────┘
               ▼                                 ▲
        ┌─────────────┐       contains           │
        │    Event    │──────────────────────────┘
        ├─────────────┤
        │ - name      │
        │ - timestamp │
        └─────────────┘

Span Composition

  • Operation name
  • Start and end timestamps
  • Attributes: list of key-value pairs
  • Events: a collection of zero or more events, each of which is a tuple (timestamp, name, attributes). The name must be a string.
  • Parent span identifier
  • Links: links to zero or more causally related spans (identified by their respective span contexts)
  • Context: span context required for referencing a span

Span Name

A span name concisely identifies the work represented by the span, for example, the name of an RPC method, a function, or a subtask or phase in a larger computation.

Span Type

  • SERVER: A span covers a remote request processed by a server, and the client waits for a response.
  • CLIENT: A span describes a remote service request, and the client waits for a response. When the context of a CLIENT span is propagated, the CLIENT span usually becomes the parent of the remote SERVER span.
  • PRODUCER: A span describes the start or scheduling of a local or remote operation. This span usually ends before the related CONSUMER span ends, or even before the CONSUMER span starts. In a message transfer scenario with batch processing, a new PRODUCER span needs to be created for each message to trace the message.
  • CONSUMER: A span represents the processing of an operation started by the PRODUCER, and the PRODUCER does not wait for the result.
  • INTERNAL: a default value indicating a span represents an internal operation within the application, rather than an operation with a remote parent or child relationship.

Metrics

Data Model for Metrics

text
  Metric
+------------+
|name        |
|description |
|unit        |     +------------------------------------+
|data        |---> |Gauge, Counter, Histogram, ...      |
+------------+     +------------------------------------+

  Data [One of Gauge, Counter, Histogram, ...]
+-----------+
|...        |  // Metadata about the Data.
|points     |--+
+-----------+  |
               |      +---------------------------+
               |      |DataPoint 1                |
               v      |+------+------+   +------+ |
            +-----+   ||label |label |...|label | |
            |  1  |-->||value1|value2|...|valueN| |
            +-----+   |+------+------+   +------+ |
            |  .  |   |+-----+                    |
            |  .  |   ||value|                    |
            |  .  |   |+-----+                    |
            |  .  |   +---------------------------+
            |  .  |                   .
            |  .  |                   .
            |  .  |                   .
            |  .  |   +---------------------------+
            |  .  |   |DataPoint M                |
            +-----+   |+------+------+   +------+ |
            |  M  |-->||label |label |...|label | |
            +-----+   ||value1|value2|...|valueN| |
                      |+------+------+   +------+ |
                      |+-----+                    |
                      ||value|                    |
                      |+-----+                    |
                      +---------------------------+

Metric Composition

  • A metric consists of metadata and data.
  • The metadata contains the following attributes:
    • Metric name
    • Attribute (dimension, also called label)
    • Value type (integer, floating point, etc.)
    • Metering unit
  • Data is one of several types, such as gauge, counter, and histogram.
    • A data point contains a timestamp, attributes, and a value.

Metric Name

  • It cannot be an empty string.
  • It is a case-insensitive ASCII string.
  • It must start with a letter. Subsequent characters must be alphanumeric characters, underscores (_), periods (.), hyphens (-), or slashes (/).
  • It contains at most 255 characters.
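The naming rules above can be expressed as a small check. The following is a minimal sketch, not part of the openUBMC interface: the function names `is_valid_metric_name` and `same_metric_name` are assumptions made for this example, and an empty name is rejected in line with the first rule.

```lua
local MAX_NAME_LENGTH = 255

-- Returns true when the name satisfies the rules listed above,
-- otherwise false plus a short reason.
local function is_valid_metric_name(name)
    if type(name) ~= "string" or name == "" then
        return false, "name must be a non-empty string"
    end
    if #name > MAX_NAME_LENGTH then
        return false, "name is longer than 255 characters"
    end
    -- First character: an ASCII letter. Remaining characters: ASCII
    -- letters, digits, '_', '.', '-' or '/'.
    if not name:match("^[A-Za-z][A-Za-z0-9_%.%-/]*$") then
        return false, "name contains an invalid character"
    end
    return true
end

-- Names are case-insensitive, so compare them in a normalized form.
local function same_metric_name(a, b)
    return a:lower() == b:lower()
end

print(is_valid_metric_name("system.memory.usage"))
print(is_valid_metric_name("1st.metric"))
```

The character class is spelled out explicitly instead of using `%a` and `%w`, so the result does not depend on the process locale.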

Metric Name Conventions

  • limit: An instrument that measures the constant, known total amount of something should be called entity.limit. For example, system.memory.limit indicates the total system memory.
  • usage: An instrument that measures the amount used out of a known, finite total should be called entity.usage. For example, system.memory.usage with the attribute state = used | cached | free | ... indicates the amount of memory in each state. Where appropriate, the sum of usage over all attribute values should equal the limit.
  • utilization: An instrument that measures the ratio of usage to its limit should be called entity.utilization. For example, system.memory.utilization indicates the proportion of memory in use.
  • time: An instrument that measures the passage of time should be called entity.time. For example, system.cpu.time with the attribute state = idle | user | system | .... The measured time is not necessarily wall-clock time and may be less or greater than the wall-clock time between measurements.
  • io: An instrument that measures bidirectional data flow should be called entity.io and have a direction attribute. For example, system.network.io.

Logs

Data Model for Logs

text
Logger
+------------------+
| name             |
| version          |     +------------------------------------+
| ...              |---> |LogRecord 1                         |
+------------------+     |+----------------+----------------+|
                         || Timestamp      |  timestamp     ||
                         |+----------------+----------------+|
                         || SeverityNumber |  severity_num  ||
                         |+----------------+----------------+|
                         || SeverityText   |  severity_text ||
                         |+----------------+----------------+|
                         || Body           |  body          ||
                         |+----------------+----------------+|
                         || Attributes     |  +-----------+ ||
                         ||                |  |key: value | ||
                         ||                |  |...        | ||
                         ||                |  +-----------+ ||
                         |+----------------+----------------+|
                         || TraceId        |  (optional)    ||
                         || SpanId         |  (optional)    ||
                         |+----------------+----------------+|
                         +------------------------------------+
                                        .
                                        .
                                        .
                         +------------------------------------+
                         |LogRecord N                         |
                         | ...                                |
                         +------------------------------------+

Log Composition

  • A log data model (log record) is independent of vendors. It contains core fields, such as timestamp, severity, message body, trace ID, span ID, resource attributes (such as service name and instance ID), and attributes (key-value pairs).

Attribute

Attributes are used in traces, metrics, and logs. An attribute is a key-value pair and must meet the following requirements:

  • The attribute key must be a non-empty string. The key is case sensitive. Keys with different cases are considered different keys.
  • The attribute value can be:
    • A primitive type: string, Boolean value, double-precision floating point number (IEEE 754-1985), or signed 64-bit integer.
    • An array of primitive type values. The array must be homogeneous, that is, it cannot contain values of different types.
  • The maximum number of attributes is 128.
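The attribute requirements above can be checked mechanically. The sketch below is illustrative only and is not the validation performed by the SDK: `validate_attributes` and its helpers are names invented for this example. As a simplification, all Lua numbers are treated as one type, so an array that mixes integers and floating point numbers is accepted here.

```lua
local MAX_ATTRIBUTES = 128

-- Map a Lua value to a primitive attribute type name, or nil.
local function primitive_type(value)
    local t = type(value)
    if t == "string" or t == "boolean" or t == "number" then
        return t
    end
    return nil
end

-- A value is valid if it is a primitive or a homogeneous array of primitives.
local function is_valid_value(value)
    if primitive_type(value) then
        return true
    end
    if type(value) ~= "table" then
        return false
    end
    local element_type = nil
    local count = 0
    for key, item in pairs(value) do
        count = count + 1
        local item_type = primitive_type(item)
        if type(key) ~= "number" or not item_type then
            return false
        end
        element_type = element_type or item_type
        if item_type ~= element_type then
            return false
        end
    end
    -- The keys must be exactly 1..count for the table to be an array.
    return count == #value
end

-- Returns true, or false plus a short reason.
local function validate_attributes(attributes)
    local count = 0
    for key, value in pairs(attributes) do
        count = count + 1
        if type(key) ~= "string" or key == "" then
            return false, "attribute key must be a non-empty string"
        end
        if not is_valid_value(value) then
            return false, "unsupported value for key " .. key
        end
    end
    if count > MAX_ATTRIBUTES then
        return false, "too many attributes"
    end
    return true
end

print(validate_attributes({ level = "notice", opcode = 1 }))
print(validate_attributes({ tags = { "a", 1 } }))
```

Keys are compared as ordinary Lua strings, so `Level` and `level` are counted as two different attributes, which matches the case-sensitivity rule.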

Attribute Name

Attribute names starting with otel. are reserved by OpenTelemetry. The attribute name must comply with the following rules:

  • Names should be lowercase.
  • Use namespaces. Delimit the namespaces using a period. For example, service.version denotes the service version, where service is the namespace and version is an attribute in that namespace. Namespaces can be nested. For example, telemetry.sdk is a namespace inside the top-level telemetry namespace, and telemetry.sdk.name is an attribute inside the telemetry.sdk namespace. Use namespaces (and periods) whenever it makes sense.
  • For each multi-word component separated by periods, use underscores to separate the words (that is, snake_case). For example, http.response.status_code denotes the status code in the http namespace. Use underscores only when a period (namespace) would be meaningless or would change the semantic meaning of the name. For example, use rate_limiting instead of rate.limiting.
  • Attribute, event, metric, and other names should be descriptive and unambiguous. When introducing a name describing a certain attribute of the object, include the attribute name. For example, use file.owner.name instead of file.owner and system.network.packet.dropped instead of system.network.dropped. Avoid introducing names and namespaces that would mean different things when used by different conventions or tools. For example, use security_rule instead of rule.
  • Use shorter names when it does not compromise clarity. Drop namespace components or words in multi-word components when they are not necessary. For example, vcs.change.id describes pull request ID as precisely as vcs.repository.change.id does.

Observability Architecture

Observability Data Flow Diagram

  1. Application components collect observability data through the trace, metric, and log interfaces that the application framework encapsulates on top of the OpenTelemetry SDK.
  2. The application framework processes the observability data through the OpenTelemetry SDK sampler, processor, and exporter, and reports the data to the Fluent Bit service.
  3. The Fluent Bit service of the observability component forwards and reports the observability data to external collectors and visualization platforms of openUBMC, based on the observability configuration.

Application Interface

Trace

This module isolates the Lua layer of the trace-related capability interfaces, keeping the Lua layer separate from both the encapsulation layer and the open-source integration layer. Isolation between the Lua layer and the encapsulation layer is achieved via weak references. If the encapsulation library cannot be loaded via require, the Lua layer provides empty objects and no-op interfaces to protect the business logic.
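The fallback behavior described above can be sketched with `pcall(require, ...)`. This is a minimal illustration of the idea only, not the actual openUBMC implementation: the helper names `new_noop` and `load_or_noop` and the module name `telemetry_core_hypothetical` are assumptions made for this example.

```lua
-- A no-op object: indexing it with any method name yields a function
-- that accepts any arguments and returns nothing.
local function new_noop()
    return setmetatable({}, {
        __index = function()
            return function() end
        end,
    })
end

-- Try to load the encapsulation library. If require fails, return a
-- no-op object so that callers never have to check for nil.
local function load_or_noop(module_name)
    local ok, mod = pcall(require, module_name)
    if ok and type(mod) == "table" then
        return mod, true
    end
    return new_noop(), false
end

-- Hypothetical module name that is not installed, to show the fallback.
local impl, loaded = load_or_noop("telemetry_core_hypothetical")

-- Business code calls the interface in the same way in both cases.
impl.start_span("get_object")
print(loaded)
```

With this shape, a missing library degrades to silent no-ops instead of raising an error inside business code.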

tracer Creation

The tracer is the starting point for building the trace capability. A service needs to hold a tracer to create spans and to use the span recording, storage, and reporting functions to build cross-service trace links. During component initialization, a tracer is automatically created and stored, using the component name as the tracer name. A component can call get_tracer() without arguments to obtain this tracer, or create another tracer by passing the name parameter.

lua
-- Method prototype
local tracer = trace.get_tracer(name, version, url)

-- Usage example
local trace = require 'telemetry.trace'
local name = ...
local trace_custom = trace.get_tracer(name, '1.0.0')
local trace_default = trace.get_tracer()

req:

  • name: (mandatory) string type, tracer name. Generally, the library name is used. For example, "hwdiscovery"
  • version: (optional) string type, library version. For example, "1.0.0"
  • url: (optional) string type, pointing to the semantic convention document for attribute naming. For example, "https://xxx/"

rsp:

  • tracer: userdata type. The service holds and calls start_span to create a span and perform subsequent operations.
  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.

span Creation

A span is the minimum functional unit in a trace. The life cycle of a span generally covers a single function or code block. A service creates a span at the location to be traced and records data using the span capabilities. After the life cycle ends, the data is reported. The backend combines the spans reported by different services into a complete trace based on the correlation between them.

lua
-- Method prototype
local child_span = tracer:start_span(name, attribute, options)
local child_span_opt = trace.start_span(name, attribute, options)

-- Usage example
function get_object()
    local root_span = tracer:start_span("get_object", {level = "notice", opcode = 1})
    local parent_span_context = root_span:get_context()
    local child_span = tracer:start_span("get_object", {level = "notice", opcode = 1}, {parent = parent_span_context})

    -- Spans can also be created without holding a tracer. The trace module automatically creates a tracer based on the current component name and version.
    local child_span_opt = trace.start_span("get_object", {level = "notice", opcode = 1}, {parent = parent_span_context})
end

req:

  • name: (mandatory) string type, span name. The name does not need to be unique. When a span is created, a span_id is allocated as its unique identifier, which is transparent to the service. For example, "get_object".
  • attribute: (mandatory) dictionary type. The key must be of the string type, and the value can only be of the string, bool, int, or double type. Attributes are set by users as required and are exported with the span data for trace tagging and data filtering. For example, {level = "notice", opcode = 1}.
  • options: (optional) dictionary type, configuration items available when creating a span. Available options are as follows:
    • parent: table type. Only the span context data returned by span:get_context() can be used. If it is passed, the new span is created as a child span and inherits the trace_id of the parent. If it is not passed, a root span is created as the start of the entire trace.
    • force_sample: bool type, custom sampling flag of the component. Pass true when the service requires mandatory trace sampling.

rsp:

  • span: userdata type. The service holds the span and calls its methods to build the trace. If the Lua layer does not reference the encapsulation layer or the creation fails, a no-op span is returned, on which operations have no effect and no data is reported.
  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.
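The effect of the parent option can be illustrated with a stand-in implementation. The code below is not the real trace module: `start_span` here is a simplified local function written for this example, and the ID format produced by `new_id` is invented. It only shows that a child span reuses the trace_id of its parent, while a span created without a parent starts a new trace.

```lua
-- Simple ID generator for the example; real IDs are allocated internally.
local next_id = 0
local function new_id(prefix)
    next_id = next_id + 1
    return prefix .. next_id
end

-- Stand-in for start_span: a child reuses the parent's trace_id,
-- a root span starts a new trace.
local function start_span(name, attribute, options)
    local parent = options and options.parent
    local context = {
        trace_id = parent and parent.trace_id or new_id("trace-"),
        span_id = new_id("span-"),
    }
    local span = {
        name = name,
        attribute = attribute,
        parent_span_id = parent and parent.span_id or nil,
    }
    function span:get_context()
        return context
    end
    return span
end

local root = start_span("get_object", { level = "notice" })
local child = start_span("read_cache", { level = "notice" },
    { parent = root:get_context() })
local other = start_span("unrelated", {})

print(root:get_context().trace_id, child:get_context().trace_id)
```

The span names `read_cache` and `unrelated` are placeholders chosen for the example.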

span Operations

Use set_attribute to Set span Attributes

span attributes are exported along with span data and can be used for trace tagging and data filtering.

lua
-- Method prototype
span:set_attribute(key, value)

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    span:set_attribute("level", "notice")

    local err_msg = span:set_attribute(key, value)
    if err_msg then
        log:error("%s", err_msg)
    end
end

req:

  • key: (mandatory) string type, attribute name. If the attribute name already exists, this operation overwrites the original attribute. For example, "level".
  • value: (mandatory) string, bool, int, or double type, attribute value corresponding to the attribute name.

rsp:

  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.

Use add_link to Add a span Link

add_link creates a non-parent-child link between spans. Links are typically used in asynchronous scenarios.

lua
-- Method prototype
span:add_link(span_context, attribute)

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    span:add_link(span_context, {level = "notice", opcode = 1})

    local err_msg = span:add_link(span_context, attribute)
    if err_msg then
        log:error("%s", err_msg)
    end
end

req:

  • span_context: (mandatory) dictionary type. Only the span context data returned by span:get_context() can be used.
  • attribute: (optional) dictionary type. The key must be of the string type, and the value can only be of the string, bool, int, or double type. It is a supplementary attribute of the link, set by users as required, and is exported as an additional field of the links field along with the span data. For example, {level = "notice", opcode = 1}.

rsp:

  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.

Use add_event to Add a Span Event

Events are recorded in a span in a way similar to logging and are the main content shown in visualization.

lua
-- Method prototype
span:add_event(name, attribute)

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    span:add_event(string.format("get object failed: %s", ret), {level = "notice", opcode = 1})

    local err_msg = span:add_event(name, attribute)
    if err_msg then
        log:error("%s", err_msg)
    end
end

req:

  • name: (mandatory) string type. It indicates the event description, that is, the log content. For example, "get object failed".
  • attribute: (optional) dictionary type. The key must be of the string type, and the value can only be of the string, bool, int, or double type. It is a supplementary attribute of the event, set by users as required, and is exported as an additional field of the event field along with the span data. For example, {level = "notice", opcode = 1}.

rsp:

  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.

Use set_status to Set the Current span Status

set_status sets the status of the current span. The default status is unset. The status can be set to unset, ok, or error. The span status can be used for tail sampling and for filtering in visualization.

lua
-- Method prototype
span:set_status(status, description)

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    span:set_status("ok", "get object succeeded")

    local err_msg = span:set_status(status, description)
    if err_msg then
        log:error("%s", err_msg)
    end
end

req:

  • status: (mandatory) string type, span status. Only unset, ok, and error can be set. For example, "ok".
  • description: (mandatory) string type, supplementary description of the status. For example, "get object succeeded".

rsp:

  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.

Use get_context to Obtain the Context of the Current span

Obtain the context of the current span for subsequent link or child span creation.

lua
-- Method prototype
local context = span:get_context()

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    local context = span:get_context()

    local context, err_msg = span:get_context()
    if err_msg then
        log:error("%s", err_msg)
    end
end

rsp:

  • context: dictionary type, context information of a span. The fields include trace_id, span_id, trace_state, is_remoting, and trace_flags. The service does not need to interpret these fields; the context is used only for propagation. The context content must not be modified; otherwise, linking child spans fails.
  • msg: string type, containing the error message. This field returns detailed failure information upon error, but is omitted (or empty) upon success. The business logic determines whether to process this information.

Use is_recording to Check Whether the Current span Can Be Operated

Sampling policies and errors may cause span creation to fail and return a no-op span. If collecting the content to be recorded costs significant time or resources, use this method to decide whether to collect it at all.

lua
-- Method prototype
local is_recording = span:is_recording()

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    if span:is_recording() then
        span:add_event()
        span:set_status()
    end
end

rsp:

  • is_recording: bool type. It indicates whether the current span can be operated. If an error occurs, false is returned.

Use finish to End the Current span

Ending the current span will automatically trigger operations such as storage or reporting.

lua
-- Method prototype
span:finish()

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)

    span:add_event()
    span:set_status()

    span:finish()
end

Use pcall to Execute the Method, Capture the Execution Status, and Record the Status to the span for Reporting

Similar to the pcall function of Lua, this method calls the function in protected mode, captures its execution result, updates the span status to ok or error accordingly, and reports the status. In addition, it automatically ends the span.

lua
-- Method prototype
span:pcall(cb, ...)

-- Usage example
function get_object()
    local span = trace.start_span(name, attribute)
    local function record(time, msg)
        if type(msg) ~= "string" then
            error("msg must be string")
        end
        print(time, msg)
    end

    local ok, ret = span:pcall(record, time, msg)
end

req:

  • cb: (mandatory) function type, the function to be run in protected mode.
  • ...: (optional) arguments passed to the function.

rsp:

  • status: bool type, indicating whether the function is successfully executed (true: successful; false: failed)
  • result: return value of the function (if successful) or error information (if failed)
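One way to picture the behavior described above is a thin wrapper around the built-in `pcall` of Lua. The sketch below uses a stand-in span object, `new_fake_span`, invented for this example. It is not the real span implementation, and details such as the status description or the handling of multiple return values may differ in the real method.

```lua
-- Stand-in span that only remembers its status and whether it has ended.
local function new_fake_span()
    local span = { status = "unset", description = nil, finished = false }

    function span:set_status(status, description)
        self.status = status
        self.description = description
    end

    function span:finish()
        self.finished = true
    end

    -- Run cb in protected mode, record the outcome, then end the span.
    function span:pcall(cb, ...)
        local ok, result = pcall(cb, ...)
        if ok then
            self:set_status("ok")
        else
            self:set_status("error", tostring(result))
        end
        self:finish()
        return ok, result
    end

    return span
end

local function record(time, msg)
    if type(msg) ~= "string" then
        error("msg must be string")
    end
    return time .. " " .. msg
end

-- Successful call: the status becomes ok and the span is ended.
local good = new_fake_span()
local ok, ret = good:pcall(record, "12:00", "object loaded")

-- Failing call: the error is captured instead of propagating.
local bad = new_fake_span()
local failed_ok, err = bad:pcall(record, "12:00", 42)

print(good.status, bad.status)
```

Because the error is captured, the caller decides what to do with the returned status and message, and the span is ended on both paths.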

Metrics

This module is designed to isolate the Lua layer of metrics-related capability interfaces, keeping the Lua layer separate from both the encapsulation layer and the open-source integration layer. Isolation between the Lua layer and the encapsulation layer is achieved via weak references. In the event the encapsulation library cannot be loaded via require, the Lua layer provides empty objects and no-op interfaces to protect the business logic.

Core concepts

  • MeterProvider: entry point of the entire Metrics API, which creates and manages all Meter instances.
  • Meter: created by MeterProvider and used to create specific instruments (Instrument). CreateXxxInstrument() is used to create different types of instruments based on service requirements.
  • Instrument: created by Meter to collect and report specific measurement data.

Hierarchical structure example:

plaintext
+-- MeterProvider()
    |
    +-- Meter(name='test1', version='1.0.0', schema url='https://test//test')
    |   |
    |   +-- Instrument<Counter>(name='counter_test', description='test counter', unit='kb')
    |   |
    |   +-- instrument...
    |
    +-- Meter(name='test2', version='1.1.0', schema url='https://test//test')
        |
        +-- Instrument<ObservableCounter>(name='observable_counter_test', description='test observable counter', unit='kb')
        |
        +-- Instruments...

meter Creation

The meter is the starting point for building the metrics capability. A service needs to hold a meter and create different types of instruments as required to record data. A component calls get_meter() to create a meter.

lua
-- Method prototype
local meter = metrics.get_meter(name, version, schema_url)
 
-- Usage example
local metrics = require 'telemetry.metrics'
local meter_custom = metrics.get_meter(name)

req:

  • name: (mandatory) string type, meter name
  • version: (optional) version number
  • schema_url: (optional) URL of the associated semantic convention

instrument Creation

An instrument is a carrier that collects and reports specific measurement data. Services can create different types of instrument as required.

NOTE

Instrument names are displayed in the final visualization system and directly affect the readability of the reported data. Therefore, review instrument names to make sure they are clear and reasonable.

lua
-- Method prototype
meter:create_counter(name, description, unit)
meter:create_updowncounter(name, description, unit)
meter:create_observable_counter(name, description, unit)
meter:create_observable_updowncounter(name, description, unit)
meter:create_observable_gauge(name, description, unit)
 
-- Usage example
local counter = meter:create_counter("counter_data", "count some data for test", "kb")

  • create_counter: Creates a monotonically increasing counter.
  • create_updowncounter: Creates a counter that can be incremented or decremented.
  • create_observable_counter: Creates a monotonically increasing asynchronous counter observable_counter.
  • create_observable_updowncounter: Creates an asynchronous counter observable_updowncounter that can be incremented or decremented.
  • create_observable_gauge: Creates an asynchronous instantaneous meter observable_gauge.

req:

  • name: (mandatory) string type, instrument name.
  • description: (optional) description of the instrument.
  • unit: (optional) unit of the collected data, for example, kb.

instrument Method Invocation

Synchronous Instruments

Synchronous instruments include counter, updowncounter, and histogram. They record metrics directly through explicit calls.

lua
-- Create a meter
local meter = metrics.get_meter()

-- Create a counter
local counter = meter:create_counter("my_counter")

-- Increase the value of the counter
counter:add(10)

lua
-- Create a meter
local meter = metrics.get_meter()

-- Create an updowncounter
local updowncounter = meter:create_updowncounter("my_updowncounter")

-- Increase or decrease the value of the updowncounter
updowncounter:add(5)    -- Increase
updowncounter:add(-3)    -- Decrease

-- e.g. add dimensions to a metric using attributes
flash_io_counter:add(data_size, {file_type="log", mc_name="xx"})
flash_io_counter:add(data_size, {file_type="persisitence", mc_name="xx"})

lua
-- Create a meter
local meter = metrics.get_meter()

-- Create a histogram
local histogram = meter:create_histogram("my_histogram")

-- Record the value of the histogram
histogram:record(5)

-- e.g. add dimensions to a metric using attributes
histogram:record(data, {mc_name="xx"})

In the metric sampling of actual services, tags can be added to analyze the same metric from multiple dimensions. The visualization system supports metric filtering and aggregation based on tags. For example:

  • otelcol_flash_io_total{file_type = "log"} reports the cumulative bytes written to the flash storage for data categorized as log files.
  • sum(otelcol_flash_io_total{file_type = "persisitence"}) reports the cumulative bytes written to the flash storage for data categorized as persistent files.
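The effect of tags on filtering and aggregation can be sketched with a small in-memory counter. This is an illustrative model only, not how the SDK or the visualization system stores data: `new_counter`, `series_key`, and `sum_where` are names invented for this example, and the attribute values are placeholders.

```lua
-- Build a stable key from an attribute table by sorting its keys.
local function series_key(attributes)
    local keys = {}
    for key in pairs(attributes) do
        keys[#keys + 1] = key
    end
    table.sort(keys)
    local parts = {}
    for _, key in ipairs(keys) do
        parts[#parts + 1] = key .. "=" .. tostring(attributes[key])
    end
    return table.concat(parts, ",")
end

-- A counter that keeps one running total per distinct attribute set.
local function new_counter()
    local counter = { series = {} }

    function counter:add(value, attributes)
        attributes = attributes or {}
        local key = series_key(attributes)
        local entry = self.series[key]
        if not entry then
            entry = { attributes = attributes, total = 0 }
            self.series[key] = entry
        end
        entry.total = entry.total + value
    end

    -- Sum every series whose attributes match all pairs in the filter.
    function counter:sum_where(filter)
        local sum = 0
        for _, entry in pairs(self.series) do
            local matches = true
            for key, expected in pairs(filter) do
                if entry.attributes[key] ~= expected then
                    matches = false
                end
            end
            if matches then
                sum = sum + entry.total
            end
        end
        return sum
    end

    return counter
end

local flash_io = new_counter()
flash_io:add(100, { file_type = "log", mc_name = "a" })
flash_io:add(50, { file_type = "log", mc_name = "b" })
flash_io:add(30, { file_type = "db", mc_name = "a" })

print(flash_io:sum_where({ file_type = "log" }))
```

Each distinct attribute combination becomes its own series, which is why the same metric can later be filtered by one tag and summed across the others.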

Asynchronous Instruments

Asynchronous instruments include observable_counter, observable_updowncounter, and observable_gauge. They are also called observable instruments and collect data through registered callbacks, which are called each time metrics are collected. They suit data that needs to be obtained periodically by polling.

lua
-- Create a meter
local meter = metrics.get_meter()

-- Create an ObservableCounter
local observable_counter = meter:create_observable_counter("my_observable_counter")

-- Register a callback
observable_counter:add_callback(function()
    return 30
end)
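The callback model can be sketched as follows. The `new_observable` and `collect` names are invented for this illustration and are not part of the metrics interface. The point is only that a callback runs when a collection cycle happens, not when the callback is registered.

```lua
-- Stand-in observable instrument that stores callbacks until collection.
local function new_observable(name)
    local instrument = { name = name, callbacks = {} }

    function instrument:add_callback(cb)
        self.callbacks[#self.callbacks + 1] = cb
    end

    -- Called by a (hypothetical) reader on every collection cycle.
    function instrument:collect()
        local values = {}
        for _, cb in ipairs(self.callbacks) do
            values[#values + 1] = cb()
        end
        return values
    end

    return instrument
end

local calls = 0
local observable_counter = new_observable("my_observable_counter")
observable_counter:add_callback(function()
    calls = calls + 1
    return 30
end)

-- Registering the callback does not run it.
local calls_after_register = calls

-- Two collection cycles invoke the callback twice.
local first = observable_counter:collect()
local second = observable_counter:collect()

print(calls_after_register, calls)
```

Since the callback runs on every cycle, it should be cheap and should return the current value rather than an increment.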

Logs

This module is designed to isolate the Lua layer of logs-related capability interfaces, keeping the Lua layer separate from both the encapsulation layer and the open-source integration layer. Isolation between the Lua layer and the encapsulation layer is achieved via weak references. In the event the encapsulation library cannot be loaded via require, the Lua layer provides empty objects and no-op interfaces to protect the business logic.

The interfaces for reported logs and debug logs are unified. Internally, the two are distinguished by whether a key-value pair list or a parameter list is passed.

Logging

Interface                  Function
log:debug(body, attrs)     Reporting debug logs
log:info(body, attrs)      Reporting info logs
log:notice(body, attrs)    Reporting notice logs
log:warn(body, attrs)      Reporting warn logs
log:error(body, attrs)     Reporting error logs

lua
-- Record debug logs
log:debug('my debug log')

-- Record debug logs and related attribute information
log:debug('my debug log', {system_id=1, mc_name="xx"})

-- The method of recording other logs is the same as the preceding methods.