Configure the OPC Classic extractor

To configure the OPC Classic extractor, you must edit the configuration file. The file is in YAML format and the sample configuration file contains all valid options with default values.

When setting up an extractor, don't base your configuration on config.example.yml. Instead, start from config.minimal.yml and copy the parts you need from config.example.yml.

You can exclude fields entirely to let the extractor use default values. The configuration file separates settings by component, and you can remove an entire component to disable it or use default values.

Environment variable substitution

In the config file, values wrapped in ${} are replaced with environment variables with that name. For example, ${COGNITE_PROJECT} will be replaced with the value of the environment variable called COGNITE_PROJECT.
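
As a brief illustration, the fragment below keeps the Windows password out of the configuration file by reading it from the environment. The user name and the variable name OPC_CLASSIC_PASSWORD are hypothetical choices for this sketch, not names the extractor defines.

```yaml
source:
    # Hypothetical Windows user for this example
    username: "extractor-user"
    # Replaced with the value of the environment variable
    # OPC_CLASSIC_PASSWORD (hypothetical name)
    password: "${OPC_CLASSIC_PASSWORD}"
```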

The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.

:::tip Tip
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
:::

Minimal YAML configuration file

version: 1

source:
    # Windows username for authentication
    username: 
    # Windows password for authentication
    password:
    # List of servers to connect to.
    servers:
      - # Server host name or IP address
        host:
        # Version of DA to use, one of V2 or V3
        # This can be left out to disable live data.
        da-version:
        # Version of HDA to connect to on this host.
        # This can be left out to disable history.
        # Must be V1
        hda-version:
        # Prefix on externalIds for nodes generated by this server.
        id-prefix:
        # Name of state store used to store states if history is enabled for this server. Required if hda-version is set.
        state-store-name:
    endpoint-url: "opc.tcp://localhost:4840"

cognite:
    # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
    project: "${COGNITE_PROJECT}"

    # If this is set to true, credentials can be left out, and the extractor
    # will read data without pushing it to CDF.
    debug: false

    # This is for Microsoft as IdP, to use a different provider,
    # set implementation: Basic, and use token-url instead of tenant.
    # See the example config for the full list of options.
    idp-authentication:
        # Directory tenant
        tenant: ${COGNITE_TENANT_ID}
        # Application Id
        client-id: ${COGNITE_CLIENT_ID}
        # Client secret
        secret: ${COGNITE_CLIENT_SECRET}
        # List of resource scopes, ex:
        # scopes:
        #   - scopeA
        #   - scopeB
        scopes:
          - ${COGNITE_SCOPE}

Timestamps and intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit]. For example, 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, ms. You can also use cron expressions.

For history start and end times, you can use a similar syntax: [N][timeunit] or [N][timeunit]-ago. For example, 1d-ago means 1 day in the past from the time history starts, and 1h means 1 hour in the future. You can use this syntax to configure the extractor to read only recent history.
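
The following sketch shows both syntaxes in use. It assumes that the State Store and History sections described later in this document map to top-level keys named state-store and history; the values are illustrative only.

```yaml
state-store:
    # Plain interval: push states every 5 minutes
    interval: "5m"

history:
    # Relative timestamp with -ago: 7 days in the past
    start-time: "7d-ago"
    # Relative timestamp without -ago: 1 hour in the future
    end-time: "1h"
```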

Source

This section contains parameters for connecting to the OPC Classic servers. The extractor can connect to multiple servers, where each server has its own ID prefix. Technically, each server may have multiple connections because DA and HDA are effectively separate servers. In practice, many servers support both DA and HDA interfaces but share information between the two.

All servers share the same authentication information. This assumes that the extractor signs in to the servers using a shared domain or similar. Best practice is to give the extractor a separate network user.

Run multiple extractors if the extractor needs multiple sets of credentials.

| Parameter | Description |
| --- | --- |
| username | Windows username to use for authentication to the servers. |
| password | Windows password to use for authentication to the servers. |
| domain | Domain for the user used for authentication. |
| parallelism | Maximum number of requests made in parallel to each server. The default value is 10. |
| use-async | Use async mode when making requests to HDA servers. This can be more efficient both for the extractor and the server, but not all servers support it, so it's disabled by default. |
| servers | A list of servers to extract from. |
| servers[].host | Host or IP address to connect to. |
| servers[].da-version | Version of DA to connect to on this host. This can be left out to not connect to DA at all. Valid options are V2 or V3. |
| servers[].hda-version | Version of HDA to connect to. This can be left out to not connect to HDA at all. Must be set equal to V1 if enabled. |
| servers[].name | Name of the server to connect to on the given host. This is used to pick which server to connect to if multiple are available. If left out, pick the first server found. |
| servers[].proxy-address | Proxy requests to the server through this address. |
| servers[].id-prefix | Prefix for external IDs of assets and time series created in CDF by this server. |
| servers[].state-store-name | Name of state store used to store states if history is enabled for this server. Required if hda-version is set. |
| servers[].cache.path | Path to a local JSON file caching the node hierarchy. If the file doesn't exist, the extractor will browse the DA and HDA servers and generate it. The file may be manually edited to limit which nodes the extractor should read. |
| attribute-chunking | Configuration for chunking of attributes read from HDA servers. |
| attribute-chunking.chunk-size | Maximum number of items per attribute read request. The default value is 1000. |
| attribute-chunking.parallelism | Maximum number of parallel requests for attributes. The default value is 10. |
| keep-alive-interval | Interval between each read of server status, used as a keep alive. The syntax is described in Timestamps and intervals. |
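
A sketch of a source section with two servers might look like the following. All host addresses, names, prefixes, and the environment variable name are hypothetical placeholders; only the parameter names come from the table above.

```yaml
source:
    username: "extractor-user"
    password: "${OPC_CLASSIC_PASSWORD}"
    domain: "EXAMPLE"
    # Allow at most 5 parallel requests per server instead of the default 10
    parallelism: 5
    keep-alive-interval: "10s"
    servers:
        # Server with both live data (DA) and history (HDA)
      - host: "192.0.2.10"
        da-version: V3
        hda-version: V1
        id-prefix: "plant-a:"
        # Required because hda-version is set
        state-store-name: "plant-a-states"
        cache:
            path: "plant-a-nodes.json"
        # Server with live data only: hda-version is left out
      - host: "192.0.2.11"
        da-version: V2
        id-prefix: "plant-b:"
```

Giving each server its own id-prefix keeps the external IDs created from different servers apart in CDF.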

Subscriptions

If you connect the extractor to a Data Access (DA) server, it establishes subscriptions on the tags it discovers. Subscriptions in OPC DA are callbacks, meaning that the server will call a function in the extractor whenever it sees any changes.

| Parameter | Description |
| --- | --- |
| chunking | Configuration for how the extractor will chunk requests for subscriptions. |
| chunking.chunk-size | Maximum number of items per subscription request. The default value is 1000. |
| chunking.parallelism | Maximum number of parallel requests to create subscriptions. The default value is 10. |
| keep-alive | Keep alive rate in milliseconds for subscriptions. The default value is 10000. |
| update-rate | Requested update rate. This is how often the server should check for updates from its underlying systems. The server is not required to obey this and may return a revised update rate or use a notification-based approach. The default value is 1000. |
| deadband | Minimum difference in value required for an update to be registered. The default value is 0.0. |
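
As a hedged example, the fragment below tunes the subscription parameters. It assumes the section is configured under a top-level key named subscriptions, matching the heading; the values are illustrative, not recommendations.

```yaml
subscriptions:
    chunking:
        # Smaller subscription requests than the default of 1000 items
        chunk-size: 500
        parallelism: 4
    # Keep alive rate in milliseconds (the default)
    keep-alive: 10000
    # Requested update rate; the server may return a revised rate
    update-rate: 500
    # Only register updates where the value differs by at least 0.5
    deadband: 0.5
```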

Logger

Log levels are Fatal, Error, Warning, Information, Debug, and Verbose, in order of decreasing importance. Each level also includes all levels of higher importance.

| Parameter | Description |
| --- | --- |
| console | Configuration for logging to the console. |
| console.level | Minimum level of log events to write to the console. Set this to enable console logging. |
| console.stderr-level | Log events at this level or above are redirected to standard error. |
| file | Configuration for logging to a rotating log file. |
| file.level | Minimum level of log events to write to file. |
| file.path | Path to the log files. If this is set to, for example, logs/log.txt, log files of the form logs/log[date].txt are created, depending on rolling-interval. |
| file.retention-limit | Maximum number of log files that are kept in the log folder. |
| file.rolling-interval | A rolling interval for log files. Either day or hour. The default value is day. |
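
A minimal sketch of a logger section follows. It assumes the section lives under a top-level key named logger and that level names are written in lowercase; both are assumptions, so check config.example.yml for the exact spelling.

```yaml
logger:
    console:
        # Write Information and more important events to the console
        level: information
    file:
        # Write Debug and more important events to the log file
        level: debug
        path: "logs/log.txt"
        # Keep at most 31 log files in the log folder
        retention-limit: 31
        rolling-interval: day
```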

Metrics

The OPC Classic extractor can push general usage metrics to a Prometheus Pushgateway server, or expose a Prometheus server for scraping.

| Parameter | Description |
| --- | --- |
| server | Configuration for a Prometheus scrape server. |
| server.host | Host for a locally hosted Prometheus server, used for scraping. |
| server.port | The port used by the local Prometheus server. |
| push-gateways | A list of Pushgateway destinations the extractor will push metrics to. |
| push-gateways[].host | URI of the Pushgateway host. |
| push-gateways[].job | Name of the metrics job on this Pushgateway. |
| push-gateways[].username | Username for basic authentication. |
| push-gateways[].password | Password for basic authentication. |
| push-gateways[].push-interval | Interval in seconds between each push of metrics. |
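
The sketch below enables both a local scrape server and one Pushgateway destination. It assumes a top-level key named metrics; the host, port, job name, and environment variable names are hypothetical.

```yaml
metrics:
    server:
        host: "localhost"
        port: 9000
    push-gateways:
      - host: "https://pushgateway.example.com"
        job: "opc-classic-extractor"
        username: "${PUSHGATEWAY_USER}"
        password: "${PUSHGATEWAY_PASSWORD}"
        # Push metrics every 30 seconds
        push-interval: 30
```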

Cognite

Configuration for the connection to Cognite Data Fusion (CDF).

| Parameter | Description |
| --- | --- |
| host | The CDF service URL. Defaults to https://api.cognitedata.com. |
| project | The CDF project. Required. |
| idp-authentication | Configuration for authenticating to CDF. |
| idp-authentication.authority | Authority used with tenant to authenticate to Azure tenants. Use token-url if connecting using a non-Azure IdP. Defaults to https://login.microsoftonline.com |
| idp-authentication.tenant | Azure tenant used with authority. |
| idp-authentication.token-url | URL used to obtain service tokens, used for non-Azure IdPs. |
| idp-authentication.client-id | Service principal client ID. |
| idp-authentication.secret | Service principal client secret. |
| idp-authentication.resource | Optional resource parameter to pass along with token request. |
| idp-authentication.scopes | A list of scopes to pass along with the request. This typically needs to contain [host]/.default |
| idp-authentication.audience | Optional audience parameter to pass along with token request. |
| idp-authentication.min-ttl | Requested minimum time-to-live in seconds for the token. |
| idp-authentication.certificate | Configuration for authenticating using a client certificate. |
| idp-authentication.certificate.authority-url | Certificate authority URL. |
| idp-authentication.certificate.path | Path to the .pem or .pfx certificate to be used for authentication. |
| idp-authentication.certificate.password | Certificate password. |
| data-set | Configuration for data set to assign newly created assets and time series to. |
| data-set.id | Internal ID of the data set. Specify either this or external-id. |
| data-set.external-id | External ID of the data set. Specify either this or id. |
| update | Set this to true to enable updating assets and time series that have changed in the source. |
| metadata-targets | Targets for writing "metadata", meaning asset and time series names, descriptions, and metadata. By default, the extractor creates time series with just an external ID and nothing else. |
| metadata-targets.clean | Configuration for writing metadata to CDF Clean. |
| metadata-targets.clean.assets | Set to true to enable creating assets in CDF. |
| metadata-targets.clean.time-series | Set to true to enable writing time series metadata. |
| max-upload-interval | Maximum time to cache datapoints before they are uploaded to CDF. The syntax is described in Timestamps and intervals. Defaults to 1s. |
| max-data-points-upload-queue-size | Maximum number of cached datapoints before they are uploaded to CDF. Defaults to 1000000. |
| cdf-retries | Configuration for automatic retries on requests to CDF. |
| cdf-retries.timeout | Timeout in milliseconds for each individual try. Defaults to 80000. |
| cdf-retries.max-retries | The maximum number of retries. Less than 0 retries forever. |
| cdf-retries.max-delay | Maximum delay between each try in milliseconds. Base delay is calculated according to 125 * 2 ^ retry milliseconds. If this is less than 0, there is no upper limit. Defaults to 5000. |
| cdf-chunking | Configuration for chunking on requests to CDF. Note that increasing these may cause requests to fail, due to limits in the API. Read the API documentation before making these higher than their current value. |
| cdf-chunking.time-series | Maximum number of time series per get/create time series request. |
| cdf-chunking.assets | Maximum number of assets per get/create assets request. |
| cdf-chunking.data-point-time-series | Maximum number of time series per datapoint create request. |
| cdf-chunking.data-points | Maximum number of datapoints per datapoint create request. |
| cdf-throttling | Configuration for how requests to CDF should be throttled. |
| cdf-throttling.time-series | Maximum number of parallel requests per time series operation. Defaults to 20. |
| cdf-throttling.assets | Maximum number of parallel requests per assets operation. Defaults to 20. |
| cdf-throttling.data-points | Maximum number of parallel requests per datapoints operation. Defaults to 10. |
| sdk-logging | Configuration for logging of requests from the SDK. |
| sdk-logging.disable | Set this to true to disable logging of requests from the SDK. Logging is enabled by default. |
| sdk-logging.level | Log level for messages from the SDK. Defaults to debug. |
| sdk-logging.format | Format of the log message. Defaults to CDF ({Message}): {HttpMethod} {Url} - {Elapsed} ms |
| nan-replacement | Replacement for NaN values when writing to CDF. Defaults to none, meaning these are just removed. |
| extraction-pipeline.external-id | Configuration for associating this extractor with an extraction pipeline. Used for monitoring and remote configuration. |
| certificates | Configuration for special handling of SSL certificates. This shouldn't be considered a permanent solution to certificate problems. |
| certificates.accept-all | Accept all remote SSL certificates even if verification fails. This introduces a risk of man-in-the-middle attacks. |
| certificates.allow-list | List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates. |
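
As an illustration of how these parameters combine, the sketch below authenticates against a non-Azure IdP, following the comment in the minimal configuration file (implementation: Basic with token-url instead of tenant). The data set external ID and the environment variable COGNITE_TOKEN_URL are hypothetical names.

```yaml
cognite:
    host: "https://api.cognitedata.com"
    project: "${COGNITE_PROJECT}"
    idp-authentication:
        # Non-Azure IdP: token-url replaces tenant
        implementation: Basic
        token-url: "${COGNITE_TOKEN_URL}"
        client-id: "${COGNITE_CLIENT_ID}"
        secret: "${COGNITE_CLIENT_SECRET}"
        scopes:
          - "${COGNITE_SCOPE}"
    data-set:
        # Specify either external-id or id, not both
        external-id: "opc-classic-data"
    metadata-targets:
        clean:
            # Create assets and write time series metadata to CDF
            assets: true
            time-series: true
    cdf-retries:
        max-retries: 10
        # 125 * 2^5 = 4000 ms and 125 * 2^6 = 8000 ms, so the base delay
        # exceeds this 5000 ms cap once the retry exponent reaches 6
        max-delay: 5000
```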

State Store

Configuration for storing state in a local database or in CDF RAW. This is required if reading from an HDA server.

| Parameter | Description |
| --- | --- |
| location | Path to database file, or name of raw database containing state store. |
| database | Which type of database to use. Valid options are LiteDb, Raw, or None. The default value is None. |
| interval | Interval between each push of local states to the state store. The syntax is described in Timestamps and intervals. The default value is 1m. |
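
A sketch of a local state store paired with an HDA-enabled server is shown below. It assumes a top-level key named state-store; the file name, host, prefix, and state store name are hypothetical.

```yaml
state-store:
    # Store states in a local LiteDb database file
    database: LiteDb
    location: "state.db"
    interval: "1m"

source:
    servers:
      - host: "192.0.2.10"
        hda-version: V1
        id-prefix: "plant-a:"
        # Required because hda-version is set
        state-store-name: "plant-a-states"
```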

History

Configuration for reading historical data from an HDA server.

| Parameter | Description |
| --- | --- |
| backfill | Set this to true to enable backfill, meaning that the extractor will read backwards from the earliest known timestamp as well as forwards from the latest known timestamp on startup. This is only useful if there is enough data in the server that reading it all will take a very long time. |
| start-time | The earliest timestamp history will be read from, in milliseconds since 01/01/1970. Alternatively use syntax N[timeunit](-ago) where timeunit is one of w, d, h, m, s, or ms. -ago indicates that this is in the past, otherwise it will be in the future. |
| end-time | The latest timestamp that history will be read from, in milliseconds since 01/01/1970. Alternatively use syntax N[timeunit](-ago) where timeunit is one of w, d, h, m, s, or ms. -ago indicates that this is in the past, otherwise it will be in the future. |
| chunking | Chunking for history reads. |
| chunking.chunk-size | Maximum number of items per history read request. Defaults to 1000. |
| chunking.parallelism | Maximum number of parallel history read requests. Defaults to 10. |
| chunking.max-per-minute | Maximum number of history read requests per minute. |
| chunking.max-read-per-tag | Maximum number of values returned per tag per request. Defaults to 1000. |
| granularity | Granularity to use when doing history read. Nodes with the last/earliest known timestamp within this range of each other will be read together. This shouldn't be smaller than the usual average update rate. Leave at 0 to always read a single node each time. The syntax is described in Timestamps and intervals. Defaults to 15s. |
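
The fragment below is a sketch of a history section that reads only recent data, assuming a top-level key named history. The limits are illustrative values, not tuned recommendations.

```yaml
history:
    # Read backwards from the earliest known timestamp as well as forwards
    backfill: true
    # Do not read history older than 30 days
    start-time: "30d-ago"
    chunking:
        chunk-size: 500
        parallelism: 5
        # Limit the load on the HDA server
        max-per-minute: 120
        max-read-per-tag: 1000
    # Read nodes together when their known timestamps are within 15 seconds
    granularity: "15s"
```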