Configure the OPC Classic extractor

To configure the OPC Classic extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.

When setting up an extractor, don't base your configuration on config.example.yml. Instead, start from config.minimal.yml and copy the parts you need from config.example.yml.

You can exclude fields entirely to let the extractor use default values. The configuration file separates settings by component, and you can remove an entire component to disable it or use default values.

Environment variable substitution

In the config file, values wrapped in ${} are replaced with the value of the environment variable with that name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable COGNITE_PROJECT.
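For example, the following minimal sketch keeps the Windows credentials out of the file itself. The variable names `OPC_USERNAME` and `OPC_PASSWORD` are hypothetical; any name works as long as the variable is set in the environment the extractor runs in.

```yaml
source:
    # Replaced with the values of the OPC_USERNAME and OPC_PASSWORD
    # environment variables (hypothetical names chosen for this example)
    username: ${OPC_USERNAME}
    password: ${OPC_PASSWORD}
```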

The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.

:::tip Tip
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
:::

Minimal YAML configuration file

```yaml
version: 1

source:
    # Windows username for authentication
    username:
    # Windows password for authentication
    password:
    # List of servers to connect to.
    servers:
      - # Server host name or IP address
        host:
        # Version of DA to use, one of V2 or V3.
        # This can be left out to disable live data.
        da-version:
        # Version of HDA to connect to on this host.
        # This can be left out to disable history.
        # Must be V1
        hda-version:
        # Prefix on externalIds for nodes generated by this server.
        id-prefix:
        # Name of state store used to store states if history is enabled for this server. Required if hda-version is set.
        state-store-name:

cognite:
    # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
    project: "${COGNITE_PROJECT}"

    # If this is set to true, credentials can be left out, and the extractor
    # will read data without pushing it to CDF.
    debug: false

    # This is for Microsoft as IdP, to use a different provider,
    # set implementation: Basic, and use token-url instead of tenant.
    # See the example config for the full list of options.
    idp-authentication:
        # Directory tenant
        tenant: ${COGNITE_TENANT_ID}
        # Application Id
        client-id: ${COGNITE_CLIENT_ID}
        # Client secret
        secret: ${COGNITE_CLIENT_SECRET}
        # List of resource scopes, ex:
        # scopes:
        #   - scopeA
        #   - scopeB
        scopes:
          - ${COGNITE_SCOPE}
```

Timestamps and intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit], for example, 10m for 10 minutes or 1h for 1 hour, where timeunit is one of d, h, m, s, or ms. You can also use cron expressions.
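As an illustration, the following sketch sets a few interval parameters documented later on this page using this syntax. The top-level `history` key is assumed to match the History section title, and the values are examples rather than recommendations.

```yaml
source:
    # Read the server status every 30 seconds as a keep-alive
    keep-alive-interval: 30s
cognite:
    # Cache datapoints for at most 2 seconds before uploading them
    max-upload-interval: 2s
history:
    # Read nodes together if their known timestamps lie within 1 minute
    granularity: 1m
```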

For history start and end times, you can use a similar syntax: [N][timeunit] and [N][timeunit]-ago. For example, 1d-ago means 1 day before the time history reading starts, and 1h means 1 hour after it. You can use this syntax to, for instance, configure the extractor to read only recent history.
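For example, to read only the last 30 days of history, you could set `start-time` relative to when history reading starts. This sketch assumes the top-level key for the History section is `history`.

```yaml
history:
    # Start reading history 30 days before history reading begins
    start-time: 30d-ago
```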

Source

This section contains parameters for connecting to the OPC Classic servers. The extractor can connect to multiple servers, where each server has its own ID prefix. Technically, each server may have multiple connections because DA and HDA are effectively separate servers. In practice, many servers support both DA and HDA interfaces but share information between the two.

All servers share the same authentication information, on the assumption that the extractor signs in to the servers through a shared domain or similar. As a best practice, give the extractor its own network user.

Run multiple extractors if the extractor needs multiple sets of credentials.

Parameter	Description
username	Windows username to use for authentication to the servers.
password	Windows password to use for authentication to the servers.
domain	Domain for the user used for authentication.
parallelism	Maximum number of requests made in parallel to each server. The default value is `10`.
use-async	Use async mode when making requests to HDA servers. This can be more efficient both for the extractor and the server, but not all servers support it, so it's disabled by default.
servers	A list of servers to extract from.
servers[].host	Host or IP address to connect to.
servers[].da-version	Version of DA to connect to on this host. Leave this out to not connect to DA at all. Valid options are `V2` and `V3`.
servers[].hda-version	Version of HDA to connect to on this host. Leave this out to not connect to HDA at all. If set, this must be `V1`.
servers[].name	Name of the server to connect to on the given host. This selects which server to connect to if multiple are available. If left out, the extractor connects to the first server it finds.
servers[].proxy-address	Proxy requests to the server through this address.
servers[].id-prefix	Prefix for external IDs of assets and time series created in CDF by this server.
servers[].state-store-name	Name of state store used to store states if history is enabled for this server. Required if hda-version is set.
servers[].cache.path	Path to a local JSON file caching the node hierarchy. If the file doesn't exist, the extractor will browse the DA and HDA servers and generate it. The file may be manually edited to limit which nodes the extractor should read.
attribute-chunking	Configuration for chunking of attributes read from HDA servers.
attribute-chunking.chunk-size	Maximum number of items per attribute read request. The default value is `1000`.
attribute-chunking.parallelism	Maximum number of parallel requests for attributes. The default value is `10`.
keep-alive-interval	Interval between reads of the server status, which serve as a keep-alive. The syntax is described in Timestamps and intervals.
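The following sketch shows a `source` section for two hypothetical servers: one host that exposes both DA and HDA, and one used for live data only. Host names, the domain, the server `name`, ID prefixes, and file paths are placeholders, and the numeric values are examples rather than recommendations.

```yaml
source:
    username: ${OPC_USERNAME}
    password: ${OPC_PASSWORD}
    # Placeholder domain for the extractor's network user
    domain: EXAMPLE
    parallelism: 10
    keep-alive-interval: 30s
    attribute-chunking:
        chunk-size: 1000
        parallelism: 10
    servers:
      # Live data (DA) and history (HDA) from the same host
      - host: opc-host-1
        da-version: V3
        hda-version: V1
        id-prefix: "host1:"
        # Required because hda-version is set
        state-store-name: host1-states
        cache:
            # Node hierarchy cache; generated by browsing if the file is missing
            path: cache/host1.json
      # Live data only: hda-version is left out, so history is disabled
      - host: 192.0.2.10
        # Placeholder server name, used to pick one of several servers on the host
        name: Example.OPCServer
        da-version: V2
        id-prefix: "host2:"
```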

Subscriptions

If you connect the extractor to a Data Access (DA) server, it establishes subscriptions on the tags it discovers. Subscriptions in OPC DA are callbacks, meaning that the server will call a function in the extractor whenever it sees any changes.

Parameter	Description
chunking	Configuration for how the extractor will chunk requests for subscriptions.
chunking.chunk-size	Maximum number of items per subscription request. The default value is `1000`.
chunking.parallelism	Maximum number of parallel requests to create subscriptions. The default value is `10`.
keep-alive	Keep-alive rate in milliseconds for subscriptions. The default value is `10000`.
update-rate	Requested update rate in milliseconds, that is, how often the server should check its underlying systems for updates. The server is not required to obey this, and may return a revised update rate or use a notification-based approach instead. The default value is `1000`.
deadband	Minimum difference in value required for an update to be registered. The default value is `0.0`.
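A sketch of the subscription settings, assuming the top-level key is `subscriptions` (inferred from the section title, so verify it against config.example.yml). The values are illustrative only.

```yaml
subscriptions:
    chunking:
        chunk-size: 500
        parallelism: 5
    # Keep-alive rate for subscriptions, in milliseconds
    keep-alive: 10000
    # Requested update rate; the server may revise or ignore it
    update-rate: 1000
    # Minimum difference in value required for an update to be registered
    deadband: 0.5
```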

Logger

Log entries have one of the following levels, in order of decreasing importance: Fatal, Error, Warning, Information, Debug, Verbose. Each level includes all levels of higher importance, so setting a level to Warning also logs Error and Fatal entries.

Parameter	Description
console	Configuration for logging to the console.
console.level	Minimum level of log events to write to the console. Set this to enable console logging.
console.stderr-level	Log events at this level or above are redirected to standard error.
file	Configuration for logging to a rotating log file.
file.level	Minimum level of log events to write to file.
file.path	Path to the log files. For example, if this is set to `logs/log.txt`, the extractor creates files of the form `logs/log[date].txt`, starting a new file according to `rolling-interval`.
file.retention-limit	Maximum number of log files that are kept in the log folder.
file.rolling-interval	A rolling interval for log files. Either `day` or `hour`. The default value is `day`.
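A sketch of a `logger` section that writes Information and above to the console and Debug and above to daily rolling files. The lowercase spelling of the level names is an assumption; check config.example.yml for the exact values the extractor expects.

```yaml
logger:
    console:
        level: information
        # Send warnings and above to standard error
        stderr-level: warning
    file:
        level: debug
        # Produces files such as logs/log[date].txt
        path: logs/log.txt
        # Keep at most 31 log files in the log folder
        retention-limit: 31
        rolling-interval: day
```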

Metrics

The OPC Classic extractor can push general usage metrics to a Prometheus Pushgateway, or expose a Prometheus server for scraping.

Parameter	Description
server	Configuration for a Prometheus scrape server.
server.host	Host for a locally hosted Prometheus server, used for scraping.
server.port	Port used by the local Prometheus server.
push-gateways	A list of pushgateway destinations the extractor will push metrics to.
push-gateways[].host	URI of the pushgateway host.
push-gateways[].job	Name of the metrics job on this pushgateway.
push-gateways[].username	Username for basic authentication.
push-gateways[].password	Password for basic authentication.
push-gateways[].push-interval	Interval in seconds between each push of metrics.
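A sketch of a `metrics` section that both exposes a local scrape endpoint and pushes to one Pushgateway. The port, job name, gateway URL, and environment variable names are placeholders; 9091 is the Pushgateway's default port.

```yaml
metrics:
    server:
        host: localhost
        port: 9000
    push-gateways:
      - host: http://localhost:9091
        job: opc-classic-extractor
        # Optional basic authentication, read from hypothetical environment variables
        username: ${PUSHGATEWAY_USER}
        password: ${PUSHGATEWAY_PASSWORD}
        # Seconds between each push
        push-interval: 30
```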

Cognite

Configuration for the connection to Cognite Data Fusion (CDF).

Parameter	Description
host	The CDF service URL. Defaults to `https://api.cognitedata.com`.
project	The CDF project. Required.
idp-authentication	Configuration for authenticating to CDF.
idp-authentication.authority	Authority used with `tenant` to authenticate to Azure tenants. Use `token-url` if connecting using a non-Azure IdP. Defaults to `https://login.microsoftonline.com`.
idp-authentication.tenant	Azure tenant used with `authority`.
idp-authentication.token-url	URL used to obtain service tokens, used for non-Azure IdPs.
idp-authentication.client-id	Service principal client ID.
idp-authentication.secret	Service principal client secret.
idp-authentication.resource	Optional resource parameter to pass along with token request.
idp-authentication.scopes	A list of scopes to pass along with the token request. This typically needs to contain `[host]/.default`.
idp-authentication.audience	Optional audience parameter to pass along with token request.
idp-authentication.min-ttl	Requested minimum time-to-live in seconds for the token.
idp-authentication.certificate	Configuration for authenticating using a client certificate.
idp-authentication.certificate.authority-url	Certificate authority URL.
idp-authentication.certificate.path	Path to the `.pem` or `.pfx` certificate to be used for authentication.
idp-authentication.certificate.password	Certificate password.
data-set	Configuration for the data set to assign newly created assets and time series to.
data-set.id	Internal ID of the data set. Specify either this or `external-id`.
data-set.external-id	External ID of the data set. Specify either this or `id`.
update	Set this to `true` to enable updating assets and time series that have changed in the source.
metadata-targets	Targets for writing metadata, meaning the name, description, and metadata of assets and time series. By default, the extractor creates time series with only an external ID and nothing else.
metadata-targets.clean	Configuration for writing metadata to CDF Clean.
metadata-targets.clean.assets	Set to `true` to enable creating assets in CDF.
metadata-targets.clean.time-series	Set to `true` to enable writing time series metadata.
max-upload-interval	Maximum time to cache datapoints before they are uploaded to CDF. The syntax is described in Timestamps and intervals. Defaults to `1s`.
max-data-points-upload-queue-size	Maximum number of cached datapoints before they are uploaded to CDF. Defaults to `1000000`.
cdf-retries	Configuration for automatic retries on requests to CDF.
cdf-retries.timeout	Timeout in milliseconds for each individual try. Defaults to `80000`.
cdf-retries.max-retries	The maximum number of retries. If this is less than 0, the extractor retries forever.
cdf-retries.max-delay	Maximum delay in milliseconds between tries. The base delay is `125 * 2 ^ retry` milliseconds. If this is less than 0, there is no upper limit. Defaults to `5000`.
cdf-chunking	Configuration for chunking of requests to CDF. Increasing these values may cause requests to fail due to limits in the API. Read the API documentation before raising them above their default values.
cdf-chunking.time-series	Maximum number of time series per get/create time series request.
cdf-chunking.assets	Maximum number of assets per get/create assets request.
cdf-chunking.data-point-time-series	Maximum number of time series per datapoint create request.
cdf-chunking.data-points	Maximum number of datapoints per datapoint create request.
cdf-throttling	Configuration for how requests to CDF should be throttled.
cdf-throttling.time-series	Maximum number of parallel requests per time series operation. Defaults to `20`.
cdf-throttling.assets	Maximum number of parallel requests per assets operation. Defaults to `20`.
cdf-throttling.data-points	Maximum number of parallel requests per datapoints operation. Defaults to `10`.
sdk-logging	Configuration for logging of requests from the SDK.
sdk-logging.disable	Set this to `true` to disable logging of requests from the SDK. Logging is enabled by default.
sdk-logging.level	Log level for messages from the SDK. Defaults to `debug`.
sdk-logging.format	Format of the log message. Defaults to `CDF ({Message}): {HttpMethod} {Url} - {Elapsed} ms`.
nan-replacement	Replacement for `NaN` values when writing to CDF. By default, there is no replacement, and `NaN` values are removed.
extraction-pipeline.external-id	External ID of the extraction pipeline to associate this extractor with. Used for monitoring and remote configuration.
certificates	Configuration for special handling of SSL certificates. This shouldn't be considered a permanent solution to certificate problems.
certificates.accept-all	Accept all remote SSL certificates even if verification fails. This introduces a risk of man-in-the-middle attacks.
certificates.allow-list	List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates.
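The sketch below extends the `cognite` section of the minimal configuration with a data set, metadata targets, retry settings, and an extraction pipeline. The data set and extraction pipeline external IDs are placeholders. The comments on `cdf-retries` work through the documented base-delay formula `125 * 2 ^ retry`.

```yaml
cognite:
    project: ${COGNITE_PROJECT}
    idp-authentication:
        tenant: ${COGNITE_TENANT_ID}
        client-id: ${COGNITE_CLIENT_ID}
        secret: ${COGNITE_CLIENT_SECRET}
        scopes:
          - ${COGNITE_SCOPE}
    data-set:
        # Placeholder data set external ID
        external-id: opc-classic-data
    update: true
    metadata-targets:
        clean:
            assets: true
            time-series: true
    cdf-retries:
        timeout: 80000
        max-retries: 10
        # Base delay is 125 * 2 ^ retry ms: for retry = 0, 1, 2, 3, 4, 5 this is
        # 125, 250, 500, 1000, 2000, 4000 ms. From retry = 6 (8000 ms) on,
        # the delay is capped at max-delay.
        max-delay: 5000
    extraction-pipeline:
        # Placeholder extraction pipeline external ID
        external-id: opc-classic-pipeline
```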

State Store

Configuration for storing state in a local database or in CDF RAW. This is required if reading from an HDA server.

Parameter	Description
location	Path to the database file, or name of the CDF RAW database containing the state store.
database	Which type of database to use. Valid options are `LiteDb`, `Raw`, or `None`. The default value is `None`.
interval	Interval between each push of local states to the state store. The syntax is described in Timestamps and intervals. The default value is `1m`.
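A sketch of a local LiteDB state store, assuming the top-level key is `state-store` (inferred from the section title, so verify it against config.example.yml). Each server with `hda-version` set also needs a `state-store-name`, as described in the Source section.

```yaml
state-store:
    database: LiteDb
    # Local database file (placeholder path)
    location: state.db
    # Push local states to the state store every 5 minutes
    interval: 5m
```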

History

Configuration for reading historical data from an HDA server.

Parameter	Description
backfill	Set this to `true` to enable backfill, meaning that the extractor will read backwards from the earliest known timestamp as well as forwards from the latest known timestamp on startup. This is only useful if there is enough data in the server that reading it all will take a very long time.
start-time	The earliest timestamp to read history from, in milliseconds since `01/01/1970`. Alternatively, use the syntax `N[timeunit](-ago)`, where `timeunit` is one of `w`, `d`, `h`, `m`, `s`, or `ms`. `-ago` means the time is in the past; otherwise, it is in the future.
end-time	The latest timestamp to read history up to, in milliseconds since `01/01/1970`. Alternatively, use the syntax `N[timeunit](-ago)`, where `timeunit` is one of `w`, `d`, `h`, `m`, `s`, or `ms`. `-ago` means the time is in the past; otherwise, it is in the future.
chunking	Chunking for history reads.
chunking.chunk-size	Maximum number of items per history read request. Defaults to `1000`.
chunking.parallelism	Maximum number of parallel history read requests. Defaults to `10`.
chunking.max-per-minute	Maximum number of history read requests per minute.
chunking.max-read-per-tag	Maximum number of values returned per tag per request. Defaults to `1000`.
granularity	Granularity to use when doing history read. Nodes with the last/earliest known timestamp within this range of each other will be read together. This shouldn't be smaller than the usual average update rate. Leave at 0 to always read a single node each time. The syntax is described in Timestamps and intervals. Defaults to `15s`.
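A sketch of a `history` section (the top-level key is assumed from the section title) that enables backfill, reads from a fixed start time, and limits the request rate. The values are illustrative only.

```yaml
history:
    # Also read backwards from the earliest known timestamp on startup
    backfill: true
    # 1 January 2021 00:00 UTC, in milliseconds since 01/01/1970
    start-time: 1609459200000
    chunking:
        chunk-size: 1000
        parallelism: 10
        # Placeholder rate limit for history read requests
        max-per-minute: 600
        max-read-per-tag: 1000
    # Read nodes together if their known timestamps lie within 15 seconds
    granularity: 15s
```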