
Configure the SAP extractor

To configure the SAP extractor, you must create a configuration file. The file must be in YAML format.

You can use the sample minimal configuration file included with the extractor packages as a starting point for your configuration settings.

The configuration file contains the global parameter version, which holds the version of the configuration schema. This article describes version 1.

:::tip
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
:::

Using values from Azure Key Vault

The SAP extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the sap-password secret in Key Vault into a password parameter, configure your extractor like this:

password: !keyvault sap-password

To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:

Parameter Description
keyvault-name Name of Key Vault to load secrets from
authentication-method How to authenticate to Azure. Either default or client-secret. For default, the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the Azure CLI. For client-secret, the extractor will authenticate with a configured client ID/secret pair.
client-id Required for using the client-secret authentication method. The client ID to use when authenticating to Azure.
secret Required for using the client-secret authentication method. The client secret to use when authenticating to Azure.
tenant-id Required for using the client-secret authentication method. The tenant ID of the Key Vault in Azure.

Example:

azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd

Logger

Use the optional logger section to set up logging to a console or files.

Parameter Description
console Set up console logger configuration. See the Console section.
file Set up file logger configuration. See the File section.

Console

Use the console subsection to enable logging to standard output, such as a terminal window.

Parameter Description
level Select the verbosity level for console logging. Valid options, in decreasing verbosity levels, are DEBUG, INFO, WARNING, ERROR, and CRITICAL.

File

Use the file subsection to enable logging to a file. The files are rotated daily.

Parameter Description
level Select the verbosity level for file logging. Valid options, in decreasing verbosity levels, are DEBUG, INFO, WARNING, ERROR, and CRITICAL.
path Insert the path to the log file.
retention Specify the number of days to keep logs for. The default value is 7 days.
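
For reference, a minimal logger configuration combining both outputs could look like the sketch below. The log file path and retention value are placeholders:

logger:
  console:
    level: INFO
  file:
    level: DEBUG
    path: "sap-extractor.log"
    retention: 7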

Cognite

Use the cognite section to describe which CDF project the extractor will load data into and how to connect to the project.

Parameter Description
project Insert the CDF project name. This is a required parameter.
host Insert the base URL of the CDF project. The default value is https://api.cognitedata.com.
idp-authentication Insert the credentials for authenticating to CDF using an external identity provider. You must enter either an API key or use IdP authentication.

Identity provider (IdP) authentication

Use the idp-authentication subsection to enable the extractor to authenticate to CDF using an external identity provider, such as Azure AD.

Parameter Description
client-id Enter the client ID from the IdP. This is a required parameter.
secret Enter the client secret from the IdP. This is a required parameter.
scopes List the scopes. This is a required parameter.
resource Insert the resource to include in token requests. This is an optional parameter.
token-url Insert the URL to fetch tokens from. You must enter either a token URL or an Azure tenant.
tenant Enter the Azure tenant. You must enter either a token URL or an Azure tenant.
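
As an illustration, a cognite section using IdP authentication might look like this. The project name, client ID, tenant, scope, and secret name are placeholders, and the secret is loaded from Key Vault as described above:

cognite:
  project: my-cdf-project
  host: https://api.cognitedata.com
  idp-authentication:
    client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
    secret: !keyvault cdf-client-secret
    scopes:
      - https://api.cognitedata.com/.default
    tenant: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648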

Extractor

Use the optional extractor section to add tuning parameters.

Parameter Description
mode Set the execution mode. Options are single or continuous. Use continuous to run the extractor in a continuous mode, executing the OData queries defined in the endpoints section. The default value is single.
upload-queue-size Enter the size of the upload queue. The default value is 50 000 rows.
parallelism Insert the number of parallel queries to run. The default value is 4 queries.
state-store Set to true to enable the state store. The default is no state store, which means incremental load is deactivated. See the State store section.
chunk_size Enter the number of rows to be extracted from SAP OData on every run. The default value is 1000 rows, as recommended by SAP.
delta_padding_minutes Extractor internal parameter to control the incremental load padding. Do not change.
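
For example, an extractor section tuned for continuous, incremental extraction could be sketched like this, using the parameters described above:

extractor:
  mode: continuous
  upload-queue-size: 50000
  parallelism: 4
  state-store: true
  chunk_size: 1000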

State store

Use the state store subsection to save extraction states between runs. Use this if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time.

Parameter Description
local Local state store configuration. See the Local section.
raw RAW state store configuration. See the RAW section.

Local

Use the local section to store the extraction state in a JSON file on a local machine.

Parameter Description
path Insert the file path to a JSON file.
save-interval Enter the interval in seconds between each save. The default value is 30 seconds.

RAW

Use the RAW section to store the extraction state in a table in the CDF staging area.

Parameter Description
database Enter the database name in the CDF staging area.
table Enter the table name in the CDF staging area.
upload-interval Enter the interval in seconds between each save. The default value is 30 seconds.
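
For example, a state store backed by a table in the CDF staging area might be configured like this. The database and table names are placeholders, the section is shown standalone (check the sample configuration for exact placement), and only one state store can be configured at a time:

state-store:
  raw:
    database: sap_extractor
    table: extraction_state
    upload-interval: 30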

Metrics

Use the metrics section to describe where to send performance metrics for remote monitoring of the extractor. We recommend sending metrics to a Prometheus pushgateway, but you can also send metrics as time series in the CDF project.

Parameter Description
push-gateways List the Pushgateway configurations. See the Pushgateways section.
cognite List the Cognite metrics configurations. See the Cognite section.

Pushgateways

Use the pushgateways subsection to define a list of metric destinations, each with the following schema:

Parameter Description
host Enter the address of the host to push metrics to. This is a required parameter.
job-name Enter the value of the exported_job label to associate metrics with. This separates several deployments on a single pushgateway, and should be unique. This is a required parameter.
username Enter the username for the pushgateway. This is a required parameter.
password Enter the password for the pushgateway. This is a required parameter.
clear-after Enter the number of seconds to wait before clearing the pushgateway. When this parameter is present, the extractor will stall after the run is complete before deleting all metrics from the pushgateway. The recommended value is at least twice that of the scrape interval on the pushgateway. This is to ensure that the last metrics are gathered before the deletion.
push-interval Enter the interval in seconds between each push. The default value is 30 seconds.

Cognite

Use the cognite subsection to send metrics as time series to the CDF project configured in the cognite main section above. Only numeric metrics, such as Prometheus counters and gauges, are sent.

Parameter Description
external-id-prefix Insert a prefix for all time series used to represent metrics for this deployment. This creates a scope for the set of time series created by this metrics exporter and should be unique per deployment across the entire project. This is a required parameter.
asset-name Enter the name of the asset to attach to time series. This will be created if it doesn't already exist.
asset-external-id Enter the external ID for the asset to create if the asset doesn't already exist.
push-interval Enter the interval in seconds between each push. The default value is 30 seconds.
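
A sketch of a metrics section that pushes to a Prometheus pushgateway and also writes metrics as time series to CDF. The host, job name, credentials, and prefix are placeholders, and the exact layout of the cognite subsection may differ in your sample configuration:

metrics:
  push-gateways:
    - host: https://prometheus-pushgateway.example.com
      job-name: sap-extractor-prod
      username: metrics-user
      password: !keyvault pushgateway-password
      push-interval: 30
  cognite:
    external-id-prefix: "sap-extractor:"
    asset-name: SAP extractor
    push-interval: 30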

SAP

The sap section contains a list of SAP sources. The schema for each SAP source configuration depends on which SAP source type you are connecting to. These are distinguished by the type parameter. The supported SAP sources are:

  • OData
  • SOAP
  • RFC

This is the schema for SAP OData sources:

Parameter Description
type Type of SAP source connection, set to odata for SAP OData sources.
source_name Enter a unique name for this SAP source. Endpoints reference this name in their source_name parameter. This is a required parameter.
gateway_url Insert the SAP NetWeaver Gateway URL. This is a required parameter.
client Enter the SAP client number. This is a required parameter.
username Enter the SAP username to connect to the SAP NetWeaver Gateway. This is a required parameter.
password Enter the password to connect to the SAP NetWeaver Gateway. This is a required parameter.
certificates Certificates needed for authentication towards SAP instance. This is an optional parameter. See the Certificates section.
timezone Specify how the extractor should handle the source time zone. Valid values are local and utc. The default value is local.
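
As an illustration, an OData source entry in the sap section might look like this. The source name, gateway URL, client, and credentials are placeholders, with the password loaded from Key Vault:

sap:
  - type: odata
    source_name: sap-odata-prod
    gateway_url: https://my-sap-gateway.example.com:44300
    client: "100"
    username: EXTRACTOR_USER
    password: !keyvault sap-password
    timezone: utc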

This is the schema for SAP SOAP sources:

Parameter Description
type Type of SAP source connection, set to soap for SAP SOAP sources.
source_name Enter a unique name for this SAP source. Endpoints reference this name in their source_name parameter. This is a required parameter.
wsdl_url Insert the SOAP WSDL URL related to the SAP ABAP Webservice. This is a required parameter.
client Enter the SAP client number. This is a required parameter.
username Enter the SAP username to connect to the SAP Webservice. This is a required parameter.
password Enter the password to connect to the SAP Webservice. This is a required parameter.
certificates Certificates needed for authentication towards SAP instance. This is an optional parameter. See the Certificates section.
timezone Specify how the extractor should handle the source time zone. Valid values are local and utc. The default value is local.
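
Similarly, a SOAP source entry could be sketched as follows. The WSDL URL and credentials are placeholders:

sap:
  - type: soap
    source_name: sap-soap-prod
    wsdl_url: https://my-sap-server.example.com/sap/bc/srt/wsdl/my_webservice?wsdl
    client: "100"
    username: EXTRACTOR_USER
    password: !keyvault sap-password
    timezone: local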

This is the schema for SAP RFC sources:

Parameter Description
type Type of SAP source connection, set to rfc for SAP RFC sources.
source_name Enter a unique name for this SAP source. Endpoints reference this name in their source_name parameter. This is a required parameter.
ashost Insert the SAP application server host address. This is a required parameter.
sysnr Enter the SAP system number, a technical identifier for internal processes in SAP. It consists of a two-digit number from 00 to 97. This is a required parameter.
client Enter the SAP client number. This is a required parameter.
username Enter the SAP username to connect to the SAP NetWeaver Gateway. This is a required parameter.
password Enter the password to connect to the SAP NetWeaver Gateway. This is a required parameter.
saprouter Enter the SAPRouter address when applicable. This is an optional parameter.
snc_partnername Enter the SAP SNC (Secure Network Communication) name when applicable. This is an optional parameter.
snc_lib Enter the path to the SAP SNC library needed when using SNC authentication. This is an optional parameter.
x509cert Enter the path to the user X509 certificate when applicable. This is an optional parameter.
timezone Specify how the extractor should handle the source time zone. Valid values are local and utc. The default value is local.
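
An RFC source entry might look like this. The application host, system number, and credentials are placeholders:

sap:
  - type: rfc
    source_name: sap-rfc-prod
    ashost: my-sap-server.example.com
    sysnr: "00"
    client: "100"
    username: EXTRACTOR_USER
    password: !keyvault sap-password
    timezone: local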

Certificates

Use the certificates subsection to specify the certificates to be used for authentication towards SAP instances.

There are three certificates needed to perform the authentication: certificate authority (ca_cert), public key (public_key), and private key (private_key).

Please check this documentation on how to generate the three certificates from a .p12 certificate file, if needed.

When setting up certificate authentication, note that three certificates are needed and they must be placed in the same folder where the extractor runs.

Parameter Description
ca_cert Enter the path to the CA certificate file.
public_key Enter the path to the public key file.
private_key Enter the path to the private key file.
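
For example, a source using certificate authentication could include a certificates subsection like this. The file names are placeholders, and the files must be located in the folder the extractor runs from:

certificates:
  ca_cert: ca_cert.pem
  public_key: public_key.pem
  private_key: private_key.pem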

Endpoints

Use the endpoints subsection to specify the endpoints to extract data from.

Parameter Description
name Enter the name of the SAP endpoint used to extract data from a SAP source. The name must be unique for each query in the configuration file. This is a required parameter.
source_name Enter the name of the SAP source related to this endpoint. This must be one of the SAP sources configured in the sap section. This is a required parameter.
sap_service Enter the name of the related SAP service. For odata endpoints, it's the SAP OData service. For soap endpoints, it's the operation defined in the WSDL document. For rfc endpoints, it's the name of the SAP function module exposed through the RFC protocol. This is a required parameter.
sap_entity Enter the name of the SAP entity related to the SAP OData service. This is a required parameter.
destination The destination of the data in CDF. One of the supported destination types. See the Destination section. This is a required parameter.
sap_key Enter list of fields related to the SAP entity to be used as keys while ingesting data to CDF staging. This is a required parameter when using raw as a CDF destination.
request Enter the request to be sent to the SAP server. This is a required parameter for rfc and soap endpoints. See the Request section.
incremental_field Enter the name of the field to be used as reference for the incremental runs. This is an optional parameter. If you leave this field empty, the extractor will fetch full data loads every run.
schedule Schedule the interval at which the queries are executed against the SAP service. See the Schedule section.
extract_schema Extracts the SAP entity schema to the CDF staging area. It expects database and table parameters, the same as the RAW destination. This is an optional parameter.
filter Enter the filter query string. The $filter system query option allows clients to filter a collection of resources from the target SAP OData endpoint. This is an optional parameter.
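
Putting it together, an endpoint that extracts an OData entity incrementally into CDF RAW might look like the sketch below. The endpoint name, service, entity, field names, and RAW database and table are placeholders, and the example assumes an OData source named sap-odata-prod as in the earlier sketch:

endpoints:
  - name: functional-locations
    source_name: sap-odata-prod
    sap_service: ZPM_FUNCLOC_SRV
    sap_entity: FunctionalLocationSet
    sap_key:
      - FunctionalLocation
    incremental_field: ChangedOn
    destination:
      type: raw
      database: sap
      table: functional_locations
    schedule:
      type: interval
      expression: 1h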

Request

The request parameter is required for rfc and soap endpoints.

Both the SOAP and RFC communication protocols require a request to be sent to the SAP server in order to retrieve data.

SOAP requests

SAP ABAP Webservices are SOAP-based, meaning the requests to the SAP server must be in a valid XML format.

The SAP extractor expects this XML to be added as a string in the request parameter. This is an example of a valid XML request to a SAP ABAP Webservice generated from a SAP Function Module:

    request: |
      <n0:BAPI_FUNCLOC_GETLIST xmlns:n0="urn:sap-com:document:sap:rfc:functions">
        <FUNCLOC_LIST>
          <item>
            <FUNCTLOCATION>String 57</FUNCTLOCATION>
            <FUNCLOC>String 58</FUNCLOC>
            <LABEL_SYST>S</LABEL_SYST>
            <DESCRIPT>String 60</DESCRIPT>
            <STRIND>Strin</STRIND>
            <CATEGORY>S</CATEGORY>
            <SUPFLOC>String 63</SUPFLOC>
            <PLANPLANT>Stri</PLANPLANT>
            <MAINTPLANT>1010</MAINTPLANT>
            <PLANGROUP>Str</PLANGROUP>
            <SORTFIELD>String 67</SORTFIELD>
          </item>
        </FUNCLOC_LIST>
        <MAINTPLANT_RA>
          <item>
            <SIGN>I</SIGN>
            <OPTION>EQ</OPTION>
            <LOW>1010</LOW>
            <HIGH>1010</HIGH>
          </item>
        </MAINTPLANT_RA>
      </n0:BAPI_FUNCLOC_GETLIST>

RFC requests

The SAP RFC communication protocol remotely triggers a SAP function module on a target SAP server. SAP function modules expect import parameters in order to run and return the processed request.

The SAP extractor expects the SAP FM parameters to be sent as a JSON request inside the request parameter. This is an example of a valid SAP RFC call to the RFC_READ_TABLE SAP function module:

    request: |
      {
        "QUERY_TABLE":"QMEL",
        "FIELDS":["QMNUM","QMART","QMTXT"]
      }

Schedule

Use the schedule subsection to schedule runs when the extractor runs as a service.

Parameter Description
type Insert the schedule type. Valid options are cron and interval. cron uses regular cron expressions, and interval expects an interval-based schedule.
expression Enter the cron or interval expression to trigger the query. For example, 1h repeats the query hourly, and 5m repeats the query every 5 minutes.
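
For example, a cron schedule that runs the query every day at 02:00 could be written as:

schedule:
  type: cron
  expression: "0 2 * * *"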

Destination

The raw destination writes data to the CDF staging area (RAW). The raw destination requires the sap_key parameter in the endpoint configuration.

Parameter Description
type Type of CDF destination, set to raw to write data to RAW.
database Enter the CDF RAW database to upload data into. This will be created if it doesn't exist. This is a required value.
table Enter the CDF RAW table to upload data into. This will be created if it doesn't exist. This is a required value.
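
A minimal sketch of a raw destination inside an endpoint, with placeholder database and table names. Remember to also set sap_key on the endpoint:

destination:
  type: raw
  database: sap
  table: notifications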

Time series

The time_series destination inserts the resulting data as data points in time series.

There are two mandatory parameters in order to use time series as a destination:

• type: Set to time_series to write data to CDF time series.
• field_mapping: To ingest data into a time series, SAP entity fields must be mapped to the following CDF time series fields:
  • externalId: Required SAP entity field.
  • timestamp: Required SAP entity field.
  • value: Required SAP entity field.
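
For example, mapping hypothetical SAP fields to a time series destination could look like this. TAG_ID, READING_TIME, and READING_VALUE are placeholders for fields returned by your endpoint, and the mapping is shown keyed by the CDF field name; check the sample configuration for the exact layout:

destination:
  type: time_series
  field_mapping:
    externalId: TAG_ID
    timestamp: READING_TIME
    value: READING_VALUE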

Assets

The assets destination inserts the resulting data as CDF assets.

There are two mandatory parameters in order to use CDF assets as a destination:

• type: Set to assets to write data to CDF assets.
• field_mapping: To ingest data into assets, SAP entity fields must be mapped to the following CDF asset fields:
  • externalId: Required SAP entity field.
  • parentExternalId: Optional SAP entity field.
  • description: Optional SAP entity field.
  • source: Optional SAP entity field.

Any other columns returned by the endpoint call will be mapped to key/value pairs in the metadata field for assets.
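
A sketch of an assets destination, using field names from the functional location examples above as placeholder SAP fields:

destination:
  type: assets
  field_mapping:
    externalId: FUNCLOC
    parentExternalId: SUPFLOC
    description: DESCRIPT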

Events

The events destination inserts the resulting data as CDF events.

There are two mandatory parameters in order to use CDF events as a destination:

• type: Set to events to write data to CDF events.
• field_mapping: To ingest data into events, SAP entity fields must be mapped to the following CDF event fields:
  • externalId: Required SAP entity field.
  • startTime: Optional SAP entity field.
  • endTime: Optional SAP entity field.
  • description: Optional SAP entity field.
  • source: Optional SAP entity field.

Any other columns returned by the endpoint call will be mapped to key/value pairs in the metadata field for events.
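
A sketch of an events destination, using placeholder SAP notification fields (QMNUM and QMTXT appear in the RFC example above; QMDAT is a hypothetical timestamp field):

destination:
  type: events
  field_mapping:
    externalId: QMNUM
    startTime: QMDAT
    description: QMTXT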