Using the Application Function Modeler to Create a Forecast (APL) Procedure in SAP HANA
In this blog, I share the experience I gained while creating a forecast procedure using the following features of SAP HANA:
- APL (Automated Predictive Library) Forecast Function
- Application Function Modeler (AFM)
I have documented the APL work in the following three sections:
Prerequisites
The following prerequisites must be met before we start developing:
- SAP HANA (SPS 09)
- SAP AFL (this should be installed as part of SAP HANA)
- SAP APL (you will need to install it separately)
- unixODBC 64 bit (no version was mentioned in the guide, so I used 2.3.2)
- SAP HANA Script Server should be enabled
You can refer to this blog if you need to install APL.
What is APL Forecast Function?
The forecast operation consists of building and training a time series model, and then applying the model to obtain the forecast data. The following table lists the input and output tables required for the forecast function to work; the corresponding table structures are defined in the .hdbdd files later in this blog.
Direction | Type | Description

IN | FUNC_HEADER | Optional (the table must be provided but it can be empty). The function header defines the operating parameters of an APL function call. For instance, a function header can be used to set the logging level or to provide an operation id. A function header is made of a collection of string pairs { name, value }. It is usually the first input parameter of a function.

IN | VARIABLE_DESCS | Optional (the table must be provided but it can be empty). The variable descriptions for the input dataset, as expected by the Automated Analytics engine.

IN | OPERATION_CONFIG | Mandatory. The configuration of the training operation. For this function, you must declare the type of Automated Analytics model, and you can declare the optional cutting strategy.

IN | VARIABLE_ROLES | Optional (the table must be provided but it can be empty). The roles of the variables for this training, provided as string pairs { variable name, variable role }. In data modeling, variables may have four roles:
- Target variables (also known as output or dependent variables): a target variable is the variable that you seek to explain, or for which you want to predict the values in an application dataset. It corresponds to your domain-specific business issue.
- Explanatory variables (also known as input, causal, or independent variables): an input variable describes your data and serves to explain a target variable.
- Weight variables: a weight variable assigns a relative weight to each of the observations it describes, and actively orients the training process.
- Skipped variables: these variables are ignored during the training process.

IN | DATASET | Mandatory. The name of your input (training) dataset. Datasets are used for training models and applying models, and can contain any number of columns (within HANA and AFL limits). The supported SQL datatypes for the dataset columns are all the datatypes supported by AFL: INTEGER, BIGINT, DOUBLE, CLOB, NCLOB, VARCHAR, NVARCHAR, DATE, TIME, TIMESTAMP, and SECONDDATE.

OUT | DATASET | The resulting forecast.

OUT | LOG | The operation log. When performing an APL operation, especially training or applying a model, the Automated Analytics engine produces status/warning/error messages. These messages are returned from an APL function through an output database table.

OUT | SUMMARY | The training summary. When training a model, debriefing information related to the training operation is produced. This information is a set of indicators, provided as string pairs { KEY, VALUE }.

OUT | INDICATORS | The variable statistics and indicators. When training, testing, or querying a model, it is possible to retrieve variable indicators (i.e. variable statistics). For each variable, a collection of indicators may be retrieved, described using the attributes { variable name, indicator name, indicator value, indicator detail (when applicable) }. The output table contains estimator indicators for regression models, to help plot the regression curve.
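As an illustration, a minimal OPERATION_CONFIG for the forecast function declares the model type. The parameter names below are taken from the APL reference guide and should be verified against your APL version; the table name assumes the func_data.hdbdd definitions created later in this blog, and the values are illustrative:

```sql
-- Declare the Automated Analytics model type (mandatory for the forecast function)
INSERT INTO "APP1"."APP1.db::func_data.OPERATION_CONFIG" VALUES ('APL/ModelType', 'timeseries');
-- Number of time points to forecast (illustrative value)
INSERT INTO "APP1"."APP1.db::func_data.OPERATION_CONFIG" VALUES ('APL/Horizon', '12');
-- Column holding the time points in the source dataset
INSERT INTO "APP1"."APP1.db::func_data.OPERATION_CONFIG" VALUES ('APL/TimePointColumnName', 'Date');
```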
What is Application Function Modeler?
The SAP HANA Application Function Modeler (AFM) is the default editor for flowgraphs. A flowgraph is a development object; it is stored in a project and has the extension .hdbflowgraph. By default, the activation of a flowgraph generates a procedure in the catalog. A flowgraph models a data flow that can contain:
- tables, views, and procedures from the catalog
- relational operators such as projection, filter, union, and join
- functions from Application Function Libraries (AFL) installed on your system
- attribute view and calculation view development objects
In addition, the AFM supports some optional components of the SAP HANA platform, such as:
- the Business Function Library
- the Predictive Analysis Library
- R Scripts
- Data Provisioning operators
- the generation of task plans
We will use the Application Function Modeler to create a flowgraph that consumes the APL forecast function. Activating the flowgraph will generate a procedure in our defined schema. We will follow the approach below to create the forecast flowgraph.
- Create Catalog Objects such as schema, source tables, function tables, output tables
- Create Flowgraph
Assumptions:
- An XS project has already been created. In this example we use the following XS project: APP1
Create Catalog Objects
Schema
- First of all, create a schema definition file named APP1.hdbschema using the following method:
- Right-click the project name and select New > Other > SAP HANA > Database Development > Schema
- Select the folder where you want to save this file and name it “APP1”. Make sure no template is selected
- Once the file is created, open it and enter the following:
schema_name = "APP1";
- Right-click the file and activate it. This will create the schema
- Run the following SQL statements to grant privileges to _SYS_REPO
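A typical grant for repository-activated objects looks like the following. This is an assumption based on common practice; adjust the privilege list to your security policy:

```sql
-- Allow the repository user to access and manage objects in the new schema
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON SCHEMA "APP1" TO _SYS_REPO WITH GRANT OPTION;
```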
Tables
Please create the DDL source files (Core Data Services) using the following method.
- Right-click the project name and select New > Other > SAP HANA > Database Development > DDL source file
- Select the folder where you want to save the file and give it a name. Make sure you have selected the empty structure template. Perform this three times to create the three files func_data.hdbdd, output_data.hdbdd, and source_data.hdbdd.
- Once the files are created, open each one and enter the following:
func_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context func_data {
@Catalog.tableType : #COLUMN
@nokey
entity FUNCTION_HEADER {
KEY : hana.VARCHAR(50);
VALUE : hana.VARCHAR(50);
};
@Catalog.tableType : #COLUMN
@nokey
entity OPERATION_CONFIG {
KEY : hana.VARCHAR(1000);
VALUE : hana.VARCHAR(50);
};
@Catalog.tableType : #COLUMN
@nokey
entity VARIABLES_ROLES {
NAME : hana.VARCHAR(50);
ROLE : hana.VARCHAR(10);
};
@Catalog.tableType : #COLUMN
@nokey
entity VARIABLES_DESC {
RANK : Integer;
NAME : hana.VARCHAR(50);
STORAGE : hana.VARCHAR(10);
VALUETYPE : hana.VARCHAR(10);
KEYLEVEL : Integer;
ORDERLEVEL : Integer;
MISSINGSTRING : hana.VARCHAR(50);
GROUPNAME : hana.VARCHAR(50);
DESCRIPTION : hana.VARCHAR(100);
};
};
source_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context source_data {
@Catalog.tableType : #COLUMN
entity DATA {
key Date: LocalDate;
Produced: Double;
};
};
output_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context output_data {
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_OUTPUT {
Date : LocalDate;
Produced: Double;
kts_1: Double;
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_LOG {
OID: hana.VARCHAR(50);
TIMESTAMP: UTCTimestamp;
LEVEL: Integer;
ORIGIN: hana.VARCHAR(50);
MESSAGE: hana.CLOB;
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_SUMM {
OID: hana.VARCHAR(50);
KEY: hana.VARCHAR(100);
VALUE: hana.VARCHAR(100);
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_INDI {
OID: hana.VARCHAR(50);
VARIABLE: hana.VARCHAR(100);
TARGET: hana.VARCHAR(100);
KEY: hana.VARCHAR(100);
VALUE: hana.VARCHAR(100);
DETAIL: hana.CLOB;
};
};
- Save the files, then right-click them and activate. This will create the required tables in schema “APP1”. You will see the new tables listed under the schema
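To verify the activation, you can list the tables generated in the schema with a simple query against the system view:

```sql
-- List the tables generated from the activated .hdbdd files
SELECT TABLE_NAME FROM "SYS"."TABLES" WHERE SCHEMA_NAME = 'APP1' ORDER BY TABLE_NAME;
```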
Creating Flowgraph
Assumption:
- The source data table is populated with data
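If the source table still needs data, rows can be inserted as shown below. The table name follows the hdbdd naming convention (namespace APP1.db, context source_data, entity DATA) and the values are purely illustrative:

```sql
-- Illustrative monthly values for the source dataset
INSERT INTO "APP1"."APP1.db::source_data.DATA" VALUES ('2014-01-01', 110.5);
INSERT INTO "APP1"."APP1.db::source_data.DATA" VALUES ('2014-02-01', 115.2);
INSERT INTO "APP1"."APP1.db::source_data.DATA" VALUES ('2014-03-01', 119.8);
```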
Please create the FORECAST_FG.hdbflowgraph file using the following method:
- Right-click the project name and select New > Other > SAP HANA > Database Development > Flowgraph Model
- Select the folder where you want to save this file and give it a name.
- Once the file is created, open it and right-click on an empty space to open the properties of the flowgraph model. Change the schema from _SYS_BIC to APP1
- Add forecast function (drag and drop) from general list in node palette
- Go to the function properties and choose APL_AREA as the area; once the functions are populated, choose the FORECAST function. Your screen should look like the one below:
- Next, we will add an input table and map it to the corresponding function node. Select FUNCTION_HEADER from the tables list in the APP1 schema, then drag and drop it onto an empty area of the flowgraph model. Select “Data Source” when the system asks you. Now create a connection from this table to the FUNC_HEADER node of the function. Note that you can name each node as per your need.
- Perform the same steps for all the input tables; your screen should look like the one below:
- We will now add output tables and then map it to corresponding function node. Please select FORECAST_OUTPUT from tables list in APP1 schema and then drag and drop to empty area in flowgraph model. Select “Data Sink” when system asks you.
- Before we create a connection between the node and the table, we need to define the columns in the output node of the function. Use the same table structure as we defined above when creating the output tables in schema APP1. Go to the node properties > Signature > add/change the columns and their data types
- Now, create a connection from node “Output_Data” to this table FORECAST_OUTPUT. Please note you can name each node as per your need.
- If you are going to truncate the table every time you run the procedure, select the following option in the node properties.
- Perform the same steps for all the output tables; your screen should look like the one below:
- Note that each node shows a green tick, which means there are no errors
- Now, please activate this model
- Once it is activated, you will see that new procedures and table types have been created in schema APP1
- To call this procedure please use following SQL statement:
CALL "APP1"."APP1.fg::FORECAST_FG"();
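Once the procedure has run, the forecast can be inspected directly from the output table (name per the output_data.hdbdd definition above):

```sql
-- kts_1 is the forecast column, per the FORECAST_OUTPUT definition above
SELECT * FROM "APP1"."APP1.db::output_data.FORECAST_OUTPUT" ORDER BY "Date";
```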
References
How to Install the Automated Predictive Library in SAP HANA | SCN. 2015. [ONLINE] Available at: http://scn.sap.com/community/hana-in-memory/blog/2015/02/19/how-to-install-the-automated-predictive-library-in-sap-hana. [Accessed 11 March 2015].
What is the SAP Automated Predictive Library (APL) for SAP HANA | SCN. 2015. [ONLINE] Available at: http://scn.sap.com/community/predictive-analysis/blog/2015/03/02/what-is-the-sap-automated-predictive-library-apl-for-sap-hana. [Accessed 11 March 2015].
SAP Help, SAP Automated Predictive Library Reference Guide, viewed 20th March, 2015, https://websmp203.sap-ag.de/~sapidb/012002523100002180172015E/apl11_apl_user_guide_en.pdf