Using the Application Function Modeler to Create a Forecast (APL) Procedure in SAP HANA
In this blog, I share the experience I gained while creating a forecast procedure using the following features of SAP HANA:
- APL (Automated Predictive Library) Forecast Function
- Application Function Modeler (AFM)
I have documented the APL work in the following three sections:
Prerequisites
The following prerequisites must be met before we start developing:
- SAP HANA (SPS 09)
- SAP AFL (this should be installed as part of SAP HANA)
- SAP APL (you will need to install it separately)
- unixODBC 64 bit (no version was mentioned in the guide, so I used 2.3.2)
- SAP HANA Script Server should be enabled
You can refer to this blog if you need to install APL.
What is APL Forecast Function?
The forecast operation consists of building and training a time series model, and then applying the model to obtain the forecast data. The following table lists the input and output tables required for the forecast function to work; the corresponding table structures are defined in the .hdbdd files later in this blog.
Direction | Type | Description

IN | FUNC_HEADER | Optional (the table must be provided but it can be empty). The function header defines the operating parameters of an APL function call. For instance, a function header can be used to set the logging level or to provide an operation id. A function header is made of a collection of string pairs { name, value }. It is usually the first input parameter of a function.

IN | VARIABLE_DESCS | Optional (the table must be provided but it can be empty). The variable descriptions for the input dataset, as expected by the Automated Analytics engine.

IN | OPERATION_CONFIG | Mandatory. The configuration of the training operation. For this function, you must declare the type of Automated Analytics model, and you can declare the optional cutting strategy.

IN | VARIABLE_ROLES | Optional (the table must be provided but it can be empty). The roles of the variables for this training, provided as string pairs { variable name, variable role }. In data modeling, variables may have four roles:
- Target variables (also known as output or dependent variables): a target variable is the variable that you seek to explain, or for which you want to predict the values in an application dataset. It corresponds to your domain-specific business issue.
- Explanatory variables (also known as input, causal, or independent variables): an input variable describes your data and serves to explain a target variable.
- Weight variables: a weight variable assigns a relative weight to each of the observations it describes, and actively orients the training process.
- Skipped variables: these variables are ignored during the training process.

IN | DATASET | Mandatory. The name of your input (training) dataset. Datasets are used for training models and applying models, and can contain any number of columns (within HANA and AFL limits). The supported SQL datatypes for the dataset columns are all the datatypes supported by AFL: INTEGER, BIGINT, DOUBLE, CLOB, NCLOB, VARCHAR, NVARCHAR, DATE, TIME, TIMESTAMP, and SECONDDATE.

OUT | DATASET | The resulting forecast.

OUT | LOG | The operation log. When performing an APL operation, especially training or applying a model, the Automated Analytics engine produces status/warning/error messages. These messages are returned from an APL function through an output database table.

OUT | SUMMARY | The training summary. When training a model, debriefing information related to the training operation is produced. This information is a set of indicators, provided as string pairs { KEY, VALUE }.

OUT | INDICATORS | The variable statistics and indicators. When training, testing, or querying a model, it is possible to retrieve variable indicators (i.e. variable statistics). For each variable, a collection of indicators may be retrieved, described using the attributes { variable name, indicator name, indicator value, indicator detail (when applicable) }. The output table contains estimator indicators for regression models, to help plot the regression curve.
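As an illustration, a minimal OPERATION_CONFIG for the forecast function declares the model type. The parameter names below are taken from the APL reference guide and should be verified against your APL version; the table name assumes the func_data.hdbdd definitions created later in this blog, and the values are illustrative:

```sql
-- Declare the Automated Analytics model type (mandatory for the forecast function)
INSERT INTO "APP1"."APP1.db::func_data.OPERATION_CONFIG" VALUES ('APL/ModelType', 'timeseries');
-- Number of time points to forecast (illustrative value)
INSERT INTO "APP1"."APP1.db::func_data.OPERATION_CONFIG" VALUES ('APL/Horizon', '12');
-- Column holding the time points in the source dataset
INSERT INTO "APP1"."APP1.db::func_data.OPERATION_CONFIG" VALUES ('APL/TimePointColumnName', 'Date');
```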
What is Application Function Modeler?
The SAP HANA Application Function Modeler (AFM) is the default editor for flowgraphs. A flowgraph is a development object; it is stored in a project and has the extension .hdbflowgraph. By default, the activation of a flowgraph generates a procedure in the catalog. A flowgraph models a data flow that can contain:
- tables, views, and procedures from the catalog
- relational operators such as projection, filter, union, and join
- functions from Application Function Libraries (AFL) installed on your system
- attribute view and calculation view development objects
In addition, the AFM supports some optional components of the SAP HANA platform, such as:
- the Business Function Library
- the Predictive Analysis Library
- R Scripts
- Data Provisioning operators
- the generation of task plans
We will use the Application Function Modeler to create a flowgraph that consumes the APL forecast function. Activating the flowgraph will generate a procedure in our defined schema. We will follow the approach below to create the forecast flowgraph.
- Create Catalog Objects such as schema, source tables, function tables, output tables
- Create Flowgraph
Assumptions:
- An XS project has already been created. In this example we use the following XS project: APP1
Create Catalog Objects
Schema
- First of all, create a schema definition file named APP1.hdbschema using the following method:
- Right-click the project name and select New > Other > SAP HANA > Database Development > Schema
- Select the folder where you want to save this file and name it “APP1”. Make sure no template is selected
- Once the file is created, open it and enter the following:
schema_name = "APP1";
- Right-click the file and activate it. This will create the schema
- Run the following SQL statements to grant privileges to _SYS_REPO
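A typical grant for repository-activated objects looks like the following. This is an assumption based on common practice; adjust the privilege list to your security policy:

```sql
-- Allow the repository user to access and manage objects in the new schema
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON SCHEMA "APP1" TO _SYS_REPO WITH GRANT OPTION;
```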
Tables
Please create the DDL source files (Core Data Services) using the following method.
- Right-click the project name and select New > Other > SAP HANA > Database Development > DDL source file
- Select the folder where you want to save the file and give it a name. Make sure you have selected the empty structure template. Perform this three times to create the three files func_data.hdbdd, output_data.hdbdd, and source_data.hdbdd.
- Once the files are created, open each one and enter the following:
func_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context func_data {
@Catalog.tableType : #COLUMN
@nokey
entity FUNCTION_HEADER {
KEY : hana.VARCHAR(50);
VALUE : hana.VARCHAR(50);
};
@Catalog.tableType : #COLUMN
@nokey
entity OPERATION_CONFIG {
KEY : hana.VARCHAR(1000);
VALUE : hana.VARCHAR(50);
};
@Catalog.tableType : #COLUMN
@nokey
entity VARIABLES_ROLES {
NAME : hana.VARCHAR(50);
ROLE : hana.VARCHAR(10);
};
@Catalog.tableType : #COLUMN
@nokey
entity VARIABLES_DESC {
RANK : Integer;
NAME : hana.VARCHAR(50);
STORAGE : hana.VARCHAR(10);
VALUETYPE : hana.VARCHAR(10);
KEYLEVEL : Integer;
ORDERLEVEL : Integer;
MISSINGSTRING : hana.VARCHAR(50);
GROUPNAME : hana.VARCHAR(50);
DESCRIPTION : hana.VARCHAR(100);
};
};
source_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context source_data {
@Catalog.tableType : #COLUMN
entity DATA {
key Date: LocalDate;
Produced: Double;
};
};
output_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context output_data {
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_OUTPUT {
Date : LocalDate;
Produced: Double;
kts_1: Double;
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_LOG {
OID: hana.VARCHAR(50);
TIMESTAMP: UTCTimestamp;
LEVEL: Integer;
ORIGIN: hana.VARCHAR(50);
MESSAGE: hana.CLOB;
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_SUMM {
OID: hana.VARCHAR(50);
KEY: hana.VARCHAR(100);
VALUE: hana.VARCHAR(100);
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_INDI {
OID: hana.VARCHAR(50);
VARIABLE: hana.VARCHAR(100);
TARGET: hana.VARCHAR(100);
KEY: hana.VARCHAR(100);
VALUE: hana.VARCHAR(100);
DETAIL: hana.CLOB;
};
};
- Save the files, then right-click them and activate. This will create the required tables in schema “APP1”. You will see the new tables listed under the schema
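To verify the activation, you can list the tables generated in the schema with a simple query against the system view:

```sql
-- List the tables generated from the activated .hdbdd files
SELECT TABLE_NAME FROM "SYS"."TABLES" WHERE SCHEMA_NAME = 'APP1' ORDER BY TABLE_NAME;
```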
Creating Flowgraph
Assumption:
- The source data table is populated with data
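If the source table still needs data, rows can be inserted as shown below. The table name follows the hdbdd naming convention (namespace APP1.db, context source_data, entity DATA) and the values are purely illustrative:

```sql
-- Illustrative monthly values for the source dataset
INSERT INTO "APP1"."APP1.db::source_data.DATA" VALUES ('2014-01-01', 110.5);
INSERT INTO "APP1"."APP1.db::source_data.DATA" VALUES ('2014-02-01', 115.2);
INSERT INTO "APP1"."APP1.db::source_data.DATA" VALUES ('2014-03-01', 119.8);
```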
Please create the FORECAST_FG.hdbflowgraph file using the following method:
- Right-click the project name and select New > Other > SAP HANA > Database Development > Flowgraph Model
- Select the folder where you want to save this file and give it a name.
- Once the file is created, open it and right-click on an empty space to open the properties of the flowgraph model. Change the schema from _SYS_BIC to APP1
- Add forecast function (drag and drop) from general list in node palette
- Go to the function properties and choose APL_AREA as the area; once the functions are populated, choose the FORECAST function. Your screen should look like the one below:
- Next, we will add an input table and map it to the corresponding function node. Select FUNCTION_HEADER from the tables list in the APP1 schema, then drag and drop it onto an empty area of the flowgraph model. Select “Data Source” when the system asks you. Now create a connection from this table to the FUNC_HEADER node of the function. Note that you can name each node as per your need.
- Perform the same steps for all the input tables; your screen should look like the one below:
- We will now add output tables and then map it to corresponding function node. Please select FORECAST_OUTPUT from tables list in APP1 schema and then drag and drop to empty area in flowgraph model. Select “Data Sink” when system asks you.
- Before we create a connection between the node and the table, we need to define the columns in the output node of the function. Use the same table structure as we defined above when creating the output tables in schema APP1. Go to the node properties > Signature > add/change the columns and their data types
- Now, create a connection from node “Output_Data” to this table FORECAST_OUTPUT. Please note you can name each node as per your need.
- If you are going to truncate the table every time you run the procedure, select the following option in the node properties.
- Perform the same steps for all the output tables; your screen should look like the one below:
- Note that each node shows a green tick, which means there are no errors
- Now, please activate this model
- Once it is activated, you will see that new procedures and table types have been created in schema APP1
- To call this procedure please use following SQL statement:
CALL "APP1"."APP1.fg::FORECAST_FG"();
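Once the procedure has run, the forecast can be inspected directly from the output table (name per the output_data.hdbdd definition above):

```sql
-- kts_1 is the forecast column, per the FORECAST_OUTPUT definition above
SELECT * FROM "APP1"."APP1.db::output_data.FORECAST_OUTPUT" ORDER BY "Date";
```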
References
How to Install the Automated Predictive Library in SAP HANA | SCN. 2015. [ONLINE] Available at: http://scn.sap.com/community/hana-in-memory/blog/2015/02/19/how-to-install-the-automated-predictive-library-in-sap-hana. [Accessed 11 March 2015].
What is the SAP Automated Predictive Library (APL) for SAP HANA | SCN. 2015. [ONLINE] Available at: http://scn.sap.com/community/predictive-analysis/blog/2015/03/02/what-is-the-sap-automated-predictive-library-apl-for-sap-hana. [Accessed 11 March 2015].
SAP Help, SAP Automated Predictive Library Reference Guide, viewed 20th March, 2015, https://websmp203.sap-ag.de/~sapidb/012002523100002180172015E/apl11_apl_user_guide_en.pdf