Using Application Function Modeler To Create Forecast (APL) Procedure in SAP HANA

In this blog or document, I have shared my experience gained while creating forecast procedure using following features of SAP HANA:

  • APL (Automated Predictive Library) Forecast Function
  • Application Function Modeler (AFM)
I haven't seen any convincing blog which promotes this feature of SAP HANA. Personally, I believe it is really good feature for people with beginner or no SQL scripting skills as AFM is graphical tool. However, the end result is a procedure which can be used later for different purpose. I created this forecast procedure based on the use case provided by Forefront Analytics. The use case instructed to use APL forecast function. The input dataset was provided by Forefront Analytics. I started my research on APL and my main focus was time series function i.e. "Forecast". I learnt about APL installation and different APL functions as I progressed. I discovered about AFM when I was watching SAP Technology videos as part of my APL research. 

I have tried to document APL work in following three sections:


Prerequisites

There are following prerequisites before we start developing:


  • SAP HANA (SPS 09)
  • SAP AFL (this should be installed as part of SAP HANA)
  • SAP APL (you will need to install it separately)
  • unixODBC 64 bit (No version was mentioned in guide so I used 2.3.2)
  • SAP HANA Script Server should be enabled

You can refer to this blog if you need to install APL.


What is APL Forecast Function?

The forecast operation consists in building and training a time series model, and then applying the model in order to obtain the forecast data. Following table lists the input and output tables required for forecast function to work. Also, there is expected table structure within the description. 


Direction
Type
Description
IN

FUNC_HEADER
Optional (The table must be provided but it can be empty.)
The function header defines the operating parameters of an APL function call. For instance, a function header can be used to set the logging level or to provide an operation id. A function header is made of a collection of string pairs { name, value }. It's usually the first input parameter of a function. It can be empty.

Expected Table Structure:

Column
SQL Data Type
Description
KEY
(N)VARCHAR, (N)CLOB
The parameter key
VALUE
(N)VARCHAR, (N)CLOB
The parameter value

Supported Parameter Names:

Parameter Name
Description
OID
Optional
●Supported values: Any string
●Default value: None.
An operation ID. This optional ID can be provided by the APL caller to tag all the inserted rows in the the output tables.
ModelFormat
Optional
●Default value: bin.
The requested output format for the model. This impacts the table structure of the model output table.
LogLevel
Optional
●Min value: 0 (Disabled)
●Max value: 10
●Default: 8
This impacts the amount of progress messages and logging information produced by APL and the Automated Analytics engine. These messages can then be retrieved from the log operation table.


IN
VARIABLE_DESCS
Optional
The table must be provided but it can be empty. The variable descriptions for the input dataset, as expected by the Automated Analytics engine.

Expected Table Structure:

Column
SQL Data Type
Description
1st Column
INTEGER
Variable rank
2nd Column
(N)VARCHAR, (N)CLOB
Variable name
3rd Column
(N)VARCHAR, (N)CLOB
Storage.
Possible values:
●number (the variable contains only "computable" numbers (be careful a telephone number, or an account number should not be considered numbers)
●integer
●string (: the variable contains character strings)
●date (the variable contains dates)
●datetime (the variable contains date and time stamps)
●angle
4th Column
(N)VARCHAR, (N)CLOB
Value type: the value type of the variable.
Possible values:
●nominal (categorical variable which is the only possible value for a string)
●ordinal (discrete numeric variable where the relative order is important)
●continuous (a numeric variable from which mean, variance, etc. can be computed)
●textual (textual variable containing phrases, sentences or complete texts)
5th Column
INTEGER
Key level
6th Column
INTEGER
Order level
7th Column
(N)VARCHAR, (N)CLOB
Missing string value: the string used in the data description file to represent missing values (for example, "999" or "#Empty" - without the quotes)
8th Column
(N)VARCHAR, (N)CLOB
Group name.
9th Column
(N)VARCHAR, (N)CLOB
Variable description: an additional description label for the variable.
OID(10th Column)

The operation ID, if set. Otherwise a new one is generated. This column is optional.
IN
OPERATION_CONFIG
Mandatory
The configuration of the training operation.
For this function, you must declare the type of Automated Analytics model and you can declare the optional Cutting Strategy in the OPERATION_CONFIG table as follows:

Key
Supported Values /DescriptionAPL/
DescriptionAPL/
TimePointColumnName (Mandatory)
Name of the column in the dataset that contains the time points of the time series.
APL/Horizon (Mandatory)
Number of forecasted time points
APL/LastTrainingTimePoint (Optional)
Value in the time point column which represents the last point in time for the training dataset
APL/CuttingStrategy (Optional)
The Cutting Strategy defines how a training set is cut under three subsets (estimation, validation and test sets) when needed.
For time series:
'sequential with no test''sequential' (default value)

Expected Table Structure:

Column
SQL Data Type
Description
KEY
(N)VARCHAR, (N)CLOB
The parameter alias name
VALUE
(N)VARCHAR, (N)CLOB
The parameter value
IN
VARIABLE_ROLES
Optional
The table must be provided but it can be empty. The roles of the variables for this training.

When training a model, the roles of the variables can be specified. These variable roles are provided as string pairs {variable name, variable role}.
In data modeling, variables may have four roles. They may be:
●Target variables (also known as output variables): a target variable is the variable that you seek to explain, or for which you want to predict the values in an application dataset. It corresponds to your domain-specific business issue. In some businesses, target variables may also be known as variables to be explained or dependant variables.
●Explanatory variables (also known as input variables): an input variable describes your data and serves to explain a target variable. Explanatory variables may also be known as causal variables or independent variables.
●Weight variables: a weight variable allows one to assign a relative weight to each of the observations it describes, and actively orient the training process.
●Skipped variables: these variables are ignored during the training process

Expected Table Structure:

Column
SQL Data Type
Description
NAME
(N)VARCHAR, (N)CLOB
Variable name
ROLE
(N)VARCHAR, (N)CLOB
Variable role.
Supported roles:
●input
●skip
●target
●weight


IN
DATASET
Mandatory
The name of your input (training) dataset.
Datasets are used for training models and applying models.
Datasets used for training or applying models can contain any number of columns (within HANA and AFL limits). The supported SQL datatypes for the dataset columns are all the datatypes supported by AFL:
●INTEGER
●BIGINT
●DOUBLE
●CLOB & NCLOB
●VARCHAR & NVARCHAR
●DATE, TIME, TIMESTAMP, SECONDDATE
OUT
DATASET
The resulting forecast

OUT
LOG
The training log
The operation log. When performing an APL operation, especially training or applying a model, the Automated Analytics engine produces status/warning/error messages. These messages are returned from an APL function through an output database table.

Expected table structure:

Column
SQL Data Type
Description
OID
(N)VARCHAR, (N)CLOB
Operation ID (OID)
TIMESTAMP
TIMESTAMP
Message timestamp
LEVEL
INTEGER
Message level
ORIGIN
(N)VARCHAR, (N)CLOB)
Message origin
MESSAGE
NCLOB
The actual message
OUT
SUMMARY
The training summary

When training a model, debriefing information related to the training operation is produced. This is known as the training summary. This information is a set of indicators, provided as string pairs { KEY, VALUE }.

Expected table structure:

Column
SQL Data Type
Description
OID
(N)VARCHAR, (N)CLOB
Operation ID (OID)
KEY
(N)VARCHAR, (N)CLOB
Indicator name
VALUE
(N)VARCHAR, (N)CLOB
Indicator value
OUT
INDICATORS
The variable statistics and indicators

When training, testing or querying a model, it's possible to retrieve variable indicators (i.e variable statistics). For each variable, a collection of indicators may be retrieved. These indicators are described using the following attributes: { variable name, indicator name, indicator value, indicator detail (when applicable) }. Indicators are returned from an APL function through an output database table. The output table contains estimator indicators for regression models, to help plotting the regression curve.

Expected table structure:

Column
SQL Data Type
Description
OID
(N)VARCHAR, (N)CLOB
Operation ID (OID)
TARGET
(N)VARCHAR, (N)CLOB
Name of the target, when the indicator is based on it, for example: predictive power of predictive confidence
VARIABLE
(N)VARCHAR, (N)CLOB
Variable name
KEY
(N)VARCHAR, (N)CLOB
Indicator name

VALUE
(N)VARCHAR, (N)CLOB
Indicator value
DETAIL
(N)VARCHAR, (N)CLOB
Indicator detail





What is Application Function Modeler?


The SAP HANA Application Function Modeler (AFM) is the default editor for flowgraphs. A flowgraph is a development object. It is stored in a project and has extension .hdbflowgraph. By default, the activation of a flowgraph generates a procedure in the catalog. A flowgraph models a data flow that can contain
  • tables, views, and procedures from the catalog
  • relational operators such as projection, filter, union, and join
  • functions from Application Function Libraries (AFL) installed on your system
  • attribute view and calculation view development objects
In addition the AFM provides support for some optional, additional components of the SAP HANA Platform such as
  • the Business Function Library
  • the Predictive Analysis Library
  • R Scripts
  • Data Provisioning operators
  • the generation of task plans
We will be using Application function modeler to create flowgraph consuming APL forecast function. This will generate procedure in our defined schema. We will be following below approach to create forecast flowgraph.
  • Create Catalog Objects such as schema, source tables, function tables, output tables
  • Create Flowgraph

Assumptions: 

  • XS project has already been created. In this example we have used following XS project: APP1


Create Catalog Objects

Schema

  • First of all, please create schema definition file named APP1.hdbschema using following method:
  • Right click project name and select New > Other > SAP HANA > Database Development > Schema
  • Select the folder where you want to save this file and give it a name “APP1”. Please make sure there is no template selected
  • Once file is created then open it and enter following
              schema_name = "APP1";
  •  Right click and activate it. This will create the schema
  •  Run following SQL statements to grant privileges to _SYS_REPO
          grant alter on schema "APP1" to "_SYS_REPO";

Tables

Please create DDL source file (core data services) using following method. 
  • Right click project name and select New > Other > SAP HANA > Database Development > DDL source file
  • Select the folder where you want to save this file and give it a name. Please make sure you have selected empty structure. Perform this three times and create three files func_data.hdbdd, output_data.hdbdd, source_data.hdbdd.
  • Once files are created then open it and enter following:
      

func_data.hdbdd

namespace APP1.db;

@Schema: 'APP1'
context func_data {

@Catalog.tableType : #COLUMN
@nokey
    entity FUNCTION_HEADER {
        KEY : hana.VARCHAR(50);
        VALUE : hana.VARCHAR(50);
          
    };

@Catalog.tableType : #COLUMN
@nokey
       entity OPERATION_CONFIG {
       KEY : hana.VARCHAR(1000);
        VALUE : hana.VARCHAR(50);
          
    };

@Catalog.tableType : #COLUMN
@nokey
    entity VARIABLES_ROLES {
       NAME : hana.VARCHAR(50);
        ROLE : hana.VARCHAR(10);
          
    };

@Catalog.tableType : #COLUMN
@nokey
    entity VARIABLES_DESC {
     RANK : Integer;
     NAME : hana.VARCHAR(50);
     STORAGE : hana.VARCHAR(10);
     VALUETYPE : hana.VARCHAR(10);
     KEYLEVEL : Integer;
     ORDERLEVEL : Integer;
     MISSINGSTRING : hana.VARCHAR(50);
     GROUPNAME : hana.VARCHAR(50);
     DESCRIPTION : hana.VARCHAR(100);
          
    };
};

source_data

namespace APP1.db;

@Schema: 'APP1'
context source_data {

    @Catalog.tableType : #COLUMN
       Entity DATA {
              key Date: LocalDate;
                     Produced: Double;
    };
 
};

output_data

namespace APP1.db;

@Schema: APP1'
context output_data {
@Catalog.tableType : #COLUMN
@nokey       
    entity FORECAST_OUTPUT {
        Date : LocalDate;
            Produced: Double;
            kts_1: Double;
    };
@Catalog.tableType : #COLUMN
@nokey
       entity FORECAST_LOG {
               OID: hana.VARCHAR(50);
       TIMESTAMP: UTCTimestamp;
       LEVEL: Integer;
       ORIGIN: hana.VARCHAR(50);
       MESSAGE: hana.CLOB;
    };
 @Catalog.tableType : #COLUMN
@nokey  
    entity FORECAST_SUMM {
       OID: hana.VARCHAR(50);
              KEY: hana.VARCHAR(100);
       VALUE: hana.VARCHAR(100);
   
    };
 @Catalog.tableType : #COLUMN
@nokey  
    entity FORECAST_INDI {
    OID: hana.VARCHAR(50);
    VARIABLE: hana.VARCHAR(100);
    TARGET: hana.VARCHAR(100);
    KEY: hana.VARCHAR(100);
    VALUE: hana.VARCHAR(100);
    DETAIL: hana.CLOB;
      
       };

};



  • Save the files, right click on them and activate. This will create required tables in schema “APP1”. You will see following tables in schema



Creating Flowgraph

Assumption: 

Source data table is populated with data

Please create FORECAST_FG.hdbflowgraph file using following method:
  •          Right click project name and select New > Other > SAP HANA > Database Development > flowgraph model              
  •     Select the folder where you want to save this file and give it a name.

  • Once files are created then open it and right click on empty space to go to properties of this flowgraph model. Please change the schema from _SYS_BIC to APP1


  •  Add forecast function (drag and drop) from general list in node palette



  • Go to function properties and then choose APL_AREA in area and once functions are populated then then choose FORECAST function. Your screen should be like below:





  • Next, we will add input table and then map it to corresponding function node. Please select FUNCTION_HEADER from tables list in APP1 schema and then drag and drop to empty area in flowgraph model. Select “Data Source” when system asks you. Now, create a connection from this table to FUNC_HEADER node in function. Please note you can name this each node as per your need.


  • Please perform similar for all the input tables and your screen should be like below:



  • We will now add output tables and then map it to corresponding function node. Please select FORECAST_OUTPUT from tables list in APP1 schema and then drag and drop to empty area in flowgraph model. Select “Data Sink” when system asks you.
  • Before we create a connection between node and table we will need to define the columns in output node of function. Please use similar table structure as we defined above when creating output table in schema APP1. Go to properties of node > signature > add/change columns and its data types
  • Now, create a connection from node “Output_Data” to this table FORECAST_OUTPUT. Please note you can name each node as per your need.



  • If you are going to truncate table everytime you run procedure then please select following option in node properties.



  • Please perform similar for all the output tables and your screen should be like below:

  • Please note that in each node there is green tick which means there is no errors
  • Now, please activate this model
  • Once activated you will see new procedures and table types are created in schema APP1




  • To call this procedure please use following SQL statement:


CALL "APP1"."APP1.fg::FORECAST_FG"();




References

How to Install the Automated Predictive Library... | SCN. 2015. How to Install the Automated Predictive Library... | SCN. [ONLINE] Available at:http://scn.sap.com/community/hana-in-memory/blog/2015/02/19/how-to-install-the-automated-predictive-library-in-sap-hana. [Accessed 11 March 2015].

What is the SAP Automated Predictive Library (A... | SCN. 2015. What is the SAP Automated Predictive Library (A... | SCN. [ONLINE] Available at:http://scn.sap.com/community/predictive-analysis/blog/2015/03/02/what-is-the-sap-automated-predictive-library-apl-for-sap-hana. [Accessed 11 March 2015].

SAP Help, SAP Automated Predictive Library Reference Guide, viewed 20th March, 2015, https://websmp203.sap-ag.de/~sapidb/012002523100002180172015E/apl11_apl_user_guide_en.pdf




Comments

Popular posts from this blog

SAP HANA Input Parameter - Derived from Table - Use Expression in Filter Value

SLT Replication Issue Due To Timeout Reading Logging Tables