Publish MicroStrategy Cubes from External Workflow Schedulers


Kenneth Osmond

Principal Consultant • MicroStrategy




 

OVERVIEW 

 
A common challenge faced by many organizations is synchronizing data lake updates with Strategy cube refreshes. In the most common scenario, the data pipeline that updates the data lake is managed by an enterprise scheduler, Apache Oozie, a streaming ingestion tool, or another kind of data ingestion manager. These updates are expected to complete within a certain time window, often several times per day or per hour. To keep Strategy cubes as current as possible, the usual method is to create Strategy cube publishing subscriptions triggered by a time-based schedule, configured to run shortly after the data lake update is expected to complete.
 
The synchronization strategy described above is most effective in environments where relevant data lake updates occur less frequently (e.g. daily) and can reliably complete before the Strategy cube refresh occurs. 
 
With the Big Data trend, the factors of Velocity, Volume and Big Data Platform Architecture affect this synchronization strategy in the following ways:
 

  1. Velocity affects cube refresh in this scenario because data lake updates occur very frequently and require frequent execution of Strategy subscriptions to refresh the cubes, often several times per hour. This can result in multiple overlapping workloads in the Strategy environment and increased job queuing, reducing Strategy performance and increasing latency. 

 

  2. Volume also affects cube refresh: longer ingestion times increase the duration of data lake updates, leaving a shorter interval between the data lake update and the Strategy cube refresh. In response, cube schedules are delayed and may still be in progress when the next data lake update begins, or even after it completes. Observed side effects of the increased volume, and of attempts to compensate for it, include: a scheduled cube refresh starting while a prior refresh is in progress, causing job cancellation; multiple data lake updates occurring before a cube refresh; long-running cube refresh jobs when schedules are missed; and wasted capacity when multiple cube refreshes run against a single data lake update. Shifts in synchronization timing have different impacts depending on the nature of the shift. 

 

  3. Big Data Platform Architecture can also play a role. For example, if the platform is based on the Hadoop file system or similar, the tables being updated become unavailable during data lake ingestion. When a Strategy cube refresh occurs while the data lake table is unavailable, or the table becomes unavailable while the cube query is executing, the cube refresh fails with a database error, possibly after wasting many processing cycles. 

 

SOLUTION APPLICATION 

 
The solution discussed in this article is a Python application that leverages the Strategy API suite to give an external scheduler (the scheduler that manages the data lake updates) the ability to initiate an on-demand cube refresh immediately after a data lake update completes. The application is named MSTR Cube Trigger, abbreviated as “MCT”.  
 
MCT can be used asynchronously, where the external scheduler initiates a cube refresh but does not receive or interpret any status result, or synchronously in either a blocking or a non-blocking mode, where status results are returned to the external scheduler.  
 
When used synchronously in non-blocking mode, MCT first checks whether the cube is available to be published. If so, it initiates publishing; otherwise it exits without attempting to publish. A status result is returned to the external scheduler.   
 
When used synchronously in blocking mode, MCT first checks whether the cube is available and waits up to a specified duration for it to become available. If the cube becomes available within the waiting window, publishing is initiated. A further option causes MCT to wait until publishing is complete before exiting. A status result is returned to the external scheduler.   
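
The difference between the two synchronous modes is only how long MCT is willing to wait. A minimal Python sketch of that waiting logic, with hypothetical helper names (`check_ready` stands in for the cube status check; `max_block_minutes` mirrors the request option of the same name shown later); non-blocking mode is simply the case where the window is zero:

```python
import time

def wait_until_ready(check_ready, max_block_minutes=60, poll_seconds=30,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll check_ready() until it returns True or the blocking window expires.

    Non-blocking mode is the degenerate case max_block_minutes=0: a single
    check with no waiting. Returns True if the cube became available in time.
    """
    deadline = clock() + max_block_minutes * 60
    while True:
        if check_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

The injected `clock` and `sleep` callables are only there to make the sketch testable without real waiting; a production version would use the defaults.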
 

BENEFITS 

 

  • Reduces data latency between the data lake and cubes to a minimum by initiating publishing immediately after data lake ingestion completes and data lake tables become available 
  • Eliminates failed Strategy cube publishing attempts when cubes are busy 
  • Eliminates multiple cube publish jobs queuing on the same timed subscription 
  • Eliminates longer-running cube updates when a refresh cycle is missed 
  • Eliminates redundant cube refreshes 
  • Eliminates job failures due to data lake table unavailability 
  • Reduces Strategy workloads and improves performance 

 

APPLICATION STRUCTURE 

 
The following pseudo-code shows the application flow: 
 


# Pseudo Code for MSTR Cube Trigger 

mstr_cube_trigger(optional_request_list) 

    # Read config from properties file 
    get_properties(properties_file) 

    # Get request list from command line if not passed to function 
    if optional_request_list: 
        request_list = optional_request_list 
    else: 
        request_list = get_json_request_from_argv() 

    # Log in to Strategy 
    rest_api_login() 

    # Process request list 
    for request in request_list: 

        if update_type == "event": 
            # Re-publish by event-triggered subscription 
            trigger_event(request::event_spec) 

        if update_type == "api publish": 
            # Re-publish by cube API, IRR API or subscription 
            if not skip_if_busy(): 
                # Block until cube is available 
                wait_for_cube_ready(request::cube_spec) 

            if cube_ready(): 
                # Publish by API 
                publish(request::cube_spec) 

                if synchronous: 
                    # Block until the cube finishes publishing 
                    wait_for_cube_ready(request::cube_spec) 

    rest_api_logout() 
    exit 
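
The `rest_api_login()` step maps onto the Strategy Library REST API's authentication endpoint. As a hedged sketch using only the standard library, building that login call might look like the following; the `/api/auth/login` path, `loginMode` values and `X-MSTR-AuthToken` header follow the published REST API, but verify the exact values against your Library version:

```python
import json
from urllib import request as urlrequest

def build_login_request(library_url, username, password, login_mode=1):
    """Build the POST request for the Library /api/auth/login endpoint.

    loginMode 1 is standard authentication; other modes (e.g. LDAP) use
    different values -- check the REST API documentation for your version.
    The response carries the session token in the X-MSTR-AuthToken header,
    which subsequent API calls must send back.
    """
    body = json.dumps({"username": username,
                       "password": password,
                       "loginMode": login_mode}).encode("utf-8")
    return urlrequest.Request(
        library_url.rstrip("/") + "/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a live Library instance):
#   with urlrequest.urlopen(build_login_request(url, user, pw)) as resp:
#       token = resp.headers["X-MSTR-AuthToken"]
```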

 
 

APPLICATION FEATURES 

 

  • Callable from an external scheduler via a command-line interface 
  • Callable from another Python script 
  • Requests submitted as a JSON message on the command line, as an argument to a Python function, or from a text file  
  • Multiple cube refresh requests can be packaged into a single request message 
  • Optionally scheduled from a cron job to check for requests sent as files by sftp 
  • Logs requests and results 
  • Connections configured in a properties file 
  • Connects with standard, LDAP or Kerberos authentication 
  • Runs on a server or in a container 
  • Initiates cube refresh subscriptions via a Strategy event 
  • Initiates cube refresh directly via Strategy REST APIs 
  • Optionally blocks until the cube is ready when using REST APIs 
  • Optionally blocks until the cube has finished publishing when using REST APIs 
  • Optionally skips refresh if the cube is busy when using REST APIs 
  • Synchronous refresh modes include direct cube refresh for OLAP and MTDI cubes, incremental refresh report (IRR) for OLAP cubes, and subscription-based refresh for MTDI cubes 
  • Cube refresh options include Add, Update, Replace and Upsert 
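
Since requests are JSON messages, composing one from Python is straightforward. A sketch of building a multi-request message, reusing the key names and object IDs that appear in the command-line examples later in this article:

```python
import json

def cube_request(project_id, cube_id, skip_if_busy="Y",
                 block_until_published="N", max_block_minutes=60):
    """One cube-refresh entry, using the request keys from the examples section."""
    return {
        "project": {"id": project_id},
        "cube": {
            "id": cube_id,
            "skip if busy": skip_if_busy,
            "block until published": block_until_published,
            "max block minutes": max_block_minutes,
        },
    }

# Several refresh requests packaged into a single message:
message = json.dumps({"mstr_cube_trigger": [
    cube_request("B016BFB5D096F11FDF107BAA42E836A1",
                 "B09B090080EFA512A8FE1FAD1D11EC4C"),
    {"event": {"name": "RefreshCubeDemo"}},
]})
```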

 

APPLICATION OVERVIEW – SYNCHRONOUS MODE 

 

[Diagram: MCT synchronous mode overview]

 

  1. The Enterprise Workflow launches the mstr_cube_trigger.sh script with a JSON string containing cube refresh instructions as a parameter. 
  2. mstr_cube_trigger.sh writes the JSON string passed in from the Enterprise Workflow to a file in the requests/in_progress folder. 
  3. mstr_cube_trigger.py loads the configuration mstr_cube_trigger_properties.json to obtain information such as the URL for the MSTR Library, credentials, etc. 
  4. mstr_cube_trigger.py dispatches MSTR REST API calls to trigger the cube refresh. 
  5. mstr_cube_trigger.py writes completion status to mstr_cube_trigger_log_<yyyymmdd>.json. 
  6. mstr_cube_trigger.sh moves the file from the requests/in_progress folder to either the requests/completed or the requests/error folder, depending on the outcome.  
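
Steps 2, 5 and 6 amount to a small file lifecycle around the Python worker. A self-contained sketch of that lifecycle, using the folder names from the steps above (`handler` is a hypothetical stand-in for the REST dispatch):

```python
import json
import shutil
from pathlib import Path

def process_request_file(request_file, base_dir, handler):
    """Move a request into in_progress, run it, then file it under
    completed or error depending on the outcome."""
    base = Path(base_dir)
    for sub in ("in_progress", "completed", "error"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    in_progress = base / "in_progress" / Path(request_file).name
    shutil.move(str(request_file), str(in_progress))
    try:
        handler(json.loads(in_progress.read_text()))
        outcome = "completed"
    except Exception:
        outcome = "error"
    dest = base / outcome / in_progress.name
    shutil.move(str(in_progress), str(dest))
    return dest
```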

 

APPLICATION OVERVIEW – ASYNCHRONOUS MODE 

 

[Diagram: MCT asynchronous mode overview]

 

  1. The Enterprise Workflow transfers a trigger request JSON file to the requests folder. 
  2. The mstr_cube_trigger_cron.sh script is run via cron every minute to check for request files. 
  3. mstr_cube_trigger_cron.sh detects a new trigger request JSON file in the requests folder and moves it to the requests/in_progress folder. 
  4. mstr_cube_trigger_cron.sh launches the Python script mstr_cube_trigger.py. 
  5. mstr_cube_trigger.py loads the configuration mstr_cube_trigger_properties.json to obtain information such as the URL for the MSTR Library, credentials, etc. 
  6. mstr_cube_trigger.py dispatches MSTR REST API calls to trigger the cube refresh. 
  7. mstr_cube_trigger.py writes completion status to mstr_cube_trigger_log_<yyyymmdd>.json. 
  8. mstr_cube_trigger.sh moves the file from the requests/in_progress folder to either the requests/completed or the requests/error folder, depending on the outcome.  
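
The cron-driven detection in steps 2 and 3 reduces to scanning the requests folder and handing each new file off for processing. A sketch of that scan (`handle` is a hypothetical hand-off; cron normally runs one cycle per minute, so `max_cycles=1` matches the cron usage):

```python
import time
from pathlib import Path

def scan_requests(requests_dir, handle, poll_seconds=60, max_cycles=1):
    """Scan requests_dir for trigger request JSON files and hand each one off.

    Under cron, each invocation performs a single cycle; a long-running
    daemon could instead pass a larger max_cycles and sleep between scans.
    """
    for cycle in range(max_cycles):
        for req in sorted(Path(requests_dir).glob("*.json")):
            handle(req)
        if cycle + 1 < max_cycles:
            time.sleep(poll_seconds)
```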

 

USE CASES 

 
Here are some examples of use case scenarios for the MCT: 
 

| Use Case | Trigger API | Cube Type | Method | Mode | Wait for Cube Ready | Skip Publish If Cube Not Ready | Wait for Publish To Complete | Outcome |
|---|---|---|---|---|---|---|---|---|
| U01 | Event | MTDI | Event Driven Subscription | Async | n/a | n/a | n/a | Trigger cube refresh |
| U02 | Event | OLAP | Event Driven Subscription | Async | n/a | n/a | n/a | Trigger cube refresh |
| U03 | Event | OLAP | Event Driven IRR Subscription | Async | n/a | n/a | n/a | Trigger cube refresh |
| U04 | Cube | MTDI | API | Async | N | Y | N | Execute refresh if cube ready, else skip. Continue without wait. |
| U05 | Cube | OLAP | API | Async | N | Y | N | Execute refresh if cube ready, else skip. Continue without wait. |
| U06 | IRR | OLAP | IRR API | Async | N | Y | N | Skip if cube is not ready. Continue without wait. |
| U07 | Subscr | MTDI | Immediate Subscription | Async | n/a | n/a | n/a | Directly execute cube refresh subscription. |
| U08 | Cube | MTDI | API | Sync | Y | N | Y | Wait for cube ready before publish. Wait for cube to finish. |
| U09 | Cube | OLAP | API | Sync | Y | N | Y | Wait for cube ready before publish. Wait for cube to finish. |
| U10 | IRR | OLAP | IRR API | Sync | Y | N | Y | Wait for cube ready before publish. Wait for cube to finish. |

 

EXAMPLES - Trigger by Remote Login or Workflow Agent 

 
File Name: <path>/mstr_cube_trigger/bin/cube_trigger.sh 
Usage: cube_trigger.sh <json_trigger_request_string> 
 

Command – trigger an event by supplying the event name: 

    cube_trigger.sh ' 
        {"mstr_cube_trigger":[ 
            { 
                "event":{"name":"RefreshCubeDemo"} 
            } 
        ]}' 

Command – trigger an event by supplying the event ID: 

    cube_trigger.sh ' 
        {"mstr_cube_trigger":[ 
            { 
                "event":{"id":"EB11845F9BEA70D22019AC39B4195BBD"} 
            } 
        ]}' 

Command – publish a cube by supplying project and cube IDs: 

    cube_trigger.sh ' 
        {"mstr_cube_trigger":[ 
            { 
                "project":{"id":"B016BFB5D096F11FDF107BAA42E836A1"}, 
                "cube":{"id":"B09B090080EFA512A8FE1FAD1D11EC4C", 
                        "skip if busy":"Y", 
                        "block until published":"N", 
                        "max block minutes":60} 
            } 
        ]}' 

Command – publish a cube by supplying project, folder and cube names: 

    cube_trigger.sh ' 
        {"mstr_cube_trigger":[ 
            { 
                "project":{"name":"Network Big Data"}, 
                "folder":{"name":"Finance/Public Objects/Reports/Sales"}, 
                "cube":{"name":"MTDI Test Cube", 
                        "skip if busy":"Y", 
                        "block until published":"N", 
                        "max block minutes":60} 
            } 
        ]}' 

Command – publish a cube by supplying a mix of project and folder IDs and a cube name: 

    cube_trigger.sh ' 
        {"mstr_cube_trigger":[ 
            { 
                "project":{"id":"B016BFB5D096F11FDF107BAA42E836A1"}, 
                "folder":{"id":"1A725C3268C2D25D608FDFF087F4AD28"}, 
                "cube":{"name":"MTDI Test Cube"} 
            } 
        ]}' 
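
MCT is also callable from another Python script (see APPLICATION FEATURES), and the shell examples above can equally be driven through `subprocess`. A sketch of composing the same command line programmatically; the script path placeholder is the one from the usage line above, and `build_trigger_command` is a hypothetical helper:

```python
import json

def build_trigger_command(script_path, requests_):
    """Compose the cube_trigger.sh command line from a list of request dicts."""
    return [script_path, json.dumps({"mstr_cube_trigger": requests_})]

cmd = build_trigger_command(
    "<path>/mstr_cube_trigger/bin/cube_trigger.sh",
    [{"event": {"name": "RefreshCubeDemo"}}],
)
# On a host where MCT is installed:
#   import subprocess
#   subprocess.run(cmd, check=True)
```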

 

REQUIREMENTS 

 

  • A Strategy environment with Library running and a user account having permission to run REST APIs, triggers and subscriptions 
  • A Linux, VM or macOS environment to host MCT 
  • Python 3.10 or higher installed on the host 
  • cron and sftp on the host if using sftp to deliver requests in text files 
  • An enterprise scheduler agent or rsh on the host to launch MCT 
  • Alternatively, an existing Python application workflow orchestrator that can call MCT natively 

 

WHAT’S NEXT? 

 
Another article will be published shortly to show code examples and how to install the application. 



Details

Knowledge Article

Published:

June 30, 2023

Last Updated:

June 30, 2023