KB485454: Stream Kafka Data Directly to MicroStrategy Cubes


Kenneth Osmond

Principal Consultant • MicroStrategy


This article describes the procedure for streaming Kafka data directly to MicroStrategy cubes.

Starting with the release of Strategy ONE (March 2024), dossiers are also known as dashboards.

Overview


Almost every organization needs near-real-time dashboards that show what is happening in a key business process at the current moment. In many scenarios, up-to-the-minute performance indicators, inventory levels, sales volumes, back-orders, equipment status, or user activity can be visualized for decision support or to trigger notifications, such as:

  • Process Monitoring
  • Market Ticker Analytics
  • Point-of-Sale Sales Tracking
  • Inventory Tracking
  • Incident Management
  • Delivery Tracking
  • Proactive Event Management
  • Automation of early root-cause identification
  • Up-to-the-minute performance vs past performance
  • Real-time compliance testing


The obstacle to near-real-time dashboards is usually the latency involved in acquiring, ingesting, and processing data into the organization’s data warehouse or data lake (e.g., ETL and batch processing) before it becomes available to the analytics system. Data may not be available for several minutes, hours, or until the next day.
The Kafka MicroStrategy Cube Writer (KMCW) is a Python application built using Strategy’s MSTRIO Python library and Strategy REST APIs. It fetches messages from Kafka streams and writes the data to Strategy Super Cubes, and its feature set allows a broad range of use cases to be implemented.
KMCW runs on Linux as a single instance (one cube writer), as several instances (multiple cube writers), or in Kubernetes containers.
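The core consume-batch-write loop can be pictured with the sketch below. It is illustrative only: the function and table names are hypothetical, the real KMCW code in the GitHub repo is configuration-driven, and running it would require the `kafka-python` and `mstrio-py` packages plus reachable Kafka brokers and a Library server.

```python
import json
import os

def messages_to_rows(messages):
    """Decode a batch of JSON-encoded Kafka message values into row dicts."""
    return [json.loads(m) for m in messages]

def run_cube_writer(bootstrap, topic, base_url, project, cube_id):
    # Imports are local so the sketch can be read without the packages installed.
    from kafka import KafkaConsumer                       # kafka-python
    import pandas as pd
    from mstrio.connection import Connection              # mstrio-py
    from mstrio.project_objects.datasets import SuperCube

    # Credentials come from the environment, as KMCW's .profile setup suggests.
    conn = Connection(base_url=base_url,
                      username=os.environ["mstr_user"],
                      password=os.environ["mstr_pass"],
                      project_name=project)
    cube = SuperCube(connection=conn, id=cube_id)

    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap,
                             value_deserializer=lambda v: v.decode("utf-8"))
    batch = []
    for message in consumer:          # poll loop; batching strategies vary
        batch.append(message.value)
        if len(batch) >= 500:         # illustrative batch size
            df = pd.DataFrame(messages_to_rows(batch))
            cube.add_table(name="STREAM_TABLE", data_frame=df,
                           update_policy="add")
            cube.update()             # push the batch via the REST API
            batch.clear()
```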


 

Features


KMCW features include:

  • New use case implementation without code changes
  • Uses Strategy MSTRIO and REST APIs
  • Optional Containerization (Kubernetes)
  • High throughput and reliability
  • Data blending with history
  • Stream filtering (inbound)
  • Dataset filtering before writing to cube (outbound)
  • Standard Transformation Plug-in to provide processing columns
  • Enterprise Monitoring Plug-in with built-in metrics
  • User Plug-in, which can:
    • load Strategy reports for data enrichment
    • summarize stream data before writing to the cube
    • apply additional data modifications
  • Secure Kafka and Strategy connections
  • Configurable polling strategies and poll buffer sizes and time limits
  • Configurable cube writing strategies and batch sizes
  • Configurable poll batching strategies (timer, gap detection, continuous)
  • Multiple independent cube writers running concurrently
  • Multiple cohorts (concurrent cube writers coordinated by semaphores) writing to one table in a cube
  • Multiple cohorts writing to multiple tables in a cube
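The configurable poll-batching strategies listed above (timer, gap detection, continuous) can be illustrated with a small decision function. The names and thresholds here are hypothetical; the actual KMCW logic lives in the repo:

```python
def should_flush(strategy, batch_len, seconds_since_flush,
                 seconds_since_last_msg, batch_size=500,
                 timer_secs=15, gap_secs=2):
    """Decide whether a poll batch should be written to the cube.

    strategy: "timer"      - flush on a fixed interval,
              "gap"        - flush when the stream pauses (gap detection),
              "continuous" - flush as soon as any data is buffered.
    """
    if batch_len == 0:
        return False
    if batch_len >= batch_size:          # size limit applies to every strategy
        return True
    if strategy == "timer":
        return seconds_since_flush >= timer_secs
    if strategy == "gap":
        return seconds_since_last_msg >= gap_secs
    if strategy == "continuous":
        return True
    raise ValueError(f"unknown strategy: {strategy}")
```

The trade-off the strategies express: "timer" gives predictable cube-write cadence, "gap" writes whole bursts in one batch, and "continuous" minimizes latency at the cost of many small writes.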

 

Use cases


Here are some examples of use case scenarios where the KMCW facilitates real-time dashboards:

Retail


Incident Management


 

Brokerage


 

Logistics


 

Quick Start

Requirements:

  • A Strategy environment with Library running and a user account with permission to call the REST APIs
  • Active Kafka streams available via Kafka brokers
  • A Linux, VM, or macOS environment to host KMCW


 

If you have a Strategy environment, active Kafka streams/brokers, and a Linux environment:

  1. Install Python 3.6 (or higher) on an available Linux server
  2. Download the KMCW Python code and scripts from GitHub: https://github.com/kjosmond/kafka_mstr_cubewriter
  3. Configure the KMCW environment:
    1. Copy file <home_dir>/kafka_mstr_cubewriter/bin/profile to <home_dir>/.profile
    2. Edit .profile and put in the Strategy username and password, and the name of the KMCW you are creating
    3. Set the environment variables by running:
      . .profile
  4. From the kafka_mstr_cubewriter directory, install all required Python modules by running:
    pip3 install -r requirements/requirements.txt
  5. Configure the KMCW properties file as follows:
    1. Copy conf/dev/kmcw_default.json to conf/dev/<MY NEW APP NAME>.json
    2. Edit conf/dev/<MY NEW APP NAME>.json with the changes shown in the following JSON fragment, replacing the values in <> with the desired values. Note that you do not need to put mstr_user and mstr_pass in here, because they are set by the .profile file as environment variables, which KMCW uses instead. Add a stream column definition for each column in the Kafka message. The debug level can be set to 2 here so that you can observe what KMCW is doing in the log; normally the debug level is set to 1.


{
    "application": {
        "name": "kafka_mstr_cubewriter",
        "service": "<MY NEW APP NAME>",
        "pid_dat_file": "dat/<MY NEW APP NAME>.pid"
    },
    "logging": {
        "debug_level": 2
    },
    "Strategy": {
        "connect": {
            "base_url": "https://<LIBRARY SERVER>/StrategyLibrary/api",
            "project_name": "<PROJECT NAME>",
            "folder_path": "/<PROJECT NAME>/Public Objects/Reports/<FOLDER NAME>"
        },
        "cube": {
            "config": {
                "cube_name": "<CUBE NAME>",
                "cube_id_file": "dat/<MY NEW APP NAME>.dat"
            }
        }
    },
    "kafka": {
        "topic_list": [
            "<KAFKA TOPIC>"
        ],
        "bootstrap_servers": [
            {
                "host": "<KAFKA BROKER HOST NAME OR IP ADDRESS>",
                "port": <HOST PORT NUMBER>
            }
        ]
    },
    "dataframe": {
        "column_definition": [
            {
                "stream_column_name": "<STREAM COLUMN 1>",
                "data_type": "object",
                "element_type": "attribute",
                "send_to_cube": "Y",
                "cube_column_name": "<CUBE COLUMN 1>"
            },
            {
                "stream_column_name": "<STREAM COLUMN 2>",
                "data_type": "int64",
                "element_type": "attribute",
                "send_to_cube": "Y",
                "cube_column_name": "<CUBE COLUMN 2>"
            },
            …
            {
                "stream_column_name": "<STREAM COLUMN n>",
                "data_type": "float64",
                "element_type": "metric",
                "send_to_cube": "Y",
                "cube_column_name": "<CUBE COLUMN n>"
            }
        ]
    }
}
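Because a missing brace or a forgotten placeholder is the most common start-up problem, it can help to sanity-check the edited file before launching. The helper below is not part of KMCW; it is a hypothetical check that the file parses as JSON and that the expected top-level sections exist:

```python
import json

REQUIRED_SECTIONS = ("application", "logging", "Strategy", "kafka", "dataframe")

def check_kmcw_config(path):
    """Parse a KMCW properties file and report obvious problems."""
    with open(path) as f:
        text = f.read()
    if "<" in text:
        # An unreplaced <...> placeholder would also make json.loads fail,
        # but this gives a clearer message.
        raise ValueError("unreplaced <...> placeholder still in file")
    cfg = json.loads(text)          # raises on trailing commas, missing braces
    missing = [s for s in REQUIRED_SECTIONS if s not in cfg]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return cfg
```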

  6. From the kafka_mstr_cubewriter directory, run KMCW with the command:
    bin/kafka_mstr_cubewriter.sh
    If the configuration settings are correct, the following will happen:
    1. KMCW connects to Strategy and creates a new cube
    2. KMCW connects to the Kafka brokers and reads the topics
    3. Data flowing from the topic is written into the cube
  7. Create a dossier in Strategy:
    1. Add the cube as a dataset
    2. Create a visualization and add the elements from the cube
    3. Configure the dossier to refresh automatically every 15 seconds
    4. Save the dossier
    5. Run in presentation mode
  8. Use mstr_ko.sh to produce more test messages
  9. Observe that the dossier shows updated data every 15 seconds (or at the interval at which topic messages are produced, plus 15 seconds)
  10. Stop the cube writer by entering the command:
    bin/kmcw.sh command=stop_all
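If you prefer to generate test traffic from your own script rather than mstr_ko.sh, a producer can be sketched with kafka-python as below. The message schema and names here are hypothetical, not the schema mstr_ko.sh uses, and sending requires a live broker:

```python
import json
import random

def make_test_messages(n):
    """Generate n JSON-encoded test messages (hypothetical schema)."""
    return [json.dumps({"id": i, "value": random.random()}) for i in range(n)]

def produce(bootstrap, topic, n):
    # Local import: kafka-python is only needed when actually sending.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap,
                             value_serializer=str.encode)
    for msg in make_test_messages(n):
        producer.send(topic, msg)
    producer.flush()                 # block until all messages are delivered
```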

If you have a Strategy environment and a Linux (or VM or macOS) environment, but no active Kafka streams/brokers, you can run the example programs:

  1. Install Python 3.6 (or higher) on an available Linux server (or VM or macOS)
  2. Download and install Apache Kafka on the same Linux server as the KMCW code
  3. Download the KMCW Python code and scripts from GitHub: https://github.com/kjosmond/kafka_mstr_cubewriter
  4. Configure the KMCW environment:
    1. Copy file <home_dir>/kafka_mstr_cubewriter/bin/profile to <home_dir>/.profile
    2. Edit .profile and put in the Strategy username and password, and the location of the Kafka server installation
    3. Set the environment variables by running:
      . .profile
  5. From the kafka_mstr_cubewriter directory, install all required Python modules by running:
    pip3 install -r requirements/requirements.txt
  6. Configure the KMCW properties file as follows:
    1. Edit conf/dev/kmcw_example_app.json as shown in the following JSON fragment, replacing the values in <> with the desired values. Note that you do not need to put mstr_user and mstr_pass in here, because they are set by the .profile file as environment variables, which KMCW uses instead.


{
    "logging": {
        "debug_level": 2
    },
    "Strategy": {
        "connect": {
            "base_url": "https://<LIBRARY SERVER>/StrategyLibrary/api",
            "project_name": "<PROJECT NAME>",
            "folder_path": "/<PROJECT NAME>/Public Objects/Reports/<FOLDER NAME>"
        }
    }
}

  7. Use the mstr_ko.sh utility to start up the Kafka server and Kafka Connect, and add Kafka topics as follows:
    1. Run bin/mstr_ko.sh
    2. Enter the option: start_kafka
    3. Enter the option: start_kafka_connect
    4. Enter the option: new_topic
    5. Enter the value when prompted: mstr_kafka_example_app
    6. Enter the option: new_topic
    7. Enter the value when prompted: mstr_kafka_example_app2
  8. From the kafka_mstr_cubewriter directory, run KMCW with the command:
    bin/kafka_mstr_cubewriter.sh
    If the configuration settings are correct, the following will happen:
    1. KMCW connects to Strategy and creates a new cube
    2. KMCW connects to the Kafka brokers and reads the topics
  9. Use mstr_ko.sh to produce 1000 test messages as follows:
    1. Run bin/mstr_ko.sh
    2. Enter the option: run_producer_py
    3. Enter 1000 when prompted
    4. The producer will create 1000 test messages
    5. Data flowing from the topic will be written into the cube
  10. Create a dossier in Strategy:
    1. Add the cube as a dataset
    2. Create a visualization and add the elements from the cube
    3. Configure the dossier to refresh automatically every 15 seconds
    4. Save the dossier
    5. Run in presentation mode
  11. Use mstr_ko.sh to produce more test messages
  12. Observe that the dossier shows updated data every 15 seconds after the messages are generated, until all messages are loaded into the cube
  13. Stop the cube writer by entering the command:
    bin/kmcw.sh command=stop_all
  14. You can also try the example application that uses 3 cohorts, i.e. 3 cube writers updating the same cube simultaneously. To do this:
    1. Update the files:
      1. conf/dev/kmcw_cohort_example_app1.json
      2. conf/dev/kmcw_cohort_example_app2.json
      3. conf/dev/kmcw_cohort_example_app3.json
    2. Make the same changes that are shown in step 6
    3. Edit the .profile file and change the application name to kmcw_cohort_example_app (when KMCW starts, it automatically looks for properties files with that name and a suffix such as 1, 2, 3…)
    4. Run KMCW with the command:
      bin/kafka_mstr_cubewriter.sh
    5. Use mstr_ko.sh to produce 1000 test messages
    6. Create a dossier in Strategy to monitor the new 3-cohort cube
    7. Use mstr_ko.sh to produce more test messages
    8. Observe that the dossier shows updated data periodically
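The cohort example depends on concurrent cube writers serializing their writes to a shared table. The sketch below illustrates that coordination pattern with a Python threading semaphore standing in for KMCW's actual mechanism; the names and the in-memory "table" are hypothetical:

```python
import threading

write_lock = threading.Semaphore(1)   # one writer touches the cube at a time
table_rows = []                       # stands in for the shared cube table

def cohort_writer(cohort_id, batches):
    """Each cohort pushes its batches; the semaphore serializes the writes."""
    for batch in batches:
        with write_lock:              # acquire before writing to the table
            table_rows.extend((cohort_id, row) for row in batch)

# Three cohorts, as in the example app, each with one 100-row batch.
threads = [
    threading.Thread(target=cohort_writer,
                     args=(i, [[f"msg{i}-{n}" for n in range(100)]]))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(table_rows))                # all 300 rows arrive exactly once
```

Because each writer holds the semaphore only for the duration of one batch write, cohorts can poll their topics in parallel while never interleaving partial writes into the shared table.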

 



Details

Knowledge Article

Published:

April 13, 2022

Last Updated:

March 21, 2024