KB485454: Stream Kafka Data Directly to MicroStrategy Cubes


Kenneth Osmond

Principal Consultant • MicroStrategy


This article describes the procedure for streaming Kafka data directly to MicroStrategy cubes.

Starting with the release of Strategy ONE (March 2024), dossiers are also known as dashboards.

Overview


Almost every organization needs near-real-time dashboards that show what is happening in a key business process at the current moment. In many scenarios, up-to-the-minute performance indicators, inventory levels, sales volumes, back-orders, equipment status, or user activity can be visualized for decision support or to trigger notifications, such as:

  • Process Monitoring
  • Market Ticker Analytics
  • Point-of-Sale Sales Tracking
  • Inventory Tracking
  • Incident Management
  • Delivery Tracking
  • Proactive Event Management
  • Automation of early root-cause identification
  • Up-to-the-minute performance vs past performance
  • Real-time compliance testing


The obstacle to near-real-time dashboards is usually the latency involved in acquiring, ingesting, and processing data into the organization’s data warehouse or data lake (e.g., ETL and batch processing) before it becomes available to the analytics system. Data may not be available for several minutes, hours, or until the next day.
The Kafka MicroStrategy Cube Writer (KMCW) is a Python application built using Strategy’s MSTRIO Python library and Strategy REST APIs. It fetches messages from Kafka streams and writes the data to Strategy Super Cubes, and its feature set allows a broad range of use cases to be implemented.
KMCW runs on Linux as a single instance (one cube writer), as several instances (multiple cube writers), or in Kubernetes containers.
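The core consume-batch-write loop can be pictured with the sketch below. It is illustrative only: the function and table names are hypothetical, the real KMCW code in the GitHub repo is configuration-driven, and running it would require the `kafka-python` and `mstrio-py` packages plus reachable Kafka brokers and a Library server.

```python
import json
import os

def messages_to_rows(messages):
    """Decode a batch of JSON-encoded Kafka message values into row dicts."""
    return [json.loads(m) for m in messages]

def run_cube_writer(bootstrap, topic, base_url, project, cube_id):
    # Imports are local so the sketch can be read without the packages installed.
    from kafka import KafkaConsumer                       # kafka-python
    import pandas as pd
    from mstrio.connection import Connection              # mstrio-py
    from mstrio.project_objects.datasets import SuperCube

    # Credentials come from the environment, as KMCW's .profile setup suggests.
    conn = Connection(base_url=base_url,
                      username=os.environ["mstr_user"],
                      password=os.environ["mstr_pass"],
                      project_name=project)
    cube = SuperCube(connection=conn, id=cube_id)

    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap,
                             value_deserializer=lambda v: v.decode("utf-8"))
    batch = []
    for message in consumer:          # poll loop; batching strategies vary
        batch.append(message.value)
        if len(batch) >= 500:         # illustrative batch size
            df = pd.DataFrame(messages_to_rows(batch))
            cube.add_table(name="STREAM_TABLE", data_frame=df,
                           update_policy="add")
            cube.update()             # push the batch via the REST API
            batch.clear()
```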


 

Features


KMCW features include:

  • New use case implementation without code changes
  • Uses Strategy MSTRIO and REST APIs
  • Optional Containerization (Kubernetes)
  • High throughput and reliability
  • Data blending with history
  • Stream filtering (inbound)
  • Dataset filtering before writing to cube (outbound)
  • Standard Transformation Plug-in to provide processing columns
  • Enterprise Monitoring Plug-in with built-in metrics
  • User Plug-in, which can:
    • load Strategy reports for data enrichment
    • summarize stream data before writing to the cube
    • apply additional data modifications
  • Secure Kafka and Strategy connections
  • Configurable polling strategies and poll buffer sizes and time limits
  • Configurable cube writing strategies and batch sizes
  • Configurable poll batching strategies (timer, gap detection, continuous)
  • Multiple independent cube writers running concurrently
  • Multiple cohorts (concurrent cube writers coordinated by semaphores) writing to one table in a cube
  • Multiple cohorts writing to multiple tables in a cube
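The configurable poll-batching strategies listed above (timer, gap detection, continuous) can be illustrated with a small decision function. The names and thresholds here are hypothetical; the actual KMCW logic lives in the repo:

```python
def should_flush(strategy, batch_len, seconds_since_flush,
                 seconds_since_last_msg, batch_size=500,
                 timer_secs=15, gap_secs=2):
    """Decide whether a poll batch should be written to the cube.

    strategy: "timer"      - flush on a fixed interval,
              "gap"        - flush when the stream pauses (gap detection),
              "continuous" - flush as soon as any data is buffered.
    """
    if batch_len == 0:
        return False
    if batch_len >= batch_size:          # size limit applies to every strategy
        return True
    if strategy == "timer":
        return seconds_since_flush >= timer_secs
    if strategy == "gap":
        return seconds_since_last_msg >= gap_secs
    if strategy == "continuous":
        return True
    raise ValueError(f"unknown strategy: {strategy}")
```

The trade-off the strategies express: "timer" gives predictable cube-write cadence, "gap" writes whole bursts in one batch, and "continuous" minimizes latency at the cost of many small writes.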

 

Use cases


Here are some examples of use case scenarios where the KMCW facilitates real-time dashboards:

Retail


Incident Management


 

Brokerage


 

Logistics


 

Quick Start

Requirements:

  • A Strategy environment with Library running and a user account with permission to call the REST APIs
  • Active Kafka streams available via Kafka brokers
  • A Linux, VM, or macOS environment to host KMCW


 

If you have a Strategy environment, active Kafka streams/brokers, and a Linux environment:

  1. Install Python 3.6 (or higher) on an available Linux server
  2. Download the KMCW Python code and scripts from GitHub: https://github.com/kjosmond/kafka_mstr_cubewriter
  3. Configure the KMCW environment:
    1. Copy file <home_dir>/kafka_mstr_cubewriter/bin/profile to <home_dir>/.profile
    2. Edit .profile and put in the Strategy username and password, and the name of the KMCW you are creating
    3. Set the environment variables by running:
      . .profile
  4. From the kafka_mstr_cubewriter directory, install all required Python modules by running:
    pip3 install -r requirements/requirements.txt
  5. Configure the KMCW properties file as follows:
    1. Copy conf/dev/kmcw_default.json to conf/dev/<MY NEW APP NAME>.json
    2. Edit conf/dev/<MY NEW APP NAME>.json with the changes shown in the following JSON fragment, replacing the values in <> with the desired values. Note that you do not need to put mstr_user and mstr_pass in here, because they are set by the .profile file as environment variables, which KMCW uses instead. Add a stream column definition for each column in the Kafka message. The debug level can be set to 2 here so that you can observe what KMCW is doing in the log; normally the debug level is set to 1.


{
    "application": {
        "name": "kafka_mstr_cubewriter",
        "service": "<MY NEW APP NAME>",
        "pid_dat_file": "dat/<MY NEW APP NAME>.pid"
    },
    "logging": {
        "debug_level": 2
    },
    "Strategy": {
        "connect": {
            "base_url": "https://<LIBRARY SERVER>/StrategyLibrary/api",
            "project_name": "<PROJECT NAME>",
            "folder_path": "/<PROJECT NAME>/Public Objects/Reports/<FOLDER NAME>"
        },
        "cube": {
            "config": {
                "cube_name": "<CUBE NAME>",
                "cube_id_file": "dat/<MY NEW APP NAME>.dat"
            }
        }
    },
    "kafka": {
        "topic_list": [
            "<KAFKA TOPIC>"
        ],
        "bootstrap_servers": [
            {
                "host": "<KAFKA BROKER HOST NAME OR IP ADDRESS>",
                "port": <HOST PORT NUMBER>
            }
        ]
    },
    "dataframe": {
        "column_definition": [
            {
                "stream_column_name": "<STREAM COLUMN 1>",
                "data_type": "object",
                "element_type": "attribute",
                "send_to_cube": "Y",
                "cube_column_name": "<CUBE COLUMN 1>"
            },
            {
                "stream_column_name": "<STREAM COLUMN 2>",
                "data_type": "int64",
                "element_type": "attribute",
                "send_to_cube": "Y",
                "cube_column_name": "<CUBE COLUMN 2>"
            },
            …
            {
                "stream_column_name": "<STREAM COLUMN n>",
                "data_type": "float64",
                "element_type": "metric",
                "send_to_cube": "Y",
                "cube_column_name": "<CUBE COLUMN n>"
            }
        ]
    }
}
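Because a missing brace or a forgotten placeholder is the most common start-up problem, it can help to sanity-check the edited file before launching. The helper below is not part of KMCW; it is a hypothetical check that the file parses as JSON and that the expected top-level sections exist:

```python
import json

REQUIRED_SECTIONS = ("application", "logging", "Strategy", "kafka", "dataframe")

def check_kmcw_config(path):
    """Parse a KMCW properties file and report obvious problems."""
    with open(path) as f:
        text = f.read()
    if "<" in text:
        # An unreplaced <...> placeholder would also make json.loads fail,
        # but this gives a clearer message.
        raise ValueError("unreplaced <...> placeholder still in file")
    cfg = json.loads(text)          # raises on trailing commas, missing braces
    missing = [s for s in REQUIRED_SECTIONS if s not in cfg]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return cfg
```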

  6. From the kafka_mstr_cubewriter directory, run KMCW with the command:
    bin/kafka_mstr_cubewriter.sh
    If the configuration settings are correct, the following will happen:
    1. KMCW connects to Strategy and creates a new cube
    2. KMCW connects to the Kafka brokers and reads the topics
    3. Data flowing from the topic is written into the cube
  7. Create a dossier in Strategy:
    1. Add the cube as a dataset
    2. Create a visualization and add the elements from the cube
    3. Configure the dossier to refresh automatically every 15 seconds
    4. Save the dossier
    5. Run in presentation mode
  8. Use mstr_ko.sh to produce more test messages
  9. Observe that the dossier shows updated data every 15 seconds (or at the interval at which topic messages are produced, plus 15 seconds)
  10. Stop the cube writer by entering the command:
    bin/kmcw.sh command=stop_all
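If you prefer to generate test traffic from your own script rather than mstr_ko.sh, a producer can be sketched with kafka-python as below. The message schema and names here are hypothetical, not the schema mstr_ko.sh uses, and sending requires a live broker:

```python
import json
import random

def make_test_messages(n):
    """Generate n JSON-encoded test messages (hypothetical schema)."""
    return [json.dumps({"id": i, "value": random.random()}) for i in range(n)]

def produce(bootstrap, topic, n):
    # Local import: kafka-python is only needed when actually sending.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap,
                             value_serializer=str.encode)
    for msg in make_test_messages(n):
        producer.send(topic, msg)
    producer.flush()                 # block until all messages are delivered
```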

If you have a Strategy environment and a Linux (or VM or macOS) environment, but no active Kafka streams/brokers, you can run the example programs:

  1. Install Python 3.6 (or higher) on an available Linux server (or VM or macOS)
  2. Download and install Apache Kafka on the same Linux server as the KMCW code
  3. Download the KMCW Python code and scripts from GitHub: https://github.com/kjosmond/kafka_mstr_cubewriter
  4. Configure the KMCW environment:
    1. Copy file <home_dir>/kafka_mstr_cubewriter/bin/profile to <home_dir>/.profile
    2. Edit .profile and put in the Strategy username and password, and the location of the Kafka server installation
    3. Set the environment variables by running:
      . .profile
  5. From the kafka_mstr_cubewriter directory, install all required Python modules by running:
    pip3 install -r requirements/requirements.txt
  6. Configure the KMCW properties file as follows:
    1. Edit conf/dev/kmcw_example_app.json as shown in the following JSON fragment, replacing the values in <> with the desired values. Note that you do not need to put mstr_user and mstr_pass in here, because they are set by the .profile file as environment variables, which KMCW uses instead.


{
    "logging": {
        "debug_level": 2
    },
    "Strategy": {
        "connect": {
            "base_url": "https://<LIBRARY SERVER>/StrategyLibrary/api",
            "project_name": "<PROJECT NAME>",
            "folder_path": "/<PROJECT NAME>/Public Objects/Reports/<FOLDER NAME>"
        }
    }
}

  7. Use the mstr_ko.sh utility to start up the Kafka server and Kafka Connect, and add Kafka topics as follows:
    1. Run bin/mstr_ko.sh
    2. Enter the option: start_kafka
    3. Enter the option: start_kafka_connect
    4. Enter the option: new_topic
    5. Enter the value when prompted: mstr_kafka_example_app
    6. Enter the option: new_topic
    7. Enter the value when prompted: mstr_kafka_example_app2
  8. From the kafka_mstr_cubewriter directory, run KMCW with the command:
    bin/kafka_mstr_cubewriter.sh
    If the configuration settings are correct, the following will happen:
    1. KMCW connects to Strategy and creates a new cube
    2. KMCW connects to the Kafka brokers and reads the topics
  9. Use mstr_ko.sh to produce 1000 test messages as follows:
    1. Run bin/mstr_ko.sh
    2. Enter the option: run_producer_py
    3. Enter 1000 when prompted
    4. The producer will create 1000 test messages
    5. Data flowing from the topic will be written into the cube
  10. Create a dossier in Strategy:
    1. Add the cube as a dataset
    2. Create a visualization and add the elements from the cube
    3. Configure the dossier to refresh automatically every 15 seconds
    4. Save the dossier
    5. Run in presentation mode
  11. Use mstr_ko.sh to produce more test messages
  12. Observe that the dossier shows updated data every 15 seconds after the messages are generated, until all messages are loaded into the cube
  13. Stop the cube writer by entering the command:
    bin/kmcw.sh command=stop_all
  14. You can also try the example application that uses 3 cohorts, i.e. 3 cube writers updating the same cube simultaneously. To do this:
    1. Update the files:
      1. conf/dev/kmcw_cohort_example_app1.json
      2. conf/dev/kmcw_cohort_example_app2.json
      3. conf/dev/kmcw_cohort_example_app3.json
    2. Make the same changes that are shown in step 6
    3. Edit the .profile file and change the application name to kmcw_cohort_example_app (when KMCW starts, it automatically looks for properties files with that name and a suffix such as 1, 2, 3…)
    4. Run KMCW with the command:
      bin/kafka_mstr_cubewriter.sh
    5. Use mstr_ko.sh to produce 1000 test messages
    6. Create a dossier in Strategy to monitor the new 3-cohort cube
    7. Use mstr_ko.sh to produce more test messages
    8. Observe that the dossier shows updated data periodically
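The cohort example depends on concurrent cube writers serializing their writes to a shared table. The sketch below illustrates that coordination pattern with a Python threading semaphore standing in for KMCW's actual mechanism; the names and the in-memory "table" are hypothetical:

```python
import threading

write_lock = threading.Semaphore(1)   # one writer touches the cube at a time
table_rows = []                       # stands in for the shared cube table

def cohort_writer(cohort_id, batches):
    """Each cohort pushes its batches; the semaphore serializes the writes."""
    for batch in batches:
        with write_lock:              # acquire before writing to the table
            table_rows.extend((cohort_id, row) for row in batch)

# Three cohorts, as in the example app, each with one 100-row batch.
threads = [
    threading.Thread(target=cohort_writer,
                     args=(i, [[f"msg{i}-{n}" for n in range(100)]]))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(table_rows))                # all 300 rows arrive exactly once
```

Because each writer holds the semaphore only for the duration of one batch write, cohorts can poll their topics in parallel while never interleaving partial writes into the shared table.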

 



Details

Knowledge Article

Published:

April 13, 2022

Last Updated:

March 21, 2024