I. Introduction
Clustering is a popular strategy in which multiple machines are configured to provide common services in a unified manner. Clustering provides benefits such as higher availability through failover; better performance through load distribution; and greater scalability through an infrastructure for adding new resources into the system as system demands grow.
Strategy Intelligence Server provides out-of-the-box clustering capabilities. The purpose of this document is to guide users in the process of setting up a Strategy Intelligence Server cluster.
II. Setting Up a Cluster of Intelligence Servers
Connecting multiple server machines together - each running Strategy Intelligence Server - leverages the load balancing and resource sharing designed into the Strategy Platform. This document outlines the procedure and considerations for creating a clustered Intelligence Server environment.
- Hardware Choices
Load balancing and configuration are much simpler if identical hardware is used for each of the nodes. Identical hardware is not required, but it greatly simplifies the effort required to tune the system and will therefore be the assumed situation for the purpose of this document. In addition to the hardware for the Intelligence Server, the Relational Database Management System (RDBMS) containing the metadata and warehouse instances should already be set up on machines separate from the Intelligence Server nodes. Data Source Names (DSNs) pointing to the databases, both metadata and warehouse, must be created on all nodes with identical names for the DSNs.
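The identical-DSN requirement can be checked mechanically. The sketch below uses a hypothetical per-node inventory and hypothetical DSN names to flag any node missing a required DSN; in practice the DSNs themselves are created in each machine's ODBC administrator.

```python
# Hypothetical metadata and warehouse DSN names; substitute your own.
REQUIRED_DSNS = {"MD_DSN", "WH_DSN"}

def missing_dsns(node_dsns):
    """Return, per node, any required DSN names that node does not define."""
    return {node: REQUIRED_DSNS - dsns
            for node, dsns in node_dsns.items()
            if REQUIRED_DSNS - dsns}

# Hypothetical inventory gathered from each node's ODBC configuration.
inventory = {
    "node1": {"MD_DSN", "WH_DSN"},
    "node2": {"MD_DSN"},            # warehouse DSN missing on this node
}
print(missing_dsns(inventory))      # {'node2': {'WH_DSN'}}
```

An empty result means every node defines all required DSNs under identical names.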
Exactly one metadata repository is required for a clustered configuration. Each node of the cluster must be set up to point to the same metadata repository, and the cluster configuration information will be stored within the metadata. The metadata may be created from any of the nodes and only needs to be created once: use the Strategy Configuration Wizard to create a new metadata, or use an existing one.
- Strategy Configuration Wizard
The Strategy Configuration Wizard makes changes in two important locations: the registry and the selected metadata. The registry contains the information needed to connect to the metadata and determine what server definition to use. Within the registry, Intelligence Server has its connection information stored as a data source called 'CastorServer', a fixed name that cannot be changed. Also within this registry key is the Server Instance, which defines the server definition to be used once the metadata is accessed. The Strategy Configuration Wizard must be run on each Intelligence Server node in the cluster.
- Server Configuration
- Server definitions
Server definitions hold configuration information accessible from Strategy Developer through menu items 'Configure Strategy Intelligence Server' and 'Project Configuration'. All nodes should share a single definition.
Cluster nodes are required to share a server definition. However, if there is dissimilar hardware, or a conscious decision to differentiate the behavior of different servers, the cluster nodes can be configured asymmetrically. For example, if machines with different amounts of Random Access Memory (RAM) are clustered together, it may be wise to load lightweight projects on the less powerful nodes and heavy projects on the more powerful nodes.
Multiple server definitions may be created within the metadata. Going through the Strategy Configuration Wizard sets the desired server definition to be included in the registry and to be used by the machine on which the wizard was run. Re-running the Strategy Configuration Wizard allows the changing of selected server definitions, or the creation of new definitions.
- Project State
One requirement of a clustered configuration is that all projects be in the same state on each node, either loaded or unloaded. Effectively, this means that any projects to be used by the cluster must be set to load at start-up. If multiple server definitions are used, this setting will need to be modified on every server definition. Strategy also provides the capability to create an asymmetrical cluster that allows different servers to selectively load different combinations of projects.
- Service user identity
When Intelligence Server is installed, the last step is to choose a user identity under which the service will run. In order to run in a clustered configuration this user must be a domain account that has a trust relationship with each of the computers in the cluster. This is needed because resources will be shared across the network. More specific access privileges are discussed in the cache set-up section.
- Cache and History List Set-up
Report caches and history list messages are stored both in memory and on disk. Two main methodologies may be employed for the location of the cache files and history list files on disk. All report caches and history lists may be located on a single network shared drive apart from the Intelligence Server nodes, or each Intelligence Server machine may host its own local cache that is accessible to all nodes in the cluster. Different methodologies may be used for caches and history lists. However, for ease of administration, it is recommended that the same methodology be used both for report caches and history lists.
Regardless of the location chosen for cache file storage, the user account under which the Intelligence Server Service is running must have full control of the relevant folders. Otherwise, Intelligence Server will have problems with the creation and access of cache and history list files.
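A quick way to verify that requirement is to try creating and removing a file in each configured folder while running as the service account. This is a minimal sketch; the folder paths checked are whatever you configured for caches and history lists.

```python
import os
import tempfile

def can_write(folder):
    """Return True if a file can be created and deleted in `folder`."""
    try:
        fd, path = tempfile.mkstemp(dir=folder)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        return False
```

Run this under the Intelligence Server service identity on each node; a False result for a cache or history list folder points to a permissions problem.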
Using a single file server for all caches and history lists provides all of the benefits of any good file server. There can be hardware redundancy, backups and fault tolerance. In addition to the fault tolerance provided by the file share, there is the added benefit that if a cluster node goes down unexpectedly, the cache resources created by that node on the file server are still available to the other nodes.
The use of multiple local cache and history list locations provides the benefit of faster cache access for the local machine. Any report cache available locally will be used before retrieving from another machine. The cache and history list creation process is also faster as it is done to the local machine. This configuration is also less expensive since a dedicated file server is not required. This is the default configuration.
- Multiple local cache and history list locations
To set up each node to use its own local caches, the cache location should be .\Caches\ServerDefinition, where ServerDefinition is a name that references the folder containing the caches; by default, this is the name of the server definition. Once this folder is created, it must be located in the file system and shared with the exact share name 'ClusterCaches'. This is the share name Strategy Intelligence Server will look for on other nodes to retrieve caches.
Setting up history lists to use multiple local disk backups on each node involves a similar process. The history list location should be .\Inbox\ServerDefinition, where ServerDefinition is a name that references the folder containing the history lists. This folder must be shared with the exact share name 'ClusterInBox', since this is the share name used by Strategy Intelligence Server to look for history lists on other nodes.
In Strategy, sharing of Intelligent Cube data files is configured similarly. The file location in project configuration is .\Cube\ServerDefinition, and this folder should be shared as 'ClusterCube'.
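Because the share names are fixed, the UNC paths a node probes on its peers are fully determined by the peer machine names. The sketch below (node names are hypothetical) simply spells out that layout:

```python
# Fixed share names described above.
SHARES = {"caches": "ClusterCaches", "history": "ClusterInBox", "cubes": "ClusterCube"}

def peer_share_paths(nodes, local_node):
    """UNC paths a node would use to reach its peers' shared cluster folders."""
    return {peer: {kind: rf"\\{peer}\{share}" for kind, share in SHARES.items()}
            for peer in nodes if peer != local_node}

paths = peer_share_paths(["nodeA", "nodeB"], "nodeA")
# paths["nodeB"]["caches"] is r"\\nodeB\ClusterCaches"
```

If any of these shares is missing or misnamed on a node, its peers cannot retrieve that node's caches, history lists, or Intelligent Cube files.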
As long as each node is using the same server definition, the setting of the cache and history list locations in the Intelligence Server configuration only needs to be done once; however, the creation of the shared folders will need to be done on each node.
For steps on how to set up cache and inbox messages locally on each node in a clustered configuration for Strategy Intelligence Server Universal, refer to the following Strategy Knowledge Base technical note:
KB441125: How to configure Cache, Cube, Inbox, and Session Recovery file sharing for a MicroStrategy Intelligence Server cluster
- Single Shared network location
To set up caches on a network file server, create a directory to hold the caches on the file server and give it a shared name as desired. In the project configuration, set the cache location to be \\MachineName\ShareName. This only needs to be done once per server definition.
Similarly, to create history lists on a network file server, create a directory to hold the history lists on the file server, and give this directory a shared name. In the Intelligence Server configuration, set the history list location to be \\MachineName\ShareName. This also needs to be done only once per server definition.
Note: In UNIX environments, shared folders may be specified with an absolute path, but this absolute path must begin with double backslashes to be considered a shared file location. \\machinename\mountname is a valid shared location, whereas /machinename/mountname will be accessed as a relative path for locally-stored files.
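The note above reduces to a single rule: only a location beginning with double backslashes is treated as shared. A minimal sketch of that rule:

```python
def is_shared_location(path):
    """A location is shared only when it begins with double backslashes."""
    return path.startswith("\\\\")

assert is_shared_location(r"\\machinename\mountname")
assert not is_shared_location("/machinename/mountname")      # accessed as a local relative path
assert not is_shared_location(r".\Caches\ServerDefinition")  # local cache folder
```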
Notice that report cache settings are done per project and different projects may use different methods of report cache storage. Different projects may also use different locations for their cache repositories. However, history list settings are done per server definition. As a result, different projects cannot use different locations for their history list backups.
- Updating caches
The process of updating caches takes place in two general ways. Caches may be expired so that subsequent running of the reports creates new caches, or reports may be scheduled to refresh the cache. Scheduled reports generate their own Structured Query Language (SQL) and create new cache files. There are two types of scheduled reports, each with differing behavior in a cluster.
- Time Based Schedules
In newer clustered Strategy Intelligence Server environments, subscriptions created using a time-based schedule are not restricted to the primary node; they are load balanced across all nodes of the cluster.
An administrative task created using a time-based schedule only runs on the primary node. However, for tasks that are relevant to all nodes of a cluster, such as cache deletion, users can schedule the administrative task to be performed on every node of the cluster. Refer to the following Strategy Knowledge Base document for more information:
KB30866 : New feature in Strategy Intelligence Server 9.0: Schedule administrative tasks to be performed on all nodes of the cluster
- Event Based Schedules
Event based schedules run reports and tasks when the event associated with them is triggered.
In newer clustered Strategy Intelligence Server environments, subscriptions created using an event-based schedule are not restricted to any particular node; they are load balanced across all nodes of the cluster.
An administrative task created using an event-based schedule will still only run on the node where the event is triggered. However, as with time-based schedules, tasks that are relevant to all nodes of a cluster, such as cache deletion, can be scheduled to run on all nodes of the cluster or on the local node. See KB30866 for more details on how to do this.
- Expiring and Invalidating Caches
There are a number of ways in which a cache can be invalidated, deleted or expired. The various methods are:
- Through a scheduled administrative task (Invalidation and Deletion only)
- By executing the appropriate command against the Intelligence Server from Command Manager
- Using the M8CAHUTL.exe utility from Windows
A description of how to create an administrative task to invalidate/delete caches is available in the following Strategy Knowledge Base technical note:
KB13711 - How to invalidate/delete caches and History List messages in Strategy Intelligence Server using a scheduled administration task
When an administrative task is scheduled with a time-based schedule, it will only execute on the primary node. To make the task apply to all nodes of the cluster, it is recommended to make the task event-triggered and to kick off the trigger against each node of the cluster.
Alternatively, use M8CAHUTL.exe. Although this is a Windows-only utility, it can be run against Intelligence Servers running on Unix/Linux platforms as well. Since the utility operates against only a single node at a time, it must be run against each Intelligence Server node in the cluster individually.
III. Setting Up a Web connection to a cluster
- Selecting an Intelligence Server
Connections to Intelligence Servers are made from the Web administration page. If the Intelligence Servers are on the same subnet and accessible by User Datagram Protocol (UDP), the administration page can dynamically list the servers by looking for the listener service running on the machines. Web may be connected to a server either by selecting it from this list, if the server is listed, or by manually typing the name of the machine running Intelligence Server. In either case, if the node selected is part of a cluster, the entire cluster will appear and will be labeled in the Web Administrator interface as a single cluster.
When Web is connected to a cluster, all nodes reference the same project, and load balancing directs new Web connections to the least loaded node as measured by user connections. Once connected to a node, the user will run all activity on that node. Since the projects have the same name, it may be difficult for the Web end user to tell which server is being used, except by looking at the Uniform Resource Locator (URL). If nodes are manually removed from the cluster, the projects will be treated as separate in Web, and the node connected to will depend on which project is selected. In this case of a broken cluster with multiple projects, the projects may look different, but they are still accessing the same metadata.
Note: The same care should be used as when making any operations in 2-tier mode while a simultaneous 3-tier mode connection is employed; in this situation, there are two independent connections being made to the metadata. Strategy employs schema locking to prevent concurrent modifications that could lead to metadata inconsistency.
- Working around a firewall
The details of connecting to Strategy Intelligence Server from Strategy Web through a firewall are the same regardless of the cluster state. The only difference is that the allowed ports, sources, and destinations must be available between Web and each of the nodes of the cluster.
IV. How clustering works
- Running reports in 3-tier mode
The node connected to through the 3-tier mode client application, usually Strategy Developer, is the node that performs all execution and the node that may be monitored by an administrator. The query flow in a clustered environment is identical to the one in a non-clustered environment, with the following two exceptions.
First, the report server keeps a record of the caches available on other nodes in the cluster; if a cache is not available locally, it will be retrieved from another node. If no cache exists, a local cache will be created following report execution as usual. Second, a user's history list, which is held in memory by each node, contains direct references to the relevant cache files. Accessing a report through the history list bypasses many of the report execution steps for greater efficiency.
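The cache lookup order can be sketched as follows. The node names, cache keys, and dictionary-based caches are hypothetical simplifications of the server's actual cache files.

```python
def fetch_report(key, local_cache, peer_caches, execute):
    """Local cache first, then peers' caches, else execute and cache locally."""
    if key in local_cache:
        return local_cache[key], "local cache"
    for node, cache in peer_caches.items():
        if key in cache:
            return cache[key], f"cache on {node}"
    result = execute(key)        # run the report against the warehouse
    local_cache[key] = result    # a local cache is created as usual
    return result, "executed"
```

For example, a report cached only on a peer node is served from that peer, while an uncached report is executed against the database and then cached locally.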
- Running reports in 4-tier mode
The query flow on the Strategy Intelligence Server side is the same as in 3-tier mode, but components are added to communicate with browsers via Hyper Text Transfer Protocol (HTTP) over Transmission Control Protocol/Internet Protocol (TCP/IP) connections. Strategy Web interacts with Strategy Intelligence Server as a cluster-aware client that communicates using the eXtensible Markup Language (XML) Application Programming Interface (API). In addition, on the Strategy Intelligence Server side, XML cache files are created on the server that improve performance when working with Web clients.
- History lists
The history list is a set of pointers to cache files. Each user has their own history list, and each node stores the pointers created for the user while connected to that node. In Strategy Intelligence Server 8.x and later, each node's history list is synchronized with the rest of the cluster.
Strategy Web also uses the history list from Intelligence Server. Since Web distributes connections between Intelligence Servers, it is important that history lists be synchronized.
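Conceptually, the synchronized history list is the union of per-node pointer lists, with each pointer recording which node created it. A simplified sketch, with hypothetical field names:

```python
def merged_history(per_node_pointers):
    """Combine each node's locally created pointers into one user-visible list."""
    merged = []
    for node, pointers in per_node_pointers.items():
        merged.extend({"node": node, **p} for p in pointers)
    return sorted(merged, key=lambda p: p["created"])

# Hypothetical pointers created for one user on two different nodes.
history = merged_history({
    "node1": [{"report": "Sales", "created": 2}],
    "node2": [{"report": "Inventory", "created": 1}],
})
# The user sees one combined, time-ordered list regardless of the node connected to.
```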
- Monitoring the cluster
- Scope of Listings
Logging into Desktop and opening any of the various monitors only displays information relevant to the local node. The Job Monitor displays locally executing jobs, the Cache Monitor displays caches created by the local machine, and the Connection Monitor displays user connections to the local machine. Even though only local caches are displayed, caches from other nodes may be retrieved as needed.
- Job monitor
The Job Monitor records locally running jobs. The JobID numbers are not unique to the cluster but are unique only to the node. Logged statistics will hold information to differentiate the jobs running on different cluster nodes.
- Cache Monitor
- Complete vs. Incomplete information
The detail view of the cache monitor includes the following categories.
- Report Name
- Project Name
- Status
- Last Update
- Cache Size (KB)
- Expiration
- Type
- Cache ID
Showing complete information using the right-click menu adds the following.
- Hit Count (Only includes hits from local node)
- Creation Time
- Last Hit Time (Only includes hits from local node)
- File Name
- Waiting List
- DB connection monitor
The Database (DB) Connection Monitor, like the other monitors, only shows the connections from the local node. The status and number of database connections may be monitored. The number of threads may be configured from the prioritization screen and may be dynamically changed on the fly. Changes made will not be reflected on other nodes even if the servers are using the same definition. The number of threads to connect to the Warehouse is determined at start time and is updated whenever the value in the metadata is changed from the local node.
- Cluster Monitor
The Cluster Monitor is the only way of determining the communication status between nodes in a cluster. This monitor shows the health of other nodes as determined by the local node. Workload for the purpose of load balancing has been implemented by counting the number of sessions on a server. As the number of sessions increases in a clustered environment, it is assumed that variation between users will be balanced, and a statistical 'average' user session will be the result on each cluster node; hence the use of the user session count as a measure of node workload.
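Under that measure, load balancing reduces to picking the node with the fewest user sessions. A minimal sketch (node names and session counts are hypothetical):

```python
def pick_node(sessions):
    """Return the node with the fewest user sessions."""
    return min(sessions, key=sessions.get)

# New connections go to the least loaded node by session count.
assert pick_node({"node1": 12, "node2": 7, "node3": 9}) == "node2"
```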
The detail view of the Cluster Monitor shows the following categories.
- Machine Name
- Workload
- Port
- Status
The primary values of status are active and stopped. Nodes leaving the cluster will be removed from the list altogether.
- Shutting down a node
- Ways of shutting down
- Administrative shutdown
- Forceful shutdown or Node failure
A node may be taken down forcefully or administratively. A forceful shutdown may come from a power failure or a software error. Administratively, the node may be removed from the cluster or the Strategy Intelligence Server service may be stopped.
- Results of a shutdown
- Resource availability
If a node is rendered unavailable due to a forceful shutdown, its cache resources are still valid to other members of the cluster and will be accessed if they are available. If they are not available, new caches will be created on other nodes. In an administrative shutdown, the caches associated with the node will no longer be valid for other nodes, even if they are physically available, such as on a file server.
- Client connection status
- Developer
3-tier mode client connections that are not cluster-aware, such as Strategy Desktop, will not experience any change if the node is removed from the cluster. The local node will have to regenerate its own caches rather than accessing the resources of other nodes, however. If Strategy Intelligence Server is shut down for any reason, any Desktop clients connected to that Intelligence Server will receive an error message notifying them of the lost connection to Intelligence Server, regardless of whether or not that Intelligence Server was in a cluster.
- Web
If a cluster node shuts down while there are web users connected, those jobs will return an error message by default. The error message will offer the option to resubmit the job, in which case Web will automatically reconnect the user to another node. Customizations can automate Strategy Web to seamlessly resubmit these jobs without notifying the user. If the node is removed from the cluster, all existing connections will continue to function connected to that node. Future connections from Strategy Web will be to valid cluster node members. Of course, if a node has been removed from a cluster, it will not have access to other nodes' resources.
- Status after reboot
- Job execution status
If a node goes down for any reason, all jobs on that node are terminated. Restarting the node will provide an empty list of jobs in the job queue.
- Cluster membership
If a node is forcefully shut down, it will automatically re-join the cluster when it comes back up. If it was removed administratively, either by manual shutdown or cluster removal, it will have to be manually re-added to the cluster. Nodes that are still in the cluster but not available will be listed in the Cluster Monitor with a status of 'Stopped'. Also, in Intelligence Server Configuration -> Clustering, the user can set which machines will join the cluster on startup.
- Other
Forcefully shut down nodes retain their valid caches if the cache files are available. While the node is down, however, there is no way of monitoring the caches, changing their status, or invalidating them. They may be removed by manually deleting the cache files on the local node, or the appropriate files on a shared network cache location. However, this process is difficult, as the naming convention of cache files uses Object IDs and is not legible to the casual user.
V. Troubleshooting Questions
- Connection Problems
Which node am I connected to?
In Desktop, the project source definition includes the server name. Desktop connects only to a specific node, and therefore the server connected to is completely controllable.
Web dynamically directs new connections to the least loaded node. Although a server name may appear as part of the URL, it is not guaranteed to be the node to which the user is connected. The only reliable way of determining which node a Web user is connected to is by looking in the User Connection Monitor of each node and seeing where the user appears.
Do I need multiple Project sources?
Since Desktop connects to a specific node in a cluster, in order to fully access and monitor the cluster, project sources must be created to each node.
Does it matter which node I connect to in desktop?
Yes. The node you connect to will be the one you are able to monitor and the one any execution will take place on.
Can I control which node I connect to in web?
The node connected to is part of the URL and can therefore be observed, but it cannot be chosen: extending or customizing the out-of-the-box load balancing logic is not possible.
- Caches
Did my report hit cache?
If the cache is available on the local node, the Cache Monitor will increment the hit count. If the cache is retrieved from another node then the only indication would be the speed of the response. Statistics tables can provide additional data on cache hits.
What cache did my report hit?
Look in the statistics logs. Cache hit statistics are logged to the IS_CACHE_HIT_STATS table. However, finding the answer to this question for a particular job may be difficult.
Can there be multiple copies of the same cache?
Yes. Event-based schedules create caches on whichever node the event is fired. Also, if a job is initiated on one node and has not yet finished and created a cache, any instance of the same report initiated on another node will also run against the database and create a cache on that node.
How do I automate the deletion of caches on multiple nodes?
The utility M8CAHUTL.exe is a command line utility for automated cache deletion. It can either delete all caches or specific caches based on report ID.
Can I make copies of a cache file to multiple nodes?
If a node is stopped, copying and pasting the cache files and index files from one node's file system to another will cause those caches to be seen by the new node at startup. However, this is not a supported procedure and may have unexpected results.
- Synchronization Problems
When do I need to purge the Object Cache?
When an object is edited on one cluster node, the updated version ID of the object is announced to the other nodes. This allows the other nodes to invalidate the object they may have in memory and retrieve a fresh copy from the metadata when needed again in the future. In this case, there is no need to purge the Object Cache.
If any changes to an object are made in 2-tier mode, those changes will not be propagated to any Strategy Intelligence Servers connected to the metadata. Also, if an object is modified from an Intelligence Server not in the cluster but using the same metadata, then the cluster nodes will not know of the object change. In these cases, the Object Cache would need to be purged. If in doubt, it is often best to purge the Object Cache.
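The version-announcement behavior described above can be sketched as follows. The structures and names are hypothetical; the point is that a node drops its stale copy when a newer version ID is announced and refetches from the metadata on the next access, whereas a 2-tier change produces no announcement at all.

```python
class ObjectCache:
    """Per-node object cache invalidated by announced version IDs (sketch)."""

    def __init__(self):
        self.cache = {}  # object_id -> (version_id, object)

    def on_version_announced(self, object_id, version_id):
        """Another node announced a new version: drop any stale local copy."""
        cached = self.cache.get(object_id)
        if cached and cached[0] != version_id:
            del self.cache[object_id]

    def get(self, object_id, fetch_from_metadata):
        """Serve from the cache, refetching from the metadata when invalidated."""
        if object_id not in self.cache:
            self.cache[object_id] = fetch_from_metadata(object_id)
        return self.cache[object_id][1]
```

A 2-tier edit corresponds to the metadata changing without `on_version_announced` ever being called, which is exactly why the Object Cache must then be purged manually.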
When do I need to purge the Element Cache?
Purging the element cache may be done with the M8CAHUTL.exe utility as well as from the administration interface. The Element Cache should be purged as a routine part of a warehouse load, or any time the elements associated with an attribute may have changed.
Which machine is my history list on?
A user's history list is no longer stored only on the machine to which the user is connected. Rather, the combined history list held in memory is the sum of all local files and is automatically synchronized. Therefore, users cannot tell which pointers are physically located on which machine.
What order should I start the server nodes?
The only reason to be concerned with the order of starting the nodes is to control which node becomes primary. The primary node will be the first server in the cluster to complete its startup sequence. If any node may legitimately be primary, then the servers may be started in any order. The primary node cannot be chosen in the Strategy server administration interface.
Note: In an asymmetric cluster, some projects may not be loaded on the primary node. For these projects, a different node will be primary. Time-based schedules for such projects will run on their own primary nodes, not the overall cluster primary node. Each project's primary node can be discovered in the Cluster Monitor by right-clicking and choosing "View by project."
How should I connect to edit schema objects?
If the cluster must be running during the changes, schema objects should be edited and updated on one of the nodes in the cluster. It is better, however, to take the cluster down and make the schema changes in 2-tier mode; this ensures that all nodes of the cluster have current information when they are brought back up.
KB6022