Monitoring WebSphere DataPower SOA Appliances:
Introduction
The IBM® WebSphere® DataPower® SOA Appliance (hereafter called DataPower) is a purpose-built hardware platform designed to simplify, secure, and accelerate XML, Web services, and Enterprise Service Bus deployments.
As with other network appliances, monitoring the health and capacity of DataPower appliances will ensure that they are ready to perform the functions for which they are configured. Monitoring not only notifies administrators of exceptions, it also provides trending analysis for managing the appliances and their capacity utilization over time, thus enabling the organization to maximize its return-on-investment and receive warnings of increases in network volumes and potential capacity issues.
This article describes various DataPower status inquiry methods and presents strategies and best practices for interpreting them. This article is based on DataPower Firmware Revision 3.8.0. Monitoring status providers may change with enhancements to the firmware, so you should check current firmware documentation for any additions to monitoring components.
Why monitor?
The DataPower Appliance family consists of 1U rack-mountable network devices. The latest generation devices (9235/9004 class) contain four gigabit RJ-45 Ethernet interfaces, a DB-9 Serial port, hot swappable power supplies and fan-trays, batteries, eight gigabytes of RAM, compact flash-based file system, and other components within a tamper-proof case. Optional features include internal hard drives, hardened cryptographic modules, and additional compact flash bays.
Each of these components helps ensure that the device is properly configured for the amount of network data it receives. Knowing that the devices are functioning properly ensures that they are available and ready to process this traffic. For example, if you are alerted to variations in the performance of the device’s fans, you may avoid having to take the device offline for unanticipated service. Understanding the level of network traffic and being aware of incremental changes may avoid bottlenecks as traffic increases over time.
Monitoring fundamentals on DataPower
DataPower provides a variety of information regarding general system health as well as consumption of resources and services. Physical parameters include CPU temperatures, memory and file system utilization, interface utilization, and voltage readings, among others. In addition, there are more formulaic indicators, such as System Usage, which is a calculation of system capacity.
DataPower exposes these status values in a variety of ways. You can use the Web GUI or Command Line Interface (CLI) show commands to browse a list of status values. Or you can use the XML Management Interface (XMI) to send SOAP messages containing dp:get-status requests to the device, which responds with status information contained in SOAP responses. DataPower also supports the Simple Network Management Protocol (SNMP) and acts as an SNMP agent, providing status information in response to SNMP operations and in the creation of alerts via the SNMP notification mechanism.
Figure 1 shows the CPU usage as displayed within the Web GUI. It is obtained by navigating from Status Menu => System => CPU Usage. The data is displayed in a table incrementing from the latest 10 seconds through to the latest 24 hours.
Figure 1. Web GUI CPU usage display
The CLI show commands are used to display status information, and Listing 1 shows the show cpu command, which provides the same table of data shown in the Web GUI:
Listing 1. CLI Show CPU command
xi50# show cpu
                 10 sec  1 min  10 min  1 hour  1 day
cpu usage (%):        1      1       7       7      7
While the Web GUI and CLI are convenient tools to fetch status information interactively, the XMI can be programmatically integrated into more complex solutions. For example, a Java™ class could execute a dp:get-status request and perhaps perform configuration modification based on the response. The SOAP request in Listing 2 shows a dp:get-status request to fetch CPU usage status:
Listing 2. Sample get-status XMI request
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:request xmlns:dp="http://www.datapower.com/schemas/management">
      <dp:get-status class="CPUUsage"/>
    </dp:request>
  </env:Body>
</env:Envelope>
The response is returned in a SOAP payload, as shown in Listing 3 below. Again, the CPU status is returned within a subtree containing the same table of data returned by the Web GUI and CLI:
Listing 3. XMI dp:get-status response
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:response xmlns:dp="http://www.datapower.com/schemas/management">
      <dp:timestamp>2009-09-24T11:56:22-04:00</dp:timestamp>
      <dp:status>
        <CPUUsage xmlns:env="http://www.w3.org/2003/05/soap-envelope">
          <tenSeconds>1</tenSeconds>
          <oneMinute>1</oneMinute>
          <tenMinutes>1</tenMinutes>
          <oneHour>1</oneHour>
          <oneDay>1</oneDay>
        </CPUUsage>
      </dp:status>
    </dp:response>
  </env:Body>
</env:Envelope>
You can get a vast amount of status data using the dp:get-status request. For more information, including details of the schemas and WSDL used to customize dp:get-status and other XMI operations, see the IBM Redpaper WebSphere DataPower SOA Appliances: The XML Management Interface.
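As a minimal sketch of the programmatic integration described above, the following Python builds the dp:get-status request from Listing 2 and extracts the CPUUsage values from a response shaped like Listing 3. The function names (build_get_status, parse_status) and the SAMPLE_RESPONSE constant are illustrative and are not part of any DataPower library. Sending the request to the appliance is deliberately left out, because the endpoint, port, and credentials depend on how the XML Management Interface is configured on your device.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
DP_NS = "http://www.datapower.com/schemas/management"


def build_get_status(status_class: str) -> bytes:
    """Build a dp:get-status SOAP request for one status class."""
    ET.register_namespace("env", SOAP_NS)
    ET.register_namespace("dp", DP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{DP_NS}}}request")
    ET.SubElement(request, f"{{{DP_NS}}}get-status", {"class": status_class})
    return ET.tostring(envelope, encoding="utf-8")


def parse_status(response_xml: str, status_class: str) -> dict:
    """Return the fields of the first status_class row as a dict of strings."""
    root = ET.fromstring(response_xml)
    status = root.find(f".//{{{DP_NS}}}status")
    if status is None:
        raise ValueError("no dp:status element in response")
    # In Listing 3 the status row itself is not namespace-qualified.
    row = status.find(status_class)
    if row is None:
        raise ValueError(f"no {status_class} element in response")
    return {child.tag: child.text for child in row}


# Response body copied from Listing 3 (XML declaration omitted).
SAMPLE_RESPONSE = """
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:response xmlns:dp="http://www.datapower.com/schemas/management">
      <dp:timestamp>2009-09-24T11:56:22-04:00</dp:timestamp>
      <dp:status>
        <CPUUsage xmlns:env="http://www.w3.org/2003/05/soap-envelope">
          <tenSeconds>1</tenSeconds>
          <oneMinute>1</oneMinute>
          <tenMinutes>1</tenMinutes>
          <oneHour>1</oneHour>
          <oneDay>1</oneDay>
        </CPUUsage>
      </dp:status>
    </dp:response>
  </env:Body>
</env:Envelope>
"""

if __name__ == "__main__":
    print(build_get_status("CPUUsage").decode("utf-8"))
    print(parse_status(SAMPLE_RESPONSE, "CPUUsage"))
```

The values come back as strings, so a caller that wants to compare them with thresholds would convert them to integers first.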
Most organizations query the health and capacity of a network device using SNMP in conjunction with tools such as those in the IBM Tivoli Monitoring (ITM) and Tivoli Composite Application Manager (ITCAM) product families. These tools use SNMP over UDP to poll an SNMP agent for device and application metrics. The management software may also receive notification alerts from the agent in response to particular events happening on the device. The DataPower appliance may be configured to act as an SNMP agent, responding to inbound polling requests and sending alerts in response to preconfigured events.
SNMP status variables are organized in hierarchies, which are described by the Management Information Base (MIB) document. Each metric that can be polled is addressed by an Object Identifier (OID). Some metrics are scalar objects describing a single data point, such as the current firmware version on the appliance. Other metrics may be tabular, such as the CPU status provided in the previous examples. When a specific OID is known, a GET OID can be used by the SNMP manager to get the specific metric. If all metrics in a specific hierarchy are desired, a Get Subtree can be used to get all values within that hierarchy. The DataPower appliance provides three Enterprise MIB documents for configuration, status, and notification. It is the status MIB that we are interested in.
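The difference between a GET of one OID and a Get Subtree can be illustrated with a small in-memory model. This is only a sketch of the addressing idea, not an SNMP client: the OIDs below sit under a made-up enterprise number (99999) and the AGENT dictionary stands in for the appliance. Real OIDs come from the status MIB.

```python
from typing import Dict, List, Tuple, Union

Oid = Tuple[int, ...]
Value = Union[int, str]


def parse_oid(text: str) -> Oid:
    """Convert dotted-decimal text such as '1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def snmp_get(store: Dict[Oid, Value], oid: str) -> Value:
    """GET: return the single value stored at exactly this OID."""
    return store[parse_oid(oid)]


def snmp_get_subtree(store: Dict[Oid, Value], root: str) -> List[Tuple[str, Value]]:
    """Get Subtree: return every (OID, value) pair below root, in OID order."""
    prefix = parse_oid(root)
    return [
        (".".join(str(n) for n in oid), value)
        for oid, value in sorted(store.items(), key=lambda item: item[0])
        if len(oid) > len(prefix) and oid[: len(prefix)] == prefix
    ]


# Placeholder OIDs under a made-up enterprise number (99999).
BASE = "1.3.6.1.4.1.99999"
AGENT: Dict[Oid, Value] = {
    parse_oid(BASE + ".1.1.0"): "3.8.0",  # scalar: firmware version
    parse_oid(BASE + ".1.2.1.1"): 1,      # tabular: CPU usage, 10 seconds
    parse_oid(BASE + ".1.2.1.2"): 1,      # tabular: CPU usage, 1 minute
    parse_oid(BASE + ".1.2.1.3"): 7,      # tabular: CPU usage, 10 minutes
    parse_oid(BASE + ".1.20.0"): 42,      # unrelated sibling hierarchy
}

if __name__ == "__main__":
    print(snmp_get(AGENT, BASE + ".1.1.0"))
    for oid, value in snmp_get_subtree(AGENT, BASE + ".1.2"):
        print(oid, value)
```

Comparing OIDs as integer tuples rather than as strings keeps the `.1.20` hierarchy out of the `.1.2` subtree, which a plain string-prefix test would get wrong.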
While status inquiry is a straightforward endeavor, alerting is done using several DataPower objects. The device has four built-in notification alerts: authenticationFailure, linkDown, coldStart, and linkUp. Others are preconfigured, as described below. A properly configured SNMP monitor receives these traps in the event that the device restarts, its interfaces become enabled or disabled, or when a failed attempt to access the device occurs. In addition to the built-in alerts, custom alerts may be generated by subscribing to a list of error conditions or in conjunction with the logging system.
Reliance on alerts alone is not a sufficient monitoring strategy. For example, if the event causing the alert affects the device’s ability to send the message over the network, the notification may not be received at the SNMP monitor. Therefore it is prudent to combine subscription to alert messages with polling of status information, to provide a robust mechanism for communicating with monitoring tools.
How to monitor
Many status providers (or monitoring agents) are built into the DataPower firmware to fetch status data. Many providers are specific to the device. These providers (such as the environmental components, fans, temperatures, or battery health) are available within the default domain and are always enabled. Other status data (such as transaction rates for DataPower services) is segmented by application domain and may be further segmented by XML-Manager or DataPower service.
While the device-level data is automatically enabled, transaction data such as transaction rates or transaction times is usually available only when Statistics are enabled on the device. There are exceptions to this generalization -- for example, CPU status requires Statistics to be enabled, while System Load does not. Each domain must have its individual Statistics setting enabled to provide domain-specific status.
This section shows you how to enable monitoring of DataPower from SNMP tools, and how to produce SNMP alerts from within DataPower. You'll see how a Logging Target can be configured to produce alerts based on system events, and how to subscribe to events such as Out of Memory or Power Supply failure to generate alerts. An example of a Power Supply failure is used to demonstrate these principles.
SNMP settings must first be configured on the DataPower appliance. This configuration is accessible from the default domain and accessed from the left navigation menu of the DataPower Web GUI by first selecting the Administration menu and then selecting SNMP Settings under the Access heading.
This configuration consists of multiple tabs. The main tab must have the Admin State set to Enabled. Typically, the Local IP Address is set to a Host Alias defined in the default domain that maps to the Management Interface IP, which restricts SNMP polling requests to this IP and not any of the client traffic interfaces (eth0, eth1, or eth2). Figure 2 shows SNMP settings enabled on the default Local Port of 161. Outbound polling responses and traps will be sent out using any appliance interface that has the correct routing. To restrict this outbound traffic to the same IP, add a static route to the appliance's mgt0 configuration.
Figure 2. Enabling SNMP settings
The DataPower MIBs can be downloaded from the appliance to be used by any SNMP management tool. The MIBs enable these tools to translate named objects such as dpStatusMemoryStatusUsage to an OID used to request the metric. All appliance status OIDs are in the dpStatusMIB.txt MIB file. Figure 3 shows the Enterprise MIBs tab of the SNMP Settings screen, and the method for downloading the MIBs:
Figure 3. SNMP MIB download tab
The Trap Event Subscription tab contains a list of event codes that can be sent to the management software as an alert. Examples are the codes for "Internal cooling fan has stopped" or "Power supply failure." Figure 4 below shows some of the default preloaded subscriptions. To add additional events, click Select Code. If a specific code is not shown in the list, you can add it manually. For example, adding code 0x806000e2 adds certificate monitor events to indicate when a certificate is nearing expiration. You can get these event codes from their associated log records in the default log. You can also get the event code in the Message Reference document for your firmware release.
Figure 4. SNMP Trap Event Subscription
The SNMPV1/V2c Communities tab defines access policies for management software using SNMP V1 and V2. The community name is used as a credential to access the SNMP data on the appliance. A common community name for read-only access is public. A DataPower domain, either the default or an application domain, can be associated with the configured community.
If application data is to be polled, specify the application domain; otherwise use the default domain. Specifying an application domain does not prevent management software from polling device-level metrics such as device load, CPU utilization, memory metrics, and environmental statistics. Additionally, it allows polling of application metrics such as transaction rates and times, MQ queue manager status, message counters, or SLM metrics.
The mode of the community should be configured as read-only for access to appliance status metrics. Finally, a remote host access of 0.0.0.0/0 lets any SNMP manager access this community. It can be restricted to a range of IPs if desired. To configure additional communities, click Add. Figure 5 shows the specification of an SNMP V1/V2c community name of public for the read-only access of application domain status within the swlinn-poc domain.
Figure 5. SNMP Community Settings specifying an application domain
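To see what a remote host access value such as 0.0.0.0/0 permits compared with a narrower range, the check can be sketched with Python's standard ipaddress module. The function name host_allowed and the example addresses are illustrative only. The appliance performs its own matching, and this sketch just mirrors ordinary CIDR semantics.

```python
import ipaddress


def host_allowed(manager_ip: str, remote_host_access: str) -> bool:
    """Return True if manager_ip falls inside the CIDR range remote_host_access."""
    network = ipaddress.ip_network(remote_host_access, strict=False)
    return ipaddress.ip_address(manager_ip) in network


if __name__ == "__main__":
    # 0.0.0.0/0 matches every IPv4 address, so any SNMP manager is accepted.
    print(host_allowed("10.1.2.3", "0.0.0.0/0"))
    # A /24 range accepts only managers on that subnet (example addresses).
    print(host_allowed("10.1.2.3", "192.168.10.0/24"))
    print(host_allowed("192.168.10.77", "192.168.10.0/24"))
```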
The Trap and Notification Targets tab lets you specify the IP and port of the SNMP manager that will receive SNMP alerts and notifications. The default is UDP port 162. The community name and the SNMP version (1, 2c, or 3) must be specified. If Version 3 is used, a DataPower user name is provided in the Security Name field. This user will be configured with SNMP V3 credentials. The specific events that are alerted are configured on either the SNMP Trap Event Subscription tab or on the subscription configuration of an SNMP logging target. Events preconfigured by default on the SNMP Trap Event Subscription tab are critical device-specific events, such as memory exhaustion, or hardware issues with the power supplies, battery, or fans. To configure additional notification targets, click Add. Figure 6 shows the configuration of the recipient of SNMP alerts using SNMP Version 2c with the community name of public:
Figure 6. SNMP Trap and Notification Targets
Finally, the SNMPV3 Contexts tab gives SNMPV3 managers access to non-default application domains. To allow only SNMP polling, enabling the SNMP settings and providing an SNMPV1/V2c community is all that is required. Trap and notification targets and event subscriptions are required to send event alerts to an SNMP manager.
As previously mentioned, some status data such as fan speeds and CPU utilization is specific to the device. Other status data such as transaction rates are segmented by application domain and are accumulated only if the statistics setting configuration is enabled, as shown in Figure 7 below. Enabling statistics has a very small impact on system utilization. Adjusting the Load Interval (the frequency of SNMP polling) will further limit this impact.
Figure 7. Statistics enabled per domain
Here is an example of a poll of an appliance metric: An SNMP manager issues a SNMP GET command for the dpStatusMemoryStatusUsage metric, which returns a scalar value of the percentage of memory being utilized. Many SNMP managers, when configured with the DataPower MIBs, provide a tree hierarchy of the status MIB from which the appropriate metric can be selected, the metric polled, and the value displayed.
Application metrics can also be polled if the application domain is specified in the DataPower SNMP configuration. Depending on the application configuration, specific metrics can be polled to provide data on the health or throughput of the application. These application-related table entries differ from system-level metrics in that they are dynamic and are based on the key fields of these tables. For an example of a poll of an application metric, consider the dpStatusHTTPTransactions2Table table, which contains the transaction rates for all services in a domain over various time intervals. Metrics in this table are based upon the service class, such as XMLFirewallService, and the service name, such as Loopback_FW.
In addition to the event subscriptions that you can specify in the SNMP settings, you can also configure a DataPower logging target to produce SNMP logging events, which enables DataPower to send SNMP alerts for specific events of interest. Select Manage Logging Targets from the left navigation of the DataPower Web GUI from the Administration menu under the Miscellaneous heading. Click Add to create a new logging target, and specify Target Type to be SNMP. Figure 8 shows a log target with an SNMP Target Type:
Figure 8. SNMP logging target
The SNMP logging target can subscribe to and filter events just like any other DataPower logging target. The SNMP configuration's list of trap and notification event codes covers most critical events. An SNMP logging target in the default domain that subscribes to all events with a severity of critical or above produces a similar set of alerts. Logging target subscriptions in an application domain, however, are more application specific. For example, you can specify logs with an MQ or SSL log category at the error or above level. You can also specify log messages generated by custom stylesheets using custom log categories. Figure 9 shows the subscription of all critical events for this SNMP type log target:
Figure 9. Logging Target Subscriptions
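The "category at a given level or above" filtering described above can be sketched as follows. This is an illustration of the selection logic only. The event dictionary shape, the category strings, and the function name matches are hypothetical, and the severity ordering is assumed to follow the usual syslog-style ranking from debug up to emergency.

```python
# Assumed severity ranking, lowest to highest (syslog-style).
SEVERITIES = ["debug", "info", "notice", "warning", "error", "critical", "alert", "emergency"]


def matches(event: dict, subscriptions: list) -> bool:
    """Return True if the event satisfies any (category, minimum severity) pair.

    A category of "all" matches events from every category.
    """
    rank = SEVERITIES.index(event["severity"])
    for category, minimum in subscriptions:
        if category in ("all", event["category"]) and rank >= SEVERITIES.index(minimum):
            return True
    return False


# Default domain style: everything at critical or above.
DEVICE_SUBSCRIPTIONS = [("all", "critical")]
# Application domain style: MQ and SSL events at error or above.
APP_SUBSCRIPTIONS = [("mq", "error"), ("ssl", "error")]

if __name__ == "__main__":
    print(matches({"category": "environmental", "severity": "critical"}, DEVICE_SUBSCRIPTIONS))
    print(matches({"category": "mq", "severity": "warning"}, APP_SUBSCRIPTIONS))
```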
Now that the steps to configure and enable SNMP alerts have been described, here is a demonstration of a power supply alert. With the above configuration, the plug from one of two power supplies is pulled. Figure 10 shows log entries associated with a power supply failure:
Figure 10. System Log Entries
The SNMP configuration specified no restrictions on the SNMP Managers that could receive alerts from this appliance's public community. Any SNMP manager listening for alerts from this appliance on Port 162 will receive a trap for the power failure event.
This section has shown you how to configure DataPower to enable monitoring of both appliance and application metrics from SNMP tools, and how to produce SNMP alerts from within an appliance. A logging target was configured to produce alerts based on logging events. The SNMP settings were configured to produce alerts by subscribing to system events (such as "Out of memory" or "Power supply has failed") as well as an application event (an SSL certificate expiration warning). Enabling statistics for application-level metrics was also shown. A poll of the memory metrics was shown to demonstrate monitoring of device metrics, and a poll of the transaction rate table was shown to demonstrate monitoring of application-specific metrics. Finally, an example of a power supply failure was used to demonstrate SNMP alerting.
What to monitor
Monitoring accomplishes multiple goals. The general health of the device and of its various physical components can be ascertained by environmental status information such as temperatures, fan speeds, and the status of batteries and power supplies. System load can be gauged by a special status value known as System Usage, in addition to more familiar measurements such as CPU, memory, and file system utilization. The amount of data being processed by the device can be determined by analyzing network interface consumption. The following section discusses several informative status values. Each section shows how to determine the data from the Web GUI, the element from the XMI response, the CLI command to execute to show the status, and the object from the SNMP Enterprise MIB that contains the value.
General device health and activity monitors
General health and activity monitors ensure that the DataPower device is operating within predefined system parameters. You can analyze system capacity via system load and CPU utilization. You can evaluate uptime to ensure that the device has not experienced an unexpected restart. Fans and temperatures are checked to avoid overheating, which can take a device out of service. The following monitors are involved in these tasks:
System usage
Web GUI | System => System Usage | XMI | SystemUsage/Load |
---|---|---|---|
CLI | Show Load | Status MIB | dpStatusSystemUsageLoad |
System Usage is a measurement of the device’s ability to accept additional work. It is a formulaic calculation based on various components of system load. System Usage is typically considered the best single indicator of overall system capacity. While it may sometimes spike to 100%, typical values are less than 75%. The secondary work list value is a calculation of queued tasks, and is of lesser interest in typical monitoring situations.
Figure 11. System Usage Status
CPU Usage
Web GUI | System => CPU Usage | XMI | CPUUsage |
---|---|---|---|
CLI | Show cpu | Status MIB | dpStatusCPUUsage |
CPU Usage statistics are provided over five time intervals. Many customers are accustomed to monitoring CPU utilization, but this metric in DataPower is not as reliable as System Usage in determining device capacity. DataPower is self-optimizing, and spikes in CPU unassociated with traffic levels may occur as the device performs background activities. CPU usage may sometimes spike all the way up to 100%, but this level is not necessarily a concern unless it is sustained over numerous consecutive polls.
Figure 12. CPU Usage Status
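Because a single 100% reading is not a concern on its own, a monitor can wait for several consecutive high polls before raising an alert. The following is a minimal sketch of that idea. The function name sustained_above, the 90% threshold, and the choice of three consecutive polls are assumptions for illustration, not recommended values.

```python
from typing import Iterable


def sustained_above(samples: Iterable[float], threshold: float, min_consecutive: int) -> bool:
    """Return True if at least min_consecutive samples in a row exceed threshold."""
    run = 0
    for value in samples:
        # Extend the current run on a high reading, otherwise start over.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Isolated spikes to 100% do not trigger the alert.
    print(sustained_above([20, 100, 30, 100, 25], threshold=90, min_consecutive=3))
    # Three high polls in a row do.
    print(sustained_above([40, 95, 97, 100, 60], threshold=90, min_consecutive=3))
```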
Memory usage
Web GUI | System => Memory Usage | XMI | MemoryStatus |
---|---|---|---|
CLI | Show memory | Status MIB | dpStatusMemoryStatus |
Memory Usage statistics are provided for various classifications of the appliance’s flash memory. Statistics include a percentage of total memory utilized; bytes of total, used, and free memory; and of lesser interest in typical monitoring, request, XG4, and held memory. The percentage of used memory depends on the application, the size of request and response messages, and the volume and latency of requests. Typical utilization runs less than 80%, and statistics beyond this threshold are of concern. You can use the device’s Throttle Settings to temporarily slow down request processing or to perform a warm restart, which recaptures memory in this situation.
The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:
0x01a40001 Throttling connections due to low memory
0x01a30002 Restart due to low memory
0x01a50004 Memory usage recovered above threshold
Figure 13. Memory Usage Status
File system information
Web GUI | System => File system Information | XMI | FilesystemStatus |
---|---|---|---|
CLI | Show Filesystem | Status MIB | dpStatusFilesystemStatus |
File system statistics are provided for free and total space of the encrypted, temporary, and internal file systems. Monitor all free space metrics -- levels below 20% of the total space are a concern. You can use the device’s Throttle Settings to temporarily slow down request processing or to perform a warm restart, which recaptures file system space in situations of reduced free space.
The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:
0x01a40005 Throttling connections due to low temporary file space
0x01a30006 Restart due to low temporary file space
0x01a50007 Temporary file space recovered above threshold
Figure 14. File system Usage Status
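The 20% guideline compares free space with total space, so a monitor that polls both values has to compute the ratio itself. A minimal sketch of that arithmetic is shown below. The function names and the sample numbers are illustrative, and the free and total values are assumed to be reported in the same unit.

```python
def free_percent(free: float, total: float) -> float:
    """Return free space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total


def low_space(free: float, total: float, minimum_percent: float = 20.0) -> bool:
    """Return True when free space has dropped below minimum_percent of total."""
    return free_percent(free, total) < minimum_percent


if __name__ == "__main__":
    # Example: 150 units free out of 1000 is 15%, which is below the guideline.
    print(free_percent(150, 1000), low_space(150, 1000))
    print(free_percent(600, 1000), low_space(600, 1000))
```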
System up time
Web GUI | Main => Date and Time | XMI | DateTimeStatus/uptime |
---|---|---|---|
CLI | Show time | Status MIB | dpStatusDateTimeStatusuptime |
System up time indicates the elapsed time since the device was last restarted, including controlled firmware reloads as well as any unexpected device restarts. The DataPower device restarts itself automatically in conjunction with throttle configurations such as memory or file system constraints. While you can use SNMP notification for alerting, monitoring uptime via polling ensures that any notification delivery failure will not obscure these events.
Figure 15. Date and time status
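Polling uptime catches restarts even when the corresponding trap never arrives: if a later poll reports a smaller uptime than an earlier one, the device restarted in between. The sketch below assumes the monitor has already converted each polled uptime to a number of seconds. The function name find_restarts and the sample values are illustrative.

```python
from typing import List, Sequence


def find_restarts(uptimes: Sequence[int]) -> List[int]:
    """Return the indexes of polls whose uptime is lower than the previous poll."""
    return [i for i in range(1, len(uptimes)) if uptimes[i] < uptimes[i - 1]]


if __name__ == "__main__":
    # Polls every 300 seconds; the fourth poll shows the counter has reset.
    polled = [3600, 3900, 4200, 120, 420]
    print(find_restarts(polled))
```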
Temperature sensors
Web GUI | System => Temperature Sensors | XMI | TemperatureSensors/{various name values} |
---|---|---|---|
CLI | Show Sensors-Temperature | Status MIB | dpStatusTemperatureSensorsTable |
Various temperature readings are available for CPUs, Memory, and System. Each has a warning and danger temperature associated with it and a status value of OK or FAIL. Monitoring the status ensures that the device is operating within the specified range. Investigate temperatures outside the ranges by checking fan speeds and airflow around the device, and if necessary by contacting DataPower Support.
Figure 16. Temperature sensors status
Fan sensors
Web GUI | System => Fan Sensors | XMI | EnvironmentalFanSensors/{various fan-id values} |
---|---|---|---|
CLI | Show Sensors-Fan | Status MIB | dpStatusEnvironmentalFanSensorsTable |
Proper functioning of the device’s fans is vital for proper operation. There are two hot swappable fan trays. If the device contains the optional hard disk drives, it will have two additional fans. Each value is associated with a minimum range and a status indicator. Monitoring the status value will ensure proper functioning of the fans. The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:
0x02240002 Internal cooling fan has slowed
0x02220003 Internal cooling fan has stopped
Figure 17. Fan sensors status
Other sensors
Web GUI | System => Other Sensors | XMI | OtherSensors/{various name values} |
---|---|---|---|
CLI | Show Sensors-Other | Status MIB | dpStatusOtherSensorsTable |
There are several additional sensors grouped into the Other classification, including battery, hard disk, and power supply indicators. The intrusion detection sensor is also in this list, and it is triggered when tampering with the physical device is detected. All of these variables include a status value. Monitoring the status value will ensure proper functioning of the batteries, power supplies, and other components.
The following system error codes are associated with these sensors and can be used to trigger alerts from the SNMP Trap Event Subscription configuration:
0x02220001 Power supply failure
0x02220004 System battery missing
0x02220005 System battery failed
Replace the battery every two years -- critical level log records will begin to appear before that.
Figure 18. Other sensors status
Interface utilization statistics
Interface utilization monitors provide an analysis of the amount of data that is being received and transmitted by the DataPower device. Each device contains four gigabit interfaces. Monitoring this utilization can help you understand your transmission rates and how they change over time. For example, knowing that traffic for a service is growing 10% per month helps you anticipate the need for additional resources, such as more DataPower or backend devices.
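A steady monthly growth rate compounds, so the time until an interface reaches its limit is shorter than a straight-line estimate suggests. The following sketch works through that arithmetic. The function name and the starting figure of 400 Mbps are hypothetical, and the 1000 Mbps limit simply reflects a gigabit interface.

```python
def months_until_capacity(current: float, capacity: float, monthly_growth: float) -> int:
    """Return the number of whole months until current reaches capacity.

    monthly_growth is a fraction, so 0.10 means 10% growth per month.
    """
    if current <= 0 or capacity <= 0 or monthly_growth <= 0:
        raise ValueError("current, capacity, and monthly_growth must be positive")
    months = 0
    level = current
    while level < capacity:
        level *= 1 + monthly_growth  # compound the growth one month at a time
        months += 1
    return months


if __name__ == "__main__":
    # Hypothetical: 400 Mbps today, growing 10% per month, on a 1000 Mbps link.
    print(months_until_capacity(400, 1000, 0.10))
```

In this example the traffic is about 943 Mbps after nine months and about 1037 Mbps after ten, so the function reports ten months.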
Ethernet interfaces
Web GUI | System => Ethernet Interfaces | XMI | EthernetInterfaceStatus/{various name values} |
---|---|---|---|
CLI | Show Ethernet | Status MIB | dpStatusEthernetInterfaceStatusTable |
Figure 19. Ethernet interface status
Receive and transmit throughput
Web GUI | IP-Network => RX Throughput | XMI | ReceiveKbpsThroughput/{various time values} |
---|---|---|---|
CLI | Show receive-kbps | Status MIB | dpStatusReceiveKbpsThroughputTable |
Web GUI | IP-Network => TX Throughput | XMI | TransmitKbpsThroughput/{various time values} |
---|---|---|---|
CLI | Show transmit-kbps | Status MIB | dpStatusTransmitKbpsThroughputTable |
Receive and transmit throughput information can help you understand the amount of data being processed by the device. These statistics are provided for five time values ranging from 10 seconds up to the most recent 24 hour period. This data point is an important one to capture in order to understand the network load that is being applied to the device. It includes management traffic. If you have not segregated management traffic such as Web GUI, CLI, and XMI to a separate interface, then this data will be included with any application traffic.
Each DataPower configuration (or application if you prefer) will vary significantly in terms of the processing done on individual messages. In some instances, small messages may trigger significant processing, perhaps requesting additional data from off box endpoints, performing processor intensive cryptographic operations, or in some other way generating significant system load. In another instance, large messages may be simply routed and require less processing. While there is no hard and fast rule, over time, observations of increases in data will correspond to increases in utilization of DataPower resources. Knowing this information before bottlenecks occur and alleviating it with additional DataPower devices can help you avoid system interruptions.
Figure 20. Rx throughput status
HTTP Connections
Web GUI | Connection => HTTP Connection Statistics | XMI | HTTPConnections/{various name values} |
---|---|---|---|
CLI | Show http connection | Status MIB | dpStatusHTTPConnectionsTable |
HTTP connections are produced at the domain level. Statistics must be enabled for each domain that is to produce HTTP connection data. One peculiarity is that HTTP connection data is not accumulated for services in loopback mode. The status data is segmented by XML-Manager and contains information about HTTP connections, such as request and reuse. This data can help you understand the level of connections and can be used to judge utilization growth over time.
Figure 21. HTTP connections status
Transaction rate and time
Web GUI | Connection => Transaction Rate | XMI | HTTPTransactions/{various time values} |
---|---|---|---|
CLI | Show http | Status MIB | dpStatusHTTPTransactionsTable |
Web GUI | Connection => Transaction Time | XMI | HTTPMeanTransactionTime/{various time values} |
---|---|---|---|
CLI | Show http | Status MIB | dpStatusHTTPMeanTransactionTimeTable |
Transaction rates and elapsed times for individual services are accumulated per domain and, within each domain, per service. Transaction rate and time are not provided unless statistics are enabled for each domain. This data can help you understand the number of transactions processed and the average response time of those transactions for a particular service over a number of time intervals.
Figure 22. Transaction rate status
Other network status providers
DataPower supports many protocols beyond the HTTP examples discussed so far, including support for FTP, IMS, MQ, NFS, NTP, SQL, Tibco, and WebSphere JMS. Each of these protocols is represented by status providers, and as in the case of the previous examples, each is supported by the Web GUI, CLI, XMI, and SNMP. Individual configurations may not use any of these additional protocols, and few will use all of them. However, in a configuration that is using one or more of these protocols, monitoring the related status provider is prudent.
Best practices
Successful monitoring of the DataPower appliance combines reactive and proactive inquiry of status information. SNMP tools must both listen for traps sent by the device and periodically poll the device for MIB status data. These actions require a combination of DataPower SNMP Trap Event Subscription configuration and configuration of the SNMP monitoring tool to poll and, potentially, to raise alerts based on returned status values.
In addition to device monitoring, application monitoring is also a useful practice. In this instance sample messages may be sent from robotic clients through the DataPower service to ensure that all network links (including load balancers) are operational. In some instances, this effort is extended to include sending messages through to backend service provider applications to ensure that both frontside and backside links are in service. Both DataPower and backside resources must be configured to respond appropriately to these test messages.
The DataPower SNMP trap subscription capability is a useful method of leveraging SNMP notification of events within DataPower. Here is a suggested list of error codes to subscribe to. In the event that the error is produced, the SNMP agent on DataPower will send an Alert/Trap to the SNMP monitor.
Suggested error code subscription
Code | Category | Severity | Description |
---|---|---|---|
0x02220001 | environmental | critical | Power supply failure. |
0x02240002 | environmental | warning | Internal cooling fan has slowed |
0x02220003 | environmental | critical | Internal cooling fan has stopped. |
0x02220004 | environmental | critical | System battery missing. |
0x02220005 | environmental | critical | System battery failed. |
0x00330002 | mgmt | error | Memory full |
0x01a40001 | system | warning | Throttling connections due to low memory |
0x01a30002 | system | error | Restart due to low memory |
0x01a30003 | system | error | Restart due to resource shortage timeout |
0x01a50004 | system | notice | Memory usage recovered above threshold |
0x01a50005 | system | warning | Throttling connections due to low temporary file space |
0x01a30006 | system | error | Restart due to low temporary file space |
0x01a50007 | system | notice | Temporary file space recovered above threshold |
0x01a40008 | system | warning | Throttling connections due to low number of free ports |
0x01a30009 | system | error | Restart due to port shortage |
0x01a3000b | system | error | Restart due to prefix qcode shortage |
0x01a3000c | system | error | Restart due to namespace qcode shortage |
0x01a3000d | system | error | Restart due to local qcode shortage |
0x01a2000e | system | critical | Installed battery is nearing end of life |
0x01a30011 | system | error | Invalid virtual file system |
0x01a30012 | system | error | File not found |
0x01a30013 | system | error | Buffer too small |
0x01a30014 | system | error | I/O error |
0x01a30015 | system | error | Out of memory |
0x01a10016 | system | alert | Number of free qcodes is very low |
0x01a30017 | system | error | Restart due to low file descriptor |
0x01a40018 | system | warning | Throttling due to low number of available file descriptors |
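On the monitoring side, received traps usually need to be routed by severity. The following sketch shows one possible approach; it holds only a few rows copied from the table above, and the routing policy (page for critical or worse, open a ticket otherwise, ignore unsubscribed codes) is an assumption rather than anything DataPower prescribes.

```python
# Severity names as used in the table, ordered from most to least urgent.
SEVERITY_RANK = {"alert": 0, "critical": 1, "error": 2, "warning": 3, "notice": 4}

# A subset of the suggested subscriptions: code -> (category, severity, description).
SUBSCRIBED_EVENTS = {
    0x02220001: ("environmental", "critical", "Power supply failure"),
    0x02240002: ("environmental", "warning", "Internal cooling fan has slowed"),
    0x02220003: ("environmental", "critical", "Internal cooling fan has stopped"),
    0x00330002: ("mgmt", "error", "Memory full"),
    0x01a40001: ("system", "warning", "Throttling connections due to low memory"),
    0x01a30002: ("system", "error", "Restart due to low memory"),
    0x01a50004: ("system", "notice", "Memory usage recovered above threshold"),
    0x01a10016: ("system", "alert", "Number of free qcodes is very low"),
}


def route_trap(event_code, page_at="critical"):
    """Return 'page', 'ticket', or 'ignore' for a received event code.

    The three actions are a hypothetical policy chosen for illustration.
    """
    entry = SUBSCRIBED_EVENTS.get(event_code)
    if entry is None:
        return "ignore"  # not one of the subscribed codes
    severity = entry[1]
    if SEVERITY_RANK[severity] <= SEVERITY_RANK[page_at]:
        return "page"
    return "ticket"
```

With this policy, `route_trap(0x02220001)` returns `"page"` and `route_trap(0x02240002)` returns `"ticket"`.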
MIB status values to monitor
It is recommended that SNMP monitors be configured to fetch and report on the following conditions:
MIB status value | Condition to report |
---|---|
dpStatusSystemUsageLoad | >80% for interval of 10 minutes or more
dpStatusCPUUsagetenMinutes | >90% (10 minute interval)
dpStatusFilesystemStatusFreeTemporary | <20%, may be unnecessary due to error code subscription
dpStatusFilesystemStatusFreeUnencrypted | <20%, may be unnecessary due to error code subscription
dpStatusFilesystemStatusFreeEncrypted | <20%, may be unnecessary due to error code subscription
dpStatusMemoryStatusFreeMemory | <20%, may be unnecessary due to error code subscription
dpStatusTemperatureSensorsReadingStatus | Various temperature sensor readings (table) |
dpStatusEthernetInterfaceStatusStatus | For configured interfaces |
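The threshold conditions above can be evaluated with a small amount of logic once the values have been polled. The sketch below assumes that a separate SNMP poller (not shown) supplies each sample as a dictionary keyed by MIB object name, and that every value has already been converted to a percentage; both are assumptions of this example, not properties of the MIB itself.

```python
# Report when the value rises above the limit (percent).
HIGH_WATER = {
    "dpStatusSystemUsageLoad": 80,
    "dpStatusCPUUsagetenMinutes": 90,
}

# Report when the free amount falls below the limit (percent).
LOW_WATER = {
    "dpStatusFilesystemStatusFreeTemporary": 20,
    "dpStatusFilesystemStatusFreeUnencrypted": 20,
    "dpStatusFilesystemStatusFreeEncrypted": 20,
    "dpStatusMemoryStatusFreeMemory": 20,
}


def check_sample(sample):
    """Return a sorted list of MIB names whose value breaches its threshold."""
    breaches = []
    for name, limit in HIGH_WATER.items():
        if name in sample and sample[name] > limit:
            breaches.append(name)
    for name, limit in LOW_WATER.items():
        if name in sample and sample[name] < limit:
            breaches.append(name)
    return sorted(breaches)


def sustained(history, name, limit, min_seconds=600):
    """True if `name` has stayed above `limit` for at least `min_seconds`.

    `history` is a list of (timestamp_seconds, sample_dict), oldest first.
    The breach must still be in progress at the most recent sample.
    """
    breach_start = None
    for timestamp, sample in history:
        value = sample.get(name)
        if value is not None and value > limit:
            if breach_start is None:
                breach_start = timestamp
        else:
            breach_start = None  # the value recovered, so restart the clock
    if breach_start is None:
        return False
    return history[-1][0] - breach_start >= min_seconds
```

`sustained` covers the "for interval of 10 minutes or more" condition on system usage load, where a single high reading should not raise an alert.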
MIB status values to monitor for interface utilization
In addition to polling and inquiring of data, it is important to ascertain the normal traffic patterns of applications over time. The best way to do this is to capture and monitor the amount of network traffic that the device is processing. The transmit and receive values below will help you predict when devices will become saturated with traffic. Knowing this ahead of time can help you avoid service disruptions.
MIB status value | Recommended action |
---|---|
dpStatusNetworkTransmitDataThroughputTenMinutesBits | Capture values over extended time
dpStatusNetworkReceiveDataThroughputTenMinutesBits | Capture values over extended time |
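Once throughput values have been captured over an extended period, a simple trend line gives a rough estimate of when an interface will approach saturation. The sketch below fits a least-squares line to the captured values. It assumes the values are expressed in bits per second, that the interface is a gigabit link, and that 80% of capacity is the planning limit; all three are assumptions of this example to adjust for the actual environment.

```python
def utilization_percent(bits_per_second, link_bits_per_second=1_000_000_000):
    """Express a throughput value as a percentage of link capacity."""
    return 100.0 * bits_per_second / link_bits_per_second


def days_until_saturation(samples, link_bits_per_second=1_000_000_000, threshold=0.8):
    """Estimate days from the last sample until traffic reaches the planning limit.

    `samples` is a list of (day_number, bits_per_second), oldest first.
    Returns None when there are too few samples or traffic is not growing.
    """
    count = len(samples)
    if count < 2:
        return None
    mean_day = sum(day for day, _ in samples) / count
    mean_rate = sum(rate for _, rate in samples) / count
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        return None  # all samples were taken on the same day
    slope = sum((day - mean_day) * (rate - mean_rate) for day, rate in samples) / spread
    if slope <= 0:
        return None  # flat or declining traffic never reaches the limit
    intercept = mean_rate - slope * mean_day
    target = threshold * link_bits_per_second
    day_reached = (target - intercept) / slope
    return max(0.0, day_reached - samples[-1][0])
```

A straight-line fit is deliberately simple. Traffic with strong daily or weekly cycles would need averaging over whole cycles before the fit is meaningful.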
Conclusion
Best practice monitoring of DataPower is a three-pronged activity:
- Continuously verify the status of the DataPower environment through polling status data and subscribing to SNMP traps.
- Monitor device utilization and capacity through analysis of system usage data and interpretation of Ethernet activity.
- Perform complete application path verification by sending test messages through the DataPower service configuration and perhaps on through to backend resources.
Performing these three actions will ensure that services are available and the DataPower appliance is performing within standard ranges of operation.
Acknowledgements
The authors wish to thank all those who participated in the development of this developerWorks article. Of special note are the contributions of Shiu-Fun Poon, Matthias Seibler, and Gaurang Shah of WebSphere DataPower Engineering, and Bill Hines of WebSphere DataPower Technical Sales.
Managing WebSphere DataPower SOA Appliances via the WebSphere Application Server V7 Administrative Console:
Introduction
This article shows you how to manage multiple IBM® WebSphere® DataPower® appliances using the WebSphere Application Server V7 Administrative Console (Admin Console). The article describes the following configuration and other tasks:
- Verifying Appliance Manager settings
- Creating a new appliance entry for a master appliance
- Adding another appliance
- Uploading and provisioning new firmware
- Creating a managed set
- Assigning managed domains in a managed set
- Synchronizing firmware and configurations
Terminology
- Managed set
- A collection of appliances that share the same hardware type, model type, and feature license set. A managed set synchronizes sharable appliance settings, managed domains, and firmware across multiple appliances.
- Sharable appliance settings
- The global attributes for an appliance that can be shared with other appliances. For example, NTP configuration and SNMP configuration are sharable appliance settings, but appliance-specific settings, such as IP address and role-based management attributes, are not.
- Master appliance
- The appliance in the managed set that is used to synchronize sharable appliance settings and managed domains for all appliances within the managed set. Each managed set must have at least one master appliance. Each managed set might also have subordinate appliances.
- Managed domain
- A domain on the master appliance that has been added to a managed set in the DataPower Appliance Manager, which uses the managed domain to synchronize configuration changes to the subordinate appliances that are part of the managed set.
- Task
- A long-running request that you have asked the DataPower appliance manager to process.
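The managed set definition above implies a compatibility rule: an appliance can join a set only if it matches the master on hardware type, model type, and feature license set. The rule can be illustrated with a small data model. The class, field names, and sample values below are hypothetical and are not part of the Appliance Manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    """Hypothetical record of the attributes that determine set membership."""
    name: str
    hardware_type: str
    model_type: str
    licensed_features: frozenset


def can_join(master, candidate):
    """True if `candidate` matches `master` on all three attributes."""
    return (
        candidate.hardware_type == master.hardware_type
        and candidate.model_type == master.model_type
        and candidate.licensed_features == master.licensed_features
    )


def incompatible_members(master, candidates):
    """Return the names of candidates that cannot join the master's managed set."""
    return [c.name for c in candidates if not can_join(master, c)]
```

In practice the Appliance Manager performs this check itself; the model is useful only for reasoning about why an appliance does not appear as eligible for a given set.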
Before you get started
- Verify that each DataPower Appliance you want to add has a firmware level of 3.6.0.4 or later. The DataPower Appliance Manager can manage appliances at this firmware level. Do not use the DataPower firmware levels of 3.6.0.28, 3.6.0.29, or 3.6.0.30 for a managed set, as these firmware levels may cause the DataPower Appliance Manager to unnecessarily create new sharable appliance settings versions, or domain versions, and then synchronize these new versions across the managed set.
- Verify that the XML Management Interface endpoint (specifically AMP, default port 5550) is enabled on each appliance.
- Ensure that firmware levels that will be provisioned are compatible with the devices (firmware version, intended model type, appliance type, and licensed features provided by libraries in the firmware). The appliance manager allows the firmware types to be deployed only to matching appliances.
- While installing WebSphere Application Server, ensure that you select the Deployment Manager profile in the selection for profile creation. Otherwise, you will not see the DataPower Administration navigation links in the Admin Console.
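The firmware level rules in the first prerequisite are easy to get wrong when many appliances are involved. The following sketch encodes them as a check that can be run against an inventory list before any appliance is added. It assumes firmware levels are written as dotted numbers such as 3.7.2.1; the function names are illustrative.

```python
MINIMUM_LEVEL = (3, 6, 0, 4)

# Levels the article says to avoid in a managed set.
AVOID_IN_MANAGED_SET = {(3, 6, 0, 28), (3, 6, 0, 29), (3, 6, 0, 30)}


def parse_level(text):
    """Turn a dotted level such as '3.7.2.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def firmware_problem(text):
    """Return a short description of the problem, or None if the level is acceptable."""
    level = parse_level(text)
    if level < MINIMUM_LEVEL:
        return "below the minimum level 3.6.0.4"
    if level in AVOID_IN_MANAGED_SET:
        return "level should not be used in a managed set"
    return None
```

For example, `firmware_problem("3.7.2.1")` returns `None`, while `firmware_problem("3.6.0.29")` returns a message because that level is on the list to avoid.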
Configuration tasks
This section provides the detailed configuration steps for managing DataPower Appliances.
1. Verifying Appliance Manager settings
- Log into the Admin Console and navigate to Servers => DataPower => Appliances.
- Verify whether the Appliance Manager is up by navigating to Servers => DataPower => Appliance Manager settings:
Figure 1. Appliance Manager settings
2. Creating a new appliance entry for a master appliance
- Navigate to Servers => DataPower => Appliances.
- Click New and fill in the following information:
- Name (for example, xi50e)
- Host name (for example, xi50e.nivt.raleigh.ibm.com)
- Administrative port (for example, 5550)
- User ID (appliance’s admin user ID)
- Password (appliance’s admin password)
Figure 2. Create a new appliance
- Click OK.
- Click Tasks. The resulting screen indicates the status of the long-running task of adding a new appliance:
Figure 3. Tasks view
- Click on Servers => DataPower => Appliances. The new appliance should be listed. Since the appliance will take some time to provision, the status and synchronization states will not be available until the provisioning is complete.
- At this point, your appliance object has been created successfully, unless you see errors in the Tasks view.
3. Adding another appliance
- Navigate to Servers => DataPower => Appliances.
- Click New and fill in the following information:
- Name (for example, dp14)
- Host name (for example, dp14.nivt.raleigh.ibm.com)
- Administrative port (for example, 5550)
- User ID (appliance’s admin user ID)
- Password (appliance’s admin password)
- Click OK.
- Click on Tasks. The resulting screen indicates the status of the long-running task of adding a new appliance.
- Click on Servers => DataPower => Appliances. The new appliance should be listed. Since the appliance will take some time to provision, the status and synchronization states will not be available until the provisioning is complete.
Figure 4. List of created Appliances
- At this point, your second appliance object should be created successfully, unless you see errors in the Tasks view.
4. Uploading and provisioning new firmware
- Navigate to Servers => DataPower => Firmware.
- Click New and browse to the firmware (.scrypt2) image that needs to be uploaded. Ensure that you have uploaded at the very least, a version of the firmware that is installed on the planned master appliance. Otherwise, synchronization during the creation of managed sets will fail.
Figure 5. Add new firmware
- Click OK:
Figure 6. Firmware upload confirmation
- Click on Tasks to view the status of the long-running process.
- If the task completed successfully, navigate to Servers => DataPower => Firmware.
- Click on 3.7.2.1, for example, to view additional details of the firmware:
Figure 7. Additional Firmware details
- At this point, your firmware has been uploaded successfully.
5. Creating a managed set
- Navigate to Servers => DataPower => Managed sets.
- Click New. Enter the name of the managed set (for example, MyManagedSet1), and include the master appliance from the list (for example, dp1):
Figure 8. Create a new managed set
- Click Next. At this step, you might have 0 – n additional appliances visible that can be added to the managed set.
- Add the previously configured appliance (for example, xi50e) to the managed set list:
Figure 9. Add appliance to a managed set
- Click Next and you will see a summary of actions.
- Click Finish. Until the task is complete, the status of the appliance will appear as unavailable:
Figure 10. Create a managed set confirmation
- Click on Tasks to view the status of the long-running operation:
Figure 11. Tasks associated with creating a managed set
- Once the tasks have successfully completed, you can view the status of the managed set. The status change is indicated by the updated icon.
6. Assigning managed domains in a managed set
- Navigate to Servers => DataPower => Managed sets. Click on MyManagedSet1. Notice the list of managed and un-managed domains -- by default all domains are un-managed. Here you can select the domains that you would like to be propagated and synchronized with other (subordinate) appliances within this managed set.
- For example, aob_odbc can be selected as a managed domain:
Figure 12. List of domains in a managed set
- After clicking Manage, a long-running task is triggered.
- Click Tasks to view the status. It will take some time before the task completes successfully.
- Now navigate to Servers => DataPower => Managed sets. Click on MyManagedSet1. Expand managed and un-managed domains on the resulting screen:
Figure 13. List of managed domains in a managed set
- The selected domain aob_odbc is now available for provisioning under the managed domains list. If you click on aob_odbc, you can view additional details.
- You have now successfully created a managed domain on the master appliance within a managed set.
7. Synchronizing firmware and configurations
- Navigate to Managed sets => MyManagedSet1 and click Change firmware:
Figure 14. Provision a different firmware version
- You can select the first radio button and select a different firmware version from the drop-down.
- Once you click OK, the synchronization process will begin and all appliances in MyManagedSet1 will be updated with the selected version.
- You have now successfully provisioned a new or different firmware version across all the appliances in MyManagedSet1.
Conclusion
This article introduced you to DataPower appliance management terminology and showed you how to configure and administer DataPower Appliances, using the WebSphere Application Server V7 Admin Console.
Acknowledgements
Special thanks to John S. Graham for his guidance and support in ensuring the accuracy of the content in this article.