Wednesday, August 3, 2011

Good Info - OPMN and Process Monitoring


Hi Everyone,

Today we will discuss OPMN. I read some good information about OPMN (Oracle Process Manager and Notification Server), so I am sharing some of the important points with explanation.

OPMN and Process Monitoring:

OPMN (Oracle Process Manager and Notification Server) is a set of processes that manage mid-tier Application Server components like Oracle HTTP Server (Apache) and OC4J containers.
OPMN consists of the Process Manager and the Notification Server.

(In Oracle 10g, OPMN manages all Application Server components except Oracle AS Metadata Repository and Application Server Control Console; it can be configured to manage Oracle AS Port Tunnel and custom processes because of its extensible design.)

Oracle Notification Server (ONS) is the transport mechanism for failure, recovery, startup and other related notifications between components in Oracle Application Server. It operates according to a publish-subscribe model: an Oracle AS component receives a notification of a certain type as per its subscription to ONS. When such a notification is published, ONS sends it to the appropriate subscribers.
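To make the publish-subscribe idea concrete, here is a rough Python sketch of the routing described above. It is only an illustration of the model, not the real ONS API: the class name `NotificationServer`, its methods and the notification type names ("process-ready", "process-death") are all made up for this example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the notification type and its payload.
Handler = Callable[[str, dict], None]


class NotificationServer:
    """Toy stand-in for ONS (hypothetical): routes notifications by type."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, notification_type: str, handler: Handler) -> None:
        """Register a component's interest in one type of notification."""
        self._subscribers[notification_type].append(handler)

    def publish(self, notification_type: str, payload: dict) -> int:
        """Deliver the notification to its subscribers; return how many got it."""
        handlers = self._subscribers.get(notification_type, [])
        for handler in handlers:
            handler(notification_type, payload)
        return len(handlers)


ons = NotificationServer()
received = []

# A component subscribes only to the (hypothetical) "process-ready" type.
ons.subscribe("process-ready",
              lambda kind, data: received.append((kind, data["component"])))

ons.publish("process-ready", {"component": "OC4J_home"})
ons.publish("process-death", {"component": "HTTP_Server"})  # nobody subscribed

print(received)  # [('process-ready', 'OC4J_home')]
```

The subscriber sees only the notification type it asked for; the second publish reaches no one because nothing subscribed to that type.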

Oracle Process Manager (PM) is the centralized process management mechanism in Oracle Application Server and is used to manage Oracle AS processes. It starts, stops and restarts these processes and detects when they die. The Oracle AS processes that PM is configured to manage are specified in the opmn.xml file. The PM waits for a user command before starting specific processes or all processes. When specific processes or all processes are to be stopped, the PM receives a request as specified by the request parameters. OPMN itself is monitored by a shadow process that restarts it upon request or after a catastrophic failure.

The PM uses ONS to:

  • detect that a process has completed initialization and is ready to receive requests
  • determine what ports are in use
  • obtain component specific runtime information

Four parameters determine the behavior of the Oracle Process Manager and Notification Server process in managing the iAS middle tier, which comprises the OC4J instances and the Apache HTTP server.

They are as follows:

a) restart-on-death
b) ping timeout
c) ping interval
d) reverse-ping timeout

The settings for these parameters need to be governed by the heap sizing of the OC4J container JVMs, the latencies involved with garbage collection algorithms and the response times of the HTTP server.

The way in which the parameters affect the functioning of OPMN is as follows:

By default, as defined in the schema definition file opmn.xsd, the restart-on-death parameter takes a BOOLEAN value of “TRUE”. With this setting, OPMN pings the managed processes (OC4J instances and Apache) at pre-defined intervals and waits for a response within a specified timeout period. If a process does not respond, OPMN retries the ping 3 times before concluding that the process is dead because it is non-responsive; it then kills and restarts the process. This is how OPMN carries out its mandate of ensuring failover and availability: the managed processes are kept up and able to service client requests on an ongoing basis.
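The ping-and-retry decision described above can be sketched in a few lines of Python. This is a simplified model, not OPMN's actual implementation: the function name `restart_points` is made up, and I am assuming that "retry 3 times" means three consecutive unanswered pings trigger the kill and restart.

```python
from typing import Iterable, List


def restart_points(ping_results: Iterable[bool], retries: int = 3) -> List[int]:
    """Return the indexes of the pings after which a restart would happen.

    Each entry of ping_results is True when that ping was answered within
    the timeout and False when it was not. Assumption: `retries` consecutive
    unanswered pings mean the process is considered dead.
    """
    restarts = []
    consecutive_failures = 0
    for index, answered in enumerate(ping_results):
        if answered:
            # Any answer proves the process is alive, so the count starts over.
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= retries:
            restarts.append(index)
            # The restarted process begins with a clean failure count.
            consecutive_failures = 0
    return restarts


# Two missed pings are tolerated; three in a row trigger a restart.
print(restart_points([True, False, False, True, False, False, False, True]))  # [6]
```

Note that the model cannot tell a dead process from a busy one: all it sees is unanswered pings.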

Oracle HTTP Server

There could be a variety of reasons for the non-responsiveness of the Apache server :

  • due to a high load of concurrent requests competing for the web server’s attention
  • because of timeouts between the various modules of Apache and the servers servicing client requests
  • while it is in the middle of handling chunk data received from its modules
  • when persistent connections have fully used up the number of servers it can spawn, with many connections remaining in the CLOSE_WAIT state
  • due to thrashing when NFS hiccups cause files that need to be served to be unavailable to it
  • because of synchronization issues with the various mutexes that it needs to support for the proper functioning of its modules

In any such case, due to Apache’s unresponsiveness over the ping period, OPMN will kill and restart OHS as needed.

OC4J containers

In the case of the OC4J instances (Oracle Containers for J2EE), OPMN follows similar logic to determine whether or not to kill and restart them. While the container is processing servlet or EJB logic, new objects are continually created in the heap memory area of the JVM in which it runs.

When the garbage collection thread starts to run, it looks for objects that it can release to the heap memory pool, using several algorithms that depend on the kind of references that still point to those objects. Since the collection is “generational”, objects that are still referenced after a collection are promoted to an older generation and presumed to have a longer lifetime, while objects that are no longer strongly referenced (for example, those reachable only through weak references) are candidates for “cleaning” up, and their occupied memory gets released to the global heap of the JVM.

In this way, memory is reclaimed into the heap memory pool and made available for the creation of newer objects. The passes made by the garbage collector to reclaim memory in the heap are governed by several algorithms, and every such collection takes a finite amount of time during which no application processing is possible. When the collection covers the entire heap, the full GC consists of a mark/sweep/compact cycle that “marks” the objects still in use, “sweeps” away the memory occupied by everything else and “compacts” the holes created when the memory is reclaimed, so as to create contiguous memory for future object creation.

These strategies consume more time, as is to be expected, and can result in a delay in the container responding to an OPMN ping cycle. During such full GC scans, OPMN can and will kill and restart the container, causing it to lose the state of the application or request it was processing at that time.

Since full GC scans (referred to as stop-the-world scans) can happen at any time during the lifetime of a request or an application, there is always the danger of OPMN killing a perfectly functioning container on the assumption that it was “hung” since it was “unresponsive”.
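To get a feel for when a stop-the-world pause becomes dangerous, here is a small Python model of one pause overlapping the ping schedule. Everything in it is an assumption made for illustration: the function names, the 20-second interval and timeout defaults, and the rule that three consecutive missed pings cause a restart. It is not taken from OPMN's code or documentation.

```python
def missed_pings(pause_start: float, pause_length: float,
                 ping_interval: float, ping_timeout: float) -> int:
    """Count the pings that go unanswered because of one stop-the-world pause.

    Pings are sent at t = 0, ping_interval, 2 * ping_interval, ...
    A ping that arrives during the pause is answered only when the pause
    ends, so it is missed if the pause ends after the ping's timeout.
    """
    if ping_interval <= 0:
        raise ValueError("ping_interval must be positive")
    pause_end = pause_start + pause_length
    missed = 0
    ping_time = 0.0
    while ping_time < pause_end:
        arrives_during_pause = pause_start <= ping_time < pause_end
        if arrives_during_pause and pause_end > ping_time + ping_timeout:
            missed += 1
        ping_time += ping_interval
    return missed


def would_restart(pause_length: float, ping_interval: float = 20.0,
                  ping_timeout: float = 20.0, retries: int = 3,
                  pause_start: float = 5.0) -> bool:
    """True if the pause alone is long enough to look like a hung process."""
    missed = missed_pings(pause_start, pause_length, ping_interval, ping_timeout)
    return missed >= retries


print(would_restart(50.0))  # False: only one ping is missed
print(would_restart(90.0))  # True: three pings in a row are missed
```

In this model a healthy container is restarted purely because the pause outlasted the ping budget, which is why the ping parameters and the heap sizing have to be considered together.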

Heap memory settings for OC4J containers

These settings are made in the opmn.xml file, in the <java-option> section for each OC4J instance:

The heap settings must include -Xms (initial heap size) and -Xmx (maximum heap size). Always use the “-server” option as the first option in the <java-option> section; it selects the Server HotSpot JVM, which performs best for long-running JVMs. The recommended -Xmx value is 512MB, since experience shows that typical applications need that much memory to avoid java.lang.OutOfMemoryError exceptions. Start with an -Xms value of 128MB: with a higher -Xms value, garbage collection kicks in only later, so open file handles are not released by the GC and “Too many files open” errors can appear as a side effect.

<java-option>-server -Xms128M -Xmx512M
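If you maintain several instances, a quick sanity check of the option string can catch mistakes before a restart. The Python helper below is only a sketch I put together for this post: `check_java_options` and `heap_bytes` are made-up names, not part of any Oracle tool, and the checks simply mirror the advice above (-server first, both -Xms and -Xmx present, -Xms not larger than -Xmx).

```python
import re
from typing import List, Optional

# Multipliers for the size suffixes accepted after -Xms / -Xmx.
_SIZE_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def heap_bytes(tokens: List[str], flag: str) -> Optional[int]:
    """Return the size in bytes given for `flag`, or None if it is absent."""
    for token in tokens:
        if token.startswith(flag):
            match = re.fullmatch(r"(\d+)([kKmMgG]?)", token[len(flag):])
            if match:
                return int(match.group(1)) * _SIZE_UNITS[match.group(2).upper()]
    return None


def check_java_options(options: str) -> List[str]:
    """Return a list of problems found in a JVM option string (empty if none)."""
    problems = []
    tokens = options.split()
    if not tokens or tokens[0] != "-server":
        problems.append("-server should be the first option")
    xms = heap_bytes(tokens, "-Xms")
    xmx = heap_bytes(tokens, "-Xmx")
    if xms is None:
        problems.append("-Xms is missing")
    if xmx is None:
        problems.append("-Xmx is missing")
    if xms is not None and xmx is not None and xms > xmx:
        problems.append("-Xms is larger than -Xmx")
    return problems


print(check_java_options("-server -Xms128M -Xmx512M"))  # []
print(check_java_options("-Xms1G -Xmx512M"))
```

The first call reports no problems for the settings recommended here; the second flags both the missing -server option and the inverted heap sizes.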

Thread pool sizing

In the server.xml file, set the thread-pool sizes as follows for optimum operation of the thread pool:

<global-thread-pool min="40" max="40" queue="80" keepAlive="-1"/>

This sets the min and max thread-pool sizes to the same value and the keepAlive parameter to “-1”. Recommended for production environments, this setting ensures that idle threads are never destroyed, so threads can be reused without the overhead of creating new ones. The min, max and queue values can be left at the defaults as specified here.

Redundancy and load balancing

More than one OC4J instance can be started to accommodate a higher volume of concurrent requests. This is set through the “numProcs” parameter in the opmn.xml file. The parameter takes the value of 1 by default, which starts a single OC4J instance. For multiple instances, “numProcs” can be set to a higher value (2 for two instances, and so on), and PM needs to be restarted with this value for the modules under its control. Very often, the applications being run are process or memory intensive, and the value of “numProcs” has to be adjusted to achieve load balancing across multiple instances.


I hope this helps you all in understanding OPMN. If you have more information, or if any changes are required in the above post, kindly let me know.

Enjoy working with Middleware technology…

Regards,
Ajinkya Vichare
ajinkya-vichare.blogspot.com
