A PowerShell Monitor for ‘Leaky Processes’ (Part IV – Implementing the Monitor)

Now that we have created the custom composite monitor type, using it is just like using any other type of monitor.  Generally, when implementing a monitor you have two choices:

  1.   Put it on a pre-existing class (default disabled, enable for the servers you want to monitor)
  2.   Attach it to a new class.  Use discovery to control where this is monitored.

Because this monitor is PowerShell based, we cannot run it on all of our servers: not every server has PowerShell installed.  Instead, I plan to create a ‘LocalApplication’ class for the ComputerRole.  I will call this class PowerShell and use it as the target for all of my PowerShell-based monitors and rules.  I will use the registry key for PowerShell to determine whether it is installed, and discover its version as a property (this may give me more flexibility in the future if I ever use functionality exposed in 2.0 that is not exposed in 1.0).  So, let’s go ahead and do that!

image

image

image

image

So now we have the class; next we need to set up the discovery that tells SCOM when we should monitor this class.

image

image

image

Choose the Filtered Registry type for Configuration

image

Daily Discoveries are just fine for our environment

image

Use Regedit to find the key and values that we are interested in, then right-click the key and choose “Copy Key Name”

image

We will then use the key as our Exists variable (if this key is present on a server, we say that PowerShell is installed).  Paste the key name from your clipboard into the Path area.  Make sure to remove the HKEY_LOCAL_MACHINE part, as the SCOM module defaults to HKEY_LOCAL_MACHINE.

image

Note:  We chose Key for Exists because we are just checking whether the “Key” of the registry is there.  The only two valid Attribute Types for “Key” are Bool or Check if Exists (which is the same thing as Bool).  This will return True if the registry key exists on a server and False if it does not.  The Name under properties is the name of the variable in SCOM, not the name of the key.

image

The next thing I want is to discover what version of PowerShell this is.  Again, find the key in Regedit and copy its location, remove the HKEY_LOCAL_MACHINE prefix, and append \NameOfValue to the path.
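As a rough illustration of what this discovery checks on each server, the same two lookups can be sketched in PowerShell.  This is not the SCOM registry module itself.  The function name Get-PowerShellDiscoveryData is hypothetical, and the default key path and value name are assumptions based on where PowerShell 1.0 and 2.0 normally register themselves, so substitute whatever you copied from Regedit.

```powershell
# Hypothetical helper: mirrors the two discovery variables (Exists and Version).
# The default key path and value name are assumptions; replace them with the
# key and value you found in Regedit.
function Get-PowerShellDiscoveryData {
    param(
        [string]$KeyPath = 'HKLM:\SOFTWARE\Microsoft\PowerShell\1\PowerShellEngine',
        [string]$ValueName = 'PowerShellVersion'
    )

    # "Exists" is true when the key itself is present
    $exists = Test-Path -Path $KeyPath

    # Only read the version value when the key is there
    $version = $null
    if ($exists) {
        $item = Get-ItemProperty -Path $KeyPath -ErrorAction SilentlyContinue
        if ($item) { $version = $item.$ValueName }
    }

    New-Object PSObject -Property @{ Exists = $exists; Version = $version }
}
```

Called with no arguments on a Windows server, it returns an object whose Exists and Version properties correspond to the two discovery variables described here.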

image

image

image

The next step is to set up the Expression for the discovery.  An expression for a discovery is a logical expression that must evaluate to True for the server to be ‘discovered.’  You base this logical expression on the information that you just gathered.  In our example we will be saying that if Exists is True then this server has PowerShell.  Note that you can access the variables you defined on the previous page with the …

 

image

image

The next step in the discovery is the Discovery Mapper.  This is where we tell the class what the Properties of the class we are discovering should be set to (usually we are populating this with data we just discovered!).  Note, all Values under “Key Properties” must be filled in.  You can access properties of the parent class (target class for the discovery).

image

The XPATH Query convention for accessing the variables we defined previously is $Data/Properties/VariableName$

image

So now we just click OK and the discovery is set up

image

image

So now we have a class defined, a discovery that will only discover the class when PowerShell is installed, and a Composite Monitor powered by a PowerShell script to monitor for processes that are leaking handles.  Let’s put it all together and wrap this up!

Go to the Health Model –> Monitors Section and create a new custom performance monitor (could be any type but I would classify this as a performance monitor) under our new PowerShell Class.

image

image

The next step is to set up the configuration of this custom monitor.  Browse for a type and select the CustomMonitor type we created in Part III.

image

Now set up the defaults for your environment

image

Set up your health

image

Set up your alerts.  In an alert, the XPath query to access a property bag value is

$Data/Context/Property[@Name='VariableName']$

Since we set up the property bag value ‘message’ in the script to display which processes have high handle counts, I will include that in my alert description.

image

Set the Category to PerformanceHealth under options as this is a performance health monitor

image 

You could also add some Product Knowledge to this alert to let people know what it is all about and where to find more information about handle count leaks.  That is set up under the Product Knowledge tab

image 

This will open Word, where you can add whatever knowledge / links you would like.  After that you are done!  You can now import the management pack into your testing environment and begin playing with it!  If you would like to download the .XML or .MP files that I created while making this demo, they are available here

Posted in Management Pack Authoring | 2 Comments

A PowerShell Monitor for ‘Leaky Processes’ (Part III – Monitor Type Design)

So now we are to the point where we can make a monitor type for our shiny new data source.  We can then use this in an actual monitor and start monitoring for this condition on servers!

Step 1:  Create New Composite Monitor Type

image

image

image

image 

Add in Member Modules.  What we want here is the DataSource we created in Part II and then two condition detections, one for our NoLeakingProcesses state and one for our LeakingProcesses state.

image

We want to set up the configuration of the ProcessHandleLeak DataSource so that setting the specific values for our variables is deferred until we actually implement the monitor against a class.  To do this, we will again use configuration variables.  This time we will use the ‘promote’ option in the Authoring Console, which automatically creates the variables in the configuration schema pane (we will still have to define the value type, such as int or string).

image

image

Now we add in our condition detections.

image

image

image

image

The next thing we have to do is determine the “Regular Composition”.  This simply means: for the state NoLeakingProcesses, what does the flow through the monitor look like, and what does it look like for the state LeakingProcesses?

image

image

image

The next step is to update the Configuration Schema.  Note this time we do not have to create the variables (because we used promote) but we still have to set their types.

image

The final step in creating the Composite Monitor is setting up the overrides.

image
You have now created your own composite monitor powered by a custom, PowerShell-driven DataSource!  Stay tuned for how to use this new Composite Monitor to actually monitor for processes that are leaking handles, as well as the .XML Management Pack!

Posted in Management Pack Authoring | 2 Comments

Basic Performance Metric Collection for Group

Step 1:  Open SCOM Operator’s Console

Step 2:  Go to the Authoring Tab

Step 3:  Create New Rule

image

image

image

image

image

image

image

image

image

image

image

Posted in Generic SCOM Information, Management Pack Authoring | 5 Comments

Basic Creation and Discovery of a Class for Monitoring a Service

Simple definition of a class:  something you want to know information about and monitor.

  • Examples of information you may want to know
    • What servers is this ‘class’ available on? (Discovery)
    • What version of the ‘class’ is installed?
    • Where are the files that make up this ‘class’?

To create a class:

  • Step 1:  Is this class part of a larger application?  If so, add it to that larger application’s management pack.  If not, create a new management pack.

To create a new management pack, use the Authoring Console: http://www.microsoft.com/downloads/details.aspx?FamilyId=6C8911C3-C495-4A03-96DF-9731C37AA6D7&displaylang=en

image

File –> New Management Pack

image

Now create the base class (abstract).  We will target all other classes for this Application at this class.  This will allow us more flexibility later

image 

image

Make it Abstract

image

Create a new non-abstract class that inherits from the base class.  This is the class we will use for the actual monitoring and discovery.

image

image

image

image

image

So now we actually have a class.  The next step is to ‘discover’ where it is installed.  The easiest way to do this is by searching the registry on a group of servers.  Look for a registry key under Software.  You ‘can’ fall back onto looking at the ‘services’ area of the registry, but most applications have multiple services you may want to monitor, so it’s better to look for the larger application’s registry area to define what role this server is actually playing.  Once you have found an acceptable registry key, continue on.

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

image

Finish creating the monitor (click OK).  Open the monitor to set up the recovery.

image

image

image

Save MP in non-sealed form.  Seal MP using our Key.  Export Sealed MP to production environment.

image

image

Open Sealed MP and export to management group

image

image

image

Use discovered inventory targeted at the new class to see instances of the new class (these are the servers you are now monitoring this on!)

image

image

image

Posted in Generic SCOM Information, Management Pack Authoring | Leave a comment

A PowerShell Monitor for ‘Leaky Processes’ (Part II – DataSource Design)

So now we have a script that we can put into either an existing management pack or a new management pack and have it drive a new type of monitor.  For this example I will walk through putting this into a standalone management pack.

Step 1:  Open Authoring Console –> File –> New Management Pack

Step 2:  Give the Management Pack a unique and meaningful name

image

image

Step 3:  Create the New Composite Data Source.  Type Library –> Module Types –> Data Sources

image

Step 4:  Give our new DataSource a unique meaningful name.

image

Step 5:  Give the new DataSource a Display Name.

image

Step 6:  Setup the Member Modules

Member Modules are the subcomponents of the DataSource.  In our case we will need something to schedule the execution of our script, and something to execute and return values from our script.

System.Scheduler is the generic scheduler and will allow you to execute any DataSource on a schedule

Microsoft.Windows.PowerShellPropertyBagProbe is the built-in module for running PowerShell scripts that return non-discovery data

image

 image

image

image

image

image

image

image

image

image

Step 7:  Setup the Configuration Schema of this DataSource

The Configuration Schema is simply a list of things that this DataSource needs defined for it in order to run correctly.  This includes all of the $Config/Parameter$ variables we declared previously (the DataSource needs to get the data from somewhere, and this says that it can expect to have it passed in from whatever ends up calling it).

image

image

image

image

image

image

image

Next Steps:  Create the new Monitor Type for this DataSource (where we define what it means to be healthy or unhealthy) and then use that monitor type in an actual monitor!  Stay Tuned!

Posted in Management Pack Authoring | 2 Comments

A PowerShell Monitor for ‘Leaky Processes’ (Part I – Script Design)

A request came in this week for me to watch all the processes on a box and alert when the handle count of any of those processes exceeded 7,500.  My first question was: what is the customer really trying to monitor?  The answer was leaking handles.  The 7,500 handle count was just an arbitrary number that the customer had come up with.  What they really wanted was the ability to say: process X has a high number of handles on my server and the count is steadily increasing, which may lead to my server or application crashing.

The next logical question is how do I do this in SCOM?  There are a few different ways you could go about this.  The most straightforward would be to create a discovery for all of the processes that run on a box.  We could then create a simple performance monitor for each instance of that class that says: when this process exceeds handle threshold X go to a warning state, and when it exceeds Y go to a critical state and open a ticket.  There are a number of problems with doing this.

    1. A monitor that fires off when a process crosses a threshold is not really detecting a leak.

      • What I am trying to detect is when a process is continually grabbing more handles.

    2. Discovering all processes running on all servers is generically a bad thing.

      • This is because the instances of this class on each server would be constantly changing. This will lead to configuration churn, See Kevin Holman’s blog post for more information on Configuration Churn.

So what now?  Create a script-based monitor.  Script-based monitors are the most expensive type of monitor you can create (from the perspective of the agent), so you need to ensure that your script is efficient.  A great post on designing MPs to be efficient and scalable is from Kristopher Bash and is available here.  Kristopher points out that the most efficient code execution can be achieved through .NET managed code.  The problem is that you then have to distribute your compiled code to every server that will be using the monitor, which is not very supportable.  The next best thing for SCOM R2 is a PowerShell module.  With the addition of a native PowerShell module in R2, the overhead associated with spawning separate processes to execute our scripts is removed (the script now runs under the OpsMgr PowerShell host).  This means that if you have the choice of PowerShell or WSH, use PowerShell (which in turn means that the agent the code will ultimately run on must have PowerShell installed).  The other key programming concept to keep in mind when designing scripts is scalability.  You want to design the script so that it has O(n) complexity, which is to say that each time you loop through all the instances (in this example, all the processes) you do not want to loop through that instance space again for each one.
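To make the O(n) point concrete, here is a minimal sketch of a single pass over the process list with constant-time lookups for an ignore list.  The function names and the fake process objects are hypothetical and only for illustration; a real run would pass in the output of Get-Process.

```powershell
# Build a hashtable from a comma-separated ignore list so that each lookup is
# a constant-time key check. Hashtables created with @{} compare string keys
# case-insensitively.
function New-IgnoredProcessLookup {
    param([string]$IgnoredProcessList)
    $lookup = @{}
    foreach ($name in $IgnoredProcessList.Split(',')) {
        $trimmed = $name.Trim()
        if ($trimmed) { $lookup[$trimmed] = $true }
    }
    return $lookup
}

# Walk the process list exactly once, keeping only processes above the
# threshold that are not in the ignore list.
function Select-HotProcess {
    param($Processes, [int]$HandleThreshold, [hashtable]$Ignored)
    $hot = @{}
    foreach ($p in $Processes) {
        if ($p.Handles -gt $HandleThreshold -and -not $Ignored.ContainsKey($p.ProcessName)) {
            $hot["$($p.ProcessName)-$($p.Id)"] = $p.Handles
        }
    }
    return $hot
}

# Hypothetical sample data standing in for Get-Process output
$sample = @(
    (New-Object PSObject -Property @{ ProcessName = 'System';  Id = 4;    Handles = 9000 }),
    (New-Object PSObject -Property @{ ProcessName = 'w3wp';    Id = 1234; Handles = 8200 }),
    (New-Object PSObject -Property @{ ProcessName = 'notepad'; Id = 42;   Handles = 60 })
)
$ignored = New-IgnoredProcessLookup 'lsass, system, svchost'
$hot = Select-HotProcess -Processes $sample -HandleThreshold 7500 -Ignored $ignored
```

With this sample data only the w3wp entry ends up in $hot: System is ignored regardless of case, and notepad is below the threshold.  The work grows with the number of processes, not with the number of processes multiplied by the length of the ignore list.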

So how do you ensure that this workflow only runs on servers with PowerShell?  In our environment we have created a class for this.  We use this class to discover additional attributes that we then use to scope things to.  In order to maximize its flexibility we have made it an extension of the Windows Computer class.  This allows us to make groups of this class based on our newly discovered attributes.  The PowerShell-based monitors are then disabled by default and enabled with an override for all objects of the PowerShell-enabled group.  Kevin Holman has done a nice write-up on the creation of the extension class and creation of the group here.

The next step is to actually create the PowerShell script that will be run and test it to ensure it does what we want.  The script I created tracks all processes above a certain handle threshold.  It records the handle counts of these processes and alerts if they are increasing (leaking) by an average of at least LeakAmount per run over the last NumSamples runs.  The script also takes a parameter called ignoredProcessList, which takes a comma-separated list of processes to ignore on a box.

  1. HandleThreshold

    • This is the threshold that a process must exceed in order to be investigated and tracked.

    • Set to 0 to track all processes for a leak

    • We set this to 7500 handles to limit the monitor’s scope each time it runs

  2. NumSamples

    • This is the number of times the leaking condition (above HandleThreshold and averaging at least LeakAmount) must be true before a state change is triggered

    • Set to 1 to alert whenever a process exceeds the HandleThreshold (ignoring LeakAmount and changing the monitor to a simple high handle count detection monitor)

    • We set this to 4 samples and fire the monitor off every 15 minutes

  3. LeakAmount

    • This is the average number of handles that the process must be increasing by to change state

    • Set to 0 to always change state if a process is over the HandleThreshold for NumSamples

    • We set this to 25, which detects processes that are leaking 100 handles or more over an hour period (the monitor runs every 15 minutes and NumSamples is set to 4)

  4. ignoredProcessList

    • This is the comma-separated list of processes to ignore

    • Set this to an empty string to monitor all processes

    • We set this to lsass,system,svchost in our environment as our default setting. This can be overridden for different groups of servers or individual servers as needed
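To show how NumSamples and LeakAmount interact, here is a small sketch of just the averaging step: take the last NumSamples readings and divide the change between the first and last of them by the number of intervals.  The function name Get-AverageHandleGrowth and the sample readings are hypothetical, and the sketch leaves out the file handling and the other checks the full script performs.

```powershell
# Hypothetical helper: average handle growth per run across the last
# $NumSamples readings (the oldest reading comes first in $Samples).
function Get-AverageHandleGrowth {
    param([int[]]$Samples, [int]$NumSamples)

    # With fewer than two samples there is no interval to average over
    if ($NumSamples -lt 2 -or $Samples.Count -lt $NumSamples) { return 0 }

    $window = @($Samples | Select-Object -Last $NumSamples)
    return ($window[$NumSamples - 1] - $window[0]) / ($NumSamples - 1)
}

# Made-up readings taken 15 minutes apart
$leaking = Get-AverageHandleGrowth -Samples 7600, 7640, 7655, 7690 -NumSamples 4   # (7690 - 7600) / 3 = 30
$steady  = Get-AverageHandleGrowth -Samples 7600, 7610, 7620, 7630 -NumSamples 4   # (7630 - 7600) / 3 = 10
```

With a LeakAmount of 25, the first series (an average of 30 handles per run) would count as leaking, while the second (an average of 10) would not.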

Param ([int]$HandleThreshold, [int]$NumSamples, [int]$LeakAmount, [String]$ignoredProcessList)

###############################################################################
#Setup Variables
###############################################################################
$TempPath="c:\HealthServiceTemp\HandleCountMonitoring"
$ProcessList = Get-Process | ? {$_.Handles -gt $HandleThreshold}
$ProcessHashTable = @{}
$HighHandleCountProcesses = "ProcessName-PID-HandleCount"
$retValue = "Good"
$IgnoredProcessArray = New-Object System.Collections.ArrayList

###############################################################################
#Parse Input String for Ignored Processes and load it into an ArrayList
###############################################################################
foreach ($SubString in $ignoredProcessList.Split(','))
{
    #Cast to [void] so the index returned by Add() does not leak into the script output
    [void]$IgnoredProcessArray.Add($SubString.Trim())
}

###############################################################################
#Parse output of Get-Processes and load results into a HashTable
###############################################################################
if($processList)
{
    foreach ($process in $processList)
    {
        #-contains is case-insensitive, so "system" also matches the "System" process
        If(-not ($IgnoredProcessArray -contains $process.ProcessName))
        {
            [String] $FileName = $process.ProcessName + "-" + $process.Id
            $ProcessHashTable.Add($FileName, $process.Handles)
        }
    }
}

###############################################################################
#Test for existance of Folder Structure, create if it does not exist
###############################################################################
if(-not (Test-Path $TempPath))
{
    #Discard the DirectoryInfo object so it does not end up in the script output
    New-Item $TempPath -type Directory | Out-Null
}

###############################################################################
#Change Directory to Folder Structure
###############################################################################
Set-Location $TempPath

###############################################################################
#Evaluate all files in folder structure.  Remove files that are not associated
#With a currently ‘hot’ (Above $HandleThreshold) ProcessName-PID pair
###############################################################################
$Files = Get-ChildItem $TempPath | ? {-not $_.PSIsContainer}
if($Files)
{
    Foreach ($File in $Files)
    {
        if(-not($ProcessHashTable.ContainsKey($File.Name)))
        {
            Remove-Item $File.FullName
        }
    }
}

###############################################################################
#If there are ‘hot’ processes either create a file for them or add to the
#existing file.  Each line in the file will be the handle count at that check
#Time.  Each file will have a size limit of NumSamples * 100
###############################################################################
if($ProcessHashTable.count -gt 0)
{
    Foreach ($ProcessKey in $ProcessHashTable.Keys)
    {
        $FileName = $TempPath + "\" + $ProcessKey
        $HandleCount = $ProcessHashTable[$ProcessKey]
        if(-not (Test-Path $FileName))
        {
            New-Item $FileName -type file | Out-Null
            Add-Content $FileName $HandleCount
        }
        else
        {
            $count = (Get-Content $FileName | Measure-Object).count
            if($count -gt $NumSamples * 100)
            {
                #Drop the oldest sample and append the newest one
                $Content = @(Get-Content $FileName | Select-Object -Skip 1) + $HandleCount
                Set-Content $FileName $Content
            }
            else
            {
                Add-Content $FileName $HandleCount
            }
        }
    }
}

###############################################################################
#Check Folder Structure for files with more than $NumSamples lines in them.
#Check to see if the average increase per run across the last $NumSamples
#entries is at least $LeakAmount. Add any matches to the
#$HighHandleCountProcesses string and set the return value to "Bad".  If the
#Process is not leaking by $LeakAmount remove File
###############################################################################
$Files = Get-ChildItem $TempPath | ? {-not $_.PSIsContainer}
if($Files)
{
    Foreach ($File in $Files)
    {
        #Wrap in @() so .Count works even when the file holds a single line
        $Content = @(Get-Content $File.FullName)
        if($Content.Count -gt $NumSamples)
        {
            $Difference = 0
            if($NumSamples -gt 1)
            {
                $Values = @($Content | Select-Object -last $NumSamples)
                $Difference = ([int]$Values[$NumSamples-1] - [int]$Values[0]) / ($NumSamples-1)
            }
            if($Difference -ge $LeakAmount)
            {
                $retValue = "Bad"
                $HandleCount = $ProcessHashTable[$File.Name]
                $HighHandleCountProcesses += "`n$($File.Name)-$HandleCount"
            }
            else
            {
                Remove-Item $File.FullName
            }
        }
    }
}

###############################################################################
#Return Values
###############################################################################
$api = New-Object -comObject 'Mom.ScriptAPI'
$bag = $api.CreatePropertyBag()
$bag.AddValue("retValue",$retValue)
$bag.AddValue("message",$HighHandleCountProcesses)
$bag

###############################################################################
#Destroy objects
###############################################################################
#Some of these variables only exist on certain code paths, so ignore "not found" errors
Remove-Variable bag, api, retValue, HighHandleCountProcesses, HandleCount, count,
    NumSamples, File, Files, TempPath, ProcessHashTable, FileName, ProcessKey,
    HandleThreshold, ProcessList, process, LeakAmount, ignoredProcessList,
    IgnoredProcessArray, Content, Values, Difference -ErrorAction SilentlyContinue
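The per-process history files in the script act as a rolling window of samples.  That idea can be sketched on its own as follows.  The function name Add-HandleSample and its MaxLines parameter are hypothetical, and the sketch rewrites the file on every call for simplicity, whereas the script above only rewrites it once the size limit is exceeded.

```powershell
# Hypothetical helper: append one handle count to a history file and keep only
# the newest $MaxLines entries.
function Add-HandleSample {
    param([string]$Path, [int]$HandleCount, [int]$MaxLines)

    # Read the existing history, if there is any
    $lines = @()
    if (Test-Path -Path $Path) { $lines = @(Get-Content -Path $Path) }

    # Append the new sample, then trim the oldest entries beyond the limit
    $lines += [string]$HandleCount
    if ($lines.Count -gt $MaxLines) {
        $lines = @($lines | Select-Object -Last $MaxLines)
    }

    Set-Content -Path $Path -Value $lines
}
```

Calling it once per monitor run with the same path keeps the file from growing without bound while still leaving enough history for the averaging step.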


The next step is to wrap this into a custom data source and build a composite monitor from that data source.  After that we use the composite monitor in a management pack that extends windows server operating system, default disabled, then create an override management pack to enable it for our PowerShell enabled servers.  Stay tuned!

Posted in Management Pack Authoring, Scripting | 3 Comments

SCOM Security Event Monitoring

A while ago a request came in to me to begin alerting on the addition and removal of members of a number of Active Directory groups.  After a bit of research on security events I found that this would be relatively simple to accomplish, as all of our DCs have SCOM agents installed on them.  Or so I thought!


 

Security Events

I did my research on which security events I should be caring about on Randy Franklin Smith’s website (http://www.ultimatewindowssecurity.com) and highly suggest looking there for descriptions of any security event.  It turns out that for Windows Server 2003 DCs there were 6 events we wanted to monitor and in Windows Server 2008 R2 DCs there were 6 other events we wanted to monitor.

2003 DCs

2008 DCs

As these events are all Microsoft events I assumed they were all parameterized, and a quick check with Log Parser confirmed that they were.  Parameter 3 holds the name of the account that was modified.  At this point I created a number of straightforward event-based rules to alert us on group changes for important security groups [Builtin Admins, Domain Admins, Schema Admins, Enterprise Admins etc.], set up a subscription to the alerts to notify the correct people, and we were off to the races.

Everything is Great

The monitoring worked.  We got alerted when we added or removed people from these groups almost immediately.  Other groups approached us and asked us if we could provide the same types of monitoring for them.  Life was easy and SCOM was doing exactly what it was designed for.


Speed Bump

As a SCOM administrator / author, the most common type of rule I make is an event-based rule.  I instruct my application owners to create events in the local event logs, then pick them up and alert on them.  As this was a pretty common practice for me I did not see a reason to test these rules in a development environment and, in hindsight, I should have.

It was soon noticed that CPU usage by the LSASS process on our three primary Domain Controllers had jumped unexpectedly.  We utilize these three Domain Controllers pretty heavily, so at first it was thought that some newly developed application was hammering them too frequently.  After a bit of investigation, though, it turned out that SCOM was causing LSASS usage to rise by over 30% of the total processor time available on these Domain Controllers.

image

We quickly identified the new rules we had written as the culprits.  What we couldn’t figure out was why they were causing LSASS, a Domain Controller process, to spike.  After talking with Microsoft we found out that the actual security events store GUIDs, not human-readable strings (think parameter 3, the group name), and that by filtering on the parameter we were forcing that resolution to happen, which put a load on LSASS.  But hold on: it is very rare for any of the events listed above to happen in our environment, so why was this a constant increase?  The Data Source we were using was a simple Event Provider DS.

image

We assumed that this configuration meant that only if the event ID matched 632 would it look at the parameters and check whether Param[3] equaled Domain Admins.  But if we sit down and look at what this configuration actually says, it just says: pass the output of this DS on down the line if you find an event with number 632 and Parameter 3 equal to Domain Admins.  This means that for every event written to the event log, both its Event ID and Parameter 3 are checked.  Essentially we were causing the Parameter 3 GUID to be resolved to a friendly name for every event, which translated into the increase in load.

Solution

So what do we do now?  We need to build in the filtering logic that we thought was originally there!  To do this we use the same sort of logic that you use when designing for cookdown.  In the Data Source you match only on the event ID, then pass that output to a condition detection that filters on the parameter, and then on to the alert.
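The effect of ordering the two filters this way can be sketched outside of SCOM.  The snippet below is only an analogy, not how the agent is implemented: the function name, the fake events, and the resolver script block are all hypothetical.  It counts how often the expensive name resolution runs when the cheap event ID check comes first.

```powershell
# Hypothetical two-stage filter: the cheap check runs for every event, the
# expensive lookup only for events that pass it.
function Select-GroupChangeEvent {
    param($EventList, [int[]]$EventIds, [string]$GroupName, [scriptblock]$ResolveGroup)
    foreach ($e in $EventList) {
        # Stage 1 (data source): cheap integer comparison on the event ID
        if ($EventIds -notcontains $e.Id) { continue }

        # Stage 2 (condition detection): resolve and compare the group name
        $resolved = & $ResolveGroup $e.Param3
        if ($resolved -eq $GroupName) { $e }
    }
}

# Made-up events; Param3 holds an opaque identifier rather than a name
$sampleEvents = @(
    (New-Object PSObject -Property @{ Id = 528; Param3 = 'id-1' }),
    (New-Object PSObject -Property @{ Id = 633; Param3 = 'id-2' }),
    (New-Object PSObject -Property @{ Id = 538; Param3 = 'id-1' }),
    (New-Object PSObject -Property @{ Id = 661; Param3 = 'id-3' })
)
$names = @{ 'id-1' = 'Domain Users'; 'id-2' = 'Enterprise Admins'; 'id-3' = 'Helpdesk' }

# Stand-in for the costly resolution; it records how many times it was called
$calls = @{ Resolve = 0 }
$resolver = { param($id) $calls['Resolve'] += 1; $names[$id] }

$matched = @(Select-GroupChangeEvent -EventList $sampleEvents -EventIds 633, 661 `
    -GroupName 'Enterprise Admins' -ResolveGroup $resolver)
```

Of the four sample events only the two with IDs 633 and 661 reach the resolver, so $calls['Resolve'] ends at 2 rather than 4, and $matched holds the single Enterprise Admins event.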

image

In SCOM this looks like

image

image

image

image

Or for the XML inclined

<Rule ID="GMI.Security.Rules.DC.2003.EnterpriseAdmins.Removed" Enabled="true" Target="MicrosoftWindowsServerAD2003Discovery!Microsoft.Windows.Server.2003.AD.DomainControllerRole" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="GMILibrary!GMI.Library.DS.FilteredEventProvider.Security">
      <EventDisplayNumber>(633|661)</EventDisplayNumber>
    </DataSource>
  </DataSources>
  <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery>Params/Param[3]</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">Enterprise Admins</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </ConditionDetection>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>0</Severity>
      <AlertMessageId>$MPElement[Name="AlertMessageEnterpriseAdminRemoved"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/EventDescription$</AlertParameter1>
      </AlertParameters>
    </WriteAction>
  </WriteActions>
</Rule>

Posted in Management Pack Authoring | 4 Comments