Monday, 15 April 2019

After a SharePoint CU upgrade, a WSP package fails to deploy on some servers

Issue Description: This issue started after the Sep 2017 CU upgrade and occurs on only one server, where the solution does not get deployed. The other server gets the solution deployed correctly.

  1. We started seeing this issue after patching. This is common behavior in SharePoint on-premises environments: the logical SharePoint Timer Service instance goes into Offline/Disabled mode on one or more servers. Ideally, once PSConfig completes successfully, it should move the Timer Service instance back to the Online state. However, when PSConfig runs into problems during one of its steps, it can fail to bring all the Timer Service instances back online. As a result, timer jobs are not processed correctly on the server where the Timer Service instance did not come back online.
    • This issue is not obvious, because OWSTIMER.exe (the SharePoint Timer service) remains running in the Windows Services.msc console, so no errors appear in the Application event log or in the ULS logs. However, since the logical Timer Service instance is Offline on the server, timer jobs are not processed. This applies to all timer jobs, not only WSP-related ones.
  2. Following on from the point above, I don't believe the issue is with the WSP solution, as we were able to deploy it successfully right after bringing the Offline Timer Service instance back Online.
Review:
  • The logs show that the timer job for the solution deployment was created, but the deployment itself never went through.
02/20/2018 10:23:10.54  PowerShell.exe (0x0A2C)  0x0A5C  SharePoint Foundation  PowerShell  6tf0  Medium  Entering ProcessRecord Method of install-spsolution.  9a36ccdf-5135-4057-b3a7-90a009e136f4

02/20/2018 10:23:10.88  PowerShell.exe (0x0A2C)  0x0A5C  SharePoint Foundation  Topology  8uav  Verbose  Solution Deployment : Created timer job for branding.wsp, id : 8b0ebc5b-c659-4105-8e7a-cf62a3458360  9a36ccdf-5135-4057-b3a7-90a009e136f4

02/20/2018 10:23:10.88  PowerShell.exe (0x0A2C)  0x0A5C  SharePoint Foundation  Topology  8ucn  Verbose  Solution Deployment : Deleting OperationStatus object for solution branding.wsp  9a36ccdf-5135-4057-b3a7-90a009e136f4

02/20/2018 10:23:10.88  PowerShell.exe (0x0A2C)  0x0A5C  SharePoint Foundation  PowerShell  6tf0  Medium  Leaving ProcessRecord Method of install-spsolution.  9a36ccdf-5135-4057-b3a7-90a009e136f4
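
To confirm from PowerShell that the deployment job was created but never processed, the solution object itself can be inspected. This is a minimal sketch, assuming it runs on a farm server with the SharePoint snap-in available and that the solution is named branding.wsp as in the log entries above. It only reads state and changes nothing.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell session
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Solution name taken from the ULS entries above
$solution = Get-SPSolution -Identity "branding.wsp"

# A job that exists while the solution stays undeployed may indicate
# that the deployment timer job is queued but not being picked up
$solution | Format-List Name, Deployed, DeploymentState, JobExists, JobStatus, LastOperationResult, LastOperationDetails

# Servers on which the solution is currently deployed
$solution.DeployedServers | ForEach-Object { $_.Name }
```

If the affected server is missing from the deployed servers list while the job still exists, that is consistent with the timer job not being processed on that server.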

Explanation:

  • Each SharePoint server has a physical Timer service (OWSTIMER.exe), a Windows-level service responsible for processing all timer jobs. Each SharePoint server also has a logical SharePoint Timer Service instance, which is the piece that resides in the SharePoint code base.
  • The logical Timer Service instances must be Online on all servers for SharePoint to be able to process timer jobs.
  • We ran the following script to check whether the Timer Service instances were Online on all servers and found that the affected server's Timer Service instance was set to Disabled.
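# Load the SharePoint snap-in when not running in the SharePoint Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$farm = Get-SPFarm
# Collect every Timer Service instance whose status is not Online
$disabledTimers = $farm.TimerService.Instances | Where-Object { $_.Status -ne "Online" }
if ($null -ne $disabledTimers)
{
    foreach ($timer in $disabledTimers)
    {
        Write-Host "Timer service instance on server $($timer.Server.Name) is not Online. Current status: $($timer.Status)"
        Write-Host "Attempting to set the status of the service instance to Online"
        # Set the logical instance back to Online and persist the change
        $timer.Status = [Microsoft.SharePoint.Administration.SPObjectStatus]::Online
        $timer.Update()
    }
}
else
{
    Write-Host "All Timer Service instances in the farm are Online. No problems found"
}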
  • We enabled the Timer Service instance with the script above, then deployed the solution again through the Central Administration site, and it succeeded on both servers.
Important Note:
  • When the Timer Service instance is not enabled on a server, all timer jobs on that server are affected. You won't see any errors in the ULS logs, but the script above is a good way to check whether all Timer Service instances in the farm are Online.
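
For a check that only reports and does not modify anything, the same objects can be listed in a table. This is a minimal sketch under the same assumption as the script above (run on a farm server with the SharePoint snap-in available). The column name "Server" is just a label chosen for this example.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell session
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List every Timer Service instance with its server name and status (read-only)
(Get-SPFarm).TimerService.Instances |
    Select-Object @{ Name = "Server"; Expression = { $_.Server.Name } }, Status |
    Sort-Object Server |
    Format-Table -AutoSize
```

Any row whose Status is not Online points to a server where timer jobs are likely not being processed.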
Reference:

 Add this step to the patching activity:
As an additional safeguard, I recommend adding the following step to your patching process, so that you explicitly verify that all Timer Service instances in the farm are Online once patching is complete. You can add this step to your patching process documentation to help avoid issues with the Timer Service instance.

Run the following script from one of the SharePoint servers after the patching process is complete. It checks for any Offline/Disabled Timer Service instances and enables them.