Issue Description: This issue started after the Sep 2017 CU upgrade and affects only one server, where the solution is not getting deployed. The other server gets the solution deployed correctly.
- We started seeing this issue after patching. This is common behavior in SharePoint on-premises environments, where the logical SharePoint Timer Service instance goes into Offline/Disabled mode on one or more servers. Ideally, after PSConfig completes successfully, it should move the Timer service instance back to the Online state. However, when PSConfig runs into problems during one of its steps, it can fail to bring all the Timer service instances back online. As a result, timer jobs on the server where the Timer service instance did not come back online are not processed correctly.
- This issue does not surface in an obvious way because OWSTIMER.exe (the SharePoint Timer service) keeps running in the Windows Services console (services.msc), so no errors are logged in the Application event log or in the ULS logs. However, since the logical Timer service instance is Offline on the server, the timer jobs are not processed. This applies to all timer jobs, not only to WSP-related ones.
- As per the logs, the timer job for the solution deployment was created, but the deployment itself never went through.
- Each SharePoint server has a physical Timer service (OWSTIMER.exe), which is a Windows-level service responsible for processing all the timer jobs. Each SharePoint server also has a logical SharePoint Timer Service instance, which is the piece that resides in the SharePoint code base.
- The logical Timer service instances must be Online on all servers for SharePoint to be able to process timer jobs.
- We ran a script to check whether the Timer service instances were Online on all servers and found that the affected server's Timer service instance was set to Disabled.
- We enabled the Timer service instance with the same script and then deployed the solution again via the Central Admin site. This time it was successful on both servers.
- When the Timer service instance is not enabled on a server, all timer jobs on that server are impacted. You won't see any errors in the ULS logs, so a script that checks the status of each Timer service instance is a good way to confirm that all of them are online in the farm.
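The status check described above can be sketched in PowerShell as follows. This is a minimal, read-only sketch and not necessarily the exact script used in this case. It assumes it is run on a SharePoint server by an account with farm administrator rights, and it relies on the SharePoint server object model (`Get-SPFarm`, the farm's `TimerService.Instances` collection, and each instance's `Server` and `Status` properties).

```powershell
# Load the SharePoint snap-in if the current session does not have it yet.
# (The SharePoint Management Shell loads it automatically.)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$farm = Get-SPFarm

# List every logical Timer service instance with its server name and status.
# Any status other than Online means timer jobs on that server are not processed.
$farm.TimerService.Instances |
    Select-Object @{ Name = 'Server'; Expression = { $_.Server.Name } }, Status |
    Format-Table -AutoSize
```

A server that shows Disabled (or any status other than Online) in this output is a candidate for the problem described here, even if OWSTIMER.exe is running on it.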
Add this step to the patching activity:
As an additional safeguard, I recommend adding the following step to your patching process to explicitly make sure that all Timer service instances in the farm are Online after patching is completed. You can add this step to your patching process documentation; it helps prevent Timer service instances from staying offline unnoticed.
After the patching process is complete, run a script from one of the SharePoint servers that checks for any Offline/Disabled Timer service instances and enables them.
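A minimal sketch of such a post-patching step is shown below. It is an illustration under stated assumptions rather than the exact script from this case: it assumes an elevated SharePoint Management Shell session under a farm administrator account, and it brings an instance back online by setting its `Status` to `SPObjectStatus.Online` and calling `Update()`. The variable names are illustrative.

```powershell
# Load the SharePoint snap-in if the current session does not have it yet.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$farm = Get-SPFarm

# Collect the logical Timer service instances that are not Online.
$offlineTimers = @($farm.TimerService.Instances | Where-Object { $_.Status -ne 'Online' })

if ($offlineTimers.Count -gt 0) {
    foreach ($timer in $offlineTimers) {
        Write-Host "Timer service instance on server $($timer.Server.Name) is not Online. Current status: $($timer.Status)"
        Write-Host "Attempting to set the status of the service instance to Online"

        # Mark the instance as Online and persist the change to the farm configuration.
        $timer.Status = [Microsoft.SharePoint.Administration.SPObjectStatus]::Online
        $timer.Update()
    }
}
else {
    Write-Host "All Timer service instances in the farm are Online. No problems found."
}
```

The script is safe to re-run: if every instance is already Online, it only prints the confirmation message. After it changes an instance, it is reasonable to retry the pending solution deployment and confirm that the timer job now completes on that server.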