Troubleshooting TCM Monitor and Snapshot Job Errors with errorDetails

2026-06-22

When working with Tenant Configuration Management, or TCM, there are two asynchronous experiences where administrators typically need deeper troubleshooting information: monitor runs and snapshot jobs. Either one can finish as failed or partiallySuccessful. The default Microsoft Graph response tells you that something went wrong, but it usually does not tell you what to fix.

The missing piece is the errorDetails property. It is not returned by default. You must ask Microsoft Graph for it explicitly by using the $select query parameter.

Figure: TCM monitor runs and snapshot jobs require $select=errorDetails to retrieve actionable troubleshooting information.

The short version

If you already have the identifier of the failing object, call the object directly and select errorDetails:

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationMonitoringResults/{configurationMonitoringResultId}?$select=id,monitorId,runStatus,errorDetails

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationSnapshotJobs/{configurationSnapshotJobId}?$select=id,displayName,status,errorDetails

The Microsoft Graph documentation uses the camel-case property name errorDetails. If you have seen examples written as ?$select=errordetails, I recommend keeping the documented casing in your automation to avoid surprises.

Why the default response is not enough

Both monitor results and snapshot jobs expose a status field. That field tells you the outcome of the operation, but not necessarily the reason behind the outcome. For example, the following response tells us that the monitor run failed:

{
  "id": "66fa1689-22cb-49c1-8b5a-c94822b7b13b",
  "monitorId": "<monitorId>",
  "tenantId": "<tenantId>",
  "runInitiationDateTime": "2026-06-22T12:00:36.1084955Z",
  "runCompletionDateTime": "2026-06-22T12:01:11.1084955Z",
  "runStatus": "failed",
  "driftsCount": 0
}

That is a good health signal, but it is not yet a troubleshooting signal. If this is part of an automated monitoring pipeline, you do not want your alert to simply say "the monitor failed." You want it to say which resource type failed, which instance failed, and what Microsoft Graph reported as the error.

What errorDetails contains

The errorDetails collection is designed to provide exactly that missing context. For monitor runs, the documented type is errorDetail. For snapshot jobs, the documentation describes the property as the details of errors related to the reasons why the snapshot cannot complete. In practice, the important troubleshooting dimensions are:

resourceType: the TCM resource type that failed, such as microsoft.teams.meetingpolicy or microsoft.exchange.transportrule.
resourceInstanceName: the specific resource instance that caused the issue, when available.
errorMessage: the error text that points you toward the remediation.

Figure: The three important fields in a TCM errorDetails response: resourceType, resourceInstanceName, and errorMessage.

A selected response for a failed monitor run could look like this:

{
  "id": "<configurationMonitoringResultId>",
  "monitorId": "69b6b9ba-20c9-4ffb-beef-263c07063222",
  "runStatus": "failed",
  "errorDetails": [
    {
      "resourceType": "microsoft.teams.meetingpolicy",
      "resourceInstanceName": "Global",
      "errorMessage": "Access Denied."
    }
  ]
}

That is much more actionable. Instead of guessing whether the issue is with the monitor definition, the TCM service principal, a workload role, or a resource-specific problem, you now have a concrete starting point.
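As a rough illustration, the following sketch parses a response shaped like the example above and reads the three fields from each entry. The JSON is the sample from this post with a placeholder ID, not live Graph output, and the output line format is my own choice.

```powershell
# Sample payload shaped like the example in this post; not live Graph output.
$json = @'
{
  "id": "<resultId>",
  "runStatus": "failed",
  "errorDetails": [
    {
      "resourceType": "microsoft.teams.meetingpolicy",
      "resourceInstanceName": "Global",
      "errorMessage": "Access Denied."
    }
  ]
}
'@

$result = $json | ConvertFrom-Json

# Build one readable line per entry in the errorDetails collection.
$lines = foreach ($detail in $result.errorDetails) {
    '{0} [{1}]: {2}' -f $detail.resourceType, $detail.resourceInstanceName, $detail.errorMessage
}
$lines
```

A line in this shape is usually enough to make an alert subject useful on its own.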

Finding failed monitor runs

Monitor run history is exposed through configurationMonitoringResults. The list operation supports $select, $filter, $orderby, and $top, so you can quickly find recent failed or partially successful runs.

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationMonitoringResults?$filter=runStatus eq 'failed'&$orderby=runInitiationDateTime desc&$top=10

Once you have the failed result identifier, retrieve the error details:

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationMonitoringResults/{configurationMonitoringResultId}?$select=id,monitorId,runInitiationDateTime,runCompletionDateTime,runStatus,driftsCount,errorDetails

You can also filter for partially successful monitor runs:

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationMonitoringResults?$filter=runStatus eq 'partiallySuccessful'&$orderby=runInitiationDateTime desc

This is important because a partially successful run may still contain useful drift results for some resources while one or more resources failed. Treating partiallySuccessful as "good enough" in automation can hide resource coverage gaps.
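To make sure automation checks both outcomes, a small helper can build the list URI for each status. This is a minimal sketch: the function name Get-TcmMonitorResultListUri and its parameters are hypothetical, and the query options are the ones shown in the requests above.

```powershell
# Hypothetical helper: builds the list URI for monitor results with a given runStatus.
function Get-TcmMonitorResultListUri {
    param(
        [Parameter(Mandatory)] [string] $RunStatus,
        [int] $Top = 10
    )
    $base = '/v1.0/admin/configurationManagement/configurationMonitoringResults'
    # Single-quoted segments keep PowerShell from expanding $filter, $orderby, and $top.
    $base + '?$filter=' + "runStatus eq '$RunStatus'" +
        '&$orderby=runInitiationDateTime desc' +
        '&$top=' + $Top
}

# Check both non-successful outcomes, not only 'failed'.
$uris = foreach ($status in 'failed', 'partiallySuccessful') {
    Get-TcmMonitorResultListUri -RunStatus $status
}
$uris
```

Each URI can then be passed to Invoke-MgGraphRequest -Method GET -Uri in a connected session.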

Finding failed snapshot jobs

Snapshot jobs are exposed through configurationSnapshotJobs. Snapshot jobs are asynchronous, so the normal workflow is to create the snapshot, poll the job until it completes, and then download the snapshot from the resourceLocation value when the job succeeds or partially succeeds.
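The polling step of that workflow could be sketched as follows. Wait-TcmSnapshotJob is a hypothetical helper, and I am assuming that notStarted and running are the in-progress status values, so verify those against the current documentation. The job lookup is passed in as a script block so the loop can be tried with a stub before pointing it at Graph.

```powershell
# Hypothetical helper: polls until the job leaves the assumed in-progress states.
# $GetJob is any script block that returns an object with a 'status' property,
# for example a wrapper around Invoke-MgGraphRequest.
function Wait-TcmSnapshotJob {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetJob,
        [int] $DelaySeconds = 30,
        [int] $MaxAttempts = 20
    )
    # Assumption: these two values mean "still working"; verify against the docs.
    $inProgress = 'notStarted', 'running'
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $job = & $GetJob
        if ($inProgress -notcontains $job.status) { return $job }
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }
    throw "Snapshot job still in progress after $MaxAttempts attempts."
}

# Usage with a stub that reports 'running' twice and then a finished job.
$script:calls = 0
$job = Wait-TcmSnapshotJob -DelaySeconds 0 -GetJob {
    $script:calls++
    if ($script:calls -lt 3) { [pscustomobject]@{ status = 'running' } }
    else { [pscustomobject]@{ status = 'succeeded'; resourceLocation = '<resourceLocation>' } }
}
$job.status
```

Because any status outside the in-progress list ends the loop, a failed or partiallySuccessful job is returned to the caller rather than polled forever.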

To list recent failed snapshot jobs:

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationSnapshotJobs?$filter=status eq 'failed'&$orderby=createdDateTime desc&$top=10

Then call the specific job and select errorDetails:

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationSnapshotJobs/<jobId>?$select=id,displayName,status,createdDateTime,completedDateTime,errorDetails

For partially successful jobs, use the same pattern:

GET https://graph.microsoft.com/v1.0/admin/configurationManagement/configurationSnapshotJobs?$filter=status eq 'partiallySuccessful'&$orderby=createdDateTime desc

The snapshot job status enumeration includes partiallySuccessful as an evolvable enum value. If you are building strongly typed clients, make sure your code does not break when Graph returns future enum members. For raw REST calls, this usually just means treating status values as strings and not assuming that today's list is the complete list forever.
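One way to stay tolerant of future values in a script is to map known statuses to an action and send everything else to a default branch. This is only a sketch: Get-TcmStatusAction and the action names are hypothetical, and succeeded is my assumption for the success value.

```powershell
# Hypothetical helper: maps a status string to a coarse triage action.
function Get-TcmStatusAction {
    param([string] $Status)
    switch ($Status) {
        'failed'              { 'triage' }
        'partiallySuccessful' { 'triage' }
        'succeeded'           { 'ok' }      # assumed success value
        # Unknown or future members land here instead of throwing.
        default               { 'review' }
    }
}

Get-TcmStatusAction -Status 'partiallySuccessful'   # triage
Get-TcmStatusAction -Status 'someFutureValue'       # review
```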

Using Microsoft Graph PowerShell

If you use the Microsoft Graph PowerShell SDK, the generated cmdlets for these resources are in the Microsoft.Graph.ConfigurationManagement module and expose the same resources as the REST API. If the cmdlet in your installed version does not offer a convenient way to select the non-default property, you can always call the REST endpoint directly with Invoke-MgGraphRequest.

Import-Module Microsoft.Graph.Authentication
Import-Module Microsoft.Graph.ConfigurationManagement

Connect-MgGraph -Scopes 'ConfigurationMonitoring.Read.All'

$resultId = '<resultId>'
$uri = "/v1.0/admin/configurationManagement/configurationMonitoringResults/$resultId" +
       '?$select=id,monitorId,runStatus,errorDetails'

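# Request PSObject output so the errorDetails entries expose properties that Format-Table can read.
$result = Invoke-MgGraphRequest -Method GET -Uri $uri -OutputType PSObject
$result.errorDetails | Format-Table resourceType, resourceInstanceName, errorMessage -AutoSize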

For snapshot jobs, only the path changes:

$snapshotJobId = '<jobId>'
$uri = "/v1.0/admin/configurationManagement/configurationSnapshotJobs/$snapshotJobId" +
       '?$select=id,displayName,status,errorDetails'

# Request PSObject output here as well, for the same reason.
$job = Invoke-MgGraphRequest -Method GET -Uri $uri -OutputType PSObject
$job.errorDetails | Format-Table resourceType, resourceInstanceName, errorMessage -AutoSize
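Since only the path and the selected properties differ, the two snippets can be folded into one helper that builds the URI. This is a sketch under my own naming: Get-TcmErrorDetailUri and its Kind values are hypothetical, not part of the SDK.

```powershell
# Hypothetical helper: builds the $select URI for either TCM object type.
function Get-TcmErrorDetailUri {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('MonitorResult', 'SnapshotJob')]
        [string] $Kind,
        [Parameter(Mandatory)] [string] $Id
    )
    if ($Kind -eq 'MonitorResult') {
        $collection = 'configurationMonitoringResults'
        $select = 'id,monitorId,runStatus,errorDetails'
    }
    else {
        $collection = 'configurationSnapshotJobs'
        $select = 'id,displayName,status,errorDetails'
    }
    # The single-quoted segment keeps $select from being expanded as a variable.
    "/v1.0/admin/configurationManagement/$collection/$Id" + '?$select=' + $select
}

$uri = Get-TcmErrorDetailUri -Kind SnapshotJob -Id '<jobId>'
$uri
# Then, in a connected session: Invoke-MgGraphRequest -Method GET -Uri $uri
```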

Building better automation around TCM errors

The practical value of errorDetails shows up when you build it into your operational workflows. A monitor failure should not require someone to manually open Graph Explorer, re-run the query, and discover that the service principal is missing a workload role. Your automation can do that triage automatically.

Figure: Troubleshooting loop: find a failed TCM run, select errorDetails, map the cause, fix the issue, and rerun.

At a minimum, I recommend logging these fields whenever a monitor result or snapshot job is not fully successful:

Object identifier: the monitoring result ID or snapshot job ID.
Operation status: failed or partiallySuccessful.
Resource type: the TCM resource type that failed.
Resource instance: the resource instance name, when present.
Error message: the text returned by Graph.
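The fields listed above can be flattened into one record per error detail. The sketch below assumes the input is a parsed monitor result or snapshot job that was retrieved with errorDetails selected. ConvertTo-TcmErrorRecord and the output property names are hypothetical.

```powershell
# Hypothetical helper: emits one flat record per entry in errorDetails.
function ConvertTo-TcmErrorRecord {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $StatusProperty   # 'runStatus' or 'status'
    )
    foreach ($detail in $InputObject.errorDetails) {
        [pscustomobject]@{
            ObjectId         = $InputObject.id
            Status           = $InputObject.$StatusProperty
            ResourceType     = $detail.resourceType
            ResourceInstance = $detail.resourceInstanceName
            ErrorMessage     = $detail.errorMessage
        }
    }
}

# Sample object shaped like a selected monitor result; not live output.
$sample = [pscustomobject]@{
    id           = '<resultId>'
    runStatus    = 'failed'
    errorDetails = @(
        [pscustomobject]@{
            resourceType         = 'microsoft.teams.meetingpolicy'
            resourceInstanceName = 'Global'
            errorMessage         = 'Access Denied.'
        }
    )
}
$records = @(ConvertTo-TcmErrorRecord -InputObject $sample -StatusProperty runStatus)
$records | ConvertTo-Json
```

For snapshot jobs, pass -StatusProperty status instead.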

From there, you can group errors by resourceType. If every Teams-related resource fails with an access error, you are probably dealing with a missing workload role or permission. If only one resource instance fails, the issue is likely more specific to that object or its configuration. If snapshot jobs fail across every requested resource, look first at the TCM service principal setup and the tenant-level permissions.
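Grouping like this takes only a few lines with Group-Object. The records below are made-up sample data: the instance names Custom1 and Rule A and the second error message are placeholders. The property names are my own convention for flattened log records.

```powershell
# Sample records; in practice these come from the errorDetails collections you logged.
$records = @(
    [pscustomobject]@{ ResourceType = 'microsoft.teams.meetingpolicy'; ResourceInstance = 'Global'; ErrorMessage = 'Access Denied.' }
    [pscustomobject]@{ ResourceType = 'microsoft.teams.meetingpolicy'; ResourceInstance = 'Custom1'; ErrorMessage = 'Access Denied.' }
    [pscustomobject]@{ ResourceType = 'microsoft.exchange.transportrule'; ResourceInstance = 'Rule A'; ErrorMessage = 'Sample error.' }
)

# Count errors per resource type and list the distinct messages in each group.
$summary = $records | Group-Object -Property ResourceType | ForEach-Object {
    [pscustomobject]@{
        ResourceType = $_.Name
        Count        = $_.Count
        Messages     = ($_.Group.ErrorMessage | Sort-Object -Unique) -join '; '
    }
}
$summary | Sort-Object -Property Count -Descending | Format-Table -AutoSize
```

A resource type with a high count and a single repeated message is the pattern that usually points to a permission or role gap rather than a problem with one object.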

Common remediation patterns

The error message is the source of truth, but most failures tend to fall into a few operational categories:

Missing Microsoft Graph permission: the calling application or user does not have the required ConfigurationMonitoring.Read.All or ConfigurationMonitoring.ReadWrite.All permission for the operation being performed.
Missing TCM service principal permission: the Unified Tenant Configuration Management service principal does not have the workload permissions required to read or evaluate the selected resource type.
Missing workload role: some workloads require the TCM service principal to hold specific Microsoft Entra roles in addition to Graph permissions.
Unsupported or unavailable resource state: the requested resource type or instance cannot be processed in the tenant's current state.
Transient workload issue: a backend workload error may require retrying the monitor run or snapshot job after the service recovers.

The key is to avoid treating all TCM failures as equal. A failed monitor run with one Exchange transport rule error is a different operational problem than a snapshot job where every Teams resource failed due to access denied.
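A first-pass triage rule set might look like the sketch below. The regular expressions are illustrative guesses on my part, not documented TCM error strings, so tune them against the messages you actually see. Get-TcmRemediationHint is a hypothetical name.

```powershell
# Hypothetical triage rules: the patterns are illustrative guesses, not documented error strings.
function Get-TcmRemediationHint {
    param([string] $ErrorMessage)
    # -Regex matching is case-insensitive by default; the first matching rule wins.
    switch -Regex ($ErrorMessage) {
        'access denied|forbidden|unauthorized' { 'Check permissions and workload roles'; break }
        'timeout|timed out|temporar|throttl'   { 'Retry later (possible transient issue)'; break }
        'not supported|unsupported|not found'  { 'Check resource type and tenant state'; break }
        default                                { 'Review the message manually' }
    }
}

Get-TcmRemediationHint -ErrorMessage 'Access Denied.'   # Check permissions and workload roles
```

Anything that falls through to the default branch is worth reading by hand, since the error message remains the source of truth.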

Suggested pattern for scripts

The following pseudo-flow is what I normally recommend for automation:

# 1. Find recent failed or partially successful monitor runs.
# 2. For each result, call the result directly with $select=errorDetails.
# 3. Write one log record per error detail.
# 4. Group by resourceType and errorMessage.
# 5. Route the issue to the right owner or remediation workflow.
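Those steps could be sketched as follows. Get-TcmMonitorErrorSummary is a hypothetical function, and the Graph call is passed in as a script block (for example, { param($u) Invoke-MgGraphRequest -Method GET -Uri $u -OutputType PSObject }) so the flow can be exercised offline. The stub below returns made-up data. Routing (step 5) is left to the caller.

```powershell
# Sketch of steps 1-4. $Request is any script block that takes a URI and returns the parsed response.
function Get-TcmMonitorErrorSummary {
    param([Parameter(Mandatory)] [scriptblock] $Request)
    $base = '/v1.0/admin/configurationManagement/configurationMonitoringResults'
    $records = foreach ($status in 'failed', 'partiallySuccessful') {
        # 1. Find recent runs with this status.
        $list = & $Request ($base + '?$filter=' + "runStatus eq '$status'" + '&$top=10')
        foreach ($run in $list.value) {
            # 2. Re-read the run with errorDetails selected.
            $detail = & $Request ("$base/$($run.id)" + '?$select=id,runStatus,errorDetails')
            # 3. Emit one record per error detail.
            foreach ($e in $detail.errorDetails) {
                [pscustomobject]@{
                    ResultId = $detail.id; Status = $detail.runStatus
                    ResourceType = $e.resourceType; ErrorMessage = $e.errorMessage
                }
            }
        }
    }
    # 4. Group by resource type and message.
    $records | Group-Object -Property ResourceType, ErrorMessage
}

# Stub that fakes Graph responses so the flow can be tried without a tenant.
$stub = {
    param($uri)
    if ($uri -like '*$select=*') {
        [pscustomobject]@{ id = 'r1'; runStatus = 'failed'; errorDetails = @(
            [pscustomobject]@{ resourceType = 'microsoft.teams.meetingpolicy'; errorMessage = 'Access Denied.' }) }
    }
    elseif ($uri -like "*'failed'*") { [pscustomobject]@{ value = @([pscustomobject]@{ id = 'r1' }) } }
    else { [pscustomobject]@{ value = @() } }
}
$groups = Get-TcmMonitorErrorSummary -Request $stub
$groups | Select-Object Count, Name
```

Injecting the request keeps the triage logic testable and makes it easy to swap in retry or paging behavior later.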

The same pattern applies to snapshot jobs. The only differences are the collection name and the status field name:

Scenario	Collection	Status field	Error property
Monitor run	`configurationMonitoringResults`	`runStatus`	`errorDetails`
Snapshot job	`configurationSnapshotJobs`	`status`	`errorDetails`

Conclusion

The errorDetails property is one of those small Graph API details that makes a big operational difference. Without it, a failed TCM monitor run or snapshot job is only a status value. With it, you can see the failing resource type, the affected instance, and the message that points you toward remediation.

If you are building automation around Tenant Configuration Management, make $select=errorDetails part of your failure-handling path. It will make your alerts more useful, your logs more actionable, and your troubleshooting much faster.