Intermittent Issue with SpeedGrader Previews in Canvas
Incident Report for John Carroll University
Postmortem

Overview
Between 5:40 AM MT and 11:33 AM MT on April 28, 2022, users were unable to access and view documents via DocViewer, primarily through SpeedGrader. The resources available to the service were insufficient to process requests from the number of users accessing DocViewer files at the time. Additional resources were added throughout the morning until service was consistent for users again.

Details
Recent Canvas background maintenance has included moving services to a new system that automates resource scaling, service deployment, and other management work. CanvaDocs moved to this system at 5:40 AM MT on April 28. The move itself was not a problem, but the metrics that determine how much of each resource is needed to handle expected user load are different in the new system. We underestimated some of these metrics during the move, which left too few resources available to handle incoming user requests. At 5:45 AM MT, users started seeing messages stating “Service is currently unavailable. Try again later” when attempting to access documents in DocViewer, especially in SpeedGrader. Our DocViewer engineers were notified of these issues soon after via automated alerts. Once they found that resources were too low to handle incoming requests, they began manually adding more resources and updating configurations within the new system. The first adjustment was completed at 10:12 AM MT and service temporarily returned to normal, but a further adjustment was needed when user requests increased soon after. That adjustment was completed by 11:15 AM MT, and users were able to access documents normally again by 11:33 AM MT.

Mitigation
Manually adding resources to handle user load, along with updating configurations within our new auto-scaling system for DocViewer, allowed the service to run as it should once again. With other services also moving to this new system, we are working on providing better documentation, training, and guidance to engineers across those services as they make the move. This will include information on how to better plan for expected usage and provision the correct number of resources to handle user requests for each service.
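
As an illustration of the kind of capacity estimate described above, the following is a minimal sketch and not the actual process or tooling used for DocViewer. It assumes Little's law (requests in flight = arrival rate x mean service time) plus a safety margin. The function name, its parameters, and all numbers are hypothetical.

```python
import math


def estimate_replicas(peak_rps, mean_service_time_s, concurrency_per_replica,
                      headroom=0.3, min_replicas=1):
    """Estimate how many service instances are needed at peak load.

    Hypothetical helper for illustration only. All inputs are assumptions
    that would have to come from real measurements:
      peak_rps                - expected peak requests per second
      mean_service_time_s     - average seconds spent handling one request
      concurrency_per_replica - requests one instance can handle at once
      headroom                - extra fraction of capacity kept in reserve
      min_replicas            - floor so the service never scales to zero
    """
    if concurrency_per_replica <= 0:
        raise ValueError("concurrency_per_replica must be positive")
    if peak_rps < 0 or mean_service_time_s < 0 or headroom < 0:
        raise ValueError("load, service time, and headroom must be non-negative")

    # Little's law: average number of requests in flight at peak.
    in_flight = peak_rps * mean_service_time_s

    # Add the safety margin, then divide by what one instance can absorb.
    needed = math.ceil(in_flight * (1 + headroom) / concurrency_per_replica)

    return max(min_replicas, needed)


if __name__ == "__main__":
    # Made-up numbers: 200 requests/s, 1.5 s per request, 8 concurrent
    # requests per instance, 30% headroom.
    print(estimate_replicas(200, 1.5, 8, headroom=0.3))
```

An estimate like this is only as good as its inputs. If peak load or per-request cost is underestimated, the result will be too low, which is the general failure mode this report describes.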

Conclusion
We understand the importance of DocViewer to Canvas functionality and the impact this incident had on users trying to access the service. We are working to put safeguards in place to prevent this kind of DocViewer service interruption from happening again, and we apologize for the inconvenience this caused.

Posted May 16, 2022 - 14:29 EDT

Resolved
This incident has been resolved.
Posted May 16, 2022 - 14:27 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 16, 2022 - 13:29 EDT
Identified
Canvas users may experience periodic "Service is currently unavailable. Try again later." messages when using SpeedGrader or previewing documents.

ITS is aware of the problem and is investigating. We will post updates as they become available from Canvas.

Please call the JCU Service Desk at x3005 with any questions or concerns.

Information Technology Services
216-397-3005
Posted May 16, 2022 - 11:34 EDT
This incident affected: Teaching & Learning Tools (Canvas (Learning Management System)).