Serverless applications aren't for everyone, because they make monitoring more difficult. While the scaling and cost savings may be worth it for some developers, serverless apps come with higher test requirements and different monitoring strategies than traditional applications.
The best way to ensure serverless applications function as intended is to have consistent back-end tests. While this may not anticipate every scenario, it is a good way to prevent any sort of regression and guarantee that code is operating within expectations in production.
This tip discusses AWS Lambda, a popular serverless computing service. Other serverless computing options include Google Cloud Functions and Azure Functions from Microsoft, as well as offerings from iron.io and IBM Bluemix. The application monitoring and troubleshooting tools described below are not specific to serverless architectures, but their use in this context will benefit teams adopting this approach to application delivery.
The serverless conundrum
It's easy to create status handlers on apps in the Express framework -- for example, a handler that responds to a GET /status request. But in a serverless application, each individual route is completely independent, so DevOps teams must ensure that each microservice works properly in production.
There are two main types of serverless applications: those that take user input and those that operate entirely in the background. Serverless applications that take user input are easier to monitor if the developers configure clients to identify and report the errors that end users receive. In addition to user-generated reports, the team can set up simple HTTP tests using application performance monitoring (APM) tools such as New Relic or AppEnlight; anything that exposes an HTTP endpoint can be tested. Developers using AWS Lambda can proxy any HTTP method to a single Lambda function through the cloud provider's API Gateway product. This makes it possible to add an HTTP HEAD request type that hits the same Lambda function but only tests functionality, without actually committing anything to the database.
For example, if a Lambda function listens to something like POST /checkout, developers could make that same Lambda function also listen to HEAD /checkout and just return a 200 status if everything tests OK, without actually performing any checkout operations. Developers can think of the HEAD method support in the same way they would have previously created a /status handler.
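The pattern above can be sketched as a single handler that dispatches on the HTTP method. This is a minimal illustration, not production code; it assumes the API Gateway proxy-integration event shape, where the method arrives in the event's "httpMethod" field, and the route name /checkout comes from the example above.

```python
# Hypothetical Lambda handler behind API Gateway (proxy integration).
# Assumes the event carries the HTTP method in event["httpMethod"].

def checkout_handler(event, context):
    """Handle POST /checkout, and answer HEAD /checkout as a health check."""
    method = event.get("httpMethod", "")

    if method == "HEAD":
        # Run cheap self-checks (config present, dependencies reachable)
        # without touching the database, then report health.
        return {"statusCode": 200, "body": ""}

    if method == "POST":
        # Real checkout logic would go here; omitted in this sketch.
        return {"statusCode": 201, "body": '{"status": "order created"}'}

    return {"statusCode": 405, "body": '{"error": "method not allowed"}'}
```

An external HTTP test can then issue HEAD /checkout on a schedule and alert on anything other than a 200 response, while POST requests remain untouched.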
Be on alert
The application team must be notified automatically when something goes wrong. This is where APM and logging tools, such as AppEnlight, or error tracking platforms, such as Errorception, come in. These services can run in the client's web browser and track any back-end issues that end users encounter. In a perfect world, tests identify issues before they affect an end user, but users are unpredictable, and some things are nearly impossible to anticipate on live applications -- server-based and serverless alike. It's still better to find out about an error automatically than to rely on users to report it.
Additionally, for Lambda functions that sit behind a load balancer, the application team can enable load balancer access logs. The team can then search through those logs to identify issues, such as HTTP 500 Internal Server Error occurrences, that may prevent client logs from being reported; this aids troubleshooting and post-mortems.
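A search like this can be a one-off script rather than a full pipeline. The sketch below assumes the standard Application Load Balancer access log layout, in which the ninth space-separated field is the status code the load balancer returned to the client:

```python
# Sketch: scan downloaded load balancer access log lines for 5xx errors.
# Assumes the standard ALB access log layout, where the ninth
# space-separated field is the ELB status code.

def find_server_errors(log_lines):
    """Return (status, line) pairs for entries with 5xx status codes."""
    errors = []
    for line in log_lines:
        fields = line.split(" ")
        if len(fields) > 8 and fields[8].startswith("5"):
            errors.append((fields[8], line))
    return errors
```

Filtering for 5xx entries this way surfaces failures even when the client never got far enough to report an error itself.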
Build a log house
All Lambda logs are easily accessible in Amazon CloudWatch Logs, and alerts should be configured to notify the DevOps team automatically when an issue occurs. This log-based alerting also works for back-end systems that don't have user activity. DevOps teams should configure Amazon Simple Notification Service (SNS) notifications for both Lambda function failures and known error responses, such as logs that indicate a server error was reported to the user.
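One common way to wire this up is a CloudWatch alarm on the function's built-in Errors metric that publishes to an SNS topic. The sketch below only builds the parameter dictionary; the function name and topic ARN are placeholders, and in practice the dictionary would be passed to the boto3 CloudWatch client's put_metric_alarm(**params).

```python
# Sketch: parameters for a CloudWatch alarm on a Lambda function's
# Errors metric, notifying an SNS topic on failure. Names and the
# topic ARN are illustrative placeholders.

def lambda_error_alarm(function_name, sns_topic_arn, threshold=1):
    """Build put_metric_alarm parameters for Lambda invocation errors."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,              # evaluate in five-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

With the SNS topic fanned out to email or a chat webhook, a single failed invocation is enough to page the team.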
Logs can be automatically forwarded to a Lambda function and fed into log management platforms such as AppEnlight or Loggly. It's easier to track down issues when the administrator has all the logs for a request available in one place: from the client side, from AWS Elastic Load Balancing (ELB) and from Lambda.
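The forwarding step can itself be a small Lambda function subscribed to a CloudWatch Logs log group. CloudWatch delivers each batch as base64-encoded, gzipped JSON in the event's "awslogs" field; the sketch below decodes that payload and collects the messages, leaving the actual shipping to a platform of choice as a comment.

```python
import base64
import gzip
import json

# Sketch: a Lambda function subscribed to a CloudWatch Logs log group.
# CloudWatch delivers log batches as base64-encoded, gzipped JSON in
# event["awslogs"]["data"]; each decoded record sits under "logEvents".

def forward_logs_handler(event, context):
    """Decode a CloudWatch Logs payload and return its log messages."""
    payload = base64.b64decode(event["awslogs"]["data"])
    data = json.loads(gzip.decompress(payload))
    messages = [e["message"] for e in data.get("logEvents", [])]
    # A real handler would ship `messages` to Loggly, AppEnlight, etc.
    return messages
```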
Verify a successful deployment by tailing the function's log files afterward. I do this with the Lambda orchestration tool Apex: after running "apex deploy <functionName>", I use "apex logs -f <functionName>", which yields a scrolling list of logs just written by that function.
Reaching the apex
Apex is a Lambda orchestration tool, as well as a shim that allows developers to run Go code directly on AWS Lambda. Apex is for Lambda what configuration management tool Ansible is for AWS Elastic Compute Cloud. It allows the user to build and deploy packages to AWS with versioning.
If an error or abnormal-looking pattern appears, the troubleshooter can quickly revert to the old version, set alerts for that particular error log and fix the problem. When debugging Lambda function calls, log the event input to the console so developers can write test cases for anything that failed.
AWS services such as ELB and CloudFront offer access and error logs. IT teams can monitor logs and create alerts whenever there are excessive amounts of errors, or a particular type of error that requires attention.
Customize how you say sorry
There's a good reason to use a content delivery network, whether the application runs on AWS Lambda behind CloudFront or on another serverless back end. Static files in serverless applications running on AWS Lambda are likely served from Amazon CloudFront. CloudFront offers the ability to create custom error pages, which the application team can set up to trigger automatic alerts when something goes wrong.
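In the CloudFront API, custom error pages live in the distribution config's CustomErrorResponses block. The fragment below is illustrative, assuming a branded 500 page stored at a hypothetical /errors/500.html path in the origin:

```python
# Sketch: the CustomErrorResponses portion of a CloudFront distribution
# config. The page path and TTL are illustrative; in practice this
# fragment goes into the distribution config passed to CloudFront
# (via the API, CloudFormation or the console).

CUSTOM_ERROR_RESPONSES = {
    "Quantity": 1,
    "Items": [
        {
            "ErrorCode": 500,                        # origin error to intercept
            "ResponsePagePath": "/errors/500.html",  # branded static error page
            "ResponseCode": "500",                   # status returned to the client
            "ErrorCachingMinTTL": 60,                # cache the error page briefly
        }
    ],
}
```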
Customers appreciate knowing that the application team was alerted to fix the issue. It's also a failsafe way to make sure DevOps teams don't miss any issues. CloudFront can even sit in front of API Gateway and return custom error pages even if everything else fails.
What's out of line?
One difficulty with back-end applications that lack HTTP endpoints is making sure things still work when active testing isn't possible. One solution is to log common statistics that are usually stable. For example, ACI tracks the total articles ingested by date and time. DevOps teams are notified if the number of articles increases or decreases by more than 15%, or if no articles are processed within 15 minutes. All of these alerts are configured with CloudWatch custom metrics.
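The two checks described above reduce to a few lines of threshold logic. This is a sketch mirroring the ACI example, not its actual implementation; the function names and limits are illustrative, and in practice the results would feed CloudWatch custom metrics and alarms.

```python
from datetime import datetime, timedelta

# Sketch of the checks described above: flag a run when the ingested
# count swings more than 15% from the previous count, or when nothing
# has been processed for 15 minutes. Names and limits are illustrative.

def volume_anomaly(previous_count, current_count, pct_limit=0.15):
    """True if the count changed by more than pct_limit in either direction."""
    if previous_count == 0:
        return current_count > 0
    change = abs(current_count - previous_count) / previous_count
    return change > pct_limit

def stalled(last_processed_at, now, max_gap_minutes=15):
    """True if nothing has been processed within the allowed gap."""
    return now - last_processed_at > timedelta(minutes=max_gap_minutes)
```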
Dashboards, from vendors such as Klipfolio, display useful statistics. For example, a dashboard can track how many deliveries the application performed for any given client. These graphs let a DevOps team visualize changes and spot potential issues. ACI might notice a large dip in stories, but also that the same dip happened a year earlier around the start of the school year, when bloggers commonly slow down. Comparative patterns like this are often easier for humans to spot than to encode in an automated analytics system, so have people review statistics as a secondary check alongside error trackers, APM systems and log analytics.