Monitoring tells you when something is wrong, while Observability enables you to understand why. Monitoring is a subset of Observability and a prerequisite for it: you can only monitor an observable system. We at Boldlink implement Observability solutions tailored to our customers' requirements.
Putting the correct monitoring solutions in place for your AWS Organisations is crucial, but it is only half of the implementation; once data is being collected, you must develop insights from it to achieve Observability of your platform.
AWS Shared Responsibility Model
AWS monitors its facilities, its services and the availability of its APIs, and by default it will not filter or stop traffic unless there is an attack with significant traffic or you have subscribed to AWS Shield Advanced.
It is up to you to create the necessary infrastructure and logic for Monitoring, Alerting and Audit across your AWS Organisation, Accounts and Platforms on AWS. Let us dive in.
Observability
AWS CloudTrail is the service that records the API calls made in every AWS account and Region in your Organisation. These records are important because they allow you to observe patterns of usage and access. Some recommended best practices:
- By default, the events of a new AWS account are available for up to 90 days, but CloudTrail does not keep them beyond that (older events roll off after 90 days); you must deliver them to S3 or CloudWatch Logs for permanent and historical storage. Make it the first step of your setup to enable CloudTrail in all regions and, at a minimum, store the logs in S3 for the long term.
- Once the logs are stored on S3, use S3 Lifecycle policies to move logs to cheaper storage (Amazon S3 Glacier) or to delete logs that fall outside your regulatory requirements (e.g. delete all logs older than five years).
- Centralise the storage of the logs in a separate, restricted AWS account created and configured for that purpose, where the source accounts can only PUT logs; if properly configured, a compromised account will then be unable to change the logs.
- Enable logging in all AWS Regions even if you don’t use them all; one reason to do this is to detect someone using an unauthorised region (which could be an attack/compromise or a configuration error).
- Prevent changes to the CloudTrail configuration using AWS Organizations SCPs (Service Control Policies), which are applied at the Organisation or OU level and override local AWS account administrators, so they cannot change it.
- Enable logging of Lambda and S3 API calls (data events). These are not included by default due to the high number of events they generate. Still, it is highly recommended that you enable them all, especially for Production level accounts.
- Encryption at rest is enabled by default, but log file integrity validation is not; enable it so you can verify that a delivered log file has not been tampered with. A failed validation points to an attack or system compromise.
- Integrate the logs on S3 with Amazon Athena or third-party products to analyse the logs and give you the Observability you are aiming for.
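Several of the recommendations above translate directly into configuration. The following is a minimal sketch, not a definitive setup: it only builds the request payloads in the shapes that boto3's `create_trail`, `put_event_selectors` and `put_bucket_lifecycle_configuration` calls accept, plus an SCP document. The trail name, bucket name, rule ID, the 90-day Glacier transition and the `AWSLogs/` prefix (where CloudTrail writes when no custom key prefix is set) are assumptions for illustration.

```python
import json

FIVE_YEARS_IN_DAYS = 5 * 365  # example retention period taken from the text


def trail_settings(trail_name, bucket_name):
    """Arguments for cloudtrail.create_trail: all regions, validated log files."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket_name,
        "IsMultiRegionTrail": True,       # record every region, used or not
        "EnableLogFileValidation": True,  # digest files make tampering detectable
    }


def data_event_selectors():
    """Argument for cloudtrail.put_event_selectors: add S3 and Lambda data events."""
    return [{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [
            {"Type": "AWS::S3::Object", "Values": ["arn:aws:s3"]},
            {"Type": "AWS::Lambda::Function", "Values": ["arn:aws:lambda"]},
        ],
    }]


def lifecycle_rules(glacier_after_days=90, expire_after_days=FIVE_YEARS_IN_DAYS):
    """Argument for s3.put_bucket_lifecycle_configuration (assumed thresholds)."""
    return {"Rules": [{
        "ID": "archive-then-expire-cloudtrail-logs",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": "AWSLogs/"},
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }]}


def protect_cloudtrail_scp():
    """Service Control Policy document that denies changes to CloudTrail."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyCloudTrailChanges",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
                "cloudtrail:PutEventSelectors",
            ],
            "Resource": "*",
        }],
    }


if __name__ == "__main__":
    # Print the SCP so it can be reviewed before attaching it to an OU.
    print(json.dumps(protect_cloudtrail_scp(), indent=2))
```

In practice you would likely exempt the role that manages the trail from the SCP with a condition; that is left out here to keep the sketch short.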
For your platforms, you will have many different components with different log systems and different log formats.
It is best to approach this the way you would approach a data lake: centralise the data from the different sources and use other systems to generate insights for you to act on.
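One way to picture the centralising step is a small normalisation layer that maps each log format onto a common record before storage. The sketch below is illustrative only: the `LogEvent` schema and the plain-text application log layout (`timestamp level message`) are assumptions, while the CloudTrail field names (`eventTime`, `eventSource`, `eventName`, `sourceIPAddress`) are the ones CloudTrail records use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical common schema shared by every log source."""
    timestamp: str
    source: str
    action: str
    detail: str


def from_cloudtrail(record):
    """Map one CloudTrail record (already parsed from JSON) to a LogEvent."""
    return LogEvent(
        timestamp=record["eventTime"],
        source=record["eventSource"],
        action=record["eventName"],
        detail=record.get("sourceIPAddress", ""),
    )


def from_app_line(line, source="app"):
    """Map an assumed 'timestamp level message' text line to a LogEvent."""
    timestamp, level, message = line.strip().split(" ", 2)
    return LogEvent(timestamp, source, level, message)


if __name__ == "__main__":
    # Sample inputs are made up for illustration.
    events = [
        from_cloudtrail({
            "eventTime": "2024-01-01T10:00:00Z",
            "eventSource": "s3.amazonaws.com",
            "eventName": "PutObject",
            "sourceIPAddress": "203.0.113.10",
        }),
        from_app_line("2024-01-01T10:00:05Z ERROR payment service timed out"),
    ]
    # Once normalised, events from every source can be sorted and queried together.
    for event in sorted(events, key=lambda e: e.timestamp):
        print(event)
```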
AWS Security Hub aggregates findings from many tools, such as GuardDuty, IAM Access Analyzer, Macie and Inspector, and can be expanded with 3rd party solutions. Let us look at the AWS solutions in more detail:
- Amazon GuardDuty detects threats to your infrastructure in near real time, e.g. EC2 login attacks or an open SSH port; for more detail, see the AWS list of threats it detects, which grows as the product evolves.
- AWS IAM Access Analyzer helps you identify the resources in your organisation and accounts, such as Amazon S3 buckets or IAM roles, that are shared with an external entity, allowing you to detect and control unwanted access to your AWS resources, which can be a security risk.
- Amazon Macie uses Machine Learning to scan your S3 contents and detect PII or sensitive information, which could expose your business to an attack either because a bucket isn’t encrypted or because your application isn’t correctly storing and safeguarding your clients’ information.
- Amazon Inspector was designed to scan the contents and configuration of your EC2 instances and detect misconfigurations and security issues.
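Because Security Hub collects findings from all of these tools in one format, a first insight can be as simple as counting active findings per product and severity. This is a rough sketch under assumptions: the filter dictionary follows the shape that boto3's `securityhub.get_findings(Filters=...)` accepts, the helper names are hypothetical, and the sample findings are made up, keeping only the `ProductName` and `Severity.Label` fields of the AWS Security Finding Format.

```python
from collections import Counter


def active_findings_filter(labels=("HIGH", "CRITICAL")):
    """Filters for securityhub.get_findings: active findings with these severities."""
    return {
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
        # Several values on the same field are treated as alternatives.
        "SeverityLabel": [
            {"Value": label, "Comparison": "EQUALS"} for label in labels
        ],
    }


def summarise(findings):
    """Count findings per (product, severity label) pair."""
    return Counter(
        (f.get("ProductName", "unknown"), f["Severity"]["Label"])
        for f in findings
    )


if __name__ == "__main__":
    # Made-up findings, trimmed to the fields the summary needs.
    sample = [
        {"ProductName": "GuardDuty", "Severity": {"Label": "HIGH"}},
        {"ProductName": "GuardDuty", "Severity": {"Label": "HIGH"}},
        {"ProductName": "Macie", "Severity": {"Label": "CRITICAL"}},
    ]
    for (product, label), count in summarise(sample).most_common():
        print(f"{product:10} {label:9} {count}")
```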
A key component of your Observability strategy is Amazon CloudWatch. This service aggregates your logs; for example, you can create filters that match specific keywords and feed the matches into custom metrics, on which you can then define alert conditions.
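The keyword-to-metric-to-alert flow described above can be sketched with a CloudWatch Logs metric filter and a CloudWatch alarm. The code below only builds the request payloads, in the shapes accepted by boto3's `logs.put_metric_filter` and `cloudwatch.put_metric_alarm`; the log group, metric name, namespace and threshold are assumptions, and keywords are restricted to plain alphanumeric terms so they can be used unquoted in the filter pattern.

```python
def keyword_pattern(keywords):
    """Build a filter pattern that matches a log event containing any keyword."""
    for keyword in keywords:
        if not keyword.isalnum():
            raise ValueError(f"only alphanumeric keywords are supported: {keyword!r}")
    return " ".join(f"?{keyword}" for keyword in keywords)


def metric_filter_request(log_group, keywords, metric_name, namespace="Custom/Logs"):
    """Arguments for logs.put_metric_filter: count matching log events."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        "filterPattern": keyword_pattern(keywords),
        "metricTransformations": [{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "1",   # each matching event adds 1 to the metric
            "defaultValue": 0.0,  # report 0 when nothing matches
        }],
    }


def alarm_request(metric_name, namespace="Custom/Logs", threshold=5, period_seconds=300):
    """Arguments for cloudwatch.put_metric_alarm on the custom metric."""
    return {
        "AlarmName": f"{metric_name}-too-high",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


if __name__ == "__main__":
    # Hypothetical log group and metric names.
    print(metric_filter_request("/app/payments", ["ERROR", "Timeout"], "PaymentErrors"))
    print(alarm_request("PaymentErrors"))
```

With these defaults, the alarm would fire when five or more matching log events arrive within a five-minute period; adding an `AlarmActions` entry (for example an SNS topic ARN) is what turns the alarm into a notification.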
Do you have to use only Amazon CloudWatch? No, you can bring your own or use 3rd party solutions to replace or extend it, but if CloudWatch meets your needs, we recommend it, since anything else will be another platform for your teams to manage.
This will be a learning curve as your Platform and AWS journey progresses and more services become available to AWS customers.
Was this list comprehensive or too short? What else should we also include? Let us know in the comments below.