Optimized Search Queries with AWS DDB

Amazon DynamoDB is a fully managed, high-performance NoSQL database service that helps developers build fast, efficient applications without having to manage the underlying infrastructure themselves. It comes with a host of useful features such as automatic backups, encryption by default and auto-scaling.

A popular DynamoDB feature marketed by AWS is “Performance at scale”: keeping database performance consistent as your key growth metrics rise. To make this possible, when data or request volume grows beyond the capacity of a single DynamoDB partition, the load is automatically split and spread across a sufficient number of additional partitions. Features like auto-scaling and data streams help make this process reliable and operationally efficient.

Searching DynamoDB’s Tables

Search is an essential component of cloud-based business applications. App developers need to quickly retrieve the specific information a user asks for and present it in an optimized, cost-effective way. For this, Amazon DynamoDB provides the Scan operation, which reads every item in a table and only then filters the results. As we will see, this can be an inefficient way of querying a fast-growing database.
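To make the difference concrete, here is a minimal sketch contrasting a Scan with a key-based Query. The table name `Orders`, the partition key `customer_id` and the `status` attribute are hypothetical. The helpers only build request parameters; they are meant to be passed to the `scan` and `query` methods of a boto3 DynamoDB client (assumed to be created elsewhere with `boto3.client("dynamodb")`).

```python
from typing import Any, Callable, Dict, List

TABLE_NAME = "Orders"  # hypothetical table name


def build_scan_params(status: str) -> Dict[str, Any]:
    """Scan: every item in the table is read, then the filter discards non-matches.

    Read capacity is consumed for everything that is read, not just what is returned.
    """
    return {
        "TableName": TABLE_NAME,
        "FilterExpression": "#s = :status",
        # The "#s" alias avoids clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":status": {"S": status}},
    }


def build_query_params(customer_id: str) -> Dict[str, Any]:
    """Query: only the items stored under one partition key value are read."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "customer_id = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


def fetch_all(operation: Callable[..., Dict[str, Any]], params: Dict[str, Any]) -> List[Any]:
    """Call `operation` repeatedly, following LastEvaluatedKey until no pages remain."""
    items: List[Any] = []
    params = dict(params)  # copy so the caller's dict is not modified
    while True:
        page = operation(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        params["ExclusiveStartKey"] = last_key
```

With a real client, usage would look like `fetch_all(client.query, build_query_params("c-1"))` versus `fetch_all(client.scan, build_scan_params("SHIPPED"))`. The Query only works because `customer_id` is assumed to be the table's partition key; a search on an arbitrary attribute falls back to the Scan.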

Unlike relational databases, which allow greater flexibility in query design, DynamoDB requires developers to identify common queries up front and design keys and indexes around them for fast lookups. The keys behind those common queries become “hot keys”. At first, this makes database management very easy, but as production scales and your dataset expands with more edge cases (i.e. more “cold keys”), querying data becomes a cumbersome process. Because the majority of the RCUs (read capacity units) is spent on a few hot keys, attempting to solve this dilemma by over-provisioning throughput to cover the other partitions is an expensive option.

↓ Reduced DynamoDB requests => ↓ Reduced provisioned throughput capacity => ↑ Cost savings
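The relationship above can be put into rough numbers. DynamoDB bills reads in 4 KB units: one read capacity unit covers one strongly consistent read of up to 4 KB, and an eventually consistent read costs half as much. The sketch below applies that rule to the total bytes an operation reads. The item count and item size are made-up figures for illustration, and the estimate ignores per-page rounding.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read unit covers up to 4 KB


def read_units(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Estimate the read units consumed for `bytes_read` bytes of items."""
    blocks = math.ceil(bytes_read / RCU_BLOCK_BYTES)
    return float(blocks) if strongly_consistent else blocks / 2


# Hypothetical table: 1,000,000 items of 1 KB each.
full_scan = read_units(1_000_000 * 1024)   # reads the whole table -> 125000.0
single_query = read_units(20 * 1024)       # reads 20 matching items -> 2.5
```

Under these assumptions a full Scan costs about 50,000 times as much as the targeted Query, which is why cutting unnecessary reads translates directly into lower provisioned capacity.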

Request Throttling

Request throttling is a likely outcome of uneven workload distribution across partitions. To reduce it, AWS recommends setting up a cache, among other solutions. Before doing so, though, the items triggering the throttling must be identified. Amazon CloudWatch Contributor Insights can be used to identify the most accessed and throttled items and partition keys. Amazon officially recommends two caching solutions: DynamoDB Accelerator (DAX) and Amazon ElastiCache.

Built specifically for DynamoDB, DAX is an in-memory caching service that improves application performance by providing very fast response times to data access requests. By keeping frequently accessed data from DynamoDB tables in memory, it speeds up reads many times over. Like DAX, Amazon ElastiCache is a fully managed in-memory service. With ElastiCache, requests forwarded by the application are served from its in-memory data store where possible, reducing the volume of requests reaching DynamoDB partitions.
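The caching idea can be sketched as a cache-aside lookup: check the cache first and go to the database only on a miss. This is an illustration, not how DAX or ElastiCache are implemented. A plain in-process dictionary stands in for the cache, and the `load` callback is a hypothetical stand-in for a DynamoDB read such as GetItem.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Serve reads from memory and fall back to `load` on a miss or expiry."""

    def __init__(
        self,
        load: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load          # stand-in for a DynamoDB read
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.misses = 0            # number of reads that reached the database

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no database request
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Every hit is a request a hot partition never sees. The time-to-live bounds how stale a cached item can get, so it should be chosen according to how quickly the underlying data changes.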

Available Search Solutions

There are a number of external search solutions available for integration with your DynamoDB data, including Amazon Elasticsearch Service and Amazon CloudSearch. Other options are Apache Solr and Typesense. DynamoDB also supports PartiQL, a SQL-compatible query language.

In this article, we will look at Elasticsearch and CloudSearch, two search engines well suited to searching through all types of data.

1. Amazon CloudSearch

Amazon CloudSearch is a fully managed cloud search service that lets developers configure an application’s search functionality in a highly scalable manner. Providing a seamless filtering experience is easy with its plug-and-play setup, which requires little application logic and so increases developers’ productivity.

Incorporating CloudSearch with DynamoDB involves three (3) simple steps:

Setting up a CloudSearch domain to search DynamoDB data. A domain comes with unique endpoints for querying and must have a unique name. The CloudSearch console provides an easy-to-follow configuration wizard for this step.

Uploading data from DDB to CloudSearch for indexing. Uploaded data is split into batches, and each item’s attributes are mapped to the index fields defined for the domain.

Lastly, sending search requests from your app to CloudSearch. You do have to make sure that the search domain stays in sync with changes to the DDB tables. This can be done automatically or by creating new domains.
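The upload step can be sketched as follows. CloudSearch accepts document batches as a JSON array of operations, each with a `type`, an `id` and, for additions, a `fields` object, with a documented limit of 5 MB per batch. The attribute names and the `id_attribute` parameter here are hypothetical, and the items are assumed to be plain Python dictionaries whose keys already match the index fields configured in the domain.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_BATCH_BYTES = 5 * 1024 * 1024  # CloudSearch's per-batch size limit


def to_add_operation(item: Dict[str, Any], id_attribute: str) -> Dict[str, Any]:
    """Turn one item into a CloudSearch "add" operation."""
    fields = {name: value for name, value in item.items() if name != id_attribute}
    return {"type": "add", "id": str(item[id_attribute]), "fields": fields}


def build_batches(
    items: Iterable[Dict[str, Any]],
    id_attribute: str,
    max_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[str]:
    """Yield JSON batches, starting a new one before the size limit is exceeded."""
    batch: List[str] = []
    size = 2  # the enclosing "[" and "]"
    for item in items:
        encoded = json.dumps(to_add_operation(item, id_attribute))
        extra = len(encoded.encode("utf-8")) + 1  # +1 allows for a separating comma
        if batch and size + extra > max_bytes:
            yield "[" + ",".join(batch) + "]"
            batch, size = [], 2
        batch.append(encoded)
        size += extra
    if batch:
        yield "[" + ",".join(batch) + "]"
```

Each yielded string could then be sent to the domain’s document endpoint, for example with the `upload_documents` method of a boto3 `cloudsearchdomain` client and `contentType="application/json"`. Keeping the domain in sync would also require `delete` operations for removed items, which this sketch leaves out.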

While CloudSearch is an efficient search engine capable of filtering large datasets cost-effectively, Elasticsearch is a more powerful and popular alternative with richer, more customizable features.

2. Amazon Elasticsearch

Amazon Elasticsearch Service (since renamed Amazon OpenSearch Service) is a sophisticated, fully managed search and analytics service that lets developers run the richer search queries their apps need. It provisions clusters of Elasticsearch, an open-source search engine with simple REST API endpoints.
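As an illustration of those REST endpoints, the sketch below builds a full-text search request using Elasticsearch’s `multi_match` query against the `_search` endpoint. The index name and field names are hypothetical, and actually sending the request is left out because it depends on the domain endpoint and on how the domain’s access policy requires requests to be authenticated.

```python
from typing import Any, Dict, List


def build_search_request(
    index: str, text: str, fields: List[str], size: int = 10
) -> Dict[str, Any]:
    """Describe a full-text search over several fields of one index."""
    return {
        "method": "POST",
        "path": f"/{index}/_search",
        "body": {
            "size": size,  # maximum number of hits to return
            "query": {"multi_match": {"query": text, "fields": fields}},
        },
    }


# Hypothetical usage: search order titles and notes for a phrase.
request = build_search_request("orders", "blue kettle", ["title", "notes"])
```

A query like this matches on analyzed text across several attributes at once, which is the kind of search that DynamoDB’s key-based access cannot express.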

Connecting Elasticsearch with DynamoDB

Here’s a sample process for setting up richer, full-text search capabilities for your DynamoDB data using Elasticsearch and AWS Lambda.

First, you select the DynamoDB tables you would like to index. The tables chosen will most likely contain the data most frequently queried by your application. Again, you can find these items using Contributor Insights.
Next, you enable DynamoDB Streams on the designated tables. DDB Streams captures time-ordered information about changes to items in DynamoDB tables. Combined with AWS Lambda, this lets us audit data flows and trigger events when hand-picked items change. An alternative to DDB Streams is Kinesis Data Streams, which is better suited to real-time processing of events on AWS.
We then create an IAM role with a basic policy containing at least Lambda, DDB and Elasticsearch execution permissions.
Lastly, a Lambda function is created and deployed. There are a number of ddb-es Lambda blueprints available for this purpose. To test the Lambda function, AWS recommends the Amazon ES console or Kibana, a powerful data visualization and navigation interface for Elasticsearch.
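The Lambda function from the last step could look roughly like the sketch below, which is not one of the blueprints mentioned above. It converts the records in a DynamoDB Streams event into the newline-delimited body expected by Elasticsearch’s `_bulk` endpoint. The index name and key attribute are hypothetical, the stream is assumed to include new item images, the typeless bulk format of Elasticsearch 7 or later is assumed, and the HTTP call to the domain is omitted because it depends on the endpoint and on how requests are signed.

```python
import json
from typing import Any, Dict, List

INDEX_NAME = "orders"        # hypothetical index name
KEY_ATTRIBUTE = "order_id"   # hypothetical partition key of the table


def from_attribute_value(value: Dict[str, Any]) -> Any:
    """Convert DynamoDB typed JSON ({"S": "x"}, {"N": "1"}, ...) to plain Python."""
    type_tag, raw = next(iter(value.items()))
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        return float(raw) if "." in raw or "e" in raw.lower() else int(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attribute_value(v) for v in raw]
    if type_tag == "M":
        return {k: from_attribute_value(v) for k, v in raw.items()}
    if type_tag == "SS":
        return list(raw)
    if type_tag == "NS":
        return [from_attribute_value({"N": n}) for n in raw]
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_bulk_lines(event: Dict[str, Any]) -> List[str]:
    """Map INSERT/MODIFY records to index actions and REMOVE records to deletes."""
    lines: List[str] = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        doc_id = str(from_attribute_value(change["Keys"][KEY_ATTRIBUTE]))
        meta = {"_index": INDEX_NAME, "_id": doc_id}
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": meta}))
        else:
            image = change["NewImage"]  # present only if the stream includes new images
            document = {k: from_attribute_value(v) for k, v in image.items()}
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(document))
    return lines


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: build the bulk payload for this batch of changes."""
    lines = to_bulk_lines(event)
    # The bulk API expects newline-delimited JSON that ends with a newline.
    payload = "\n".join(lines) + "\n" if lines else ""
    # Sending `payload` to the domain's /_bulk endpoint is intentionally left out.
    return {"records": len(event.get("Records", [])), "payload": payload}
```

Using the item’s key as the document `_id` means a MODIFY event overwrites the earlier copy of the document, so replaying a batch does not create duplicates.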

An alternative to the AWS Lambda + DynamoDB Streams + Elasticsearch combination is the Logstash plugin for DynamoDB. The Logstash pipeline can consume data from DynamoDB and DDB Streams, then load it, along with subsequent changes, into Elasticsearch clusters for indexing and for enhanced structured and full-text queries. Again, appropriate IAM credentials need to be configured and assigned.

Summary

Amazon DynamoDB is a simple, secure and efficient way to configure data models for our application in a manner that guarantees fast, predictable performance and scalability. It does, however, come with some downsides, especially its limited querying capabilities. While the added expense of a search engine on top of the usual database cost might be discouraging, integrating an efficient, modern full-text search solution such as Elasticsearch gives your data-driven application a significant performance boost and much richer search abilities.

An efficient search solution is an essential part of your application’s architectural performance. It’s therefore important to examine which of the available solutions provides the best business value.

At Boldlink, our AWS experts are available to consult with you and provide solutions tailored to your needs.

Book a call with us today.