Create a knowledge base using Amazon OpenSearch

A step-by-step guide to setting up a knowledge base using Amazon OpenSearch, Lambda, and API Gateway.

So, let’s get straight to the point. We are going to build a knowledge base using Amazon OpenSearch, Lambda, and API Gateway on AWS. 🎇

What is a knowledge base?

According to Atlassian, it is a repository or collection of documents representing all the relevant information about a certain topic. A knowledge base can be composed of documents, tutorials, FAQs, and any other content about the topic. Companies use the knowledge base to train new joiners after onboarding, store RCA reports for future issue resolution, store architecture diagrams of various systems, and disseminate legal or HR information (🙁).

From a developer’s perspective, a knowledge base can serve several purposes:

  • Get the new joiners up to speed with various system setup steps (Say it three times! 😎).

  • Keep track of all RCA and issue resolution documents for future reference.

  • Manage CR documents all in one place.

and many more…

What you need to complete this tutorial:

  • An AWS account.

  • AWS CLI installed and at least one profile configured with access-key-id and secret-access-key.

  • Familiarity with SAM (Serverless Application Model).

Two ways you can follow this tutorial:

  1. Clone this repository on your local machine and then deploy the stack using:
$> sam build
$> sam deploy --guided

2. Set up a local SAM project from scratch and then update the code as we go through it.

To better understand how the knowledge base is implemented, we will take route #2 and explain everything from scratch.

So, let’s begin! 🔥

Project setup:

  1. First, open a terminal and create a simple SAM project using this command.
$> sam init
# sam init asks a series of questions.
# The prompts are listed below, and the value at the end of each line
# is the answer to choose.

Which template source would you like to use? 1
Choose an AWS Quick Start application template 1
Use the most popular runtime and package type? (Python and zip) y
Would you like to enable X-Ray tracing on the function(s) in your application? n
Would you like to enable monitoring using CloudWatch Application Insights? n
Would you like to set Structured Logging in JSON format on your Lambda functions? n
Project name [sam-app] opensearch-kb-app

2. Change the current directory to the directory of the project that we have just created.

$> cd opensearch-kb-app

3. Now that the project is set up, we can modify it to create our knowledge base. For that, open the template.yaml file in your favorite code editor and delete the content. Delete the /hello_world directory as well. 💯

In the following steps, we will add blocks of code to the template.yaml file and explain what they do. Simply add them in sequence; I will provide a reference file at the end so that you can check it against your final version.

4. First, we add a description of what this template is going to deploy on AWS.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Opensearch Knowledge Base App

  Sample SAM Template for Opensearch Knowledge Base App

5. Next, in the Globals section, we specify python3.10 as the runtime for all the Lambda functions and set the timeout to 20 seconds. We also create an environment variable called STAGE_NAME so that it can be read from every Lambda function. The !Ref resolves to the value of the StageName parameter, which we define in the next step.

Globals:
  Function:
    Timeout: 20
    Runtime: python3.10
    Environment:
      Variables:
        STAGE_NAME: !Ref StageName
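To make the effect of the global variable concrete, here is a minimal sketch of how a handler might read it. The helper name current_stage and the 'dev' fallback are assumptions for illustration, not part of the original project.

```python
import os


def current_stage(default: str = "dev") -> str:
    """Return the deployment stage injected by the Globals section."""
    # STAGE_NAME is set by the template for every function in the stack.
    # The default only applies when the code runs outside Lambda,
    # for example in a local test.
    return os.environ.get("STAGE_NAME", default)
```

Reading the value through one small helper keeps the fallback behavior in a single place if several handlers need it.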

6. Now, we define the parameter variables that will be used throughout the template. AllPrefix and StageName parameters are used in the name of every resource to identify them as part of this stack and also to avoid name collisions.

Parameters:
  AllPrefix:
    Type: String
    Default: 'knowledge-base'
  StageName:
    Type: String
    Default: 'dev'
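The naming scheme that these two parameters feed into can be sketched in a few lines of Python. The function resource_name is a hypothetical helper that mirrors what a !Sub expression such as '${AllPrefix}-os-secret-${StageName}' would produce with the default values.

```python
def resource_name(all_prefix: str, middle: str, stage_name: str) -> str:
    """Join prefix, resource label, and stage the way the template's !Sub does."""
    return f"{all_prefix}-{middle}-{stage_name}"


if __name__ == "__main__":
    # With the defaults from the Parameters section:
    print(resource_name("knowledge-base", "os-secret", "dev"))
    # knowledge-base-os-secret-dev
```

Deploying the same template with a different StageName therefore yields a parallel set of resources whose names do not collide.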

The following code snippets will be added under the Resources section.

7. First, we create a role that will be used by our Lambda functions. The trust policy (AssumeRolePolicyDocument) lets the Lambda service assume this role, and the Policies section lists the actions the functions may then perform. The ‘lambda:InvokeFunction’ action allows code running under this role to invoke other Lambda functions. API Gateway’s own permission to call our functions does not come from this role; SAM generates a resource-based permission for each Api event that we define later. The role also allows the Lambda functions to interact with the Secrets Manager and OpenSearch services, for example through the boto3 library. We will see examples of it later.

Resources:

  # Main Role
  CustomMainRole:
    Type: AWS::IAM::Role
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'lambda.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      Policies:
        - PolicyName: 'CustomLambdaPolicy'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'lambda:InvokeFunction'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'secretsmanager:*'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'es:*'
                Resource: '*'

8. Now, we need some form of authentication to access the OpenSearch domain that we will create shortly. For this tutorial, we will use master username and password authentication. Instead of storing the username and password in the template file, which is not a secure approach, we will keep them in AWS Secrets Manager and point the OpenSearch domain at that secret, so the same credentials authenticate users who open the OpenSearch dashboard. As you can probably decipher from the definition below, the “username” is set to “admin” and the “password” is auto-generated: 25 characters long, excluding the characters ", @, /, and \.

  OpenSearchSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: !Sub '${AllPrefix}-os-secret-${StageName}'
      Description: 'Password will be generated dynamically'
      GenerateSecretString:
        SecretStringTemplate: !Sub '{"username": "admin"}'
        GenerateStringKey: 'password'
        PasswordLength: 25
        ExcludeCharacters: '"@/\'
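The secret ends up as a JSON string with a username key and a password key. The following is a minimal sketch of how a Lambda function could read it back with boto3; the function names parse_credentials and load_credentials are assumptions for illustration, not the code from the repository.

```python
import json


def parse_credentials(secret_string: str) -> tuple[str, str]:
    """Extract (username, password) from the secret's JSON payload."""
    payload = json.loads(secret_string)
    return payload["username"], payload["password"]


def load_credentials(secret_id: str) -> tuple[str, str]:
    """Fetch the secret from Secrets Manager and parse it."""
    # Imported lazily so that parse_credentials can be used and tested
    # on a machine without boto3 installed.
    import boto3

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return parse_credentials(response["SecretString"])
```

Keeping the parsing separate from the AWS call makes the JSON handling easy to check locally without any credentials.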

9. The big guy now! Our OpenSearch service. Forget the alien configurations below 😱. The gist of this configuration is this: we deploy an OpenSearch domain with a single instance of type t3.small.search, a 15 GiB gp2 EBS volume, encryption at rest, node-to-node encryption, and HTTPS enforced. The AdvancedSecurityOptions block enables username and password authentication backed by the internal user database, and the MasterUserOptions property specifies where to get the master username and password (the Secrets Manager resource defined above).

  OpenSearchServiceDomain:
    Type: AWS::OpenSearchService::Domain
    Properties:
      DomainName: !Sub '${AllPrefix}-os-${StageName}'
      EngineVersion: 'OpenSearch_2.11'
      AccessPolicies:
        Version: '2012-10-17'
        Statement:
          - Effect: 'Allow'
            Principal:
              AWS: '*'
            Action: 'es:*'
            Resource: '*'
      ClusterConfig:
        InstanceCount: 1
        ZoneAwarenessEnabled: false
        InstanceType: 't3.small.search'
      NodeToNodeEncryptionOptions:
        Enabled: true
      EncryptionAtRestOptions:
        Enabled: true
      EBSOptions:
        EBSEnabled: true
        Iops: '0'
        VolumeSize: 15
        VolumeType: 'gp2'
      DomainEndpointOptions:
        EnforceHTTPS: true
      AdvancedSecurityOptions:
        Enabled: true
        InternalUserDatabaseEnabled: true
        MasterUserOptions:
          MasterUserName: !Join [ '', [ '{{resolve:secretsmanager:', !Ref OpenSearchSecret, ':SecretString:username}}' ] ]
          MasterUserPassword: !Join [ '', [ '{{resolve:secretsmanager:', !Ref OpenSearchSecret, ':SecretString:password}}' ] ]
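The two !Join expressions build CloudFormation dynamic references, which CloudFormation resolves against Secrets Manager at deploy time. As a rough illustration of the string being assembled, here is a small sketch; dynamic_reference is a hypothetical helper, and the ARN in the example is a made-up placeholder standing in for the value of !Ref OpenSearchSecret.

```python
def dynamic_reference(secret_id: str, json_key: str) -> str:
    """Build the same string the template's !Join produces for one JSON key."""
    return "".join(
        ["{{resolve:secretsmanager:", secret_id, ":SecretString:", json_key, "}}"]
    )


if __name__ == "__main__":
    # Placeholder ARN, for illustration only.
    example_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:example"
    print(dynamic_reference(example_arn, "password"))
```

Because the reference is resolved by CloudFormation, the password itself never appears in the template.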

10. We need to have some APIs to handle users’ requests for various CRUD operations involving OpenSearch, e.g. to index documents, update the indexed documents, and search for documents given a query. We will use API Gateway to manage these APIs.

  DefaultApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref StageName
      GatewayResponses:
        DEFAULT_4XX:
          ResponseParameters:
            Headers:
              Access-Control-Allow-Origin: "'*'"
              Access-Control-Allow-Headers: "'*'"
        DEFAULT_5XX:
          ResponseParameters:
            Headers:
              Access-Control-Allow-Origin: "'*'"
              Access-Control-Allow-Headers: "'*'"
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'*'"
        AllowOrigin: "'*'"
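The GatewayResponses entries above add CORS headers to errors that API Gateway generates itself, and the Cors block handles the preflight requests. Responses produced by the Lambda functions still need to carry their own headers, because Api events in SAM use the Lambda proxy integration. A minimal sketch of a response helper, assuming the hypothetical name build_response and the same permissive header values as the template:

```python
import json
from typing import Any

# Same permissive values as in the template; tighten them for real use.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "*",
}


def build_response(status_code: int, body: Any) -> dict:
    """Return a Lambda proxy response with CORS headers attached."""
    return {
        "statusCode": status_code,
        "headers": dict(CORS_HEADERS),  # copy so callers cannot mutate the constant
        "body": json.dumps(body),       # proxy integration expects a string body
    }
```

A handler would typically end with something like return build_response(200, {"result": "created"}).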

Now that we have configured all the necessary resources, we will create four APIs to handle users’ requests, namely create-document, update-document, get-document, and finally search-documents.

All these Lambdas follow a common pattern. We assign the role CustomMainRole that was created earlier. We define two environment variables: OPEN_SEARCH_SECRET holds the identifier of the Secrets Manager secret (its ARN, which is what !Ref returns for a secret), and OPEN_SEARCH_DOMAIN_ENDPOINT holds the OpenSearch domain endpoint. Note the Path and Method properties for each API. We will need them later.

11. Create Document API

  CreateDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: create_doc.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs
            Method: post
            RestApiId: !Ref DefaultApi
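To show how the pieces of this definition meet inside a function, here is a hedged sketch of how a handler might turn the request into an OpenSearch URL. The helper name document_url is hypothetical, and the sketch assumes that the endpoint value is a bare host name without a scheme. The actual handler logic belongs to the next post.

```python
import os


def document_url(event: dict) -> str:
    """Build the OpenSearch URL for indexing a document into the requested index."""
    # Set by the template from !GetAtt OpenSearchServiceDomain.DomainEndpoint.
    endpoint = os.environ["OPEN_SEARCH_DOMAIN_ENDPOINT"]
    # API Gateway passes the {index_name} path segment in pathParameters.
    index_name = event["pathParameters"]["index_name"]
    # POST to /<index>/_doc indexes a document with an auto-generated ID.
    return f"https://{endpoint}/{index_name}/_doc"
```

The same pattern would extend to the {doc_id} path parameter used by the update and get functions.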

12. Update Document API

  UpdateDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: update_doc.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs/{doc_id}
            Method: put
            RestApiId: !Ref DefaultApi

13. Get Document API

  GetDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: get_doc.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs/{doc_id}
            Method: get
            RestApiId: !Ref DefaultApi

14. Search Documents API

  SearchDocumentsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: search_docs.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs/search
            Method: post
            RestApiId: !Ref DefaultApi
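For the search function, the request body eventually has to become an OpenSearch query. The sketch below builds a multi_match query with the OpenSearch query DSL. The helper name build_search_body and the field names title and content are assumptions for illustration; the real document schema is not defined in this post.

```python
def build_search_body(query_text: str, fields: list[str], size: int = 10) -> dict:
    """Build a query body that matches query_text against several fields."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": fields,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical field names; adjust to the documents you index.
    print(build_search_body("vpn setup", ["title", "content"]))
```

A body like this would be sent to the _search endpoint of the index named in the request path.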

15. Finally! We are at the end of this template file 🥳. We output the base URL of the API Gateway so that we can call the APIs created above by simply appending the {Path} part for each API to the base URL.

Outputs:
  ApiGatewayLambdaInvokeUrl:
    Value: !Sub 'https://${DefaultApi}.execute-api.${AWS::Region}.amazonaws.com/${StageName}'
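Appending a path to the base URL can be sketched as follows. The helper api_url is hypothetical, and the base URL in the example is a placeholder rather than a real deployment.

```python
from urllib.parse import quote


def api_url(base_url: str, path: str, **path_params: str) -> str:
    """Fill the {placeholders} in a template path and append it to the base URL."""
    # Percent-encode each value so that characters such as '/' or spaces
    # cannot change the structure of the path.
    encoded = {name: quote(value, safe="") for name, value in path_params.items()}
    return base_url.rstrip("/") + path.format(**encoded)


if __name__ == "__main__":
    base = "https://abc123.execute-api.us-east-1.amazonaws.com/dev"  # placeholder
    print(api_url(base, "/{index_name}/kb-docs/{doc_id}", index_name="kb", doc_id="42"))
    # https://abc123.execute-api.us-east-1.amazonaws.com/dev/kb/kb-docs/42
```

The path templates are the same strings used in the Path properties of the four functions.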

This concludes the configuration of our knowledge base template file. Now, did you mess anything up 🤔? No worries. Compare your version with this file and be merry.

In the next post, we will create the handling logic of our CRUD APIs and deploy the stack. See you then 🙏.