SageMaker Inference Module

This module builds and configures SageMaker inference models and endpoints.

This repository is a READ-ONLY sub-tree split. See https://github.com/FriendsOfTerraform/modules to create issues or submit pull requests.

Table of Contents

Example Usage

Argument Reference

Basic Usage

module "basic_usage" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  # manages multiple models
  models = {
    # The keys of the map are model names
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      # manages multiple container definitions
      container_definitions = {
    # the keys of the map are the containers' DNS hostnames
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  # manages multiple endpoints
  endpoints = {
    # the keys of the map are endpoint names
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          # must refer to models created by this module
          demo-model = {
            instance_type  = "ml.m5.large"

            auto_scaling = {
              policies = {
                # the keys of the map are policy names
                builtin-policy            = { expression = "SageMakerVariantInvocationsPerInstance = 1000" }
                keep-invocations-near-100 = { expression = "Invocations average = 100" }
              }
            }

            cloudwatch_alarms = {
              # the keys of the map are alarm names
              invocations-greater-than-1000         = { expression = "Invocations average > 1000" }
              invocation-5xx-errors-greater-than-10 = { expression = "Invocation5XXErrors average >= 10" }
            }
          }
        }
      }
    }
  }
}
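
Serverless Usage

The module can also create a serverless endpoint instead of a provisioned one. The following is a minimal sketch; the model, bucket, and role values are illustrative, and the serverless variant must reference a model managed by this module:

```hcl
module "serverless_usage" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  endpoints = {
    serverless-endpoint = {
      # serverless is mutually exclusive to provisioned
      serverless = {
        variant = {
          model_name      = "demo-model"
          max_concurrency = 50   # valid values: 1 - 200
          memory_size     = 2048 # valid values: 1024, 2048, 3072, 4096, 5120, 6144
        }
      }
    }
  }
}
```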

Argument Reference

Mandatory

  • (map(object)) models [since v1.0.0]

    Deploy multiple models. Please see example

    • (map(object)) container_definitions [since v1.0.0]

      Container images containing inference code that are used when the model is deployed for predictions.

      • (string) image [since v1.0.0]

        The registry path where the inference code image is stored in Amazon ECR

      • (string) compression_type = "CompressedModel" [since v1.0.0]

        Specify the model compression type. Valid values: "CompressedModel", "UncompressedModel"

      • (map(string)) environment_variables = {} [since v1.0.0]

        Environment variables for the container

      • (string) model_data_location = null [since v1.0.0]

        The URL where model artifacts are stored in S3

      • (object) use_multiple_models = null [since v1.0.0]

        Configure this container to host multiple models

        • (bool) enable_model_caching = true [since v1.0.0]

          Whether to cache models for a multi-model endpoint. By default, multi-model endpoints cache models so that a model does not have to be loaded into memory each time it is invoked. Some use cases do not benefit from model caching. For example, if an endpoint hosts a large number of models that are each invoked infrequently, the endpoint might perform better if you disable model caching.

    • (string) iam_role_arn [since v1.0.0]

      A role that SageMaker AI can assume to access model artifacts and docker images for deployment

    • (map(string)) additional_tags = {} [since v1.0.0]

      Additional tags for the model

    • (bool) enable_network_isolation = false [since v1.0.0]

      If enabled, containers cannot make any outbound network calls.

    • (object) inference_execution_config = {} [since v1.0.0]

      Specifies details of how containers in a multi-container endpoint are called.

      • (string) mode = "Serial" [since v1.0.0]

        How containers in a multi-container endpoint are run. Valid values: "Serial" - Containers run as a serial pipeline. "Direct" - Only the individual container that you specify is run.

    • (object) vpc_config = null [since v1.0.0]

      Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform.

      • (list(string)) security_group_ids [since v1.0.0]

        List of security group IDs the models use to access private resources

      • (list(string)) subnet_ids [since v1.0.0]

        List of subnet IDs to be used for this VPC connection
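
Putting the model arguments above together, a multi-model container attached to a VPC might look like the following sketch; the bucket prefix, security group, and subnet IDs are illustrative:

```hcl
models = {
  multi-model-demo = {
    iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

    container_definitions = {
      container1 = {
        image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
        # for a multi-model container, point at an S3 prefix holding multiple model artifacts
        model_data_location = "s3://demo-bucket/models/"

        use_multiple_models = {
          # consider disabling caching when hosting many infrequently invoked models
          enable_model_caching = false
        }
      }
    }

    vpc_config = {
      security_group_ids = ["sg-0123456789abcdef0"]
      subnet_ids         = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
    }
  }
}
```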

Optional

  • (map(string)) additional_tags_all = {} [since v1.0.0]

    Additional tags for all resources deployed with this module

  • (map(object)) endpoints = {} [since v1.0.0]

    Configures multiple endpoints

    • (map(string)) additional_tags = {} [since v1.0.0]

      Additional tags for the endpoint

    • (string) encryption_key = null [since v1.0.0]

      Specify an existing KMS key's ARN to encrypt data on the ML storage volume attached to the endpoint's instances.

    • (object) provisioned = null [since v1.0.0]

      Creates a provisioned endpoint, mutually exclusive to serverless. Must specify one of provisioned or serverless

      • (map(object)) production_variants [since v1.0.0]

        Configure multiple production variants, one for each model that you want to host at this endpoint.

        • (string) instance_type [since v1.0.0]

          The EC2 instance type

        • (object) auto_scaling = null [since v1.0.0]

          Enables auto scaling

          • (map(object)) policies [since v1.0.0]

            Manages multiple auto scaling policies

            • (string) expression [since v1.0.0]

              The expression in <metric_name> <statistic> = <TargetValue> format. For example: "Invocations average = 100". If using a predefined metric such as SageMakerVariantInvocationsPerInstance, you can omit <statistic> from the expression. For example: "SageMakerVariantInvocationsPerInstance = 100"

            • (bool) enable_scale_in = true [since v1.0.0]

              Allow this Auto Scaling policy to scale-in (removing EC2 instances).

            • (string) scale_in_cooldown_period = "5 minutes" [since v1.0.0]

              Specify the amount of time to wait between scale-in actions.

            • (string) scale_out_cooldown_period = "5 minutes" [since v1.0.0]

              Specify the amount of time to wait between scale-out actions.

          • (number) maximum_capacity = 1 [since v1.0.0]

            Specify the maximum number of EC2 instances to maintain.

          • (number) minimum_capacity = 1 [since v1.0.0]

            Specify the minimum number of EC2 instances to maintain.

        • (map(object)) cloudwatch_alarms = {} [since v1.0.0]

          Configures multiple CloudWatch alarms. Please see example

          • (string) expression [since v1.0.0]

            The expression in <metric_name> <statistic> <comparison_operator> <threshold> format. For example: "Invocations average >= 100"

          • (string) description = null [since v1.0.0]

            The description of the alarm

          • (number) evaluation_periods = 1 [since v1.0.0]

            The number of periods over which data is compared to the specified threshold.

          • (string) notification_sns_topic = null [since v1.0.0]

            The SNS topic where notifications will be sent

          • (string) period = "1 minute" [since v1.0.0]

            The period over which the specified statistic is applied. Valid values: "1 minute" - "6 hours"

        • (string) container_startup_timeout = null [since v1.0.0]

          The timeout value for the inference container to pass the SageMaker AI Hosting health check. Valid values: "1 minute" - "1 hour".

        • (number) initial_instance_count = 1 [since v1.0.0]

          Specify the initial number of instances used for auto-scaling.

        • (number) initial_weight = 1 [since v1.0.0]

          Determines initial traffic distribution among all of the models that you specify in the endpoint configuration.

        • (string) model_data_download_timeout = null [since v1.0.0]

          The timeout value for downloading and extracting the model you want to host from Amazon S3 to each inference instance associated with this production variant. Valid values: "1 minute" - "1 hour".

        • (number) volume_size = null [since v1.0.0]

          The size, in GB, of the ML storage volume attached to each inference instance associated with the production variant. Valid values: 1 - 512.

      • (object) async_invocation_config = null [since v1.0.0]

        Specifies configuration for how an endpoint performs asynchronous inference

        • (string) s3_output_path [since v1.0.0]

          Location to upload response output on success. Must be an S3 URL (an s3:// path)

        • (string) encryption_key = null [since v1.0.0]

          Specify an existing KMS key's ARN to encrypt your response output in S3.

        • (string) error_notification_location = null [since v1.0.0]

          SNS topic to post a notification when inference fails. If no topic is provided, no notification is sent

        • (number) max_concurrent_invocations_per_instance = null [since v1.0.0]

          The maximum number of concurrent requests sent to the model container. If no value is provided, SageMaker chooses an optimal value.

        • (string) s3_failure_path = null [since v1.0.0]

          Location to upload response output on failure. Must be an S3 URL (an s3:// path).

        • (string) success_notification_location = null [since v1.0.0]

          SNS topic to post a notification when inference completes successfully. If no topic is provided, no notification is sent

      • (object) enable_data_capture = null [since v1.0.0]

        Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location

        • (string) s3_location_to_store_data_collected [since v1.0.0]

          Amazon SageMaker will save the prediction requests and responses along with metadata for your endpoint at this location.

        • (object) capture_content_type = null [since v1.0.0]

          The content type headers to capture. Must specify one of csv_text or json

          • (list(string)) csv_text = null [since v1.0.0]

            The CSV content type headers to capture.

          • (list(string)) json = null [since v1.0.0]

            The JSON content type headers to capture.

        • (object) data_capture_options = {} [since v1.0.0]

          Specifies what data to capture.

          • (bool) prediction_request = true [since v1.0.0]

            Capture prediction requests (Input)

          • (bool) prediction_response = true [since v1.0.0]

            Capture prediction responses (Output)

        • (number) sampling_percentage = 30 [since v1.0.0]

          Amazon SageMaker will randomly sample and save the specified percentage of traffic to your endpoint.

      • (map(object)) shadow_variants = {} [since v1.0.0]

        Specify shadow variants to receive production traffic replicated from the model specified on production_variants. If you use this field, you can only specify one variant for production_variants and one variant for shadow_variants.

        • (string) instance_type [since v1.0.0]

          The EC2 instance type

        • (string) container_startup_timeout = null [since v1.0.0]

          The timeout value for the inference container to pass the SageMaker AI Hosting health check. Valid values: "1 minute" - "1 hour".

        • (number) initial_instance_count = 1 [since v1.0.0]

          Specify the initial number of instances used for auto-scaling.

        • (number) initial_weight = 1 [since v1.0.0]

          Determines initial traffic distribution among all of the models that you specify in the endpoint configuration.

        • (string) model_data_download_timeout = null [since v1.0.0]

          The timeout value for downloading and extracting the model you want to host from Amazon S3 to each inference instance associated with this production variant. Valid values: "1 minute" - "1 hour".

        • (number) volume_size = null [since v1.0.0]

          The size, in GB, of the ML storage volume attached to each inference instance associated with the production variant. Valid values: 1 - 512.

    • (object) serverless = null [since v1.0.0]

      Creates a serverless endpoint, mutually exclusive to provisioned. Must specify one of provisioned or serverless

      • (object) variant [since v1.0.0]

        Configures variant for this endpoint

        • (string) model_name [since v1.0.0]

          The name of the model to be used for this endpoint. The specified model must be managed by this module

        • (number) max_concurrency = 20 [since v1.0.0]

          The maximum number of concurrent invocations your serverless endpoint can process. Valid values: 1 - 200

        • (number) memory_size = 1024 [since v1.0.0]

          The memory size of your serverless endpoint. Valid values: 1024, 2048, 3072, 4096, 5120, 6144.

        • (number) provisioned_concurrency = null [since v1.0.0]

          Provisioned concurrency enables you to deploy models on serverless endpoints with predictable performance and high scalability. For the set number of concurrent invocations, SageMaker will keep underlying compute warm and ready to respond instantaneously without cold starts. Must be <= max_concurrency
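
As an illustration of the async_invocation_config and enable_data_capture arguments above, a provisioned endpoint with asynchronous inference and data capture might be sketched as follows; the bucket paths and SNS topic ARNs are illustrative, and demo-model is assumed to be a model managed by this module:

```hcl
endpoints = {
  async-endpoint = {
    provisioned = {
      production_variants = {
        demo-model = {
          instance_type = "ml.m5.large"
        }
      }

      async_invocation_config = {
        s3_output_path                = "s3://demo-bucket/async/output/"
        s3_failure_path               = "s3://demo-bucket/async/failure/"
        success_notification_location = "arn:aws:sns:us-east-1:111122223333:inference-success"
        error_notification_location   = "arn:aws:sns:us-east-1:111122223333:inference-failure"
      }

      enable_data_capture = {
        s3_location_to_store_data_collected = "s3://demo-bucket/data-capture/"
        sampling_percentage                 = 30 # percentage of traffic sampled

        data_capture_options = {
          prediction_request  = true # capture input
          prediction_response = true # capture output
        }
      }
    }
  }
}
```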
