What is the difference between AWS glue vs lambda?

Asked by CsabaToth in AWS on Mar 29, 2024

I am currently working on designing a data processing pipeline for a large e-commerce platform. The pipeline extracts, transforms, and loads data from various sources into a centralized data warehouse. When should I choose AWS Glue over AWS Lambda, or vice versa?

Answered by Deepa bhawana

In the context of AWS, here is how the two services differ and when to choose each:

AWS Glue

Choose AWS Glue when you are dealing with complex ETL workflows that require automated schema discovery, data cataloging, and orchestration.

Glue is well suited to large-scale data processing jobs in which the transformation involves multiple steps and multiple data sources.

Here is an example AWS Glue ETL job written in PySpark:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Initialize the Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Define the source and target data paths
source_path = "s3://my-source-bucket/input-data/"
target_path = "s3://my-target-bucket/output-data/"

# Read JSON data from S3 into a DynamicFrame, then convert it to a Spark DataFrame
source_df = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [source_path]},
    format="json"
).toDF()

# Perform a simple transformation (assumes the input has an existing_column field)
transformed_df = source_df.withColumn("new_column", source_df["existing_column"] + 1)

# Write the transformed data to the target location as Parquet
transformed_df.write.mode("overwrite").format("parquet").save(target_path)
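
Rather than hard-coding the S3 paths, a Glue job can read them from job arguments. Here is a minimal sketch, assuming the job is started with --source_path and --target_path arguments (the argument names are placeholders):

import sys
from awsglue.utils import getResolvedOptions

# Resolve the arguments passed when the job was started (names are illustrative)
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])
source_path = args["source_path"]
target_path = args["target_path"]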

AWS Lambda

Opt for AWS Lambda for lightweight, event-driven processing tasks that need quick implementation and on-demand scalability, and that finish within Lambda's 15-minute execution limit. Lambda works well for processing individual records, triggering actions in response to events, and integrating seamlessly with other AWS services.

Here is an example AWS Lambda function in Python:

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Extract the bucket and key of the uploaded object from the S3 event
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Read the object's contents from S3
    input_data = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Process the data (e.g., perform calculations, format data)
    processed_data = process_data(input_data)
    # Save the processed data to the target location
    save_to_s3(processed_data, "output-data/processed-data.txt")

def process_data(input_data):
    # Placeholder for data-processing logic; here we just uppercase the text
    return input_data.upper()

def save_to_s3(data, target_path):
    s3.put_object(Body=data.encode("utf-8"), Bucket="my-target-bucket", Key=target_path)
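
To run this automatically, you would configure an S3 event notification (or an EventBridge rule) on the source bucket that targets the Lambda function; the event structure assumed above is the standard S3 notification payload.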

In practice, the two services are often combined in a single pipeline: AWS Lambda handles the event-driven trigger while AWS Glue does the heavy ETL work. For example, a Lambda function can react to a new file landing in S3 and start the Glue job shown above, as in the sketch below.

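Here is a minimal sketch of such a trigger, assuming a Glue job named my-glue-etl-job has already been created and accepts a --source_path argument (both names are placeholders):

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Identify the newly uploaded object from the S3 event
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Start the Glue ETL job, passing the object's location as a job argument
    response = glue.start_job_run(
        JobName="my-glue-etl-job",
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"JobRunId": response["JobRunId"]}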

