How can I write a Python script that uses Boto3 to run a SQL query on Amazon Athena?
I am currently working on a data analysis project in which I need to analyze large amounts of data stored in Amazon S3, using Amazon Athena via Boto3. How can I write a Python script that uses Boto3 to run a SQL query on Athena and retrieve the results for further processing?
Here is an example Python script that uses Boto3 to run a SQL query on Amazon Athena and retrieve the results:
import time

import boto3

# Initialize the Athena client (region and credentials come from your AWS configuration)
athena_client = boto3.client('athena')

# Define the SQL query
query = "SELECT * FROM your_table_name LIMIT 100"

# Start the query; Athena writes the result files to the S3 output location
response = athena_client.start_query_execution(
    QueryString=query,
    QueryExecutionContext={
        'Database': 'your_database_name'
    },
    ResultConfiguration={
        'OutputLocation': 's3://your-bucket-name/path/to/query/results/'
    }
)

# Get the QueryExecutionId
query_execution_id = response['QueryExecutionId']

# Poll until the query reaches a terminal state
terminal_states = ['SUCCEEDED', 'FAILED', 'CANCELLED']
query_status = None
while query_status not in terminal_states:
    query_status_response = athena_client.get_query_execution(
        QueryExecutionId=query_execution_id
    )
    query_status = query_status_response['QueryExecution']['Status']['State']
    if query_status not in terminal_states:
        time.sleep(1)  # Pause between polls instead of calling the API in a tight loop

# Check if query execution was successful
if query_status == 'SUCCEEDED':
    # Get the query results
    results_response = athena_client.get_query_results(
        QueryExecutionId=query_execution_id
    )
    # Extract and process the results
    columns = [col['Name'] for col in results_response['ResultSet']['ResultSetMetadata']['ColumnInfo']]
    # Skip the header row; each cell is a dict such as {'VarCharValue': '...'},
    # and a NULL cell has no 'VarCharValue' key, so .get() returns None for it
    rows = [
        tuple(cell.get('VarCharValue') for cell in row['Data'])
        for row in results_response['ResultSet']['Rows'][1:]
    ]
    print("Columns:", columns)
    print("Rows:", rows)
else:
    print("Query execution failed or was cancelled.")
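
For larger result sets, a single get_query_results call returns only one page of rows, so the script above may need to be extended. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes the client is passed in as an argument, and the names wait_for_query, fetch_all_rows, and TERMINAL_STATES, as well as the default poll interval and timeout, are hypothetical choices made for illustration. It uses the Boto3 paginator for get_query_results and drops the first row of the first page only when it matches the column names, which is how a SELECT query's header row normally appears.

```python
import time

# States after which Athena will not change the query status again
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(client, query_execution_id, poll_seconds=1.0, timeout_seconds=300.0):
    """Poll get_query_execution until a terminal state is reached; return that state.

    Raises TimeoutError if the query is still running once timeout_seconds has elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Query {query_execution_id} still {state} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)


def fetch_all_rows(client, query_execution_id):
    """Collect every result page into a list of {column_name: value} dicts."""
    paginator = client.get_paginator("get_query_results")
    columns = None
    records = []
    for page_number, page in enumerate(paginator.paginate(QueryExecutionId=query_execution_id)):
        result_set = page["ResultSet"]
        if columns is None:
            # Read the column names once, from the first page's metadata
            columns = [col["Name"] for col in result_set["ResultSetMetadata"]["ColumnInfo"]]
        for row_number, row in enumerate(result_set["Rows"]):
            # NULL cells carry no 'VarCharValue' key, so they become None
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if page_number == 0 and row_number == 0 and values == columns:
                continue  # Skip the header row that repeats the column names
            records.append(dict(zip(columns, values)))
    return records
```

With the script above, these helpers could replace the polling loop and the get_query_results call: pass athena_client and query_execution_id to wait_for_query, and call fetch_all_rows only if it returns 'SUCCEEDED'. All values come back as strings (or None), so numeric columns still need converting before further processing.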