4,863 questions
-1
votes
0
answers
43
views
AWS Athena cannot scan the Catalog database created by Terraform but can scan the manually created one
I created a Terraform script to create Glue Crawlers and a Catalog database. The crawlers crawl data from an S3 bucket; the objects are in JSON and partitioned by date. For example, s3://something/dev/raw/year=...
0
votes
0
answers
56
views
Spark JDBC insert: 1M rows fast, 2M rows extremely slow, 13M rows fast — same code
I am running a Spark job (AWS Glue / Spark 3.x) that writes data to PostgreSQL via JDBC.
What confuses me is that performance is highly non-linear:
- 1M rows → ~3 minutes
- 2M rows → > 1 hour (...
0
votes
0
answers
45
views
Can't SELECT anything in an AWS Glue Data Catalog view due to invalid view text: <REDACTED VIEW TEXT>
I created a Glue view through a Glue job like this:
CREATE OR REPLACE PROTECTED MULTI DIALECT VIEW risk_models_output.vw_behavior_special_limit_score
SECURITY DEFINER AS
[query ...
0
votes
0
answers
30
views
MSCK REPAIR TABLE SYNC PARTITIONS fails
I have a PySpark job that writes a DataFrame to S3 with partitions. The partition value is a string. In my PySpark script, I have the line:
spark.sql("MSCK REPAIR TABLE table_name SYNC PARTITIONS"...
0
votes
1
answer
58
views
AWS Glue Connection for BigQuery gives "SparkProperties is missing but it is required" and "secretId is not defined in the schema" when using Go SDK
I'm trying to programmatically create a native Google BigQuery connection in AWS Glue using the AWS SDK for Go v2 (github.com/aws/aws-sdk-go-v2/service/glue).
According to the AWS docs (Glue 4.0+ ...
1
vote
1
answer
86
views
AWS Glue PySpark job taking 4 hours to process small JSON files from S3
I have an AWS Glue job that processes thousands of small JSON files from S3 (historical data load for Adobe Experience Platform). The job is taking approximately 4 hours to complete, which is ...
0
votes
0
answers
77
views
Glue Job connection with SQL Server hosted on EC2
I have created a Glue JDBC Connection for my SQL Server running on EC2.
I tested the connection with Visual ETL in the following way:
Used SQL Server as source
Selected my SQL Server connection in ...
0
votes
1
answer
75
views
"Max concurrent runs exceeded" in AWS Glue job with job run queuing
I am having a problem with my architecture on AWS and I need help because I do not understand the behavior I am witnessing.
In short, I have a bucket in S3 where CSV files are sometimes placed. Each ...
0
votes
1
answer
69
views
Iceberg field-id values - can I specify my own when creating a table?
I'm using AWS Glue Data Catalog to store Apache Iceberg tables. I use the Iceberg Java SDK to define the tables there. When I create an Iceberg table, I provide field-id values associated with each ...
1
vote
1
answer
69
views
AWS Glue Script Scanning Entire Table Despite Date Filter
I have written a small Glue script that fetches some data between two dates, but I found that it scans the entire table instead of just the data within the specified time range. I also tried creating ...
Advice
0
votes
0
replies
39
views
Applying a Single AWS Glue Data Quality Ruleset to Multiple Glue Jobs with Dynamic Column Input
Team,
We are implementing a new requirement to integrate Data Quality (DQ) rules within AWS Glue Studio. We have successfully created DQ rules using the DQDL builder, leveraging built-in rulesets, and ...
0
votes
0
answers
126
views
How to apply bucket logic in the partitioning of an Iceberg table using AWS Glue?
# =====================================================
# 🧊 Step 4. Write Data to Iceberg Table (Glue Catalog)
# =====================================================
table_name = "glue_catalog....
3
votes
0
answers
225
views
How to convert epoch to datetime in Datadog dashboard?
I have a Datadog dashboard displaying the metrics we get for our AWS Glue Zero-ETL integrations. One of those is lastSyncTimestamp, the epoch timestamp up to which the source has been synced to the target.
I ...
0
votes
0
answers
55
views
Is it possible to update the script section for AWS Glue ETL or Glue streaming jobs using the AWS CLI?
I version my Python script for each change and push it to S3 under a new version:
aws s3 cp aws_glue_script_v1.0.3_1.py s3://mytestcicdglue/glue-scripts/aws_glue_script_v1.0.3_1.py
I have a skeleton JSON of ...
0
votes
0
answers
32
views
AWS Glue Pandas UDF with fhir.resources validation is 10× slower — how to reduce runtime using iterator UDFs or Arrow batch tuning?
I have an AWS Glue 5.0 job (Spark 3.x, Python 3.x) that transforms Aurora PostgreSQL data into FHIR NDJSON.
With native PySpark transformations only: ~3–4 minutes for 320k rows.
With a Pandas UDF that ...