Newest 'aws-glue' Questions

-1 votes

0 answers

43 views

AWS Athena cannot scan the Catalog database created by Terraform but can scan the manually created one

I wrote a Terraform script that creates Glue crawlers and a Catalog database. The crawlers crawl data from an S3 bucket; the objects are JSON and partitioned by date. For example, s3://something/dev/raw/year=...

Anh

1

asked Jan 31 at 2:15

0 votes

0 answers

56 views

Spark JDBC insert: 1M rows fast, 2M rows extremely slow, 13M rows fast — same code

I am running a Spark job (AWS Glue / Spark 3.x) that writes data to PostgreSQL via JDBC. What confuses me is that performance is highly non-linear: - 1M rows → ~3 minutes - 2M rows → > 1 hour (...

Kunanya Khuntiptong

1

asked Jan 17 at 16:17

0 votes

0 answers

45 views

Can't SELECT anything in an AWS Glue Data Catalog view due to invalid view text: <REDACTED VIEW TEXT>

I created a Glue view through a Glue job like this: CREATE OR REPLACE PROTECTED MULTI DIALECT VIEW risk_models_output.vw_behavior_special_limit_score SECURITY DEFINER AS [query ...

Paloma Raissa

1

asked Jan 14 at 20:37

0 votes

0 answers

30 views

msck repair table sync partitions fails

I have a PySpark job that writes a DataFrame to S3 with partitions. The partition value is a string. In my PySpark script, I have the line: spark.sql("MSCK REPAIR TABLE table_name SYNC PARTITIONS"...

Dozel

169

asked Jan 13 at 19:47

0 votes

1 answer

58 views

AWS Glue Connection for BigQuery gives "SparkProperties is missing but it is required" and "secretId is not defined in the schema" when using Go SDK

I'm trying to programmatically create a native Google BigQuery connection in AWS Glue using the AWS SDK for Go v2 (github.com/aws/aws-sdk-go-v2/service/glue). According to the AWS docs (Glue 4.0+ ...

Shubham Dixit

9,268

asked Jan 8 at 12:11

1 vote

1 answer

86 views

AWS Glue PySpark job taking 4 hours to process small JSON files from S3

I have an AWS Glue job that processes thousands of small JSON files from S3 (historical data load for Adobe Experience Platform). The job is taking approximately 4 hours to complete, which is ...

Jayron Soares

461

asked Dec 20, 2025 at 12:08

0 votes

0 answers

77 views

Glue Job connection with SQL Server hosted on EC2

I have created a Glue JDBC Connection for my SQL Server running on EC2. I tested the connection with Visual ETL in the following way: used SQL Server as the source; selected my SQL Server connection in ...

Shivkumar Mallesappa

3,117

asked Dec 20, 2025 at 8:18

0 votes

1 answer

75 views

"Max concurrent runs exceeded" in AWS Glue job with job run queuing

I am having a problem with my architecture on AWS and I need help because I do not understand the behavior I am witnessing. In short, I have a bucket in S3 where CSV files are sometimes placed. Each ...

Raul Almuzara

1

asked Dec 18, 2025 at 8:37

0 votes

1 answer

69 views

Iceberg field-id values - can I specify my own when creating a table?

I'm using AWS Glue Data Catalog to store Apache Iceberg tables. I use the Iceberg Java SDK to define the tables there. When I create an Iceberg table, I provide field-id values associated with each ...

pedorro

3,419

asked Dec 16, 2025 at 21:24

1 vote

1 answer

69 views

AWS Glue Script Scanning Entire Table Despite Date Filter

I have written a small Glue script that fetches some data between two dates, but I found that it scans the entire table instead of just the data within the specified time range. I also tried creating ...

new coderrrr

11

asked Nov 13, 2025 at 18:00

Advice

0 votes

0 replies

39 views

Applying a Single AWS Glue Data Quality Ruleset to Multiple Glue Jobs with Dynamic Column Input

Team, We are implementing a new requirement to integrate Data Quality (DQ) rules within AWS Glue Studio. We have successfully created DQ rules using the DQDL builder, leveraging built-in rulesets, and ...

Prainika

15

asked Nov 13, 2025 at 3:35

0 votes

0 answers

126 views

How to do bucket logic in partition for Iceberg Table using AWS Glue?

# ===================================================== # 🧊 Step 4. Write Data to Iceberg Table (Glue Catalog) # ===================================================== table_name = "glue_catalog....

Mohammed Suhail

1

asked Nov 4, 2025 at 17:35

3 votes

0 answers

225 views

How to convert epoch to datetime in Datadog dashboard?

I have a Datadog dashboard displaying the metrics we get for our AWS Glue Zero-ETL integrations. One of those is lastSyncTimestamp, the epoch timestamp until which source has been synced to target. I ...

dan

31

asked Nov 1, 2025 at 19:16

0 votes

0 answers

55 views

Is it possible to update script section for AWS Glue ETL or Glue streaming Jobs using AWS CLI?

I version my Python script on each change and push it to S3 under the new version (aws s3 cp aws_glue_script_v1.0.3_1.py s3://mytestcicdglue/glue-scripts/aws_glue_script_v1.0.3_1.py). I have a skeleton JSON of ...

aparnagottumukkala

3

asked Oct 15, 2025 at 8:51

0 votes

0 answers

32 views

AWS Glue Pandas UDF with fhir.resources validation is 10× slower — how to reduce runtime using iterator UDFs or Arrow batch tuning?

I have an AWS Glue 5.0 job (Spark 3.x, Python 3.x) that transforms Aurora PostgreSQL data into FHIR NDJSON. With native PySpark transformations only: ~3–4 minutes for 320k rows. With a Pandas UDF that ...

Achyuth Ainala

1

asked Sep 4, 2025 at 15:19
