-1 votes
0 answers
43 views

I created a Terraform script to create Glue Crawlers and a Catalog database. The crawlers crawl data from an S3 bucket; the objects are in JSON and partitioned by date. For example, s3://something/dev/raw/year=...
Anh's user avatar
  • 1
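
For the Terraform/crawler question above: the asker provisions the crawler and database with Terraform, but as a rough illustration of the same configuration, a hedged boto3 sketch might look like the following (database name, crawler name, and role ARN are hypothetical placeholders; only the S3 prefix comes from the question).

    # Illustrative boto3 sketch of the crawler/database setup described above.
    # Names and the role ARN are hypothetical; only the S3 prefix is from the question.
    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    # Catalog database to hold the crawled tables
    glue.create_database(DatabaseInput={"Name": "raw_dev"})

    # Crawler pointed at the date-partitioned JSON objects, e.g. s3://something/dev/raw/year=.../
    glue.create_crawler(
        Name="raw-dev-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
        DatabaseName="raw_dev",
        Targets={"S3Targets": [{"Path": "s3://something/dev/raw/"}]},
        SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"},
    )
    glue.start_crawler(Name="raw-dev-crawler")
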
0 votes
0 answers
56 views

I am running a Spark job (AWS Glue / Spark 3.x) that writes data to PostgreSQL via JDBC. What confuses me is that performance is highly non-linear: 1M rows → ~3 minutes; 2M rows → > 1 hour (...
Kunanya Khuntiptong's user avatar
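
For the JDBC question above, a partitioned PySpark write to PostgreSQL typically has the shape sketched below; the URL, credentials, table, batch size, and partition count are illustrative assumptions, not the asker's actual settings.

    # Minimal PySpark sketch of a partitioned JDBC write to PostgreSQL.
    # Connection details, table name, and tuning values are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-write-sketch").getOrCreate()
    df = spark.read.parquet("s3://example-bucket/input/")  # placeholder source

    (df.repartition(16)                        # bounds the number of concurrent JDBC connections
       .write.format("jdbc")
       .option("url", "jdbc:postgresql://db-host:5432/mydb")
       .option("dbtable", "public.target_table")
       .option("user", "writer")
       .option("password", "***")
       .option("batchsize", 10000)             # rows sent per INSERT batch
       .mode("append")
       .save())
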
0 votes
0 answers
45 views

I created a Glue view through a Glue job like this: CREATE OR REPLACE PROTECTED MULTI DIALECT VIEW risk_models_output.vw_behavior_special_limit_score SECURITY DEFINER AS [query ...
Paloma Raissa's user avatar
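
For the multi-dialect view question above, such a DDL is usually submitted from the Glue job through spark.sql; the sketch below is an assumption about how the statement was issued, and the view body is a placeholder because the original query is elided in the excerpt.

    # Sketch: issuing the view DDL from a Glue Spark job.
    # The SELECT body is a placeholder; the real query is elided in the question excerpt.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    spark.sql("""
    CREATE OR REPLACE PROTECTED MULTI DIALECT VIEW
      risk_models_output.vw_behavior_special_limit_score
    SECURITY DEFINER
    AS
    SELECT 1 AS placeholder_column
    """)
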
0 votes
0 answers
30 views

I have a PySpark job that writes a dataframe to S3 with partitions. The partition value is a string. In my PySpark script, I have the line: spark.sql("MSCK REPAIR TABLE table_name SYNC PARTITIONS"...
Dozel's user avatar
  • 169
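
For the MSCK REPAIR question above, the write-then-repair pattern generally looks like the sketch below; the paths, table name, and partition column are hypothetical.

    # Sketch: partitioned write to S3 followed by MSCK REPAIR, as described above.
    # Paths, table name, and partition column are placeholders.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    df = spark.read.parquet("s3://example-bucket/staging/")   # placeholder input
    (df.write
       .partitionBy("event_date")                             # string-typed partition column
       .mode("append")
       .parquet("s3://example-bucket/warehouse/table_name/"))

    # Register any new partition directories with the catalog
    spark.sql("MSCK REPAIR TABLE table_name SYNC PARTITIONS")
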
0 votes
1 answer
58 views

I'm trying to programmatically create a native Google BigQuery connection in AWS Glue using the AWS SDK for Go v2 (github.com/aws/aws-sdk-go-v2/service/glue). According to the AWS docs (Glue 4.0+ ...
Shubham Dixit's user avatar
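
The BigQuery-connection question above targets the AWS SDK for Go v2; purely as an illustration of the CreateConnection request shape, here is a boto3 sketch. The ConnectionType value and property keys for a native BigQuery connector are assumptions and should be checked against the current Glue API reference.

    # Hedged boto3 sketch of CreateConnection for a native BigQuery connection.
    # ConnectionType and ConnectionProperties keys are assumptions, not verified values.
    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    glue.create_connection(
        ConnectionInput={
            "Name": "bigquery-native-connection",    # hypothetical name
            "ConnectionType": "BIGQUERY",             # assumed enum value
            "ConnectionProperties": {
                # assumed key: Secrets Manager secret holding the service-account credentials
                "SecretId": "bigquery/service-account",
            },
        }
    )
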
1 vote
1 answer
86 views

I have an AWS Glue job that processes thousands of small JSON files from S3 (historical data load for Adobe Experience Platform). The job is taking approximately 4 hours to complete, which is ...
Jayron Soares's user avatar
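
For the small-files question above, one common way to express the read in a Glue script is with the S3 file-grouping options sketched below; the path and group size are illustrative, and this is not presented as the asker's job.

    # Sketch: reading many small JSON files with Glue's file-grouping options.
    # Path and groupSize are placeholders.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={
            "paths": ["s3://example-bucket/historical/json/"],
            "recurse": True,
            "groupFiles": "inPartition",   # coalesce small files into fewer tasks
            "groupSize": "134217728",      # target ~128 MB per group
        },
        format="json",
    )
    print(dyf.count())
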
0 votes
0 answers
77 views

I have created a Glue JDBC Connection for my SQL Server running on EC2. I tested the connection with Visual ETL in the following way: used SQL Server as source, selected my SQL Server connection in ...
Shivkumar Mallesappa's user avatar
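
For the SQL Server connection question above, the same catalog connection is typically consumed from a script roughly as sketched below; the connection name and table are hypothetical, and the option names assume the documented Glue JDBC connection options.

    # Sketch: reading from SQL Server through an existing Glue catalog connection.
    # Connection name and table are placeholders.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="sqlserver",
        connection_options={
            "useConnectionProperties": "true",
            "connectionName": "my-sqlserver-connection",  # the Glue JDBC connection
            "dbtable": "dbo.some_table",
        },
    )
    dyf.printSchema()
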
0 votes
1 answer
75 views

I am having a problem with my architecture on AWS and I need help because I do not understand the behavior I am witnessing. In short, I have a bucket in S3 where CSV files are sometimes placed. Each ...
Raul Almuzara's user avatar
0 votes
1 answer
69 views

I'm using AWS Glue Data Catalog to store Apache Iceberg tables. I use the Iceberg Java SDK to define the tables there. When I create an Iceberg table, I provide field-id values associated with each ...
pedorro's user avatar
  • 3,419
1 vote
1 answer
69 views

I have written a small Glue script that fetches some data between two dates, but I found that it scans the entire table instead of just the data within the specified time range. I also tried creating ...
new coderrrr's user avatar
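
For the date-range question above, a catalog read is usually narrowed with a push-down predicate as in the hedged sketch below; the database, table, and partition column names are assumptions.

    # Sketch: restricting a catalog read to a date range with a push-down predicate.
    # Database, table, and partition column names are hypothetical.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="my_database",
        table_name="my_table",
        push_down_predicate="dt >= '2024-01-01' AND dt < '2024-02-01'",
    )
    print(dyf.count())
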
Advice
0 votes
0 replies
39 views

Team, We are implementing a new requirement to integrate Data Quality (DQ) rules within AWS Glue Studio. We have successfully created DQ rules using the DQDL builder, leveraging built-in rulesets, and ...
Prainika's user avatar
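
For the Data Quality question above, a DQDL ruleset applied inside a Glue job generally resembles the sketch below; the ruleset content, source table, and publishing options are illustrative, and the exact transform call generated by Glue Studio may differ.

    # Sketch: applying a DQDL ruleset to a DynamicFrame in a Glue job.
    # Ruleset content, source table, and options are illustrative placeholders.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext
    from awsgluedq.transforms import EvaluateDataQuality

    glue_context = GlueContext(SparkContext.getOrCreate())

    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="my_database", table_name="orders"   # hypothetical source
    )

    ruleset = """
    Rules = [
        ColumnCount > 0,
        IsComplete "order_id"
    ]
    """

    dq_results = EvaluateDataQuality.apply(
        frame=dyf,
        ruleset=ruleset,
        publishing_options={"dataQualityEvaluationContext": "orders_dq"},
    )
    dq_results.show()
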
0 votes
0 answers
126 views

# ===================================================== # 🧊 Step 4. Write Data to Iceberg Table (Glue Catalog) # ===================================================== table_name = "glue_catalog....
Mohammed Suhail's user avatar
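
The Iceberg-write excerpt above cuts off at the table name; a hedged sketch of that step, with a hypothetical catalog/database/table, warehouse path, and the usual Iceberg-on-Glue session settings, might look like this.

    # Hedged sketch of "Step 4: write data to an Iceberg table (Glue Catalog)".
    # The table identifier and warehouse path are hypothetical; the question's actual
    # table name is truncated in the excerpt.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue_catalog.warehouse", "s3://example-bucket/warehouse/")
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .getOrCreate()
    )

    table_name = "glue_catalog.analytics.events"            # hypothetical full identifier
    df = spark.read.parquet("s3://example-bucket/staged/")  # placeholder input

    df.writeTo(table_name).createOrReplace()                # or .append() for incremental loads
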
3 votes
0 answers
225 views

I have a Datadog dashboard displaying the metrics we get for our AWS Glue Zero-ETL integrations. One of those is lastSyncTimestamp, the epoch timestamp until which source has been synced to target. I ...
dan's user avatar
  • 31
0 votes
0 answers
55 views

I version my Python script for each change and push it to S3 with a new version: aws s3 cp aws_glue_script_v1.0.3_1.py s3://mytestcicdglue/glue-scripts/aws_glue_script_v1.0.3_1.py. I have a skeleton JSON of ...
aparnagottumukkala's user avatar
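
For the CI/CD question above, pointing the Glue job at the newly uploaded script version is typically a single UpdateJob call; the boto3 sketch below uses a hypothetical job name, role, and Glue version. Note that JobUpdate replaces the job definition, which is where the question's "skeleton JSON" would be supplied in full.

    # Sketch: repointing an existing Glue job at the newly uploaded script version.
    # Job name, role ARN, and Glue version are hypothetical; JobUpdate replaces the
    # whole job definition, so in practice the full skeleton JSON would go here.
    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    glue.update_job(
        JobName="my-etl-job",
        JobUpdate={
            "Role": "arn:aws:iam::123456789012:role/GlueJobRole",
            "Command": {
                "Name": "glueetl",
                "ScriptLocation": "s3://mytestcicdglue/glue-scripts/aws_glue_script_v1.0.3_1.py",
                "PythonVersion": "3",
            },
            "GlueVersion": "4.0",
        },
    )
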
0 votes
0 answers
32 views

I have an AWS Glue 5.0 job (Spark 3.x, Python 3.x) that transforms Aurora PostgreSQL data into FHIR NDJSON. With native PySpark transformations only: ~3–4 minutes for 320k rows. With a Pandas UDF that ...
Achyuth Ainala's user avatar
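
For the Pandas UDF question above, the UDF path being compared against native transformations is roughly the pattern below; the function body and column names are illustrative, not the asker's FHIR mapping.

    # Sketch of the Pandas UDF pattern being compared against native PySpark transforms.
    # Column names and the transformation body are placeholders, not the FHIR mapping.
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()

    @pandas_udf(StringType())
    def to_ndjson_fragment(patient_id: pd.Series) -> pd.Series:
        # Runs on Arrow batches shipped to Python workers, which adds serialization
        # overhead relative to pure Spark SQL expressions.
        return '{"resourceType": "Patient", "id": "' + patient_id.astype(str) + '"}'

    df = spark.createDataFrame([(1,), (2,)], ["patient_id"])
    df.select(to_ndjson_fragment("patient_id").alias("ndjson")).show(truncate=False)
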
