Question

How do I format a date string so that the AWS Glue crawler (or a DataFrame) correctly identifies it as a date field?

I have some JSON data (sample below). An AWS Glue crawler reads this data, creates a Glue Data Catalog database with a table, and sets the date field as a string field. Is there a way I can format the date in my JSON file so that the crawler identifies it as a date field? I plan to read this data into a DynamicFrame in an AWS Glue ETL job and push it to a SQL database, where I want to store it as a date field so that it is easy to query and compare on. An example script is below.

Can I convert the string date field to an RDS date field in the Spark DataFrame?

myscript.py

data=gluecontext.create_dynamic_frame.from_catalog(database="sample", table_name="table" ...

data_frame=data.toDF()

# convert the string field to a date field in the Spark DataFrame
Sample record: {"id": "abc", ..., "date": "2024-07-09"}
...

Solution


You can use pyspark.sql.functions.to_date to convert the string column into a date column in the Spark DataFrame:

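from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql.functions import to_date

# create the Glue context (in a Glue job script this is normally done once at the top)
gluecontext = GlueContext(SparkContext.getOrCreate())

data = gluecontext.create_dynamic_frame.from_catalog(database="sample", table_name="table")
data_frame = data.toDF()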

# convert the string field to the date field in the spark data frame
data_frame = data_frame.withColumn("date", to_date("date", "yyyy-MM-dd"))
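If you also control how the JSON files are produced, one option is to make sure every record carries its date in the same yyyy-MM-dd form that the to_date pattern above expects. Values that do not match the pattern are not converted (under Spark's default settings they typically end up null). The sketch below is a minimal, hypothetical pre-processing step in plain Python: the helper names to_iso_date and normalize_record and the list of accepted input formats are assumptions for illustration, not part of Glue or Spark. It does not by itself guarantee that the crawler will catalog the column as a date; the to_date conversion in the ETL job is still what produces a real date column.

```python
import json
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical input formats the source system might emit; adjust to your data.
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(value: str, formats: Iterable[str] = INPUT_FORMATS) -> Optional[str]:
    """Return value as a yyyy-MM-dd string, or None if no format matches.

    Returning None loosely mirrors how to_date leaves strings it cannot
    parse as null under Spark's default (non-ANSI) settings.
    """
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def normalize_record(record: dict, field: str = "date") -> dict:
    """Return a copy of record with record[field] rewritten to yyyy-MM-dd."""
    out = dict(record)
    if isinstance(out.get(field), str):
        out[field] = to_iso_date(out[field])
    return out


if __name__ == "__main__":
    rows = [
        {"id": "abc", "date": "2024-07-09"},
        {"id": "def", "date": "09/07/2024"},
        {"id": "ghi", "date": "not a date"},
    ]
    # Write one JSON object per line (JSON Lines), with dates normalized.
    for row in rows:
        print(json.dumps(normalize_record(row)))
```

Running it prints the three records with "2024-07-09", "2024-07-09" and null as their date values, so any unparseable input shows up before the file ever reaches the crawler.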
2024-07-11
Vikas Sharma