Question
Encountering ValueError upon joining two pandas dataframes on a datetime index column
I have two tables which I need to join on a date column. I want to preserve all the dates in both tables, with the empty rows in each table just being filled with NaN
s in the final combined array. I think an outer join is what I'm looking for. So I've written this code (with data_1 and data_2 acting as mockups of my actual tables)
import pandas as pd
def main():
data_1 = [["May-2024", 10, 5], ["June-2024", 3, 5], ["April-2015", 1, 3]]
df1 = pd.DataFrame(data_1, columns = ["Date", "A", "B"])
df1["Date"] = pd.to_datetime(df1["Date"], format="%B-%Y")
print(df1)
data_2 = [["01-11-2024", 10, 5], ["01-06-2024", 3, 5], ["01-11-2015", 1, 3]]
df2 = pd.DataFrame(data_2, columns = ["Date", "C", "D"])
df2["Date"] = pd.to_datetime(df2["Date"], format="%d-%m-%Y")
print(df2)
merged = df1.join(df2, how="outer", on=["Date"])
print(merged)
if __name__ == "__main__":
main()
But when I try and perform an outer join on two pandas dataframes, I get the error
ValueError: You are trying to merge on object and int64 columns for key 'Date'. If you wish to proceed you should use pd.concat
I checked the datatype of both columns by printing
print(df1["Date"].dtype, df2["Date"].dtype)
and they both seem to be
datetime64[ns] datetime64[ns]
datetimes. So I'm not quite sure why I'm getting a ValueError
Any help is appreciated, thanks.
3 36
3