Question

How do I get variable length slices of values using Pandas?

I have data that includes a full name and first name, and I need to make a new column with the last name. I can assume full - first = last.

I've been trying to use slice with a start index equal to the length of the first name + 1. But that index is a Series, not an integer, so it's returning NaN.

The commented lines show the things I tried. It took me a while to realize what the series/integer issue was. It seems this shouldn't be so difficult.

Thanks

import pandas as pd

columns = ['Full', 'First']
data = [('Joe Smith', 'Joe'), ('Bobby Sue Ford', 'Bobby Sue'), ('Current Resident', 'Current Resident'), ('', '')]
df = pd.DataFrame(data, columns=columns)

#first_chars = df['First'].str.len() + 1

#last = df['Full'].str[4:]
#last = df['Full'].str[first_chars:]
#last = df['Full'].str.slice(first_chars)
#last = df.Full.str[first_chars:]
#pd.DataFrame.insert(df, loc=2, column='Last', value=last)

#df['Last'] = df.Full.str[first_chars:]
#df['Last'] = str(df.Full.str[first_chars:])

#first_chars = int(first_chars)
#df['Last'] = df['Full'].apply(str).apply(lambda x: x[first_chars:])
df['Last'] = df['Full'].str.slice(df['First'].str.len() + 1)

print(df)

Solution


IIUC, you can do it this way using string slicing, list comprehension, and zip:

df['Last'] = [u[len(i)+1:] for u, i in zip(df['Full'], df['First'])]

Output:

               Full             First   Last
0         Joe Smith               Joe  Smith
1    Bobby Sue Ford         Bobby Sue   Ford
2  Current Resident  Current Resident       
3                                           

Details:

  • Use zip to pair up the values row by row as (Full, First) tuples.
  • Loop over these pairs with a list comprehension.
  • For each pair, take the first element, u, and slice it starting at the length of the second element, i, plus 1 (the extra 1 skips the space after the first name).
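
The steps above can be sketched as a small self-contained script. It reuses the question's sample data; the intermediate name `pairs` is only there for illustration, and the sketch assumes every Full value starts with its First value followed by a single space.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [("Joe Smith", "Joe"), ("Bobby Sue Ford", "Bobby Sue"),
     ("Current Resident", "Current Resident"), ("", "")],
    columns=["Full", "First"],
)

# zip pairs each full name with its first name, row by row.
pairs = list(zip(df["Full"], df["First"]))

# Slice each full name past the first name and the separating space.
# Slicing beyond the end of a string yields '' rather than an error.
df["Last"] = [full[len(first) + 1:] for full, first in pairs]

print(df)
```

Because the slice start is computed per row in plain Python, each row gets its own integer index, which is what the .str accessor could not do with a Series.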
2024-07-19
Scott Boston

Solution


Edit: Use removeprefix (available on str since Python 3.9) instead of replace to handle cases where the first and last names are the same, like the extra 'Joe Joe' row in the output here:

df['Last'] = df.apply(lambda row: row['Full'].removeprefix(row['First']).strip(), axis=1)
               Full             First   Last
0         Joe Smith               Joe  Smith
1    Bobby Sue Ford         Bobby Sue   Ford
2  Current Resident  Current Resident       
3                                           
4           Joe Joe               Joe    Joe
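
To see why the two methods differ on that last row, here is a minimal sketch on plain strings (no pandas needed). The variable names are just for illustration.

```python
# Requires Python 3.9+ for str.removeprefix.
full, first = "Joe Joe", "Joe"

# replace removes every occurrence of the first name,
# so the identical last name is removed as well.
via_replace = full.replace(first, "").strip()

# removeprefix removes only a leading match,
# so the last name is left intact.
via_removeprefix = full.removeprefix(first).strip()

print(repr(via_replace))       # ''
print(repr(via_removeprefix))  # 'Joe'
```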

Original answer: Use apply on axis=1 to replace each name:

df['Last'] = df.apply(lambda row: row['Full'].replace(row['First'], '').strip(), axis=1)
               Full             First   Last
0         Joe Smith               Joe  Smith
1    Bobby Sue Ford         Bobby Sue   Ford
2  Current Resident  Current Resident       
3        
2024-07-19
e-motta