Question

How do I add a location filter when using the LinkedIn Voyager API?

I'm trying to scrape LinkedIn job postings using the Voyager API, adapting code from here. Here's the relevant portion of the code:

class JobSearchRetriever:
    def __init__(self):
        self.job_search_link = 'https://www.linkedin.com/voyager/api/voyagerJobsDashJobCards?decorationId=com.linkedin.voyager.dash.deco.jobs.search.JobSearchCardsCollection-187&count=100&q=jobSearch&query=(origin:JOB_SEARCH_PAGE_OTHER_ENTRY,selectedFilters:(sortBy:List(DD)),spellCorrectionEnabled:true)&start=0'
        emails, passwords = get_logins('search')
        self.sessions = [create_session(email, password) for email, password in zip(emails, passwords)]
        self.session_index = 0
        self.headers = [{
            'Authority': 'www.linkedin.com',
            'Method': 'GET',
            'Path': 'voyager/api/voyagerJobsDashJobCards?decorationId=com.linkedin.voyager.dash.deco.jobs.search.JobSearchCardsCollection-187&count=25&q=jobSearch&query=(origin:JOB_SEARCH_PAGE_OTHER_ENTRY,selectedFilters:(sortBy:List(DD)),spellCorrectionEnabled:true)&start=0',
            'Scheme': 'https',
            'Accept': 'application/vnd.linkedin.normalized+json+2.1',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9',
            'Cookie': "; ".join([f"{key}={value}" for key, value in session.cookies.items()]),
            'Csrf-Token': session.cookies.get('JSESSIONID').strip('"'),
            # 'TE': 'Trailers',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
            # 'X-Li-Track': '{"clientVersion":"1.12.7990","mpVersion":"1.12.7990","osName":"web","timezoneOffset":-7,"timezone":"America/Los_Angeles","deviceFormFactor":"DESKTOP","mpName":"voyager-web","displayDensity":1,"displayWidth":1920,"displayHeight":1080}'
            'X-Li-Track': '{"clientVersion":"1.13.5589","mpVersion":"1.13.5589","osName":"web","timezoneOffset":-7,"timezone":"America/Los_Angeles","deviceFormFactor":"DESKTOP","mpName":"voyager-web","displayDensity":1,"displayWidth":360,"displayHeight":800}'
        } for session in self.sessions]

I want to filter the job postings to only those from a specific state, say Michigan. Searching on the LinkedIn website myself, I can see from the URL that the geoId for Michigan is 103051080. I've tried editing job_search_link in various ways, for example by adding geoUrn:List(103051080) to the query, but I still get postings from all over the US.
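If you want to grab the geoId from a copied search URL in code rather than by eye, the standard library is enough. This is a minimal sketch: geo_id_from_search_url is a hypothetical helper, and the example URL shape (a geoId query parameter on /jobs/search/) is an assumption for illustration, so paste the real URL from your address bar.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def geo_id_from_search_url(url: str) -> Optional[str]:
    """Return the value of the geoId query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("geoId")
    return values[0] if values else None


# Hypothetical URL shape, for illustration only.
search_url = "https://www.linkedin.com/jobs/search/?geoId=103051080&keywords=python"
print(geo_id_from_search_url(search_url))  # 103051080
```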

How do I edit the query to only get posts from a specific location? I haven't found specific documentation; this guide to locations in the Profile API looks relevant, but I'm still not sure how to incorporate that into the query.


Solution


You are on the right track: locationUnion:(geoId:103051080) is the filter you're looking for. job_search_link should be:

https://www.linkedin.com/voyager/api/voyagerJobsDashJobCards?decorationId=com.linkedin.voyager.dash.deco.jobs.search.JobSearchCardsCollection-187&count=100&q=jobSearch&query=(origin:JOB_SEARCH_PAGE_OTHER_ENTRY,locationUnion:(geoId:103051080),selectedFilters:(sortBy:List(DD)),spellCorrectionEnabled:true)&start=0
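The count and start parameters in that link look like offset pagination. Assuming that is how the endpoint pages (not confirmed here), later pages would be requested by advancing start by count. The sketch below only builds the parameter strings, joined by hand the same way the requests example further down does; page_param_strings is a hypothetical helper, and the base values are copied from that requests example.

```python
from typing import Dict, Iterator, Union

Params = Dict[str, Union[str, int]]


def page_param_strings(base: Params, page_size: int, max_results: int) -> Iterator[str]:
    """Yield one parameter string per page, advancing start by page_size (assumed paging scheme)."""
    for start in range(0, max_results, page_size):
        page = {**base, "count": page_size, "start": start}
        # Join by hand so the query value is not percent-encoded.
        yield "&".join(f"{k}={v}" for k, v in page.items())


base: Params = {
    "decorationId": "com.linkedin.voyager.dash.deco.jobs.search.JobSearchCardsCollection-210",
    "q": "jobSearch",
    "query": "(origin:JOB_SEARCH_PAGE_SEARCH_BUTTON,locationUnion:(geoId:103051080))",
}

for p in page_param_strings(base, page_size=100, max_results=300):
    print(p)
```

Each string would be passed as params to requests.get in place of the single start=0 string.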

To query that API you only need two cookies: JSESSIONID and li_at. You can either obtain them the way the library you adapted the code from does, by logging in with Selenium and reading the cookies, or copy them from your browser if you're already logged in (DevTools [F12] > Application/Storage > Cookies).
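The CSRF token the API expects is the JSESSIONID value itself. Browsers often store that cookie wrapped in double quotes, which is why the question's code calls .strip('"') on it. A minimal sketch of building the headers from the two cookies; voyager_headers is a hypothetical helper name, and quote-stripping is a precaution based on the question's code rather than a documented requirement.

```python
from typing import Dict


def voyager_headers(cookies: Dict[str, str]) -> Dict[str, str]:
    """Build request headers from the JSESSIONID and li_at cookies."""
    missing = {"JSESSIONID", "li_at"} - set(cookies)
    if missing:
        raise KeyError(f"missing cookies: {sorted(missing)}")
    return {
        "accept": "application/vnd.linkedin.normalized+json+2.1",
        # Strip surrounding double quotes, as the question's code does.
        "csrf-token": cookies["JSESSIONID"].strip('"'),
    }


print(voyager_headers({"JSESSIONID": '"ajax:123"', "li_at": "XXXX"}))
```

With the cookies dict from the requests example, headers = voyager_headers(cookies) could replace the hand-written headers dict (add a user-agent entry if you want to match it exactly).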

Here is how to query the API with requests:

import requests

keywords = 'Python' # search keyword
geo_id = '103051080' # Michigan geoId

params = {
  "decorationId": "com.linkedin.voyager.dash.deco.jobs.search.JobSearchCardsCollection-210",
  "q": "jobSearch",
  "query": f"(origin:JOB_SEARCH_PAGE_SEARCH_BUTTON,keywords:{keywords},locationUnion:(geoId:{geo_id}))",
  "count": 100,
  "start": 0
}

# build the params string by hand: passing params as a dict (or urlencode(dict))
# percent-encodes the parentheses, colons and commas in query, which doesn't work
params = '&'.join(f'{k}={v}' for k, v in params.items())

# required cookies
# copy from browser or use selenium (login)
cookies = {
  'JSESSIONID': 'ajax:00000000000000000',
  'li_at': 'XXXXXXXXXXXXXXXXXXXX',
}

headers = {
  'accept': 'application/vnd.linkedin.normalized+json+2.1',
  'csrf-token': cookies['JSESSIONID'],
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
}


url = "https://www.linkedin.com/voyager/api/voyagerJobsDashJobCards"
response = requests.get(url, headers=headers, params=params, cookies=cookies)
print(response.text)
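To see why the params are joined by hand: requests encodes a params dict with urllib.parse.urlencode, which percent-encodes the parentheses, colons and commas in the query value, while a plain string is appended largely as given. This sketch uses only the standard library to compare the two forms; the parameter values are taken from the example above.

```python
from urllib.parse import urlencode

query = "(origin:JOB_SEARCH_PAGE_SEARCH_BUTTON,keywords:Python,locationUnion:(geoId:103051080))"
params = {"q": "jobSearch", "query": query, "count": 100, "start": 0}

# Roughly what requests does with a dict: reserved characters become %28, %3A, %2C, ...
encoded = urlencode(params)
# What the answer does instead: join the pairs by hand, leaving the syntax literal.
manual = "&".join(f"{k}={v}" for k, v in params.items())

print(encoded)
print(manual)
```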

Note: if you don't want to search for a specific keyword, remove keywords:{keywords} from params['query'], so the query becomes:

"query": f"(origin:JOB_SEARCH_PAGE_SEARCH_BUTTON,locationUnion:(geoId:{geo_id}))"
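Both query variants can come from one function that simply leaves the keywords field out when there is no keyword. A minimal sketch; build_query is a hypothetical helper, and it does not escape commas or parentheses inside keywords, which would break the parenthesized syntax.

```python
from typing import Optional


def build_query(geo_id: str, keywords: Optional[str] = None,
                origin: str = "JOB_SEARCH_PAGE_SEARCH_BUTTON") -> str:
    """Assemble the parenthesized query value used in the requests example."""
    fields = [f"origin:{origin}"]
    if keywords:  # omit the keywords field entirely when no keyword is given
        fields.append(f"keywords:{keywords}")
    fields.append(f"locationUnion:(geoId:{geo_id})")
    return "(" + ",".join(fields) + ")"


print(build_query("103051080", "Python"))
print(build_query("103051080"))
```

params["query"] = build_query(geo_id, keywords) would then cover both cases.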
2024-07-12
GTK