How to search the PubMed Entrez API with Python and filter results by metadata

If you want to filter PubMed search results in ways the web interface does not offer, you can use the Entrez API (E-utilities).

The following example shows how to sort search results alphabetically by the journal in which each article originally appeared.

I recommend requesting the data in JSON format (retmode=json), since it is easy to process in Python.

import requests
import json

db = 'pubmed'
domain = 'https://www.ncbi.nlm.nih.gov/entrez/eutils'
nresults = 10
retmode = 'json'  # request JSON instead of the default XML
query = "depression"

# search query: returns the IDs of the matching articles
queryLinkSearch = f'{domain}/esearch.fcgi?db={db}&retmax={nresults}&retmode={retmode}&term={query}'
response = requests.get(queryLinkSearch)
pubmedJson = response.json()

results = []

for paperId in pubmedJson["esearchresult"]["idlist"]:
    # metadata query: one esummary request per article ID
    queryLinkSummary = f'{domain}/esummary.fcgi?db={db}&id={paperId}&retmode={retmode}'
    results.append({'paperId': paperId, 'metadata': requests.get(queryLinkSummary).json()})
    # uncomment to check the journal names
    # print(results[-1]["metadata"]["result"][paperId]["fulljournalname"])

resultsSorted = sorted(results, key=lambda x: x["metadata"]["result"][x["paperId"]]["fulljournalname"])

with open('resultsSorted.json', 'w') as f:
    json.dump(resultsSorted, f)
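
The loop above sends one esummary request per article and only sorts the results. As a rough sketch, the metadata could also be fetched in a single call, since esummary accepts a comma-separated list of IDs, and a small helper can filter on a metadata field before sorting. The function names build_summary_url, extract_records and filter_and_sort are hypothetical, the sample payload is made up to mirror the response shape used above, and the reliance on the "uids" list in the response is an assumption worth checking against a real response.

```python
from urllib.parse import urlencode

DOMAIN = 'https://www.ncbi.nlm.nih.gov/entrez/eutils'


def build_summary_url(paper_ids, db='pubmed'):
    """Build one esummary URL covering several article IDs."""
    params = {'db': db, 'id': ','.join(paper_ids), 'retmode': 'json'}
    # safe=',' keeps the commas between IDs readable in the URL
    return f'{DOMAIN}/esummary.fcgi?{urlencode(params, safe=",")}'


def extract_records(summary_json):
    """Return the per-article dicts from an esummary JSON response."""
    result = summary_json['result']
    # Assumption: 'uids' lists the IDs that have an entry in 'result'.
    return [result[uid] for uid in result.get('uids', []) if uid in result]


def filter_and_sort(records, journal_contains=''):
    """Keep records whose journal name contains a substring, then sort by journal."""
    needle = journal_contains.lower()
    kept = [r for r in records
            if needle in r.get('fulljournalname', '').lower()]
    # Case-insensitive sort; records without a journal name sort first.
    return sorted(kept, key=lambda r: r.get('fulljournalname', '').lower())


# Made-up payload in the same shape as the responses used above.
sample = {
    'result': {
        'uids': ['111', '222', '333'],
        '111': {'uid': '111', 'fulljournalname': 'The Lancet'},
        '222': {'uid': '222', 'fulljournalname': 'Journal of Affective Disorders'},
        '333': {'uid': '333', 'fulljournalname': 'BMC Psychiatry'},
    }
}

url = build_summary_url(['111', '222', '333'])
all_sorted = filter_and_sort(extract_records(sample))
psychiatry_only = filter_and_sort(extract_records(sample), 'psychiatry')
```

With real data, the JSON returned by requests.get(url).json() would take the place of sample. Batching this way also keeps the number of requests low, which helps to stay within NCBI's request rate limits.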
Posted by Joshua Simon in APIs, Python