How to search the PubMed Entrez API with Python and filter results by metadata
If you want to apply more filters to PubMed search results than the web interface offers, you can use the Entrez API.
The following example shows how to sort the results alphabetically by the journal in which each article originally appeared.
I recommend requesting and processing the data in JSON format.
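To illustrate why JSON is convenient here, the following is a minimal sketch of building a search URL and pulling the article IDs out of a parsed response. The helper names build_esearch_url and extract_ids are hypothetical, and the sample response is invented and heavily trimmed. It also uses urlencode from the standard library, which is an addition not in the original script, so that search terms containing spaces are encoded safely.

```python
from urllib.parse import urlencode

# Same base URL as in the script in this post.
DOMAIN = "https://www.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=10):
    """Build an esearch URL with the query string percent-encoded (hypothetical helper)."""
    params = {"db": "pubmed", "retmax": retmax, "retmode": "json", "term": term}
    return f"{DOMAIN}/esearch.fcgi?{urlencode(params)}"


def extract_ids(payload):
    """Return the list of PubMed IDs from a parsed esearch JSON response."""
    # Missing keys yield an empty list instead of raising KeyError.
    return payload.get("esearchresult", {}).get("idlist", [])


# Invented, trimmed-down response; the IDs are placeholders, not real articles.
sample_response = {
    "esearchresult": {"count": "2", "retmax": "2", "idlist": ["11111111", "22222222"]}
}

print(build_esearch_url("major depression", retmax=2))
print(extract_ids(sample_response))
```

Using .get() with defaults means an unexpected or error response gives an empty ID list rather than a KeyError, which may or may not be what you want in a real pipeline.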
import requests
import json
db = 'pubmed'
domain = 'https://www.ncbi.nlm.nih.gov/entrez/eutils'
nresults = 10
query = "depression"
retmode = 'json'
# standard query
queryLinkSearch = f'{domain}/esearch.fcgi?db={db}&retmax={nresults}&retmode={retmode}&term={query}'
response = requests.get(queryLinkSearch)
pubmedJson = response.json()
results = []
for paperId in pubmedJson["esearchresult"]["idlist"]:
    # metadata query
    queryLinkSummary = f'{domain}/esummary.fcgi?db={db}&id={paperId}&retmode={retmode}'
    results.append({'paperId': paperId, 'metadata': requests.get(queryLinkSummary).json()})
    # check the journalnames
    # print(results[-1]["metadata"]["result"][paperId]["fulljournalname"])
resultsSorted = sorted(results, key=lambda x: x["metadata"]["result"][x["paperId"]]["fulljournalname"])
with open('resultsSorted.json', 'w') as f:
    json.dump(resultsSorted, f)
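The script above only sorts. As a rough sketch of the filtering step, the helpers below work on a list shaped like results (one dict per article with 'paperId' and 'metadata' keys). The function names and the sample data are invented for illustration. Unlike the sort above, this version ignores case and treats a missing fulljournalname as an empty string, which is an assumption about what you want, not part of the original script.

```python
def summary_of(entry):
    """Return the per-article summary dict, or {} if it is missing."""
    return entry["metadata"].get("result", {}).get(entry["paperId"], {})


def journal_of(entry):
    """Return the journal name, or '' when the field is absent."""
    return summary_of(entry).get("fulljournalname", "")


def filter_by_journal(entries, needle):
    """Keep entries whose journal name contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [e for e in entries if needle in journal_of(e).lower()]


def sort_by_journal(entries):
    """Sort entries alphabetically by journal name, ignoring case."""
    return sorted(entries, key=lambda e: journal_of(e).lower())


# Invented sample in the same shape as `results`; IDs and journals are placeholders.
sample_results = [
    {"paperId": "2", "metadata": {"result": {"uids": ["2"], "2": {"fulljournalname": "Example Review of Medicine"}}}},
    {"paperId": "1", "metadata": {"result": {"uids": ["1"], "1": {"fulljournalname": "Annals of Example Psychiatry"}}}},
    {"paperId": "3", "metadata": {"result": {"uids": ["3"], "3": {}}}},
]

psychiatry_only = filter_by_journal(sample_results, "psychiatry")
ordered = sort_by_journal(sample_results)
print([e["paperId"] for e in psychiatry_only])
print([e["paperId"] for e in ordered])
```

The same pattern works for other summary fields: swap fulljournalname for whichever key you want to filter or sort on, after checking that it is present in your own responses.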