Parsing the MeSH ASCII format in Python
As with UniProt, for which we previously published a parser, MeSH provides an ASCII format that can easily be parsed using Python.
Just like the UniProt parser, this function yields MeSH entries represented by dictionaries. Example code is included at the bottom of the file.
#!/usr/bin/env python3
"""
A simple parser for the MeSH ASCII format, which can be downloaded from
ftp://nlmpubs.nlm.nih.gov/online/mesh/.asciimesh/d2015.bin
For a reference on MeSH headings (i.e. keys in the resulting map), see
http://www.nlm.nih.gov/mesh/elmesh99.pdf
Originally published on TechOverflow.net
"""
from collections import defaultdict
__author__ = "Uli Koehler"
__copyright__ = "Copyright 2015, Uli Köhler"
__license__ = "CC0 1.0 Universal"
__version__ = "1.0"
def readMeSH(fin):
    """
    Given a file-like object, generates MeSH objects, i.e.
    dictionaries with a list of values for each qualifier.
    Example: {"MH": ["Acetylcysteine"]}
    """
    currentEntry = None
    for line in fin:
        line = line.strip()
        if not line:
            continue
        # Handle new record. MeSH explicitly marks this
        if line == "*NEWRECORD":
            # Yield old entry, initialize new one
            if currentEntry:
                yield currentEntry
            currentEntry = defaultdict(list)
            continue
        # Skip any stray line that appears before the first record marker
        if currentEntry is None:
            continue
        # Line example: "MH = Acetylcysteine"
        key, _, value = line.partition(" = ")
        # Append to value list
        currentEntry[key].append(value)
    # If there is a non-empty entry left, yield it
    if currentEntry:
        yield currentEntry
if __name__ == "__main__":
    # Example of how to use readMeSH()
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("file")
    args = parser.parse_args()
    with open(args.file, "r") as infile:
        # readMeSH() yields MeSH objects, i.e. dictionaries
        for entry in readMeSH(infile):
            print(entry)
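Because every entry is a plain dictionary of lists, building lookups on top of the parser takes only a few lines. The following is a minimal sketch, not part of the original script: it maps each main heading (MH) to its unique identifier (UI). The helper buildHeadingIndex() and the SAMPLE text are hypothetical and exist only for illustration; the sample records are hand-written in the MeSH ASCII layout and their values are not copied from the real file. readMeSH() is restated in condensed form so that the snippet runs on its own.

```python
import io
from collections import defaultdict


def readMeSH(fin):
    """Condensed restatement of the parser above, so this sketch is standalone."""
    currentEntry = None
    for line in fin:
        line = line.strip()
        if not line:
            continue
        if line == "*NEWRECORD":
            if currentEntry:
                yield currentEntry
            currentEntry = defaultdict(list)
            continue
        if currentEntry is None:
            # Ignore anything before the first record marker
            continue
        key, _, value = line.partition(" = ")
        currentEntry[key].append(value)
    if currentEntry:
        yield currentEntry


# Hand-written sample in the MeSH ASCII layout (illustrative values only)
SAMPLE = """\
*NEWRECORD
RECTYPE = D
MH = Acetylcysteine
ENTRY = Mercapturic Acid
ENTRY = Acetadote
UI = D000111

*NEWRECORD
RECTYPE = D
MH = Example Heading
UI = D999999
"""


def buildHeadingIndex(entries):
    """Hypothetical helper: map each main heading (MH) to its identifier (UI)."""
    index = {}
    for entry in entries:
        # .get() avoids inserting empty lists into the defaultdict
        headings = entry.get("MH", [])
        identifiers = entry.get("UI", [])
        if headings and identifiers:
            index[headings[0]] = identifiers[0]
    return index


entries = list(readMeSH(io.StringIO(SAMPLE)))
index = buildHeadingIndex(entries)
print(index)
# prints {'Acetylcysteine': 'D000111', 'Example Heading': 'D999999'}
```

Repeated keys such as ENTRY end up as several items in the same list, which is why the sketch takes the first element for MH and UI, assuming each record carries one of each.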