Just like the UniProt parser, this function yields MeSH entries represented by dictionaries. Example code is included at the bottom of the file.
Just like many other databases in computational biology, the downloads for the popular UniProt database are available in a custom text format which is documented on ExPASy.
While it is certainly not difficult writing a generic parser and one can use BioPython, I believe it is often easier to use a tested parser that only uses standard libraries.
Using rsync instead of ftp has a few key advantages
- Fully incremental downloads
- Resumable downloads
- Faster than FTP
- Only a single connection, no problems with FTP active/passive ports (e.g. important for dual-stack lite, e.g. see this excellent post in german )
You need to calculate the center star approximation for a given set of sequences. Instead of calculating the sequence distances and center string by hand, you want the computer to do the hard work.
You need to parse a GFF3 file containing information about sequence features. You prefer to use a minimal, depedency-free solution instead of importing the GFF3 data into a database right away. However, you need to have a standard-compatible parser
If you Google GO OBO parser, there is something missing. You can easily find parsers in Perl, parsers in Java, but not even BioPython has a parser in Python. The format itself, however seems like it’s tailor-made for Python’s generator concept. Only a few SLOCs are needed to get it work without storing everything in RAM.
I used this parser in a prototype project that allows to search GO interactively (it’s fast). I’m not sure when/if I’ll publish that, but here is the parser code.
It is surprisingly difficult to compute simple statistics of FASTA files using existing software. I recently needed to compute the nucleotide count and relative GC frequency of a single sequence in FASTA format, but unless you install dependency-heavy native software like FASTX or you develop it by yourself using BioPython or similar, there doesn’t seem to be a simple, dependency-free solution for this simple set of problem.