Plans for refreshing Datasets for version 2.2.

Plans for refreshing Links are here: Link Refresh for v2.2.

chembl

Download new versions of RDF files form ChEMBL FTP site.

FTP site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/22.1/

Plan: All files will be downloaded and included in OPS:

cco.ttl.gz
chembl_22.1_activity.ttl.gz
chembl_22.1_assay.ttl.gz
chembl_22.1_bindingsite.ttl.gz
chembl_22.1_biocmpt.ttl.gz
chembl_22.1_cellline.ttl.gz
chembl_22.1_document.ttl.gz
chembl_22.1_indication.ttl.gz
chembl_22.1_journal.ttl.gz
chembl_22.1_moa.ttl.gz
chembl_22.1_molecule.ttl.gz
chembl_22.1_molhierarchy.ttl.gz
chembl_22.1_protclass.ttl.gz
chembl_22.1_source.ttl.gz
chembl_22.1_target.ttl.gz
chembl_22.1_targetcmpt.ttl.gz
chembl_22.1_targetrel.ttl.gz
chembl_22.1_unichem.ttl.gz
void.ttl.gz

Put into RDF Graph: <http://www.ebi.ac.uk/chembl> ?

ChEMBL Linkset files can be downloaded from the same FTP directory. Those file names end with “_ls.ttl.gz”. The list of ChEMBL linkset files to be loaded for v2.2 is on this page: Link Refresh for v2.2

Note: The chembl_22.1_indication.ttl.gz file is new to ChEMBL and not previously loaded into past versions of Open PHACTS.

uniprot

To obtain UniProt data for Open PHACTS, issue 3 queries to UniProt REST service:

curl "http://www.uniprot.org/uniprot/?query=reviewed:yes&format=rdf&compress=yes" -o swissprot.rdf.gz
curl "http://www.uniprot.org/uniparc/?query=reviewed:yes&format=rdf&compress=yes" -o uniparc.rdf.gz
curl "http://www.uniprot.org/uniref/?query=reviewed:yes&format=rdf&compress=yes" -o uniref.rdf.gz

enzyme

Download new version of RDF file form UniProt FTP site. The enzyme data is part of UniProt distribution. So if UniProt is refreshed, then ‘enzyme’ should also be refreshed to the same version to keep it in sync.

FTP site: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/enzyme.rdf.xz

Put into RDF Graph: <http://purl.uniprot.org/enzyme>

wikipathways

Egon & crew will create new versions. When ready, download the following files. The 20170310 part is the version number. The data appears to be updated monthly.

http://data.wikipathways.org/20170310/rdf/voidWp.ttl
http://data.wikipathways.org/20170310/rdf/wikipathways-20170310-rdf-wp.zip

The following two files were also download and loaded into Virtuoso for OPS version 2.1. But they only contain information about the visual Pathway diagrams, rather than logical definition of each Pathway. As a result, there’s not really a good reason to include them in Open PHACTS. Excluding them looks like it would save on the order of 12+ million triples from 1841 data files. So they will be excluded from version 2.2.

http://data.wikipathways.org/20170310/rdf/voidGpml.ttl
http://data.wikipathways.org/20170310/rdf/wikipathways-20170310-rdf-gpml.zip

The ‘*.zip’ files will need to be unpacked before they can be loaded. Each ‘zip’ file contains RDF files. The 0210 version contains 1841 ttl files inide the ‘zip’ file.

Put into RDF Graph: <http://www.wikipathways.org>

ocrs

Open PHACTS Chemistry Registration Service.

New versions will be created by Valery. Creation of the OCRS data depends on ChEMBL and WIkiPathways data and other chemistry datasets (which ones?).

Put into RDF Graph (for version 2.1): <http://ops.rsc.org>

For version 2.2, ocrs should be put into a different RDF Graph URI.

Other Datasets

For all other Open PHACTS datasets, the current plan (as of 2017-02-26) is not to refresh them, but to re-use the data used for version 2.1.