Channel: Bio <-> Chem

Getting into the box with recent antimalarials

Update 8th Dec: Last one for the box ?


This lead just appeared in a detailed PNAS paper, "(+)-SJ733, a clinical candidate for malaria that acts through ATP4 to induce rapid host-mediated clearance of Plasmodium" (PMID 25453091).  The closest I can get for the structure (not all the enantiomers) is CID 71276330, VKCPFWKTFZAOTO-RTWAWAEBSA-N.  This has a SureChEMBL match to WO2013027196 that includes an SAR table for 54 analogues.  But I'm confused: does (+)-SJ733 = (+)-SJ557733 (as described in the MMV portfolio) or not?  Having just been tweeted this link by MMV the answer is "yes", but I wish folk would stick to less ambiguous code naming systems.  I should have remembered the trick of Googling both names together, which hits the link given.


Update 31 July.  I appreciated the recent encouragement from MMV, so if you have any suggestions for inclusion, especially from your own publications or patents (including review authors), by all means wang them over and I can attribute as you wish (e.g. you could add them as blog comments in the first instance). Normally I don't pester about sharing but in this case the eventual Box will have a global impact on NTD research, so please spread the word (note also that no. 5 just had a clinical trial published).

****************************************************

I hope some of these "ferreted out" leads make it into the MMV Pathogen Box (an extension of the Malaria Box that I have already blogged about).  If some don't, they can reside here as a reference set anyway.  I only have time to pick off some antimalarials rather than additional NTDs, since some were the subject of previous blogposts.  In addition, last spring I was engaged in curating antimalarial patent reviews for a planned collaborative paper, but the project lost traction for various reasons (it might yet be resuscitated).

I have added some structures from the MMV portfolio I had already blogged about, but some are easy to find as PubChem-positive name-to-struc: Arterolane/OZ277/RBx11160 = CID 10475633 (16 nM vs NF54), OZ439 = CID 24999143 (28 nM vs K1) and MMV666693/TCMDC-124577 = CID 3742333 (238 nM vs 3D7; this CID got the data links but I think the correct E/Z representation is CID 5910214).  Note most of these leads will have patent links in PubChem, either in the CID record to USPTO or via the SureChem substance (SID) links.

Unfortunately DDD107498 and SJ557733 have been blinded, so if anyone from Dundee or St Jude cares to wang over the structures I can add them.

My pick'n mix below (adding the three above makes 30) includes parasite-actives from the last few years along with some target-specific leads (but confirmed as parasite-active, as MMV suggested) with probable molecular mechanism of action (mmoa).  As commented, it could be useful to omics-profile the Plasmodium expression perturbations, which just might give clustered signatures for the known vs unknown mmoas.  This could deconvolute some of the latter to the former by inferring the possible target.  Note that for the author-named target sequences I found links for, there may be residual ambiguity where I did not match the UniProt entry to the same Plasmodium strains used in the reports, since there can be many TrEMBL entries to select from (which could include residue changes).

I have done my best to get the name-to-structures right. However, authors can inadvertently make these particularly difficult to find or resolve (e.g. by not including a full IUPAC name in the M&M) and may not even deposit their novel structures into any chemical databases (so please let me know if any need correcting).  Note a) not all the CIDs had a 3D conformer and b) PubChem -ves are rendered via ChemAxon Marvin.

While I have added what I judge to be the primary publication title, note there may be multiple sources of activity data linked to references in ChEMBL via PubChem BioAssay, even if the structures were used for later comparisons (or even have non-malaria related confirmatory BioAssay results). I have also added SureChEMBL pointers where, on inspection, the patent documents seemed useful (i.e. including actual data and without deliberate obfuscation of structure-to-activity mapping) and with first-filing quantitative SAR data sets that were either unique, or larger than the subsequent papers.

If you want to check the information space around these compounds: 1) Google the InChIKey (but note, just now the results are unfortunately daisy-chaining all my blog posts), 2) check "Same Connectivity" in the CID entry to track any isomers, 3) browse the 90% Tanimoto 2D shell as "Similar Compounds", 4) check "Related Compounds with Annotation" from the display ribbon and 5) browse "Similar Conformer" if you fancy some 3D-walking.  For the PubChem-negatives just paste in the SMILES for the 2D search.
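For anyone scripting these checks rather than clicking through, here is a minimal sketch of two of the ideas above: checking that an InChIKey is complete and comparing first (connectivity) blocks as a local stand-in for "Same Connectivity", plus a plain Tanimoto coefficient for applying a 90% cut-off. The function names are my own for illustration, and the fingerprints are assumed to be simple sets of on-bit indices rather than PubChem's actual 2D fingerprint.

```python
import re

# InChIKey layout: 14-letter connectivity block, 10-letter second block
# (stereo/isotope layers plus flag characters), 1-letter protonation flag.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key):
    """Return the three blocks of an InChIKey, or None if it is malformed."""
    match = INCHIKEY_RE.match(key.strip())
    return match.groups() if match else None


def same_connectivity(key_a, key_b):
    """True when both keys are well-formed and share the first block."""
    a, b = split_inchikey(key_a), split_inchikey(key_b)
    return a is not None and b is not None and a[0] == b[0]


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# A complete key from this post splits into three blocks; a key that has
# lost its final block is rejected.
print(split_inchikey("CKLPLPZSUQEDRT-WPCRTTGESA-N"))
print(split_inchikey("CKLPLPZSUQEDRT-WPCRTTGESA"))

# Hypothetical on-bit sets, only to show the 90% threshold test.
print(tanimoto({1, 2, 3, 4}, {2, 3, 4, 5}) >= 0.9)
```

The regular expression accepts non-standard keys as well (second block ending in NA rather than SA), since a few of those appear in this list.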

*********************************************

1)   Top of the list  (since I put it into PubChem myself as CID 71819647)  has to be the home-grown MMV670437 with a parasite IC50 of 44 nM. It just appeared in CHEMBL3137625 (but a different IC50).  The background is in the link to the Sydney Team above.

PMIWBIXSAYKRGF-SFHVURJKSA-N



    
2)  Next in line are those from the MMV portfolio that are not blinded.  These start with MMV390048 (CID 53311393) at 17.8 nM against D7.  There are many analogues in WO2011086531 but the IC50s were in ng not nM :(
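Mass-based IC50s can be put on the same footing as the nM values elsewhere in this list once a molecular weight is known. The sketch below assumes the patent values are ng/mL (the usual convention, but I have not confirmed this for that filing); the function names and the example numbers are illustrative only, not data from the patent.

```python
def ng_per_ml_to_nM(conc_ng_per_ml, mol_weight):
    """Convert a concentration in ng/mL to nM, given MW in g/mol.

    1 ng/mL = 1e-6 g/L, so molarity = 1e-6 / MW mol/L = 1000 / MW nM.
    """
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ng_per_ml * 1000.0 / mol_weight


def nM_to_ng_per_ml(conc_nM, mol_weight):
    """Inverse conversion, nM back to ng/mL."""
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_nM * mol_weight / 1000.0


# Hypothetical example: 10 ng/mL for a compound of MW 400 g/mol.
print(ng_per_ml_to_nM(10.0, 400.0))
```

The molecular weight for each analogue would have to come from the extracted structures, which is part of why mass-unit tables are so much more work to compare.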

RTJQABCNNLMCJF-UHFFFAOYSA-N



3)  P218 is from "Malarial dihydrofolate reductase as a paradigm for drug development against a resistance-compromised target" (PMID 23035243) with a 55 nM IC50 against mutant protein.  This has a PDB structure (CID 66563688) but is not mapped into PubChem BioAssay (n.b. there were no obvious structure mappings to the activity data in WO2009048957).

VDGXZSSDCDPCRF-UHFFFAOYSA-N.



Pf target is P13922

4) The historical antimalarial endochin (CID 100474 but not named by the PubChem submitters) served as a structural template for optimization of analogue leads now named Endochin-like quinolones (ELQ).  From the MMV portfolio, ELQ300 was described in "Quinolone-3-diarylethers: a new class of antimalarial drug" (PMID 23515079). The structure (CID 67016608) has a parasite IC50 of 1 nM.  There are substantial data (Fig 4) in "Compounds having antiparasitic or anti-infectious activity" (WO2010065905) as first-filing, but the dense image tables defeated automated extraction. Some lead structures were successfully extracted from a later filing (WO2012167237).

WZDNKHCQIZRDKW-UHFFFAOYSA-N


JFTR the ELQ  series is extended in a subsequent paper "Discovery, synthesis, and optimization of antimalarial 4(1H)-quinolone-3-diarylethers" (PMID 24720377)  As 20d = ELQ-333 in the paper  a sub nM example is

COC1=CC2=C(C=C1Cl)C(=O)C(=C(C)N2)C1=CC=C(OC2=CC=C(F)C=C2)C=C1.

However, this analogue was PubChem -ve and is so close to ELQ300 that I have not given it a separate entry. In addition, the overlap between the two papers and the two patents would take more sorting out than I can do just now.  In the meantime, for those interested in the extended series in PMID 24720377, the OA full-text extracts well in chemicalize.org.


5) NITD609  = KAE609 = cipargamin.   This is described in "The spiroindolone drug candidate NITD609 potently inhibits gametocytogenesis and blocks Plasmodium falciparum transmission to anopheles mosquito vector" (PMID 22508309). The structure, with a 4.4 nM IC50 against Dd2,  is CID 44469321

CKLPLPZSUQEDRT-WPCRTTGESA-N




Note the recent publication of encouraging Phase II results.


6) DSM265 = CID 51347395 from a GSK publication "Structure-guided lead optimization of triazolopyrimidine-ring substituents identifies potent Plasmodium falciparum dihydroorotate dehydrogenase inhibitors with clinical candidate potential" (PfDHODH inhibitors; PMID 21696174)

OIZSVTOIBNSVOS-UHFFFAOYSA-N

The Pf target should be Q08210.

7) A lead structure from “Discovery of novel and ligand-efficient inhibitors of Plasmodium falciparum and Plasmodium vivax N-myristoyltransferase”  (PMID 23170970) . This was resolved  to CID 70678410 (see this blog post for the WO2013083991 patent links)

GEVWCNOQZDSSIG-UHFFFAOYSA-N

The Pf target is  Q8ILW6

8) ACT-213615, with an unknown mmoa, was published as "Identification of a new class of antimalarials" (PMID 22732921), from which the code name was mapped to CID 53303859

JOODDBZFOAVHIT-AFSOEPDBSA-N 


9) This one is PubChem -ve but the image in the Nature paper "Targeting Plasmodium PI(4)K to eliminate malaria" (PMID 24284631) for one of the leads  KDU691 (mmoa  P. vivax PI(4)Kinase IC50 of 1.5 nM)    plausibly resolves to the SMILES below

 CNC1=CC=C(C=C1)N(C)C(=O)C1=CN2C(C=N1)=NC=C2C1=CC=C(Cl)C=C1

URXVBRGOHBSZCO-UHFFFAOYSA-N


Target is Q8I406


10) This one is from "In vitro and in vivo characterization of the antimalarial lead compound SSJ-183 in Plasmodium models" (PMID 24255594) as CID 44226912, with a 7.6 nM IC50 against the K1 parasite.

HBSWMATYASLRBW-UHFFFAOYSA-N



11) Originating in Dundee from "Discovery and structure-activity relationships of pyrrolone antimalarials" (PMID 23517371) the lead was named TDR32750 (8a), with a Plasmodium falciparum K1 EC50 of ~ 9 nM. However,  it looks like Novartis had already picked this up as GNF-Pf-1753 (CID 5730429)

LGOJAESSCLSLCP-GZTJUZNOSA-N


12) The Capetown team published  "Medicinal chemistry optimisation of antiplasmodial imidazopyridazine hits from high throughput screening of a SoftFocus kinase library: part 1" (PMID 24568587). Compound 35 was the lead with IC50 vs K1 = 6.3 nM, and vs NF54 = 7.3 nM.



This resolves to  CS(=O)(=O)C1=CC=C(C=C1)C1=CN=C2C=CC(=NN12)C1=CC(=CC=C1)S(C)(=O)=O . It had no exact match in PubChem or SureChEMBL but looks similar to a Novartis  kinase inhibitor from a patent (CID 24948277) .  

JDMXXVFQJAZOTB-UHFFFAOYSA-N


13) I was  intrigued to see "A 2nd Selective Inhibitor of Plasmodium falciparum Glucose-6-Phosphate Dehydrogenase (PfG6PDH) - Probe 2" in the PubMed hits from the Molecular Libraries initiative (PMID 24501782).  The lead is a  PfG6PDH inhibitor (190 nM IC50) that was >420-fold selective vs. the human orthologue.  The probe number is ML304  (CID 56639562)

TUAFUWGZCFIOSV-MRXNPFEDSA-N


(Given the first probe against this target had lower parasite potency I have not included it, but for the record it is ML276, CID 53362052.)

Pf target is Q25856

14) This peptidic  protease inhibitor is from  "Rational design of the first difluorostatone-based PfSUB1 inhibitors" (PMID 24909083). At 600 nM against the enzyme this (PubChem and SureChEMBL -ve ) lead  (compound 1a) is not particularly potent but could still be a useful mmoa probe.

CC[C@H](C)[C@H](NC(=O)[C@H](CCCCN)NC(C)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(=O)C[C@H](N)C(=O)C(F)(F)C(=O)NCC(O)=O

IMPBWYDGTOWBTF-MJIQOFRUSA-N


N.b. while a tweet from MMV was positive about the set up to this point, they suggested anti-parasite activity as a Box pre-requisite.  Such an assay has not been performed in this case.  However, I have left it in for mechanistic interest and I'm sure the University of Siena team might distribute some of it for parasite mmoa testing.

Pf target is Q8I0V0


15) This was a potent hit from an earlier Sydney series as OSMS39 (CID 57515644) .  While the team has cautioned on the poor solubility,  the 5 nM IC50 makes this worthy of inclusion, even if a little more DMSO may be called for.

GVGNOLWIUGQIHW-UKWGHVSLSA-N


16) This  lead series from Sanofi is patent-only so far.  I'm giving the Freepatentsonline url for "Pyrimidinone derivatives as antimalarial agents" WO2013190123 since the SureChEMBL extraction failed for some reason.  Chemicalize.org converted over 400 IUPACs including most of the 119 SAR examples.  The one below, compound 90, has a 2nM IC50 against the NF54 Pf strain.  The structure is PubChem-negative but Thomson Pharma did extract some analogues, such as CID 72694240, but for compound 90 the SMILES string and InChIKey are below.

FC(F)(F)C1CCN2C(=O)C=C(N=C2N1CCC1CCN(CC1)C=O)N1CC2CC1CO2

 NFTJCCFKIQSYDD-UHFFFAOYNA-N


17) Also patent-only, this is from a St Jude, MMV, Rutgers consortium filing "Substituted 2-alkyl-1-oxo-n-phenyl-3-heteroaryl-1,2,3,4- tetrahydroisoquinoline-4-carboxamides for antimalarial therapies" (WO2013027196). The most potent was example 7 with an EC50 against 3D7 of 3 nM (CID 71529699), pulled out by Thomson Pharma.

UJHYREHWQZFQPP-UHFFFAOYSA-N


There are potent analogues in 7196 including  CID 71276304 as example 51 with a 5nM EC50  (n.b. the blinded SJ557733 may come from this patent).

18) From the paper "Aminoazabenzimidazoles, a Novel Class of Orally Active Antimalarial Agents" (PMID 24914738), this lead was compound 23 with a 40 nM IC50 against the parasite. It was PubChem and SureChEMBL negative. My guess is this AZ team would have filed but maybe the extractions failed.  The useful frontispiece image is below.



CC(OC1=NC2=C(C=C1)N=C(N)N2CC(O)C1=C(Cl)C=C(C=C1)C(F)(F)F)C(F)(F)F

 POQVSNVXDFTBIQ-UHFFFAOYNA-N


19)  A third MLSCN probe, but optimised against whole parasites, as described in "ML238: An Antimalarial Small Molecule of a Unique Structural Class" (PMID 23236647), is CID 49849912

 BKTXRPJVVXUPPO-PNCWTNKOSA-N


20) A recent patent (not yet in SureChEMBL) was "Aryl derivatives and uses thereof" (WO2014074778). I'd never heard of Jacobus Pharmaceuticals as an assignee organisation but the inventor has published anti-parasite papers, which would lend some credibility to the patent application.  Example 23, shown below, has a 21 nM IC50 vs D6 and 7 nM vs W2 (there is mouse survival data for the series as well).

CC(C)(C)NCC1=CC(=CC(=C1O)C1=CC(=C(Cl)C=C1)C(F)(F)F)C(C)(C)C

IEDUUGCRKLENDA-UHFFFAOYSA-N


While this is also PubChem negative, it has similarity to CID 129635, which was also reported as having antimalarial activity in mice.

21) This is directed against the falcipain cys protease in the parasite but also has a 1nM IC50 against the parasite as reported in "Falcipain inhibitors: optimization studies of the 2-pyrimidinecarbonitrile lead series" (PMID 20672841). While the paper is an SAR tour de force, the actual drawing of the Markushed structure in a later review article (PMID: 23587422) saved me some work.


I thus think the lead is CID 15979041

AXKCHJVWQBAGPP-UHFFFAOYSA-N


This is one of the unusual cases where there is less SAR in the patent (WO2007025775 mostly  exemplified as TFA salts)  than the paper, but the latter was published a few years later.

The Pf targets are falcipain 2 Q9N6S8  and falcipain 3 Q9NBA7

22)  "Discovery and preliminary structure–activity relationship analysis of 1,14-sperminediphenylacetamides as potent and selective antimalarial lead compounds" (PMID 23265884) just happened to pop into my G+ feed  (CID 71533487 )

SUKXUTUXGGJIBX-UHFFFAOYSA-N



While the lead structure does not look particularly drug-like, the 8.6 nM IC50 against K1 would make a case to investigate the mmoa for a possible new molecular target.  What is also notable for this paper is the enlightened specification of the whole series as InChIKeys in the Elsevier abstract, but Google seems to have indexed them via other portals.

23) KAF156 is listed in the MMV portfolio but still took some digging out, and it's a good example of the name-to-structure mapping problem. The key Google hit was "KAF156 is an antimalarial clinical candidate with potential for use in prophylaxis, treatment and prevention of disease transmission" (PMID 24913172). This just included an image for the structure but also cited the original SAR paper "Imidazolopiperazines: lead optimisation of the second-generation antimalarial agents" (PMID 22524250). However, the code name was not assigned in that first paper for CID 856296

BUPRVECGWBHCQV-UHFFFAOYSA



So in this case KAF156 in PMID 24913172 = CID 856296 = example 20 in PMID 22524250 = CHEMBL2058833 = example 412 in WO2011006143.  This lead has three firsts in this set: 1) it takes the biscuit for the 754 analogues with EC50s from two strains listed in the 6143 patent, 2) it also has the largest list of collaborators and 3) it is the only recent lead with a commercial supplier, SID 188474432 (but whether they actually have made it for stock is another matter .....)


24) Was found in "piperidinylcarbazole as antimalarial" (sic)   (WO2014108168 ) from a Merck team, one of whom is now at MMV.  The IC50 of example 3 was reported as 3nM against the K1 strain.  This is PubChem and SureChEMBL negative, being a recent patent.

Smiles: OC1(CCNCC1)N1C2=CC=C(F)C=C2C2=C1C=CC(Cl)=C2
InChI: 1S/C17H16ClFN2O/c18-11-1-3-15-13(9-11)14-10-12(19)2-4-16(14)21(15)17(22)5-7-20-8-6-17/h1-4,9-10,20,22H,5-8H2
InChI key: FQMRAIQOBRFRPX-UHFFFAOYSA-N


25)  Is from "A Specific Inhibitor of PfCDPK4 Blocks Malaria Transmission: Chemical-genetic Validation" (PMID 24123773).  In the paper, 1294 (CID 56963908) is the proof-of-concept lead with a 47 nM IC50 in a parasite exflagellation assay (although it has some hERG liability).  This structure is example 150 in US20130018040 but the claims expand to "Compositions And Methods For Treating Toxoplasmosis, Cryptosporidiosis, And Other Apicomplexan Protozoan Related Diseases".  This already alludes to possible parasite homologue utility, which the authors duly exploit for the same compound in "Neospora caninum calcium-dependent protein kinase 1 is an effective drug target for neosporosis therapy" (PMID 24681759). This paper shows not only an adroit target and species hop but also a PDB structure for Uw1294 in the Neospora enzyme

InChI=1S/C24H28N6O/c1-3-31-20-7-6-17-12-19(5-4-18(17)13-20)22-21-23(25)26-15-27-24(21)30(28-22)14-16-8-10-29(2)11-9-16/h4-7,12-13,15-16H,3,8-11,14H2,1-2H3,(H2,25,26,27)
InChIKey: KONPIFGGWNMOKY-UHFFFAOYSA-N
SMILES : CCOC1=CC2=C(C=C1)C=C(C=C2)C3=NN(C4=C3C(=NC=N4)N)CC5CCN(CC5)C


The Pf target is Q8IBS5

26)  This is a recent patent from the Novartis Singapore team, "Compounds and compositions for the treatment of parasitic diseases" (US20140155367).  The selected example is 19 (CID 73891131) since the IC50 was 1 nM for Plasmodium parasite proliferation and it was the most potent (4.3 nM EC50) in the secondary P. yoelii sporozoite invasion assay.  Thomson Pharma pulled this one out but you can extract the 89 analogues with parasite data using the FreePatentsOnline url above for 5367 in chemicalize.org.

InChI=1S/C23H17N5O2/c1-27(19-8-2-15(13-24)3-9-19)23(30)18-10-11-28-21(12-18)20(14-26-28)16-4-6-17(7-5-16)22(25)29/h2-12,14H,1H3,(H2,25,29)
InChIKey: RQRAIWVNBUQEFG-UHFFFAOYSA-N
SMILES : CN(C1=CC=C(C=C1)C#N)C(=O)C2=CC3=C(C=NN3C=C2)C4=CC=C(C=C4)C(=O)N




Resolving the NIH small-molecule probe review set

This week's issue of Cell includes a detailed wrap-up review on the NIH  Molecular Libraries Initiative "Advancing Biological Understanding and Therapeutics Discovery with Small-Molecule Probes" (PMID 26046436).  As a tour de force this includes supplementary tables displaying nearly 90 structures (page 1 below).



As ever, the fly in the ointment is the absence of links and/or explicit designations of the structures, since we just get the probe number and an image.   The issue of mapping NIH probes to CID structures has come up on this blog before and in our paper that already included challenges for resolving these compounds (PMID 25415348, F1000 recommended), so this presents exactly the same problem.  While it may not be the only solution, my ad hoc workflow was as follows.  For the first step of extracting the ML codes I used PDF Tables, which did a pretty good job on the entire supplementary table (below).


After a bit of  fiddling via Excel and Text Fixer we can get a list of 88 ML codes to map with the PubChem IES tool, as per below.


First off  we can get the 1:1 mapping


The probe codes are on the left with PubChem CIDs on the right, in three side-by-side column pairs.

Probe  CID        Probe  CID        Probe  CID
ML007  976135     ML225  56593029   ML291  52940465
ML031  2113511    ML226  71304779   ML292  56587900
ML050  3136844    ML227  46926632   ML294  53364485
ML051  5322399    ML228  46742353   ML295  53364533
ML056  6857802    ML231  3392161    ML296  53364510
ML056  71311947   ML240  49830258   ML297  56642816
ML077  25067404   ML244  53239856   ML300  46861530
ML081  3068143    ML245  50904134   ML300  69361347
ML131  44607580   ML246  50985821   ML303  53316412
ML133  781301     ML247  44246403   ML308  56928204
ML141  2950007    ML248  49835928   ML310  51035449
ML144  1542103    ML249  17253208   ML311  3384730
ML165  44820665   ML250  50904505   ML314  53245590
ML169  44475955   ML251  53255421   ML322  49800087
ML171  81131      ML252  50985951   ML323  60167849
ML174  24856225   ML252  53393831   ML325  13891339
ML175  4257368    ML253  53382523   ML326  57525860
ML176  2360837    ML254  53382545   ML328  1517823
ML180  3238389    ML256  53364507   ML334  56840728
ML186  15945391   ML257  53384746   ML335  23723457
ML193  1261822    ML264  51003603   ML336  71301451
ML198  46907762   ML265  44246499   ML345  57390068
ML203  44543605   ML267  53257126   ML355  70701426
ML204  230710     ML269  53239838   ML365  46785920
ML205  46931017   ML275  53301904   ML367  921541
ML216  49852229   ML276  53362052   ML375  71598521
ML218  45115620   ML277  53347902
ML221  7217941    ML282  1067700
ML223  60859      ML289  56587994
ML224  50897809   ML290  56593349
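When a listing like this is copied out of a web page or spreadsheet the code and CID columns can run together (e.g. ML007976135ML22556593029ML29152940465). A small sketch for pulling such rows apart again is below. It relies on the assumption that every probe code is "ML" plus exactly three digits, so whatever digits follow are the CID; the function name is mine.

```python
import re

# Assumption: probe codes are always "ML" + three digits; the remaining
# digits up to the next "ML" (or end of row) are the PubChem CID.
PAIR_RE = re.compile(r"ML(\d{3})(\d+)")


def split_fused_row(row):
    """Split a fused row into a list of (probe_code, cid) tuples."""
    return [("ML" + code, int(cid)) for code, cid in PAIR_RE.findall(row)]


for row in ("ML007976135ML22556593029ML29152940465",
            "ML0566857802ML2313392161   ML29653364510",
            "ML22360859ML28956587994"):
    print(split_fused_row(row))
```

The three-digit assumption is what makes the split unambiguous, so it would need revisiting if four-digit probe codes ever appeared in a list.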

The mapping sheet is now on Figshare.  We can then re-run the mapping with the Entrez History option to get the set of  86  live links


You can retrieve these as a MyNCBI open list.  So are they "right"?  It looks like it but there is no obvious way to cross-corroborate.   The stats from the 88 probe codes in Table S1 indicate the synonym mapping missed ML350, ML338 and ML238 but produced two CIDs for ML300.  The latter was a simple salt case (below).


ML350 did not come back in the IES  but in the normal query box it double-maps viz


Notwithstanding, the report document for ML350 resolves it to SID 144087319 but the synonym did not come across to the CID 60156214  because the heuristic could not split the "one synonym two structures" case because of  the chemical supplier entry.   The same thing has happened to both  ML338  = CID 60182306  and ML238 = CID 49849912.
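The bookkeeping for "which codes were missed and which double-mapped" is easy to script once the code-to-CID pairs are in hand. The sketch below is one way to do it; the function name is hypothetical and the input is only a small subset of the pairs from the listing above, not the full set of 88 codes.

```python
from collections import defaultdict


def audit_mapping(expected_codes, pairs):
    """Return (missing_codes, multi_mapped) for a code -> CID mapping.

    expected_codes: probe codes that were submitted for mapping.
    pairs: (probe_code, cid) tuples that came back.
    """
    by_code = defaultdict(set)
    for code, cid in pairs:
        by_code[code].add(cid)
    missing = sorted(set(expected_codes) - set(by_code))
    multi = {code: sorted(cids) for code, cids in by_code.items() if len(cids) > 1}
    return missing, multi


# Subset for illustration: the three codes the synonym mapping missed,
# plus ML300 (two CIDs) and ML303 (one CID).
expected = ["ML238", "ML300", "ML303", "ML338", "ML350"]
returned = [("ML300", 46861530), ("ML300", 69361347), ("ML303", 53316412)]

missing, multi = audit_mapping(expected, returned)
print("missed:", missing)
print("multiple CIDs:", multi)
```

This only flags the cases; deciding whether a double mapping is a salt/parent pair or a genuine synonym collision still needs a look at the records.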

There are some quirks related to the list above I may delve into later but can only touch on a few just now.  The first is that the count of 375 probes stated in the review is not available as an accurate molecular listing anywhere in the NIH system.  Our efforts described in PMID 25415348 only managed 322, so that's quite a shortfall.  Note in the web snapshot of the 86 there are some interesting stats in the facets on the right. For example:

  1. Only 46 are indexed as originating from the probe BioAssays
  2. 72 are nominally covered by vendors (not necessarily as stock though)
  3. 35 have exact matches for patent extracted chemistry (indicates filings from the MLP  but some may be prior art)
  4. Surprisingly, there is only 1 MeSH "pharmacology" cross-reference.  This means the probes are not featuring in PubMed papers where a MeSH annotator would classify them as having in-vivo pharmacology
  5. Since there are only 74 matches to the MLSCN collection 12 of this set have not been added to the screening collection (but this could be a parent/mixture issue) 
  6. The overlap with the 318 probes we CID mapped in PMID 25415348  is only 60  :(

The ELABELA GPCR ligand: a smORF hidden in plain sight

Notwithstanding the 15th Anniversary of the Human Genome completion this week, the continued yo-yoing in canonical protein number is still very much a topical issue, as can be seen from the twitter trail below.


The subject of AV's tweet encompasses the recent finding (effectively a re-discovery as outlined below) of a novel vertebrate peptide ligand for the Apelin receptor, initially named ELABELA or Toddler, but now with the HGNC symbol APELA.  I think this surprised quite a few folk and it was of particular interest to our  IUPHAR Evolving Pharmacology Committee who oversee new GPCR endogenous ligands and pairings that we consequently add to the database (see  PMID 23957221). Some emails buzzed around this group when the two papers came out 2013/14  but the only database annotation at that time was a transcript as a lncRNA (i.e. non-coding). We subsequently entered the peptide and mapping relationship into GtoPdb as ligand 7930  (n.b. have just added the Kd=0.51nM interaction into the entry as per below but this will go live in the next release in a couple of weeks)


This entry, other historical matches and the subsequent spread of new or revised annotations across the databases exhibit interesting bioinformatics quirks.  A major one was that a complete cDNA had already been deposited back in Jan 2008 from one of the Japanese high-throughput transcriptome efforts as AK092578. However, this had no submitter-marked CDS and thus no ORF that could be captured in the protein databases.  Retrospectively, as you can see below, the mRNA clearly encoded APELA in one of the forward frames.


The reason for the CDS omission in this case may have been that it fell below any automated length threshold that could be used for batch submission. Even dropping this to 100 residues would have caused an explosion of spurious small open reading frames (smORFs) in any bulk 6-frame translated cDNA sets. Fast forward to April 2014, it was then an easy matter for the Singapore team to sequence a new ES cell cDNA, duly including the correctly annotated ORF (KJ158076), which was promptly promoted to Swiss-Prot P0DMC3.  In terms of cDNA sans-ORF there was an even earlier deposition than AK092578 from a patent sequence in July 2006, CS163711, from WO2005085280A3 ("Human cDNA clones comprising polynucleotides encoding polypeptides and methods of their use"). So is there any IP significance for this old-school shotgun sequence filing by Five Prime Therapeutics?  It's unclear since this is apparently daisy-chained (different titles) into a complex 19-member patent family. However, the rather pithy WIPO examination report quotes "claims or said claims Nos. 1-79 are so inadequately supported by the description that no meaningful opinion could be formed" ('nuff said).
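To make the length-threshold point concrete, here is a minimal sketch of a six-frame ORF scan with a minimum-length cut-off. It is not any submission pipeline's actual method; it assumes the standard genetic code, ATG starts and stop-terminated ORFs only, and it runs on a made-up toy sequence rather than AK092578.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(dna, min_len=100):
    """Return (strand, frame, peptide) for ATG-to-stop ORFs >= min_len residues."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Drop the final piece: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    hits.append((strand, frame, segment[start:]))
    return hits


# Toy sequence (not a real transcript): a 6-residue ORF in forward frame 2.
toy = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_len=100))  # nothing passes a 100-residue cut-off
print(find_orfs(toy, min_len=6))    # the short ORF appears once the bar drops
```

Run over a bulk cDNA set, the number of hits climbs steeply as min_len comes down, which is the trade-off described above.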

What gave me more of a surprise from just searching around and about was the TBLASTN of the protein against ESTs (below)


This encompasses solid ORF hits from humans to bears right down to fish and Xenopus (as reported in PMID 24316148), some of which, like the mouse ESTs, go back a decade or more.  So it was very much "hidden in plain sight" as a moderately abundant vertebrate transcript (including 22 human ESTs in Unigene).  This comes back to the pre-2014 absence of any protein database entry. There was thus nothing for genomic predictions or  EST matches to "lock on" to in the automated annotations that might have flagged transitive protein similarity matches.

Once the sequence had got into the system via UniProt and RefSeq in 2014, the gene building pipelines (but mostly the US ones) that re-ran subsequently then "found" the swathe of orthologues. Hence we can get the instant tree from the NCBI nr BLASTP results (below)


Even from trying the same search in the Transcriptome Shotgun Assembly (TSA) division I was unable to match anything phylogenetically below fish. This suggests it could be a vertebrate invention, but as it is a small protein it could simply mean that possible homologues in the invertebrate phyla gave no significant similarity scores. Curiously, neither TreeFam nor the Ensembl orthologues have been displayed, maybe also because the protein is too small. There are no Swiss-Prot InterPro features either since it has no "domains" (because they all drop below all the scoring thresholds).

This unexpected discovery raises general issues.  Speculation has surfaced (e.g. PMID 25345765) that by ribosomal profiling, ORF-mining the 1000s of nominally ncRNA sequences, or stuffing as many unusual vertebrate tissue samples into LC/MS-MS instruments as possible,  more smORFs,  including novel endogenous receptor ligands, are going to turn up. However, I happen to be on record with arguments against the idea of a rich post-genomic harvest of cryptic smORFs (PMID 15174140).  Therefore, if anyone is willing to wager a few beers I'd put up for (over the next three years) the number of novel receptor ligands at not more than 5 and (solidly cross-corroborated) new smORFs at less than 50.

Just for fun you can see the result from the Phyre 2 reverse threading server (below) as a nice-but-not-so-convincing result (but don't blame the server for divergent structural evolution in small proteins).



Notwithstanding, with something as small as a peptide hormone, my guess is some teams are probably working on an NMR structure (and for apelin in //?) right now....

sp|P0DMC3|23-54  QRPVNLTMRRKLRKHNCLQRRCMPLHSRVPFP

I will finish with an interesting set of Google hits for the above sequence.


Chessboardane and other strange patent extractions

Since I pinged it round on twitter  I can attribute HP  for coming up with the name "Chessboardane" for the unusual structure below.


From checking the three submitters for CID 21040251 we can quickly discern five things
  1. Strange as it may look, according to the three submissions collapsing via the CID chemistry rules it is a legitimate chemical structure (but not necessarily extant)
  2. You can -er- buy it (or at least get a quote from the listing vendor)
  3. It was derived from US7092578 ("Signaling adaptive-quantization matrices in JPEG using end-of-block codes")
  4. It first entered PubChem back in 2007
  5. ChemSpider, perhaps wisely, have deprecated the structure but nonetheless nicely computed it in 3D and assigned the same InChIKey (see below)

This interesting manifestation raises a number of questions.  The first of these is: are there any more like this?  The answer is yes, around 475 in fact, a selection of which is shown below.


So what's been happening?  I may fill in some more details later but it sounds like this (I already knew the basics but thank EB from PubChem for filling in some gaps).

Back in 2009 the USPTO initiated the Complex Work Unit (CWU) Pilot Program.  This envisaged the use of approved source file formats for chemical structure drawings, mathematical formulae, protein 3D depictions and table data, to streamline the processing, examination, and publication of patent applications.  This included the use of contractors to prepare ChemDraw (CDX) and MOL files.   It turns out that many of the non-chemical drawings of diagrams or tables are instantiated as CDX files, which, by default, were all subsequently converted to MOL files (i.e. converted to chemistry regardless of whether they actually depicted chemical structures or not).

Summary points:

  1. Are folks aware of the problem?  PubChem and ChemSpider are and many others know at least the outline.  
  2. So does it matter?  In the grand scheme of things (e.g. 68 million) probably not much.  I can't immediately think of any crucial data mining approaches that the presence of these would confound (but you never know with big data). Whether any of the primary sources perceive it as an issue to address I don't know (comments anyone?).
  3. Can they be filtered out ?  The short answer is no because they conform to the same rules passed by legitimate structures.  I came up with " Limits: RotatableBondCount to 0, Complexity from 5000" which gave 507 but not with complete specificity. But I note  ChemSpider must have come up with something for the deprecation flagging (or a bulk older SureChem purge?).
  4. Is it anyone's fault ?  Not really, especially considering how useful authentic structures from CWUs are that are fed in by patent sources.  However, it might not be a bad idea for the USPTO to arrange some kind of triage for the contractors to separate the chemical from the non-chemical CDX files and not convert the latter to molfiles
  5. So what of  the vendor (only one that I could find)?  Well, these entries do rather red-hand them for their "scrape-'n-sell" approach. 
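The limits quoted in point 3 can also be applied offline to downloaded property records. The sketch below assumes each record is a dict carrying RotatableBondCount and Complexity values (as exported from a PubChem property download); the function name and the example records are invented for illustration, and as noted above the filter is not completely specific.

```python
def looks_like_drawing_artifact(record, max_rotatable=0, min_complexity=5000):
    """Flag records with no rotatable bonds and very high complexity.

    Mirrors "RotatableBondCount to 0, Complexity from 5000"; expect some
    false positives among large rigid (but real) structures.
    """
    return (record["RotatableBondCount"] <= max_rotatable
            and record["Complexity"] >= min_complexity)


# Hypothetical records, not real PubChem values.
records = [
    {"CID": 1, "RotatableBondCount": 0, "Complexity": 7200.0},
    {"CID": 2, "RotatableBondCount": 5, "Complexity": 6100.0},
    {"CID": 3, "RotatableBondCount": 0, "Complexity": 310.0},
]

flagged = [r["CID"] for r in records if looks_like_drawing_artifact(r)]
print(flagged)
```

Both thresholds are keyword arguments so they can be tightened or loosened while eyeballing what falls out.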

Proteases: a hard days curation

It is my good fortune, along with my colleague, to have been invited to coauthor a review on proteases and a/b hydrolases.  As this will have an emphasis on inhibitors for these human enzymes we are in the process of adding new entries to GtoPdb, over and above what we already have, in time for a new release towards the end of this month.


This puts me and my colleague where the rubber meets the road  (a US cliche with no good UK English equivalent - "at the coal face" doesn't ring so well) for selection, extraction, curation, and annotation. It's tough going for a few reasons (lack of experience or enthusiasm not being two).  Firstly, we are moving beyond the 65 or so we had drug and/or clinical ligands for, out into the more speculative target, druggable genome, functional probing zone, where there are fewer papers to choose from.  Secondly, this takes us into dense subfamilies with high sequence similarity that makes specificity mapping particularly difficult.  Thirdly, I don't want to generally impugn protease medicinal chemistry authors but, on occasions, they can make concise data curation from their papers really hard going (sometimes I think proteases are worse than other target classes in this regard but I have no metrics to back this up).

Aesthetic CID of the week

ACIDOT (Aesthetic CID Of The week) is something less serious as a summer interlude but possibly continuable if it proves popular.  The principle is simply to find large things that look both pretty and unusual, as CIDs or SIDs (depending on the rendering). They will be tracked back to their sources (and at least a guess made on their chemical reality rather than Chessbordane).

This week's example, ACIDOT_02, is CID 90478973 which, strangely enough, is from the FDA Structured Product Labeling resource.


My guess is this tribromophenyl-benzene (CID 158630) ensemble has something to do with drug manufacturing as preservative or colourant (can anyone expand?)

************************************
First up, from last week's tweet, ACIDOT_01 is CID 90265505


This came from "organic electroluminescence device" US20140183486 via SureChEMBL.  The patent contains pages of the sort of stuff depicted below. It's not obvious if this was actually made or not but the assignee Kosan is a substantial Japanese petroleum company.


MK-8931 vs verubecestat

There is some context to this in  http://cdsouthan.blogspot.se/2015/03/mystery-of-jnj-code-number-structure.html in terms of synonym spaghetti for vendor structures.  However this was moved up for a separate post because a) this is a more serious case for a current Alzheimer's disease clinical candidate, b) it will feature in an ACS presentation and c) the three structures featured are now linked to the GtoPdb entry for BACE1 (but won't be live until ~ 3rd week of Aug).

To their discredit Merck have consistently blinded the structure of MK-8931,  their declared lead  BACE1 inhibitor for Alzheimer's  since before 2011 with  four entries in Clinical Trials.gov and a synonym link to  SCH 900931. However there are no proper publications (only abstracts) and none with a name-to-struc (n2s) disclosure.



Notwithstanding, in Dec of 2014 ChemIDplus duly submitted a PubChem entry  SID 223427121 as shown below.



Thus the name MK 8931 now merges  into CID 23627211  IKFZEHQGULJMKI-SFHVURJKSA-N.  But where did this name-to-structure (n2s) declaration originate from  ? Curiously we can find a 2013 Wikimedia entry (below)



But the plot thickens since the link above does not go anywhere, but you can also find a vendor in Google with a name match (below).



But there are two more quirks, firstly a ghost molfile that won't open  as well a CAS no.  However, in this case there is no structure, only a name (and the Editor  could have easily resolved the secretases by Googling MK-8931). 


Adding to the quirk list is that ChemIDplus added RN 16330-81-6 to the n2s link.  Inspecting the SIDs (below) clarifies some of what's been going on.



Starting in date order, the structure was first deposited in 2007 from a Thomson Pharma (presumed) patent extraction.  Next in was the Prous journal but the n2s in this case was Schering pre-merger  (i.e. a synonym back-mapping).  Then we have the three automated patent extractions but nested between these is a ChEMBL assay extraction from a Merck paper.  So CID 23627211 "synonym-chains" the patent and publication links,  even with the CAS number that has no n2s in SciFinder  (for the record 1628077-07-5 is linked to SCH-1359113 in SciFinder but also without a structure).   But the same connectivity operator reveals  the "flat" version as CID 46863661 that Thomson dropped in 2010 (odd - since this was three years after the specified stereo version, maybe a secondary patent?).

There is also a significant quirk in the Merck paper below.


The paper usefully declares their detailed results but it turns out the MK8931-linked CID 23627211 is not the lead structure in the paper as compound 16. It is in fact compound 13 (i.e. with Ki 7.8 nM and cell Abeta40 IC50 13 nM as shown below).


Note also that no development codes appear in this paper and that compound 16 was their PDB structure as CID 66575082 .   The patent matching for MK-8931 as CID 23627211 is unclear because it hits what looks like a prior-art exemplification in the later granted Merck US9029362



This calls back to Schering US20070287692 but I can't find activity data (note also SciFinder does not have the structure-to-patent mapping but Thomson Pharma probably does for SID 46488339).  However, the plot for Merck BACE1 clinical leads thickens since the announcement of a new INN and USAN for verubecestat (of which the latter is shown below).


Usefully, this is PubChem positive for the n2s as CID 51352361 but you can see from the synonym list below MK-8931 is not one of them, neither is it mentioned in the USAN document. 


There are a whole lot of interesting quirks associated with the list but I can only pick out a few. First up, it's a nice surprise to see SAR extraction of 21 examples from the patent by BindingDB.


But the biggest surprise, by Merck's own data, is to see that verubecestat is actually a better inhibitor of BACE2 than of BACE1.


So what's going on ?  I can't be sure but accepting the provenance of the INN and USAN sheets suggests the most parsimonious assumption is that the MK-8931 n2s that ChemIDplus put in PubChem for SID 223427121 was wrong, possibly because the only precedents were unprovenanced vendor entries.

There are some consequences to this which could confound interested parties. One of these is the clinical trials data disconnect  as shown below  (from the primary source)




In addition, secondary sources synonym-chain independently (they might be accidentally right but that's not the point)






Joining chemistry between journals and databases

This post will focus on navigation joins between explicit chemistry descriptions in journal papers and database records. My colleague JS has already presented a slide set 

Provenancing approved drugs: a call for information

As well as laying out what I hope are useful observations, this is also a call (please) for filling in knowledge gaps on the information labyrinth associated with the surfacing of newly approved drug structures. The main reasons are a) it's an important part of what we need to understand at GtoPdb,  b) it could add context for a nascent manuscript broadly based on this  ACS presentation  and c) unless I have missed it, there is no joined-up description of the many steps and connections.  For the record I'd be pleased to attribute anyone who can answer the questions raised (but if they want to remain anonymous that's also fine).  Particularly useful could be comments from folk working for, or with, the FDA, the FDA/SPL/UNII team,  ChemIDplus, WHO, AMA or CAS.  Also, of course, anyone from pharma companies who has actually been through the hoops of  IND/CAS/INN/USAN applications could contribute their insights.

Before I delve into arcane technicalities, I can point to two introductions on the naming issues. Firstly an old  BioIT presentation .


Secondly, here is a report on  drug naming  as an attempt at some level of clarification (you can see the first paragraph below).



Meanwhile, back at the GtoPdb  ranch, we have done a certain amount of head scratching about timings and obligatory couplings between applying for an FDA  IND  a WHO INN  and an AMA USAN as well as perpetually grappling with the maze of different, image and text-only  structural representations. As curators we are signed up for both the WHO Mednet  and  AMA websites that we use for curatorial corroboration.   We gleaned the following from the USAN application form"US firms that have a US IND number are expected to file for a USAN first, rather than requesting a non-proprietary name directly from the INN Programme. If you are requesting a name that is already an INN, please list INN number and explain why the INN submission was made first".   The AMA site has information  for a parent USAN application ($15,000)  followed by salt forms later ($8,000).  To (er-hem...) "streamline" FDA and INN compliance there are 5 separate USAN application forms.   There is a lot of interesting detail in these forms but the structural pages are shown below.


As an output example, I have pasted in the USAN for abemaciclib =  CID 46220502  = UZWDCWONPYILKI-UHFFFAOYSA-N  as the result of the form-filing (not to mention Lilly paying up the dosh).






For the INN application  (for $12000!)  the following are mandated,   a) chemical name or description (including stereochemical information)   b) "graphical formula" (chemical image?),  c) confirmation of  CAS  RN and Index Name. There is also information from CAS on  getting the RN where they also ask for a) a chemical structure diagram, b) systematic chemical name, c) common names, and d) molecular formula.  They charge by time, starting at $150 USD for 30 mins!  The structural section of the INN form is shown below.


So we can find the equivalent output for the abemaciclib INN application (below)


Hooked into CID 46220502 we have two US official sources for approved drug info.  The first is  ChemIDplus (below)


Notwithstanding the utility of the Unique Ingredient Identifier (UNII) system  the entry for SID 198954176 (below) doesn't actually give me a whole lot of substance information or metadata


The SciFinder entry  looks like this


But chemicalize.org will only convert the standard IUPAC name, not the one they use with the end transposed to the beginning.


And in this case there is another set of applications and records for the mesylate salt CID 71576678



I can note what seem like quirks, but if they do have a rational explanation please let me know.

Synonym variants are always a bad idea and we have the instance here of 60UAB198HK = UNII-60UAB198HK.

There are three non-identical IUPAC names between the INN and USAN documents

 N-{5-[(4-ethylpiperazin-1-yl)methyl]pyridin-2-yl}-5-fluoro- 4-[4-fluoro-2-methyl-1-(propan-2-yl)-1H-benzimidazol- 6-yl]pyrimidin-2-amine

 2-Pyrimidinamine, N-[5-[(4-ethyl-1-piperazinyl)methyl]-2-pyridinyl]-5-fluoro-4-[4-fluoro-2-methyl-1-(1-methylethyl)-1H-benzimidazol-6-yl]

N-{5-[(4-ethylpiperazin-1-yl)methyl]pyridin-2-yl}-5-fluoro-4-[4-fluoro-2-methyl-1-(1-methylethyl)-1H-benzimidazol-6-yl]pyrimidin-2-amine

Note also that each of the Mw's is slightly different


 For the record the Sept 2015 CID counts by PubChem synonym search are 8274 INNs, of which 763 were mixtures, and 5671 USANs, of which 2387 are mixtures.  For INN or USAN the union was 10952 with the AND intersect being only 2994.  Note here that any submitter could have added either or both of the synonyms (even wrongly), not necessarily ChemIDplus or another official source.

So, here are some of the Qs that I hope folk can help with
  1. Why are there many more INNs (mapped to SIDs) than USANs?
  2. How many recent INNs don't get a USAN and are these just non-US applicants?
  3. So am I right in guessing that electronic records of the structure (e.g. mol, SD, SMILES, InChI) are never interchanged in the whole process by applicants for IND/CAS/INN/USAN even though CAS (and the applicants of course) actually hold the mol files ? 
  4. My guess is that for 95% of applicants their substance already has a CAS RN from extraction of their own patents into SciFinder.  So can they then use the Registry look up service rather than a de novo application?  (phew- a mere $30) 
  5. I note the UNII  SIDs typically precede those from ChemID plus by about a  month.  So do they formally collaborate/coordinate ?  (I'm sure some of them get to chat in the bars of DC) 
  6. Any reasons for not being able to add any metadata to those FDA UNII SIDs ? 
  7. Are either/both taking the NDA as primary structure provenance or USAN/INN docs? 
  8. Has the defaulting to CAS structure resolution by applicants globally ever been questioned?
  9. Can anyone cite explicit cases where the CAS structure was unequivocally different to an INN/USAN example ?  
  10. Can anyone cite cases of structural differences for parent structures between INN and USAN documents?  (there are some in the PubChem SID indexing but this may be due to secondary source errors)
  11. Many companies have more than one code name from mergers, do they ever put both into the USAN - or just the newest? 
  12. Why is there no specific mention of IUPAC names in CAS,  USAN or INN  instructions ?
  13. Does either authority iterate with applicants w.r.t. structure specification ?
  14. Why do INN applications include the CAS no but not the approved INN docs?
  15. Why do INNs and USANs contain different IUPAC strings and structure renderings and which different specific software packages to generate these?  (e.g. ChemDraw?)
  16. Why do SciFinder/CAS transform IUPACs by bringing the back to the front? (Is this a form of ontological stemming or simply done to confound automated name-to-struc?) 
  17. How do the mandates, technicalities and outputs of the US operations of   AMA, FDA, FDA/SPL/UNII, ChemID, RxNorm and PubChem dovetail with each other ? 
  18. Is there any coupling between  clinicaltrials.gov and the  IND/INN/USAN systems?
  19. China is generating its own INN-like names, most of which seem to be for blinded structures.   Is there any documentation (with Chinese IUPACs?) and/or will they eventually become  USANs and or ANDAs to be prescribed in the US? 

A kinase inhibitor gift horse

The new paper "Comprehensive characterisation of the Published Kinase Inhibitor Set" (Nat. Biotech PMID 26501955) is certain to generate interest and commentary.  What I present here is  a brief how-to and analysis of the PubChem mappings.  The first step is to pull the 369  SMILES out of the useful supplementary data files (yea, belt-n-braces addition of IUPACs and  InChIs would have helped - but mustn't grumble...)


The good news is most had CID matches  via the PubChem structure upload. The minor bad news is we either lost 11 records via conversion and/or they were not in PubChem.  The initial mapping result is shown below.


OK, let's see if we can resolve the lost ones.  First up, the publication quotes 367  and from below we can de-dupe the SMILES column to 366


Next thing is to salt-strip the 67 chlorides and re-upload.  This gives 360 CIDs so we are down to 6 missing (if only we had those InChIs....).  One thing we can do is the cross-check conversion (as below)

This indicates 366 have passed the rules so the missing six don't have CID exact matches.  The challenge is to identify them in the SMILES set. To cut the story short I had to circle through Open Babel and the PubChem Identifier Exchange to identify the un-mapped SMILES and InChIs below


I then had to do a complete salt-strip of ~ 10 acetates etc.  which gives 361 CIDs.  I'm not going to resolve  the salt-parent mismatch sets just now but I think all 366 do have CIDs as salt and/or parent. The intersects are below
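
For those who want to script the de-dupe and salt-strip steps rather than do them by hand, here is a rough pure-Python sketch. It is not what I actually ran: it assumes a salt or counter-ion always shows up as a smaller dot-separated SMILES component, it estimates size with a crude regex rather than a real toolkit, and it only removes exact string duplicates (no canonicalisation). The function names and the four input strings are made up for illustration.

```python
import re

# Rough heavy-atom tokens: two-letter halogens first, then single symbols.
ATOM_RE = re.compile(r"Cl|Br|[A-Z]|[cnops]")


def heavy_atom_estimate(fragment):
    """Approximate heavy-atom count of one SMILES fragment (ignores H)."""
    return sum(1 for token in ATOM_RE.findall(fragment) if token != "H")


def strip_salt(smiles):
    """Keep the largest dot-separated component as the presumed parent."""
    return max(smiles.split("."), key=heavy_atom_estimate)


def dedupe(smiles_iterable):
    """Drop exact string duplicates while preserving input order."""
    return list(dict.fromkeys(smiles_iterable))


# Illustrative inputs only: a duplicated chloride, an acetate salt, a parent.
raw = ["CCN.Cl", "CCN.Cl", "CC(=O)[O-].c1ccccc1CN", "CCO"]
parents = dedupe(strip_salt(s) for s in dedupe(raw))
print(parents)
```

Because the comparison is on raw strings, two different SMILES for the same structure would survive this de-dupe, which is one reason having the InChIs would have helped.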


You can access the union of all  441 salts and parents via the MyNCBI link.   Selected breakdown of the matches are shown in the left hand facets below.


My custom filters are at the top and the constitutive standard set on the bottom.  There are many aspects to explore for those inclined but I'll just pick up a few.  As we might expect, since they loaded the inhibitor set of 366 as CHEMBL2007667, the single-source IDs are ChEMBL (JFTR these synonym mapped to 360 CIDs that intersected with 350 of the 441 above).  Patent mapping is somewhat lower than I expected but still high enough to track back to large SAR sets in some cases at least.  In this case BioAssay is circular w.r.t. ChEMBL.  Pharmacological action capture via MeSH is very low but vendor coverage is not bad.

The 10 matches for GtoPdb are shown below but note we generally only curate full leads with IC50/Ki.  Our collation and surfacing of kinase inhibitors is outlined in our latest NAR Database Issue publication (PMID 26464438).  Seeing as our new Immunopharmacology project includes a druggable kinome extension aspect we are likely to triage the set in this paper for additional ligand mappings.


1. SID 252166845 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (8645) | Deposit Date: 2015-08-20 | Available Date: 2015-08-20 | Modify Date: 2015-08-20 | CID: 25263088
2. SID 252166844 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (8644) | Deposit Date: 2015-08-20 | Available Date: 2015-08-20 | Modify Date: 2015-08-20 | CID: 44581765
3.
4. SID 249565896 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (8216) | Deposit Date: 2015-03-19 | Available Date: 2015-03-19 | Modify Date: 2015-10-19
5. SID 249565830 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (8150) | Deposit Date: 2015-03-19 | Available Date: 2015-03-19 | Modify Date: 2015-03-19 | CID: 16051023
6. SID 249565717 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (8037) | Deposit Date: 2015-03-19 | Available Date: 2015-03-19 | Modify Date: 2015-03-19
7. SID 178102662 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (6040) | Deposit Date: 2014-05-23 | Available Date: 2014-05-23 | Modify Date: 2014-11-13
8. SID 178102660 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (6038) | Deposit Date: 2014-05-23 | Available Date: 2014-05-23 | Modify Date: 2014-11-13
9. SID 178102618 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (5996) | Deposit Date: 2014-05-23 | Available Date: 2014-05-23 | Modify Date: 2014-11-13 | CID: 766949
10. SID 178102312 | Source: IUPHAR/BPS Guide to PHARMACOLOGY (5685) | Deposit Date: 2014-05-23 | Available Date: 2014-05-23 | Modify Date: 2015-03-19

Lastly,  this OpenBabel query rendering is impressive.





Drugs by the numbers

So how many "drugs" (in the pharmacological context of approved medicines) are there?  It's actually a surprisingly difficult question to answer.  I am in the midst of drafting a manuscript to address the question primarily at the structural comparison level using sources I can either extract from,  or map into, PubChem and thereby explicitly count them as CIDs.  The paper should be broadly along  the lines of this slide set  and poster. There is also some context in this drug naming document.  What I can do here is point out a few salient numbers for which I would a)  welcome comment (including any citeable public sources I have missed)  and b) appreciate it if any commercial sources not listed would care to openly communicate their own precise current numbers (and how these are defined).  I may choose to include these in the paper but I will need a named "personal communication" as a provenanced citation  (if the company concerned has posted any kind of open slide sets with a stable url and named presenter, this should also do). Needless to say I won't expand too much on the comparative interpretations since I am currently grappling with this for the paper.

To count drugs in the first instance you  can simply check our current release of  GtoPdb that records  1241 approved drugs  (where the gif  at the bottom of the page gives the current release numbers).  However, as we explain in the " Approved drugs" section of our latest NAR Database issue publication (PMID 26464438), our curation stringency produces an under-count for several reasons. For an orthogonal assessment we can look at  the numbers declared by the excellent public resource archetype of  DrugBank.  Below are the release updates from PMID 24203711 but with 4.3 being from the current release (note I have assigned the year as the real collation period not the NAR Database issue of the subsequent year).



What is also unique about DrugBank is that it is the only source in PubChem where a specific subset of approved CIDs can be directly selected (as opposed to the selection needing to be done from a source download).  This has to be done via the SID tag via the query (approved[comment] AND "DrugBank"[SourceName]) but this comes in at 1533  (n.b. we plan to eventually introduce a similar select option for our own SIDs).  The CID pivot falls out at 1504. This is 150 down on the declared total on the page and we (the GtoPdb intersect) only agree on 804.  If we now intersect the 1504 with ChEMBL and Therapeutic Target Database the 3-way consensus drops to 1056.  Add in the FDA/SPL Indexing data entries and the 4-way drops to 810.
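
The n-way consensus counts are just set intersections once each source has been pivoted to a list of CIDs. A minimal Python sketch, assuming the CID lists have been downloaded separately; the tiny sets below are placeholders and not the real DrugBank, GtoPdb or ChEMBL content, and the `consensus` helper is my own name for it.

```python
from functools import reduce


def consensus(*cid_sets):
    """Return the CIDs present in every one of the supplied sets."""
    return reduce(set.intersection, cid_sets)


# Placeholder CID sets for illustration only.
sources = {
    "DrugBank": {1, 2, 3, 4, 5},
    "GtoPdb": {2, 3, 4, 6},
    "ChEMBL": {3, 4, 5, 6},
}

two_way = consensus(sources["DrugBank"], sources["GtoPdb"])
three_way = consensus(*sources.values())
print(len(two_way), len(three_way))
```

Each source added can only shrink (or hold) the consensus, which is why the counts above fall as the 3-way and 4-way intersects are taken.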

While discordancies of this type are addressed in the paper there are many different numbers around, which can be listed as follows:












Yet more antimalarial dot joining and possible target deconvolution

Update 22 Nov


Nov 20
Looks interesting. This was OSM Series #2, abandoned when we heard another, closed, group was working on it

From the OSM  master sheet,  Series 2,  OSM-S-56 = JRC12 = O=S(N(C)C1=CC=C(Cl)C=C1)(C2=NN(C(N(C)C)=O)C=N2)=O  = CID 231212480 .   This is from the same US7238720 patent family but the GSK unconfirmed result was 88% inhibition for 3D7 at 2uM (ie not that potent but worth a comparable IC50 retest by someone)  

******************************************************

The seed for this post (as for some others) was a tweet



Med Chem Malaria Paper of the Day: J Med Chem ChEMBL Score: 10.46

Integrated bioactivity databases

As some of you may have picked up from social media we were pleased to make it into the NAR Database issue with "The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands" (PMID 26464438).  As ever, it's turning out to be another good annual issue as you can see from the Advance access page (my referring efforts generated some tokens).  Between the new submissions and other publications an increasing number of what could be described as mega-portals are appearing as sparkling heavenly bodies in the informatics firmament.  These generally follow an aggregation/integration approach with some kind of pipelining and we have been subsumed into some of these.  As a database team we realize the importance of understanding the comparative utility of other chemistry-to-protein resources. Notwithstanding, even doing this for a modest set of dbs takes a lot of work, as you can see from just some of the charts (below) from  PMID 24533037


In GtoPdb we maintain a Useful Links page  to key databases.  This post will provide an adjunct by picking up at least some of the mega portals. There is no room for evaluation but I'll pick up the stats if surfaced (there is no particular order to the listing).

STITCH 5  includes 9.6 million  proteins from 2031 eukaryotic and prokaryotic genomes with 430 000 chemicals

SureChEMBL ~17 million compounds ~14 million patent documents from 1970 to present.


Resolving TB actives shouldn't be this tough

Update:  finally instantiated web page that chemicalize.org could process but had to resort to Google sites



**********************************************

We know how it is in informatics; what should be simple rarely is. My attention was drawn to PMID 26642067 for a number of reasons (including noticing a past colleague, MH, from our SB days). The title and data link implied I should be able, as a short post,  to quickly resolve the compounds to PubChem IDs that the TB community would find a useful mapping.  The bad news was I fell at the first hurdle since the result tables were PDF-entombed  and the Dryad link initially embargoed.


Whether it's causally related to my comment I can't tell but someone (not necessarily an author) did get the link opened as per below.



The data sheet dropped out OK  (n.b. I was the 5th to do this) but the only molecular specifications were salt-stripped SMILES.


None of the Tres Cantos IDs mapped into PubChem synonyms so I immediately thought I could chemicalize.org the page. For this I knew I had to move to an open url such as figshare - but - I forgot this was a JChem for Excel sheet so no dice there, as we can see below from the mangled upload.


So I tried editing to a plain SMILES sheet but that did not work either  :(


Undaunted, I then tried the other route to instantiate something chemicalize could get its teeth into i.e. pasting the SMILES into "this" url.  Alas, for some reason  chemicalize was having a bad day with my blog pages, but it did handle the direct paste-in



While I could not download the results (like you can from a url on a good day) this exercise at least told me that all 50 had converted.  I then tried these same SMILES in the  PubChem ID mapper.  However we only get 32 exact match CIDs (i.e. 18 un-mapped)

************************************************

O=C(NCc1ccoc1)N1CCN(Cc2ccc3OCOc3c2)CC1   51085031
O=C(NCc1ccco1)N1CCN(Cc2ccc3OCCc3c2)CC1   45799294
Clc1ccc(CN2CCN(CC2)C(=O)NCc2ccco2)s1   45859503
Nc1ccccc1Nc1ccc(Cl)cc1   522323
NCC(c1cccs1)S(=O)(=O)c1ccccc1   16641349
NS(=O)(=O)c1ccc(Nc2cnc3ccccc3n2)cc1   15238409
O=C(NCc1ccco1)N1CCN(Cc2cccs2)CC1   45799432
CNc1snc(C)c1C(=O)NCCOc1ccc(OC)cc1   40084202
COc1ccc(cc1)-c1nnc2sc(Cc3csc(C)n3)nn12   39341739
NC(=O)c1ccc(CN(C2CC2)S(=O)(=O)c2ccccc2F)cc1   25645999
CN(Cc1ccc(Cl)cc1)C(=O)CCc1nc2ccccc2oc1=O   17493069
Clc1nnc(N2CCCC2)c2ccccc12   46507894
O=N(=O)c1cnc(s1)-c1nc2ccccc2[nH]1   68150020
CCCCNCC1COc2ccc(OC)cc2C1=O   4431837
FC(F)(F)c1nnc2ccc(nn12)-n1cnc(n1)C#N   25659433
Cn1cc(CCNC(=O)Nc2ccc(nc2)N2CCCCC2)cn1   51084784
CC(C)N(C(=O)CNc1ccccc1Nc1ccccc1)c1ccccc1   10784706
COc1ccc(Nc2ncccc2N)cc1   11447232
FC(F)(F)c1nnc2ccc(Sc3nnc(o3)-c3ccco3)nn12   16605568
Cc1ccc2oc(SCC(=O)Nc3nccs3)nc2n1   22587612
COC(=O)c1ccc(Oc2ccc(cc2F)-c2csc(C)n2)nn1   45833714
OCCNC(=O)c1c(O)c2ncc(Cc3ccc(F)cc3C(F)(F)F)cc2[nH]c1=O   69482575
COc1ccc(CCNC(=O)C2(CCCCC2)n2cnnn2)cc1   45493996
OC(=O)CCc1nc2ccccc2oc1=O   685879
Nc1ccc(cc1)S(=O)(=O)Nc1ccnn1-c1ccccc1   5335
CC(C)Oc1ccc(cc1Cl)-c1nnc(s1)-c1ccc2CCNCCc2c1   49841684
FC(F)(F)c1cc(cn2c(Cl)c(nc12)C(=O)NCc1cccs1)C#N   57774707
CCOc1cc2ccccc2cc1C(=O)NC   40017245
Cc1ccc2n(CCNC(=O)c3ccccn3)c(cc(=O)c2c1)C(F)(F)F   46965346
Fc1cccc(Oc2cc(nc(n2)-c2ccncc2)C2CCCNC2)c1   46958202
Cc1c([nH]c2CCCC(=O)c12)C(=O)NCCOc1ccccc1   32933638
COc1ccc(CN2CCN(CC2)C(=O)C=Cc2csc(C)n2)cc1   72058996
COc1ccc2nccc(NC(=O)C3CCN(CCc4ccc(Cl)cc4)CC3)c2n1
CCCC\C=C\c1cc2ccncc2cc1OC1CCNCC1
O=S(=O)(NCc1ccccc1)c1ccc(s1)-c1cc2ccncc2cc1OC1CCNCC1
CNc1nc(SC)nnc1-c1cccc(Cl)c1Cl
Nc1ccc(cc1O)C(O)CNCCc1ccccc1
FC(F)(F)c1ccccc1C(=O)Nc1nc-2c(CCCc3cccnc-23)s1
NC(=O)c1cc2ccc(CC(=O)c3cccs3)cc2s1
Cc1n(C)nc2c(Nc3cnn(Cc4ccccc4F)c3)nc(C)nc12
CCOCn1ccc(NS(=O)(=O)c2ccc(N)cc2)nc1=O
FC(F)(F)Oc1ccc(cc1)-c1cc2ccncc2cc1OC1CCNCC1
COc1ccc(Nc2nc(N[C@@H]3CCNC3)nc3cc[nH]c(=O)c23)cc1NC(=O)c1ccccc1
C(COc1ccc(cc1)-c1cnc(s1)-c1ccccc1)Cn1ccnc1
Fc1ccc(Cc2nnc([nH]2)-c2ccnc(Nc3ccc(OCCN4CCOCC4)cc3)n2)cc1
CC1CC1c1ccc(CN2CCc3nc(Nc4ccccc4)sc3CC2)o1
CC(C)c1noc(n1)N1CCC(CC1)[C@H](C)Oc1cnc(cn1)-c1ccc(c(F)c1)S(C)(=O)=O
Fc1ccc(CC2CCN(CC2)C(=O)Nc2nc(cs2)-c2ccccn2)cc1
CCCCc1ccc(nc1)C(=O)Nc1nnc(s1)S(=O)(=O)CCCC
O=C1O[C@@]2(CN1c1cccnn1)CC[C@H](CNc1ccc(cn1)-c1cccc(c1)C#N)CC2

                     ************************************************
You can find the 32 live CIDs  here  that were vendor-rich (28 matches) if anyone wants to follow these up in their own TB experiments.
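
I did the mapping through the web form, but the same exact-match lookup can in principle be scripted against PubChem's PUG REST service. The sketch below is an illustration under assumptions, not what was run for this post: the helper names are mine, treating a returned 0 as "no match" is an assumption, and un-mapped SMILES may instead come back as an HTTP error that this sketch does not handle. Only the URL building and the response parsing are exercised here; `fetch_cids` needs a network connection and should be throttled if looped.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(smiles):
    """Build a lookup URL, passing the SMILES as an encoded URL argument."""
    return f"{PUG_BASE}/compound/smiles/cids/TXT?{urlencode({'smiles': smiles})}"


def parse_cids(text):
    """Parse a plain-text CID response; 0 is treated as 'no match' (assumption)."""
    cids = [int(tok) for tok in text.split() if tok.isdigit()]
    return [cid for cid in cids if cid != 0]


def fetch_cids(smiles, timeout=30):
    """Network call, not exercised in this sketch; throttle if used in a loop."""
    with urlopen(cid_lookup_url(smiles), timeout=timeout) as response:
        return parse_cids(response.read().decode("utf-8"))


# One of the SMILES from the list above, used only to show the encoded URL.
print(cid_lookup_url("Nc1ccccc1Nc1ccc(Cl)cc1"))
```

Passing the SMILES as an encoded argument rather than in the path avoids trouble with characters such as "/", "\" and "#" that turn up in stereo and nitrile SMILES.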


None of these had any PubChem Bioassay activity against Mycobacterium but, not unexpectedly, nine had automated patent extraction hits.  For example, the CID below hit a GSK HIV integrase patent  US20070124152.



These results indicate the Tres Cantos team has included 18 novel  (i.e. PubChem -ve) structures.  Beyond establishing these had a 97% Tanimoto similarity envelope of 40 2D neighbours in PubChem I'm not inclined to follow these up in any detail (especially since it's the holidays). Notwithstanding, I could be persuaded if anyone in the TB community perceives high utility in doing this (and I can maybe get a beer off someone).

While I'm sure this paper is a significant step forward for TB drug research, surfacing structures for the wider community to follow-up (including for in silico modelling e.g. PMID 26517557) would have been soooooo much easier if the authors had taken the simple step of submitting the 50 structures as a PubChem Bioassay entry including the TCMDC codes (n.b. they could always do this retrospectively).

***********************************************

Just for reference and Google indexing you can get the InChIs (or other formats) out with Open Babel



LBEJUQWGEGOTNX-UHFFFAOYSA-N
CZPFRNLXROZEKS-UHFFFAOYSA-N
QJZNRXFIKWQOMX-AATRIKPKSA-N
YHXIWTLXVHWVLB-UHFFFAOYSA-N
ZDJZZRYSIFHZLQ-UHFFFAOYSA-N
WEUBIWJPIRTWDF-UHFFFAOYSA-N
WMHJBFDXPSMQBH-UHFFFAOYSA-N
CCXYYNHQDVOMEP-UHFFFAOYSA-N
ARMXGZLQWRQJHR-UHFFFAOYSA-N
PEDXWMRFTCNNTO-UHFFFAOYSA-N
NUGZAVACCBZKET-UHFFFAOYSA-N
CEJMGVAUCRFEQF-UHFFFAOYSA-N
VFNJTESZINGMOB-UHFFFAOYSA-N
TZPGSUQEDFUBRS-UHFFFAOYSA-N
KLHFYUUEXVAKFY-UHFFFAOYSA-N
VDQAHMWFBJZNLW-UHFFFAOYSA-N
MTSCFSRPNQRFPX-UHFFFAOYSA-N
XPRHLIFIHMNQEQ-UHFFFAOYSA-N
JEMIDLIMWOSGSA-UHFFFAOYSA-N
SVVHWHAXGGQMHJ-UHFFFAOYSA-N
XXHAJKCQSCKWNX-UHFFFAOYSA-N
VAVFBJFLXGHSPQ-UHFFFAOYSA-N
SPTZFPQHHPQVTO-UHFFFAOYSA-N
GTHFECLUXGSWER-UHFFFAOYSA-N
OPQJGFNBGVOHIM-QGZVFWFLSA-N
DWKQFRKKOQEXDI-UHFFFAOYSA-N
KLFZCKQHRVRNEJ-UHFFFAOYSA-N
GIVJEFBEUGNFPE-UHFFFAOYSA-N
XUCSQYPDVPHCIF-UHFFFAOYSA-N
GAKVHRACPVAOAC-UHFFFAOYSA-N
ANMZTQHJGKKCTO-UHFFFAOYSA-N
YLAZLYWDRKNILS-UHFFFAOYSA-N
PWTJVNAVXWYYHS-UHFFFAOYSA-N
WZDNTLZQCPQIGQ-UHFFFAOYSA-N
RGGFUUNDIDMLOO-UHFFFAOYSA-N
DWNSETYANILQHK-HNNXBMFYSA-N
YYIZDVGMIXDNCZ-UHFFFAOYSA-N
DSUDTUMHRCBCDJ-UHFFFAOYSA-N
GIFJHLAHTBXAFT-UHFFFAOYSA-N
QWCJHSGMANYXCW-UHFFFAOYSA-N
KPRMOLIOPCPHSJ-UHFFFAOYSA-N
PUFZDIOXMXCLFK-UHFFFAOYSA-N
XBSKAZRTIYWPDS-UHFFFAOYSA-N
IUVPMHZRMHZWPW-YGFACIEFSA-N
LWIBMEYBVJUXDE-UHFFFAOYSA-N
BWTDZAAFNWHDGR-UHFFFAOYSA-N
CSVHOSAOYURTOY-UHFFFAOYSA-N
ZJGVYQXYGBOOLQ-UHFFFAOYSA-N
MOGQQJIVYRWYFF-UHFFFAOYSA-N
GSNHABFVSAKIRC-UHFFFAOYSA-N
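
As a quick sanity check on a key list like this, the three-block layout of a standard InChIKey can be validated with a regular expression, and the second block gives a hint about stereo. A minimal Python sketch: the helper names are mine, and reading a second block of UHFFFAOYSA as "no stereo or isotope layers" is the usual rule of thumb for standard keys rather than anything specific to this data set.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")

# Second block usually seen when a standard key has no stereo/isotope layers.
NO_STEREO_BLOCK = "UHFFFAOYSA"


def split_inchikey(key):
    """Return the three blocks of an InChIKey or raise ValueError."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.groups()


def has_stereo_or_isotope(key):
    """True when the second block differs from the 'flat' default."""
    return split_inchikey(key)[1] != NO_STEREO_BLOCK


# Three keys taken from the list above.
keys = [
    "LBEJUQWGEGOTNX-UHFFFAOYSA-N",
    "QJZNRXFIKWQOMX-AATRIKPKSA-N",
    "OPQJGFNBGVOHIM-QGZVFWFLSA-N",
]
print([k for k in keys if has_stereo_or_isotope(k)])
```

Run over the full list, a check of this kind should pick out only the handful of keys whose SMILES carry explicit stereo marks.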



Cogitation on Open Trials

I freely admit my response to this tweet was too hasty;



However, the knee-jerk had an explanation since I had just been collating a post about integrated bioactivity databases.  Considering there are already 17 of these (and probably more on the way) we are, in the community sense, heading towards overkill and paralysis of choice. Hence the groan about YANAD (Yet ANother Aggregated Database).  Notwithstanding, watching the 50 min video by BG actually converted me to the virtuosity of this new resource.

I have an interest in this from a few angles.  Firstly we wrote a magazine piece in 2012 "Connecting Up: assessing the name space and molecular mappings of the drug interventions in ClinicalTrials.gov" wherein we discussed the features of the web interface for identifying the names and molecular details of the drugs specified as interventions and resolving the different types of drug names against chemical structures and their associated clinical data. Secondly, in 2013 we (a different author team) wrote  Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates wherein we allude to analogous difficulties of assigning company development codes (that had been in Phase 1) to structures, including cases of complete blinding (i.e. neither open nor commercial name-to-struc mapping).  Thirdly I have a few blog posts related to transparency including one about J and J clinical trials.

But fourthly, and most importantly, in our own curation of GtoPdb content (PMID 26464438) we could really use a better link to clinical trials information, especially for new drug candidates. In addition we have grappled with many of the issues in the BG video as well as some of our own associated with resolving code names to structures. I will also suggest to the good folk at Open Trials that we (as GtoPdb) can offer to be their chemistry linking partner. This could have a number of reciprocal advantages, including exerting some leverage not only on the many cases of blinded trial entries that do not explicitly identify the drug intervention but also on missing Phase 1 entries and/or results.

Fruits of our labor: the Concise Guide to Pharmacology

It's not every day that one's PubMed count pips up by nine papers, representing an increase of 13% in my modest total. What you can see below, with the flyer in the background, is the nice piece of swag that accompanied the launch of the Special Issue: The Concise Guide to PHARMACOLOGY 2015/16, including the hefty print version, at the December BPS Pharmacology 2015 meeting in London.


The giveaway was a wrist band USB drive with each of the articles as per below



The 2015 FDA-approved drugs

Update: 6th Jan, Blogger is losing some of the link highlighting later in the post, but, via mouse-over, you can get out to them.

This is an annual follow-on from previous years (e.g. 2014, 2013 and 2012). Adding the NMEs and the biologics comes to 51. Good news for a good year then, but I shall link to the pundit commentaries as they surface. My starting point is the official FDA 2015 list (not the 2015 biologicals) from which I will explore some of the molecular mapping methods and ways to identify the Guide to PHARMACOLOGY (GtoPdb) entries in particular.


 First up is to see how  chemicalize.org handles the FDA webpage off the bat.


The auto look-up result of 33 structures is not bad but you can see dictionary true positives in the drug column are patchy with only 19 from the 45. Next up we can try the PubChem Identifier Exchange Service which, after splitting the mixtures, did a pretty good job  (CIDs on the right, below)

lesinurad 53465279
selexipag 9913767
sugammadex 6918584
alectinib 49806720
ixazomib 25183872
cobimetinib 16222096
elvitegravir 5277135
cobicistat 25151504
emtricitabine 60877
tenofovir alafenamide 9574768
mepolizumab 56603701
trabectedin 108150
aripiprazole lauroxil 49831411
trifluridine 6256
tipiracil 6323266
cariprazine 11154555
uridine triacetate 20058
rolapitant 10311306
flibanserin 6918248
daclatasvir 25154714
sonidegib 24775005
brexpiprazole 11978813
sacubitril 9811834
valsartan 60846
lumacaftor 16678941
ivacaftor 16220172
eluxadoline 11250029
deoxycholic acid 222528
ivabradine 132999
cholic acid 221493
isavuconazonium sulfate 72196309
ceftazidime 5481173
avibactam 9835049
panobinostat 6918837
lenvatinib 9823820
palbociclib 5330286
edoxaban 10280735
sebelipase alfa
elotuzumab
necitumumab
daratumumab
osimertinib
asfotase alfa
patiromer
idarucizumab
insulin degludec
evolocumab
alirocumab
cangrelor
dinutuximab
parathyroid hormone
secukinumab
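
For anyone wanting to script the name-to-CID step rather than use the Identifier Exchange web form, a minimal Python sketch against PubChem's PUG REST name lookup is given here. It is not the route used for the list above, so hit lists may differ (for example in how salts and mixtures are handled). The function names are made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(name):
    """Build the PUG REST URL that asks for the CIDs matching a compound name."""
    return f"{PUG_REST}/compound/name/{quote(name)}/cids/JSON"


def parse_cids(payload):
    """Pull the CID list out of a PUG REST JSON reply; empty list if there is none."""
    data = json.loads(payload)
    return data.get("IdentifierList", {}).get("CID", [])


def lookup(names):
    """Map each name to its CIDs (makes network calls; names without a hit get [])."""
    results = {}
    for name in names:
        try:
            with urlopen(cid_lookup_url(name), timeout=30) as reply:
                results[name] = parse_cids(reply.read().decode("utf-8"))
        except OSError:  # covers HTTP errors such as a 404 for an unknown name
            results[name] = []
    return results
```

Calling `lookup(["valsartan", "elotuzumab"])` would be the scripted equivalent of pasting the names into the web form. Any name that returns more than one CID, or a CID for an antibody, deserves the same manual check as the mepolizumab case.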

As expected, none of the "mabs" get a CID, with the exception of an obvious false-positive for mepolizumab (CID 56603701)


This is clearly a cross-over from CID 176596 that goes under the name of Momordin Ic or Scoparianoside B. It looks like TTD DCL000561 is the source of the false positive (i.e. they submitted an SID erroneously linking the name with the small molecule). What I can't work out is how ChemIDplus managed to transitively inherit this false positive in their SID when their source entry does not show this mistake.

We had a true positive entry SID 223366020 for mepolizumab. Because it is not possible to map directly against SIDs in the PubChem Identifier Exchange Service we need a Boolean workaround, viz. (elotuzumab[All Fields] OR necitumumab[All Fields] OR daratumumab[All Fields] OR idarucizumab[All Fields] OR evolocumab[All Fields] OR (alirocumab[All Fields] AND dinutuximab[All Fields]) OR secukinumab[All Fields] AND "IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName])
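
Typing such a query by hand is error-prone, so a small Python sketch for assembling one from a name list may help. Note this is a tidied variant, not the exact string above: it ORs every name and wraps the whole set in brackets before applying the source filter. The function name is hypothetical.

```python
def substance_query(names, source):
    """OR together the name terms, then restrict the whole set to one depositor."""
    name_terms = " OR ".join(f"{name}[All Fields]" for name in names)
    return f'({name_terms}) AND "{source}"[SourceName]'


ANTIBODIES = [
    "elotuzumab", "necitumumab", "daratumumab", "idarucizumab",
    "evolocumab", "alirocumab", "dinutuximab", "secukinumab",
]
query = substance_query(ANTIBODIES, "IUPHAR/BPS Guide to PHARMACOLOGY")
print(query)
```

The printed string can be pasted into the PubChem Substance search box. The explicit brackets make sure the source restriction applies to every name rather than only the last one.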

This clunky query actually produced a clean list of GtoPdb Ab entries (the SID links are in the first part of the url and the links direct to GtoPdb in the bracketed ID numbers)

We can do a complementary name mapping by simply intersecting all the ligand names in GtoPdb (via the Download CSV file (3MB) link) with the FDA list
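
The intersection itself is only a few lines of Python. The sketch below assumes the ligand download has a column called `Name` (an assumption, so check the header of the real file) and matches case-insensitively. The two-row table is a stand-in for the 3MB download, with placeholder ids.

```python
import csv
import io


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def split_by_coverage(fda_names, gtopdb_csv, name_column="Name"):
    """Return (matched, missing) FDA names against the names in a ligand CSV."""
    reader = csv.DictReader(gtopdb_csv)
    ligand_names = {normalise(row[name_column]) for row in reader}
    matched = [n for n in fda_names if normalise(n) in ligand_names]
    missing = [n for n in fda_names if normalise(n) not in ligand_names]
    return matched, missing


# Stand-in for the real download: placeholder ids, illustrative rows only.
GTOPDB_DEMO = io.StringIO("Ligand id,Name\n1,palbociclib\n2,AZD9291\n")
matched, missing = split_by_coverage(
    ["palbociclib", "sugammadex", "osimertinib"], GTOPDB_DEMO
)
print("matched:", matched)
print("missing:", missing)
```

In this toy run osimertinib comes out as missing because the entry is held under its code name AZD9291. An exact-name intersection is therefore only a first pass, and synonyms need a second look.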



So that leaves these "missing" names, that for various reasons (see ) are not in our capture brief anyway

sugammadex is the first selective relaxant binding agent (i.e. antidote to a drug) of which there are three CIDs with name matches - so take your choice, FDA UNII as the sodium salt, patent extraction as the "flat" or publication extraction with the stereo.




sebelipase alfa:  is a recombinant lysosomal acid lipase/cholesteryl ester hydrolase P38571
osimertinib:  turns out we had this as AZD9291  (so it will be updated)
elvitegravir: this is CID 5277135 but we don't do antiinfectives
emtricitabine
tenofovir alafenamide
asfotase alfa
patiromer
aripiprazole lauroxil
insulin degludec
uridine triacetate
daclatasvir
sonidegib
cangrelor
isavuconazonium sulfate
ceftazidime
avibactam
parathyroid hormone

The explicit CIDs for anti-retroviral fixed combinations like Genvoya (elvitegravir, cobicistat, emtricitabine, and tenofovir alafenamide) can be difficult to find, as with mixtures in general. Of the five in this year's list the CID query (Entresto OR Avycaz OR Orkambi OR Lonsurf OR Genvoya) only brings back two, both of which were trade-name linked by ChemIDplus


The latter mixture, Entresto, was the subject of a previous blog post

The 27 CIDs that match  GtoPdb SIDs are listed below, but they include some mixture splits


The unfortunate case of BIA-10-2474

Update 21 Jan:  Via two sources (thanks) I received a copy of the protocol now made public by Le Figaro with the following on page 26.


Regardless of the somewhat quirky pyridine oxide tautomeric charge state, this maps to CID 46831476 (below)



It turns out a lot of sources had independently extracted this from Bial patents (except Discovery Gate who piggy-backed off Thomson/Derwent)


So egg on face? A smear perhaps, but the digging below was done in good faith. The good news is that those folk engaged with modelling and/or planning experiments can get cracking (although the word from one of my chemical vendor contacts was that this looks tough to make). Note also it is not an irreversible inhibitor, according to Bial (but let's hope they tried some long pre-incubation times just to make sure). SureChEMBL does a good job with the patent mapping, but on first glance these are US-only.


This is probably because the chemistry dropped out better from the USPTO xmls and CWUs. INPADOC indicates the family has a PCT as WO2010074588.


SciFinder also produces a document mapping  but  using  different representation rules- viz RN 1233855-46-3 below.


This results in an unusual multi-mapping to three different PCTs (WO2014017938, WO2010074588 and WO2012015324). Note below that Wikipedia and I had also picked up on 4588 but not the structure. So was CID 72734378 a red herring? Since this would have been a rather expensive one (in terms of synthetic chemistry, drafting and patent filing costs) it seems more likely to have been a back-up for BIA 10-2474.

For the record, the Wikipedia entry has also now switched to the new structure. I will follow up later here and/or in a new post on molecular details


          *********************************************************************

Update 19th/20th Jan: latest to join the commentaries (the mentions of GtoPdb were appreciated) are Nature News, Forbes and C&EN. Since speculation has arisen, I could find no data on BIA-10-2474 vs FAAH2 (Q6GMR7). Janssen have announced the precautionary suspension of their Phase 2 FAAH inhibitor trial. There are no links on their site but open sources suggest it's JNJ-42165279 (CID 54576693) so we will update this in GtoPdb.

Update 18th Jan: My IUPHAR colleague and co-author SA, an expert on the endocannabinoid system, is now quoted in a Science news release from 16th of Jan and I have also received expressions of journalistic interest. While I need not disclose who was asking what, I can add some points for the record in advance. There have been two transparency failures, in that Bial neither publicly disclosed the molecular structure of BIA-10-2474 nor submitted a clinicaltrials.gov entry for the trial (regardless as to whether they were obligated to or not). This means that I and others (including Wikipedia editors) have had to guess the likely chemistry from a recent patent. To be fair, Bial are by no means the only drug development operation to blind code names for trial compounds in their portfolio listings, press releases or clinicaltrials.gov, and they have been compliant in the past (e.g. Efficacy and Safety of Eslicarbazepine Acetate (BIA 2-093) in Acute Manic Episodes Associated With Bipolar I Disorder NCT01822678). However, we have "grumbled" about this opaque blinding practice in a publication and several of my blog posts have alluded to the same issue (e.g. the Merck BACE1 inhibitor).

The problem for Bial is what option they now have for retrospective transparency. Openly provenancing the molecular structure of BIA-10-2474 could have been accomplished by publishing a good journal paper and/or applying for an INN that the WHO make public. The former option remains open but obviously not the latter. However, I suggest under these unprecedented circumstances there is nothing stopping them submitting to databases such as PubChem and ChemSpider (I could even help facilitate this). They might also consider a retrospective entry in clinicaltrials.gov in the interests of transparency and public data records (n.b. on a good day all three can be linked, CID, PMID and NCTID, and even to PubChem BioAssay as a 4th). As we know, having the molecular structure allows anyone not only to run in silico predictions (including docking) of possible toxicological liabilities and so-called off-target effects but also to run real experiments to test such predictions, one of which could be tracking the sites of covalent modification by radio-labeling (I even have some experience of this from my protein chemistry past). Realistically, none of this would help the unfortunate patients in this case, but it could add to our corpus of pharmacological and toxicological knowledge for the future.


Update 16th Jan: For the record, it needs to be borne in mind that the name-to-structure below remains inferred. The molecular identity of BIA-10-2474 can only be formally verified directly by BIAL or indirectly from regulatory documentation they may have submitted (n.b. the disclosure or not of which has nothing directly to do with the accident). The literature in this area is extensive but a detailed 2011 review (with linked forward citations) is "The Discovery and Development of Inhibitors of Fatty Acid Amide Hydrolase (FAAH)" (PMID 21764305 OA). Note that the Guide to PHARMACOLOGY entry for FAAH has useful links, including for the target (Swiss-Prot O00519) and selected published inhibitors such as PF-04457845 (these will be updated in the next release). Note also there is now some in silico modelling (Bayesian no less) going on here and here (n.b. if anyone has this FAAH patent review I'd appreciate the PDF since Edinburgh does not subscribe)

***********************************************

One aspect of the 15 Jan announcements on the clinical trial disaster in France (see In the Pipeline and the BBC) revolved around what BIA-10-2474 actually is. An informed guess from an In the Pipeline comment, on the basis of the Bial patent portfolio, suggested that BIA-10-2474 could be an irreversible covalent inhibitor of FAAH. The recent synthesis patent WO2015112036 is thus highly likely to be the lead


This turns out to be CID 72734378  SXKWDPMBWTZYCA-UHFFFAOYSA-N,  first pulled out  from a patent by Thomson Pharma in 2014 followed by SureChEMBL in 2016


This was shown to have an IC50 of 27 nM from assaying mouse brain homogenate in WO2014017936 but they describe going for liver FAAH inhibition rather than brain.  However, this patent  is largely a single-structure filing (i.e. not much SAR).  The Wikipedia entry (see below) cites this back to  the series from WO2010074588.  This has no less than 596 examples from the same team against the same target with partial % inhibition data. However, neither SciFinder nor SureChEMBL actually connect CID 72734378 between the two documents.   This structure has a  Tanimoto similarity shell of 255 in PubChem but this is vendor-heavy and without  any BioAssay results so far.
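
For readers less familiar with what a similarity shell means, the idea is sketched below in Python. The Tanimoto coefficient is the count of shared fingerprint bits divided by the count of bits set in either structure, and the shell is everything at or above a chosen cut-off. The fingerprints and the 0.8 cut-off here are toy values for illustration, not PubChem's fingerprints or its threshold.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either set."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention
    return len(a & b) / len(union)


def similarity_shell(query_bits, library, threshold):
    """Return the library keys whose fingerprint reaches the threshold vs the query."""
    return sorted(
        key for key, bits in library.items()
        if tanimoto(query_bits, bits) >= threshold
    )


# Toy fingerprints (positions of on-bits); NOT real PubChem fingerprints.
QUERY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
LIBRARY = {
    "analogue_a": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},  # 9 shared bits out of 11
    "analogue_b": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},  # identical to the query
    "unrelated": {20, 21, 22},                       # nothing shared
}
shell = similarity_shell(QUERY, LIBRARY, threshold=0.8)
print(shell)
```

A vendor-heavy shell, as seen here for CID 72734378, simply means that most of the near neighbours by this measure are catalogue compounds rather than structures with published activity data.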

Perhaps unsurprisingly I was beaten to the surfacing of the key structure, literally by minutes, by an enterprising but so far anonymous Wikipedia editor (not the ITP commenter M), to whose entry I have added minor edits


Google moves quickly so the current ranking for the code name is below, but I only make it into 7th just now.


The code number is specified on the BIAL website as per their pipeline


So this becomes yet another case of  a "blinded" clinical structure (cf  the MK-8931 story). The company has a lot of entries in clinicaltrials.gov - but none for BIA-10-2474


They also have a publication record but it looks like they tend to leave these until after the INN (i.e. more pharmacological reports rather than primary medicinal chemistry papers)

It turns out that we (GtoPdb) have a number of FAAH inhibitors listed as you can see below





Molecular details related to BIA 10-2474

Context to this post is given in the one for BIA-10-2474 and extensive commentary is appearing all over. What I can do here is add supplementary detail that I hope will be of use. While the utility is likely for the in silico modellers in the first instance, it could also help those planning in vitro and even in vivo work in the near future. Both avenues have the possibility to provide data-supported mechanistic insight and testable hypotheses to help prevent future problems. There will be some repetition of the basic facts but I will focus on extending them (I'd like to acknowledge PN for help with intra-patent mapping).

According to Bial's own documented name-to-struc (released yesterday), BIA-10-2474 is example 363 (of 589!) in WO2010074588, which corresponds to CID 46831476 DOWVMJFBDGWVML-UHFFFAOYSA-N.



The activity mapping in this patent is obscure (no IC50s) and patchy.  The best we can find is % activity figures as below










Med chem starting points for Zika

The Zika virus global emergency needs no introduction but the Lancet report, WHO updates and Wellcome Trust provide authoritative epidemiological details. The issues around driving the medicinal chemistry anti-viral approaches forward as fast as possible echo directly back to the Ebola virus crowd-sourcing call I posted last year. This led to co-authorship on Finding small molecules for the 'next Ebola' (PMID 25949804) but without any of us realising that the "next" would arrive in a mere 12 months. Yesterday's tweet stream provides a useful barometer but many others are also engaging.

@marshgroup Feb 4,  We'll need some funding @cdsouthan @ChemConnector @collabchem @HelenBranswell @wellcometrust @The_MRC @WHO :-) and @WarwickLifeSci

On the med chem front SE has already submitted Open Drug Discovery for Zika Virus to F1000 Research (the same avenue we used for PMID 25949804). The draft manuscript is now open at Figshare and provides molecular background for this post. At this stage I will point out ways of doing things that can enable others to dig out the data. The first check is of course PubMed but we need to set the date back to Dec 2015 to filter out a rash of 34 articles that are mostly commentaries, not scientific reports on Zika. Using this filter gives a mere 99 for "Zika Virus", 2317 for "Ebola Virus" and 78613 for "Dengue Virus". So, while this is bad news on the "orphan" status of Zika, it's good news that the phylogenetic stablemate Dengue has true positives for data within the set.
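
The date-capped counts can also be requested from the NCBI E-utilities instead of the PubMed web page. The Python sketch below only builds the `esearch` request URLs. The 1900 start date and the end-of-December cut-off are my assumptions about how to express "set the date back to Dec 2015", and the counts returned today need not reproduce the figures quoted above.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term, maxdate="2015/12/31", mindate="1900/01/01"):
    """Build an esearch URL that returns only the hit count for a quoted phrase."""
    params = {
        "db": "pubmed",
        "term": f'"{term}"',   # quoted so the phrase is searched as a unit
        "datetype": "pdat",    # filter on publication date
        "mindate": mindate,    # assumed open-ended start
        "maxdate": maxdate,    # assumed reading of "back to Dec 2015"
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


for virus in ("Zika Virus", "Ebola Virus", "Dengue Virus"):
    print(count_url(virus))
```

Fetching each printed URL returns a small JSON document containing the count, which makes it easy to repeat the comparison as the Zika literature grows.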

There is a lot of Dengue stuff including this useful  Dengue Drug Targets Database from which the snapshot below was taken.


So how close is this to Zika in the Flavivirus family? One approach is to fish out the Zika polyprotein (e.g. Q32ZE1_9FLAV) and BLAST against UniProt 90. Coming in after a whole lot of viruses I've never heard of we eventually (but solidly at E-0) hit the Dengue strains, as shown below.


Obviously we have target choice precedents from these polyproteins that are well studied, including the protease component, of which one of the Zika versions is below

tr|Q32ZE1|1499-1676
SGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGAALRSGEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGLSEVQLLAVPPGERARNIQTLPGIFKTKDGDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGKREEETPVEC
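
Before pasting a sub-sequence like this into BLAST it is worth checking that it really matches the stated coordinates. The Python sketch below parses the header above, confirms the length agrees with the 1499-1676 range (1-based, inclusive) and writes a FASTA record. The helper names are made up for illustration.

```python
def parse_header(header):
    """Split 'tr|Q32ZE1|1499-1676' into accession, start and end (1-based, inclusive)."""
    _, accession, span = header.split("|")
    start, end = (int(x) for x in span.split("-"))
    return accession, start, end


def to_fasta(header, sequence, width=60):
    """Format a FASTA record with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


HEADER = "tr|Q32ZE1|1499-1676"
PROTEASE = (
    "SGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGAALRS"
    "GEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGLSEVQLLAVPPGERARNIQTLPGIFKTKD"
    "GDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGKREEETPVEC"
)

accession, start, end = parse_header(HEADER)
expected_length = end - start + 1
if len(PROTEASE) != expected_length:
    raise ValueError("sequence length does not match the stated range")
print(to_fasta(HEADER, PROTEASE))
```

The length check is a cheap guard against an off-by-one slice or a truncated copy and paste, either of which would quietly shift the alignment coordinates in the BLAST output.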

The next obvious step is a BLAST of this against the PDB  to give the result set below.


The good news here is that the even higher sequence identity would underpin the obvious idea of x-screening (i.e. testing data-supported Dengue protease inhibitors against the Zika protease)


Via PDB  we can visualise some of the inhibitors (e.g. below)


And then pick up compounds off the bat, direct from PubChem via MMDB


We can of course find other routes to the chemical structures, for example querying European PubMed Central with "dengue virus protease" inhibitor  gives 33 papers.  Adding the restrict for ChEMBL only produces two hits.


But relaxing the stringency to just Dengue + ChEMBL pulls back 65


These ChEMBL entries can pull back a lot of chemistry. However, two problems are the curation lag time (at least a year or so in this case) and that some pretty low potencies are being written up (but these viral proteases are typically quite "floppy" so nM inhibitors are unusual anyway). As a means of bringing these searches up to date we can simply try Dengue in J. Med. Chem. This returns 49 but the top two below are useful true positives, including a timely review.


The review looks very useful but the leads are only described as images so it's hard work to map these using OSRA and/or the primary citations (which I can't do just now). The second paper (PMID 26562070) as an example provides the SMILES downloads (see below)


But unfortunately  these authors did not add the inhibition values into the sheet :( but we can do the work of mapping these in from the paper.  The most potent compound is 83 with 12 nM Ki vs DNV protease but is unfortunately not in PubChem
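
Mapping the values back in is a simple keyed merge once the Ki figures have been transcribed from the paper. The Python sketch below shows the shape of it. The column names and the second row are hypothetical, the SMILES strings are placeholders to be replaced by those in the download, and the only real value is the 12 nM Ki for compound 83 quoted above.

```python
import csv
import io


def add_potency(rows, ki_by_compound, id_column="compound", ki_column="Ki_nM"):
    """Return copies of the rows with a Ki column added (blank where not curated)."""
    merged = []
    for row in rows:
        enriched = dict(row)
        enriched[ki_column] = ki_by_compound.get(row[id_column], "")
        merged.append(enriched)
    return merged


def to_csv(rows):
    """Serialise the merged rows so they can be saved next to the original sheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder sheet: substitute the real SMILES from the supplementary download.
SHEET = [
    {"compound": "83", "smiles": "SMILES_FROM_DOWNLOAD"},
    {"compound": "not_yet_curated", "smiles": "SMILES_FROM_DOWNLOAD"},
]
KI_FROM_PAPER = {"83": 12}  # nM, the one value quoted in this post

merged = add_potency(SHEET, KI_FROM_PAPER)
print(to_csv(merged))
```

Leaving uncurated rows blank rather than dropping them keeps the sheet complete, so the remaining values can be filled in as they are read out of the paper.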


The authors did manage a docking study as you can see from the DNV result below




