CReM: chemically reasonable mutations framework

An open-source framework for fragment-based generation of chemical structures. The idea is similar to matched molecular pairs: if two fragments occur in identical contexts, they can replace each other to produce new structures that are chemically valid and probably synthetically feasible.
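
The replacement idea can be illustrated with a toy lookup. This is a minimal sketch, not CReM's actual data model: the context labels, the fragment strings and the helper names `build_toy_db` and `interchangeable` are all made up for illustration.

```python
from collections import Counter, defaultdict


def build_toy_db(pairs):
    """Group fragments by the context they were observed in, counting occurrences."""
    db = defaultdict(Counter)
    for context, fragment in pairs:
        db[context][fragment] += 1
    return db


def interchangeable(db, context, fragment, min_freq=1):
    """Fragments seen in the same context at least min_freq times, excluding the input."""
    seen = db.get(context, Counter())
    return sorted(f for f, n in seen.items() if f != fragment and n >= min_freq)


# Hypothetical (context, fragment) observations; "[*]" marks the attachment point.
pairs = [
    ("aromatic C", "[*]OC"),
    ("aromatic C", "[*]Cl"),
    ("aromatic C", "[*]Cl"),
    ("aromatic C", "[*]C"),
    ("carbonyl C", "[*]N"),
    ("carbonyl C", "[*]O"),
]
db = build_toy_db(pairs)
print(interchangeable(db, "aromatic C", "[*]OC"))              # ['[*]C', '[*]Cl']
print(interchangeable(db, "aromatic C", "[*]OC", min_freq=2))  # ['[*]Cl']
```

Fragments observed in a different context ("carbonyl C" here) are never proposed, which is what keeps the generated structures chemically reasonable.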

Features:

  1. Generation of chemically valid structures
  2. Control of chemotypes of generated structures by choosing the radius of the chemical context considered during replacement. Generated structures will not contain new fragments that are absent from the fragment database and whose size is equal to or less than the specified context radius.
  3. Indirect control of synthetic accessibility of generated structures by managing the content of the fragment database. A database created from more synthetically accessible compounds will result in more synthetically accessible structures.
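
The effect of the context radius in feature 2 can be sketched with a toy model. The assumption here, made only for illustration, is that a context is a tuple of successive atom shells around the attachment point and that two contexts match when their first `radius` shells are identical. The shell labels and helper names are hypothetical and do not reflect how CReM encodes contexts.

```python
def context_key(shells, radius):
    """Keep only the first `radius` shells of atoms around the attachment point."""
    return tuple(shells[:radius])


def candidates(observations, query_shells, radius):
    """Fragments whose observed context matches the query up to the given radius."""
    key = context_key(query_shells, radius)
    return sorted({frag for shells, frag in observations
                   if context_key(shells, radius) == key})


# Hypothetical observations: (shells around the attachment point, fragment).
observations = [
    (("c", "cc", "cN"), "[*]OC"),
    (("c", "cc", "cN"), "[*]F"),
    (("c", "cc", "cc"), "[*]Cl"),
    (("c", "cn", "cc"), "[*]Br"),
    (("C", "CO", "c"), "[*]N"),
]
query = ("c", "cc", "cN")
for radius in (1, 2, 3):
    print(radius, candidates(observations, query, radius))
# 1 ['[*]Br', '[*]Cl', '[*]F', '[*]OC']
# 2 ['[*]Cl', '[*]F', '[*]OC']
# 3 ['[*]F', '[*]OC']
```

A larger radius demands a more specific match, so fewer replacements qualify and the generated structures stay closer to chemotypes already present in the database.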

Implementation features:

  1. Generation of a custom fragment database
  2. Three modes of structure generation: MUTATE, GROW, LINK
  3. Context radius to consider for replacement
  4. Selection of the fragment size to replace and the size of a replacing fragment
  5. Protection of atoms from modification (e.g. scaffold protection)
  6. Replacement only with fragments that occur in the fragment database with at least a specified minimum frequency
  7. Random selection of replacements up to a specified number
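
As a rough guide, features 3 to 7 above correspond to keyword arguments of `mutate_mol` in the `crem.crem` module. The sketch below collects them in one place. The default values are believed to mirror the library defaults but should be checked against the installed version, and `MUTATE_DEFAULTS`, `build_mutate_kwargs` and `mutate_smiles` are hypothetical helpers, not part of CReM.

```python
# Keyword arguments of crem.crem.mutate_mol matching the listed features.
# Default values are an assumption; verify them against your CReM version.
MUTATE_DEFAULTS = {
    "radius": 3,               # feature 3: context radius
    "min_size": 0,             # feature 4: heavy atoms in the fragment to replace
    "max_size": 10,
    "min_inc": -2,             # feature 4: size change of the replacing fragment
    "max_inc": 2,
    "protected_ids": None,     # feature 5: atom indices protected from modification
    "min_freq": 0,             # feature 6: minimal fragment frequency in the database
    "max_replacements": None,  # feature 7: number of randomly chosen replacements
}


def build_mutate_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the defaults, rejecting unknown names."""
    unknown = set(overrides) - set(MUTATE_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    return {**MUTATE_DEFAULTS, **overrides}


def mutate_smiles(smiles, db_name, **overrides):
    """Run MUTATE on one SMILES string and return the generated structures as a list."""
    # Imported lazily so the helpers above work without RDKit and CReM installed.
    from rdkit import Chem
    from crem.crem import mutate_mol

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"cannot parse SMILES: {smiles!r}")
    return list(mutate_mol(mol, db_name=db_name, **build_mutate_kwargs(**overrides)))
```

A call such as `mutate_smiles("c1cc(OC)ccc1C", "replacements.db", max_size=1, protected_ids=[0])` would then replace only single-atom fragments and leave atom 0 untouched; the database path is a placeholder. The GROW and LINK modes are exposed in the same module as `grow_mol` and `link_mols`, with their own sets of arguments.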

Repository:
https://github.com/DrrDom/crem

Links to fragment databases:
All fragment databases include physicochemical properties of fragments such as MW, logP, RTB, TPSA, fraction Csp3 and the number of pharmacophore features of each type. All CReM databases are available at https://doi.org/10.5281/zenodo.16909329.

| Database | Source | Molecule filters | Fragment filters | Size |
|---|---|---|---|---|
| chembl22_sa2.db.gz | ChEMBL22 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS; maximum synthetic accessibility score (SA) 2 | maximum number of heavy atoms 20 | 350.2 MB |
| chembl22_sa2_hac12.db.gz | ChEMBL22 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS; maximum synthetic accessibility score (SA) 2 | maximum number of heavy atoms 12 | 158.9 MB |
| chembl22_sa25_hac12.db.gz | ChEMBL22 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS; maximum synthetic accessibility score (SA) 2.5 | maximum number of heavy atoms 12 | 886.8 MB |
| chembl33_sa2_f5.db.gz | ChEMBL33 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS; maximum synthetic accessibility score (SA) 2 | maximum number of heavy atoms 15 | 281.3 MB |
| chembl33_sa25_f5.db.gz | ChEMBL33 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS; maximum synthetic accessibility score (SA) 2.5 | maximum number of heavy atoms 15 | 1.9 GB |
| chembl33_f5.db.gz | ChEMBL33 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS | maximum number of heavy atoms 15 | 6.6 GB |
| enamine2025_sa2_f5.db.gz | Enamine stock 2025 | only organic atoms (C,N,O,S,P,F,Cl,Br,I); no alerts from BMS, Dundee, Glaxo, Inpharmatica and PAINS; maximum synthetic accessibility score (SA) 2 | maximum number of heavy atoms 15 | 776.9 MB |
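
The databases are distributed as `.db.gz` files, so they have to be decompressed before use. The sketch below assumes that each archive is a gzip-compressed SQLite file (an assumption, not stated above) and uses only the Python standard library. The helper names `decompress_db` and `list_tables` are hypothetical.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress_db(gz_path):
    """Decompress 'name.db.gz' next to the archive and return the path to 'name.db'."""
    gz_path = Path(gz_path)
    db_path = gz_path.with_suffix("")  # strips the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)   # streams the data, so large files are fine
    return db_path


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    con = sqlite3.connect(str(db_path))
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

For example, `list_tables(decompress_db("chembl22_sa2.db.gz"))` would show which tables a downloaded database contains. Note that the decompressed file is larger than the archive size given in the table.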

General workflow

[Figure: CReM scheme]



Results of GuacaMol tests for the implemented CReM-based stochastic search and reference approaches

| task | SMILES LSTM* | SMILES GA* | Graph GA* | Graph MCTS* | CReM |
|---|---|---|---|---|---|
| Celecoxib rediscovery | 1.000 | 0.732 | 1.000 | 0.355 | 1.000 |
| Troglitazone rediscovery | 1.000 | 0.515 | 1.000 | 0.311 | 1.000 |
| Thiothixene rediscovery | 1.000 | 0.598 | 1.000 | 0.311 | 1.000 |
| Aripiprazole similarity | 1.000 | 0.834 | 1.000 | 0.380 | 1.000 |
| Albuterol similarity | 1.000 | 0.907 | 1.000 | 0.749 | 1.000 |
| Mestranol similarity | 1.000 | 0.79 | 1.000 | 0.402 | 1.000 |
| C11H24 | 0.993 | 0.829 | 0.971 | 0.410 | 0.966 |
| C9H10N2O2PF2Cl | 0.879 | 0.889 | 0.982 | 0.631 | 0.940 |
| Median molecules 1 | 0.438 | 0.334 | 0.406 | 0.225 | 0.371 |
| Median molecules 2 | 0.422 | 0.38 | 0.432 | 0.170 | 0.434 |
| Osimertinib MPO | 0.907 | 0.886 | 0.953 | 0.784 | 0.995 |
| Fexofenadine MPO | 0.959 | 0.931 | 0.998 | 0.695 | 1.000 |
| Ranolazine MPO | 0.855 | 0.881 | 0.92 | 0.616 | 0.969 |
| Perindopril MPO | 0.808 | 0.661 | 0.792 | 0.385 | 0.815 |
| Amlodipine MPO | 0.894 | 0.722 | 0.894 | 0.533 | 0.902 |
| Sitagliptin MPO | 0.545 | 0.689 | 0.891 | 0.458 | 0.763 |
| Zaleplon MPO | 0.669 | 0.413 | 0.754 | 0.488 | 0.770 |
| Valsartan SMARTS | 0.978 | 0.552 | 0.990 | 0.04 | 0.994 |
| Deco Hop | 0.996 | 0.970 | 1.000 | 0.590 | 1.000 |
| Scaffold Hop | 0.998 | 0.885 | 1.000 | 0.478 | 1.000 |
| total score | 17.341 | 14.398 | 17.983 | 9.011 | 17.919 |
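
The total score row appears to be the plain sum of the 20 per-task scores. A quick check for the CReM column, with the values copied from the table above:

```python
# Per-task CReM scores in table order (Celecoxib rediscovery ... Scaffold Hop).
crem_scores = [
    1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.966, 0.940, 0.371, 0.434,
    0.995, 1.000, 0.969, 0.815, 0.902, 0.763, 0.770, 0.994, 1.000, 1.000,
]
total = round(sum(crem_scores), 3)
print(len(crem_scores), total)  # 20 17.919
```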

Citation:

Polishchuk, P. CReM: chemically reasonable mutations framework for structure generation. Journal of Cheminformatics 2020, 12 (1), 28. https://doi.org/10.1186/s13321-020-00431-w

© Pavel Polishchuk 2010-2024