torchdrug.datasets

Knowledge Graph Datasets

FB15k

class FB15k(path, verbose=1)

Subset of Freebase knowledge base for knowledge graph reasoning.

Statistics:
  • #Entity: 14,951

  • #Relation: 1,345

  • #Triplet: 592,213

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

FB15k237

class FB15k237(path, verbose=1)

A filtered version of the FB15k dataset with trivial cases removed.

Statistics:
  • #Entity: 14,541

  • #Relation: 237

  • #Triplet: 310,116

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

WN18

class WN18(path, verbose=1)

WordNet knowledge base.

Statistics:
  • #Entity: 40,943

  • #Relation: 18

  • #Triplet: 151,442

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

WN18RR

class WN18RR(path, verbose=1)

A filtered version of the WN18 dataset with trivial cases removed.

Statistics:
  • #Entity: 40,943

  • #Relation: 11

  • #Triplet: 93,003

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level
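The statistics listed above make the effect of filtering concrete. The following sketch only does arithmetic on the triplet counts copied from this page; `STATS` and `retained_fraction` are illustrative names, not part of TorchDrug.

```python
# Statistics copied from the entries above: (entities, relations, triplets).
STATS = {
    "FB15k": (14951, 1345, 592213),
    "FB15k237": (14541, 237, 310116),
    "WN18": (40943, 18, 151442),
    "WN18RR": (40943, 11, 93003),
}


def retained_fraction(original, filtered):
    """Fraction of the original triplets that remain in the filtered dataset."""
    return STATS[filtered][2] / STATS[original][2]


for original, filtered in [("FB15k", "FB15k237"), ("WN18", "WN18RR")]:
    fraction = retained_fraction(original, filtered)
    print(f"{filtered} keeps {fraction:.1%} of the {original} triplets")
```

By these counts, FB15k237 keeps roughly half of the FB15k triplets and WN18RR roughly three fifths of the WN18 triplets.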

Hetionet

class Hetionet(path, verbose=1)

Hetionet for knowledge graph reasoning.

Statistics:
  • #Entity: 45,158

  • #Relation: 24

  • #Triplet: 2,025,177

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level
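All knowledge graph classes above share the (path, verbose=1) signature, so one can be picked by name. This is a minimal sketch that assumes TorchDrug is installed; `KG_DATASETS` and `load_knowledge_graph` are hypothetical helpers written for this example, and the directory in the usage note is only a placeholder.

```python
# Class names taken from the knowledge graph entries on this page.
KG_DATASETS = ("FB15k", "FB15k237", "WN18", "WN18RR", "Hetionet")


def load_knowledge_graph(name, path, verbose=1):
    """Instantiate one of the knowledge graph datasets listed above by name."""
    if name not in KG_DATASETS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {KG_DATASETS}")
    # Imported lazily so that the name check works without TorchDrug installed.
    from torchdrug import datasets

    dataset_cls = getattr(datasets, name)
    return dataset_cls(path, verbose=verbose)
```

A call such as `load_knowledge_graph("FB15k237", "~/kg-datasets/")` would then store the dataset files under the given path.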

Molecule Property Prediction Datasets

BACE

class BACE(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Binary binding results for a set of inhibitors of human \(\beta\)-secretase 1 (BACE-1).

Statistics:
  • #Molecule: 1,513

  • #Classification task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.
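The transform parameter accepts any callable. The sketch below assumes that each sample is a dict holding the molecule under a "graph" key next to one entry per task, which is how TorchDrug's molecule datasets are commonly used but is not stated on this page. `keep_tasks` and the task names are hypothetical.

```python
def keep_tasks(task_names):
    """Build a transform that keeps "graph" plus the selected label keys."""
    keep = {"graph", *task_names}

    def transform(item):
        # Return a new dict so the original sample is left untouched.
        return {key: value for key, value in item.items() if key in keep}

    return transform


# Stand-in sample; a real sample would hold a molecule graph object.
sample = {"graph": "<molecule graph>", "task_a": 1.0, "task_b": 0.0}
transform = keep_tasks(["task_a"])
filtered_sample = transform(sample)
print(filtered_sample)
```

Under the same assumption, the callable could be passed as `transform=keep_tasks([...])` when constructing any of the molecule datasets.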

BBBP

class BBBP(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Binary labels of blood-brain barrier penetration.

Statistics:
  • #Molecule: 2,039

  • #Classification task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.
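The trade-off described for the lazy parameter can be illustrated with a toy dataset. This is a conceptual sketch only, not TorchDrug's implementation: `SketchDataset` and `fake_process` are hypothetical, and the "processing" is a stand-in for building a molecule graph.

```python
class SketchDataset:
    """Toy dataset that processes items either up front or on access."""

    def __init__(self, raw_items, process, lazy=False):
        self.raw_items = list(raw_items)
        self.process = process
        self.lazy = lazy
        # Eager mode pays the processing cost once and keeps every result in memory.
        self.processed = None if lazy else [process(x) for x in self.raw_items]

    def __len__(self):
        return len(self.raw_items)

    def __getitem__(self, index):
        if self.lazy:
            # Lazy mode defers the work to each access (e.g. inside a dataloader).
            return self.process(self.raw_items[index])
        return self.processed[index]


calls = []


def fake_process(smiles):
    """Stand-in for molecule construction; records how often it is called."""
    calls.append(smiles)
    return {"graph": f"graph({smiles})"}


eager = SketchDataset(["CCO", "CCN"], fake_process)
calls_after_eager_init = len(calls)
lazy = SketchDataset(["CCO", "CCN"], fake_process, lazy=True)
calls_after_lazy_init = len(calls)
first_item = lazy[0]
```

Constructing the lazy variant triggers no processing, so it loads quickly and holds only the raw items, while every access repeats the processing step.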

CEP

class CEP(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Photovoltaic efficiency estimated by the Harvard Clean Energy Project.

Statistics:
  • #Molecule: 20,000

  • #Regression task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

ChEMBLFiltered

class ChEMBLFiltered(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Statistics:
  • #Molecule: 430,710

  • #Regression task: 1,310

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

ClinTox

class ClinTox(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Qualitative data of drugs approved by the FDA and those that have failed clinical trials for toxicity reasons.

Statistics:
  • #Molecule: 1,478

  • #Classification task: 2

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

Delaney

class Delaney(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Log-scale water solubility of molecules.

Statistics:
  • #Molecule: 1,128

  • #Regression task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

FreeSolv

class FreeSolv(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Experimental and calculated hydration free energy of small molecules in water.

Statistics:
  • #Molecule: 642

  • #Regression task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

HIV

class HIV(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Experimentally measured abilities to inhibit HIV replication.

Statistics:
  • #Molecule: 41,127

  • #Classification task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

Lipophilicity

class Lipophilicity(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Experimental results of octanol/water distribution coefficient (logD at pH 7.4).

Statistics:
  • #Molecule: 4,200

  • #Regression task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

MUV

class MUV(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Subset of PubChem BioAssay by applying a refined nearest neighbor analysis.

Statistics:
  • #Molecule: 93,087

  • #Classification task: 17

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

Malaria

class Malaria(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Half-maximal effective concentration (EC50) against a parasite that causes malaria.

Statistics:
  • #Molecule: 10,000

  • #Regression task: 1

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

OPV

class OPV(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Quantum mechanical calculations on organic photovoltaic candidate molecules.

Statistics:
  • #Molecule: 94,576

  • #Regression task: 8

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

QM8

class QM8(path, node_position=False, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Electronic spectra and excited state energy of small molecules.

Statistics:
  • #Molecule: 21,786

  • #Regression task: 12

Parameters
  • path (str) – path to store the dataset

  • node_position (bool, optional) – whether to load node positions. If enabled, node_position is added as a node attribute to each sample.

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

QM9

class QM9(path, node_position=False, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Geometric, energetic, electronic and thermodynamic properties of DFT-modeled small molecules.

Statistics:
  • #Molecule: 133,885

  • #Regression task: 12

Parameters
  • path (str) – path to store the dataset

  • node_position (bool, optional) – whether to load node positions. If enabled, node_position is added as a node attribute to each sample.

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

SIDER

class SIDER(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Marketed drugs and adverse drug reactions (ADR) dataset, grouped into 27 system organ classes.

Statistics:
  • #Molecule: 1,427

  • #Classification task: 27

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

Tox21

class Tox21(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Qualitative toxicity measurements on 12 biological targets, including nuclear receptors and stress response pathways.

Statistics:
  • #Molecule: 7,831

  • #Classification task: 12

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

ToxCast

class ToxCast(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Toxicology data based on in vitro high-throughput screening.

Statistics:
  • #Molecule: 8,575

  • #Classification task: 617

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

ZINC250k

class ZINC250k(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Subset of ZINC compound database for virtual screening.

Statistics:
  • #Molecule: 249,455

  • #Regression task: 2

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

ZINC2m

class ZINC2m(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

ZINC compound database for virtual screening. This dataset doesn’t contain any label information.

Statistics:
  • #Molecule: 2,000,000

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

MOSES

class MOSES(path, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Subset of ZINC database for molecule generation. This dataset doesn’t contain any label information.

Statistics:
  • #Molecule: 1,936,963

Parameters
  • path (str) – path for the CSV dataset

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

Retrosynthesis Datasets

USPTO50k

class USPTO50k(path, as_synthon=False, verbose=1, transform=None, lazy=False, node_feature='default', edge_feature='default', graph_feature=None, with_hydrogen=False, kekulize=False)

Chemical reactions extracted from USPTO patents.

Statistics:
  • #Reaction: 50,017

  • #Reaction class: 10

Parameters
  • path (str) – path to store the dataset

  • as_synthon (bool, optional) – whether to decompose (reactant, product) pairs into (reactant, synthon) pairs

  • verbose (int, optional) – output verbose level

  • transform (Callable, optional) – data transformation function

  • lazy (bool, optional) – if lazy mode is used, the molecules are processed in the dataloader. This may slow down the data loading process, but save a lot of CPU memory and dataset loading time.

  • node_feature (str or list of str, optional) – node features to extract

  • edge_feature (str or list of str, optional) – edge features to extract

  • graph_feature (str or list of str, optional) – graph features to extract

  • with_hydrogen (bool, optional) – store hydrogens in the molecule graph. By default, hydrogens are dropped

  • kekulize (bool, optional) – convert aromatic bonds to single/double bonds. Note this only affects the relation in edge_list. For bond_type, aromatic bonds are always stored explicitly. By default, aromatic bonds are stored.

property reaction_types

All reaction types.
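Because as_synthon switches between two views of the same reactions, it can be convenient to build both from one path. This is a minimal sketch assuming TorchDrug is installed; `uspto50k_views` and its `dataset_cls` argument are hypothetical and exist only so that the helper can be tried out with a stand-in class.

```python
def uspto50k_views(path, dataset_cls=None, **shared_kwargs):
    """Return (reaction_view, synthon_view) built from the same path and options."""
    if dataset_cls is None:
        # Imported lazily so that the helper can be exercised without TorchDrug.
        from torchdrug import datasets

        dataset_cls = datasets.USPTO50k
    # as_synthon=False keeps (reactant, product) pairs.
    reaction_view = dataset_cls(path, as_synthon=False, **shared_kwargs)
    # as_synthon=True decomposes them into (reactant, synthon) pairs.
    synthon_view = dataset_cls(path, as_synthon=True, **shared_kwargs)
    return reaction_view, synthon_view
```

Keyword arguments such as kekulize or node_feature are forwarded unchanged to both views, and the reaction_types property documented above would be available on either object.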

Citation Network Datasets

Cora

class Cora(path, verbose=1)

A citation network of scientific publications with binary word features.

Statistics:
  • #Node: 2,708

  • #Edge: 5,429

  • #Class: 7

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

CiteSeer

class CiteSeer(path, verbose=1)

A citation network of scientific publications with binary word features.

Statistics:
  • #Node: 3,327

  • #Edge: 8,059

  • #Class: 6

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level

PubMed

class PubMed(path, verbose=1)

A citation network of scientific publications with TF-IDF word features.

Statistics:
  • #Node: 19,717

  • #Edge: 44,338

  • #Class: 3

Parameters
  • path (str) – path to store the dataset

  • verbose (int, optional) – output verbose level