DFT-generated 3D conformation指的是通过密度泛函理论(DFT)计算得到的分子的三维构象。
DFT计算得到的分子构象可以用来理解分子的化学性质、反应机理、分子间相互作用等。在DFT计算中,需要先确定分子的初始构象,然后通过计算得到优化后的最稳定构象。DFT可以考虑分子中电子的相互作用,因此DFT-generated 3D conformation相对于其他构象预测方法来说,具有更高的精度和可靠性。DFT-generated 3D conformation也可以用于分子模拟、分子对接、药物设计等领域。
2、HOMO-LUMO gap label是什么?
HOMO-LUMO gap(Highest Occupied Molecular Orbital-Lowest Unoccupied Molecular Orbital gap)指的是一个分子中最高占据分子轨道(HOMO)和最低未占据分子轨道(LUMO)之间的能量差。
OGB-LSC (Open Graph Benchmark, Large Scale Challenge) 是一项由斯坦福大学发起的学术竞赛,旨在评估机器学习在大规模图数据上的表现。该竞赛首次在KDD CUP 2021上举办,吸引了来自DeepMind、微软、NVIDIA、UCLA等顶尖企业和高校的500多个参赛队伍,备受业界关注。近年来,越来越多的新型图机器学习模型也加入到这个比赛中,以证明自己的模型性能。可以说,OGB-LSC已成为公认的检验图机器学习模型性能的最佳试金石,类似于ImageNet在图像领域的地位。
We generated 8 conformations for each molecule using RDKit, at a per-molecule cost of approximately 0.01 seconds. During training, we randomly sampled 1 conformation as input r at each epoch, while during inference, we used the average HOMO-LUMO gap prediction based on 8 conformations.
We used the AdamW optimizer with a learning rate of 2e-4, a batch size of 1024, (β1, β2) set to (0.9, 0.999), and gradient clipping set to 5.0 during training, which lasted for 1.5 million steps, with 150K warmup steps. We also utilized exponential moving average (EMA) with a decay rate of 0.999. The training process required around 5 days, powered by 8 NVIDIA A100 GPUs. Additionally, inference on the 147k test-dev set took approximately 7 minutes, utilizing 8 NVIDIA V100 GPUs.
We incorporate previous submissions to the PCQM4MV2 leaderboard as baselines. In addition to the default 12-layer model, we evaluate the performance of Uni-Mol+ with two variants consisting of 6 and 18 layers, respectively, to investigate the impact of varying model parameter sizes.
PCQM4M-LSC(Predicting Chemical Quantum Mechanical (QM) properties using Molecular Machine Learning on Large-scale data Sets)是OGB-LSC任务集合中的一个任务,它是基于图的机器学习领域中的一个经典问题,即预测给定分子的量子力学性质【这个任务需要预测给定分子的热能、配位化学能、电子亲和势等多种量子力学性质】。
Fig 1 . In contrast to prior methods that directly predict QC properties from 1D/2D data, Uni-Mol+ employs a distinct approach.
Firstly, given a 2D molecular graph, Uni-Mol+ generates an initial 3D conformation from inexpensive methods such as RDKit.【由rdkit将2D分子图生成为3D初始构象】
Then, the initial conformation is iteratively optimized to its equilibrium conformation, and the optimized conformation is further used to predict the QC properties.【将3D初始构象迭代优化为平衡构象,平衡构象用于下一步的QC性质预测】
Figure 2: The Uni-Mol+ model backbone consists of two tracks of representations - atom and pair, initialized by atom features and 2D graph/3D conformation respectively.
These representations communicate with each other at every block. Besides, Uni-Mol+ optimizes the predicted 3D position iteratively using the previous iteration’s predicted conformations as input for the current iteration.
Figure 3: Illustration of the linear trajectory injection (LTI) method for conformational sampling.
The leftmost cheap conformation is generated by RDKit/OpenBabel, while the rightmost conformation is the target DFT conformation. LTI assumes a linear trajectory from the left to the right and enables us to sample conformations by controlling the parameter q. To highlight the differences between the conformations, we include the target DFT conformation as a reference in translucent gray.