Close Menu
Simply Invest Asia
  • Home
  • About us
  • Explore industries/sectors
    • Automobile
    • Aviation
    • Banking
    • Biotechnology
    • Chemical & Fertilizer
    • Entertainment and Media
    • Food Processing
    • Healthcare
    • Iron and Steel
    • Leather
    • Mining
    • Oil and Gas
    • Pharmaceutical
  • Explore by countries
    • China
    • Dubai / UAE
    • Hong Kong
    • India
    • Indonesia
    • Japan
    • Malaysia
  • Explore cities
    • Bangkok
    • Beijing
    • Chongqing
    • Delhi
    • Dubai
    • Guangzhou
    • Jakarta
    • Kuala Lumpur
  • Why Asia
Facebook X (Twitter) Instagram Threads
Trending:
  • S’porean man found dead at residential unit in Hong Kong after failing to report for work
  • Dubai’s glitzy restaurants forced to change menus as Iran war hits trade
  • Juanjo Duran, Global Head of Media and Entertainment Content Partnerships at Google
  • Brighton fighter Louie Doherty eyes Bangkok title shot
  • From Sick Care to Whole Health: 5 Forces Reshaping the Future of Healthcare
  • US, China discuss Trump’s planned visit to Beijing
  • Swedish AI IP firm Abrande expands to Hong Kong
  • Natixis CIB Expands in India with the Establishment of GIFT City Branch
  • Spring Blossoms in Harbin: University campus turns into seasonal hotspot – news.cgtn.com
  • Dubai chefs shrink menus as Iran war makes tomatillos, scallops harder to source
  • Japan’s Semiconductor Firms Look to Join Forces for Global Edge
  • Orosur Mining Inc Announces Granting of RSUs
  • China voices concerns over ‘discriminatory’ EU draft laws – World
  • Annual meeting reaffirms strength of Australia-Indonesia livex partnership
  • Chelsea vs AC Milan Set for Jakarta Showdown on August 8
  • Oil climbs as UAE’s OPEC exit seen offering little short-term relief
  • HKSAR govt blasts move to ‘sugarcoat Jimmy Lai’s criminal acts’
  • Food Technology Market in the Netherlands | Report – IndexBox
Friday, May 1
Facebook X (Twitter) Instagram
Simply Invest Asia
  • Home
  • About us
  • Explore industries/sectors
    • Automobile
    • Aviation
    • Banking
    • Biotechnology
    • Chemical & Fertilizer
    • Entertainment and Media
    • Food Processing
    • Healthcare
    • Iron and Steel
    • Leather
    • Mining
    • Oil and Gas
    • Pharmaceutical
  • Explore by countries
    • China
    • Dubai / UAE
    • Hong Kong
    • India
    • Indonesia
    • Japan
    • Malaysia
  • Explore cities
    • Bangkok
    • Beijing
    • Chongqing
    • Delhi
    • Dubai
    • Guangzhou
    • Jakarta
    • Kuala Lumpur
  • Why Asia
Simply Invest Asia
Home»Explore industries/sectors»Chemical & Fertilizer»A quantum-mechanical framework for million-atom scale biological systems
Chemical & Fertilizer

A quantum-mechanical framework for million-atom scale biological systems

By IslaApril 29, 202628 Mins Read
Share
Facebook Twitter Pinterest Threads Bluesky Copy Link


Modifications to the Hartree–Fock algorithm

Unlike density functional theory (DFT), Hartree–Fock is a wavefunction-based method and does not directly operate on the electronic density. Each electronic interaction integral contains products of four wave functions. As the system grows, the amount of involved wave functions increases, and the number of electronic repulsion integrals (ERIs) scales with the fourth power of the system size, making their evaluation one of the main computational bottlenecks of Hartree–Fock calculations. To address this, we use systematic approximations that significantly reduce the computational effort. Both wavefunction-wavefunction35 and density-density interactions are truncated using physically motivated thresholds. Since wave functions decay exponentially with distance, only nearby orbitals contribute significantly to overlaps (see subsection ‘Density cut-offs’). We assess the relevance of a density by calculating the overlap of its two constituent wave functions. For orbital types with negative signs, we use the absolute of the product to capture significant interactions that may otherwise cancel numerically but still produce meaningful contributions, especially to energy gradients, which are important for force calculations. Once densities below the threshold are excluded, the scaling of ERI evaluations drops from quartic to quadratic. To further reduce complexity to linear scaling (order N), we introduce a cut-off for the Coulomb interaction based on inter-density distance (see subsection ‘Coulomb cut-offs’). We use a short Coulomb cut-off of 10 Å—a more detailed discussion of this parametrization is found in a separate section. The distance of two densities is defined as the distance between the centres of the two wavefunction pairs that generate them. Only ERIs with Coulomb distances below the Coulomb cut-off are calculated. For ERIs with distances near the cut-off, we apply a smoothing function to ensure continuity.

An ERI is evaluated if the overlap threshold of both densities is exceeded and their centres lie within the Coulomb cut-off. Additionally, we compute the product of the two density relevance values (see the next subsections), divide it by the distance between their centres and compare it to a final relevance threshold. This avoids evaluating interactions between weakly overlapping densities and/or those separated by large distances. To further accelerate the process, we adopt an efficient and parallelizable ERI algorithm53,54,55 based on the so-called Gaussian lobe functions37,56 (see subsection ‘Basis function fitting’). Unlike conventional approaches57,58,59,60, which use angular prefactors (x, y, z) to construct directional orbitals, our method builds positive and negative lobes from distinct Gaussian components. This removes angular dependencies and simplifies the integral expressions, making them uniform across orbital types. As a result, all ERI calculations can be performed by using the same expression, which enables a simple parallelisation for both ERIs and their derivatives (necessary to calculate atomic forces). We also precompute components of the ERIs which only depend on pairs of wave functions. This reduces the full calculation of the electronic repulsion between two densities consisting of in total of four Gaussian functions to only 6 additions, 8 multiplications, 10 memory accesses, 1 square root and the evaluation of the error function (see below). For most interactions with large separations, the error function can be approximated as unity, further reducing computational cost.

The following subsections detail the generation of the lobe-based basis functions, the calculation of the density relevance, the calculation of the Coulomb interaction relevance and the mathematical expressions for the modified ERI algorithm.

Basis function fitting

In electronic structure calculations, wave functions are commonly modelled as Slater-type orbitals (STOs), which for an s-orbital have the form \({\varphi }_{i}\left({{\boldsymbol{r}}}\right)={A}_{i}\exp \left(-{\alpha }_{i}{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{i}{||}\right)\). To enable efficient integration, STOs are typically approximated by using Gaussian functions of the form \({g}_{i}\left({{\boldsymbol{r}}}\right)={A}_{i}\exp \left(-{\alpha }_{i}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{i}{||}}^{2}\right)\).

Each STO is approximated by a linear combination of \({n}_{G}\) Gaussians as

$$\begin{array}{c}{\varphi }_{i}\left({{\boldsymbol{r}}}\right)={\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{A}_{a}\exp \left(-{\alpha }_{a}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{a}{||}}^{2}\right),\end{array}$$

(1)

This approximation enables an analytic evaluation of key integrals57,58,59,60, relevant for the overlap matrix, the kinetic and nuclei energy as well as the Coulomb and exchange interaction of the electrons in the system. Orbitals with higher angular momenta are typically expressed via Gaussians containing cartesian prefactors (\(x,y,z,{x}^{2},{xy},{xz},\ldots\)). Although these integrals can also be evaluated analytically, their complexity increases due to the recursive nature of the associated expressions57,58,59,60. Furthermore, different angular momentum combinations require distinct equations, reducing parallelisation efficiency.

To avoid this bottleneck, we here adopt Gaussian lobe function expansions37,56, which allow the construction of basis sets without cartesian prefactors. We show in this work that this approach, though currently not widely used, is particularly well-suited to modern computing architectures optimised for high parallelisation. It also enables the evaluation of ERIs using a unified matrix formalism, which is potentially compatible with GPU acceleration61. The algorithms for ERI evaluation are described in the section ‘Electron repulsion integral algorithm’.

For fitting, we start with a Slater-type orbital of the STO-6G type taken from the Basis Set Exchange62. In the case of p-orbitals, we approximate them with 6 Gaussian functions, three placed symmetrically slightly to the left and three to the right of the orbital centre. This corresponds in accuracy to a conventional STO-3G orbital. The fitting is performed on a three-dimensional real space grid. Fit parameters include 3 contraction exponents and 3 coefficients, with the latter multiplied by \(-1\) on one side to model the nodal character of p-orbitals. For the optimisation of the parameters during the fitting process, the trust region algorithm is used. This is done via the function curve_fit from the mathematic Python library scipy63.

The Gaussian lobe function expansion would also work for orbitals of higher angular momenta. For example, d-orbitals can be split into four lobes which would require twice as many Gaussians as used for p-orbitals (more for the special case of the \({{{\rm{d}}}}_{{z}^{2}}-{{\rm{orbital}}}\)). The increased computational effort due to the higher number of Gaussian functions should be comparable to the increased complexity of evaluating integrals over Gaussian functions multiplied with cartesian polynomials of degree two in the conventional approaches.

Density cut-offs

A density \({\rho }_{{ij}}\left({{\boldsymbol{r}}}\right)\) is defined as the product of two wave functions:

$$\begin{array}{c}{\rho }_{{ij}}\left({{\boldsymbol{r}}}\right)={\varphi }_{i}\left({{\boldsymbol{r}}}\right){\varphi }_{j}\left({{\boldsymbol{r}}}\right)={\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{{\sum }_{b=1}^{{{n}_{{{\rm{G}}}}}_{j}}{A}_{a}}A_{b} \exp \left(-{\alpha }_{a}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{a}{||}}^{2}-{\alpha }_{b}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{b}{||}}^{2}\right).\end{array}$$

(2)

To determine whether a given density is relevant for further calculations, we compute a scalar value \(r\left({\rho }_{{ij}}\right)\), given by the integral over the absolute value of all pairwise products of Gaussian functions:

$$\begin{array}{c}r\left({\rho }_{{ij}}\right)=\int d{{\boldsymbol{r}}}{\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{\sum }_{b=1}^{{{n}_{{{\rm{G}}}}}_{j}}\left|{A}_{a}\right|\left|{A}_{b}\right|\exp \left(-{\alpha }_{a}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{a}{||}}^{2}-{\alpha }_{b}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{b}{||}}^{2}\right).\end{array}$$

(3)

This equation is valid for s-type orbitals, where the integrand remains positive. Orbitals with nonzero angular momentum (e.g. p-orbitals) include prefactors \(x,y,z\) which would require using the absolute values \(\left|x\right|,\left|y\right|,\left|z\right|\) to ensure that the integrand is always positive. However, using these absolute values makes an analytic evaluation of the overlap integrals impossible. To resolve this, we developed an approach to represent each lobe of orbitals with a separate set of Gaussians. The previously described lobe-based basis functions (which achieve a nodal plane by summing Gaussians of different signs) are not suitable here, as their absolute-value integrals do not correspond to the absolute value of the full orbital. Instead, we independently fit each lobe of a p-type orbital using Gaussian functions, and then reconstruct the full orbital by combining lobes with different signs and different centres. The representation of a p-type orbital used for the density relevance calculations is given by:

$${\varphi }_{i}\left({{\boldsymbol{r}}}\right)={\sum }_{l=1}^{{{n}_{{{\rm{L}}}}}_{i}}{\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{s}_{l}{A}_{a}$$

$$\begin{array}{c} \cdot \exp \left(-{{\alpha }_{x}}_{i,a}{\left(x-{{x}_{0}}_{i,l,a}\right)}^{2}-{{\alpha }_{y}}_{i,a}{\left(y-{{y}_{0}}_{i,l,a}\right)}^{2}-{{\alpha }_{z}}_{i,a}{\left(z-{{z}_{0}}_{i,l,a}\right)}^{2}\right)\end{array}$$

(4)

where \({s}_{l}\) is the sign of lobe \(l\) and each cartesian direction has an individual contraction factor (e.g. \({{\alpha }_{x}}_{i,a}\)) and offset (e.g. \({{x}_{0}}_{i,l,a}\)) for higher flexibility upon fitting. The fitting process is identical to that described in the section ‘Basis function fitting’.

Finally, to determine if a density \({\rho }_{{ij}}\) is relevant for the calculation, we compute \(r({\rho }_{{ij}})\) and compare it to a threshold value.

Coulomb cut-offs

In the Hartree–Fock formalism, the Coulomb interaction is modelled as the repulsion between two electronic densities. Following the method described above, Coulomb interactions are neglected if the separation between densities exceeds a defined cut-off. However, since densities are not point-like, a suitable distance metric must be established to estimate their effective separation. For this purpose, each density is assigned an origin, calculated as the arithmetic mean of the atomic positions associated with the two basis functions forming that density. The interaction distance between two densities \({\rho }_{{ab}}\) and \({\rho }_{{cd}}\) is defined as:

$$d\left({\rho }_{{ab}},{\rho }_{{cd}}\right)=\left|\left| \frac{{{{\boldsymbol{r}}}}_{a}+{{{\boldsymbol{r}}}}_{b}}{2}-\frac{{{{\boldsymbol{r}}}}_{c}+{{{\boldsymbol{r}}}}_{d}}{2}\right|\right|,$$

(5)

where \({{{\boldsymbol{r}}}}_{a},\ldots ,{{{\boldsymbol{r}}}}_{d}\) are the positions of the atoms involved in the four basis functions \(a,b,c,d\).

A lower and upper cut-off threshold \({r}_{{{{\rm{c}}}}_{{{\rm{l}}}}}\) and \({r}_{{{{\rm{c}}}}_{{{\rm{u}}}}}\) (typically 8 and 10 Å), are introduced to control the inclusion of interactions. For density pairs with distances between these two thresholds, a smooth transition is applied using a cut-off function to avoid discontinuities. The scaling function (cut-off function) used is

$$\begin{array}{c}f\left(x\right)=1+2{x}^{3}-3{x}^{2},\end{array}$$

(6)

which smoothly interpolates between 1 and 0 as \(x\) increases in the interval \(\left[{\mathrm{0,1}}\right]\). The distance interval \([{r}_{{{{\rm{c}}}}_{{{\rm{l}}}}},{r}_{{{{\rm{c}}}}_{{{\rm{u}}}}}]\) is linearly mapped to the domain \(\left[{\mathrm{0,1}}\right]\) of \(f\left(x\right)\).

Electron repulsion integral algorithm

An electron repulsion integral (ERI) describes the interaction between two electronic densities:

$${e}_{{ijkl}}\equiv \left({ij},|,{kl}\right)=\iint d{{\boldsymbol{r}}}d{{{\boldsymbol{r}}}}^{{{{\prime} }}}\frac{{\varphi }_{i}\left({{\boldsymbol{r}}}\right){\varphi }_{j}\left({{\boldsymbol{r}}}\right) \cdot {\varphi }_{k}\left({{{\boldsymbol{r}}}}^{{{{\prime} }}}\right){\varphi }_{l}\left({{{\boldsymbol{r}}}}^{{{{\prime} }}}\right)}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}^{{{{\prime} }}}{||}}$$

$$={\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{\sum }_{b=1}^{{{n}_{{{\rm{G}}}}}_{j}}{\sum }_{c=1}^{{{n}_{{{\rm{G}}}}}_{k}}{\sum }_{d=1}^{{{n}_{{{\rm{G}}}}}_{l}}\iint d{{\boldsymbol{r}}}d{{{\boldsymbol{r}}}}^{{{{\prime} }}}\frac{{A}_{a}{A}_{b}{A}_{c}{A}_{d}}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}^{{{{\prime} }}}{||}}$$

$$\cdot \exp \left(-{\alpha }_{a}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{a}{||}}^{2}-{\alpha }_{b}{{||}{{\boldsymbol{r}}}-{{{\boldsymbol{r}}}}_{b}{||}}^{2}-{\alpha }_{c}{{||}{{{\boldsymbol{r}}}}^{{{{\prime} }}}-{{{\boldsymbol{r}}}}_{c}{||}}^{2}-{\alpha }_{d}{{||}{{{\boldsymbol{r}}}}^{{{{\prime} }}}-{{{\boldsymbol{r}}}}_{d}{||}}^{2}\right).$$

$$\equiv {\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{\sum }_{b=1}^{{{n}_{{{\rm{G}}}}}_{j}}{\sum }_{c=1}^{{{n}_{{{\rm{G}}}}}_{k}}{\sum }_{d=1}^{{{n}_{{{\rm{G}}}}}_{l}}\left({g}_{a}{g}_{b} | {g}_{c}{g}_{d}\right),$$

(7)

where each orbital is expanded in terms of Gaussian functions. Each ERI of the individual Gaussian functions is given by:

$$\left({g}_{a}{g}_{b}|{g}_{c}{g}_{d}\right)={O}_{{g}_{a}{g}_{b}}{O}_{{g}_{c}{g}_{d}}{P}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}{}_{1}F_{1} \left(\frac{1}{2},\frac{3}{2},-{P}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}{d}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}^{2}\right)$$

(8)

with the confluent hypergeometric function \({}_{1}F_{1}\), two overlaps \({O}_{{g}_{a}{g}_{b}}\) and \({O}_{{g}_{c}{g}_{d}}\), a factor \({P}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}\) and a distance \({d}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}\) which are defined as:

$${O}_{{g}_{a}{g}_{b}} := \sqrt{2}{\pi }^{\frac{5}{4}}{A}_{a}{A}_{b}\frac{1}{{\left({\alpha }_{a}+{\alpha }_{b}\right)}^{\frac{3}{2}}} \cdot \exp \left(-\frac{{\alpha }_{a}{\alpha }_{b}}{{\alpha }_{a}+{\alpha }_{b}} || {{{\boldsymbol{r}}}}_{a}-{{{\boldsymbol{r}}}}_{b} ||^{2}\right),$$

(9)

$${P}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}:=\frac{1}{\frac{1}{{\alpha }_{a}+{\alpha }_{b}}+\frac{1}{{\alpha }_{c}+{\alpha }_{d}}},$$

(10)

$${d}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}:= || {{{\boldsymbol{r}}}}_{{ab}}-{{{\boldsymbol{r}}}}_{{cd}} || ,{{{\boldsymbol{r}}}}_{{ab}}=\frac{{\alpha }_{a}{{{\boldsymbol{r}}}}_{a}+{\alpha }_{b}{{{\boldsymbol{r}}}}_{b}}{{\alpha }_{a}+{\alpha }_{b}}.$$

(11)

We define \(x:={P}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}{d}_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}^{2}\) and evaluate the confluent hypergeometric function using the following equation:

$$\begin{array}{c}{}_{1}F_{1}\left(\frac{1}{2},\frac{3}{2},-x\right)=\sqrt{\pi }\frac{{{\rm{erf}}}\left(\sqrt{x}\right)}{2\sqrt{x}},\end{array}$$

(12)

where the error function can be efficiently approximated using:

$$\begin{array}{c}{{\rm{erf}}}\left(x\right)\cong 1-\left({c}_{1}t+{c}_{2}{t}^{2}+{c}_{3}{t}^{3}+{c}_{4}{t}^{4}+{c}_{5}{t}^{5}\right) \cdot \exp \left(-{x}^{2}\right),t=\frac{1}{1+{c}_{0}x}\end{array}$$

(13)

(with coefficients \({c}_{0}=0.3275911\), \({c}_{1}=0.254829592\), \({c}_{2}=0.284496736\), \({c}_{3}=1.421413741\), \({c}_{4}=-1.453152027\) and \({c}_{5}=1.061405429\))64. For large \(x\) the error function \({{\rm{erf}}}\left(x\right)\) rapidly approaches 1, since \(1-{{\rm{erf}}}\left(4\right)\approx 1.5 \cdot {10}^{-8}\) and \(1-{{\rm{erf}}}\left(5\right)\approx 1.5 \cdot {10}^{-12}\), and can be approximated accordingly.

We recall that we evaluate the ERIs of the form

$$\left({ij}|{kl}\right)={\sum }_{a=1}^{{{n}_{{{\rm{G}}}}}_{i}}{\sum }_{b=1}^{{{n}_{{{\rm{G}}}}}_{j}}{\sum }_{c=1}^{{{n}_{{{\rm{G}}}}}_{k}}{\sum }_{d=1}^{{{n}_{{{\rm{G}}}}}_{l}}\left({g}_{a}{g}_{b} | {g}_{c}{g}_{d}\right).$$

(14)

It will only be necessary to compute a small part of all possible combinations \(\left({ij},|,{kl}\right)\), since we use both a density cut-off and a Coulomb cut-off. Therefore, and to reduce redundant computation, the quadruple summation over all Gaussians is reformulated as a double summation over precomputed Gaussian pairs \({g}_{a}{g}_{b}\) and \({g}_{c}{g}_{d}\), for which the quantities \({O}_{{g}_{a}{g}_{b}}\) and \({{{\boldsymbol{r}}}}_{{ab}}\) as well as \({O}_{{g}_{c}{g}_{d}}\) and \({{{\boldsymbol{r}}}}_{{cd}}\), respectively, are stored in lookup tables. This is practical because number of density pairs is considerably smaller than the total number of ERIs. The complete ERI evaluation algorithm can be visualised in terms of a three-dimensional tensor: the first and second axis are the sums described above and the third axis is the list of all relevant combinations \(\left({ij},|,{kl}\right)\). Summing up this tensor along the first two axes yields the full list of ERIs. It must be noted that each integral \(\left({g}_{a}{g}_{b} | {g}_{c}{g}_{d}\right)\) in this tensor is computed by using the same equation. However, the lengths of the first two axes \({{n}_{{{\rm{G}}}}}_{i}{{n}_{{{\rm{G}}}}}_{j}\) and \({{n}_{{{\rm{G}}}}}_{k}{{n}_{{{\rm{G}}}}}_{l}\) are not always the same as the number of used Gaussian functions is not the same for all wave functions.

To account for this, we modify the equation for \(\left({ij}|{kl}\right)\) by setting the limits of the sums to \(\max {{n}_{{{\rm{G}}}}}_{i}{{n}_{{{\rm{G}}}}}_{j}\) and \(\max {{n}_{{{\rm{G}}}}}_{k}{{n}_{{{\rm{G}}}}}_{l}\) which leads to:

$$\left({ij}|{kl}\right)={\sum }_{a,b=1}^{\max {{n}_{{{\rm{G}}}}}_{i}{{n}_{{{\rm{G}}}}}_{j}}{\sum }_{c,d=1}^{\max {{n}_{{{\rm{G}}}}}_{k}{{n}_{{{\rm{G}}}}}_{l}}{\delta }_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}^{{ijkl}}\left({g}_{a}{g}_{b} | {g}_{c}{g}_{d}\right).$$

(15)

where \({\delta }_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}^{{ijkl}}\) is 1 if \({ab}\le {{n}_{{{\rm{G}}}}}_{i}{{n}_{{{\rm{G}}}}}_{j}\) and \({cd}\le {{n}_{{{\rm{G}}}}}_{k}{{n}_{{{\rm{G}}}}}_{l}\)—otherwise it is 0. Therefore, \({\delta }_{{g}_{a}{g}_{b}{g}_{c}{g}_{d}}^{{ijkl}}\) describes if the index combination \({abcd}\) is defined for the set of Gaussian functions \({g}_{a}{g}_{b}{g}_{c}{g}_{d}\) within \(\left({ij},|,{kl}\right)\).

This method allows the evaluation of all \(\left({ij}|{kl}\right)\) with the same equation and is therefore highly parallelizable. Additionally, it saves memory, since no intermediate values are stored (all Gaussian ERIs \(\left({g}_{a}{g}_{b} | {g}_{c}{g}_{d}\right)\) become contracted immediately in each step of the loop over \(({ij}|{kl})\)). Precomputing and storing intermediate values dependent on only one density (\({O}_{{g}_{a}{g}_{b}}\), \({O}_{{g}_{c}{g}_{d}}\), \({{{\boldsymbol{r}}}}_{{ab}}\) and \({{{\boldsymbol{r}}}}_{{cd}}\)) reduce the computational effort as well.

Implementation check

To ensure that the Hartree-Fock implementation runs correctly, especially under the implemented cut-off methods and basis function manipulations, results were benchmarked against the quantum-chemistry package ORCA13,65,66,67,68,69. Specifically, we assessed the agreement between our results and a standard calculation using the STO-3G basis set, focusing on both total and orbital energies. This allowed us to ensure that the programme runs correctly and that lobe function expansion and interaction truncations do not lead to strongly deviating results. As a test system, we selected the molecule beta-carotene (C40H56), which involves 541,089,856 unique ERIs, of which 30,572,290 were considered to have relevant density contributions. Among these, 3,946,243 integrals met both density and Coulomb interaction relevance criteria.

The total electronic energy calculated using our implementation was –1528.547 hartree (Ha), compared with –1528.663 Ha obtained via ORCA. This corresponds to an absolute deviation of 0.115 Ha, or 0.0075%. For the orbital energies, we compared all valence orbitals and the ten lowest virtual orbitals, yielding a root mean square deviation of 7.98 mHa. Using only the Coulomb cut-off leads to a value of 7.57 mHa, whereas only density cut-offs result in an error of 6.07 mHa. No cut-offs at all result in a deviation of 3.23 mHa which is attributed to the modified basis functions in the lobe function expansion which do not match the STO-3G basis exactly and therefore lead to a small error.

In terms of computational performance, ORCA required 192 s for a beta-carotene calculation on an Intel i5 processor. In comparison, our implementation with all cut-offs was completed in 21 s. When disabling the Coulomb cut-off and using a minimal density threshold of 10−10, typical of many Hartree–Fock codes, the runtime increased to 131 s.

Parallel Hartree–Fock on large systems

The density cut-off was set to 10−4 and the Coulomb cut-off to 10 Å with a smooth transition to zero starting at 8 Å. The self-consistent cycle of the Hartree–Fock calculation used a convergence threshold of 10−6 for the RMSD of the density matrices. The STO-3G basis set was split into individual Gaussian functions to allow an ERI evaluation according to the above-mentioned Gaussian lobe function expansion algorithm.

It has to be kept in mind that the short Coulomb cut-off neglects long-range interactions since no PME algorithm or similar methods are used. In the divide-and-conquer approach this is forced to guarantee a subsequent evaluation of the clusters. For the applications of UV/Vis spectra and atomic energies, this truncation only has a small to negligible influence on the results. Other properties are also inaccessible if the divide-and-conquer approach is used: these include the system-wide molecular orbitals such as for example HOMO and LUMO and the band gap. Orthogonality between the divide-and-conquer regions is also not maintained, the approach is no longer variational and there would be no access to MD simulations. Note that for applications up to a few thousand atoms, the formalism runs well without the divide-and-conquer approach which makes the properties mentioned above accessible in this region.

System preparation for the large-scale calculations involved solvation in water using the solvation tool of the programme Visual Molecular Dynamics (VMD)70. The solvated system was then partitioned into subsystems on a 3D mesh. Hartree-Fock calculations were carried out independently for each subsystem. To reconstruct the total electronic density, the density matrix elements were chosen from the subsystem containing both atoms corresponding to the two basis functions. If the basis functions were associated with different subsystems, the mean of the relevant density matrix elements was used.

The total electronic density was then calculated as

$$\begin{array}{c}{\rho }_{{{\rm{total}}}}\left({{\boldsymbol{r}}}\right)={\sum }_{a=1}^{{n}_{{{\rm{bf}}}}}{\sum }_{b=1}^{{n}_{{{\rm{bf}}}}}r\left(a,b\right){P}_{{ab}}{\rho }_{{ab}}\left({{\boldsymbol{r}}}\right),\end{array}$$

(16)

where \({n}_{{{\rm{bf}}}}\) is the total number of basis functions, \(r\left(a,b\right)\) is a relevance determining whether the pair basis functions \(a\) and \(b\) contributes significantly, and \({P}_{{ab}}\) is the corresponding element of the density matrix.

Two methods were used for the interpolation of density matrix elements. One method was to compute the new density element as the mean of the density elements of the clusters that get merged. In the current and more accurate method, which is the one found in the source code, the cross-section between the line connecting the two atoms involved in one density and the border between the two involved clusters is computed. This yields two new lengths—between the cross-section point and the two atoms—which are normalised to a sum of 1 and serve as the prefactors for the density contributions from the two subsystems. The length in one subsystem serves as the prefactor for the density of the other subsystem, for example if one atom is close to the border, then the contribution of the other subsystem, where the atom is further away from the border, will be small. The described atoms are the atoms associated with the two basis functions that each density is composed of.

Spectral calculation parameters

For the quantum spectral calculations, we employed the RT-TDHF formalism, as it scales more favourably for larger systems than the linear-response time-dependent Hartree–Fock (LR-TDHF) approach, which theoretically scales with the sixth power of the system size.

In RT-TDHF the system is excited by an electric field pulse and propagated in time using quantum-mechanical time evolution. During the propagation, the dipole moments are recorded at each time step and the frequency-dependent polarizability and absorption spectra are obtained via discrete Fourier transform of the dipole time series. The systems were propagated over 2000 steps with a time step of 0.25 atomic units (a.u.), corresponding to a total propagation time of 500 a.u. (12.1 fs). The applied electric field pulse had a strength of 10−5 a.u., a standard deviation of 0.2 a.u. and a starting time of 2 a.u. During the Fourier transform, an attenuation factor of 0.01 a.u. was used to supress numerical noise.

Following established practice, the resulting spectra were rescaled71,72 to account for the systematic shifts in excitation energies inherent to time-dependent Hartree–Fock and density functional theory calculations. A rescaling factor of 1.335 was applied, consistent with the benchmark study by Jacquemin et al., who computed TDHF spectra for a wide range of organic dyes72.

For the spectral calculations, the density cut-off was chosen as 10−6 and the Coulomb cut-off as 10 Å. This parametrization lead to results in good agreement with experimental data and in a test showed only small deviations from a calculation using no cut-offs. This test and calculations with further reduction of the density or Coulomb cut-off are shown in Supplementary Fig. 11. A density cut-off of 10-4 showed similar results to a value of 10-6 in this test but lead to some absorption peaks vanishing or decreasing in intensity in other tests with benzene and DNA, therefore we chose a cut-off of 10−6 as the standard parametrization. For the Coulomb cut-off 8 Å also lead to similar results to 10 Å—we nonetheless use 10 Å as the default Coulomb cut-off to have some buffer as the stability might differ from system and system and to not discard too many mid-range interactions.

Atomic energy processing

The atomic energies used for the comparison with AlphaFold’s pLDDT scores were smoothed using a Savitzky-Golay filter73 with a window length of 150. To match the scale of pLDDT values, the computed energies were multiplied by the factor of 15 and shifted by a structure-specific offset (–430, –440 and –460 for PDB entries AF_AFA0A023FF81F147, PDB AF_AFA0A023IKK2F147 and AF_AFA0A023GPK8F147, respectively). The rescaled energies were truncated at the corresponding structure’s maximum and minimum pLDDT values to eliminate extreme atomic energy outliers.

A modified expression for atomic energies was employed, defined as

$$\begin{array}{c}{E}_{a}=\frac{1}{{Z}_{a}}{\sum }_{i=1}^{{{n}_{{{\rm{bfv}}}}}_{a}}{\sum }_{j=1}^{{n}_{{{\rm{bf}}}}}{P}_{{ij}}\left({H}_{{ij}}+\frac{1}{2}{G}_{{ij}}\right),\end{array}$$

(17)

where \({E}_{a}\) is the atomic energy of atom \(a\), \({Z}_{a}\) is the atomic charge (used for normalisation), \({{n}_{{{\rm{bfv}}}}}_{a}\) is the number valence basis functions for atom \(a\) (excluding core contributions), and \({n}_{{{\rm{bf}}}}\) is the total number of basis functions. \({P}_{{ij}}\), \({H}_{{ij}}\) and \({G}_{{ij}}\) denote the density matrix, core Hamiltonian and two-electron interaction matrix elements, respectively.

System preparation was based on the aforementioned PDB entries47, with solvation performed using a 10 Å water shell generated via the solvation tool of Chimera74.

The actual and predicted pLDDT scores for the three investigated structures are shown as graphs in Supplementary Fig. 12.

The accuracy of subsystem splitting was tested using the protein Evasin P1126 (PDB ID: AF_AFA0A023FF81F1)47. For atomic energy calculations, only valence electrons were considered. Subsystems were constructed by segmenting the amino acid chain of the whole protein into smaller peptide fragments. Comparing a full Hartree-Fock calculation with the subsystem-based approach, we obtained a root mean square deviation of 0.385 Ha for the atomic energies, corresponding to a relative error of 0.444% and a Pearson correlation coefficient of 0.9986. This is shown in Supplementary Fig. 13.

Programme implementation

The described programme is written in the Python programming language75. To achieve high computational performance, all computational demanding functions are written either in matrix form or compiled to machine code with just-in-time (JIT) compilation. For JIT compilation, the library Numba76 was used, and matrix operations were performed with Numpy77 or PyTorch78. Visualisation of results and plotting were performed using the library Matplotlib79.

Biomolecular structures and visualisation

All calculations in this article were based on biomolecular structures obtained from the RCSB Protein Data Bank (PDB)47. The structures and their full references (including original research articles and PDB entries) are as follows: PDB 2M3X33 (12-bladed propeller protein), PDB 1H6836,80 (rhodopsin), PDB 6Q3G1,81 (bacteriophage P68), PDB 1A7Y5,82 (Actinomycin D), PDB 1D793,83 (DNA with 8 base pairs), PDB 2JYK4,84 (DNA with 21 base pairs), PDB 1MNV6,85 (Actinomycin D bound to DNA), PDB 8WL238,86 (flagellar motor), PDB 7PKR39,87 (vault protein complex). Structures used for comparing AlphaFold’s pLDDT scores to atomic energies include: PDB AF_AFA0A023FF81F1 (Evasin P1126), PDB AF_AFA0A023IKK2F1 (MHC Class II Beta Chain) and PDB AF_AFA0A023GPK8F1 (Friend of Echinoid, Isoform H). These were obtained from the Computed Structure Models (CSM) section of the RCSB PDB47.

Visualisations of biomolecular structures were created using ChimeraX34,74,88,89 and the PDB Mol* Viewer90 for Supplementary Figure 10. All other figure elements (including biomolecular structures at atomic resolution, electronic densities, absorption spectra and atomic energies) were generated with Matplotlib79 for python.

Further testing with molecular dynamics

As a test system to study structural optimisation and molecular dynamics, we used a DNA tetramer (sequence ACGT). We calculated the forces for this structure with and without using the Coulomb cut-off. The parametrization of the Coulomb cut-off is the same for the electron repulsion integrals (ERIs) and for the ion-ion interactions. The result is shown in Supplementary Fig. 14.

In addition to this, a structural optimisation was performed for the molecule. For this the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm was used—a quasi-Newton method which uses an approximation of the Hessian and which is commonly used for structural optimisation or for nonlinear optimisation problems in general. Compared to gradient descent methods which do not precondition the gradient vector, the BFGS algorithm usually shows much faster convergence. However, a convergence to zero or values very close to zero is usually not achieved for larger molecules16 since this would also require to perform a structural optimisation at a larger scale. Therefore, the root-mean-square (RMS) of the forces is often improved by one or two orders of magnitude until the algorithm plateaus.

The development of the RMS of the forces during a BFGS optimisation of the ACGT molecule is shown in Supplementary Fig. 15 for two calculations without and with Coulomb cut-off (cut-off values at 3.5 and 10 Angstrom), respectively. The calculation was run over 30 optimisation steps. In each iteration, the optimal descent prefactor was determined using a line search utilising the golden section search algorithm which avoids re-evaluations as much as possible. During the line search, only the energy is needed and not the forces and therefore the programme presented in this work was run in energy-only mode to save computation time. In Supplementary Fig. 15 we observe convergence for the calculations with and without cut-off. Both converge in a very similar way and show oscillatory convergence at the end. The minimum RMS is achieved at step 28 with a value of around 0.0009 for both calculations.

In Supplementary Fig. 16 the atomic coordinates from before and after the structure optimisation are shown as an overlay. Additionally, at the bottom, the final coordinates from the structural optimisation with Coulomb cut-off are overlayed with the final coordinates from the optimisation without Coulomb cut-off. Here, we observe that the structural optimisation does have an influence on the geometry of the system (see top panel of Supplementary Fig. 16), as it is expected. Furthermore, the final geometry is not significantly influenced by the Coulomb cut-off as both calculations lead to a very similar result. The root-mean-square displacement between the two calculations is computed as 0.047 Angstrom.

It should also be mentioned that forces that deviate from the forces without cut-offs (both density and Coulomb cut-off) still are the derivative of a (now slightly changed) energy landscape. Nonetheless, the sum of the forces still remains zero as they are the exact derivative of the energy. This has been tested by ensuring that the forces obtained from the programme can be reproduced numerically as the limit case of the finite-differences method. Testing has been done for the forces as a sum of derivatives as well as for the derivatives of the integrals itself (overlap, kinetic, nuclei and electron repulsion integrals).

We continue the testing by investigating the stability of molecular dynamics simulations with and without the Coulomb cut-off. To do so, we performed an MD simulation of the same system that was used above for the investigation of the structural optimisation process. The simulation runs over 200 time steps with a step interval of 0.2 fs.

It has to be mentioned that a slight drift in the total energy can be seen with a value of 6.0e-7 Ha per atom and per femtosecond, this is caused by some of the starting forces being very large and is not an error which occurs due to cut-offs as it happens in the calculation without cut-offs as well. The MD simulation was done with large forces to test the performance of the cut-offs for more extreme situations beyond the equilibrium—for this reason, we also chose to work with the rather drastic cut-off scheme 3.5/10. However, to ensure that drift of the total energy is indeed caused only by the integrator, a simulation was done with a smaller time step for a smaller system (DNA with just one base pair, cytosine) with the Coulomb cut-off used. In this case it could be observed that the changes of the total energy can be reduced by decreasing the time step of the molecular dynamics simulation. To be precise, reducing the time step ten-fold leads to an energy conservation error of 8.0e-8 Ha per atom and per femtosecond. For an MD simulation with 10 steps of geometry optimisation prior to its start, a time step of 0.5 fs can be used with an energy conservation error of 2.9e-9 Ha/atom/fs.

In total, we observe very little difference between the kinetics in the two MD simulations with and without cut-off used. Of course, this is not a full test, therefore we also investigate the actual trajectories. However, from Supplementary Fig. 17 it can be seen that the low Coulomb cut-off does not lead to numerical instabilities or to larger forces than in the simulation with cut-off.

Next, we consider the trajectories obtained in the two simulations. We computed the RMSD between the coordinates of the two simulations at each time step. This is shown in Supplementary Fig. 18 below as the green line. Additionally, the mean travelled distance of the atoms in the simulation was computed for both simulations is also displayed in Supplementary Fig. 18.

Here, we see a difference in the trajectories of the two MD simulations but it is small compared to the total movement of the atoms. The maximum RMSD is 0.091 Angstrom. In Supplementary Fig. 19 the atomic coordinates from the start and the end of the MD simulation with no cut-off are overlayed and a second overlay is shown where the final MD coordinates of the simulation with no cut-off are compared with the final coordinates of the MD simulation with the Coulomb cut-off.

We observe again that the changes due to the cut-off are small compared to the total movement in the MD simulation. Note that here a quite aggressive cut-off configuration (3.5 and 10 Angstrom for the lower and upper cut-off) is used which is not recommended due the very small value of 3.5 Angstrom for the lower cut-off. In the following paragraph we discuss the reason for the large deviation between the lower and upper cut-off and why a higher cut-off value is recommended for force calculations.

The choice of the lower and upper cut-off plays an important role, specifically for force calculations. For the calculations that only compute the energy of the system, the difference between the two cut-offs only has the purpose to smoothen the transition to zero. However, if the energy gradient is computed, a fast transition from the actual value of the Coulomb interaction to zero can lead to very large derivatives. This can lead to the derivative between the two cut-off values being larger that the derivative for distances below the lower cut-off value. Such a behaviour has to be avoided to maintain a smooth transition to zero for the forces as well. This is achieved by increasing the distance between the values of the two Coulomb cut-offs, ideally the upper Coulomb cut-off should be at least 2.5 times larger than the lower cut-off to ensure a smooth transition to zero for the forces. For this reason, a cut-off configuration of 8/20 is more recommended which was tested at the beginning of this section for the force comparisons.

To investigate the effect of the divide-and-conquer method on atomic forces, we performed the same test as for the Coulomb cut-off where the forces are computed with and without the approximation used. Here, we study a larger system in the form of the DNA 12-mer ACGTACGTACGT which will be split into three separate sections during the divide-and-conquer formalism. We compute the forces using the Coulomb cut-off configuration (3.5/10) used for the investigation of the effects of the Coulomb cut-off.

Two different configurations are tested, one with a buffer zone of 8 and one with 10 Angstrom. Neighbouring atoms to the buffer zone and hydrogen atoms for bond termination have been added as described in the respective section in the main text.

From Supplementary Fig. 20 we can see that the errors of the forces are quite low, even for the smaller buffer zone at 8 Angstrom. One explanation for this is that the Coulomb cut-off already truncated a lot of the long-range interaction but nonetheless this test gives the information that no significant additional errors occur on top of those caused by the Coulomb cut-off. Furthermore, we see that the distribution of the electron density in the core region is not significantly influenced by the parts not included in the buffer zone because otherwise stronger deviations in the forces would be expected. Another explanation for the low error is the adding of neighbouring atoms to avoid breaking bonds and the replacement of heavier atoms with hydrogen atoms at the section borders which help to construct physically sensible boundary conditions, potentially reducing errors further.

For investigating interactions between molecules, we consider the binding energies between two molecules and two proteins as two test systems. To be precise, the two systems studied are the vitamin biotin (vitamin H) bound to the protein avidin (PDB 1AVD)91,92 and the drug FK506 bound to the protein FKBP (PDB 1FKJ).93,94 Both of these systems have high binding energies of –20.4 and –12.8 kcal/mol, respectively.95,96 Since the computation of binding energies can be considered an area where high accuracies are necessary, it serves as a test if minimal-basis Hartree-Fock can still provide somewhat reliable results.

For this test we perform a single-point energy calculation using Hartree-Fock and try out different cut-off configurations, with no divide-and-conquer used. Since the whole protein with ligand is too large to study without cut-offs, we focus on the area around the ligand. Therefore, the system used in the calculations consists of the ligand and the environment within 4 Angstrom around the ligand. Additionally, bond cutting is accounted for by adding further atoms and replacing some atoms with hydrogen atoms to terminate bonds. The exact procedure is described in the corresponding section in the main text and the resulting systems as well as the complete protein-ligand complex are shown in Supplementary Fig. 21.

For the two systems discussed above, the binding energy is computed as the energy difference between the energy of the protein-ligand complex and the sum of energies of the protein and the ligand (with ‘protein’ we here refer to the part of the protein treated in the calculation, as described above). In Supplementary Table 1 the resulting energies are listed for different combinations of density and Coulomb cut-offs.

Here, we see limitations due to the use of cut-offs. Especially for the Coulomb cut-off configurations 3.5/10 and 8/10, the binding energy shows strong deviations from the results with no Coulomb cut-off—in some cases no binding is predicted. A reduced density cut-off of 1.0e-5 also show a deviation for the biotin/avidin complex. A higher Coulomb cut-off configuration (15/20) shows less strong deviation from the binding energy without a Coulomb cut-off. Lowering the density cut-off to values of 10−7 or 10−8 has little influence on the computed binding energy. However, for the calculations with the highest accuracy, the experimental binding energies can be reproduced with errors of 2.3 and 1.2 kcal/mol.

These results show that using low Coulomb cut-offs makes it impossible to use the formalism for applications which need a higher accuracy. In these cases, it is necessary to use no (or a much larger) Coulomb cut-off. The density cut-off at 10−6 appears to be a reasonable choice which should not compromise the accuracy significantly. Consequently, testing is required if a Coulomb cut-off or a higher density cut-off is used for applications beyond those studied in this work to ensure that the accuracy is sufficient for the investigated problems. If no Coulomb cut-off is used and a density cut-off smaller than 10−6 is employed, we observe that we can reproduce the experimental binding energies for the two studied systems within a reasonable error.



Source link

Related Posts

Energy Storage Chemicals Market in Germany | Report – IndexBox

April 30, 2026

Discovery renews confirmation that Mars had the “chemistry for life”

April 30, 2026

Syria arrests Assad-era general accused of involvement in chemical massacre

April 30, 2026
Add A Comment
Leave A Reply Cancel Reply

Top Posts

Abandoned malls, whispers of nuclear war and young foreigners detained. This is what’s REALLY going on in Dubai… and the chilling warning one taxi driver gave to the Mail’s IAN BIRRELL

April 11, 2026

Chongqing Aims To Build Hub Role

April 15, 2026

US trade chief says tech restrictions to block Chinese autos

April 10, 2026
Don't Miss

S’porean man found dead at residential unit in Hong Kong after failing to report for work

By IslaMay 1, 2026

48-year-old Singaporean man found dead in Hong Kong had history of health issues A 48-year-old…

Dubai’s glitzy restaurants forced to change menus as Iran war hits trade

May 1, 2026

Juanjo Duran, Global Head of Media and Entertainment Content Partnerships at Google

May 1, 2026

Brighton fighter Louie Doherty eyes Bangkok title shot

May 1, 2026
SUBSCRIBE TO OUR NEWSLETTER

Get our latest downloads and information first. Complete the form below to subscribe to our weekly newsletter.


I consent to being contacted via telephone and/or email and I consent to my data being stored in accordance with European GDPR regulations and agree to the terms of use and privacy policy.

Stay In Touch
  • Facebook
  • YouTube
  • TikTok
  • WhatsApp
  • Twitter
  • Instagram
Top Trending

China voices concerns over ‘discriminatory’ EU draft laws – World

By IslaMay 1, 2026

Annual meeting reaffirms strength of Australia-Indonesia livex partnership

By IslaMay 1, 2026

Chelsea vs AC Milan Set for Jakarta Showdown on August 8

By IslaMay 1, 2026
Most Popular

Media & Entertainment CEO Pay Surged 117% In 2025

April 28, 2026

DSM Fresh Foods Appoints Rajneesh Bhasin as Independent Director to Strengthen Governance – scanx.trade

April 28, 2026

Chongqing Tongliang Long vs Shenzhen Peng City

April 17, 2026
Our Picks

HKSAR’s challenges warrant mission-oriented governance

April 21, 2026

‘Study in China’ trend facilitates mutual learning among civilizations

April 26, 2026

UAE oil firm chief says Hormuz still shut, Iran must open it without conditions

April 9, 2026
SUBSCRIBE TO OUR NEWSLETTER

Get our latest downloads and information first. Complete the form below to subscribe to our weekly newsletter.


I consent to being contacted via telephone and/or email and I consent to my data being stored in accordance with European GDPR regulations and agree to the terms of use and privacy policy.

© 2026 Simply Invest Asia.
  • Get In Touch
  • Cookie Policy
  • Privacy policy
  • Terms & Conditions

Type above and press Enter to search. Press Esc to cancel.

SUBSCRIBE TO OUR NEWSLETTER

Get our latest downloads and information first.

Complete the form below to subscribe to our weekly newsletter.


I consent to being contacted via telephone and/or email and I consent to my data being stored in accordance with European GDPR regulations and agree to the terms of use and privacy policy.