Investigations of Coarse-Grained Representations

Kidder, Katie

Open Access

Author:: Kidder, Katie
Graduate Program:: Chemistry (PHD)
Degree:: Doctor of Philosophy
Document Type:: Dissertation
Date of Defense:: February 09, 2024
Committee Members:: William Noid, Chair & Dissertation Advisor
Lasse Jensen, Major Field Member
Mark Maroncelli, Major Field Member
Jorge Sofo, Outside Unit & Field Member
Philip Bevilacqua, Program Head/Chair
Keywords:: Coarse-graining
Information loss
Representations
Abstract:: Coarse-graining enables simulating longer length and time scales than traditional all-atom models. This is achieved by reducing the number of degrees of freedom in the model. Defining a coarse-grained (CG) model requires defining a reduced-resolution representation. This choice of CG representation reflects a level of detail, or resolution, and a method for relating the high- and low-resolution degrees of freedom. CG representations are usually chosen by physical intuition. However, one expects that the choice of representation will impact the resulting model, and that a poor choice will yield a poor CG model. Recently, there has been increased interest in developing methods for finding optimal CG representations. In contrast, we wish to investigate mapping space, i.e., the set of all possible CG representations of a given molecule for a given number of CG sites. This investigation will allow us to answer more general questions about CG representations. We will first determine what metrics identify good CG representations, and how these different metrics relate to one another. We will investigate whether or not it is easy to find good representations. In addition, we wish to determine how common good maps are within mapping space. We then aim to quantify the similarity of two CG representations. Based on this similarity, we wish to understand two related questions. First, how different are the models resulting from very similar CG representations? Second, how similar are two different "good" representations? Additionally, we will investigate whether there is a qualitative distinction between good and poor maps. Finally, we wish to understand how the variance in the distribution of site sizes impacts the CG model. For most of the results presented in this thesis, we will adopt the Gaussian Network Model (GNM) as our underlying fine-grained model. 
We can perform the coarse-graining analytically for this model, meaning we can exactly assess the quality of a CG representation. Due to the large number of possible representations for large molecules, mapping space cannot be exhaustively enumerated, and so we perform Monte Carlo sampling. To perform this sampling, we define two move sets that generate new representations. We first consider a swap-based move set, which swaps atoms between CG sites. This move set samples a canonical ensemble in which all CG sites correspond to the same number of atoms. Second, we consider a steal-based move set, in which atoms are stolen from one site by another. This move set samples a semi-grand canonical ensemble, which relaxes the restriction on the number of atoms per site. Additionally, we have proven that the steal-based move set is ergodic. We then perform both equilibrium and biased simulations to sample the entirety of mapping space. From these simulations we calculate the density of states as a function of the metrics which quantify the CG representations. One of these metrics is the spectral fitness, Q, which quantifies how well the low-frequency normal modes are preserved by the CG model. A second is the information content, I, which quantifies how much information is preserved by the CG model. We observe that maps which maximize Q match physical intuition. Based on the density of states, we find that these "good," high Q, representations are rare. In contrast, maps which maximize I appear poor by physical intuition. The steal move set allows us to define "neighbors" of a given map, which are a single move apart. We find that maps have a similar Q value to their neighbors. We find that it is easy to find a good map using steepest descent on Q. Additionally, we observe that a phase transition in mapping space appears below a certain number of CG sites. 
We observe an unexpected correlation between how uneven the assignment of atoms to CG sites is and the amount of information lost upon coarse-graining, i.e., the mapping entropy. We then define a coordinate transformation with a non-unit Jacobian that is related to the unevenness of the CG site sizes in the given CG representation. This Jacobian can be related to a term in the mapping entropy, which helps us understand the relationship between the distribution of site sizes and the mapping entropy. Finally, we apply our techniques to more complex models to understand how our insights generalize.
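
The swap and steal move sets described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes a CG representation is stored as a list giving the site index of each atom, and the function names (`swap_move`, `steal_move`, `site_sizes`) are hypothetical. It also assumes that a site may not be emptied by a steal move, and it omits any connectivity restriction and the Monte Carlo acceptance step, neither of which the abstract specifies.

```python
import random


def site_sizes(assignment, n_sites):
    """Count how many atoms are assigned to each CG site."""
    sizes = [0] * n_sites
    for site in assignment:
        sizes[site] += 1
    return sizes


def swap_move(assignment, rng):
    """Swap the site labels of two randomly chosen atoms.

    Site sizes are unchanged, so every proposal stays in the
    fixed-size ("canonical") part of mapping space.
    """
    new = list(assignment)
    i, j = rng.sample(range(len(new)), 2)
    new[i], new[j] = new[j], new[i]  # a no-op if both atoms share a site
    return new


def steal_move(assignment, n_sites, rng):
    """Reassign one randomly chosen atom to a different site.

    Site sizes may change. Assumption (not from the source): a move
    that would leave the donor site empty is rejected, and the
    unchanged map is returned. Requires n_sites >= 2.
    """
    new = list(assignment)
    i = rng.randrange(len(new))
    donor = new[i]
    if new.count(donor) == 1:
        return new  # rejected: donor site would become empty
    receiver = rng.choice([s for s in range(n_sites) if s != donor])
    new[i] = receiver
    return new


if __name__ == "__main__":
    rng = random.Random(0)
    cg_map = [0, 0, 1, 1, 2, 2]  # six atoms mapped onto three sites
    print(site_sizes(swap_move(cg_map, rng), 3))
    print(site_sizes(steal_move(cg_map, 3, rng), 3))
```

In this sketch, two maps are neighbors in the sense used above when one `steal_move` converts one into the other, that is, when their assignment lists differ at exactly one atom.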
