Φ (phi)-value analysis

Φ-value analysis is a method for understanding the effect of a mutation in a protein upon the free energy of the individual native, denatured and transition states.

I.  The free energy diagram for two-state protein folding

A simple two-state, reversible, protein folding process can be represented as:

N ⇌ D

where N is the native (folded) state and D is the denatured (unfolded) state.

The following diagram represents the free energy of the native and denatured forms of a protein under conditions where the native state is favored (e.g. 0 M denaturant, physiological pH, room temperature):

From the above diagram we can conclude the following:

• The native state (N) has a lower free energy than the denatured state (D) (in fact, the native state appears to be the global energy minimum). The system will spontaneously adopt an equilibrium that favors the native state.
• The free energy difference between the N and D states (ΔGD-N) is a measure of the stability of the protein
• In this case, ΔGN→D is defined as the free energy change in going from state 1 (N) to state 2 (D).
• "Delta" values are always defined as (value of state 2 - value of state 1), thus ΔGN→D = (GD - GN), also written ΔGD-N
• From the above diagram this value should be positive
• A positive value indicates a non-spontaneous process as written (i.e. going to the right). Thus, N ⇌ D is non-spontaneous in the right-hand direction, but spontaneous in the reverse direction (thus, the above diagram reflects a situation where the equilibrium favors the native state of the protein)

II.  Rates of folding and unfolding and the free energy diagram

The folding transition state, ‡ (which is actually likely to represent an ensemble of structures), describes the energy barrier between the N and D states, and determines the rate at which the N state converts to D (unfolding process) and the rate at which the D state converts to N (folding process).

• The energy barrier for unfolding is proportional to the height between the N and ‡ states (N→‡)
• The energy barrier for folding is proportional to the height between the D and ‡ states (D→‡)
• The rate at which the system can change states is inversely related to the height of the energy barrier (i.e. the larger the energy barrier, the fewer molecules in the sample that will have the necessary energy to overcome the barrier, and therefore, the slower the overall rate of change of state; the rate constant falls off exponentially as the barrier increases)

In the above diagram, the N→‡ energy barrier (ΔG‡-N) is larger than the D→‡ energy barrier (ΔG‡-D), thus:

• The rate of unfolding should be slower than the rate of folding
• This situation results in an equilibrium condition that will favor the N state (i.e. the above diagram is representative of a protein that is stably folded)

The rates of folding and unfolding are a function of the rate constants for folding (kf) and unfolding (ku), thus, we have an overall diagram that looks like this:

• As stated above, the rate at which the system can change states is inversely related to the height of the energy barrier. In other words, if ΔG‡-N (i.e. the energy barrier to unfolding) is large, then the corresponding unfolding rate constant ku is small; likewise, if ΔG‡-D (i.e. the energy barrier to folding) is large, then the corresponding folding rate constant kf is small. More precisely, each barrier depends on the logarithm of its rate constant (ΔG‡-N = -RT ln(ku) + constant, and ΔG‡-D = -RT ln(kf) + constant).

III. Equilibrium denaturation methods

Equilibrium denaturation experiments report the extent of denaturation as a function of added denaturant (isothermal equilibrium denaturation by guanidine or urea), or added heat (differential scanning calorimetry).

• Such experiments provide information on the equilibrium constant for denaturation, and therefore, ΔGD-N:

ΔG° = -RT ln(Keq)

• These experiments assume the system is always at equilibrium (e.g. samples are allowed to come to equilibrium prior to measurement), thus, they provide no information on the folding or unfolding kinetics (i.e. folding or unfolding rate constants)
• Thus, using equilibrium methods alone, we cannot say what the effects of mutations are upon the folding or unfolding kinetic properties
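
As a small numerical illustration of the relation ΔG° = -RT ln(Keq), the following Python sketch converts between an unfolding equilibrium constant and the unfolding free energy, and reports the fraction of native protein for a two-state system. The function names and the example Keq are illustrative assumptions, not values taken from any data set discussed here.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def delta_g_unfolding(keq, temp_k=298.15):
    """Unfolding free energy (J/mol) from Keq = [D]/[N]."""
    return -R * temp_k * math.log(keq)

def keq_from_delta_g(delta_g, temp_k=298.15):
    """Inverse relation: Keq from the unfolding free energy (J/mol)."""
    return math.exp(-delta_g / (R * temp_k))

def fraction_native(keq):
    """Two-state assumption: fN = [N]/([N] + [D]) = 1/(1 + Keq)."""
    return 1.0 / (1.0 + keq)

# Hypothetical example: one denatured molecule per 10,000 native molecules
example_keq = 1e-4
example_dg = delta_g_unfolding(example_keq)
print("dG(D-N) in kJ/mol:", example_dg / 1000.0)
print("fraction native:", fraction_native(example_keq))
```

A Keq below 1 gives a positive ΔGD-N, which is the "native state favored" situation drawn in the free energy diagram of section I.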

Example I: The mutant protein free energy diagram is shown as the green broken line, and the wild-type reference protein energy diagram as the black line. The free energies of the native (N), denatured (D) and transition (‡) states are shown, with the mutant indicated by an asterisk.

• The mutation has affected the transition state (‡). Specifically, it has stabilized the transition state and has had no effect upon either the native or denatured states
• In stabilizing (i.e. lowering the free energy of) the transition state, the mutation will result in an increase in both the rate of unfolding and folding
• However, since the free energy levels of the native and denatured states are unchanged, the overall ΔGD-N value is unchanged.
• Thus, equilibrium denaturation methodologies would report no difference between the mutant and wild type proteins, but, kinetic experiments would clearly indicate that the mutation has altered the rates of folding and unfolding.

Φ-value analysis compares the free energy data from equilibrium denaturation methodologies to free energy values derived from kinetic studies, and this comparison allows a determination of how a mutation has affected the free energy of the native, denatured or transition states of the protein.

IV. Probing the structure of the transition state

Example II:

A mutation (indicated by an asterisk *) that does not affect the denatured state, or the transition state, but destabilizes the native state:

In this example:

·       The values of the folding rate constants, kf and k*f, for wild-type and mutant are observed to be equal; therefore, ΔG‡-D and ΔG*‡-D are identical in value, and ΔΔG‡-D = 0 (i.e. ΔG‡-D - ΔG*‡-D = 0)

Note: it may seem that when you make a mutation that the mutant should be considered the "new state" (i.e. state 2) in comparison to the wild type "original state"  (i.e. state 1). Thus, any delta values relating a mutant to wild type should be of the form: (mutant value - wild type value).  However, there is no strict adherence to this frame of reference (even though it’s a “delta” value), and effects of mutations are commonly calculated by subtracting mutant values from the wild type.  The key thing is that you explicitly state how you are calculating the values for the mutant in terms of the wild type protein when you report the relevant delta values (and the resultant meaning of negative vs positive values).

·       The unfolding rate constants are different between mutant and wild-type (faster for the mutant). Thus, the ΔG‡-N values are different and the value of ΔΔG‡-N (i.e. ΔG‡-N - ΔG*‡-N) is non-zero (positive in this case).

·       The value of ΔΔG‡-N can be determined from the wild type and mutant unfolding rate constants:

ΔΔG‡-N = -RT ln(ku/k*u)

(Note: ΔΔG‡-N is also referred to as ΔΔGunfolding or ΔΔGu)

·       The ΔΔGD-N value for the mutant (i.e. the effect of the mutation upon stability) is determined experimentally using isothermal equilibrium denaturation data (at the same temperature as the kinetic studies, or using DSC data with the ΔΔG value determined by extrapolation of individual ΔG values to the temperature used for the kinetic experiments).

• Note that in the above case, the ΔΔGD-N value is equal to the value of ΔΔG‡-N. In other words, if you make a mutation that affects the stability of the protein, and if this effect is characterized by changes exclusively upon the unfolding rate constants, then it looks as if the mutation has affected exclusively the native state.

If ΔΔGD-N = ΔΔG‡-N then it means that the energetic changes between the wild type and mutant native states account for the entire energetic difference observed in the equilibrium stability study (and we conclude that the mutation has affected the native state exclusively)

• If a mutation affects the native state and transition states equally, then it is assumed that the mutation site is as folded in the transition state as it is in the native state (i.e. the mutation site adopts the native configuration in the transition state)
• If a mutation affects the denatured state and transition states equally, then it is assumed that the mutation site is as unfolded in the transition state as it is in the denatured state (i.e. the mutation site adopts the denatured configuration in the transition state)
• In the above example, the transition state and the denatured state are unaffected by the mutation - in other words, the mutation has affected these states equally - thus, we would conclude that the site of mutation is as unfolded in the transition state as it is in the denatured state

• In this case, the perturbation of the mutation upon the denatured state is equivalent to the perturbation of the transition state.  Thus the site of mutation is unfolded in the transition state; it does not form part of the critical folding nucleus (i.e. folding transition state).

NOTE: there is potential for ambiguity in the energy diagram above. For example, the following two energy diagrams would yield exactly the same kinetics and equilibrium thermodynamics:

In the first diagram the D and D* states are assumed to be energetically equivalent, whereas in the second diagram the N and N* states are assumed to be energetically equivalent. Note however that in both diagrams the various thermodynamic parameters are identical. Thus, we cannot state with confidence the absolute energy levels; but what we can say with confidence is whether the ‡ state energy is moving coordinately with either the N or D state. In the above case, the ‡ state energy is moving coordinately with the D state energy (and the site of mutation is considered to be as unfolded in the transition state as it is in the D state).

Example III:

In this example:

·       The values of the unfolding rate constants, ku and k*u, for wild-type and mutant are observed to be equal; therefore, the values of ΔG‡-N and ΔG*‡-N are identical and ΔΔG‡-N = 0

·       The folding rate constants are different between mutant and wild-type (faster for the wild-type). Thus, the ΔG‡-D values are different and the value of ΔΔG‡-D is negative. (Note: ΔΔG‡-D is also referred to as ΔΔGfolding or ΔΔGf)

• Note that in the above case, the ΔΔGD-N value is equal in magnitude to the value of ΔΔG‡-D (the signs are opposite under the wild type minus mutant convention, because ΔGD-N = ΔG‡-N - ΔG‡-D). In other words, if you make a mutation that affects the stability of the protein, and if this effect is characterized by changes only upon the folding rate constants, then it looks as if the mutation has affected exclusively the denatured state.

If ΔΔGD-N and ΔΔG‡-D are equal in magnitude, then it means that the energetic changes between the wild type and mutant denatured states account for the entire energetic difference observed in the equilibrium stability study (and we conclude that the mutation has affected the denatured state exclusively)

• For this example, the perturbation of the mutation on the transition state is equivalent to the perturbation upon the native state.  Therefore, the site of mutation is as folded in the transition state as it is in the native state; and this position forms part of the critical folding nucleus.

NOTE: the same relative energy ambiguity exists with this example also. The following two energy diagrams are indistinguishable in terms of thermodynamics and folding/unfolding kinetics:

In the first image the N and N* states are assumed to be energetically equivalent. In the second image the D and D* states are assumed to be energetically equivalent. However, notice again that all thermodynamic and kinetic parameters are unchanged. Thus, the only conclusion that can be stated with confidence as regards energy levels is that the transition state and N states move coordinately. Thus, the site of mutation is as folded in the transition state as it is in the N state (and forms part of the critical folding nucleus).

V. Folding and unfolding kinetic data and the "chevron plot" model

The folding and unfolding kinetic constants are determined experimentally by either stopped-flow or manual mixing techniques.

To determine folding kinetic constants the protein sample is initially denatured by dilution (or dialysis) into a high concentration of denaturant (e.g. 7.0 M GuHCl). This sample is then rapidly mixed with buffer having no denaturant, upon which the protein begins to refold. Refolding is typically rapid and so is followed in a stopped-flow instrument (monitoring some spectroscopic probe of folding, such as fluorescence or circular dichroism).

To determine unfolding kinetic constants the protein is diluted or dialyzed into native buffer (i.e. buffer containing no denaturant). It is then mixed with a buffer containing a high concentration of denaturant, and the protein begins to unfold. The rate is typically slower than folding, and so manual mixing methods typically suffice.

Folding and unfolding kinetic traces are typically (but not always) fit to a single exponential function:

The above image is an example of a stopped-flow refolding study at a particular final concentration of denaturant. The folding rate constant (kf) under this condition is determined by a fit to the single exponential equation shown. The half-life for a given rate constant is ln(2)/kf. It is important to collect data that cover a significant portion of the maximum amplitude: one half-life covers 50%, two half-lives cover 75%, and three half-lives cover 87.5%. Most experiments collect 5-10 half-lives' worth of data, since data can always be truncated afterwards.
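
The half-life arithmetic above can be checked with a short Python sketch. The particular single-exponential form used here (amplitude, rate constant and offset) is an assumption for illustration, since the fitted equation itself is not reproduced in the text, and the function names are hypothetical.

```python
import math

def half_life(k):
    """Half-life (s) of a first-order process with rate constant k (s^-1)."""
    return math.log(2) / k

def fraction_complete(n_half_lives):
    """Fraction of the total amplitude covered after n half-lives."""
    return 1.0 - 0.5 ** n_half_lives

def single_exponential(t, amplitude, k, offset):
    """Assumed signal model: S(t) = amplitude * exp(-k * t) + offset."""
    return amplitude * math.exp(-k * t) + offset

# Coverage of the kinetic amplitude for 1 to 10 half-lives
for n in range(1, 11):
    print(n, fraction_complete(n))
```

Five half-lives already cover about 97% of the amplitude, which is consistent with the usual practice of collecting 5-10 half-lives of data.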

If folding and unfolding kinetic data are plotted as ln(kf) and ln(ku) vs [Denaturant], an idealized example will show two linear arms, the "folding arm" and the "unfolding arm" (called a "chevron" plot because of its shape):

At the point indicated by "Cm" the folding and unfolding rate constants are equal, which is the Keq = 1 condition; Cm is the midpoint of denaturation (where the N and D states are each half-populated at equilibrium). It should agree with the Cm value determined from isothermal equilibrium denaturation studies (at the same temperature as the folding kinetic studies). In practical terms, kinetic data for the folding arm extend up to Cm, and kinetic data for the unfolding arm extend down to Cm.

The chevron plot is defined by two linear functions, where kf0 is the folding rate constant at 0 M denaturant (and ln(kf0) is the Y-intercept of the folding arm) and mkf is the slope of the ln(kf) function. Similarly, ku0 is the unfolding rate constant at 0 M denaturant (and ln(ku0) is the Y-intercept of the unfolding arm) and mku is the slope of the ln(ku) function:

The equation that defines the simple chevron plot is the combination of the folding and unfolding arms. The linear function of ln(kf) as a function of denaturant concentration (i.e. the folding arm) is:

ln(kf) = mkf*X + ln(kf0)

Similarly, the linear function of ln(ku) as a function of denaturant concentration (i.e. the unfolding arm) is:

ln(ku) = mku*X + ln(ku0)

The two are combined as:

ln(kobs) = ln(exp(folding arm) + exp(unfolding arm))

= ln(exp(mkf*X + ln(kf0)) + exp(mku*X + ln(ku0)))

= ln(kf0*exp(mkf*X) + ku0*exp(mku*X))

When the rate data are plotted as ln(kobs) values, the actual fit to the above equation will look something like this:

Since the rates of folding and unfolding are dependent upon denaturant concentration, the condition of 0 M denaturant is the typical reference for quoting the intrinsic folding and unfolding rate constants (i.e. kf0 and ku0).
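
The simple chevron equation can be written as a small Python function. This is a minimal sketch: the parameter values are hypothetical and chosen only to give a recognizable chevron shape, and the helper `chevron_cm` (an assumed name) solves for the denaturant concentration at which the two arms intersect.

```python
import math

def ln_kobs(x, kf0, mkf, ku0, mku):
    """Simple chevron: ln(kf0*exp(mkf*X) + ku0*exp(mku*X))."""
    return math.log(kf0 * math.exp(mkf * x) + ku0 * math.exp(mku * x))

def chevron_cm(kf0, mkf, ku0, mku):
    """Denaturant concentration where kf = ku, i.e.
    ln(kf0) + mkf*Cm = ln(ku0) + mku*Cm."""
    return math.log(kf0 / ku0) / (mku - mkf)

# Hypothetical parameters: units of s^-1 for kf0, ku0 and M^-1 for the slopes
params = dict(kf0=100.0, mkf=-2.0, ku0=0.001, mku=1.0)
cm = chevron_cm(**params)
for x in [0.0, 1.0, 2.0, 3.0, cm, 5.0, 6.0]:
    print(round(x, 2), ln_kobs(x, **params))
```

The printed values fall along the folding arm, pass through a minimum near Cm, and then rise along the unfolding arm.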

VI. Φ value analysis

The basis of Φ value analysis is to compare the overall free energy change for a mutation to the individual contributions of the folding and unfolding free energy change.  The analysis usually is focused upon understanding whether a particular mutation site is folded or unfolded in the transition state (and in this way probes the "structure" of the transition state).

• The ΔΔGD-N value is determined by equilibrium denaturation methods
• The ΔΔG‡-N value is determined from the unfolding kinetic constants of the mutant and wild type
• The ΔΔG‡-D value is determined from the folding kinetic constants of the mutant and wild type
• Φf = -ΔΔG‡-D/ΔΔGD-N is called the "folding Φ value" (the minus sign arises because, with the wild type minus mutant convention used in the examples above, ΔΔGD-N = ΔΔG‡-N - ΔΔG‡-D). If it equals 1.0 it means that the site of the mutation is native-like in the transition state (a value of 0 means the opposite)
• Φu = ΔΔG‡-N/ΔΔGD-N is called the "unfolding Φ value". If it equals 1.0 it means that the site of the mutation is denatured in the transition state (a value of 0 means the opposite)
• Fractional, or negative, Φ values are more difficult to interpret

Usually the choice of either Φf or Φu is based upon whether folding kinetic data or unfolding kinetic data can be more accurately determined.
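
The following Python sketch shows how the two Φ values could be computed from wild type and mutant rate constants. It assumes the wild type minus mutant convention used in Examples II and III, the same Eyring prefactor for both proteins, and the sign choice Φf = -ΔΔG‡-D/ΔΔGD-N so that Φf + Φu = 1 for a two-state protein; the function names and rate constants are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def ddg_barrier(k_wt, k_mut, temp_k=298.15):
    """ddG(barrier) = dG(wt) - dG(mut) = -RT ln(k_wt / k_mut),
    assuming the same prefactor for wild type and mutant."""
    return -R * temp_k * math.log(k_wt / k_mut)

def phi_values(kf_wt, kf_mut, ku_wt, ku_mut, ddg_eq, temp_k=298.15):
    """Return (phi_f, phi_u); ddg_eq is ddG(D-N) = wt - mutant, in J/mol."""
    ddg_ts_n = ddg_barrier(ku_wt, ku_mut, temp_k)  # unfolding barrier change
    ddg_ts_d = ddg_barrier(kf_wt, kf_mut, temp_k)  # folding barrier change
    phi_u = ddg_ts_n / ddg_eq
    phi_f = -ddg_ts_d / ddg_eq  # sign follows the convention assumed above
    return phi_f, phi_u

# Example II-like case: folding unchanged, mutant unfolds 10x faster
ddg_eq_2 = ddg_barrier(0.001, 0.01)
phi_f_2, phi_u_2 = phi_values(100.0, 100.0, 0.001, 0.01, ddg_eq_2)

# Example III-like case: unfolding unchanged, mutant folds 10x slower
ddg_eq_3 = -ddg_barrier(100.0, 10.0)
phi_f_3, phi_u_3 = phi_values(100.0, 10.0, 0.001, 0.001, ddg_eq_3)

print(phi_f_2, phi_u_2)
print(phi_f_3, phi_u_3)
```

In the first case the mutation site behaves as denatured in the transition state (Φu near 1), and in the second as native-like (Φf near 1), matching the conclusions drawn for Examples II and III.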

VII. Cross-validation

ΔΔGD-N values can also be determined from the kinetic values:

ΔΔGD-N = ΔΔG‡-N - ΔΔG‡-D

• If the two-state model correctly describes the protein denaturation, then the above value should agree with the value from equilibrium denaturation studies

The above equation also suggests that if unfolding kinetic data for a mutant can be obtained, but folding kinetic data cannot, the folding term can be predicted by comparing the equilibrium ΔG data and the known kinetic data:

ΔΔG‡-D = ΔΔG‡-N - ΔΔGD-N

Isothermal equilibrium denaturation (IED) data provide information on the thermodynamics of unfolding, but not the kinetics. However, if the thermodynamic and kinetic analyses have shared assumptions (i.e. two-state, reversible unfolding) then the thermodynamic and kinetic data should cross-validate (i.e. be in agreement where applicable).

The IED data provides information on ΔGunfolding (ΔGu) as a function of denaturant: ΔGu = (m-value*X)+ΔG0

ΔGu is related to the equilibrium constant for unfolding (Keq):

ΔGu = -RT*ln(Keq)

exp(ΔGu /-RT) = Keq

Expanding this equation by the definition of ΔGu = (m-value*X)+ΔG0:

exp(((m-value*X)+ ΔG0)/-RT) = Keq

The definition of Keq for protein unfolding in terms of folding and unfolding rate constants kf and ku:

Keq = ku/kf

Setting the two expressions for Keq equal to each other:

exp(((m-value*X)+ ΔG0)/-RT) = ku/kf

Thus:

ku = kf * exp(((m-value*X)+ ΔG0)/-RT)

and

kf = ku / exp(((m-value*X)+ ΔG0)/-RT)

In other words, if the folding/unfolding is two-state you can predict the ku function from the kf function plus IED data; or you can predict the kf function from the ku function and IED data. This information allows you to fit the chevron plot data using knowledge of both folding/unfolding constants and IED data.
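
A minimal Python sketch of this prediction follows. It assumes ΔG0 in J/mol, the m-value in J mol^-1 M^-1 (negative, since ΔGu decreases with added denaturant), and a two-state system; the function names and numbers are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def keq_unfolding(x, m_value, dg0, temp_k=298.15):
    """Keq = exp(((m-value*X) + dG0) / -RT) from IED parameters."""
    return math.exp((m_value * x + dg0) / (-R * temp_k))

def ku_from_kf(kf, x, m_value, dg0, temp_k=298.15):
    """Predict ku at denaturant concentration x from kf and IED data."""
    return kf * keq_unfolding(x, m_value, dg0, temp_k)

def kf_from_ku(ku, x, m_value, dg0, temp_k=298.15):
    """Predict kf at denaturant concentration x from ku and IED data."""
    return ku / keq_unfolding(x, m_value, dg0, temp_k)

# Hypothetical IED parameters: dG0 = 20 kJ/mol, m-value = -5 kJ/mol/M (Cm = 4 M)
m_value, dg0 = -5000.0, 20000.0
print(ku_from_kf(100.0, 0.0, m_value, dg0))  # predicted ku at 0 M
print(ku_from_kf(5.0, 4.0, m_value, dg0))    # at Cm, ku equals kf
```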

For example:

ln(kf) = mkf*X + ln(kf0)

kf = kf0*exp(mkf*X)

exp(((m-value*X)+ ΔG0)/-RT) = Keq = ku/kf

now we can define ku in terms of kf and IED m-value and ΔG0:

(kf0*exp(mkf*X)) * exp(((m-value*X)+ ΔG0)/-RT) = ku

The standard chevron plot equation is:

Y = ln(kf0*exp(mkf*X) + ku0*exp(mku*X))

Substituting the ku term yields:

Y = ln(kf0*exp(mkf*X) + (kf0*exp(mkf*X)) * exp(((m-value*X) + ΔG0)/-RT))

This will cause the chevron plot to be fit with linear functions for both folding/unfolding arms, and for the resultant Keq (i.e. ΔG function) to be equal to that derived from the IED data (note: this would require these terms to be constant values during the fit).

VIII. Hammond behavior

The chevron plot folding and unfolding arms often exhibit non-linear behavior. This is typically a "roll-over" at either low or high denaturant concentrations. This can be due to the structure of the transition state changing to a neighboring intermediate on the reaction profile. Thus, the folding/unfolding arms may be better modeled by a polynomial that includes a second order term for the curvature. Note that the ΔGu(denaturant) function determined from IED is still a linear function. Modification of the above model is as follows:

kf = kf0*exp(mkf*X + bkf*X^2)

and

ln(kf) = ln(kf0) + ln(exp(mkf*X + bkf*X^2))

ln(kf) = ln(kf0) + (mkf*X + bkf*X^2)

exp(((m-value*X) + ΔG0)/-RT) = Keq = ku/kf

ku = (kf0*exp(mkf*X + bkf*X^2)) * (exp(((m-value*X) + ΔG0)/-RT))

and

ln(ku) = ln((kf0*exp(mkf*X + bkf*X^2)) * (exp(((m-value*X) + ΔG0)/-RT)))

ln(ku) = ln(kf0*exp(mkf*X + bkf*X^2)) + ln(exp(((m-value*X) + ΔG0)/-RT))

ln(ku) = ln(kf0) + ln(exp(mkf*X + bkf*X^2)) + ((m-value*X) + ΔG0)/-RT

ln(ku) = ln(kf0) + (mkf*X + bkf*X^2) + ((m-value*X) + ΔG0)/-RT

with ln(kf) and ln(ku) defined, we can now state the chevron plot function:

Y = ln(exp(ln(kf)) + exp(ln(ku)))

Y = ln(exp(ln(kf0) + (mkf*X + bkf*X^2)) + exp(ln(kf0) + (mkf*X + bkf*X^2) + ((m-value*X) + ΔG0)/-RT))

Y = ln(exp(ln(kf0))*exp(mkf*X + bkf*X^2) + exp(ln(kf0))*exp((mkf*X + bkf*X^2) + ((m-value*X) + ΔG0)/-RT))

Y = ln(kf0*exp(mkf*X + bkf*X^2) + kf0*exp((mkf*X + bkf*X^2) + ((m-value*X) + ΔG0)/-RT))

This will cause the chevron plot to be fit with a second order polynomial (the 2nd order term of which is identical for both arms) and as a two-state model whose ΔG(denaturant) agrees with the IED data (the terms for IED m-value and ΔG0 are defined as constants during the fit). Once the fitted parameters are determined, the following relationships hold:

ku0 = kf0 * exp(ΔG0/-RT)

bkf is also the 2nd order coefficient for the unfolding arm (the curvature term is shared by both arms)

Cm = -ΔG0/m-value

mku = ln(kf0/ku0)/Cm + mkf
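
The constrained chevron function and the derived relationships above can be sketched in Python as follows. Only kf0, mkf and bkf would be free parameters in a fit; the IED m-value and ΔG0 are held constant. The function names and parameter values are hypothetical, with ΔG0 in J/mol and the m-value in J mol^-1 M^-1.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def ln_kobs_constrained(x, kf0, mkf, bkf, m_value, dg0, temp_k=298.15):
    """Chevron with shared curvature, with ku tied to kf through IED data."""
    ln_kf = math.log(kf0) + mkf * x + bkf * x ** 2
    ln_ku = ln_kf + (m_value * x + dg0) / (-R * temp_k)
    return math.log(math.exp(ln_kf) + math.exp(ln_ku))

def derived_parameters(kf0, mkf, m_value, dg0, temp_k=298.15):
    """Return (ku0, Cm, mku) from the fitted and fixed parameters."""
    ku0 = kf0 * math.exp(dg0 / (-R * temp_k))
    cm = -dg0 / m_value
    mku = math.log(kf0 / ku0) / cm + mkf
    return ku0, cm, mku

# Hypothetical fitted values (kf0, mkf, bkf) and fixed IED values (m_value, dg0)
kf0, mkf, bkf = 100.0, -1.5, -0.05
m_value, dg0 = -5000.0, 20000.0
ku0, cm, mku = derived_parameters(kf0, mkf, m_value, dg0)
print(ku0, cm, mku)
print(ln_kobs_constrained(cm, kf0, mkf, bkf, m_value, dg0))
```

Note that the expression for mku reduces to mkf - (m-value)/RT, so the two arm slopes and the equilibrium m-value are not independent in this model.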

The independent ln(kf) and ln(ku) baselines from the above dataset:

IX. Some derivations

The rate constant for folding is related to the free energy difference between the denatured state and the transition state:

kf = κν exp(-ΔG‡-D/RT), where ν = kbT/h

Assumptions:

·       κ = 1.0 (transmission coefficient)

·       Boltzmann's constant, kb = 1.380 x 10^-23 J K^-1

·       Temperature, T, in K

·       Planck's constant, h = 6.626 x 10^-34 J sec

·       ν has units of (J K^-1)*(K)/(J sec) = sec^-1 (appropriate for a rate constant)

Calculation of ΔΔG‡-N from experimental rate constants of unfolding:

ΔΔG‡-N = ΔG‡-N - ΔG*‡-N = -RT ln(ku/(κν)) + RT ln(k*u/(κ*ν*))

Assuming ν and κ are the same in both cases:

ΔΔG‡-N = -RT ln(ku/k*u)

ΔΔG‡-D follows a similar derivation.
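
As a numerical check on this derivation, the Python sketch below evaluates the prefactor ν = kbT/h and shows that it cancels when two barriers are subtracted. It assumes the transition-state-theory form k = κν exp(-ΔG‡/RT) with κ = 1.0; the function names and rate constants are hypothetical.

```python
import math

KB = 1.380e-23   # Boltzmann's constant, J K^-1 (value quoted above)
H = 6.626e-34    # Planck's constant, J sec (value quoted above)
R = 8.314        # gas constant, J mol^-1 K^-1

def prefactor(temp_k, kappa=1.0):
    """kappa * nu = kappa * kb * T / h, in sec^-1."""
    return kappa * KB * temp_k / H

def barrier_from_rate(k, temp_k=298.15, kappa=1.0):
    """Activation free energy (J/mol) from a rate constant (sec^-1)."""
    return -R * temp_k * math.log(k / prefactor(temp_k, kappa))

def ddg_from_rates(k_wt, k_mut, temp_k=298.15):
    """ddG = dG(wt) - dG(mut) = -RT ln(k_wt / k_mut); prefactor cancels."""
    return -R * temp_k * math.log(k_wt / k_mut)

# Hypothetical unfolding rate constants for wild type and mutant
ku_wt, ku_mut = 0.001, 0.01
direct = barrier_from_rate(ku_wt) - barrier_from_rate(ku_mut)
print(prefactor(298.15))
print(direct, ddg_from_rates(ku_wt, ku_mut))
```

Because the prefactor drops out of the difference, ΔΔG‡ values are much less sensitive to the assumed value of κ than the absolute barrier heights are.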