Parameterizing the model¶

Multi-omics data, including fluxomics, metabolomics, and proteomics, together with enzyme kinetic databases such as BRENDA and SABIO-RK, as well as thermodynamics resources such as eQuilibrator, provide rich sources of information for parameterizing the kinetic model. However, these measurements are typically collected from different studies and experimental conditions and inherently contain noise and uncertainty. As a result, directly combining them often fails to produce a kinetically consistent model in which the mechanistic rate laws and steady-state constraints are simultaneously satisfied.

RobustNet adopts a Bayesian framework to infer feasible model parameters (posterior distributions) that remain close to the experimentally measured data (prior distributions) while trying to maintain consistency with the kinetic equations and feasible steady states.

Estimating the reference flux distribution¶

The first step is to estimate the reference-state flux distribution, which represents the steady-state metabolic fluxes before enzyme perturbation. These fluxes are typically obtained from isotope-assisted metabolic flux analysis (MFA). Our previously developed package, FreeFlux, can be used for this purpose.

Because the metabolic network used for MFA may differ from the kinetic network constructed in RobustNet, a calibration step is required to derive a flux distribution that both satisfies mass balance in the current network and remains close to the MFA estimates.

This step can be performed as follows:

[1]:

import pandas as pd
from robustnet import Model

MODEL_FILE = '../../models/e_coli/e_coli_model.xlsx'

model = Model('ecoli')
model.read_from_file(MODEL_FILE)

[2]:

FLUXOMICS = '../../models/e_coli/measured_fluxes.xlsx'
FLUX_BOUNDS = '../../models/e_coli/flux_bounds.xlsx'

flux_data = pd.read_excel(FLUXOMICS, header=0, index_col=0)
flux_bounds = pd.read_excel(FLUX_BOUNDS, header=0, index_col=0)

model.load_priors(
    'fluxomics',
    data=flux_data.iloc[:,0],
    std=flux_data.iloc[:,1],
)
fit_res = model.estimate_reference_fluxes(
    bounds={rxn: tuple(row) for rxn, row in flux_bounds.iterrows()},
    optimizer='scipy',
    method='COBYQA',
    tol=1e-8
)

Note: Intracellular metabolite and enzyme concentrations are typically expressed in mM. To maintain unit consistency and appropriate numerical scaling, fluxes in RobustNet are represented in units of mmol L\(^{-1}\) s\(^{-1}\).

Additional flux bounds can be provided to constrain the estimated fluxes and specify reaction directionality. For example, a lower bound greater than 0 enforces a forward-only reaction and an upper bound less than 0 enforces a reverse-only reaction as defined by the reaction direction in the model.

The estimated reference fluxes and their uncertainties can be accessed through the estimated_fluxes and estimated_flux_errors attributes of the returned FluxFitResults object. Comparison between the calibrated fluxes and the original MFA estimates can be visualized using the provided plotting function (using glucose-6-phosphate isomerase as an example).

[3]:

fit_res.plot_simulated_vs_measured_fluxes(
    out_dir=None,
    reactions=['PGI'],
    show_fig=True
)

Generating model parameter sets¶

The calibrated reference flux distribution is then integrated with metabolomics, proteomics, and enzyme kinetic data to generate ensemble model parameter sets. This step is performed by sampling from the posterior distributions of metabolite concentrations, enzyme concentrations, and kinetic parameters.

The following example demonstrates this procedure.

[4]:

METABOLOMICS = '../../models/e_coli/measured_metabolites.xlsx'
PROTEOMICS = '../../models/e_coli/measured_enzymes.xlsx'
KINETIC_PARAMETERS = '../../models/e_coli/measured_kinetic_parameters.xlsx'

metab_data = pd.read_excel(METABOLOMICS, header=0, index_col=0)
enz_data = pd.read_excel(PROTEOMICS, header=0, index_col=0)
kparam_data = pd.read_excel(KINETIC_PARAMETERS, header=0, index_col=0)

model.load_priors(
    'reference_fluxes',
    data=fit_res.estimated_fluxes,
    std=fit_res.estimated_flux_errors
)
model.load_priors(
    'metabolomics',
    data=metab_data.iloc[:,0],
    std=metab_data.iloc[:,1],
)
model.load_priors(
    'proteomics',
    data=enz_data.iloc[:,0],
    std=enz_data.iloc[:,1],
)
model.load_priors(
    'kparameters',
    data=kparam_data.iloc[:,0],
    std=kparam_data.iloc[:,1]
)
samp_res = model.generate_parameter_sets(
    alpha=None,
    n_tunes=5000,
    n_samples=2000,
    n_chains=10,
    n_jobs=10
)

Note: Metabolomics and proteomics data are expected in units of mM (cell volume-based). For kinetic parameters, catalytic constants have units of s\(^{-1}\), Michaelis, activation, and inhibition constants have units of mM, and equilibrium constants are dimensionless.

During posterior sampling, the alpha argument defines a tunable coefficient that controls the tradeoff between consistency with experimental measurements and feasibility of the kinetic model.

A smaller alpha keeps the posterior distributions closer to the original omics measurements and generally makes sampling easier, but the resulting kinetic model may be less balanced.
A larger alpha enforces stronger consistency between the kinetic equations and the reference steady state, but may lead to more difficult sampling and reduced convergence efficiency.

By default, alpha is set to the geometric mean of the inverse variances of the reference fluxes.

n_tunes is directly passed to the PyMC samplers and controls the number of tuning (burn-in) iterations used to stabilize sampling before posterior samples are collected. The total number of posterior samples is determined by both the number of sampling chains specified by n_chains and the number of samples drawn per chain specified by n_samples.

After sampling is completed, all results are received by samp_res. In particular, sampled kinetic parameter sets can be accessed through the sampled_kinetic_parameters attribute.

[5]:

samp_res.sampled_kinetic_parameters.sample(5)

[5]:

	Ka_ACCOA_vPPC	Ka_FBP_vPPC	Ka_FBP_vPYK	Ka_PEP_vFBPase	Keq_vACK	Keq_vACON	Keq_vADK	Keq_vAKGDH	Keq_vALCD	Keq_vALDDH	...	kcat_vPTA	kcat_vPYK	kcat_vRPE	kcat_vRPI	kcat_vSDH	kcat_vSUCOAS	kcat_vTAL	kcat_vTKT1	kcat_vTKT2	kcat_vTPI
10973	1.870129	0.870527	1.747532	2.243964	337.527424	0.073662	0.699464	155582.867178	1192.781793	0.491528	...	21.905336	124.891760	2.109956	2064.644968	76.213010	2.473156	13.864075	2.043599	1.760774	8741.377981
7099	2.332932	1.868009	0.978254	2.331784	322.006271	0.033219	0.945926	123754.968077	1026.579617	0.190596	...	131.290824	138.561988	1.863342	273.485551	59.245313	1.070354	12.369193	0.937129	1.461572	7178.409988
6569	2.010493	1.770139	0.833338	2.076259	141.966629	0.115442	0.839103	362922.022172	1356.087648	0.127495	...	155.345317	134.760783	2.146779	2355.050995	132.490001	3.681403	15.224663	0.254971	2.363003	9332.960377
9233	2.198546	1.933340	1.249352	2.111623	229.801055	0.081238	1.003101	185048.884845	1572.620443	0.225697	...	31.319926	153.433054	1.755173	586.469550	86.098486	2.388448	13.255742	1.460273	1.749851	7113.498239
7963	2.387464	1.557875	3.141863	3.601756	261.003044	0.160908	1.022405	103608.127097	1432.528746	0.297042	...	113.045903	153.958044	2.227516	311.411164	58.020611	1.738744	11.246085	2.944525	2.777271	7547.605797

5 rows × 351 columns

Comparisons between the prior distributions and sampled posterior distributions can be visualized using the provided plotting utility, illustrated here using the catalytic constant k\(_{cat}^{GlcPTS}\), Michaelis constant K\(_{m}^{PEP,PPC}\), and equilibrium constant K\(_{eq}^{FBA}\):

[7]:

samp_res.plot_sampled_vs_prior_kinetic_parameters(
    out_dir=None,
    parameters=['kcat_vGlcPTS', 'Km_PEP_vPPC', 'Keq_vFBA'],
    show_fig=True
)

Similarly, sampled metabolite concentration sets can be accessed through

[8]:

samp_res.sampled_metabolite_concentrations.sample(5)

[8]:

	AC	ACALD	ACCOA	ACETP	ADP	AKG	AMP	ATP	BPG	CIT	...	PYR	Pi	R5P	Ru5P	S7P	SUC	SUCCOA	UQ	UQH2	Xu5P
9932	0.210686	0.181244	0.302144	0.186986	0.030406	0.052820	0.008098	0.479109	0.137904	0.147553	...	0.054040	0.147472	0.018113	0.045438	0.145702	0.172249	0.132912	0.147945	0.141645	0.097859
7854	0.216531	0.125369	0.312520	0.150798	0.052235	0.045836	0.005128	0.510380	0.184664	0.196658	...	0.056460	0.145168	0.024162	0.109934	0.170378	0.140536	0.174965	0.200266	0.139823	0.179706
15282	0.210595	0.108958	0.337574	0.139802	0.055148	0.059934	0.005603	0.496219	0.177995	0.100588	...	0.072680	0.157538	0.029852	0.024523	0.112865	0.213476	0.102343	0.084586	0.135467	0.219680
276	0.202213	0.180576	0.287823	0.096652	0.083652	0.063701	0.006530	0.476595	0.177534	0.114668	...	0.046049	0.091807	0.018890	0.113220	0.174743	0.144751	0.105163	0.112118	0.257921	0.093595
6274	0.195270	0.233839	0.314113	0.225211	0.042064	0.059626	0.006866	0.529823	0.074897	0.216247	...	0.144874	0.156079	0.015216	0.085632	0.076764	0.125076	0.185114	0.142033	0.195642	0.183659

5 rows × 50 columns

and compared with the prior metabolomics data (using fructose 1,6-bisphosphate as an example).

[10]:

samp_res.plot_sampled_vs_prior_metabolites(
    out_dir=None,
    metabolites=['FBP'],
    show_fig=True
)

Sampled enzyme concentration sets are available through sampled_enzyme_concentrations, and comparisons with the prior proteomics data can be visualized (using pyruvate kinase as an example) as follows:

[11]:

samp_res.sampled_enzyme_concentrations.sample(5)

[11]:

	ACK	ACON	ADK	AKGDH	ALCD	ALDDH	ATPSyn	COX	CS	EDA	...	PPS	PTA	PYK	RPE	RPI	SDH	SUCOAS	TAL	TKT	TPI
1219	0.002922	0.038921	0.007545	0.009021	0.005966	0.006592	0.009701	0.001923	0.016836	0.008288	...	0.003619	0.003650	0.026954	0.007377	0.001651	0.004758	0.015840	0.019835	0.012376	0.005850
18685	0.003086	0.043500	0.008532	0.009250	0.006752	0.006218	0.009592	0.001733	0.016289	0.009758	...	0.003315	0.003463	0.023613	0.008305	0.001588	0.004434	0.014752	0.022703	0.013093	0.007637
15394	0.003669	0.050088	0.009360	0.008551	0.007302	0.006157	0.009723	0.002148	0.015850	0.009089	...	0.003443	0.003760	0.023940	0.008740	0.001465	0.003998	0.014591	0.022720	0.012212	0.007396
6000	0.003094	0.047729	0.008681	0.008357	0.008051	0.006331	0.009771	0.001866	0.017154	0.008673	...	0.003319	0.004492	0.025234	0.007138	0.001584	0.004128	0.015617	0.019959	0.013493	0.007629
9147	0.003227	0.045774	0.007356	0.009303	0.007498	0.006192	0.009635	0.001931	0.017013	0.008218	...	0.003567	0.003756	0.021409	0.008609	0.001349	0.004255	0.014663	0.015976	0.011786	0.006127

5 rows × 48 columns

[13]:

samp_res.plot_sampled_vs_prior_enzymes(
    out_dir=None,
    enzymes=['PYK'],
    show_fig=True
)

The full sampling trace is stored in samp_res.trace, which returns an arviz.InferenceData object. This enables direct compatibility with the analysis tool provided by ArviZ.

Parameterizing the model¶

Estimating the reference flux distribution¶

Generating model parameter sets¶

RobustNet

Navigation

Related Topics