Analyzing output
Install dependencies
Julia has several packages for plotting. Here, we will use Plots.jl
which you can install with
pkg> add Plots
Loading output
PhysiCellSnapshot
The base unit of PhysiCell output is the PhysiCellSnapshot
. These are currently considered pcvct internals and so the API may change. Each snapshot records the path to the PhysiCell output folder, its index in the sequence of outputs, the time of the snapshot in the simulation, and optionally the cell, substrate, and mesh data at that snapshot.
PhysiCellSequence
A PhysiCellSequence
is the full sequence of snapshots corresponding to a single PhysiCell simulation. As with PhysiCellSnapshot
's, these are currently considered internals and their API may change. In addition to the path to the PhysiCell output folder and the vector of PhysiCellSnapshot
's, it holds metadata for the simulation.
cellDataSequence
The main function to get sequences of cell data is cellDataSequence
. It accepts any of a simulation ID (<:Integer
), a simulation (::Simulation
), or a sequence (::PhysiCellSequence
) and either a single label (::String
) or a vector of labels (::Vector{String}
). For each cell in the simulation (as determined by the cell ID), the output creates a dictionary entry (the key is the integer cell ID) whose value is a named tuple with the input labels as keys as well as :time
. This means that if one sets
data = cellDataSequence(1, "position")
Then one can access the positions of the cell with ID 78 by
cell_78_positions = data[78].position # an Nx3 matrix for the N integer-indexed outputs (ignores the `initial_*` and `final_*` files)
and plot the x-coordinates of this cell over time using
cell_78_times = data[78].time
using Plots
plot(cell_78_times, cell_78_positions[:,1])
Note: Each call to cellDataSequence
will load all the data unless a PhysiCellSequence
is passed in. Plan your analyses accordingly as loading simulation data is not fast.
Population plots
Group by Monad
Plotting population plots is one the most basic analysis tasks and pcvct makes it super easy! If you call plot
on a Simulation
, Monad
, Sampling
, or the return value of a call to run
(though not for a sensitivity analysis), then a sequence of panels will be generated in a single figure. Each panel will correspond to a Monad
(replicates using the same parameter values) and will plot mean +/- SD for each cell type.
Finer-grained control of the output is possible, too!
- to include dead cells in your counts:
plot(...; ..., include_dead=true, ...)
- select a subset of cell types to include:
plot(...; ..., include_cell_type_names="cancer", ...)
- select a subset of cell types to exclude:
plot(...; ..., exclude_cell_type_names="cancer", ...)
- choose time units for the x-axis:
plot(...; ..., time_unit=:h, ...)
The include_cell_type_names
and exclude_cell_type_names
can also accept a Vector{String}
to include or exclude certain cell types, respectively. Furthermore, if the value of include_cell_type_names
is a Vector
and one of its entries is a Vector{String}
, pcvct will interpret this to sum up those cell types. In other words, to get the total tumor cell count in addition to the epithelial ("epi"
) and mesenchymal ("mes"
) components, you could use
using Plots
plot(Monad(1); include_cell_type_names=["epi", "mes", ["epi", "mes"]])
Finally, this makes use of Julia's Plot Recipes (see RecipesBase.jl) so any standard plotting keywords can be passed in:
using Plots
colors = [:blue :red] # Note the absence of a `,` or `;`. This is how Julia requires different series parameters to be passed in
plot(Simulation(1); color=colors, include_cell_type_names=["cd8", "cancer"]) # will plot cd8s in blue and cancer in red.
Group by cell type
Invert the above by including all data for a single cell type across all monads in a single panel with a call to plotbycelltype
. This function works on any T<:AbstractTrial
(Simulation
, Monad
, Sampling
, or Trial
) as well as any PCVCTOutput
object (the return value to run
). Everything above for plot
applies here.
using Plots
plotbycelltype(Sampling(1); include_cell_type_names=["epi", "mes", ["epi", "mes"]], color=[:blue :red :purple], labels=["epi" "mes" "both"], legend=true)
Substrate analysis
pcvct supports two ways to summarize substrate information over time.
AverageSubstrateTimeSeries
An AverageSubstrateTimeSeries
gives the time series for the average substrate across the entire domain.
simulation_id = 1
asts = pcvct.AverageSubstrateTimeSeries(simulation_id)
using Plots
plot(asts.time, asts["oxygen"])
ExtracellularSubstrateTimeSeries
An ExtracellularSubstrateTimeSeries
gives the time series for the average substrate concentration in the extracellular space neighboring all cells of a given cell type. In a simulation with cd8
cells and IFNg
diffusible substrate, plot the average concentration of IFNg experienced by CD8+ T cells using the following:
simulation_id = 1
ests = pcvct.ExtracellularSubstrateTimeSeries(simulation_id)
using Plots
plot(ests.time, ests["cd8"]["IFNg"])
Motility analysis
The motilityStatistics
function returns the time alive, distance traveled, and mean speed for each cell in the simulation. For each cell, these values are split amongst the cell types the given cell assumed throughout (or at least at the save times). To calculate these values, the cell type at the start of the save interval is used and the net displacement is used to calculate the speed. Optionally, users can pass in a coordinate direction to only consider speed in a given axis.
simulation_id = 1
mss = motilityStatistics(simulation_id)
all_mean_speeds_as_mes = [ms["mes"].speed for ms in mss if haskey(ms, "mes")] # concatenate all speeds as a "mes" cell type (if the given cell ever was a "mes")
all_times_as_mes = [ms["mes"].time for ms in mss if haskey(ms, "mes")] # similarly, get the time spent in the "mes" state
mean_mes_speed = all_mean_speeds_as_mes .* all_times_as_mes |> sum # start computing the weighted average of their speeds
mean_mes_speed /= sum(all_times_as_mes) # finish computing weighted average
mss = motilityStatistics(simulation_id; direction=:x) # only consider the movement in the x direction
Pair correlation function (PCF)
Sometimes referred to as radial distribution functions, the pair correlation function (PCF) computes the density of target cells around center cells. If the two sets of cells are the same (centers = targets), this is called PCF. If the two are not equal, this is sometimes called cross-PCF. Both can be computed with a call to pcvct.pcf
(or just pcf
if using PairCorrelationFunction
has been called).
Arguments
PCF computations can readily be called on PhysiCellSnapshot
's, PhysiCellSequence
's, or Simulation
's. If the first argument in a call to pcf
is an Integer
, this is treated as a simulation ID. If this is followed by an index (of type Integer
or value :initial
or :final
), this is treated as a snapshot; otherwise, it computes the PCF for the entire simulation.
The next argument is the cell type to use as the center cells as either a String
or Vector{String}
, representing the name of the cell type(s). If the target cells are different from the center cells, the next argument is the target cell type as either a String
or Vector{String}
. If omitted, the target cell type is the same as the center cell type and a (non-cross) PCF is computed. The resulting sets of center and target cell types must either be identical or have no overlap.
Keyword arguments
The following keyword arguments are available:
include_dead::Union{Bool, Tuple{Bool,Bool}} = false
: whether to include dead cells in the PCF computation.- If
true
, all cells are included. - If
false
, only live cells are included. - If a tuple, the first value is for the center cells and the second is for the target cells.
- If
dr::Float64 = 20.0
: the step size for the radial bins in micrometers.
Output
The output of pcf
is a PCVCTPCFResult
object which has two fields: time
and pcf_result
. The time
field is always a vector of the time points at which the PCF was computed, even if computing PCF for a single snapshot. The pcf_result
is of type PairCorrelationFunction.PCFResult
and has two fields: radii
and g
. The radii
is the set of cutoffs used to compute the PCF and g
is either a vector or a matrix of the PCF values of size length(radii)-1
by length(time)
.
Plotting
An API to make use of the PairCorrelationFunction
package plotting interface is available through the plot
function. Simply pass in the PCVCTPCFResult
! You can pass in as many such objects as you like or pass in a Vector{PCVCTPCFResult}
. In this case, these are interpreted as stochastic realizations of the same PCF and summary statistics are used to plot. See the PairCorrelationFunction
documentation for more details.
The pcvct implementation supports two keyword arguments:
time_unit::Symbol = :min
: the time unit to use for the time axis (only relevant if thePCVCTPCFResult
has more than one time point).- The default is
:min
and the other options are:s
,:h
,:d
,:w
,:mo
,:y
.
- The default is
distance_unit::Symbol = :um
: the distance unit to use for the distance axis.- The default is
:um
and the other options are:mm
and:cm
.
- The default is
Finally, a keyword argument supported by PairCorrelationFunction
is colorscheme
which can be used to change the colorscheme of the color map. pcvct overrides the default from PairCorrelationFunction
(:tofino
) with :cork
to use white to represent values near one.
Examples
simulation_id = 1
result = pcvct.pcf(simulation_id, "cancer", "cd8") # using PairCorrelationFunction will obviate the need to prefix with `pcvct`
plot(result) # heatmap of proximity of (living) cd8s to (living) cancer cells throughout simulation 1
monad = Monad(1) # let's assume that there are >1 simulations in this monad
results = [pcvct.pcf(simulation_id, :final, "cancer", "cd8") for simulation_id in simulationIDs(monad)] # one vector of PCF values for each simulation at the final snapshot
plot(results) # line plot of average PCF values against radius across the monad +/- 1 SD
monad = Monad(1) # let's assume that there are >1 simulations in this monad
results = [pcvct.pcf(simulation_id, "cancer", "cd8") for simulation_id in simulationIDs(monad)] # one matrix of PCF values for each simulation across all time points
plot(results) # heatmap of average PCF values with time on the x-axis and radius on the y-axis; averages omit NaN values that can occur at higher radii
Graph analysis
Every PhysiCell simulation produces three different directed graphs at each save time point. For each graph, the vertices are the cell agents and the edges are as follows:
:neighbors
: the cells overlap based on their positions and radii:attachments
: manually-defined attachments between cells:spring_attachments
: spring attachments formed automatically using attachment rates
Each of these graphs is expected to be symmetric, i.e., if cell A is attached to cell B, then cell B is attached to cell A. Nonetheless, pcvct holds the data in a directed graph.
Currently, pcvct supports computing connected components for any of these graphs using the connectedComponents
function. For an pcvct.AbstractPhysiCellSequence
object, the graphs can be loaded using the loadGraph!
function for any other analysis.
Examples
For all examples that follow, we will assume a PhysiCellSnapshot
object called snapshot
has been created, e.g., as follows:
simulation_id = 1
index = :final
snapshot = PhysiCellSnapshot(simulation_id, index)
To get a list of the connected components for the :neighbors
graph for all living cells in a simulation at the final timepoint, use
connected_components = connectedComponents(snapshot) # defaults to the :neighbors graph, all cells, and exclude dead cells
The connected_components
object is a Dict
with the cell type names in a single vector as the only key with value a vector of vectors. Each element is a vector of the cell IDs belonging to that connected component in the wrapper type pcvct.AgentID
.
If you want to compute connected components for subsets of cells, pass in a vector of vectors of cell type names (String
s) such that each vector corresponds to a subset of cell types.
subset_1 = ["cd8_active", "cd8_inactive"]
subset_2 = ["cancer_epi", "cancer_mes"]
connected_components = connectedComponents(snapshot; include_cell_type_names=[subset_1, subset_2])
In this case, the connected_components
object is a Dict
with subset_1
and subset_2
as the keys (the values stored in them, not the strings "subset_1"
and "subset_2"
). The value connected_components[subset_1]
is a vector of vectors of the cell IDs belonging to each connected component just considering the cells in subset_1
. Similarly for subset_2
.
Including dead cells is possible though not recommended. This is because dead cells automatically clear their neighbors and attachments. The optional keyword argument include_dead
can be set to true
to include dead cells in the graph.
connected_components = connectedComponents(snapshot; include_dead=true)
Finally, to combine a single connected component with the dataframe of cell data, the following can be used:
connected_components = connectedComponents(snapshot)
connected_components_1 = connected_components |> # julia's pipe operator
values |> # get the value for each key
first |> # get the connected components for the first subset of cells (in this case there's only one subset consisting of all cells)
first # get the first connected component for this subset
loadCells!(snapshot) # make sure the cell data is loaded
cells_df = snapshot.cells # this is the cell data
agent_ids = DataFrame(ID=[a.id for a in connected_components_1]) # get the IDs for the agents in the connected component
component_df = rightjoin(cells_df, agent_ids, on=:ID) # join on the agent IDs, keeping only the rows in the connected component