Trajectory Analysis of Kidney IgAN Data with PILOT

PILOT

Welcome to the PILOT Package Tutorial for pathomics Data!

You can find the pathomics data here.

import pilotpy as pl
import scanpy as sc

Kidney_IgAN Tubuli

Reading Anndata

The Anndata (h5ad) file is in the Datasets folder.

adata_T=sc.read_h5ad('Datasets/Kidney_IgAN_T.h5ad') 

Loading the required information and computing the Wasserstein distance:

In order to work with PILOT, ensure that your Anndata object is loaded and contains the required information.

Use the following parameters to configure PILOT for your analysis (Setting Parameters):

adata: Pass your loaded Anndata object to PILOT.
emb_matrix: Provide the name of the variable in the obsm level that holds the dimension reduction ( PCA representation).
clusters_col: Specify the name of the column in the observation level of your Anndata that corresponds to cell types or clusters.
sample_col: Indicate the column name in the observation level of your Anndata that contains information about samples or patients.
status: Provide the column name that represents the status or disease (e.g., “control” or “case”).

pl.tl.wasserstein_distance(
    adata_T,
    clusters_col = 'Cell_type',
    sample_col = 'sampleID',
    status = 'status', 
    data_type = 'Pathomics'
    )

plotting the Cost matrix and the Wasserstein distance:

Here we show the heatmaps of the Cost matrix (cells) and Wasserstein distance (samples). At this point, you should have a 'Results_PILOT' folder inside the folder where you are running your analysis, which includes all the plots created by PILOT.

pl.pl.heatmaps(adata_T)

Trajectory:

Here we show the Diffusion map of Wasserstein distance. In the upcoming trajectory analysis, the labels "<30" mean an eGFR level below 30 (considered as Low), "30-60" denotes a reduced eGFR range (marked as Reduced), and ">60" indicates a normal eGFR level

pl.pl.trajectory(adata_T, colors = ['red','blue','orange'])

Kidney_IgAN Glomeruli

Reading Anndata

The Anndata (h5ad) file is in the Datasets folder.

adata_G=sc.read_h5ad('Datasets/Kidney_IgAN_G.h5ad') #First read the object

pl.tl.wasserstein_distance(
    adata_G,
    clusters_col = 'Cell_type',
    sample_col = 'sampleID',
    status = 'status',
    data_type = 'Pathomics'
    )

pl.pl.heatmaps(adata_G)

png

pl.pl.trajectory(adata_G, colors = ['red','blue','orange'])

png

Combination:

Here, we combine the distances of samples. We get the sum of distances of samples based on Tubuli and Glomeruli distances.

adata_Com = adata_G
adata_Com.uns['EMD'] = adata_G.uns['EMD'] + adata_T.uns['EMD']

pl.pl.trajectory(adata_Com, colors = ['red','blue','orange'])

png

Fit a principal graph:

The difussion map creates an embedding that potentially reveals a trajectory in the data. Next, PILOT explores EIPLGraph to find the structure of the trajectory. An important parameter is the source_node, which indicates the start of the trajectory. Here, we selected a normal sample (node with id = 2). This method returns ranked samples, which we define as a disease progression score (t = t1, ..., tn), where tl represents the ranking of the nth sample.

pl.pl.fit_pricipla_graph(adata_Com, source_node = 2)

png

Cell-type Importance Glomeruli:

pl.tl.cell_importance(adata_Com, xlim = 125)

png

Feature selection for Glomeruli based on Combination:

Saving morphological features and maps them with the obtained order by PILOT (for Glomeruli):

In this step, morphological features are extracted from all clusters and linked to pseudotime using the PILOT trajectory order.

The “extract_cells_from_pathomics” function creates a ‘cells’ folder inside the ‘Results_PILOT’ folder to store these morphological features and corresponding pseudotime values as ‘All.csv’.

pl.tl.extract_cells_from_pathomics(adata_Com)

Getting the log scale of features

data=pl.tl.norm_morphological_features(
 column_names = ['glom_sizes', 'glom_distance_to_closest_glom','glom_diameters','glom_tuft_sizes',
                 'glom_bowman_sizes'],
 name_cell = 'All'
    )

Morphological features changes for Glomeruli:

Firstly, we should note that for pathomics data, instead of using cluster-specific changes for features/genes (please see MI tutorial analysis), we use the fit models to find the structural changes.

We apply the morphological_features_importance function to catch the features/structures that are changing over the trajectory (combination) for Glomeruli.

pl.tl.morphological_features_importance(data, height = 10, x_lim = 160, width = 20)    

Name of Cluster : [1mAll[0m
Sparsity:6.486269746268921e-05
For this cell_type, p-value of  14 genes are statistically significant.
  Expression pattern  count
0        linear down      6
1          linear up      4
2     quadratic down      2
3       quadratic up      2
data saved successfully

png

Feature selection for Tubuli based on Combination

We repeat the same process for Tubuli.

Saving morphological features and map them with the obtained order by PILOT (for Tubuli):

adata_T.uns['orders'] = adata_Com.uns['orders']
pl.tl.extract_cells_from_pathomics(adata_T)

Getting the log scale of features

data=pl.tl.norm_morphological_features(
 column_names = ['tubule_diameters', 'tubule_sizes', 'tubule_distance_to_closest_instance'],
 name_cell = 'All'
    )

Morphological features changes for Tubuli

pl.tl.morphological_features_importance(data, x_lim = 160, height = 5)    

Name of Cluster : [1mAll[0m
Sparsity:0.0
For this cell_type, p-value of  3 genes are statistically significant.
  Expression pattern  count
0        linear down      2
1          linear up      1
data saved successfully

png