Dimension Reduction with CytofDR

CytofDR is a PyCytoData Alliance Plus member, meaning that you can run DR directly from a PyCytoData object. The built-in interface is limited to the run_dr_methods method, a close counterpart to the method of the same name in CytofDR, but it covers most standard DR methods. Alternatively, you can run your own DR methods and store the resulting Reductions object in your PyCytoData object. This tutorial showcases both approaches.

To run DR, you will need to install the ``CytofDR`` package first, because CytofDR is not a mandatory dependency of PyCytoData. To get started, a quick guide can be found here.
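Because CytofDR is optional, it can be handy to check whether it is importable before calling any DR method. The following is a minimal sketch using only the standard library; the helper name ``cytofdr_available`` is hypothetical and not part of PyCytoData, and the ``pip install CytofDR`` hint assumes the package is installed from PyPI under that name.

```python
from importlib.util import find_spec


def cytofdr_available() -> bool:
    """Return True if the optional CytofDR package can be found (hypothetical helper)."""
    # find_spec returns None for a top-level module that is not installed.
    return find_spec("CytofDR") is not None


if not cytofdr_available():
    # Assumption: the package is published on PyPI as "CytofDR".
    print("CytofDR is not installed; try `pip install CytofDR` before running DR.")
```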


Standard DR Workflow with run_dr_methods

DR is one of the standard steps of CyTOF data analysis. If you are happy with the default settings, you can use this simple API without having to dive into CytofDR. To get started, you can do the following:

>>> from PyCytoData import DataLoader
>>> exprs = DataLoader.load_dataset(dataset="levine13", preprocess=True)
>>> exprs.run_dr_methods(methods=["UMAP", "open_tsne"])
Running UMAP
Running open_tsne
===> Finding 90 nearest neighbors using Annoy approximate search using euclidean distance...
--> Time elapsed: 8.48 seconds
===> Calculating affinity matrix...
--> Time elapsed: 2.80 seconds
===> Running optimization with exaggeration=12.00, lr=6189.67 for 250 iterations...
Iteration   50, KL divergence 6.3260, 50 iterations in 1.4977 sec
Iteration  100, KL divergence 5.1775, 50 iterations in 1.5622 sec
Iteration  150, KL divergence 4.7426, 50 iterations in 1.5219 sec
Iteration  200, KL divergence 4.4741, 50 iterations in 1.5252 sec
Iteration  250, KL divergence 4.2860, 50 iterations in 1.5313 sec
--> Time elapsed: 7.64 seconds
===> Running optimization with exaggeration=1.00, lr=6189.67 for 250 iterations...
Iteration   50, KL divergence 3.5153, 50 iterations in 1.4306 sec
Iteration  100, KL divergence 2.8738, 50 iterations in 1.5975 sec
Iteration  150, KL divergence 2.4706, 50 iterations in 2.3137 sec
Iteration  200, KL divergence 2.1967, 50 iterations in 3.2659 sec
Iteration  250, KL divergence 1.9980, 50 iterations in 4.6722 sec
--> Time elapsed: 13.28 seconds

Now, a Reductions object is stored in the reductions attribute. Furthermore, its original_data and original_cell_types attributes are automatically populated if they are available from the PyCytoData object. As with the standard workflow in CytofDR, you can evaluate your DR embeddings:

>>> exprs.reductions.evaluate(category = ["global", "local", "downstream"], auto_cluster = True, n_clusters = 20)
Evaluating global...
Evaluating local...
Evaluating downstream...

For more on the standard workflow, you can read the documentation of the CytofDR package.


Using the Full CytofDR API

You can use the full CytofDR API and then add the Reductions object to the PyCytoData object manually. Here, we showcase how you can use a custom setting for DR:

>>> from CytofDR import dr
>>> embedding = dr.NonLinearMethods.UMAP(data = exprs.expression_matrix, out_dims=2, n_neighbors = 30, min_dist = 0)
>>> results = dr.Reductions(reductions = {"custom_umap": embedding})

Then you can add the Reductions object into the PyCytoData object:

>>> exprs.reductions = results

Then, you can follow the standard procedures as usual. For more documentation on this, see this tutorial.
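The same pattern should also work for an embedding computed entirely outside CytofDR, since the Reductions constructor shown above takes a dictionary of named embeddings. The sketch below is an illustration under stated assumptions, not CytofDR's own PCA: ``pca_embedding`` is a hypothetical helper written with NumPy, and the random ``toy_expression`` array stands in for ``exprs.expression_matrix``.

```python
import numpy as np


def pca_embedding(data: np.ndarray, out_dims: int = 2) -> np.ndarray:
    """Project cells (rows) onto the top principal components (hypothetical helper)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dims].T


rng = np.random.default_rng(0)
toy_expression = rng.normal(size=(100, 10))  # stand-in for exprs.expression_matrix
embedding = pca_embedding(toy_expression)    # shape: (n_cells, 2)

try:
    from CytofDR import dr
    # Same constructor call as in the tutorial, with a custom embedding name.
    results = dr.Reductions(reductions={"custom_pca": embedding})
except ImportError:
    results = None  # CytofDR is optional; the embedding itself is still usable
```

With real data, ``results`` can then be assigned to ``exprs.reductions`` exactly as in the example above.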