Dimension Reuction with CytofDR
CytofDR is a PyCytoData Alliance Plus member, meaning that you can run DR right inside
the PyCytoData object. Although the option is somewhat limited with only the run_dr_methods
method, which is a close counterpart to that of the same method in CytofDR, you can run most
of the standard DR with this method. Alternatively, you can run your own DR methods and store the
Reductions method in your PyCytoData object. This tutorial showcases each of the two
methods.
To run DR, you will need to install the ``CytofDR`` package first. To get started, a quick
guide can be found here. CytofDR
is not a mandatory dependency of PyCytoData.
Standard DR Workflow with run_dr_methods
One of the standard workflows of CyTOF data analysis is DR. And if you wish to use the default
settings, you can use this simple API without having to dive into CytofDR. To get it
started, you can do the following:
>>> from PyCytoData import DataLoader
>>> exprs = DataLoader.load_dataset(dataset="levine13", preprocess=True)
>>> exprs.run_dr_methods(methods=["UMAP", "open_tsne"])
Running UMAP
Running open_tsne
===> Finding 90 nearest neighbors using Annoy approximate search using euclidean distance...
--> Time elapsed: 8.48 seconds
===> Calculating affinity matrix...
--> Time elapsed: 2.80 seconds
===> Running optimization with exaggeration=12.00, lr=6189.67 for 250 iterations...
Iteration 50, KL divergence 6.3260, 50 iterations in 1.4977 sec
Iteration 100, KL divergence 5.1775, 50 iterations in 1.5622 sec
Iteration 150, KL divergence 4.7426, 50 iterations in 1.5219 sec
Iteration 200, KL divergence 4.4741, 50 iterations in 1.5252 sec
Iteration 250, KL divergence 4.2860, 50 iterations in 1.5313 sec
--> Time elapsed: 7.64 seconds
===> Running optimization with exaggeration=1.00, lr=6189.67 for 250 iterations...
Iteration 50, KL divergence 3.5153, 50 iterations in 1.4306 sec
Iteration 100, KL divergence 2.8738, 50 iterations in 1.5975 sec
Iteration 150, KL divergence 2.4706, 50 iterations in 2.3137 sec
Iteration 200, KL divergence 2.1967, 50 iterations in 3.2659 sec
Iteration 250, KL divergence 1.9980, 50 iterations in 4.6722 sec
--> Time elapsed: 13.28 seconds
Now, a Reductions object is stored in the reductions attribute. Furthermore,
the original_data and original_cell_types attributed are automatically
populated if available from the PyCytoData object. As with the standard workflow
in CytofDR, you can evaluate your DR embeddings:
>>> exprs.reductions.evaluate(category = ["global", "local", "downstream"], auto_cluster = True, n_clusters = 20)
Evaluating global...
Evaluating local...
Evaluating downstream...
For more on the standard workflow, you can read the documentation
of the CytofDR package.
Using the Full CytofDR API
You can use the full CytofDR API and then add the Reductions object to the
PyCytoData manually. Here, we showcase how you can use a custom setting for
DR:
>> from CytofDR import dr
>>> embedding = dr.NonLinearMethods.UMAP(data = exprs.expression_matrix, out_dims=2, n_neighbors = 30, min_dist = 0)
>>> results = dr.Reductions(reductions = {"custom_umap": embedding})
Then you can add the Reductions object into the PyCytoData object:
>>> exprs.reductions = results
Then, you can follow the standard procedures as usual. For more documentation on this, see this tutorial.