Dimension Reduction with CytofDR
CytofDR is a PyCytoData Alliance Plus member, meaning that you can run DR right inside the PyCytoData object. The options are somewhat limited, because only the run_dr_methods method is provided. It is a close counterpart to the method of the same name in CytofDR, so you can still run most of the standard DR methods this way. Alternatively, you can run your own DR methods and store the resulting Reductions object in your PyCytoData object. This tutorial showcases each of the two approaches.
CytofDR is not a mandatory dependency of PyCytoData, so you will need to install the CytofDR package before running DR. A quick guide to getting started can be found here.
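Because CytofDR is optional, a script that calls run_dr_methods may want to confirm the package is importable first. The following is a minimal sketch using only the standard library. The helper name is_installed is hypothetical and is not part of PyCytoData or CytofDR. The module name "CytofDR" is taken from the `from CytofDR import dr` import used later in this tutorial.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it.

    Hypothetical helper for illustration; not part of PyCytoData.
    """
    # find_spec returns None (rather than raising) for a missing top-level module.
    return importlib.util.find_spec(module_name) is not None


if is_installed("CytofDR"):
    print("CytofDR is installed; DR methods can be run.")
else:
    print("CytofDR is not installed; install it before running DR.")
```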
Standard DR Workflow with run_dr_methods
DR is one of the standard steps in CyTOF data analysis. If you wish to use the default settings, you can use this simple API without having to dive into CytofDR. To get started, you can do the following:
>>> from PyCytoData import DataLoader
>>> exprs = DataLoader.load_dataset(dataset="levine13", preprocess=True)
>>> exprs.run_dr_methods(methods=["UMAP", "open_tsne"])
Running UMAP
Running open_tsne
===> Finding 90 nearest neighbors using Annoy approximate search using euclidean distance...
--> Time elapsed: 8.48 seconds
===> Calculating affinity matrix...
--> Time elapsed: 2.80 seconds
===> Running optimization with exaggeration=12.00, lr=6189.67 for 250 iterations...
Iteration 50, KL divergence 6.3260, 50 iterations in 1.4977 sec
Iteration 100, KL divergence 5.1775, 50 iterations in 1.5622 sec
Iteration 150, KL divergence 4.7426, 50 iterations in 1.5219 sec
Iteration 200, KL divergence 4.4741, 50 iterations in 1.5252 sec
Iteration 250, KL divergence 4.2860, 50 iterations in 1.5313 sec
--> Time elapsed: 7.64 seconds
===> Running optimization with exaggeration=1.00, lr=6189.67 for 250 iterations...
Iteration 50, KL divergence 3.5153, 50 iterations in 1.4306 sec
Iteration 100, KL divergence 2.8738, 50 iterations in 1.5975 sec
Iteration 150, KL divergence 2.4706, 50 iterations in 2.3137 sec
Iteration 200, KL divergence 2.1967, 50 iterations in 3.2659 sec
Iteration 250, KL divergence 1.9980, 50 iterations in 4.6722 sec
--> Time elapsed: 13.28 seconds
Now, a Reductions object is stored in the reductions attribute. Furthermore, the original_data and original_cell_types attributes of the Reductions object are automatically populated from the PyCytoData object if available. As with the standard workflow in CytofDR, you can evaluate your DR embeddings:
>>> exprs.reductions.evaluate(category = ["global", "local", "downstream"], auto_cluster = True, n_clusters = 20)
Evaluating global...
Evaluating local...
Evaluating downstream...
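To make the idea of a "local" evaluation more concrete, here is a simplified, standalone illustration of a k-nearest-neighbor preservation score. It measures how many of each cell's nearest neighbors in the original data remain nearest neighbors in the embedding. This is our own NumPy sketch of the general idea, not CytofDR's implementation. The function names knn_indices and knn_preservation are hypothetical. Use Reductions.evaluate for actual benchmarking.

```python
import numpy as np


def knn_indices(x: np.ndarray, k: int) -> np.ndarray:
    """Brute-force indices of each row's k nearest neighbors (excluding itself)."""
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances: |a|^2 + |b|^2 - 2 a.b
    d = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def knn_preservation(original: np.ndarray, embedding: np.ndarray, k: int = 5) -> float:
    """Mean fraction of each cell's k original-space neighbors kept in the embedding."""
    nn_orig = knn_indices(original, k)
    nn_emb = knn_indices(embedding, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_orig, nn_emb)]
    return float(np.mean(overlaps))


# Hypothetical data: 200 cells x 10 markers, and a crude 2-D "embedding"
# made by keeping only the first two markers.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 10))
print(knn_preservation(data, data[:, :2], k=5))
```

A score of 1.0 means every neighborhood is perfectly preserved; the brute-force distance matrix keeps the sketch short but scales quadratically with the number of cells, which is why real tools use approximate neighbor search.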
For more on the standard workflow, you can read the documentation of the CytofDR package.
Using the Full CytofDR API
You can also use the full CytofDR API and then add the Reductions object to the PyCytoData object manually. Here, we showcase how you can run UMAP with custom settings:
>>> from CytofDR import dr
>>> embedding = dr.NonLinearMethods.UMAP(data = exprs.expression_matrix, out_dims=2, n_neighbors = 30, min_dist = 0)
>>> results = dr.Reductions(reductions = {"custom_umap": embedding})
Then you can add the Reductions object to the PyCytoData object:
>>> exprs.reductions = results
From here, you can follow the standard procedures as usual. For more documentation on this, see this tutorial.
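The same pattern works for an embedding produced by your own DR method, as long as it is an array with one row per cell. As an illustration, the sketch below swaps in PCA computed directly with NumPy's SVD as the "custom" method. The function pca_embedding, the key "custom_pca", and the random matrix standing in for exprs.expression_matrix are all hypothetical. The embedding is then wrapped with dr.Reductions(reductions=...), as in the UMAP example, only if CytofDR is installed.

```python
import numpy as np


def pca_embedding(data: np.ndarray, out_dims: int = 2) -> np.ndarray:
    """Project cells (rows) onto the top `out_dims` principal components.

    Hypothetical stand-in for "your own DR method"; not part of CytofDR.
    """
    centered = data - data.mean(axis=0)  # PCA requires mean-centered columns
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dims].T


# Hypothetical stand-in for exprs.expression_matrix: 500 cells x 30 markers.
rng = np.random.default_rng(42)
expression_matrix = rng.normal(size=(500, 30))

# Named embeddings, in the same dict form passed to dr.Reductions above.
embeddings = {"custom_pca": pca_embedding(expression_matrix, out_dims=2)}

try:
    from CytofDR import dr  # optional dependency
    results = dr.Reductions(reductions=embeddings)
    # With a real PyCytoData object you would then run: exprs.reductions = results
except ImportError:
    results = None  # CytofDR not installed; the raw embedding is still usable
```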