Module: PyCytoData.preprocess
- PyCytoData.preprocess.arcsinh(data: ArrayLike, channels: ArrayLike | None = None, transform_channels: ArrayLike | None = None, cofactor: int = 5) → ndarray
Arcsinh transformation for CyTOF data.
Arcsinh transformation is often the first step in preprocessing CyTOF data. This function lets users transform their data with a cofactor of their choice and restrict the transformation to the channels they specify.
- Parameters:
data (ArrayLike) – The expression matrix array of two dimensions.
channels (ArrayLike, optional) – The channel names of the expression matrix in the order of the columns, defaults to None
transform_channels (ArrayLike, optional) – The channels to transform, specified by name, defaults to None
cofactor (int) – The cofactor, defaults to 5
- Returns:
The arcsinh transformed expression matrix.
- Return type:
np.ndarray
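As a rough illustration of what such a transformation computes, the sketch below applies the common convention arcsinh(x / cofactor) to selected columns using NumPy only. It is not the library's source: the helper name `arcsinh_sketch` is hypothetical, and treating `transform_channels=None` as "transform every column" is an assumption.

```python
import numpy as np

def arcsinh_sketch(data, channels=None, transform_channels=None, cofactor=5):
    """Hypothetical helper: arcsinh(x / cofactor) on selected columns."""
    out = np.array(data, dtype=float)  # copy so the input is not modified
    if transform_channels is None:
        # Assumption: with no selection, every column is transformed.
        return np.arcsinh(out / cofactor)
    if channels is None:
        raise ValueError("channels is required to select columns by name")
    # Boolean mask over columns whose name appears in transform_channels.
    cols = np.isin(np.asarray(channels), np.asarray(transform_channels))
    out[:, cols] = np.arcsinh(out[:, cols] / cofactor)
    return out

expr = np.array([[0.0, 5.0], [10.0, 50.0]])
result = arcsinh_sketch(expr, channels=["CD3", "Time"], transform_channels=["CD3"])
```

Here only the `CD3` column is transformed and the `Time` column passes through unchanged, which is the usual reason for selecting channels by name.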
- PyCytoData.preprocess.bead_normalization(data: ArrayLike, channels: ArrayLike, bead_channels: ArrayLike, time_channel: ArrayLike, transform_channels: ArrayLike) → Tuple[ndarray, ndarray]
Bead normalization to correct time-shift throughout an experiment.
CyTOF data can exhibit a phenomenon known as time-dependent shift, in which overall expression becomes biased over the course of an acquisition. Bead normalization is used to correct this.
- Parameters:
data (ArrayLike) – The expression matrix array of two dimensions.
channels (ArrayLike) – The channel names of the expression matrix in the order of the columns
bead_channels (ArrayLike) – The bead channels, specified by name
time_channel (ArrayLike) – The time channel, specified by name
transform_channels (ArrayLike) – The channels to which the normalization is applied, specified by name
- Returns:
A tuple of the normalized expression matrix and indices based on original data.
- Return type:
Tuple[np.ndarray, np.ndarray]
- Raises:
ValueError – More than 1 “Time” channel provided.
- PyCytoData.preprocess.gate_center_offset_residual(data: ArrayLike, channels: ArrayLike, cor_channels: ArrayLike, cutoff_quantile: float = 0.03) → Tuple[ndarray, ndarray]
Gating for center, offset, and residual cells.
This gating procedure gates for cells using the center, offset, and residual channels. All channel names and the three channels are needed.
- Parameters:
data (ArrayLike) – The expression matrix array of two dimensions.
channels (ArrayLike) – The channel names of the expression matrix in the order of the columns
cor_channels (ArrayLike) – The center, offset, and residual channels, specified by name
cutoff_quantile (float) – The top and bottom quantile to be excluded, defaults to 0.03.
- Returns:
A tuple of the gated expression matrix and indices based on original data.
- Return type:
Tuple[np.ndarray, np.ndarray]
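The idea of excluding the top and bottom quantile can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper `quantile_gate_sketch` is hypothetical, and it assumes the cutoff is applied to each gate channel independently, with a cell kept only if it passes all of them.

```python
import numpy as np

def quantile_gate_sketch(data, channels, gate_channels, cutoff_quantile=0.03):
    """Hypothetical helper: drop cells in the top or bottom quantile of any gate channel."""
    data = np.asarray(data, dtype=float)
    cols = np.flatnonzero(np.isin(np.asarray(channels), np.asarray(gate_channels)))
    keep = np.ones(data.shape[0], dtype=bool)
    for col in cols:
        # Per-channel lower and upper thresholds (assumed, not confirmed by the source).
        low, high = np.quantile(data[:, col], [cutoff_quantile, 1 - cutoff_quantile])
        keep &= (data[:, col] >= low) & (data[:, col] <= high)
    indices = np.flatnonzero(keep)  # row positions in the original matrix
    return data[indices], indices

# Toy data: three gate columns over 101 cells.
cells = np.column_stack([
    np.arange(101.0),        # "Center"
    np.arange(101.0)[::-1],  # "Offset"
    np.full(101, 1.0),       # "Residual"
])
gated, kept = quantile_gate_sketch(
    cells, ["Center", "Offset", "Residual"], ["Center", "Offset", "Residual"]
)
```

Returning the row indices alongside the gated matrix mirrors the documented return value and lets callers subset any per-cell metadata the same way.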
- PyCytoData.preprocess.gate_debris_removal(data: ArrayLike, channels: ArrayLike, bead_channels: ArrayLike) → Tuple[ndarray, ndarray]
Pre-gating to remove debris.
This is the first step of the gating procedure and removes debris. Channel names and bead channel names are needed.
- Parameters:
data (ArrayLike) – The expression matrix array of two dimensions.
channels (ArrayLike) – The channel names of the expression matrix in the order of the columns
bead_channels (ArrayLike) – The bead channels, specified by name
- Returns:
A tuple of the gated expression matrix and indices based on original data.
- Return type:
Tuple[np.ndarray, np.ndarray]
- PyCytoData.preprocess.gate_intact_cells(data: ArrayLike, channels: ArrayLike, DNA_channels: ArrayLike, cutoff_DNA_sd: float = 2) → Tuple[ndarray, ndarray]
Gating for intact cells.
This gating procedure gates for intact cells following the debris removal procedure. All channel names and DNA channel names are needed.
- Parameters:
data (ArrayLike) – The expression matrix array of two dimensions.
channels (ArrayLike) – The channel names of the expression matrix in the order of the columns
DNA_channels (ArrayLike) – The DNA channels, specified by name
cutoff_DNA_sd (float) – The number of standard deviations away from the mean to use as a cutoff for DNA channels, defaults to 2.
- Returns:
A tuple of the gated expression matrix and indices based on original data.
- Return type:
Tuple[np.ndarray, np.ndarray]
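A standard-deviation cutoff of this kind can be sketched with NumPy as below. This is an illustration only: the helper `sd_gate_sketch` is hypothetical, and the symmetric cutoff around the mean, the population standard deviation, and the requirement that a cell pass every DNA channel are all assumptions rather than confirmed library behavior.

```python
import numpy as np

def sd_gate_sketch(data, channels, dna_channels, cutoff_sd=2.0):
    """Hypothetical helper: keep cells within cutoff_sd SDs of the mean in each DNA channel."""
    data = np.asarray(data, dtype=float)
    cols = np.isin(np.asarray(channels), np.asarray(dna_channels))
    dna = data[:, cols]
    mean = dna.mean(axis=0)
    sd = dna.std(axis=0)  # population SD (assumption)
    keep = np.all(np.abs(dna - mean) <= cutoff_sd * sd, axis=1)
    indices = np.flatnonzero(keep)  # row positions in the original matrix
    return data[indices], indices

# Toy data: nine cells with similar DNA signal and one outlier.
dna_signal = np.array([10.0] * 9 + [100.0])
cells = np.column_stack([dna_signal, dna_signal, np.arange(10.0)])
intact, intact_idx = sd_gate_sketch(cells, ["DNA1", "DNA2", "CD3"], ["DNA1", "DNA2"])
```

In this toy data the mean is 19 and the standard deviation is 27, so the outlier at 100 lies three standard deviations away and is removed while the other nine cells are kept.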
- PyCytoData.preprocess.gate_live_cells(data: ArrayLike, channels: ArrayLike, dead_channel: ArrayLike, cutoff_quantile: float = 0.03) → Tuple[ndarray, ndarray]
Gating for live cells.
This gating procedure gates for living cells following the gating procedure for intact cells. All channel names and the ‘Dead’ channel name are needed.
- Parameters:
data (ArrayLike) – The expression matrix array of two dimensions.
channels (ArrayLike) – The channel names of the expression matrix in the order of the columns
dead_channel (ArrayLike) – The dead channel, specified by name
cutoff_quantile (float) – The top quantile to be excluded, defaults to 0.03.
- Returns:
A tuple of the gated expression matrix and indices based on original data.
- Return type:
Tuple[np.ndarray, np.ndarray]
- Raises:
ValueError – More than 1 “Dead” channel provided.
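The top-quantile exclusion and the single-channel check can be sketched with NumPy as follows. This is a hedged illustration, not the library's code: the helper `live_gate_sketch` is hypothetical, and the exact threshold comparison is an assumption.

```python
import numpy as np

def live_gate_sketch(data, channels, dead_channel, cutoff_quantile=0.03):
    """Hypothetical helper: drop cells in the top quantile of a single dead channel."""
    data = np.asarray(data, dtype=float)
    dead = np.atleast_1d(np.asarray(dead_channel))
    if dead.size > 1:
        # Mirrors the documented error condition.
        raise ValueError('More than 1 "Dead" channel provided.')
    col = np.flatnonzero(np.asarray(channels) == dead[0])[0]
    # Cells above this threshold stain strongly for the dead marker (assumed rule).
    threshold = np.quantile(data[:, col], 1 - cutoff_quantile)
    indices = np.flatnonzero(data[:, col] <= threshold)
    return data[indices], indices

# Toy data: one marker column and one dead-stain column over 101 cells.
cells = np.column_stack([np.ones(101), np.arange(101.0)])
live, live_idx = live_gate_sketch(cells, ["CD3", "Dead"], ["Dead"])
```

Unlike the center, offset, and residual gate, only the high end of the distribution is removed here, since low signal in the dead channel is the desired outcome.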