Module: PyCytoData.preprocess

PyCytoData.preprocess.arcsinh(data: ArrayLike, channels: ArrayLike | None = None, transform_channels: ArrayLike | None = None, cofactor: int = 5) → ndarray

Arcsinh transformation for CyTOF data.

Arcsinh transformation is often the first step in preprocessing CyTOF data. This function lets users choose the cofactor and, optionally, restrict the transformation to a chosen set of channels.

Parameters:
  • data (ArrayLike) – The two-dimensional expression matrix.

  • channels (ArrayLike, optional) – The channel names of the expression matrix in column order, defaults to None

  • transform_channels (ArrayLike, optional) – The names of the channels to transform, defaults to None

  • cofactor (int) – The cofactor, defaults to 5

Returns:

The arcsinh transformed expression matrix.

Return type:

np.ndarray
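A minimal NumPy sketch of the behavior described above, assuming the transform is arcsinh(x / cofactor) applied element-wise and that every column is transformed when no channel selection is given. `arcsinh_sketch` is a hypothetical helper for illustration, not PyCytoData's implementation.

```python
import numpy as np


def arcsinh_sketch(data, channels=None, transform_channels=None, cofactor=5):
    """Apply arcsinh(x / cofactor), optionally only to named channels (hypothetical sketch)."""
    out = np.array(data, dtype=float)  # copy so the caller's array is left untouched
    if channels is None or transform_channels is None:
        # Assumption: with no channel selection, every column is transformed.
        return np.arcsinh(out / cofactor)
    # Map channel names to column positions, then transform only those columns.
    cols = np.flatnonzero(np.isin(np.asarray(channels), transform_channels))
    out[:, cols] = np.arcsinh(out[:, cols] / cofactor)
    return out


X = [[5.0, 10.0], [0.0, -5.0]]
print(arcsinh_sketch(X, channels=["CD3", "Time"], transform_channels=["CD3"]))
```

Because arcsinh(y) is close to y for small y and close to ln(2y) for large y, the cofactor sets where the curve switches from roughly linear to roughly logarithmic; 5 is the value commonly used for CyTOF.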

PyCytoData.preprocess.bead_normalization(data: ArrayLike, channels: ArrayLike, bead_channels: ArrayLike, time_channel: ArrayLike, transform_channels: ArrayLike) → Tuple[ndarray, ndarray]

Bead normalization to correct time-shift throughout an experiment.

CyTOF data can exhibit a time-dependent shift, in which the overall measured expression drifts as acquisition proceeds. Bead normalization is used to correct this shift.

Parameters:
  • data (ArrayLike) – The two-dimensional expression matrix.

  • channels (ArrayLike) – The channel names of the expression matrix in column order

  • bead_channels (ArrayLike) – The names of the bead channels

  • time_channel (ArrayLike) – The name of the time channel

  • transform_channels (ArrayLike) – The names of the channels to which the normalization is applied

Returns:

A tuple of the normalized expression matrix and the corresponding row indices in the original data.

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

ValueError – More than 1 “Time” channel provided.
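The sketch below illustrates the general idea of bead-based drift correction under simplifying assumptions; it is not PyCytoData's algorithm. It assumes an event is a bead when every bead channel exceeds a user-supplied `bead_threshold` (a hypothetical parameter), smooths the mean bead intensity over time with a moving median of `window` beads (also hypothetical), and rescales the `transform_channels` of non-bead events by the ratio of the overall bead median to the time-matched bead level. Bead events are dropped, and the returned indices point at the remaining rows of the input.

```python
import numpy as np


def bead_normalize_sketch(data, channels, bead_channels, time_channel,
                          transform_channels, *, bead_threshold, window=5):
    """Simplified bead normalization (hypothetical sketch, not PyCytoData's code)."""
    data = np.asarray(data, dtype=float)
    channels = np.asarray(channels)
    time_idx = np.flatnonzero(np.isin(channels, time_channel))
    if time_idx.size != 1:
        raise ValueError("Exactly one time channel is required.")
    t = data[:, time_idx[0]]
    bead_idx = np.flatnonzero(np.isin(channels, bead_channels))
    transform_idx = np.flatnonzero(np.isin(channels, transform_channels))

    # Assumed gating rule: an event is a bead if every bead channel exceeds the threshold.
    is_bead = np.all(data[:, bead_idx] > bead_threshold, axis=1)
    bead_rows = np.flatnonzero(is_bead)
    bead_rows = bead_rows[np.argsort(t[bead_rows], kind="stable")]  # order beads by time

    # Per-bead intensity (mean over bead channels), smoothed by a centered moving median.
    signal = data[np.ix_(bead_rows, bead_idx)].mean(axis=1)
    half = window // 2
    smoothed = np.array([np.median(signal[max(0, i - half):i + half + 1])
                         for i in range(signal.size)])

    # Time-matched bead level for each non-bead event; np.interp clamps outside the bead range.
    keep = np.flatnonzero(~is_bead)
    local = np.interp(t[keep], t[bead_rows], smoothed)
    factor = np.median(signal) / local

    normalized = data[keep].copy()
    normalized[:, transform_idx] *= factor[:, None]
    return normalized, keep
```

Because beads have a fixed true intensity, any trend in their measured signal is attributed to instrument drift; dividing that trend out flattens the same drift in the cell channels.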

PyCytoData.preprocess.gate_center_offset_residual(data: ArrayLike, channels: ArrayLike, cor_channels: ArrayLike, cutoff_quantile: float = 0.03) → Tuple[ndarray, ndarray]

Gating based on the Center, Offset, and Residual channels.

This procedure gates cells using the Center, Offset, and Residual channels. It requires the names of all channels and the names of these three channels.

Parameters:
  • data (ArrayLike) – The two-dimensional expression matrix.

  • channels (ArrayLike) – The channel names of the expression matrix in column order

  • cor_channels (ArrayLike) – The names of the Center, Offset, and Residual channels

  • cutoff_quantile (float) – The quantile to exclude from both the top and the bottom, defaults to 0.03.

Returns:

A tuple of the gated expression matrix and the corresponding row indices in the original data.

Return type:

Tuple[np.ndarray, np.ndarray]
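A sketch of quantile gating on the Center, Offset, and Residual channels, assuming an event is kept only if it lies between the `cutoff_quantile` and `1 - cutoff_quantile` quantiles in all three channels. `gate_cor_sketch` is a hypothetical helper, not the library's code.

```python
import numpy as np


def gate_cor_sketch(data, channels, cor_channels, cutoff_quantile=0.03):
    """Keep events inside the central quantile range of every COR channel (hypothetical sketch)."""
    data = np.asarray(data, dtype=float)
    cols = np.flatnonzero(np.isin(np.asarray(channels), cor_channels))
    sub = data[:, cols]
    # Per-channel lower and upper cutoffs.
    lower = np.quantile(sub, cutoff_quantile, axis=0)
    upper = np.quantile(sub, 1 - cutoff_quantile, axis=0)
    # An event must fall inside the range in all selected channels.
    keep = np.flatnonzero(np.all((sub >= lower) & (sub <= upper), axis=1))
    return data[keep], keep
```

With the default of 0.03, each channel on its own removes roughly 6% of events; requiring all three channels to pass can remove more.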

PyCytoData.preprocess.gate_debris_removal(data: ArrayLike, channels: ArrayLike, bead_channels: ArrayLike) → Tuple[ndarray, ndarray]

Pre-gating to remove debris.

This is the first step of the gating procedure and removes debris. It requires the names of all channels and the names of the bead channels.

Parameters:
  • data (ArrayLike) – The two-dimensional expression matrix.

  • channels (ArrayLike) – The channel names of the expression matrix in column order

  • bead_channels (ArrayLike) – The names of the bead channels

Returns:

A tuple of the gated expression matrix and the corresponding row indices in the original data.

Return type:

Tuple[np.ndarray, np.ndarray]
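The description above does not state the exact debris rule, so the following sketch assumes one: an event is kept only if it lies at or below the `cutoff_quantile` quantile of every bead channel, which discards the events with the highest bead-channel signal. `gate_debris_sketch` and its `cutoff_quantile` parameter are hypothetical and are not part of the documented signature.

```python
import numpy as np


def gate_debris_sketch(data, channels, bead_channels, cutoff_quantile=0.95):
    """Drop events with high bead-channel signal (hypothetical sketch of a debris gate)."""
    data = np.asarray(data, dtype=float)
    cols = np.flatnonzero(np.isin(np.asarray(channels), bead_channels))
    sub = data[:, cols]
    # Per-channel upper cutoff; events above it in any bead channel are removed.
    cutoff = np.quantile(sub, cutoff_quantile, axis=0)
    keep = np.flatnonzero(np.all(sub <= cutoff, axis=1))
    return data[keep], keep
```

Using `<=` rather than `<` means that a bead channel with no elevated events (for example, all zeros) removes nothing.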

PyCytoData.preprocess.gate_intact_cells(data: ArrayLike, channels: ArrayLike, DNA_channels: ArrayLike, cutoff_DNA_sd: float = 2) → Tuple[ndarray, ndarray]

Gating for intact cells.

This procedure gates for intact cells and follows debris removal. It requires the names of all channels and the names of the DNA channels.

Parameters:
  • data (ArrayLike) – The two-dimensional expression matrix.

  • channels (ArrayLike) – The channel names of the expression matrix in column order

  • DNA_channels (ArrayLike) – The names of the DNA channels

  • cutoff_DNA_sd (float) – The number of standard deviations away from the mean to use as a cutoff for DNA channels, defaults to 2.

Returns:

A tuple of the gated expression matrix and the corresponding row indices in the original data.

Return type:

Tuple[np.ndarray, np.ndarray]
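A sketch of the DNA gate, assuming a two-sided rule: an event is kept when each DNA channel lies within `cutoff_DNA_sd` population standard deviations of that channel's mean. `gate_intact_sketch` is a hypothetical helper; the library may define the cutoff differently (for example, one-sided or on transformed data).

```python
import numpy as np


def gate_intact_sketch(data, channels, DNA_channels, cutoff_DNA_sd=2):
    """Keep events whose DNA signal is within k SDs of the mean (hypothetical sketch)."""
    data = np.asarray(data, dtype=float)
    cols = np.flatnonzero(np.isin(np.asarray(channels), DNA_channels))
    sub = data[:, cols]
    mean = sub.mean(axis=0)
    sd = sub.std(axis=0)  # population standard deviation (ddof=0)
    # An event must be within the band in every DNA channel.
    within = np.abs(sub - mean) <= cutoff_DNA_sd * sd
    keep = np.flatnonzero(np.all(within, axis=1))
    return data[keep], keep
```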

PyCytoData.preprocess.gate_live_cells(data: ArrayLike, channels: ArrayLike, dead_channel: ArrayLike, cutoff_quantile: float = 0.03) → Tuple[ndarray, ndarray]

Gating for live cells.

This procedure gates for live cells and follows the intact-cell gate. It requires the names of all channels and the name of the ‘Dead’ channel.

Parameters:
  • data (ArrayLike) – The two-dimensional expression matrix.

  • channels (ArrayLike) – The channel names of the expression matrix in column order

  • dead_channel (ArrayLike) – The name of the ‘Dead’ channel

  • cutoff_quantile (float) – The top quantile to exclude, defaults to 0.03.

Returns:

A tuple of the gated expression matrix and the corresponding row indices in the original data.

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

ValueError – More than 1 “Dead” channel provided.
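A sketch of the live-cell gate, assuming that high signal in the ‘Dead’ channel marks dead cells, so events above the `1 - cutoff_quantile` quantile are removed. `gate_live_sketch` is a hypothetical helper; it raises `ValueError` unless exactly one ‘Dead’ channel matches, which is slightly stricter than the documented "more than 1" condition.

```python
import numpy as np


def gate_live_sketch(data, channels, dead_channel, cutoff_quantile=0.03):
    """Remove the top quantile of the 'Dead' channel (hypothetical sketch)."""
    data = np.asarray(data, dtype=float)
    cols = np.flatnonzero(np.isin(np.asarray(channels), dead_channel))
    if cols.size != 1:
        raise ValueError("Exactly one 'Dead' channel is required.")
    dead = data[:, cols[0]]
    # Events above the upper cutoff are treated as dead and dropped.
    cutoff = np.quantile(dead, 1 - cutoff_quantile)
    keep = np.flatnonzero(dead <= cutoff)
    return data[keep], keep
```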