SPEC 7 — Seeding Pseudo-Random Number Generation

Authors:
Stéfan van der Walt <stefanv@berkeley.edu>, Sebastian Berg <sebastianb@nvidia.com>, Pamphile Roy <roy.pamphile@gmail.com>, Matt Haberland <mhaberla@calpoly.edu>
Discussion:
https://github.com/scipy/scipy/issues/14322
History:
https://github.com/scientific-python/specs/commits/main/spec-0007
Endorsed by:
scikit-image, ipython, scipy

Description#

Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation. This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors. Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.

We recommend:

  • standardizing the usage and interpretation of an rng keyword for seeding, and
  • avoiding the use of global state and legacy bitstream generators.

We suggest implementing these principles by:

  • deprecating uses of an existing seed argument (commonly random_state or seed) in favor of a consistent rng argument,
  • using numpy.random.default_rng to normalize the rng argument and instantiate a Generator [1], and
  • deprecating the use of numpy.random.seed to control the random state.

We are primarily concerned with API uniformity, but also encourage libraries to move towards using NumPy pseudo-random Generators because:

  1. Generators avoid problems associated with naïve seeding (e.g., using successive integers) via the SeedSequence mechanism;
  2. their use avoids relying on global state, which can make code execution harder to track and may cause problems in parallel processing scenarios.
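
As a rough illustration of the first point, the sketch below derives several independent streams from one root seed with SeedSequence.spawn instead of hand-picking successive integers. The root seed value and the number of streams are arbitrary choices made for this example.

```python
import numpy as np

# One root seed for the whole computation (12345 is an arbitrary example value).
root = np.random.SeedSequence(12345)

# Spawn one child seed per worker instead of using seeds 0, 1, 2, ...
child_seeds = root.spawn(3)
worker_rngs = [np.random.default_rng(seed) for seed in child_seeds]

# Each worker draws from its own stream.
draws = [rng.random(4) for rng in worker_rngs]

# Re-creating the root SeedSequence reproduces the same child streams.
replay = [
    np.random.default_rng(seed).random(4)
    for seed in np.random.SeedSequence(12345).spawn(3)
]
```

Because no global state is involved, each worker can be handed its own Generator without coordinating with the others.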

Scope#

This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator. It is specifically targeted toward functions that currently accept RandomState instances via an argument other than rng, or allow numpy.random.seed to control the random state, but the ideas are more broadly applicable. Random number generators other than those provided by NumPy could also be accommodated by an rng keyword, but that is beyond the scope of this SPEC.

Concepts#

  • BitGenerator: Generates a stream of pseudo-random bits. The default generator in NumPy (numpy.random.default_rng) uses PCG64.
  • Generator: Derives pseudo-random numbers from the bits produced by a BitGenerator.
  • RandomState: A legacy object in NumPy, similar to Generator, that produces random numbers based on the Mersenne Twister.
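
A minimal sketch of how these three objects relate in NumPy; the seed value 42 is arbitrary and the variable names are chosen only for illustration.

```python
import numpy as np

# A BitGenerator produces raw pseudo-random bits; PCG64 is NumPy's default.
bit_generator = np.random.PCG64(42)

# A Generator turns those bits into numbers drawn from distributions.
rng = np.random.Generator(bit_generator)
sample = rng.normal(size=3)

# default_rng is a convenience constructor that performs both steps.
default = np.random.default_rng(42)
uses_pcg64 = isinstance(default.bit_generator, np.random.PCG64)

# The legacy RandomState is built on the Mersenne Twister (MT19937).
legacy = np.random.RandomState(42)
legacy_algorithm = legacy.get_state()[0]
```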

Constraints#

NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways. Common keyword arguments include random_state and seed. In practice, the seed is also often controllable using numpy.random.seed.

Core Project Endorsement#

Endorsement of this SPEC means that a project considers the standardization and interpretation of the rng keyword, as well as avoiding use of global state and legacy bitstream generators, good ideas that are worth implementing widely.

Ecosystem Adoption#

To adopt this SPEC, a project should:

  • deprecate the use of random_state/seed arguments in favor of an rng argument in all functions where users need to control pseudo-random number generation,
  • use numpy.random.default_rng to normalize the rng argument and instantiate a Generator, and
  • deprecate the use of numpy.random.seed to control the random state.

Badges#

Projects can highlight their adoption of this SPEC by including a SPEC badge.

[![SPEC 7 — Seeding pseudo-random number generation](https://img.shields.io/badge/SPEC-7-green?labelColor=%23004811&color=%235CA038)](https://scientific-python.org/specs/spec-0007/)
|SPEC 7 — Seeding pseudo-random number generation| 

.. |SPEC 7 — Seeding pseudo-random number generation| image:: https://img.shields.io/badge/SPEC-7-green?labelColor=%23004811&color=%235CA038
   :target: https://scientific-python.org/specs/spec-0007/
To indicate adoption of multiple SPECS with one badge, see this.

Implementation#

Legacy behavior in packages such as scikit-learn (sklearn.utils.check_random_state) typically handles None (use the global seed state), an int (convert to RandomState), or a RandomState object.

Our recommendation here is a deprecation strategy which does not in all cases adhere to the Hinsen principle [2], although it could very nearly do so by enforcing the use of rng as a keyword argument.

The deprecation strategy is as follows.

Initially, accept both rng and the existing random_state/seed/... keyword arguments.

  • If both are specified by the user, raise an error.
  • If rng is passed by keyword, normalize it with np.random.default_rng() and use it to generate random numbers as needed.
  • If random_state/seed/... is specified (by keyword or position, if allowed), preserve existing behavior.
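
The three rules above might look roughly like this inside a single function. This is only a sketch: sample_mean and the _NOT_PASSED sentinel are hypothetical names, and the legacy branch assumes the library previously accepted None, an int, or a RandomState instance. The decorator in the Code section is the approach this SPEC actually demonstrates.

```python
import numpy as np

_NOT_PASSED = object()  # hypothetical sentinel: "argument was not supplied"


def sample_mean(n, *, rng=_NOT_PASSED, random_state=_NOT_PASSED):
    """Hypothetical function in the initial phase of the transition."""
    # Rule 1: specifying both arguments is an error.
    if rng is not _NOT_PASSED and random_state is not _NOT_PASSED:
        raise TypeError("Specify only one of `rng` and `random_state`.")

    # Rule 2: `rng` passed by keyword is normalized with default_rng.
    if rng is not _NOT_PASSED:
        return np.random.default_rng(rng).random(n).mean()

    # Rule 3: otherwise preserve the (assumed) legacy behavior.
    if random_state is _NOT_PASSED or random_state is None:
        return np.random.random_sample(n).mean()  # global state
    if isinstance(random_state, np.random.RandomState):
        legacy = random_state
    else:
        legacy = np.random.RandomState(random_state)
    return legacy.random_sample(n).mean()
```

Using a sentinel rather than None as the default lets the function distinguish "rng=None was passed explicitly" from "nothing was passed", which matters because only the latter keeps the legacy global-state behavior.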

After rng becomes available in all releases within the support window suggested by SPEC 0, emit warnings as follows:

  • If neither rng nor random_state/seed/... is specified and np.random.seed has been used to set the seed, emit a FutureWarning about the upcoming change in behavior.

  • If random_state/seed/... is passed by keyword or by position, treat it as before, but:

    • Emit a DeprecationWarning if passed by keyword, warning about the deprecation of keyword random_state in favor of rng.
    • Emit a FutureWarning if passed by position, warning about the change in behavior of the positional argument.
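
To make the keyword case concrete, here is a small sketch of emitting the DeprecationWarning and observing it from calling code. The function legacy_draw and its message text are hypothetical; the sketch covers only the keyword case.

```python
import warnings

import numpy as np


def legacy_draw(*, random_state):
    """Hypothetical function that still accepts `random_state` by keyword."""
    warnings.warn(
        "Keyword `random_state` is deprecated; use `rng` instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # Existing behavior is preserved while the warning is in effect.
    return np.random.RandomState(random_state).random_sample()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ensure the warning is not filtered out
    value = legacy_draw(random_state=0)

categories = [w.category for w in caught]
```

Note that Python's default filters hide DeprecationWarning unless it is triggered by code in __main__, whereas FutureWarning is shown to end users by default, so the two categories reach different audiences.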

After the deprecation period, accept only rng, raising an error if random_state/seed/... is provided.

At this point, the function signature, with type annotations, could look like this:

from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...

Also note the suggested language for the rng parameter docstring, which encourages the user to pass a Generator or None, but allows for other types accepted by numpy.random.default_rng (captured by the type annotation).
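
For reference, the following sketch shows how numpy.random.default_rng treats several of the input types covered by the annotation. The seed value 2024 is arbitrary.

```python
import numpy as np

# A Generator is returned unaltered, so its state is shared with the caller.
generator = np.random.default_rng(2024)
same_object = np.random.default_rng(generator) is generator

# A BitGenerator is wrapped in a new Generator.
bit_generator = np.random.PCG64(2024)
wrapped = np.random.default_rng(bit_generator)
wraps_bitgen = wrapped.bit_generator is bit_generator

# An int or a SeedSequence seeds a fresh, reproducible Generator.
from_int = np.random.default_rng(2024).random(3)
from_seed_seq = np.random.default_rng(np.random.SeedSequence(2024)).random(3)

# None draws fresh entropy from the operating system.
unseeded = np.random.default_rng(None).random(3)
```

Because a Generator passes through unchanged, a caller can hand the same Generator to several functions and have them all advance one shared stream.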

Impact#

There are three classes of users, which will be affected to varying degrees.

  1. Those who do not attempt to control the random state. Their code will switch from using the unseeded global RandomState to using an unseeded Generator. Since the underlying distributions of pseudo-random numbers will not change, these users should be largely unaffected. While technically this change does not adhere to the Hinsen principle, its impact should be minimal.

  2. Users of random_state/seed arguments. Support for these arguments will be dropped eventually, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new rng keyword.

  3. Those who use numpy.random.seed. The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded. To ensure that this does not go unnoticed, libraries that allowed for control of the random state via numpy.random.seed should raise a FutureWarning if np.random.seed has been called. (See Code below for an example.) To fully adhere to the Hinsen principle, these warnings should instead be raised as errors. In response, users will have to switch from using numpy.random.seed to passing the rng argument explicitly to all functions that accept it.
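
The migration for the third group could look something like the sketch below. The function add_noise is a hypothetical stand-in for a library function that has completed the transition.

```python
import numpy as np


def add_noise(values, *, rng=None):
    """Hypothetical library function that has completed the transition."""
    rng = np.random.default_rng(rng)
    return np.asarray(values) + rng.normal(size=len(values))


data = [0.0, 1.0, 2.0]

# Before: seeding the global state. This no longer influences `add_noise`.
np.random.seed(0)
first = add_noise(data)
np.random.seed(0)
second = add_noise(data)  # differs from `first` despite the identical seed

# After: pass the seed (or a Generator) explicitly.
third = add_noise(data, rng=0)
fourth = add_noise(data, rng=0)  # identical to `third`
```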

Code#

As an example, consider how a SciPy function would transition from a random_state parameter to an rng parameter using a decorator.

import numpy as np
import functools
import warnings


def _transition_to_rng(old_name, *, position_num=None, end_version=None):
    """Example decorator to transition from old PRNG usage to new `rng` behavior

    Suppose the decorator is applied to a function that used to accept parameter
    `old_name='random_state'` either by keyword or as a positional argument at
    `position_num=1`. At the time of application, the name of the argument in the
    function signature is manually changed to the new name, `rng`. If positional
    use was allowed before, this is not changed.*

    - If the function is called with both `random_state` and `rng`, the decorator
      raises an error.
    - If `random_state` is provided as a keyword argument, the decorator passes
      `random_state` to the function's `rng` argument as a keyword. If `end_version`
      is specified, the decorator will emit a `DeprecationWarning` about the
      deprecation of keyword `random_state`.
    - If `random_state` is provided as a positional argument, the decorator passes
      `random_state` to the function's `rng` argument by position. If `end_version`
      is specified, the decorator will emit a `FutureWarning` about the changing
      interpretation of the argument.
    - If `rng` is provided as a keyword argument, the decorator validates `rng` using
      `numpy.random.default_rng` before passing it to the function.
    - If `end_version` is specified and neither `random_state` nor `rng` is provided
      by the user, the decorator checks whether `np.random.seed` has been used to set
      the global seed. If so, it emits a `FutureWarning`, noting that usage of
      `numpy.random.seed` will eventually have no effect. Either way, the decorator
      calls the function without explicitly passing the `rng` argument.

    If `end_version` is specified, a user must pass `rng` as a keyword to avoid warnings.

    After the deprecation period, the decorator can be removed, and the function
    can simply validate the `rng` argument by calling `np.random.default_rng(rng)`.

    * A `FutureWarning` is emitted when the PRNG argument is used by
      position. It indicates that the "Hinsen principle" (same
      code yielding different results in two versions of the software)
      will be violated, unless positional use is deprecated. Specifically:

      - If `None` is passed by position and `np.random.seed` has been used,
        the function will change from being seeded to being unseeded.
      - If an integer is passed by position, the random stream will change.
      - If `np.random` or an instance of `RandomState` is passed by position,
        an error will be raised.

      We suggest that projects consider deprecating positional use of
      `random_state`/`rng` (i.e., change their function signatures to
      ``def my_func(..., *, rng=None)``); that might not make sense
      for all projects, so this SPEC does not make that
      recommendation, neither does this decorator enforce it.

    Parameters
    ----------
    old_name : str
        The old name of the PRNG argument (e.g. `seed` or `random_state`).
    position_num : int, optional
        The (0-indexed) position of the old PRNG argument (if accepted by position).
        Maintainers are welcome to eliminate this argument and use, for example,
        `inspect`, if preferred.
    end_version : str, optional
        The full version number of the library when the behavior described in
        `DeprecationWarning`s and `FutureWarning`s will take effect. If left
        unspecified, no warnings will be emitted by the decorator.

    """
    NEW_NAME = "rng"

    # Note the leading space: this text is appended to each specific message.
    cmn_msg = (
        " To silence this warning and ensure consistent behavior in SciPy "
        f"{end_version}, control the RNG using argument `{NEW_NAME}`. Arguments passed "
        f"to keyword `{NEW_NAME}` will be validated by `np.random.default_rng`, so the "
        "behavior corresponding with a given value may change compared to use of "
        f"`{old_name}`. For example, "
        "1) `None` will result in unpredictable random numbers, "
        "2) an integer will result in a different stream of random numbers (with the "
        "same distribution), and "
        "3) `np.random` or `RandomState` instances will result in an error. "
        "See the documentation of `default_rng` for more information."
    )

    def decorator(fun):
        @functools.wraps(fun)
        def wrapper(*args, **kwargs):
            # Determine how PRNG was passed
            as_old_kwarg = old_name in kwargs
            as_new_kwarg = NEW_NAME in kwargs
            as_pos_arg = position_num is not None and len(args) >= position_num + 1
            emit_warning = end_version is not None

            # Can only specify PRNG one of the three ways
            if int(as_old_kwarg) + int(as_new_kwarg) + int(as_pos_arg) > 1:
                message = (
                    f"{fun.__name__}() got multiple values for "
                    f"argument now known as `{NEW_NAME}`"
                )
                raise TypeError(message)

            # Check whether global random state has been set
            global_seed_set = np.random.mtrand._rand._bit_generator._seed_seq is None

            if as_old_kwarg:  # warn about deprecated use of old kwarg
                kwargs[NEW_NAME] = kwargs.pop(old_name)
                if emit_warning:
                    message = (
                        f"Use of keyword argument `{old_name}` is "
                        f"deprecated and replaced by `{NEW_NAME}`.  "
                        f"Support for `{old_name}` will be removed "
                        f"in SciPy {end_version}."
                    ) + cmn_msg
                    warnings.warn(message, DeprecationWarning, stacklevel=2)

            elif as_pos_arg:
                # Warn about changing meaning of positional arg

                # Note that this decorator does not deprecate positional use of the
                # argument; it only warns that the behavior will change in the future.
                # Simultaneously transitioning to keyword-only use is another option.

                arg = args[position_num]
                # If the argument is None and the global seed wasn't set, or if the
                # argument is one of a few new classes, the user will not notice change
                # in behavior.
                ok_classes = (
                    np.random.Generator,
                    np.random.SeedSequence,
                    np.random.BitGenerator,
                )
                if (arg is None and not global_seed_set) or isinstance(arg, ok_classes):
                    pass
                elif emit_warning:
                    message = (
                        f"Positional use of `{NEW_NAME}` (formerly known as "
                        f"`{old_name}`) is still allowed, but the behavior is "
                        "changing: the argument will be normalized using "
                        f"`np.random.default_rng` beginning in SciPy {end_version}, "
                        "and the resulting `Generator` will be used to generate "
                        "random numbers."
                    ) + cmn_msg
                    warnings.warn(message, FutureWarning, stacklevel=2)

            elif as_new_kwarg:  # no warnings; this is the preferred use
                # After the removal of the decorator, normalization with
                # np.random.default_rng will be done inside the decorated function
                kwargs[NEW_NAME] = np.random.default_rng(kwargs[NEW_NAME])

            elif global_seed_set and emit_warning:
                # Emit FutureWarning if `np.random.seed` was used and no PRNG was passed
                message = (
                    "The NumPy global RNG was seeded by calling "
                    f"`np.random.seed`. Beginning in SciPy {end_version}, this "
                    "function will no longer use the global RNG."
                ) + cmn_msg
                warnings.warn(message, FutureWarning, stacklevel=2)

            return fun(*args, **kwargs)

        return wrapper

    return decorator


# Example usage of the `_transition_to_rng` decorator.

# Suppose a library uses a custom random state normalisation function, such as
from scipy._lib._util import check_random_state

# https://github.com/scipy/scipy/blob/94532e74b902b569bfad504866cb53720c5f4f31/scipy/_lib/_util.py#L253


# Suppose a function `library_function` is defined as:
def library_function(arg1, random_state=None, arg2=0):
    random_state = check_random_state(random_state)
    return random_state.random() * arg1 + arg2


# We apply the decorator and change the function signature at the same time.
# The use of `random_state` throughout the function may be replaced with `rng`,
# or the variable may be defined as `random_state = rng`.
@_transition_to_rng("random_state", position_num=1)
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return rng.random() * arg1 + arg2


# After `rng` is available in all releases within the support window suggested by
# SPEC 0, we pass the `end_version` param to the decorator to emit warnings.
@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return rng.random() * arg1 + arg2


# At the end of the deprecation period, remove the decorator, and normalize
# `rng` with `np.random.default_rng`.
def library_function(arg1, rng=None, arg2=0):
    rng = np.random.default_rng(rng)
    return rng.random() * arg1 + arg2

Notes#


  1. Note that numpy.random.default_rng does not accept instances of RandomState, so use of RandomState to control the seed is effectively deprecated, too. That said, neither np.random.seed nor np.random.RandomState is itself deprecated, so they may still be used in some contexts (e.g. by developers for generating unit test data).

  2. The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error.
