Bernstein Concentration Inequalities
1 Bernstein concentration inequality
The standard Bernstein inequality for real random variables demonstrates that a sum of independent bounded random variables exhibits normal concentration near its mean on a scale determined by the variance of the sum.
Theorem 1 (Bernstein inequality)
Let $X_1,\dots,X_n$ be independent and centered real random variables satisfying $|X_i|\le L$ a.s. for $i=1,\dots,n$. Let $S_n=\sum_{i=1}^n X_i$ and assume $\sigma^2=\sum_{i=1}^n \mathbb{E}[X_i^2]<\infty$. Then for any $t>0$,
$$\mathbb{P}\bigl(|S_n|\ge t\bigr)\;\le\;2\exp\!\left(-\frac{t^2/2}{\sigma^2+Lt/3}\right).$$
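The tail bound above can be checked numerically by Monte Carlo. A minimal sketch, assuming uniform summands on $[-1,1]$ (so $L=1$ and each summand has variance $1/3$); all names (`n`, `trials`, `t`) are illustrative:

```python
import numpy as np

# Monte Carlo check of the Bernstein tail bound for a sum of n independent
# centered bounded variables (here uniform on [-1, 1], so L = 1).
rng = np.random.default_rng(0)
n, L, trials = 100, 1.0, 20000
X = rng.uniform(-1.0, 1.0, size=(trials, n))   # centered, |X_i| <= 1 a.s.
S = X.sum(axis=1)
sigma2 = n * (1.0 / 3.0)                       # Var(U[-1,1]) = 1/3 per summand

for t in [10.0, 20.0, 30.0]:
    empirical = np.mean(np.abs(S) >= t)
    bernstein = 2.0 * np.exp(-(t**2 / 2.0) / (sigma2 + L * t / 3.0))
    assert empirical <= bernstein              # the bound dominates the tail
    print(f"t={t:5.1f}  empirical={empirical:.4f}  bound={bernstein:.4f}")
```

For moderate $t$ the bound is loose but valid; for $t$ several standard deviations out it captures the correct exponential decay rate.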
Sums of independent random matrices exhibit the same type of behavior, where the normal concentration depends on a matrix generalization of the variance and the tails are controlled by a uniform bound on the spectral norm of each summand [3].
Theorem 2 (Matrix Bernstein concentration)
Let $X_1,\dots,X_n$ be independent random real symmetric matrices with dimension $d$. Assume each random matrix satisfies
$$\mathbb{E}[X_i]=0,\qquad \|X_i\|\le L\ \text{a.s.},$$
where $\|\cdot\|$ is the matrix spectral norm, and set $\sigma^2=\bigl\|\sum_{i=1}^n \mathbb{E}[X_i^2]\bigr\|$. Then for any $t>0$,
$$\mathbb{P}\!\left(\Bigl\|\sum_{i=1}^n X_i\Bigr\|\ge t\right)\;\le\;2d\,\exp\!\left(-\frac{t^2/2}{\sigma^2+Lt/3}\right). \tag{1}$$
A remarkable feature is that the above bound depends on the dimension $d$ of the matrices. On the other hand, in [2, Chapter 7], Tropp points out that $d$ can be replaced by the so-called “intrinsic dimension” of the variance matrix $\sum_{i=1}^n \mathbb{E}[X_i^2]$.
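The bound (1) can also be checked by simulation. A minimal sketch, assuming i.i.d. diagonal matrices with independent $\pm 1$ entries (so $L=1$ and $\sigma^2=n$); the parameters are illustrative:

```python
import numpy as np

# Monte Carlo check of the matrix Bernstein bound (1) for sums of i.i.d.
# random diagonal sign matrices: d = 5, ||X_i|| = 1 = L, sigma^2 = n.
rng = np.random.default_rng(3)
d, n, trials = 5, 50, 5000
signs = rng.choice([-1.0, 1.0], size=(trials, n, d))  # diagonal entries
S_norm = np.abs(signs.sum(axis=1)).max(axis=-1)       # ||sum X_i|| for diagonal matrices
sigma2, L = float(n), 1.0                             # ||sum E[X_i^2]|| = ||n I|| = n

for t in [15.0, 20.0]:
    empirical = np.mean(S_norm >= t)
    bound = 2 * d * np.exp(-(t**2 / 2) / (sigma2 + L * t / 3))
    assert empirical <= bound
    print(f"t={t:4.1f}  empirical={empirical:.4f}  bound={bound:.4f}")
```

Diagonal matrices make the spectral norm of the sum a simple coordinatewise maximum, which keeps the experiment transparent; the $2d$ prefactor is exactly the union bound over the $d$ diagonal coordinates here.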
2 Operator Bernstein concentration
For a positive-semidefinite Hilbert–Schmidt operator $A$, we can also define its intrinsic dimension as $\mathrm{intdim}(A)=\operatorname{tr}(A)/\|A\|$ (finite if $A$ is also trace class). The following result is from [1].
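To see how much smaller the intrinsic dimension can be than the ambient one, consider a finite-dimensional PSD matrix with a rapidly decaying spectrum; the spectrum below is an illustrative choice, not from the text:

```python
import numpy as np

# Intrinsic dimension intdim(A) = tr(A) / ||A|| of a PSD matrix: it is
# between 1 and the ambient dimension d, and much smaller than d when
# the spectrum decays quickly.
d = 200
eigvals = 1.0 / (1.0 + np.arange(d))**2         # eigenvalues 1, 1/4, 1/9, ...
A = np.diag(eigvals)                             # PSD matrix with known spectrum
intdim = np.trace(A) / np.linalg.norm(A, 2)      # trace over spectral norm

assert 1.0 <= intdim <= d
print(f"ambient dimension d = {d}, intrinsic dimension = {intdim:.3f}")
```

With this quadratic eigenvalue decay the intrinsic dimension stays below $\pi^2/6\approx 1.64$ no matter how large $d$ is, which is exactly the regime where the intrinsic-dimension bound improves on (1).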
Theorem 3 (Operator Bernstein concentration)
Let $X_1,\dots,X_n$ be independent random self-adjoint Hilbert–Schmidt operators, where each $X_i$ acts on a separable Hilbert space $H$. Assume each random operator satisfies
$$\mathbb{E}[X_i]=0,\qquad \|X_i\|\le L\ \text{a.s.},$$
where $\|\cdot\|$ is the operator norm, and set $V=\sum_{i=1}^n \mathbb{E}[X_i^2]$ and $\sigma^2=\|V\|$. Then for any $t\ge\sigma+L/3$,
$$\mathbb{P}\!\left(\Bigl\|\sum_{i=1}^n X_i\Bigr\|\ge t\right)\;\le\;2\,\mathrm{intdim}(V)\,\exp\!\left(-\frac{t^2/2}{\sigma^2+Lt/3}\right). \tag{2}$$
In statistical learning, random self-adjoint operators frequently emerge in the study of approximation errors. Assume $X$ is a random data point in a space $\mathcal{X}$. Rigorously speaking, let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $X:\Omega\to\mathcal{X}$ be measurable, such that the distribution of $X$ on $\mathcal{X}$ is $\mu$. Let $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a symmetric positive definite kernel, satisfying $\sup_{x\in\mathcal{X}}k(x,x)\le\kappa^2<\infty$. Define the kernel integral operator $T$ on $L^2(\mathcal{X},\mu)$ by
$$(Tf)(x)=\int_{\mathcal{X}} k(x,y)\,f(y)\,\mathrm{d}\mu(y),\qquad f\in L^2(\mathcal{X},\mu). \tag{3}$$
Then $T$ is a Hilbert–Schmidt operator with the Hilbert–Schmidt norm satisfying
$$\|T\|_{\mathrm{HS}}^2=\iint_{\mathcal{X}\times\mathcal{X}} k(x,y)^2\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)\;\le\;\kappa^4,$$
where we have used $k(x,y)^2\le k(x,x)\,k(y,y)\le\kappa^4$.
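The double integral above is easy to estimate by Monte Carlo. A minimal sketch, assuming the Gaussian kernel (so $\kappa=1$) and $\mu=N(0,I_2)$, both illustrative choices:

```python
import numpy as np

# Monte Carlo estimate of ||T||_HS^2 = E[k(X, Y)^2] for independent X, Y ~ mu,
# using the Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2), so kappa^2 = 1.
rng = np.random.default_rng(2)
m = 100_000
X = rng.normal(size=(m, 2))                     # X ~ N(0, I_2)
Y = rng.normal(size=(m, 2))                     # Y ~ N(0, I_2), independent of X
k2 = np.exp(-((X - Y)**2).sum(-1) / 2.0)**2     # samples of k(X, Y)^2
hs2 = k2.mean()                                 # estimate of the double integral

assert 0.0 < hs2 <= 1.0                         # consistent with the bound kappa^4 = 1
print(f"estimated ||T||_HS^2 = {hs2:.4f} <= kappa^4 = 1")
```

The estimate lands well below $\kappa^4=1$, reflecting that the Cauchy–Schwarz bound $k(x,y)^2\le k(x,x)k(y,y)$ is tight only near the diagonal.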
Let $\mathcal{H}$ be the Reproducing Kernel Hilbert Space (RKHS) with kernel $k$. If $\mathcal{X}$ is a locally compact Hausdorff space, $\mu$ a $\sigma$-finite Radon measure, and $k$ continuous, then $\mathcal{H}$ is separable. In fact, the countable collection of eigenfunctions of $T$ corresponding to positive eigenvalues forms a complete orthogonal basis of $\mathcal{H}$.
Suppose we have $n$ i.i.d. samples of $X$, i.e., $X_1,\dots,X_n$ are i.i.d. random variables with the same distribution as $X$. Define the empirical operator on $\mathcal{H}$ as
$$T_n=\frac{1}{n}\sum_{i=1}^n k_{X_i}\otimes k_{X_i}, \tag{4}$$
where $k_x=k(x,\cdot)\in\mathcal{H}$ and $(k_x\otimes k_x)f=\langle f,k_x\rangle_{\mathcal{H}}\,k_x=f(x)\,k_x$ for $f\in\mathcal{H}$.
Now $T_n$ is a random self-adjoint operator on $\mathcal{H}$, and it approximates $T$ as $n\to\infty$, where $T$ is regarded as an operator on $\mathcal{H}$ via the same integral formula (3). Notice that $T_n$ is also a Hilbert–Schmidt operator on $\mathcal{H}$.
Let $\xi_i=k_{X_i}\otimes k_{X_i}$. Then $T_n=\frac{1}{n}\sum_{i=1}^n \xi_i$, and
$$\mathbb{E}[\xi_i]=\int_{\mathcal{X}} k_x\otimes k_x\,\mathrm{d}\mu(x)=T,$$
where the Bochner integral converges in the Hilbert–Schmidt norm. Therefore, we can apply the operator Bernstein concentration inequality to deal with the sum of the i.i.d. random operators $\xi_i-\mathbb{E}[\xi_i]$ with zero means.
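The spectrum of $T_n$ is computable in practice: the nonzero eigenvalues of $T_n$ on $\mathcal{H}$ coincide with those of the scaled Gram matrix $\frac{1}{n}K$, $K_{ij}=k(X_i,X_j)$. A minimal sketch, assuming the Gaussian kernel (so $k(x,x)=1$, $\kappa^2=1$, and each $\|\xi_i\|=k(X_i,X_i)\le 1$) and standard normal data, both illustrative choices:

```python
import numpy as np

# The nonzero spectrum of the empirical operator T_n on the RKHS equals
# the spectrum of K / n, where K is the kernel Gram matrix.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))                          # i.i.d. samples in R^2
sq = ((X[:, None, :] - X[None, :, :])**2).sum(-1)    # pairwise squared distances
K = np.exp(-sq / 2.0)                                # Gaussian-kernel Gram matrix
eigs = np.linalg.eigvalsh(K / n)                     # spectrum of T_n

assert eigs.max() <= 1.0 + 1e-10                     # ||T_n|| <= kappa^2 = 1
assert eigs.min() >= -1e-8                           # T_n is positive semidefinite
print(f"largest eigenvalue of T_n: {eigs.max():.4f}")
```

Since $K_{ii}=1$, we have $\operatorname{tr}(K/n)=1$, so the eigenvalues of $T_n$ sum to $1$; the bound $\|T_n\|\le\frac{1}{n}\sum_i\|\xi_i\|\le\kappa^2$ is visible directly in the computed spectrum.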
References
- [1] S. Minsker (2017) On some extensions of Bernstein’s inequality for self-adjoint operators. Statistics and Probability Letters 127, pp. 111–119. Cited by: §2.
- [2] J. A. Tropp (2015) An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning 8 (1–2), pp. 1–230. Cited by: §1.
- [3] J. A. Tropp (2012) User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics 12 (4), pp. 389–434. Cited by: §1.