Polarization and coherence for vectorial electromagnetic waves and the ray picture of light propagation

We develop a complete geometrical picture of paraxial light propagation that includes coherence phenomena. The approach applies to both scalar and vectorial waves through the introduction of a suitable Wigner function, and it can be formulated in terms of an inverted Huygens principle. Coherence is included by allowing the geometrical rays to transport generalized Stokes parameters. The degree of coherence for scalar and vectorial light can be expressed as simple functions of the corresponding Wigner function.


INTRODUCTION
In this work we elaborate a complete geometrical formulation of paraxial optics for vectorial electromagnetic waves. By geometrical we mean that propagation is fully described in terms of the light rays of standard geometrical optics. By complete we mean that the formulation includes all coherence phenomena without approximation. This is equivalent to defining a Wigner function for vectorial waves, which in turn is equivalent to assigning a set of Stokes parameters to geometrical rays [1]-[4]. The task is interesting because it merges geometrical and wave optics in a single formalism. This may provide physical insight and simple formulas for problems involving partially coherent and partially polarized light.
The definition of a suitable Wigner function and ray Stokes parameters for vectorial waves is accomplished in Section 3, in parallel with the scalar case presented in Section 2. Both include a formulation of propagation that can be expressed by an inverted version of the Huygens principle involving rays instead of waves. We then apply this formulation to the degree of coherence for vectorial waves. This is a fundamental, nontrivial problem that has only recently been addressed in depth. Contrary to the scalar case, for vectorial waves there is no unique degree of coherence, and several definitions currently coexist [5]-[23]. After recalling the main proposals in Section 4, we examine their relationship with the geometrical picture in Section 5.

COHERENCE AND WIGNER FUNCTION FOR SCALAR WAVES
We first recall the geometrical Wigner formulation of paraxial optics for scalar waves. Although standard geometrical optics excludes coherence phenomena, replacing ray intensity by the Wigner function includes coherence effects once and for all. In particular, the degree of coherence can be expressed as a functional of the Wigner function. The price to be paid is that the Wigner function can take negative values, so it cannot always represent light intensity.

Definition and properties
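We always work in the spatial-frequency domain, so that the Wigner function is defined in terms of the cross-spectral density function as [24]-[28]

$$W(\mathbf{r},\mathbf{p}) = \left(\frac{k}{2\pi}\right)^2 \int d^2 r' \, \left\langle U\!\left(\mathbf{r}+\frac{\mathbf{r}'}{2}\right) U^*\!\left(\mathbf{r}-\frac{\mathbf{r}'}{2}\right)\right\rangle e^{-ik\,\mathbf{p}\cdot\mathbf{r}'},$$

where the angle brackets represent the ensemble average, $\mathbf{r}$ and $\mathbf{r}'$ are Cartesian coordinates in a plane orthogonal to the main propagation direction along the $z$ axis, $\mathbf{p}$ are the angular variables representing the local direction of propagation, and $k$ is the wavenumber in vacuum.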
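The connection between the Wigner function and geometrical optics stems from the fact that $\mathbf{r}$ and $\mathbf{p}$ are the parameters of a light ray, so that $W$ assigns a number to each ray. The main properties of this formalism are:
(a) The Wigner function provides complete information about second-order phenomena, including diffraction and interference, since its definition can be inverted to express the cross-spectral density function in terms of the Wigner function

$$\Gamma(\mathbf{r}_1,\mathbf{r}_2) = \langle U(\mathbf{r}_1) U^*(\mathbf{r}_2)\rangle = \int d^2 p \, W(\mathbf{R},\mathbf{p})\, e^{ik\,\mathbf{p}\cdot(\mathbf{r}_1-\mathbf{r}_2)}, \qquad (2)$$

where $\mathbf{R} = (\mathbf{r}_1+\mathbf{r}_2)/2$ is the midpoint between $\mathbf{r}_{1,2}$.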
(b) In particular, the light intensity (irradiance) at a given point is obtained by integrating over the angular variables

$$I(\mathbf{r}) = \int d^2 p \, W(\mathbf{r},\mathbf{p}).$$

That is, the intensity at a point $\mathbf{r}$ is the sum of the values of the Wigner function for all the rays passing through $\mathbf{r}$ with different $\mathbf{p}$. We refer to this sum as an incoherent superposition, since the ray contributions $W(\mathbf{r},\mathbf{p})$ are added independently, without cross terms between rays.
(c) The Wigner function cannot always represent light intensity, since it can take negative as well as positive values [24]-[29]. We may say that there are bright rays with positive $W$ and dark rays with negative $W$ [30]-[32]. Dark rays are crucial to the completeness of the theory, since they carry the coherence in two-beam interferometry, as we shall see below.
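(d) Finally, a crucial property for the geometrical interpretation of the Wigner function is that it is constant along paraxial rays

$$W_{\text{out}}(\mathbf{r}',\mathbf{p}') = W_{\text{in}}(\mathbf{r},\mathbf{p}),$$

where $(\mathbf{r},\mathbf{p})$ and $(\mathbf{r}',\mathbf{p}')$ are the ray parameters at the input ($z = 0$) and output ($z > 0$) planes of a paraxial optical system. For free propagation over a distance $z$, $\mathbf{r}' = \mathbf{r} + z\mathbf{p}$ and $\mathbf{p}' = \mathbf{p}$.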
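Properties (b) and (c) can be checked numerically. The sketch below assumes a single transverse coordinate (so the prefactor $(k/2\pi)^2$ becomes $k/2\pi$) and a sampled, fully coherent field. It replaces the integral over $\mathbf{r}'$ by a discrete pseudo-Wigner sum, a common discretization that is not necessarily the one used in the cited works. The function name `wigner_1d` and all parameter values are hypothetical. The code verifies that summing over $p$ returns $|U(x)|^2$ and that a two-beam field produces dark rays with $W < 0$.

```python
import numpy as np

def wigner_1d(u, dx, k, n_p):
    """Discrete pseudo-Wigner function of a sampled 1D coherent field (sketch).

    W(x_i, p) = (k / 2pi) * sum_m u[i+m] u*[i-m] exp(-i k p 2 m dx) * 2 dx,
    i.e. the integral over x' = 2 m dx is replaced by a Riemann sum.
    """
    n = len(u)
    dp = np.pi / (k * dx * n_p)                 # p grid spans one full period
    p = (np.arange(n_p) - n_p // 2) * dp
    W = np.empty((n, n_p))
    for i in range(n):
        m_max = min(i, n - 1 - i)               # keep i +/- m inside the grid
        m = np.arange(-m_max, m_max + 1)
        corr = u[i + m] * np.conj(u[i - m])     # U(x + x'/2) U*(x - x'/2)
        phase = np.exp(-1j * k * np.outer(p, 2 * m * dx))
        W[i] = (k / (2 * np.pi)) * 2 * dx * np.real(phase @ corr)
    return p, W

# Hypothetical test field: two separated Gaussian beams (a two-beam situation).
k, dx = 1.0, 0.1
x = (np.arange(201) - 100) * dx
u = np.exp(-(x - 4.0) ** 2) + np.exp(-(x + 4.0) ** 2)
p, W = wigner_1d(u, dx, k, n_p=201)
dp = p[1] - p[0]

intensity_from_rays = W.sum(axis=1) * dp        # property (b): sum over rays
print(np.allclose(intensity_from_rays, np.abs(u) ** 2))  # True
print(W.min() < 0)                               # dark rays exist: True
```

The p grid is chosen to span exactly one period of the discrete kernel, which makes the marginal in property (b) exact up to rounding.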

Inverted Huygens principle
These properties can be summarized in a principle analogous to the Huygens principle, but with inverted terms: waves are replaced by rays, and coherent superpositions by incoherent ones. The principle can be stated in three steps [33]: (i) Each point acts as a secondary source of a continuous distribution of rays with parameters $\mathbf{r}$, $\mathbf{p}$, $W(\mathbf{r},\mathbf{p})$. We stress that this is a continuous distribution of rays, instead of the more familiar single ray at each point normal to a wavefront.
(ii) The evolution of optical properties is given by the incoherent superposition of the optical properties of rays, as illustrated by the example of light intensity in point (b) above.We stress that this incoherence is a key feature of the theory independent of the actual state of coherence of the light.Coherence is expressed in a different way as we shall see later on.
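(iii) The effect of spatially local inhomogeneous filters (i.e., transparencies) that alter phase and amplitude is described in the wave picture by the product of the amplitude of the input wave with a transmission coefficient $t(\mathbf{r})$, i.e., $U(\mathbf{r}) \to t(\mathbf{r})U(\mathbf{r})$. In the geometrical picture this effect is described by a convolution in the angular variables $\mathbf{p}$ of the input Wigner function $W_U$ with the Wigner function $W_t$ of the transmission coefficient

$$W_{tU}(\mathbf{r},\mathbf{p}) = \int d^2 p' \, W_t(\mathbf{r},\mathbf{p}-\mathbf{p}')\, W_U(\mathbf{r},\mathbf{p}'),$$

where

$$W_t(\mathbf{r},\mathbf{p}) = \left(\frac{k}{2\pi}\right)^2 \int d^2 r' \, t\!\left(\mathbf{r}+\frac{\mathbf{r}'}{2}\right) t^*\!\left(\mathbf{r}-\frac{\mathbf{r}'}{2}\right) e^{-ik\,\mathbf{p}\cdot\mathbf{r}'}.$$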
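Steps (i) and (ii), together with the constancy of $W$ along rays, can be illustrated with a minimal sketch. It assumes a single transverse coordinate and free-space propagation, for which a ray keeps its direction and moves from $x - zp$ to $x$, so $W_z(x,p) = W_0(x - zp, p)$. It also assumes a coherent Gaussian input $U(x) = \exp(-x^2/w_0^2)$, whose Wigner function is known in closed form. The intensity at distance $z$ is then the incoherent sum over rays, and its second moment reproduces the Gaussian-beam law with Rayleigh range $z_R = k w_0^2/2$. Function names and numbers are hypothetical.

```python
import numpy as np

def wigner_gaussian(x, p, k, w0):
    """Closed-form 1D Wigner function of U(x) = exp(-x**2 / w0**2)."""
    return (k / (2 * np.pi)) * np.sqrt(2 * np.pi) * w0 * np.exp(
        -2 * x**2 / w0**2 - (k * w0 * p) ** 2 / 2)

def intensity_at(z, x, p, k, w0):
    """Incoherent sum over rays of the Wigner function transported a distance z."""
    X, P = np.meshgrid(x, p, indexing="ij")
    Wz = wigner_gaussian(X - z * P, P, k, w0)   # W is constant along each ray
    return Wz.sum(axis=1) * (p[1] - p[0])

k, w0, z = 50.0, 1.0, 30.0
x = np.linspace(-6, 6, 601)
p = np.linspace(-0.15, 0.15, 601)

I0 = intensity_at(0.0, x, p, k, w0)
Iz = intensity_at(z, x, p, k, w0)

z_R = k * w0**2 / 2
var_rays = np.sum(x**2 * Iz) / np.sum(Iz)
var_beam = (w0**2 / 4) * (1 + (z / z_R) ** 2)   # second moment of |U|^2 for a Gaussian beam
print(np.allclose(I0, np.exp(-2 * x**2 / w0**2)))  # True
print(np.isclose(var_rays, var_beam, rtol=1e-3))   # True
```

Here no wave propagation is performed. The beam spreading follows only from rays travelling in straight lines with fixed weights.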

Coherence
The degree of coherence for scalar waves can be expressed in terms of the Wigner function from different perspectives.

Coherence as phase-difference average
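From the inversion formula in Eq. (2), which expresses the cross-spectral density function in terms of the Wigner function, the degree of coherence at two points $\mathbf{r}_{1,2}$ is the weighted average of the phase-difference factor introduced by an ensemble of plane waves with wavevectors proportional to $\mathbf{p}$,

$$\mu(\mathbf{r}_1,\mathbf{r}_2) = \frac{\Gamma(\mathbf{r}_1,\mathbf{r}_2)}{\sqrt{I_1 I_2}} = \frac{1}{\sqrt{I_1 I_2}} \int d^2 p \, W(\mathbf{R},\mathbf{p})\, e^{ik\,\mathbf{p}\cdot(\mathbf{r}_1-\mathbf{r}_2)},$$

where the weight of each plane wave $\mathbf{p}$ is $W(\mathbf{R},\mathbf{p})$, with $\mathbf{R} = (\mathbf{r}_1+\mathbf{r}_2)/2$, and $I_{1,2} = I(\mathbf{r}_{1,2})$ are the light intensities at points $\mathbf{r}_{1,2}$ [33]. This agrees well with common intuition, since rays are usually understood as local plane waves and partial coherence is usually understood as the result of phase fluctuations.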
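As a sketch of this relation, consider a one-dimensional Gaussian Schell-model field with $\Gamma(x_1,x_2) = \exp[-(x_1^2+x_2^2)/(4\sigma_s^2)]\exp[-(x_1-x_2)^2/(2\sigma_g^2)]$. This is an illustrative model, not one taken from the text. Its Wigner function follows in closed form from the definition. The code recovers $\mu$ by numerically averaging $e^{ikp(x_1-x_2)}$ with weights $W(R,p)$, then compares the result with the model's known $\mu = \exp[-(x_1-x_2)^2/(2\sigma_g^2)]$. Names and values are hypothetical.

```python
import numpy as np

def wigner_gsm(R, p, k, sigma_s, sigma_g):
    """Closed-form 1D Wigner function of a Gaussian Schell-model field (assumed model)."""
    a = 1 / (4 * sigma_s**2) + 1 / sigma_g**2
    return (k / (2 * np.pi)) * np.exp(-R**2 / (2 * sigma_s**2)) \
        * np.sqrt(2 * np.pi / a) * np.exp(-(k * p) ** 2 / (2 * a))

def mu_from_wigner(x1, x2, k, sigma_s, sigma_g, p):
    """Degree of coherence as a W(R, p)-weighted average of exp(i k p (x1 - x2))."""
    R = (x1 + x2) / 2
    dp = p[1] - p[0]
    gamma12 = np.sum(wigner_gsm(R, p, k, sigma_s, sigma_g)
                     * np.exp(1j * k * p * (x1 - x2))) * dp
    I1 = np.sum(wigner_gsm(x1, p, k, sigma_s, sigma_g)) * dp
    I2 = np.sum(wigner_gsm(x2, p, k, sigma_s, sigma_g)) * dp
    return gamma12 / np.sqrt(I1 * I2)

k, sigma_s, sigma_g = 10.0, 1.0, 0.5
p = np.linspace(-2.0, 2.0, 4001)
x1, x2 = 0.3, -0.4
mu = mu_from_wigner(x1, x2, k, sigma_s, sigma_g, p)
print(np.isclose(mu, np.exp(-(x1 - x2) ** 2 / (2 * sigma_g**2))))  # True
```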

Overall degree of coherence
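A global or overall assessment of the total coherence conveyed by a field state is provided by the formula [14], [34]-[37]

$$\mu_G^2 = \frac{\int d^2 r_1\, d^2 r_2 \, |\Gamma(\mathbf{r}_1,\mathbf{r}_2)|^2}{\left[\int d^2 r\, I(\mathbf{r})\right]^2}. \qquad (9)$$

This can be expressed in terms of the Wigner function as

$$\mu_G^2 = \left(\frac{2\pi}{k}\right)^2 \frac{\int d^2 r\, d^2 p \, W^2(\mathbf{r},\mathbf{p})}{\left[\int d^2 r\, d^2 p\, W(\mathbf{r},\mathbf{p})\right]^2}.$$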
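The two expressions for $\mu_G$ can be compared numerically under the same hypothetical one-dimensional Gaussian Schell-model assumption. In one dimension the prefactor $(2\pi/k)^2$ becomes $2\pi/k$. One value is computed from $|\Gamma|^2$ on a grid of point pairs and the other from $W^2$ on a grid of rays. For this model both reduce to $\mu_G^2 = (1 + 4\sigma_s^2/\sigma_g^2)^{-1/2}$, a closed form derived here from the assumed model and not quoted from the text.

```python
import numpy as np

k, sigma_s, sigma_g = 10.0, 1.0, 0.5
a = 1 / (4 * sigma_s**2) + 1 / sigma_g**2

# Point picture: cross-spectral density on a grid of (x1, x2) pairs.
x = np.linspace(-8, 8, 801)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")
gamma = np.exp(-(X1**2 + X2**2) / (4 * sigma_s**2)) * np.exp(-(X1 - X2) ** 2 / (2 * sigma_g**2))
intensity = np.exp(-x**2 / (2 * sigma_s**2))
muG2_points = np.sum(np.abs(gamma) ** 2) * dx**2 / (np.sum(intensity) * dx) ** 2

# Ray picture: the same quantity from the Wigner function on an (R, p) grid.
R, P = np.meshgrid(x, np.linspace(-2.0, 2.0, 801), indexing="ij")
dp = 4.0 / 800
W = (k / (2 * np.pi)) * np.exp(-R**2 / (2 * sigma_s**2)) * np.sqrt(2 * np.pi / a) \
    * np.exp(-(k * P) ** 2 / (2 * a))
muG2_rays = (2 * np.pi / k) * np.sum(W**2) * dx * dp / (np.sum(W) * dx * dp) ** 2

closed_form = 1 / np.sqrt(1 + 4 * sigma_s**2 / sigma_g**2)
print(np.isclose(muG2_points, closed_form), np.isclose(muG2_rays, closed_form))  # True True
```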

Coherence in a Young interferometer
Next we examine the ray picture of coherence in action, for example in a Young interferometer with two apertures of vanishing widths at points $\mathbf{r}_{1,2}$ in the plane $z = 0$ (see Figure 1). It can be shown that the Wigner function after the apertures implies the existence of just three secondary sources of rays at $z = 0$ [23, 33, 38]. Two sources are located at the apertures, with $W(\mathbf{r}_{1,2},\mathbf{p}) = W(\mathbf{r}_{1,2}) \propto I_{1,2}$. The Wigner function at these points does not depend on $\mathbf{p}$, so the emission is isotropic and all rays at each aperture carry the same weight $W(\mathbf{r}_{1,2})$. Moreover, since in this case the Wigner function is positive, $W(\mathbf{r}_{1,2}) \propto I_{1,2}$, these sources emit bright rays exclusively. The third source is located at the midpoint $\mathbf{R}$ between the apertures, with $W(\mathbf{R},\mathbf{p})$ proportional to the degree of coherence $\mu$ between the fields at the apertures

$$W(\mathbf{R},\mathbf{p}) \propto 2\sqrt{I_1 I_2}\, |\mu| \cos\left[k\,\mathbf{p}\cdot(\mathbf{r}_1-\mathbf{r}_2) - \delta\right], \qquad (11)$$

where $\delta$ is a constant phase. Therefore $W(\mathbf{R},\mathbf{p})$ takes positive as well as negative values as $\mathbf{p}$ varies, so this source emits bright and dark rays with different weights depending on $\mathbf{p}$.
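Since there are three sources at $z = 0$, each observation point $\mathbf{r}$ in a plane $z > 0$ is reached by exactly three rays, one from each source, as illustrated in Figure 1. Their incoherent superposition gives the intensity distribution

$$I(\mathbf{r}) \propto W(\mathbf{r}_1) + W(\mathbf{r}_2) + W(\mathbf{R},\mathbf{p}) \propto I_1 + I_2 + 2\sqrt{I_1 I_2}\,|\mu|\cos\left[k\,\mathbf{p}\cdot(\mathbf{r}_1-\mathbf{r}_2) - \delta\right].$$

The contribution $W(\mathbf{R},\mathbf{p})$ from the midpoint is actually the interference term. It is the only one that depends on the observation point, through the propagation direction

$$\mathbf{p} = \frac{\mathbf{r}-\mathbf{R}}{z}$$

followed by the ray from the source at $\mathbf{R}$ to the observation point at $\mathbf{r}$.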
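The three-ray description can be checked against the usual wave calculation in a hypothetical one-dimensional setup. For point apertures at $x_{1,2}$, paraxial propagation to a plane at distance $z$ gives $I(x) = I_1 + I_2 + 2\,\mathrm{Re}\{\Gamma_{12}\exp[ik((x-x_1)^2-(x-x_2)^2)/(2z)]\}$ up to a common factor, with $\Gamma_{12} = \langle U_1 U_2^*\rangle = \sqrt{I_1 I_2}\,\mu$. The sketch compares this with the sum of the three ray contributions. It takes $\delta = \arg\mu$, an assumption that matches the sign conventions used in this sketch.

```python
import numpy as np

k, z = 20.0, 50.0
x1, x2 = 1.0, -1.0
I1, I2 = 1.0, 0.5
mu = 0.7 * np.exp(0.4j)                     # hypothetical complex degree of coherence
gamma12 = np.sqrt(I1 * I2) * mu             # <U1 U2*>
x = np.linspace(-20, 20, 2001)              # observation points at distance z

# Wave picture: two paraxial spherical waves from the point apertures.
phase = k * ((x - x1) ** 2 - (x - x2) ** 2) / (2 * z)
I_wave = I1 + I2 + 2 * np.real(gamma12 * np.exp(1j * phase))

# Ray picture: one bright ray from each aperture plus one ray from the midpoint R.
R = (x1 + x2) / 2
p = (x - R) / z                              # direction of the ray from R to x
W_mid = 2 * np.sqrt(I1 * I2) * np.abs(mu) * np.cos(k * p * (x1 - x2) - np.angle(mu))
I_rays = I1 + I2 + W_mid

print(np.allclose(I_wave, I_rays))  # True
print(W_mid.min() < 0)              # the midpoint emits dark rays: True
```

The agreement is exact because $(x-x_1)^2-(x-x_2)^2 = -2(x_1-x_2)(x-R)$, so the wave phase equals $-kp(x_1-x_2)$.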

This implies a close relation between the degree of coherence $\mu$ and the Wigner function at the midpoint, $W(\mathbf{R},\mathbf{p})$. More specifically [33, 38]: (1) from Eq. (11), $|\mu|$ is proportional to the maximum modulus of the Wigner function at the midpoint as $\mathbf{p}$ is varied; (2) the degree of coherence is proportional to the negativity of the Wigner function, measured as the distance between $W$ and its modulus $|W|$; and (3) the degree of coherence is proportional to the amount of Wigner function at the midpoint, measured by an integral that extends only over the region between the apertures.
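Criteria (1) and (2) can be illustrated with the midpoint Wigner function $W(R,p) = 2\sqrt{I_1 I_2}\,|\mu|\cos[kpd - \delta]$, where $d = x_1 - x_2$, one transverse dimension, and a unit proportionality constant are assumed. The sketch samples $W$ over whole periods in $p$. It measures negativity as the mean of $|W| - W$, which is one possible reading of "distance between $W$ and its modulus" chosen only for illustration. Both measures scale linearly with $|\mu|$, with ratios close to $2\sqrt{I_1 I_2}$ and $(4/\pi)\sqrt{I_1 I_2}$.

```python
import numpy as np

def midpoint_measures(mu_abs, I1=1.0, I2=0.5, k=20.0, d=2.0, delta=0.3, periods=10):
    """Max |W| and mean negativity of the midpoint Wigner function over whole periods."""
    period = 2 * np.pi / (k * d)                       # period of cos(k p d) in p
    p = np.linspace(0, periods * period, 100 * periods, endpoint=False)
    W = 2 * np.sqrt(I1 * I2) * mu_abs * np.cos(k * p * d - delta)
    max_modulus = np.max(np.abs(W))
    negativity = np.mean(np.abs(W) - W)               # zero for a non-negative W
    return max_modulus, negativity

for mu_abs in (0.2, 0.5, 1.0):
    m, n = midpoint_measures(mu_abs)
    print(mu_abs, m / mu_abs, n / mu_abs)             # ratios do not depend on mu_abs
```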
From this interferometric point of view, coherence is incompatible with standard geometrical optics.In other words, coherence in interferometry is the distance from the light state after the apertures to standard geometrical optics represented by the set of situations with positive semidefinite W.

WIGNER FUNCTION FOR VECTORIAL WAVES
In this section we provide a Wigner function that includes the polarization variables allowing us to generalize the results of the preceding section to vectorial waves.

Definition and properties
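The Wigner function we use is the translation to optics of a Wigner function introduced in mechanics for a closely related problem: the Wigner function of a particle with spin one-half [39]-[41]. This is equivalent to a transverse wave, since in both cases we have a field with two components. In optics, such a Wigner function can be expressed as [1]

$$W(\mathbf{r},\mathbf{p},\boldsymbol{\Omega}) = \mathbf{S}(\mathbf{r},\mathbf{p})\cdot\boldsymbol{\Omega} = \sum_{j=0}^{3} S_j(\mathbf{r},\mathbf{p})\,\Omega_j,$$

where $\boldsymbol{\Omega}$ is a four-dimensional real vector that represents the Poincaré sphere, and $\mathbf{S}(\mathbf{r},\mathbf{p})$ is a four-dimensional real vector with components

$$S_0 = W_{x,x} + W_{y,y}, \quad S_1 = W_{x,x} - W_{y,y}, \quad S_2 = W_{x,y} + W_{y,x}, \quad S_3 = i\,(W_{x,y} - W_{y,x}), \qquad (19)$$

where $W_{\ell,m}(\mathbf{r},\mathbf{p})$ are the elements of the Wigner matrix

$$W_{\ell,m}(\mathbf{r},\mathbf{p}) = \left(\frac{k}{2\pi}\right)^2 \int d^2 r' \, \left\langle E_\ell\!\left(\mathbf{r}+\frac{\mathbf{r}'}{2}\right) E_m^*\!\left(\mathbf{r}-\frac{\mathbf{r}'}{2}\right)\right\rangle e^{-ik\,\mathbf{p}\cdot\mathbf{r}'}, \qquad \ell, m = x, y.$$

This Wigner function depends on the spherical coordinates $\boldsymbol{\Omega}$, which specify the polarization state. The four real quantities $\mathbf{S}(\mathbf{r},\mathbf{p})$ in Eq. (19) are ray properties because of their joint dependence on $(\mathbf{r},\mathbf{p})$. We therefore refer to them as ray Stokes parameters, in contrast to the standard point Stokes parameters $\mathbf{s}(\mathbf{r})$,

$$s_0 = \Gamma_{x,x} + \Gamma_{y,y}, \quad s_1 = \Gamma_{x,x} - \Gamma_{y,y}, \quad s_2 = \Gamma_{x,y} + \Gamma_{y,x}, \quad s_3 = i\,(\Gamma_{x,y} - \Gamma_{y,x}),$$

with $\Gamma_{\ell,m}(\mathbf{r}) = \langle E_\ell(\mathbf{r}) E_m^*(\mathbf{r})\rangle$. The point Stokes parameters do not depend on $\mathbf{p}$, and they express the light intensity and polarization state at point $\mathbf{r}$ without reference to the propagation direction.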
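The Stokes map $S_0 = W_{x,x}+W_{y,y}$, $S_1 = W_{x,x}-W_{y,y}$, $S_2 = W_{x,y}+W_{y,x}$, $S_3 = i(W_{x,y}-W_{y,x})$ is linear and sends a Hermitian 2 × 2 matrix to four real numbers. The minimal sketch below applies it to a matrix indexed $(x, y)$ (the function name is hypothetical). The same function gives ray Stokes parameters from a Wigner matrix, or point Stokes parameters from the coherency matrix $\langle E_\ell E_m^*\rangle$ of a deterministic field. The sign chosen for $S_3$ is one convention among several in the literature.

```python
import numpy as np

def stokes_from_matrix(W):
    """Ray (or point) Stokes parameters from a Hermitian 2x2 matrix indexed (x, y)."""
    W = np.asarray(W, dtype=complex)
    S = np.array([
        W[0, 0] + W[1, 1],            # S0
        W[0, 0] - W[1, 1],            # S1
        W[0, 1] + W[1, 0],            # S2
        1j * (W[0, 1] - W[1, 0]),     # S3 (sign convention assumed)
    ])
    return S.real                     # imaginary parts vanish for Hermitian W

# Point Stokes parameters of deterministic fields via Gamma = E E^dagger.
for E in ([1, 0], [1, 1], [1, 1j]):
    E = np.array(E, dtype=complex) / np.linalg.norm(E)
    print(E.round(3), stokes_from_matrix(np.outer(E, E.conj())))
```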
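Journal of the European Optical Society -Rapid Publications 2, 07030 ( A. Luis

The properties of this Wigner function are fully equivalent to those of the scalar case [1]-[3]:
(a) The Wigner function provides complete information about second-order phenomena, since its definition can be inverted to express the cross-spectral density tensor in terms of the ray Stokes parameters

$$\Gamma(\mathbf{r}_1,\mathbf{r}_2) = \frac{1}{2}\sum_{j=0}^{3} \sigma^{(j)} \int d^2 p \, S_j(\mathbf{R},\mathbf{p})\, e^{ik\,\mathbf{p}\cdot(\mathbf{r}_1-\mathbf{r}_2)},$$

where $\Gamma_{\ell,m}(\mathbf{r}_1,\mathbf{r}_2) = \langle E_\ell(\mathbf{r}_1)E_m^*(\mathbf{r}_2)\rangle$, $\sigma^{(j)}$ are the Pauli matrices (ordered consistently with the definition of $S_j$), $\sigma^{(0)}$ being the identity, and $\mathbf{R} = (\mathbf{r}_1+\mathbf{r}_2)/2$.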
(b) In particular, at each spatial point $\mathbf{r}$ the intensity and the polarization state, represented by the point Stokes parameters $s_0(\mathbf{r})$ and $s_{1,2,3}(\mathbf{r})$, respectively, are obtained from the ray Stokes parameters by integrating over the angular variables

$$\mathbf{s}(\mathbf{r}) = \int d^2 p \, \mathbf{S}(\mathbf{r},\mathbf{p}).$$

This is equivalent to saying that $\mathbf{s}(\mathbf{r})$ is given by the incoherent superposition of the ray Stokes parameters $\mathbf{S}(\mathbf{r},\mathbf{p})$ associated with all rays passing through the same point $\mathbf{r}$ with different propagation directions $\mathbf{p}$.
(c) The Wigner matrix may have negative eigenvalues, so the ray Stokes parameters may violate the ray analog of the relation

$$s_0 \geq \sqrt{s_1^2 + s_2^2 + s_3^2},$$

which is always satisfied by the point Stokes parameters. In accordance with the scalar case, rays satisfying $S_0 \geq \sqrt{S_1^2 + S_2^2 + S_3^2} \geq 0$ may be called bright rays. The other rays, i.e., those with $S_0 < \sqrt{S_1^2 + S_2^2 + S_3^2}$ (including all rays with $S_0 < 0$), may be called dark rays.
(d) Finally, a crucial property for the geometrical interpretation of the Wigner function is that it is constant along paraxial rays in free space. The effect of polarization-changing devices is described in some detail below.
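Property (c) can be made concrete by inverting the Stokes map. The Wigner matrix of a ray is $\frac{1}{2}\sum_j S_j\sigma^{(j)}$. In this sketch the Pauli matrices are ordered as $\sigma^{(1)} = \mathrm{diag}(1,-1)$, $\sigma^{(2)} = \sigma_x$, $\sigma^{(3)} = \sigma_y$, an ordering chosen here so that $S_3 = i(W_{x,y}-W_{y,x})$. The eigenvalues of this matrix are $(S_0 \pm \sqrt{S_1^2+S_2^2+S_3^2})/2$, so a ray is bright exactly when its Wigner matrix is positive semidefinite. The helper names are hypothetical.

```python
import numpy as np

# Pauli basis, ordered to match S1 = Wxx - Wyy, S2 = Wxy + Wyx, S3 = i(Wxy - Wyx).
SIGMA = np.array([
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
], dtype=complex)

def matrix_from_stokes(S):
    """Wigner (coherency) matrix W = (1/2) sum_j S_j sigma^(j)."""
    return 0.5 * np.tensordot(S, SIGMA, axes=1)

def is_bright(S, tol=1e-12):
    """Bright ray: S0 >= sqrt(S1^2 + S2^2 + S3^2), i.e. no negative eigenvalue."""
    S = np.asarray(S, dtype=float)
    return S[0] >= np.linalg.norm(S[1:]) - tol

S_dark = np.array([0.5, 0.0, 0.0, 1.0])       # violates S0 >= |(S1, S2, S3)|
eig = np.linalg.eigvalsh(matrix_from_stokes(S_dark))
print(eig, is_bright(S_dark))                  # one eigenvalue is negative
```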

Inverted Huygens principle for vectorial light
As in the scalar case, the above properties can be summarized in a principle analogous to the Huygens principle [42]: (i) Each point acts as a secondary source of a continuous distribution of rays with parameters r, p, S(r, p).
(ii) These rays are superimposed incoherently, as illustrated by the example of the point Stokes parameters in point (b) above.
(iii) Spatially local inhomogeneous filters that alter phase and amplitude (i.e., transparencies) are described in the wave picture by the product with transmission coefficients in the form

$$E_j(\mathbf{r}) \to \sum_{\ell = x,y} t_{j,\ell}(\mathbf{r})\, E_\ell(\mathbf{r}), \qquad (25)$$

where $t_{j,\ell}(\mathbf{r})$ are the corresponding transmission coefficients.
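In the geometrical picture these devices are described by expressing the output ray Stokes parameters $\mathbf{S}'$ as an angular convolution of the input ray Stokes parameters $\mathbf{S}$ with the Wigner function of the Mueller matrix [3]

$$S'_j(\mathbf{r},\mathbf{p}) = \sum_{k=0}^{3} \int d^2 p' \, \mathcal{M}_{j,k}(\mathbf{r},\mathbf{p}-\mathbf{p}')\, S_k(\mathbf{r},\mathbf{p}'),$$

where

$$\mathcal{M}_{j,k}(\mathbf{r},\mathbf{p}) = \frac{1}{2}\left(\frac{k}{2\pi}\right)^2 \int d^2 r' \, \mathrm{tr}\left[\sigma^{(j)}\, t\!\left(\mathbf{r}+\frac{\mathbf{r}'}{2}\right) \sigma^{(k)}\, t^\dagger\!\left(\mathbf{r}-\frac{\mathbf{r}'}{2}\right)\right] e^{-ik\,\mathbf{p}\cdot\mathbf{r}'},$$

and the matrix $t(\mathbf{r})$ has the matrix elements $t_{j,\ell}(\mathbf{r})$ in Eq. (25). For homogeneous devices described by a Mueller matrix $M$ we get the natural transformation

$$\mathbf{S}'(\mathbf{r},\mathbf{p}) = M\,\mathbf{S}(\mathbf{r},\mathbf{p}).$$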
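Consider a homogeneous, nondepolarizing device with a constant Jones matrix $t$. Its Mueller matrix follows from the standard relation $M_{jk} = \frac{1}{2}\mathrm{tr}(\sigma^{(j)} t\,\sigma^{(k)} t^\dagger)$, and $\mathbf{S}' = M\mathbf{S}$ then applies ray by ray. The sketch checks this against transforming the ray's Wigner matrix as $W \to tWt^\dagger$. The Pauli ordering is chosen to match $S_3 = i(W_{x,y}-W_{y,x})$, and the function names are hypothetical.

```python
import numpy as np

SIGMA = np.array([
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
], dtype=complex)

def mueller_from_jones(t):
    """M_jk = (1/2) tr(sigma_j t sigma_k t^dagger), real for any 2x2 t."""
    t = np.asarray(t, dtype=complex)
    M = np.array([[0.5 * np.trace(SIGMA[j] @ t @ SIGMA[k] @ t.conj().T)
                   for k in range(4)] for j in range(4)])
    return M.real

def stokes(W):
    """S_j = tr(sigma_j W) for a 2x2 Wigner (coherency) matrix."""
    return np.array([np.trace(s @ W) for s in SIGMA]).real

polarizer_x = np.diag([1.0, 0.0])
quarter_wave = np.diag([1.0, 1j])
S_in = np.array([1.0, 0.0, 0.0, 0.0])                    # unpolarized ray
print(mueller_from_jones(polarizer_x) @ S_in)             # S' = (0.5, 0.5, 0, 0)
print(mueller_from_jones(quarter_wave) @ np.array([1.0, 0.0, 1.0, 0.0]))  # 45 deg -> circular
```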

DEGREES OF COHERENCE FOR VECTORIAL WAVES
The proper definition of the degree of coherence for vectorial waves is a nontrivial problem. The increase in the number of degrees of freedom implies that a scalar quantity (the cross-spectral density function) is replaced by a matrix (the cross-spectral density matrix), so there is naturally no straightforward translation of $\mu$ from the scalar to the vectorial case. For the same reasons, several definitions can coexist, since they focus on different features of coherence, apply to different situations, or satisfy different symmetries [20]. Here we recall the main approaches to the problem.

Intensity fringes in a Young interferometer
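A first approach to the degree of coherence at two points $\mathbf{r}_{1,2}$ is derived directly from the visibility of interference fringes in a Young interferometer with apertures at $\mathbf{r}_{1,2}$, where only the intensity is measured in the observation plane, leading to [5]-[10]

$$\mu_1 = \frac{|\mathrm{tr}\,\Gamma_{1,2}|}{\sqrt{I_1 I_2}},$$

where

$$\Gamma_{m,n} = \begin{pmatrix} \langle E_x(\mathbf{r}_m)E_x^*(\mathbf{r}_n)\rangle & \langle E_x(\mathbf{r}_m)E_y^*(\mathbf{r}_n)\rangle \\ \langle E_y(\mathbf{r}_m)E_x^*(\mathbf{r}_n)\rangle & \langle E_y(\mathbf{r}_m)E_y^*(\mathbf{r}_n)\rangle \end{pmatrix}, \qquad m,n = 1,2,$$

and

$$I_m = \mathrm{tr}\,\Gamma_{m,m} = \langle |\mathbf{E}(\mathbf{r}_m)|^2\rangle.$$

The main drawback of this definition is that it depends on the polarization state at $\mathbf{r}_{1,2}$. For example, for orthogonal polarizations, $\langle\mathbf{E}(\mathbf{r}_1)\cdot\mathbf{E}^*(\mathbf{r}_2)\rangle = 0$, we get $\mu_1 = 0$ even if there is perfect correlation between the fields at the apertures. More specifically, $\mu_1$ is not invariant under U(2) × U(2) transformations, i.e., under the action of unitary 2 × 2 matrices applied to the fields at the apertures. In practice, this corresponds to placing transparent phase plates at the apertures.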
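Two similar strategies have been proposed to overcome this difficulty. On the one hand, we can consider the maximum of $\mu_1$ when arbitrary phase plates are placed in the apertures, leading to [9, 43]

$$\max \mu_1 = \frac{\lambda_+ + \lambda_-}{\sqrt{I_1 I_2}},$$

where $\lambda_\pm \geq 0$ are the singular values of $\Gamma_{1,2}$, i.e., $\lambda_\pm^2$ are the eigenvalues of $\Gamma_{1,2}\Gamma_{1,2}^\dagger$.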
On the other hand, we can consider the maximum of $\mu_1$ when arbitrary phase plates followed by an arbitrarily oriented polarizer are placed in the apertures, leading to [18]-[20]

$$\max \mu_1 = \mathrm{msv}\left(\Gamma_{1,1}^{-1/2}\,\Gamma_{1,2}\,\Gamma_{2,2}^{-1/2}\right),$$

where msv is the maximum singular value of the corresponding matrix.
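The three Young-interferometer quantities can be computed directly from a 4 × 4 correlation matrix $\Gamma$ with 2 × 2 blocks $\Gamma_{m,n}$. The sketch below (function names hypothetical) computes $\mu_1 = |\mathrm{tr}\,\Gamma_{1,2}|/\sqrt{I_1 I_2}$ and the phase-plate maximum $(\lambda_+ + \lambda_-)/\sqrt{I_1 I_2}$. Assuming $\Gamma_{1,1}$ and $\Gamma_{2,2}$ are invertible, it also computes the polarizer maximum as the largest singular value of $\Gamma_{1,1}^{-1/2}\Gamma_{1,2}\Gamma_{2,2}^{-1/2}$. The example reproduces the drawback noted above: perfectly correlated but orthogonally polarized fields give $\mu_1 = 0$, while suitable phase plates restore full visibility.

```python
import numpy as np

def blocks(G):
    """Split a 4x4 correlation matrix into its 2x2 blocks Gamma_11, Gamma_12, Gamma_22."""
    return G[:2, :2], G[:2, 2:], G[2:, 2:]

def mu_1(G):
    G11, G12, G22 = blocks(G)
    return abs(np.trace(G12)) / np.sqrt(np.trace(G11).real * np.trace(G22).real)

def mu_phase_plates(G):
    """Maximum of mu_1 over unitary plates: sum of singular values of Gamma_12."""
    G11, G12, G22 = blocks(G)
    return np.linalg.svd(G12, compute_uv=False).sum() / np.sqrt(
        np.trace(G11).real * np.trace(G22).real)

def inv_sqrt(A):
    w, V = np.linalg.eigh(A)                 # A Hermitian positive definite
    return V @ np.diag(w ** -0.5) @ V.conj().T

def mu_plates_and_polarizer(G):
    """Largest singular value of Gamma_11^(-1/2) Gamma_12 Gamma_22^(-1/2)."""
    G11, G12, G22 = blocks(G)
    return np.linalg.svd(inv_sqrt(G11) @ G12 @ inv_sqrt(G22), compute_uv=False).max()

# Perfectly correlated fields with orthogonal polarizations: E(r1) = (1, 0) a, E(r2) = (0, 1) a.
v = np.array([1, 0, 0, 1], dtype=complex)
G_orth = np.outer(v, v.conj())
print(mu_1(G_orth), mu_phase_plates(G_orth))   # mu_1 vanishes; phase plates reach 1
```

The polarizer-based quantity is not evaluated for `G_orth`, because there $\Gamma_{1,1}$ and $\Gamma_{2,2}$ are singular.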

Stokes fringes in a Young interferometer
Another approach that also focuses on the Young interferometer is based on the visibility of the four systems of fringes obtained by measuring the four point Stokes parameters in the observation plane, leading to [11]-[17]

$$\mu_2^2 = \frac{\mathrm{tr}\left(\Gamma_{1,2}\Gamma_{1,2}^\dagger\right)}{I_1 I_2}.$$

This definition is invariant under U(2) × U(2) transformations.
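A sketch of $\mu_2 = [\mathrm{tr}(\Gamma_{1,2}\Gamma_{1,2}^\dagger)/(I_1 I_2)]^{1/2}$ illustrates the stated invariance. Applying random unitary matrices, which stand for transparent phase plates, to the fields at each aperture generally changes $\mu_1$ but leaves $\mu_2$ unchanged. The random unitaries come from a QR decomposition, a choice made only for this illustration. Function names are hypothetical.

```python
import numpy as np

def mu_2(G):
    """mu_2^2 = tr(Gamma_12 Gamma_12^dagger) / (I_1 I_2) for a 4x4 correlation matrix."""
    G11, G12, G22 = G[:2, :2], G[:2, 2:], G[2:, 2:]
    num = np.trace(G12 @ G12.conj().T).real
    return np.sqrt(num / (np.trace(G11).real * np.trace(G22).real))

def mu_1(G):
    return abs(np.trace(G[:2, 2:])) / np.sqrt(np.trace(G[:2, :2]).real * np.trace(G[2:, 2:]).real)

def random_unitary(rng):
    q, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
    return q

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
G = A @ A.conj().T                                  # a valid (positive) correlation matrix
U = np.zeros((4, 4), dtype=complex)
U[:2, :2], U[2:, 2:] = random_unitary(rng), random_unitary(rng)   # U(2) x U(2)
G_plates = U @ G @ U.conj().T
print(mu_1(G), mu_1(G_plates))                      # generally different
print(mu_2(G), mu_2(G_plates))                      # equal up to rounding
```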

Overall degree of coherence
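An overall degree of coherence for vectorial waves, $\mu_G$, which parallels the scalar case in Eq. (9), has been introduced as a weighted average of the local degree of coherence $\mu_2$ [17]

$$\mu_G^2 = \frac{\int d^2 r_1\, d^2 r_2 \, I(\mathbf{r}_1) I(\mathbf{r}_2)\, \mu_2^2(\mathbf{r}_1,\mathbf{r}_2)}{\left[\int d^2 r\, I(\mathbf{r})\right]^2}. \qquad (38)$$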
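On a finite set of sample points the integrals in $\mu_G$ become sums. The sketch below assumes $N$ sampled points and a $2N \times 2N$ correlation matrix whose 2 × 2 blocks are $\Gamma_{m,n}$. Since $I(\mathbf{r}_m)I(\mathbf{r}_n)\mu_2^2(\mathbf{r}_m,\mathbf{r}_n) = \mathrm{tr}(\Gamma_{m,n}\Gamma_{m,n}^\dagger)$, the weighted average reduces to $\mathrm{tr}(\Gamma^2)/(\mathrm{tr}\,\Gamma)^2$. A fully coherent, fully polarized field then gives $\mu_G = 1$, while $N$ mutually incoherent, unpolarized points of equal intensity give $\mu_G^2 = 1/(2N)$. The function name is hypothetical.

```python
import numpy as np

def mu_G_discrete(G):
    """Discrete overall degree: sum_mn I_m I_n mu_2^2(m, n) / (sum_m I_m)^2."""
    n_points = G.shape[0] // 2
    num = 0.0
    for m in range(n_points):
        for n in range(n_points):
            block = G[2*m:2*m+2, 2*n:2*n+2]              # Gamma_mn
            num += np.trace(block @ block.conj().T).real  # = I_m I_n mu_2^2(m, n)
    total_intensity = np.trace(G).real
    return np.sqrt(num / total_intensity**2)

N = 5
rng = np.random.default_rng(2)
E = rng.normal(size=2 * N) + 1j * rng.normal(size=2 * N)   # one deterministic vector field
print(mu_G_discrete(np.outer(E, E.conj())))                # fully coherent
print(mu_G_discrete(np.eye(2 * N)), 1 / np.sqrt(2 * N))    # incoherent, unpolarized
```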

Fringes in arbitrary interferometers
Finally, other approaches consider all components on an equal footing, so the degree of coherence is a function of the whole Hermitian 4 × 4 correlation matrix $\Gamma = \begin{pmatrix}\Gamma_{1,1} & \Gamma_{1,2}\\ \Gamma_{2,1} & \Gamma_{2,2}\end{pmatrix}$ [44, 45], instead of being defined in terms of just the 2 × 2 complex matrix $\Gamma_{1,2}$. This definition suits the idea that arbitrary interferometers mix the four field components regardless of which wave they belong to. The fringe visibility then depends on the sixteen matrix elements $\langle E_j(\mathbf{r}_m)E_\ell^*(\mathbf{r}_n)\rangle$ for $j, \ell = x, y$ and $m, n = 1, 2$.
In this regard, we can define the degree of coherence as the distance between $\Gamma$ and the 4 × 4 identity matrix $I_4$, which represents fully incoherent and fully unpolarized light, in the form [22, 23]

$$\mu_3^2 = \frac{4}{3}\left[\frac{\mathrm{tr}(\Gamma^2)}{(\mathrm{tr}\,\Gamma)^2} - \frac{1}{4}\right].$$

This definition is invariant under the action of 4 × 4 unitary matrices, which includes U(2) × U(2) invariance as a particular case.
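A sketch of $\mu_3$ can be written as the normalized Hilbert-Schmidt distance between $\Gamma/\mathrm{tr}\,\Gamma$ and $I_4/4$, assuming the form $\mu_3^2 = \frac{4}{3}[\mathrm{tr}(\Gamma^2)/(\mathrm{tr}\,\Gamma)^2 - 1/4]$. This is the four-dimensional degree of polarization. The code checks the two limiting values and the invariance under an arbitrary 4 × 4 unitary. The function name is hypothetical.

```python
import numpy as np

def mu_3(G):
    """mu_3^2 = (4/3) [tr(G^2) / (tr G)^2 - 1/4]: 0 for G proportional to I_4, 1 for rank one."""
    ratio = np.trace(G @ G).real / np.trace(G).real ** 2
    return np.sqrt(max(4.0 / 3.0 * (ratio - 0.25), 0.0))   # clip rounding below 0

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
G = A @ A.conj().T
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))   # random 4x4 unitary

print(mu_3(np.eye(4)))                          # fully incoherent and unpolarized
print(mu_3(np.outer(A[:, 0], A[:, 0].conj())))  # a single deterministic 4-vector
print(mu_3(G), mu_3(Q @ G @ Q.conj().T))        # unitary invariance
```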
This definition is equivalent to the degree of polarization of the four-dimensional wave $\mathbf{E} = [E_x(\mathbf{r}_1), E_y(\mathbf{r}_1), E_x(\mathbf{r}_2), E_y(\mathbf{r}_2)]$ [14, 46, 47]. This is interesting because, in the scalar case, the maximum degree of coherence that can be obtained by combining two waves $E_{1,2}$ is the degree of polarization of the two-dimensional wave $\mathbf{E} = (E_1, E_2)$.
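In agreement with the idea that polarization is a manifestation of coherence, $\mu_3$ combines the degrees of polarization $P_{1,2}$ of the individual waves with $\mu_2$ in the form [22]

$$\mu_3^2 = \frac{(I_1 - I_2)^2 + 2\left(I_1^2 P_1^2 + I_2^2 P_2^2\right) + 8 I_1 I_2\, \mu_2^2}{3\,(I_1 + I_2)^2},$$

where $I_{1,2} = I(\mathbf{r}_{1,2})$ are the corresponding intensities.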
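This decomposition can be checked numerically. It follows from substituting $\mathrm{tr}(\Gamma_{m,m}^2) = I_m^2(1+P_m^2)/2$ and $\mathrm{tr}(\Gamma_{1,2}\Gamma_{1,2}^\dagger) = I_1 I_2\mu_2^2$ into $\mu_3^2 = \frac{4}{3}[\mathrm{tr}(\Gamma^2)/(\mathrm{tr}\,\Gamma)^2 - 1/4]$, which gives $3(I_1+I_2)^2\mu_3^2 = (I_1-I_2)^2 + 2(I_1^2P_1^2 + I_2^2P_2^2) + 8I_1I_2\mu_2^2$. The sketch compares both sides for a random, hypothetical correlation matrix. The degree of polarization at each point is taken from the 2 × 2 blocks as $P_m = \sqrt{2\,\mathrm{tr}(\Gamma_{m,m}^2)/I_m^2 - 1}$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
G = A @ A.conj().T                                       # hypothetical 4x4 correlation matrix
G11, G12, G22 = G[:2, :2], G[:2, 2:], G[2:, 2:]

I1, I2 = np.trace(G11).real, np.trace(G22).real
P1 = np.sqrt(2 * np.trace(G11 @ G11).real / I1**2 - 1)   # degree of polarization at r1
P2 = np.sqrt(2 * np.trace(G22 @ G22).real / I2**2 - 1)   # degree of polarization at r2
mu2_sq = np.trace(G12 @ G12.conj().T).real / (I1 * I2)

mu3_sq_direct = 4 / 3 * (np.trace(G @ G).real / np.trace(G).real**2 - 0.25)
mu3_sq_decomposed = ((I1 - I2)**2 + 2 * (I1**2 * P1**2 + I2**2 * P2**2)
                     + 8 * I1 * I2 * mu2_sq) / (3 * (I1 + I2)**2)
print(mu3_sq_direct, mu3_sq_decomposed)                   # agree up to rounding
```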
Following the same spirit µ 3 is closely related to the overall degree of coherence µ G in Eq. (38) after two apertures located at points r 1,2 , since for the field after the apertures we have [22] This is because the "diagonal" factors µ 2 2 (r, r) in the integration Eq. ( 38) contain the degree of polarization at r.
Concerning interferometric visibility, µ 3 provides upper bounds to the visibility V of arbitrary two-beam interferometers in the form [22] 3 2 where I 1 + I 2 ≥ I is the intensity of the two interfering beams extracted from the original fields, and [48] where λ max,min are the maximum and minimum eigenvalues of Γ.
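A similar approach has previously been considered in terms of the normalized 4 × 4 correlation matrix $L$ with matrix elements $\langle \varepsilon_j(\mathbf{r}_m)\varepsilon_\ell^*(\mathbf{r}_n)\rangle$ for $j, \ell = x, y$ and $m, n = 1, 2$, where [21]

$$\varepsilon_j(\mathbf{r}_m) = \frac{E_j(\mathbf{r}_m)}{\sqrt{\langle |E_j(\mathbf{r}_m)|^2\rangle}}.$$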

FIG. 1. In a Young interferometer, each observation point $\mathbf{r}$ in a plane $z > 0$ is reached by just three rays arising from three secondary sources at $z = 0$, located at the apertures $\mathbf{r}_{1,2}$ and at the midpoint $\mathbf{R}$; $\mathbf{p}$ represents the propagation direction of the ray reaching $\mathbf{r}$ from $\mathbf{R}$.