Cosmic Microwave Background

In this second report of the course Data Science Laboratory, I’m summarizing my work on the data preparation and visualization of real CMB measurement made by the Planck space observatory between 2009 and 2013. I’ll also detail the theoretical background and my work on the recreation of the angular power spectrum of the CMB, using the Planck telescope’s data.

I. Introduction

In this course I’m studying the Cosmic Microwave Background radiation this year. In my project I’m ought to analyze the simulation aspect of it, as well as I’m processing real observational data of the CMB, made by the Planck space observatory. My two main goals are to replicate the CMB temperature map using an arbitrary angular power spectrum, and to extract the power spectrum from real observational data. In this report I would like to summarize my weekly progress in the latter topic, the power spectrum reconstruction. Starting off with a chronological summary, we can place the subject better in time, and understand why we should use the Planck observations.

The existence of the cosmic microwave background radiation was first confidently predicted in 1948 by Ralph Almer, who was working with Robert Herman and George Gamow that time[1]. Non-thermal radiation was considered already half a century earlier, also even the CMB itself was detected (but not correctly interpreted) in the interstellar medium as a component of molecular lines from the lowest state of CH and CN molecules, by Andrew McKellar in 1941[2]. The phenomenon remained a fairly unpopular topic among astronomers until the early-mid ’60s, when numerous individual observations related to the CMB came to light in a very short period of time. Most notably the first real observation of the CMB radiation itself by Arno Penzias and Robert W. Wilkinson in 1965. Interest was only heightened by the fact that the measurement results were consistent with previous theoretical assumptions[3].

Until the early ’70s a scientific consensus was slowly established to declare that the CMB is the remnant of the Big Bang. It was mostly backed by the observations, that the frequency range of this radiation corresponds exactly to the well-known black body radiation[4]. In 1974 NASA issued a tender (or a so-called Announcement of Opportunity) for spacecraft missions to study the CMB radiation from outside Earth’s atmosphere. Since then, in these past 40 years, there were 3 notable spacecrafts of NASA (Russia also had one called ), which created complete sky maps of the CMB radiation. Each of them was much more and more detailed, than their predecessor.

The last observations from space was done by the Planck space observatory between 2009 and 2013, succeeding the mission of the WMAP telescope and currently the most detailed datasets of the CMB originates from this telescope. Its mission was followed by 3 major data releases in 2013, 2015 and 2018. The main goal was simple: measuring the intensity and polarization of the CMB anisotropy within the entire sky with unprecedented accuracy. For this particular job, the telescope was mounted with two distinct instruments to detect the CMB photons. One for the lower end of the frequency range and one for the higher frequencies. These two instruments mapped the whole sky on 9 different observational frequencies in the GHz domain between 100 and 900 Ghzs. Since these measurements are very noisy, and the CMB itself is completely obscured by foreground radiation in the raw data, four different filtering pipelines (COMMANDER, NILC, SEVEM, and SMICA) were utilized to separate the CMB from the foreground effects to finally recreate the image of the CMB in the form as we all know it.

Fig. 1. The CMB temperature anisotropy map on Mollweide projection, reconstructed by using the aggregated ("Full") dataset of the Planck space observatory from its DR3 in 2018. The .fits table used to recreate this map is the output of Planck's Commander pipeline.

Source of data : NASA/IPAC Infrared Science Archive -- Planck Public Data Release 3 Maps

II. Dataset

The data gathered by the Planck telescope are handled by the IRSA (Infrared Science Archive) project, which aims to - as they say - “curate the science products of NASA’s infrared and submillimeter missions”. All the data from the three data releases of the Planck mission can be accessed freely on their web-archive by the public.

Datasets are categorized and grouped by a numerous different aspects, like the type of the data tables (CMB anisotropy maps, maps of astrophysical objects and foreground masks, primary frequency maps, etc.), observed frequencies of CMB photons, processing pipelines and much more. All of these entries are stored in the regular .fits format, and are storing multiple tables per entries. These tables are mostly the $I$ intensity map with the corresponding polarization maps of the $Q$ and $U$ components, the $T$ confidence masks, or the inpainted versions of all of the above.

All of the .fits tables are encoded in the HEALPix projection. HEALPix (Hierarchical Equal Area isoLatitude Pixelisation) is an old and widely used standard, not just in astronomy. It refers to the standard method for the effective pixelisation of the 2-sphere to help projecting its surface to various other geographical projections. As suggested from its name, the method divides the surface of the sphere into smaller “pixels”, where each pixel covers the same surface area as every other pixel[5]. HEALPix utilizes some “conventions” - as they call it - that we’ll need to understand in order to understand why this method is useful in this case and what exactly the “angular power spectrum” means in the context of the CMB.

III. Theoretical background

When discussing the framework of the HEALPix standard, we first consider a bandlimited function $f \left( \theta, \varphi \right)$ (bandlimited in the sense of signal procession) with the frequency limit $l_{\mathrm{max}}$ and expand it using spherical harmonics. This is an old method in the harmonic analysis of the sphere and we only build the HEALPix algorithm on top of this well-known fact. The spherical harmonic expansion of the function above is

\begin{equation} f \left( \theta, \varphi \right) = \sum_{\ell\,=\,0}^{\infty} \sum_{m\,=\,-\ell}^{\ell} a_{\ell m} Y_{\ell m} \left( \theta, \varphi \right), \end{equation} where $\theta$ and $\varphi$ denote the spherical coordinates with polar angle $\theta \in \left[ 0, \pi \right]$ and azimuth $\varphi \in \left[0, 2 \pi \right)$. The term $Y_{\ell m} \left( \theta, \varphi \right)$ denotes the spherical harmonics

\begin{equation} Y_{\ell m} \left( \theta, \varphi \right) = \sqrt{ \frac{2 \ell + 1}{4 \pi} \frac{\left( \ell - m \right)!}{\left( \ell + m \right)!} } P_{\ell m} \left( \cos \left( \theta \right) \right) e^{i m \varphi}. \end{equation}

Pixelising the function $f \left( \theta, \varphi \right)$ correspond to sampling it at $N_{\mathrm{pix}}$ locations $\left( \theta_{p}, \varphi_{p} \right)$[6], where $p \in \left[ 0, N_{pix} - 1 \right]$. The sample function $f_{p} \left( \theta_{p}, \varphi_{p} \right)$ can be then used for an estimator of the $a_{\ell m}$ coefficients as

\begin{equation} a_{\ell m} = \frac{4 \pi}{N_{\mathrm{pix}}} \sum_{p\,=\,0}^{N_{\mathrm{pix}} - 1} Y_{\ell m}^{\ast} \left( \theta_{p}, \varphi_{p} \right) f_{p} \left( \theta_{p}, \varphi_{p} \right), \end{equation} where $\ast$ in the upper index denotes the complex conjugate. Using the $a_{\ell m}$ coefficients the angular power spectrum can be approximated as

\begin{equation} C_{\ell} = \frac{1}{2 \ell + 1} \sum_{m\,=\,-\ell}^{\ell} \left| a_{\ell m} \right|^{2}. \end{equation}

In my first report I already detailed, why the CMB temperature anisotropy can be interpreted as a radiation originating from the surface of the sphere around the observer. The anisotropy itself will be observed as a function of $\left( \theta, \varphi \right)$ polar angles on this distant spherical surface. Since the first equation above describing exactly a function $f \left( \theta, \varphi \right)$ on the 2D surface of a sphere, it is a convenient choice to use it in the case of the CMB too. In this framework we can simply replace the arbitrary $f \left( \theta, \varphi \right)$ function with the $\Delta T \left( \theta, \varphi \right)$ temperature anisotropy and calculate its power spectrum as it was described above.

IV. Processing the data and results

Our measurements of the macroscopic world are always discrete, mostly because of our everlasting technological limitations. However the development of quantum physics since the early 20th century showed us that physical quantities are indeed discrete, a lot of them still can be threated as continuous on the macroscopic scale. Nonetheless, our measurements will be still discrete, which arises from the fact that we simply can’t sample continuous functions with arbitrary accuracy, because of the mentioned limitations in our instruments of observations.

The results is that we have to work with “discrete” datasets. In the case of the CMB, discretization manifests not only in one, but two separate phases during the observation. The first one occurs when we’re trying to resolve the image of the sky (the incoming photons) in the highest possible resolution. The second obstacle is resolving the frequency itself of the incoming photons. The first causes the observed image to be pixelated, while because of the second one we get only a histogram of photon frequencies as a result of the measurement instead of a true, continuous curve. Also, our data will have an overall bandlimit too, since the range in which the measuring instrument operates is also limited. Fortunately these factors create the perfect condition to use the HEALPix data format, because of their conventions, referring to bandlimited and discretized functions.

My main tool for processing the data of the Planck space telescope was Python. Besides being one of my main languages, building the procession pipeline to explore HEALPix data stored in .fits tables is also rather straightforward in Python, because the original HEALPix library is implemented in a package under the name healpy. It contains all the necessary subroutines to handle and convert HEALPix data into such a format, which can be then easily used for data exploration and visualization. These factors were satisfactory enough for me to use the Python language to carry out the project work. Using the mathematical conventions, we only need two steps to reconstruct the power spectrum from an available CMB dataset. After reading in the $\Delta T \left( \theta, \varphi \right)$ temperature data using the HEALPix library, we have to determine the $a_{\ell m}$ coefficients and then we can build the power spectrum using them immediately. In its core nature, the original HEALPix library was built on the Fortran90 standard, which included a function called anastat[7]. This subroutine calculates the $a_{\ell m}$ coefficients, as well as the $C_{\ell}$ power spectrum from an input $f_{p} \left( \theta_{p}, \varphi_{p} \right)$ HEALPix vector.

Fig. 2. The CMB temperature anisotropy map on orthographic projection, reconstructed using the same dataset as the image in Fig. 1. This is a demonstration of HEALPix being able to easily convert the same input dataset to any implemented geographical projection.

Source of data : NASA/IPAC Infrared Science Archive -- Planck Public Data Release 3 Maps

Independently from this step, healpy even offers us a routine, which can be used to convert the input dataset by projecting it on any geographical projections that are implemented in the library. This routine creates a rectangular matrix, which then can be simply visualized using simply matplotlib or other graphic packages in Python. There were however a lot of problems, visualization side. My original idea was to generate a map in equirectangular projection using the healpy.projector.CartesianProj() class, then wrap it around a map in Mollweide or other type of projections with matplotlib, using the matplotlib.pyplot.pcolormesh() graphical subroutine. However this method requires a pre-defined coordinate system, which the input equirectangular map is “wrapped around”. This coordinate system is different for every projection, which makes this method inconvenient to use.

My solution was to create rectangular matrices with the healpy.projector methods, already in the correct projections. This way all non-rectangular maps will be encompassed between the “borders” of the matrix, where elements outside the projection will be marked as -inf values. I created a custom colormap to make the images created with it as similar as possible to the CMB maps of the Planck space telescope. All the color values for a seamless gradient was generated from an input file. The colormap was also configured to correctly display these special, -inf values, as well as the actual CMB map itself. Using these I was able to easily create the images seen on Fig. 1. and Fig. 2. using healpy and matplotlib exclusively.

In the final part of the latest batch of my work, I calculated and plotted the angular power spectrum, using the method detailed above. After loading the original .fits file used for the visualizations above, I’ve used the already mentioned healpy.anafast() routine to obtain the $a_{\ell m}$ and $C_{\ell}$ values. However the $C_{\ell}$ quantity is what we call as “angular power spectrum”, it is traditional to plot another quantity against the multipoles $\ell$, which is only a function of $C_{\ell}$. This quantity is

\begin{equation} D_{\ell} = \frac{\ell \left( \ell + 1 \right)}{2 \pi} C_{\ell}, \end{equation} which isn’t really have any name to it. It is also a much better alternative aesthetically too. The final plot generated from the original CMB dataset can be seen on Fig. 3.

Fig. 3. The reconstructed angular power spectrum of the Planck CMB measurements from 2018. The $\Lambda$-CDM is the theoretical spectrum, generated with the CAMB software with true cosmological parameters. The difference between the two curves on higher multipoles are caused by instrumental noise, which wasn't yet substracted from the Commander pipeline's dataset that I used.

Source of data : NASA/IPAC Infrared Science Archive -- Planck Public Data Release 3 Maps

On the figure I compared the obtained $D_{\ell}$ curve to the theoretical one. The latter was created with the CAMB software (Code for Anisotropies in the Microwave Background), which can be accessed from the website of NASA’s Goddard Space Flight Center, which insitution maintains the (LAMBA) Legacy Archive for Microwave Background Data Analysis[8]. The difference between the theoretical and observed curve on the higher multipoles are due to the instrumental noise, which isn’t filtered in the Commander pipeline.

V. Further work

My very next goal will be to tidy up all my codes I’ve written until now. Since the pipeline is ready to simulate CMB maps, also to process observational data, I can begin to work on other goals, which I set for myself as optional/extra tasks. The first one is to try to determine cosmological parameters from the CMB maps, since this is one of the main point of its observation from the beginning. The second optional task would be to create CMB maps on a Mollweide projection without distortions, for which the HEALPix standard proves as a really useful tool.

VI. Notes

All my work can be accessed on my GitHub repository of the course. Final project codes are situated in the project_codes folder, while my presentation slides and presentation scripts can be found under the docs folder.

References

Alpher, R. A., Herman, R. C. (1948). On the relative abundance of the elements. Physical Review, 74(12), 1737. https://journals.aps.org/pr/abstract/10.1103/PhysRev.74.1737
McKellar, A. (1941). Molecular lines from the lowest states of diatomic molecules composed of atoms probably present in interstellar space. Publications of the Dominion Astrophysical Observatory Victoria, 7. https://ui.adsabs.harvard.edu/abs/1941PDAO....7..251M
Penzias, A. A., Wilson, R. W. (1965). A measurement of excess antenna temperature at 4080 Mc/s. The Astrophysical Journal, 142, 419-421. https://ui.adsabs.harvard.edu/abs/1965ApJ...142..419P
Peebles, P. J. E., Schramm, D. N., Turner, E. L., & Kron, R. G. (1991). The case for the relativistic hot big bang cosmology. Nature, 352(6338), 769-776. https://www.nature.com/articles/352769a0
Hivon, E. (2010). The HEALPix Primer. https://healpix.jpl.nasa.gov/html/intro.htm
Hivon, E. (2010). HEALPix conventions. https://healpix.jpl.nasa.gov/html/intronode6.htm
Hivon, E. et al. (2020). HEALPixFortran Facility User Guidelines. https://healpix.sourceforge.io/pdf/facilities.pdf
Addison, G. E. et al. (2019). Legacy Archive for Microwave Background Data Analysis (LAMBDA): An Overview. arXiv preprint arXiv:1905.08667. https://ui.adsabs.harvard.edu/abs/2019arXiv190508667A