In this second report of the course Data Science Laboratory, I’m summarizing my work on the data preparation and visualization of real CMB measurement made by the Planck space observatory between 2009 and 2013. I’ll also detail the theoretical background and my work on the recreation of the angular power spectrum of the CMB, using the Planck telescope’s data.
In this course I’m studying the Cosmic Microwave Background radiation this year. In my project I’m ought to analyze the simulation aspect of it, as well as I’m processing real observational data of the CMB, made by the Planck space observatory. My two main goals are to replicate the CMB temperature map using an arbitrary angular power spectrum, and to extract the power spectrum from real observational data. In this report I would like to summarize my weekly progress in the latter topic, the power spectrum reconstruction. Starting off with a chronological summary, we can place the subject better in time, and understand why we should use the Planck observations.
The existence of the cosmic microwave background radiation was first confidently predicted in 1948 by Ralph Almer, who was working with Robert Herman and George Gamow that time[1]. Non-thermal radiation was considered already half a century earlier, also even the CMB itself was detected (but not correctly interpreted) in the interstellar medium as a component of molecular lines from the lowest state of CH and CN molecules, by Andrew McKellar in 1941[2]. The phenomenon remained a fairly unpopular topic among astronomers until the early-mid ’60s, when numerous individual observations related to the CMB came to light in a very short period of time. Most notably the first real observation of the CMB radiation itself by Arno Penzias and Robert W. Wilkinson in 1965. Interest was only heightened by the fact that the measurement results were consistent with previous theoretical assumptions[3].
Until the early ’70s a scientific consensus was slowly established to declare that the CMB is the remnant of the Big Bang. It was mostly backed by the observations, that the frequency range of this radiation corresponds exactly to the well-known black body radiation[4]. In 1974 NASA issued a tender (or a so-called
The last observations from space was done by the Planck space observatory between 2009 and 2013, succeeding the mission of the WMAP telescope and currently the most detailed datasets of the CMB originates from this telescope. Its mission was followed by 3 major data releases in 2013, 2015 and 2018. The main goal was simple: measuring the intensity and polarization of the CMB anisotropy within the entire sky with unprecedented accuracy. For this particular job, the telescope was mounted with two distinct instruments to detect the CMB photons. One for the lower end of the frequency range and one for the higher frequencies. These two instruments mapped the whole sky on 9 different observational frequencies in the GHz domain between 100 and 900 Ghzs. Since these measurements are very noisy, and the CMB itself is completely obscured by foreground radiation in the raw data, four different filtering pipelines (COMMANDER, NILC, SEVEM, and SMICA) were utilized to separate the CMB from the foreground effects to finally recreate the image of the CMB in the form as we all know it.
The data gathered by the Planck telescope are handled by the IRSA (Infrared Science Archive) project, which aims to - as they say - “curate the science products of NASA’s infrared and submillimeter missions”. All the data from the three data releases of the Planck mission can be accessed freely on their web-archive by the public.
Datasets are categorized and grouped by a numerous different aspects, like the type of the data tables (CMB anisotropy maps, maps of astrophysical objects and foreground masks, primary frequency maps, etc.), observed frequencies of CMB photons, processing pipelines and much more. All of these entries are stored in the regular .fits
format, and are storing multiple tables per entries. These tables are mostly the $I$ intensity map with the corresponding polarization maps of the $Q$ and $U$ components, the $T$ confidence masks, or the inpainted versions of all of the above.
All of the .fits
tables are encoded in the HEALPix projection. HEALPix (Hierarchical Equal Area isoLatitude Pixelisation) is an old and widely used standard, not just in astronomy. It refers to the standard method for the effective pixelisation of the 2-sphere to help projecting its surface to various other geographical projections. As suggested from its name, the method divides the surface of the sphere into smaller “pixels”, where each pixel covers the same surface area as every other pixel[5]. HEALPix utilizes some “conventions” - as they call it - that we’ll need to understand in order to understand why this method is useful in this case and what exactly the “angular power spectrum” means in the context of the CMB.
When discussing the framework of the HEALPix standard, we first consider a bandlimited function $f \left( \theta, \varphi \right)$ (bandlimited in the sense of signal procession) with the frequency limit $l_{\mathrm{max}}$ and expand it using spherical harmonics. This is an old method in the harmonic analysis of the sphere and we only build the HEALPix algorithm on top of this well-known fact. The spherical harmonic expansion of the function above is
\begin{equation} f \left( \theta, \varphi \right) = \sum_{\ell\,=\,0}^{\infty} \sum_{m\,=\,-\ell}^{\ell} a_{\ell m} Y_{\ell m} \left( \theta, \varphi \right), \end{equation} where $\theta$ and $\varphi$ denote the spherical coordinates with polar angle $\theta \in \left[ 0, \pi \right]$ and azimuth $\varphi \in \left[0, 2 \pi \right)$. The term $Y_{\ell m} \left( \theta, \varphi \right)$ denotes the spherical harmonics
\begin{equation} Y_{\ell m} \left( \theta, \varphi \right) = \sqrt{ \frac{2 \ell + 1}{4 \pi} \frac{\left( \ell - m \right)!}{\left( \ell + m \right)!} } P_{\ell m} \left( \cos \left( \theta \right) \right) e^{i m \varphi}. \end{equation}
Pixelising the function $f \left( \theta, \varphi \right)$ correspond to sampling it at $N_{\mathrm{pix}}$ locations $\left( \theta_{p}, \varphi_{p} \right)$[6], where $p \in \left[ 0, N_{pix} - 1 \right]$. The sample function $f_{p} \left( \theta_{p}, \varphi_{p} \right)$ can be then used for an estimator of the $a_{\ell m}$ coefficients as
\begin{equation} a_{\ell m} = \frac{4 \pi}{N_{\mathrm{pix}}} \sum_{p\,=\,0}^{N_{\mathrm{pix}} - 1} Y_{\ell m}^{\ast} \left( \theta_{p}, \varphi_{p} \right) f_{p} \left( \theta_{p}, \varphi_{p} \right), \end{equation} where $\ast$ in the upper index denotes the complex conjugate. Using the $a_{\ell m}$ coefficients the angular power spectrum can be approximated as
\begin{equation} C_{\ell} = \frac{1}{2 \ell + 1} \sum_{m\,=\,-\ell}^{\ell} \left| a_{\ell m} \right|^{2}. \end{equation}
In my first report I already detailed, why the CMB temperature anisotropy can be interpreted as a radiation originating from the surface of the sphere around the observer. The anisotropy itself will be observed as a function of $\left( \theta, \varphi \right)$ polar angles on this distant spherical surface. Since the first equation above describing exactly a function $f \left( \theta, \varphi \right)$ on the 2D surface of a sphere, it is a convenient choice to use it in the case of the CMB too. In this framework we can simply replace the arbitrary $f \left( \theta, \varphi \right)$ function with the $\Delta T \left( \theta, \varphi \right)$ temperature anisotropy and calculate its power spectrum as it was described above.
Our measurements of the macroscopic world are always discrete, mostly because of our everlasting technological limitations. However the development of quantum physics since the early 20th century showed us that physical quantities are indeed discrete, a lot of them still can be threated as continuous on the macroscopic scale. Nonetheless, our measurements will be still discrete, which arises from the fact that we simply can’t sample continuous functions with arbitrary accuracy, because of the mentioned limitations in our instruments of observations.
The results is that we have to work with “discrete” datasets. In the case of the CMB, discretization manifests not only in one, but two separate phases during the observation. The first one occurs when we’re trying to resolve the image of the sky (the incoming photons) in the highest possible resolution. The second obstacle is resolving the frequency itself of the incoming photons. The first causes the observed image to be pixelated, while because of the second one we get only a histogram of photon frequencies as a result of the measurement instead of a true, continuous curve. Also, our data will have an overall bandlimit too, since the range in which the measuring instrument operates is also limited. Fortunately these factors create the perfect condition to use the HEALPix data format, because of their conventions, referring to bandlimited and discretized functions.
My main tool for processing the data of the Planck space telescope was Python. Besides being one of my main languages, building the procession pipeline to explore HEALPix data stored in .fits
tables is also rather straightforward in Python, because the original HEALPix library is implemented in a package under the name healpy
. It contains all the necessary subroutines to handle and convert HEALPix data into such a format, which can be then easily used for data exploration and visualization. These factors were satisfactory enough for me to use the Python language to carry out the project work. Using the mathematical conventions, we only need two steps to reconstruct the power spectrum from an available CMB dataset. After reading in the $\Delta T \left( \theta, \varphi \right)$ temperature data using the HEALPix library, we have to determine the $a_{\ell m}$ coefficients and then we can build the power spectrum using them immediately. In its core nature, the original HEALPix library was built on the Fortran90
standard, which included a function called anastat
[7]. This subroutine calculates the $a_{\ell m}$ coefficients, as well as the $C_{\ell}$ power spectrum from an input $f_{p} \left( \theta_{p}, \varphi_{p} \right)$ HEALPix vector.
Independently from this step, healpy
even offers us a routine, which can be used to convert the input dataset by projecting it on any geographical projections that are implemented in the library. This routine creates a rectangular matrix, which then can be simply visualized using simply matplotlib
or other graphic packages in Python. There were however a lot of problems, visualization side. My original idea was to generate a map in equirectangular projection using the healpy.projector.CartesianProj()
class, then wrap it around a map in Mollweide
or other type of projections with matplotlib
, using the matplotlib.pyplot.pcolormesh()
graphical subroutine. However this method requires a pre-defined coordinate system, which the input equirectangular map is “wrapped around”. This coordinate system is different for every projection, which makes this method inconvenient to use.
My solution was to create rectangular matrices with the healpy.projector
methods, already in the correct projections. This way all non-rectangular maps will be encompassed between the “borders” of the matrix, where elements outside the projection will be marked as -inf
values. I created a custom colormap to make the images created with it as similar as possible to the CMB maps of the Planck space telescope. All the color values for a seamless gradient was generated from an input file. The colormap was also configured to correctly display these special, -inf
values, as well as the actual CMB map itself. Using these I was able to easily create the images seen on Fig. 1. and Fig. 2. using healpy
and matplotlib
exclusively.
In the final part of the latest batch of my work, I calculated and plotted the angular power spectrum, using the method detailed above. After loading the original .fits
file used for the visualizations above, I’ve used the already mentioned healpy.anafast()
routine to obtain the $a_{\ell m}$ and $C_{\ell}$ values. However the $C_{\ell}$ quantity is what we call as “angular power spectrum”, it is traditional to plot another quantity against the multipoles $\ell$, which is only a function of $C_{\ell}$. This quantity is
\begin{equation} D_{\ell} = \frac{\ell \left( \ell + 1 \right)}{2 \pi} C_{\ell}, \end{equation} which isn’t really have any name to it. It is also a much better alternative aesthetically too. The final plot generated from the original CMB dataset can be seen on Fig. 3.
On the figure I compared the obtained $D_{\ell}$ curve to the theoretical one. The latter was created with the CAMB software (Code for Anisotropies in the Microwave Background), which can be accessed from the website of NASA’s Goddard Space Flight Center, which insitution maintains the (LAMBA) Legacy Archive for Microwave Background Data Analysis[8]. The difference between the theoretical and observed curve on the higher multipoles are due to the instrumental noise, which isn’t filtered in the Commander pipeline.
My very next goal will be to tidy up all my codes I’ve written until now. Since the pipeline is ready to simulate CMB maps, also to process observational data, I can begin to work on other goals, which I set for myself as optional/extra tasks. The first one is to try to determine cosmological parameters from the CMB maps, since this is one of the main point of its observation from the beginning. The second optional task would be to create CMB maps on a Mollweide projection without distortions, for which the HEALPix standard proves as a really useful tool.
All my work can be accessed on my GitHub repository of the course. Final project codes are situated in the project_codes
folder, while my presentation slides and presentation scripts can be found under the docs
folder.