In the third report of the course Data Science Laboratory, I’m summarizing my work on the project and I’ll show in detail, how all my scripts work and how to use them.
In the last week I mostly cleaned up and completely refractored almost all of my codes to finalize them. I’ve written docstrings for all of my routines, reworked most of my figures and partially completed some extra tasks I set for myself. In this report I would like to finally describe how my codes work and what should someone know about the usage of them.
During the course I’ve written three different pipelines as my Cosmic Microwave Background radiation project work. Two of them covers the topic of the computer simulation of the CMB radiation, while third one ought to load, process and analyze the publicly available CMB anisotropy maps of the Planck space observatory, but it can be used for the WMAP telescope’s measurements as well, since they’re both using the same HEALPix standard to encode their datasets. One of the generation algorithm and the latter analyzation routine both uses the HEALPix library, which was implemented in Python under the name healpy
. The other generation algorithm for the CMB temperature anisotropy map is based on the work of Jeff McMahon and Renée Hložek for the CMB summer school in 2019 of the McMahon Cosmology Laboratory. The link to the original materials can be accessed here. These codes are much longer and more complex by magnitudes than the prior ones, but they’re completely modular and much more instructive. Using them, one can really gain some insight about how the CMB radiation is measured or what astronomical components it consists of.
This is the first method for the simulation of the CMB and also the easiest one to implement among the three methods. In this routine we start from a single input angular power spectrum and at the end we’re generating a random HEALPix array with completely the same spherical surface indexation (HEALPix) standard, as with eg. the Planck telescope stores their datasets. It can be threated as a fully fledged CMB map from the Planck or WMAP telescope, which can be then loaded and processed using my Planck data procession pipeline in the same way as real datasets are. This routine only consist of two main functions called load_spectrum()
and gen_maps()
. Their names are trying to be meaningful. The function load_spectrum()
def load_spectrum(fname, lmax=None):
"""
Loads and arbitrary angular power spectrum from a file.
Datasets generated with the LAMBDA tool contains only the :math:`D_{l}`
values, alongside the array of the :math:`l` multipoles and the
corresponding errors.
...
"""
simply loads an angular spectrum from a file. This file is considered to be simply a plain-text datafile, where the first column contains the $\ell$ multipoles and the second column contains the $D_{\ell}$ transformed power spectrum values. The $C_{\ell}$ values are calculated and then returned along the $\ell$ and $D_{\ell}$ values. Meanwhile the gen_maps()
function
def gen_maps(cls, N_SIDE=2048, lmax=None,
pol=False, pixwin=False, fwhm=5.8e-3, sigma=8.7e-6):
"""
Generate randomized HEALPix arrays from an input angular power spectrum, using
the `synfast` subroutine from the `Fortran90` standard implemented in the HEALPix
library.
...
"""
creates the randomized HEALPix array using the synfast
routine, which was originally implemented in the Fortran90
standard and was later ported into the HEALPix library. Today in Python it can be found in the healpy
package.
In this part I would like to speak about the analysis pipeline for real observational data. This algorithm is still simple, but contains more subroutines, than the previous one.
All of the Planck CMB maps (regardless of type) are stored in the FITS file format. In every FITS table, there are couple of fields ($\approx 2-7$), each storing a specific HEALPix array. The FITS’s header contains more information about the datasets in every field.
In the very first step, this routine using the function load_HPX()
def load_HPX(file, field=1):
"""
Loads a HEALPix array from a given field of an input FITS file.
...
"""
extracts the HEALPix array stored in a given field from an input FITS file. In my project I’ve worked particularly with the intensity maps of the temperature anisotropy. These temperature maps of Planck are stored as Kelvin values, so the the function above also converts them to micro Kelvins and returns this converted array as well. This is only a raw, regular 1D HEALPix array yet.
In the next step the function called get_projection()
def get_projection(hpx, proj='moll', N_SIDE=2048):
"""
Projects the input HEALPix dataset on an arbitrary geographical projection,
which is implemented in the `healpy` package.
...
"""
interprets the special HEALPix indexing of this array, and maps this on an arbitrary geographical projection. (Note: the function can use only those projections that are implemented in the healpy
package.) The output of this step is a 2D matrix, which completely encompasses the non-always-rectangular projection itself, as some kind of border or container. Outside of the actual projection, the bad values are handled uniformly by being set to the value inf
.
Finally the function plot_cmb()
def plot_cmb(proj, cmap=None, c_min=None, c_max=None,
save=False, save_filename='default_name_map'):
"""
Plots an input image generated by a `healpy.projector` routine and scales the values if needed.
The routine uses the classic Planck CMB colormap by default to shade pixels on the image.
...
"""
which generates an image using the original colormap of the Planck CMB temperature maps to shade it and thus making it completely similar to those well-known CMB images.
This part contains two other functions as well, which are used to extract the angular power spectrum of the input HEALPix array and visualize it on a graph, as well as compare it to the theoretical curve, generated the CAMB LAMBDA tool. The first function cmb_spectrum()
def cmb_spectrum(hpx, lmax=2500, alm=True):
"""
Calculates the :math:`a_{lm}` and :math:`C_{l}` parameters using the
`anafast` subroutine from the Fortran90 standard, up to a given
:math:`l_{\mathrm{max}}` bandlimit.
...
"""
uses another Fortran90
subroutine, called anafast
to calculate the $C_{\ell}$ angular power spectrum values up to a given $\ell_{\mathrm{max}}$ bandlimit. It optionally returns the $a_{\ell m}$ coefficients in the spherical harmonics expansion of the input function (HEALPix array in this case).
Finally, again, another function called plot_spectrum()
def plot_spectrum(ell, Dl, DlTT,
save=False, save_filename='default_name_spectrum'):
"""
Plots the angular power spectrum of the CMB.
...
"""
visualizes the transformed $D_{\ell}$ values and compares it to the theoretical curve on the same graph.
The second method used for the generation of CMB temperature maps is the last of my routines and it is the most complex as I’ve already mentioned. The main characteristics of this routine is, that it generates all components of a regular CMB temperature map individually. The base idea behind this particular CMB map generation is to first generate these component individually, then superimpose them over each other to obtain the final, raw intensity map. By “components” I mean the pure CMB temperature anisotropy, the foreground effects and instrumental as well as other type of noises. Foreground effects are consists of point sources with different distributions, while noises are can be attributed to the PSF of the instrumental beam, atmospheric perturbations, observational frequency dependent noise of the instrument (pink noise), or simple, common white noise.
In its current form, the naïve generation contains two .py
files (called as constants
and cmb_modules
), and three Jupyter Notebooks (called as part_1_pure_cmb
, part_2_adding_foreground
and part_3_adding_noise
) for demonstration purposes.
In the constants.py
file there are the simulation parameters are defined. This includes the aspect, extent and resolution of the generated CMB maps, extent of colorbars, number and mean amplitude of foreground sources, noise levels for all types of noise generation and beam FWHM of the simulated instrument. The file cmb_modules.py
contains all subroutines for the naïve generation algorithm, where by default, the function arguments uses the parameters defined in the file constants.py
.
The three notebooks are contains demos for the algorithm. The first part_1
Notebook showcases the steps of the simulation of a pure CMB map in general. The second Notebook marked with part_2
focuses on the generation of foreground objects, and analyses their power spectrum. The third Notebook, which name starts with part_3
details the noise generation and the effects on an instrument of a beam with non-zero radius.
The base layer of the simulation is the intensity map of the CMB temperature anisotropy. At the beginning of the simulation, the make_CMB_I_map()
def make_CMB_I_map(ell, DlTT,
N_x, N_y,
X_width, Y_width, pix_size,
random_seed=None):
"""
Makes a realization of a simulated CMB sky map given an input :math:`D_{\ell}` as a function
of :math:`\ell`. This routine creates a 2D :math:`\ell` and :math:`C_{\ell}` spectrum and
generates a Gaussian, random realization of the CMB in Fourier space using these. At last the
map is converted into Image space, which will result us a randomly generated intensity map of
the CMB temperature anisotropy.
...
"""
can be used to construct this randomly generated, pure CMB temperature map using an input angular power spectrum. It should be noted that there are numerous other parameters, which determines the shape, extent and other attributes of the simulated maps, which can be tuned at will by the user. The basic methodology of this function above is that it creates some noise with Gaussian distribution in Fourier space, which parameters are based on the input power spectrum. After that the map is converted to image space and it results in a randomized temperature noise map with angular power spectrum close-to-identical to the original input spectrum. This can simply be considered to a randomly generated CMB temperature map, which serves as our base layer for the simulation.
The output maps throughout all of the notebooks can be easily visualized using the function plot_CMB_map()
,
def plot_CMB_map(CMB_I, X_width, Y_width,
c_min, c_max, cmap=None,
save=False, save_filename='default_name_cmb',
no_axis=False, no_grid=True, no_title=True,
no_cbar=False):
"""
Plots the generated rectangular intensity map of the CMB temperature anisotropy.
...
"""
which serves as a general plotting routine for the created datasets to obtain figures with similar appearance. The plots can be controlled in detail using the function’s arguments. After the generation of the pure CMB map, visualizing it results in the the plot seen on Fig. 1.
In reality, the CMB radiation is completely obscured by the overwhelming radiation from objects inside our galaxy, and from other astrophysical structures (eg. other galaxies, AGNs, etc.). Proper filtering pipelines needs to be utilized to separate the foreground effects from the CMB. In this project I’ve simulated the effect of this component, but in a much more subtle way. In real measurements this enormous effect is attributed to billions of objects, but in this simulations, where individual foreground objects are generated, this huge number can not be replicated. That’s why this part of the simulation is more of an “informative” component, than a faithful representation of real measurement data.
In this simulation there are three types of foreground objects that are generated:
Point sources can be arise from a number of (mostly) bright astronomical objects, like Active Galactic Nuclei (AGNs), Dust Star Forming Galaxies (DSFGs), and the bright tail of lensed DSFGs, as they are listed as examples in the original Jupyter Notebooks of Jeff McMahon and Renée Hložek. To simulate a mock image of these sources, we’re approximating their population by combining a layer of abundant, but faint sources with a layer of a few, but bright ones.
The first type of point sources represents the faint layer of foreground effects. They’re the most abundant, but least intensive objects in the simulation. Because their intensities are generated using the Poisson distribution, their radiation covers only a very narrow, interval in the power spectrum, making them very easy to drop out them during a filtering process. The function poisson_source_component()
def poisson_source_component(N_x, N_y,
pix_size,
number_of_sources, amplitude_of_sources):
"""
Makes a realization of the naive foreground point source map with Poisson
distribution.
...
"""
generates a given number of these faint point sources using the argument amplitude_of_sources
as a parameter for a Poisson distribution to chose their intensities.
The second type of point sources are those with intensity chosen by an exponential distribution. These objects are generated by the function exponential_source_component()
def exponential_source_component(N_x, N_y,
pix_size,
number_of_sources_EX, amplitude_of_sources_EX):
"""
Makes a realization of the naive foreground point source map with exponential
distribution.
...
"""
They’re normally much smaller in number, but cover a wider range of temperatures on the power spectrum. Because of this, some of these sources are very intensive on the temperature map compared to the first type of point sources. The two point source map can be summed, which results the map of the point sources seen on Fig. 2.
The last of the simulated foreground distortions is the thermal component of the Sunyeav-Zeldovich effect. When a CMB photon propagate through galaxy clusters filled with high-energy electrons, it receives a small energy boost by interacting them through inverse Compton scattering. This causes changes in the photon’s temperature and polarization as well. Since we’re only working with temperature maps, only the prior component is considered here. Speaking on technical terms, observing the Sunyaev-Zeldovich effect can be used to discover distant and dense galaxy clusters and thus is a field of great interest in cosmology. The subroutine called SZ_source_component()
def SZ_source_component(N_x, N_y,
X_width, Y_width, pix_size,
number_of_SZ_clusters, mean_amplitude_of_SZ_clusters,
SZ_beta, SZ_theta_core):
"""
Makes a realization of a naive Sunyaev–Zeldovich effect map.
...
"""
intends to simulate the SZ effect. The main concept of this function is to generate large point sources and then “smudge”/”blur” them by convolving the generated SZ effect map with a $\beta$-function. The result is a layer with large and cold sources on the CMB temperature map. A randomly generated SZ source map looks like the one seen on Fig. 3.
As it was detailed earlier, noise arises from many different aspects of the measuring instrument and from various natural causes as well. The first of these simulated is the distortion in the seeing, caused by the non-zero extent/FWHM of the measuring instrument’s beam. Just as our eyes have limited capabilities to resolve two distant, but neighboring objects, our instruments also suffer from this limitation. If we want realistically simulate a measurement, we have to incorporate this phenomenon into our work. Using the function convolve_map_with_gaussian_beam()
def convolve_map_with_gaussian_beam(Map,
N_x, N_y,
beam_size_fwhp):
"""
Convolves a map with a Gaussian beam pattern.
...
"""
we can achieve exactly this. Here we assume that the beam profile is Gaussian. To simulate its effect on the measured temperature map, we convolve a Gaussian distribution with a given FWHM over the generated CMB image. The wider the beam, the more blurry our image will be.
The effect of white noise exist in virtually everything, including all scientific measurements and experiments. It’s main characteristic is that its power spectrum is constant in all frequencies, and is completely uniformly distributed over all frequency ranges. A special form of white noise is the Gaussian noise, where the amplitudes of noise values are sampled from a normal distribution. In digital electronics this is the most common white noise that arises during measurements, and thus Gaussian noise is widely used in signal processing to more precisely simulate realistic conditions. The routine called gen_white_noise()
def gen_white_noise(N_x, N_y,
pix_size,
white_noise_level):
"""
Makes a white noise map.
...
"""
randomly generates a Gaussian white noise map over the observed domain in the simulation. The level of noise is defined by the user. This intends to replicate the mentioned noisy conditions in digital measurements, like the “photography” of the CMB temperature map.
The second type of noise generated by the algorithm is atmospheric noise. Of course in measurement with space telescopes (like COBE, WMAP and Planck) this factor won’t affect the observation due to trivial reasons. However CMB research are not exclusively conducted from outside of the atmosphere, but ground-based measurements are also made (eg. using the Atacama Cosmology Telescope). To be able to replicate these measurements too beside those that made by space telescopes, we’re simulating the effect of atmospheric perturbations using the function called gen_atmospheric_noise()
def gen_atmospheric_noise(N_x, N_y,
X_width, Y_width, pix_size,
atmospheric_noise_level):
"""
Makes an atmospheric noise map.
...
"""
This routine works in the similar way as the pure CMB temperature map is generated. In this case a random 2D Gaussian dataset is generated and converted to Fourier space. After some transformation there, it’s converted back to image space, which results us the final atmospheric noise map.
At last but not least the $1/f$ noise or pink noise is generated. The $1/f$ name stands for the behavior of its power spectrum density to be inversely proportional to the frequency. Pink noise is almost just as common in nature as white noise. It extensively arises in observations, where the measurements are made with electronic devices. In the context of electronics this effect is called as “flicker noise”. To simulate this effect we’re using the function gen_one_over_f_noise()
def gen_one_over_f_noise(N_x, N_y,
pix_size,
one_over_f_noise_level):
"""
Generates 1/f noise in the X direction.
...
"""
Here a random Gaussian noise map is generated, but being stretched along a linear dimension (along the X direction in this function). This transformed image will be our pink noise map. Comparing all of the noise types generated in the steps above in one figure results in Fig. 5.
At the very last step we can flatten all the simulated layers into only one. This will result us a mock of the raw intensity map of the CMB temperature anisotropy, where the observation instrument is ground-based. We can optionally set the atmospheric noise level to $0$ to exclude it from the simulation and to replicate a raw image made by a space telescope. The ground based result can be seen in Fig. 6.