Git and GitHub Tutorial

In the previous part of the tutorial, we talked about the basics of version control systems, online providers and the history of Git. In this part, we will learn how to use Git and GitHub to manage our code repositories.


Creating and setting up a GitHub profile

The steps to create a GitHub account are practically identical to any other online registration:

Navigate to GitHub.
Sign up for a profile with a username and an email address you have access to.
Set a password, activate your email with the link you receive, just do the usual stuff.
That is it, you are done! Welcome to your free GitHub account!

Installing and setting up Git

While there are some GUI programs for Git, at its core it is a command-line utility, and that is how I am using it here. Here is a short guide on how to install Git on Windows, macOS or Linux.

Install the Git software.
- If you are using Windows, you usually install Git Bash, a Unix-like terminal emulator for Windows that is designed for use with Git. Install Git Bash from the Git downloads page.
- On Linux, macOS or Windows Subsystem for Linux (WSL), we only need to install the Git package using an available package manager:
```
$ sudo [apt|dnf|yum|...] install git   # Linux or WSL
$ brew install git                     # macOS with Homebrew (run without sudo)
```
Set up Git with your “credentials”. If you are going to work on a remote codebase using Git, the server needs to identify you (the client) to know who made the changes. This is usually done by your username and/or your email address. In case of a closed project (e.g. at a company), where the code is stored on a private server, your “credentials” are usually just your username on that specific server. In case you are using Git to manage code stored on GitHub/GitLab/etc., these are the username and email address that you registered with on these sites. Note that these values only label your commits; they are not a login. The actual authentication is handled separately (here by the SSH keys set up in the next steps). Registering these “credentials” on a local device can be done by modifying the configuration of your Git installation from Git Bash (in case of Windows) or simply from the terminal (in case of Linux, macOS or WSL):
```
 $ git config --global user.name  "github-username"
 $ git config --global user.email "email@of-your-github-account"
```
(Optional) Other configuration options and first time installation tips can be found on the git website.
Set up your SSH keys and config. Since August 2021, GitHub no longer accepts account passwords for Git operations over HTTPS, so you need either an SSH key or a personal access token to authenticate. I am using SSH keys here. If you are using Git for the first time, you probably do not have any SSH keys set up yet. Follow either this tutorial or the official GitHub tutorial to set up your SSH keys and config file.
Add your public key to your GitHub account. After you have generated the key pair in the previous step, you need to associate the public key with your GitHub account to be able to use SSH to authenticate yourself. The steps to do this are as follows:
1. Visit github.com and log in if you have not already.
2. Click on your profile image in the upper right corner of your homepage and go to Settings.
3. On the left hand side of the page, click the SSH and GPG keys option.
4. Click on the big, green New SSH key button (you cannot miss it).
5. Copy and paste the contents of the ~/.ssh/id_github.pub file into the large text box. You can give an arbitrary name to it to identify the key easily.
6. Click on the green Add SSH key button and you are done!
7. Test your connection to GitHub by opening a new Git Bash/terminal and trying to SSH into GitHub. The very first time you connect to any server, OpenSSH will ask whether you trust the connection. Just type yes and press Enter. If everything goes well, you will see a message starting with Hi (USERNAME)!. This indicates that the SSH connection between your device and GitHub works and you are good to go.
```
 $ ssh -T git@github.com
 #The authenticity of host 'github.com (IP ADDRESS)' can't be established.
 #RSA key fingerprint is SHA256:(PUBLIC KEY FINGERPRINT).
 #Are you sure you want to continue connecting (yes/no)? yes
 #Hi (USERNAME)! You've successfully authenticated, but GitHub does not
 #provide shell access.
```
  A fun-fact-worthy, but still good-to-know clarification for the command above: If you look up what the -T flag of OpenSSH means, you will find that “it disables pseudo-tty allocation”. Okay, but what does that mean? Why are we using it here? The answer consists of two parts:
  - First of all, “TTY” simply means “terminal” and you can come across this acronym in many places if you are working with computers and command lines. It originates from the word teletypewriter, the “father” of the physical computer terminals and the “grandfather” of terminal emulators, i.e. of any “terminal” you run on your computer. Besides the regular Unix terminals in e.g. Linux or macOS, some well-known examples of terminal emulators are PuTTY or Windows Terminal on Windows. However, Windows Terminal should not be confused with cmd.exe, the default command-line interpreter of Windows, or with PowerShell, which is a more advanced and versatile command-line tool.
  - Second, the description of the -T flag in “natural language” means that it disables sending a terminal start-up request to the remote machine (which is otherwise sent by default). It is a common practice to pass the -T flag to the ssh command when testing an SSH connection. The reason for this is that many remote servers that people use with ssh forbid access to a remote terminal. In that case, a test without the -T flag can end with a confusing message such as “PTY allocation request failed”, and we will be left wondering why our perfect ssh setup did not work. In case of GitHub, using -T is not strictly necessary. GitHub is configured in a way that it handles commands coming from careless users appropriately: it will not provide shell access even when said users forgot the -T flag, it will only print the PTY warning before the usual greeting. Still, following good practices is always very much advised, because life will not be so tolerant with us in the future.
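
Since the key generation itself is delegated to other tutorials above, here is a minimal sketch of what those steps typically boil down to. It is not the official procedure, just one possible setup under stated assumptions: the key is named id_github (to match the file name used in step 5), the email address is a placeholder, and by default everything is written into a scratch directory so that an existing ~/.ssh cannot be damaged. Set SSH_DIR="$HOME/.ssh" before running it to apply the setup for real.

```shell
# Sketch: create a dedicated key pair for GitHub and point SSH at it.
# SSH_DIR, the file name id_github and the email address are assumptions.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"
KEY_FILE="$SSH_DIR/id_github"

mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Generate an Ed25519 key pair unless one already exists.
# -N "" sets an empty passphrase so the sketch runs without prompts;
# leave it out to be asked for a passphrase instead.
if [ ! -f "$KEY_FILE" ]; then
    ssh-keygen -q -t ed25519 -C "email@of-your-github-account" -N "" -f "$KEY_FILE"
fi

# Tell SSH to offer this key whenever it connects to github.com.
if ! grep -q "IdentityFile $KEY_FILE" "$SSH_DIR/config" 2>/dev/null; then
    cat >> "$SSH_DIR/config" <<EOF
Host github.com
    User git
    IdentityFile $KEY_FILE
    IdentitiesOnly yes
EOF
fi

# This is the text to paste into the "New SSH key" form on GitHub.
cat "$KEY_FILE.pub"
```

Only the .pub file is ever pasted anywhere; the file without the extension is the private key and should never leave your machine.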

Basic structure of GitHub

GitHub consists of so-called code repositories. A code repository (or simply repository or repo for short) is like a “folder” and any user can create an arbitrary number of them associated with their account. These repos are used to organize and isolate individual projects or cohesive groups of files from each other. On various online hosting providers, like GitHub and others, repos can be either set to public or private. The difference between them is simple:

“Public repo” means that anyone can download the contents of the repository and see what is inside it.
“Private repo” means that only users who are explicitly given access to the repository can see and download its contents.

If you navigate to a repo on GitHub, you will see a list of files and directories, as well as some information about the repo and the code base in it. First and foremost it includes a readme, which is a long description and usage manual of the code in said repository, displayed below the file list. You can also find an additional short description, the list of contributing users, statistics about the programming languages used in the project and some other info.

Interacting with GitHub using Git

The majority of interactions of users with their GitHub repositories are limited to a handful of the most basic Git commands. This means that learning how to use Git and GitHub takes approximately 10 minutes for a complete beginner. Of course, it takes substantially more to also get comfortable with them, but that is just a matter of practice.

The most important interactions of a user with GitHub are the following:

Creating a new repository.
Downloading a repository to a local machine (i.e. your computer).
Updating the repository on the remote machine: Uploading the new or modified files to the repository on GitHub (or to any remote server, where the code base is stored).
Updating the repository on the local machine: Downloading new or modified files that exist on GitHub (or on a remote server), but not on the local machine.
Restoring a previous version of the repo.
Exploring changes (due to modification) between the code base in the remote repository and on the local machine.
Creating branches and working with them.

Every Git command starts by invoking the git binary, which is then followed by a “subcommand” that specifies what the command will do. E.g. git clone <url> downloads (or “clones”) a repository to your machine. Similarly, git pull downloads the changes from an online repository to your already existing local repo. More on the important commands in the next sections.

1. Creating a new repository on GitHub

There are multiple ways to create a new repository using Git, but since we are using GitHub and not a private server, probably the easiest way is to let GitHub create and set it up for us. E.g. on your homepage you can click the “+” sign in the upper right corner of the page and then click on the New repository option:

This will open a new page where you can configure all basic settings of your new repository. For clarity, I will discuss it in two parts. The first part consists of the fairly obvious settings. Here you can give a unique name to your repository and optionally add a very short description that will be shown on the right-hand side of the page when people open your repo on GitHub. Here you can also set the visibility of your repository.

GitHub gives an idea of what repository names look like by convention (only lowercase letters and words, separated by a - symbol). Here I took the recommendation and also set the visibility of this repository to private:

The second part consists of the non-trivial settings and options. The page asks whether you want to initialize the new repository with a README, a .gitignore and/or a license file. If you do not select any of these and press the green “Create repository” button, GitHub will show you a new page. On this page GitHub explains that every repository should be created with all of the files above and shows you how to add them right away, either automatically or from the command line.

Okay, what is the purpose of these files and why do we need them at all?

README.md : This file is the primary documentation of the repository. Here you can summarize what your code is all about, how to use it etc., anything you would like to tell someone about your code in particular. Some good examples of READMEs from serious projects can be found e.g. here or here. A repository should always be initialized with a README file, so at least this checkbox should always be ticked.
.gitignore : (Optional, but recommended) Tells Git which files or folders to ignore inside the repository. You can specify both file names and file extensions here with a very basic syntax. Files that are created locally on a machine, but match an entry in the .gitignore, will not be uploaded to the online GitHub repository when the user tells Git that “okay, refresh and update the online repository with my changes and modifications”. It is useful for ignoring temporary or cache files, or large data files. The best practice is to upload only those files that are necessary for the project and are not automatically generated. If you are working with e.g. Jupyter Notebooks, C/C++ or TeX/LaTeX, unnecessary files will be generated in every case. You do not want to see them in your repository, so it is advised to ignore them using .gitignore. To lend us a helping hand, GitHub offers lots of pre-built .gitignore files that you can select during repository creation from a drop-down menu.
LICENSE : (Optional, can be useful) A specific license can be chosen for any project and automatically generated with your credentials for that specific repo. If you just collect your homework in a repo it does not matter, but if you are developing something more serious (even during your studies), then it is nice to have. For smaller projects the MIT license is usually recommended, which you can select during repository creation on GitHub from a drop-down menu.
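
To make the “very basic syntax” of .gitignore a bit more tangible, here is a small sketch. All file and folder names in it are made up for illustration, and it runs in a throwaway directory. It writes a .gitignore with the most common pattern types and then asks Git which of a few sample files it would ignore.

```shell
# Sketch: a tiny .gitignore and a check of what it ignores.
# Every file and folder name below is hypothetical.
REPO="$(mktemp -d)"
cd "$REPO"
git init -q

cat > .gitignore <<'EOF'
# LaTeX build products, matched by extension
*.aux
*.log
# Python cache folders (a trailing slash means "directory")
__pycache__/
# Everything inside data/ ...
data/*
# ... except this one small file
!data/README.txt
EOF

mkdir -p data __pycache__
touch report.tex report.aux data/run1.csv data/README.txt __pycache__/mod.pyc

# git check-ignore prints only those paths that would be ignored.
git check-ignore report.tex report.aux data/run1.csv data/README.txt __pycache__/mod.pyc

# The untracked files Git would actually pick up.
git status --short --untracked-files=all
```

The git check-ignore command is handy whenever a file unexpectedly shows up (or does not show up) in the repository; adding its -v flag also reports which pattern was responsible.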

As you can see in this screenshot, I initialized the new repository with a README file, added a pre-built .gitignore for TeX/LaTeX files and added the GNU GPLv3 license just for the sake of example:

If everything was successfully configured in the previous screen and you press the “Create repository” button, GitHub will redirect you to your new repository, where you should see something like this:

(Succotash is apparently a dish of North American origin. Its main ingredients are sweet corn and beans. Thank you GitHub for the fantastic name recommendation, very cool, very swag, I like it.)

2. Downloading a repository to a local machine

You can download (or “clone”) any public repository from GitHub, GitLab, Bitbucket etc. with the git clone command. All these storage provider websites use an almost identical layout for repositories, so I will showcase the “cloning” process using only GitHub.

If you open a public repository (or a private repo that you have access to) in your browser, then above the box that shows the list of files in the repository, you will see multiple buttons in the upper right corner. By clicking on the “Code” button, a pop-up will come up and list your options for downloading the contents of this repository:

You want to choose one of the options that use the command line (so HTTPS, SSH or GitHub CLI). Other providers usually offer only HTTPS and SSH, but GitHub also has “GitHub CLI” (gh), a separate command-line tool designed specifically for GitHub. If you are interested in it, you can read more about it here.

For now we will use the SSH option for 2 reasons:

We already established a working SSH connection between GitHub and our machine in the steps above, so there is nothing more to set up and no password to type.
As it was already mentioned, GitHub stopped accepting account passwords for Git operations over HTTPS in August 2021. So if you want to use HTTPS, you have to generate a personal access token (PAT) and use that instead of your password. This is a bit more complicated and less convenient than SSH, so I will not cover it here.

Copying and pasting the path to the repository after a git clone command will create a folder with the same name as the repo itself in your current working directory and download the contents of the repository into that folder:

(I am keeping all the local versions of my GitHub repositories inside a folder named GitHub that resides in my home directory, that is why I cd-ed into it.) Now you are ready to start working on the code base locally on your machine!
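
For reference, the cloning step can be sketched in a way that also runs offline. In the snippet below a local, empty “bare” repository stands in for GitHub; this stand-in and the scratch directory are assumptions made purely for the demonstration. With a real repository you would paste the SSH path copied from the “Code” pop-up instead, in the form git@github.com:<user>/<repo>.git.

```shell
# Sketch: what "git clone" leaves behind on your machine.
# The bare repo below is a hypothetical stand-in for the remote on GitHub.
SCRATCH="$(mktemp -d)"
git init -q --bare "$SCRATCH/succotash.git"

cd "$SCRATCH"
mkdir GitHub && cd GitHub

# Clone into the current directory; the folder name is taken from the
# repository name. (The warning about cloning an empty repo is silenced.)
git clone -q "$SCRATCH/succotash.git" 2>/dev/null

cd succotash
git remote -v     # "origin" remembers where the clone came from
```

The remote called origin is what later lets plain git pull and git push know where to download from and upload to.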

3-4. The Four Horsemen of Git: pull, add, commit, push

The majority of Git commands executed by developers on a daily basis are the most basic ones. To manage code changes in a Git repository effectively, it is essential to understand both the four main stages and the commands that move changes between them. The image below shows these 3+1 stages with the corresponding commands that can be used to move back-and-forth between them.

The "stages" in Git and some related Git commands.
Source of image: https://medium.com/@nmpegetis/git-how-to-start-code-changes-commit-and-push-changes-when-working-in-a-team-dbc6da3cd34c

Although Git offers a large selection of commands and options to navigate between these stages, there are four commands that deserve special attention and that I call the “Four Horsemen of Git”:

git pull (Updating local) : Downloads all updates (file changes) from an existing online repository and merges them into the local clone of the same repo. (When the local branch has no new commits of its own, the merge is a simple fast-forward.)
git add (Tracking local changes) : Adds files to the “staging” area (or simply “stages” them). The staging area serves the purpose of a “checkpoint”. It makes it possible to track file changes without any irreversible consequences. Files added to the staging area can be restored to their staged state if they are modified after they were staged.
git commit (Creating snapshot) : Creates a permanent snapshot (a so-called commit) of the staged files in the local repository. Git works on a snapshot basis. “Snapshots” are those states of a project that are saved and kept in the repository history. During development you can go back-and-forth between these snapshots to revert the code base back to some previous state.
git push (Updating remote) : Updates the online repository with the commits created in the local repository.
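
Here is a rough sketch of one full round trip with the four commands. To keep it runnable without a network connection, a local bare repository plays the role of GitHub, and “laptop” and “desktop” are two hypothetical clones of it; all names, file contents and commit messages are made up.

```shell
# Sketch: add -> commit -> push on one machine, pull on another.
SCRATCH="$(mktemp -d)"
git init -q --bare "$SCRATCH/remote.git"                          # stand-in for GitHub
git clone -q "$SCRATCH/remote.git" "$SCRATCH/laptop" 2>/dev/null  # empty-repo warning silenced

# --- laptop: first snapshot ---
cd "$SCRATCH/laptop"
git config user.name  "github-username"                 # repo-local identity
git config user.email "email@of-your-github-account"
echo "first measurement" > notes.txt
git add notes.txt                                # workspace  -> staging area
git commit -q -m "Add first measurement notes"   # staging    -> local repo
git push -q -u origin HEAD                       # local repo -> remote repo

# --- desktop: a second clone of the same remote ---
git clone -q "$SCRATCH/remote.git" "$SCRATCH/desktop"

# --- laptop: another change gets published ---
echo "second measurement" >> notes.txt
git add notes.txt
git commit -q -m "Add second measurement notes"
git push -q

# --- desktop: catch up with the remote ---
cd "$SCRATCH/desktop"
git pull -q --ff-only      # only accept a clean fast-forward
cat notes.txt
```

The -u flag on the very first push records which remote branch the local branch follows, which is why the later git push and git pull need no further arguments.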

Some important notes on the Four Horsemen

While git pull and git push can work well on their own by default, git add and git commit do not. Both of them have many optional flags, but also some arguments that need to be specified in every case (only listing the necessary ones here):

git add [<pathspec>...] : You have to explicitly define which files to add to the staging area. In simple projects you are good to go with the command
```
  $ git add .
```
which tells Git to add every modified or new file to the staging area from the current working directory and every subdirectory below that. (This means that you have to execute this command from the project’s main directory to really add all modified files in the whole repository to the staging area.) Of course, sometimes you only want to add specific files to the staging area, not all of them. In that case, always use git add very carefully! Another important note is that it is NOT advised to upload large or unnecessary files to a Git repository. Keeping the size of the repository as small as possible is always important.
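
For the “only specific files” case, a sketch like the following may help. The file names are hypothetical and everything happens in a throwaway repository. It stages a single folder by its path, then shows how to take a file back out of the staging area after an overly generous git add .

```shell
# Sketch: staging selectively and undoing an accidental "git add".
REPO="$(mktemp -d)"
cd "$REPO"
git init -q
git config user.name  "github-username"
git config user.email "email@of-your-github-account"
echo "# demo" > README.md
git add README.md
git commit -q -m "Add README"

mkdir src
echo "print('hi')" > src/main.py
echo "scratch notes" > todo.txt

git add src/                    # stage only what lives under src/
git status --short              # src/main.py is staged, todo.txt is untracked

git add .                       # oops, this staged todo.txt as well
git restore --staged todo.txt   # take it out of the staging area again
git status --short
```

Note that git restore --staged only changes what is staged; the file itself stays untouched in your working directory.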

As an example, for my university projects I always tried to follow these two simple, yet powerful rules:
1. Never upload any data files to the repository. If you have to use data files, then either write code that can automatically download (and format) them for the project or use Git LFS. Git LFS is a Git extension that replaces large files with text pointers inside Git while storing the file contents on a remote server.
  
  If you generate data files during the execution of your code, it is unnecessary to upload them to the repository, even with Git LFS. You can always generate them again if you have the code to do so.
  
  Of course, you have to approach every situation individually and decide whether it is necessary or smart to upload a file. For example, if your code generates a larger file that takes hours or days to create but is necessary for later use, then it is a good idea to store it using Git LFS. However, if the file is large but takes seconds to regenerate, then obviously, it is not economical to store it anywhere.
  
  However, e.g. Jupyter Notebooks can also take up an unnecessary amount of space in the repository, especially if they are saved with several large outputs (like high-resolution images, interactive blocks, etc.) inside them. Uploading notebooks with outputs included is often a good choice for demonstration purposes. However, longer notebooks with many outputs can take up a lot of space and render on GitHub very slowly or not at all. In this case, it is better to save the notebook without outputs (Edit > Clear Outputs of All Cells) and upload the outputs separately if they are necessary for any demonstration.
2. Never upload unnecessary files that are not required for the project. This usually includes temporary files, like various build files (e.g. object files in C/C++), automatic backups (e.g. .ipynb_checkpoints) or maybe cache files (e.g. __pycache__) and so on. Any files that unnecessarily take up space can be prevented from being uploaded to a remote repository using a correctly set up .gitignore file.
  
  You can find a .gitignore file for almost every programming language and environment on the internet and it can be freely edited to fit your needs. The syntax of the .gitignore file is very simple and it is well-documented on the official Git website. It is also possible to keep multiple .gitignore files in a single repository, like a Python/C/C++/etc. one inside the folder where you are working on your code and a TeX one inside the folder where you are writing your lab report, thesis, article, documentation, etc.
git commit -m "<msg>" : You have to add a commit message enclosed in "" quotation marks after the -m flag when using git commit. The purpose of this message is to summarize the changes in the committed snapshot, compared to the previous one, in a compact (5-10 words in total) and meaningful way. A short and meaningful commit message would look something like this:
```
  $ git commit -m "Mark ChatRender#render as ApiStatus.Override"
```
or
```
  $ git commit -m "Deployed unit tests for cgr.RNG module"
```
or
```
  $ git commit -m "uploaded 2nd homework and presentation"
```
The emphasis is on the word meaningful. Of course, no one has the energy to write perfect commit messages to each commit throughout their careers. But you still have to try, as it makes it easier for you and anyone else to grasp some idea about the workflow without needing to look at the actual code changes. To give some bad examples, here are some commit messages that are not helpful in any way:

If you change your mind along the way, you still have the chance to correct your mistake. The git commit command can be used to edit the message of the last commit if you made a mistake in it, and even to add new files to the last commit. This can be done with the --amend flag:
```
  $ git commit --amend
```
This command will open the default text editor of your system, where you can edit the commit message. Any changes you staged with git add beforehand are folded into the last commit as well. If you want to set the new message directly without opening the editor, add the -m flag after the --amend flag:
```
  $ git commit --amend -m "new commit message"
```
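As a small illustration of both uses of --amend, the sketch below (with made-up file names, in a throwaway repository) first creates a commit with a typo in the message and a forgotten file, then repairs both in one go.

```shell
# Sketch: fixing the message and the contents of the last commit.
REPO="$(mktemp -d)"
cd "$REPO"
git init -q
git config user.name  "github-username"
git config user.email "email@of-your-github-account"

echo "results" > homework2.txt
echo "slides"  > presentation.txt

git add homework2.txt
git commit -q -m "uploaded 2nd homwork"     # typo, and one file forgotten

git add presentation.txt                    # stage the forgotten file
git commit -q --amend -m "uploaded 2nd homework and presentation"

git log --oneline                  # still a single commit, with the new message
git show --stat --format=%s HEAD   # and both files are part of it
```

Amending replaces the last commit with a new one, so it is best used only on commits that have not been pushed yet.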

5. Restoring a previous version of the repo

A huge benefit of Git and GitHub is that nothing is really irreversible. Or specifically in case of GitHub, at least for 30 days after an accident… Even accidental and complete file deletion can be reverted. Of course, the complexity of commands grows as the accident becomes more and more severe. E.g. while it was quite stressful, I was able to restore one of my repositories after accidentally purging all of the files in my local repository and in the online repo as well.

However, I will not give any specific examples here. Any command that edits repository history should be approached with the utmost caution. For every little accident, you can find thorough and detailed descriptions, mostly on StackOverflow, about how to restore your repository to the exact state you want to return to. But you can really f*** this up if you are not careful enough. I simply do not want to bear the responsibility for any potential accidents, so this section will be left empty for now.

6. Exploring changes

Tracking file changes made in your local repository

Using git status:
```
  $ git status
```
This command displays the “status” of the repository, which means it will show you which files are modified, which ones reside in the staging area right now, which ones are untracked, and whether there are local commits waiting to be pushed to the online repo.
Using git diff:
- Basic aesthetics:
```
  $ git diff [--stat]
```
  The command git diff displays the exact line-by-line changes in the repository:
  
  Adding the --stat flag makes it display only the names of the changed files and the number of changed lines in each of them:
- Check modifications between workspace and staging area:
```
  $ git diff
```
- Check modifications between workspace and local repo:
```
  $ git diff HEAD
```
- Check modifications between staging area and local repo:
```
  $ git diff --staged
```
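The three comparisons are easiest to tell apart when the workspace, the staging area and the local repo each hold a different version of the same file. The following sketch sets up exactly that situation in a throwaway repository (the file name and contents are made up):

```shell
# Sketch: one file, three different versions, three kinds of diff.
REPO="$(mktemp -d)"
cd "$REPO"
git init -q
git config user.name  "github-username"
git config user.email "email@of-your-github-account"

echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data file"    # local repo (HEAD) holds v1

echo "v2" > data.txt
git add data.txt                    # staging area holds v2

echo "v3" > data.txt                # workspace holds v3

git diff             # workspace    vs staging area: v2 -> v3
git diff --staged    # staging area vs local repo:   v1 -> v2
git diff HEAD        # workspace    vs local repo:   v1 -> v3
```

In each output, lines starting with - belong to the older side of the comparison and lines starting with + to the newer one.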

Tracking file changes made in the online repository

Check changes in an online repository without overwriting local files with git pull:
```
  $ git fetch && git diff HEAD @{upstream}
```
The command git fetch downloads the new snapshots from the online repository and updates the remote-tracking branches (e.g. origin/main), but it does not merge anything into your local branch or touch your working files. Comparing HEAD with @{upstream} (the remote-tracking branch that your current branch follows) then shows the exact differences that a git pull would bring in, without any risk of overwriting local files by accident.
List snapshots in a repository in a nice way:
```
  $ git log --pretty=oneline --graph --decorate --all
```
The subcommand git log is a powerful tool that can be used to get an overview of the key details in the history of a repository at a glance. Using various command flags, git log can display the metadata of any (or all) commits in a very compact and informative way. The command above is a good example of this. It displays every commit in the repository on a single line, with a graph that shows the branching and merging of the repository, and with the names of the branches and tags.
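
To see how fetching and inspecting fit together, here is a sketch of the situation where somebody else has pushed a change that you do not have yet. As before, a local bare repository stands in for GitHub; “colleague” and “me” are two hypothetical clones, and all contents are made up.

```shell
# Sketch: look at incoming changes before pulling them.
SCRATCH="$(mktemp -d)"
git init -q --bare "$SCRATCH/remote.git"                             # stand-in for GitHub
git clone -q "$SCRATCH/remote.git" "$SCRATCH/colleague" 2>/dev/null  # empty-repo warning silenced

cd "$SCRATCH/colleague"
git config user.name  "colleague"
git config user.email "colleague@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q -u origin HEAD

git clone -q "$SCRATCH/remote.git" "$SCRATCH/me"

# The colleague publishes a change that "me" does not have yet.
echo "v2" > notes.txt
git commit -q -a -m "Update notes to v2"
git push -q

cd "$SCRATCH/me"
git fetch -q                           # download, but do not merge
git log --oneline HEAD..@{upstream}    # the commits I am missing
git diff HEAD @{upstream}              # what a pull would change
cat notes.txt                          # still v1: working files untouched
```

Only a subsequent git pull (or git merge) would actually bring the working files up to date.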

An example workflow

Imagine you have a repository on GitHub that is managed by only you. You are doing some measurements in a lab at the university and you want to upload your datafiles from the lab computer to your GitHub (because of course, what else would a sane person do in this situation). You hack the lab computer, acquire sudo, install Git, set up an SSH key and download your physlab57 repository to the computer via SSH using the command
```
 $ git clone git@github.com:username/physlab57.git
```
This command will create a folder named physlab57 in the current working directory (i.e. in the directory where you executed the command) and download the contents of the repository into it.
You copy the lab files into this downloaded physlab57 folder and then you upload all of them to the remote repository on GitHub using the following chain of commands:
```
 $ cd path/to/physlab57
 $ git add .
 $ git commit -m "added datafiles from lab computer"
 $ git push
```
You go home and 6 days later you start working on your lab report, because you have to hand it in before 23:59. (You only have precisely 2 hours and 23 minutes until that.) (POV: you are a university student.) However, first you want to download the datafiles from GitHub to your own computer at home to start working on them. You already have your repository cloned to your local machine, so you just cd into its folder and download the datafiles with
```
 $ cd path/to/physlab57
 $ git pull
```

That is it. Now you can start working on your lab report. You will not finish in time, but good luck!