masterdesky

Git and GitHub Tutorial

1. Introduction to Version Control Systems

In this guide series I introduce the basics of version control systems and related software, focusing primarily on Git and GitHub. I will cover the most important aspects of these tools, including the installation and configuration of Git, its basic commands and workflows, and the use of GitHub for code hosting and collaboration. This first part is an introduction to version control systems: what they are, what motivated their creation, what philosophies and workflows they promote, and why they are essential for modern programming.

Whether we talk about software developers, researchers or programmers of any kind, all of them need a platform to safeguard their code, organize their coding projects, and distribute and update source code easily. Tracking the history of code changes (i.e. software versions) is also essential, as is the speed and transparency of all the operations mentioned above. Software that fulfills these criteria is referred to as a version control system (VCS) or source-control management (SCM) system. Deployed on a remote, internet-connected server, such software covers virtually all of the needs mentioned above. If you write any type of code during your career, you will almost certainly need to get used to a VCS.

We mainly distinguish two types of VCS, centralized and distributed, each with its own workflow and benefits. The true picture is, as always, more complicated; however, the basic tenets below apply to most VCS. Let us break down the process:

  1. Differences between centralized (CVCS) and distributed (DVCS) version control systems:
    • Centralized VCS, like Subversion or CVS, have a single central repository where all the version history is stored. Developers check out code from this central repository to their personal machines, where they work on it independently, making their own changes and updates. Once their changes are complete, developers commit them back to the central repository.
    • Distributed VCS, like Git or Mercurial, give each developer a full copy of the entire repository, including its history, on their local computer. This means that operations like committing, viewing history, and branching can be done entirely locally, with changes being pushed to a shared server only when necessary.
  2. Parallel development challenges: One critical aspect of the workflow in both CVCS and DVCS (and, in fact, in many collaborative environments beyond programming) is handling simultaneous modifications to the same thing, i.e. the same code base in the case of software development. Multiple developers often alter the same sections, or work on different features that may not be compatible with each other at first.
  3. Branching and forks: This leads to what we call branching or forks in development. These terms refer to the creation of separate versions (branches) of the software where developers can make changes without affecting the main code base. Each branch represents a parallel development path, allowing for diversified development without having to worry about any immediate overlap.
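To make the idea of parallel development paths a bit more tangible, here is a toy model in Python. It is only an illustration under simplifying assumptions, not how any real VCS stores its data: each commit remembers its parent, and a branch is nothing more than a name pointing at a commit. Git does in fact treat branches as lightweight pointers to commits, but the `Commit` and `Repository` classes and their methods below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot: a message plus a link to its parent commit."""
    message: str
    parent: Optional[Commit] = None


@dataclass
class Repository:
    """Hypothetical toy repository: branches are just names pointing at commits."""
    branches: dict[str, Commit] = field(default_factory=dict)

    def commit(self, branch: str, message: str) -> Commit:
        # The new commit's parent is whatever the branch currently points to.
        new = Commit(message, self.branches.get(branch))
        self.branches[branch] = new  # move the branch forward to the new commit
        return new

    def create_branch(self, name: str, source: str) -> None:
        # A new branch starts at the same commit as its source; nothing is copied.
        self.branches[name] = self.branches[source]

    def history(self, branch: str) -> list[str]:
        # Walk the parent links from the branch tip back to the very first commit.
        messages = []
        node = self.branches.get(branch)
        while node is not None:
            messages.append(node.message)
            node = node.parent
        return messages


repo = Repository()
repo.commit("main", "initial commit")
repo.create_branch("feature", "main")
repo.commit("feature", "add login form")
repo.commit("main", "fix typo in README")

print(repo.history("main"))     # ['fix typo in README', 'initial commit']
print(repo.history("feature"))  # ['add login form', 'initial commit']
```

Notice that creating the `feature` branch copies nothing: both branches share the initial commit, and new commits on one branch leave the other untouched.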

Code conflicts arise when people working in different branches edit the same parts of the code base, or implement independent code segments that turn out to be incompatible with each other. To address them, we need a system that handles, or at least makes it possible to handle, all the errors and complications originating from this parallel development process. Hence the emergence of version control systems that aim to make collaborative development as simple as possible.
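Why do parallel edits collide? A deliberately simplified three-way merge in Python shows the basic idea: each branch is compared against their common ancestor (the base), and a line can be merged automatically unless both branches changed it in different ways. The `three_way_merge` function is hypothetical and assumes that lines are only edited in place, never inserted or deleted; real tools such as Git rely on much more sophisticated diff algorithms. The conflict text only loosely imitates Git's conflict markers.

```python
from __future__ import annotations


def three_way_merge(base: list[str], ours: list[str], theirs: list[str]) -> tuple[list[str], list[int]]:
    """Merge two edited versions of `base`, line by line.

    Simplifying assumption: all three versions have the same number of lines,
    i.e. lines were only modified, never inserted or deleted.
    Returns the merged lines and the indices of the conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:    # both sides agree (changed identically, or left untouched)
            merged.append(o)
        elif o == b:  # only "theirs" changed this line
            merged.append(t)
        elif t == b:  # only "ours" changed this line
            merged.append(o)
        else:         # both sides changed the same line differently: conflict
            conflicts.append(i)
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
    return merged, conflicts


base   = ["def greet():", "    print('Hello')",        "    return None"]
ours   = ["def greet():", "    print('Hello, world')", "    return None"]
theirs = ["def greet():", "    print('Hi')",           "    return True"]

merged, conflicts = three_way_merge(base, ours, theirs)
print(conflicts)  # [1]
```

Here the change to the `return` line merges cleanly because only one side touched it, while the `print` line, edited by both sides, is flagged as a conflict that a human has to resolve.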

A schematic of the feature-based development workflow, which Git is primarily designed for.

Trunk-based vs. Feature-based workflow

There are several philosophies on how to organize the development process, and specifically the branching structure described above. There is no inherently better or worse approach: different projects, teams and developers have different needs and preferences. Even different version control systems favor specific approaches over others in their design and conventions. At the end of the day, the most important thing is to keep the workflow consistent and in line with the needs of the project and the team.

Trunk-based development is characterized by a single active development line, often called the ‘trunk’. In this workflow, developers merge their changes back into the ‘trunk’ frequently, often multiple times a day. This approach encourages a high degree of collaboration and continuous integration of changes, which helps keep the codebase stable and reduces the potential for major merge conflicts. However, keeping the trunk in a releasable state at all times requires a highly disciplined team, since any change that breaks the build or introduces bugs can critically impact the stability of the codebase. This method is particularly well suited for small, agile teams working on projects with short development cycles, where rapid iteration and frequent releases are essential. Teams employing this method also benefit from simpler source code management, as they avoid the complexity of handling numerous branches for different features.

A comparison of the feature-based (so-called Git Flow) and trunk-based development workflows.

Source of image: Trunk-Based Development in Software Development, GeeksforGeeks

On the other hand, the feature-based workflow, also known as feature branching, involves creating a separate branch for each new feature being developed. This method allows developers to work independently on their respective features without affecting the main codebase, which is particularly useful for larger teams or for projects where features may take longer to implement. Each feature branch can undergo its own cycle of development, testing, and review before being merged back into the main branch. This separation can lead to more structured and organized development, but it comes with its own challenges, primarily in managing and merging these multiple branches: merge conflicts are more common, and keeping the branches up to date with the main codebase can involve significant overhead. The feature-based workflow is typically suited for scenarios where features are developed over extended periods, or where teams are geographically distributed and require a more decoupled approach to software development.

Comically short history of Git

From 2002 until 2005, the development of the Linux kernel took place on a proprietary version control system called BitKeeper. In 2005, BitMover, the company behind BitKeeper, decided to stop providing a free-of-charge version of its client software, forcing its users to either pay for a BitKeeper license or switch to another version control system.

In response to this new policy, Linus Torvalds, the original creator of Linux, almost immediately (and out of spite) started working on his own version control system, dubbed Git. His goal was to create a VCS that suits all the needs of Linux development and provides features that did not yet exist in freely available systems. Oh, and of course, it had to be faster than any version control system available at the time. Two weeks after Linus started the project, the very first version of Git made its public debut. Since then, Git has become the most widely used version control system of our time.

Fun fact: while the source code of the Linux kernel is now mirrored on GitHub, its development does not actually take place there. Contributions are still sent as patches to mailing lists, and the official Git repositories are hosted on kernel.org.

Popularity of Git and some other, once-popular version control systems between 2004 and 2024, according to Google Trends. Git indisputably dominates.

Source of image: Google Trends, December 2023.

GitHub, Bitbucket, GitLab etc.

Online code hosting providers, which build their own features on top of Git's functionality and offer them free of charge, are now essential tools in modern programming. These sites offer free registration and generous amounts of online storage, together with a ready-to-use framework for managing code repositories with a supported VCS. Users can access their repositories via the website or the terminal, and some sites, e.g. GitHub, even provide a downloadable GUI tool. These websites usually offer additional features and storage to paying users, but the most important features are usually free.

GitHub (and Git) are as deeply intertwined with the modern programming community as Stack Overflow. As I mentioned before, most developers are bound to use a VCS and, often, an online hosting provider for their code repositories, and for a long time now the most popular of these tools have been Git and GitHub, respectively. Nowadays, many critical open-source projects, including ones by top IT companies, are hosted on GitHub. It is also common for companies to manage their internal code repositories with Git, hosting them on their own servers or in private repositories on online hosting providers.

There are other hosting providers that offer similar services to GitHub, such as GitLab or Bitbucket. Nowadays, the differences between them are not that significant, although GitHub has gained the upper hand in terms of free-of-charge features in recent years. Additionally, under the hood, both GitLab and Bitbucket use Git as their primary method for version control and codebase management.

An important point that many people seem to forget is that even an individual can benefit from using online hosting providers. For example, if you have to work on the same codebase from multiple computers, then Git and GitHub are the perfect tools to keep the code up to date on all machines. Even if someone works on a single computer, GitHub can serve as free-of-charge, generously sized backup storage for their projects. Online hosting providers can also function as a portfolio for someone who wants to showcase their programming skills to the world and to potential employers. Some people even repurpose GitHub's capabilities to compile and present their CVs, although I cannot vouch for the elegance of this approach. A better alternative is another widely popular GitHub feature: GitHub Pages. It allows users to create and host their own (static) websites for free; this website is also hosted there.