In this guide series, I aim to introduce the basics of version control systems and related software, focusing primarily on Git and GitHub. I will cover the most important aspects of these tools, including installing and configuring Git, the basic commands and workflows, and using GitHub for code hosting and collaboration. This first part is an introduction to version control systems: what they are, what motivated their creation, which philosophies and workflows they promote, and why they are essential for modern programming.
Link to the post in the thumbnail on Reddit.
Whether we talk about software developers, researchers, or programmers of any kind, they all need a platform to safeguard their code, organize coding projects, and distribute and update source code easily. Tracking the history of code changes (also known as software versions) is also essential, as is performing all of these operations quickly and transparently. Software that fulfills these criteria is referred to as a version control system (VCS) or source-control management (SCM) system. Deployed on a remote server connected to the Internet, such software can cover virtually every need mentioned above. If you plan to write any type of code during your career, you will definitely need to get used to working with a VCS.
We mainly distinguish between two types of VCS, centralized and distributed, each with its own workflow and benefits. As always, the true picture is more complicated; however, the basic tenets above apply to most VCS. Let us break down the process:
So-called code conflicts arise when people working in different branches edit the same parts of the codebase, or when independently written code segments turn out to be incompatible with each other. To address them, we need a system that handles (or at least makes it possible to handle) the errors and complications originating from this parallel development process. Hence the emergence of version control systems that aim to make collaborative development as simple as possible.
Modern VCS such as Git address this challenge with merge algorithms. When integrating changes from different branches, Git typically performs a three-way merge: it compares each branch's version of the code with their common ancestor. If the modifications do not overlap, Git merges them automatically. However, when the same section of a file has been altered differently on both sides, Git flags a merge conflict. In these cases, the affected file is marked with clear conflict markers (e.g. `<<<<<<< HEAD`, `=======`, and `>>>>>>> branch-name`), and developers must manually review and resolve the differences. Many teams use visual merge tools such as Meld or KDiff3, which provide side-by-side comparisons and interactive conflict resolution (VS Code has this capability as well).
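To make the three-way comparison more concrete, here is a minimal Python sketch of the decision made for each region of a file. It is a toy, not Git's actual algorithm: it assumes that the base, ours, and theirs versions have exactly the same number of lines (no insertions or deletions), so line i can be compared directly across all three, whereas Git first aligns the versions with a diff and works on whole hunks. The function name `merge3` and its parameters are illustrative choices, not part of any Git API.

```python
from typing import List, Tuple


def merge3(base: List[str], ours: List[str], theirs: List[str],
           ours_label: str = "HEAD",
           theirs_label: str = "branch-name") -> Tuple[List[str], bool]:
    """Toy line-by-line three-way merge (hypothetical helper).

    Assumption: all three versions have the same number of lines, so no
    lines were inserted or deleted. Returns the merged lines and a flag
    telling whether any conflict markers were written.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")

    merged: List[str] = []
    has_conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            # Both sides agree (unchanged, or changed identically).
            merged.append(o)
        elif o == b:
            # Only the other branch touched this line: take its version.
            merged.append(t)
        elif t == b:
            # Only our branch touched this line: keep our version.
            merged.append(o)
        else:
            # Both sides changed the same line differently: emit markers.
            # (Git would group neighboring conflicting lines into one block.)
            has_conflict = True
            merged.extend([f"<<<<<<< {ours_label}", o, "=======", t,
                           f">>>>>>> {theirs_label}"])
    return merged, has_conflict


if __name__ == "__main__":
    base = ["def greet():", "    return 'hi'", "# v1"]
    ours = ["def greet():", "    return 'hello'", "# v1"]   # edits line 2
    theirs = ["def greet():", "    return 'hi'", "# v2"]    # edits line 3
    print(merge3(base, ours, theirs))    # non-overlapping edits merge cleanly

    rival = ["def greet():", "    return 'hey'", "# v1"]    # also edits line 2
    merged, conflict = merge3(base, ours, rival)
    print("\n".join(merged))             # conflicting edit gets markers
```

In the first call the two branches edited different lines, so the result is clean; in the second, both branches rewrote the same `return` line, and the sketch falls back to Git-style conflict markers, leaving the decision to a human.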
There are a number of philosophies on how to organize the development process, specifically the branching structure detailed above. There is no inherently better or worse approach: different projects, teams, and developers have different needs and preferences, and even different version control systems favor specific approaches over others in their design and conventions. At the end of the day, the most important thing is to keep the workflow consistent and in line with the needs of the project and team.
Below, I will briefly introduce two of the most common workflows: trunk-based and feature-based development.
Trunk-based development is characterized by a single, active development line, often called the ‘trunk’. In this workflow, developers merge their changes back into the trunk frequently, often multiple times a day. This approach encourages a high degree of collaboration and continuous integration: because each change is small and integrated early, the developers' copies never drift far from the trunk, which reduces the potential for major merge conflicts. However, keeping the trunk in a releasable state at all times requires a highly disciplined team; any change that breaks the build or introduces bugs can critically impact the stability of the codebase. This method is particularly well-suited for small, agile teams working on projects with short development cycles, where rapid iteration and frequent releases are essential. Teams employing it benefit from simplified management of their source code, as they avoid the complexity of handling numerous branches for different features.
Source of image: Trunk-Based Development in Software Development, GeeksforGeeks
On the other hand, the feature-based workflow, also known as feature branching, involves creating a separate branch for each new feature under development. This lets developers work on their respective features independently without affecting the main codebase, which is particularly useful for larger teams or for projects where features take longer to implement. Each feature branch can go through its own cycle of development, testing, and review before being merged back into the main branch. This separation can lead to a more structured and organized development process, but it comes with its own challenges, primarily in managing and merging many branches. Merge conflicts are more common, because long-lived branches have more time to diverge from the main line, and keeping these branches up to date with the main codebase can add significant overhead. The feature-based workflow is typically suited to scenarios where features are developed over extended periods, or where teams are geographically distributed and need a more decoupled approach to software development.
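The overhead of keeping feature branches up to date can be made concrete by counting how far a branch has drifted from the main line. The Python sketch below models history as a hypothetical commit graph (each made-up commit ID maps to its parent IDs) and counts the commits that exist only on the feature branch ("ahead") and only on main ("behind"). This is roughly the information `git rev-list --left-right --count main...feature` reports for a real repository, but the data structure and the function names `reachable` and `ahead_behind` are illustrative assumptions, not Git internals.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical history model: commit ID -> list of parent commit IDs.
# A merge commit simply has two parents.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tip: str) -> Set[str]:
    """Return all commits reachable from `tip` (including `tip`) via parents."""
    seen: Set[str] = set()
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(graph.get(commit, []))
    return seen


def ahead_behind(graph: Graph, feature: str, main: str) -> Tuple[int, int]:
    """Count commits only on `feature` (ahead) and only on `main` (behind)."""
    on_feature = reachable(graph, feature)
    on_main = reachable(graph, main)
    return len(on_feature - on_main), len(on_main - on_feature)


if __name__ == "__main__":
    history: Graph = {
        "A": [], "B": ["A"],          # shared history
        "C": ["B"], "D": ["C"],       # main keeps moving: B -> C -> D
        "F1": ["B"], "F2": ["F1"],    # feature branched off at B
    }
    print(ahead_behind(history, feature="F2", main="D"))   # (2, 2)

    # Merging main into the feature branch brings C and D along.
    history["F3"] = ["F2", "D"]
    print(ahead_behind(history, feature="F3", main="D"))   # (3, 0)
```

The longer a feature branch lives while main keeps moving, the larger its "behind" count grows, and each of those commits is a potential source of conflicts at merge time. Regularly merging main into the feature branch, as in the second call, brings the "behind" count back to zero at the cost of extra merge work along the way.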
From 2002 until 2005, Linux kernel development took place on a proprietary version control system called BitKeeper. In 2005, BitMover, the company behind BitKeeper, withdrew the free-of-charge version of its client software, forcing its users to either pay for a BitKeeper license or switch to another version control system.
In response to this new policy, Linus Torvalds, the original creator of Linux, started working on his own version control system almost immediately, out of spite, and dubbed it Git. His goal was to create a VCS that suited all the needs of Linux development and offered features that did not yet exist in freely available systems. Oh, and of course, it had to be faster than any version control system available at the time. Two weeks after Linus started the project, the very first version of Git made its public debut. Since then, Git has become the largest and most popular version control system of our time.
Fun fact: although core Linux development now relies on Git (the Linux kernel is also available via a GitHub mirror), many long-standing or legacy Linux projects continue to be managed using Subversion (SVN). This persistence is largely due to historical infrastructure and workflows that were established long before Git became the dominant version control system. For these projects, migrating to Git could take significant effort, and the familiar SVN environment still meets their needs.
Source of image: Google Trends, December 2023.
Free online code-hosting providers that build on Git and add functionality of their own are now essential tools in modern programming. These sites offer free registration and generous amounts of online storage, along with a ready-to-use framework for managing code repositories with a supported VCS. Users can access their repositories via the website or the terminal, and some sites, e.g. GitHub, even provide a downloadable GUI tool (GitHub Desktop). These websites usually offer additional features and storage to paying users, but the most important features are usually free.
GitHub (and Git) are now as deeply intertwined with the modern programming community as Stack Overflow was for over a decade, before various LLM-based AI tools began to take over its role. As I already mentioned, virtually all developers are bound to use a VCS and, often, an online hosting provider for their code repositories. For a long time now, the most popular tools for these tasks have been Git and GitHub, respectively. Nowadays, many critical open-source projects, including ones from top IT companies, are hosted on GitHub. It is also common for companies to use Git to manage internal code repositories that they host on their own servers, or sometimes privately on online hosting providers.
There are other hosting providers that offer services similar to GitHub's, such as GitLab and Bitbucket. Nowadays, the differences between them are not that significant, although in recent years GitHub has gained the upper hand in terms of free-of-charge features. Under the hood, both GitLab and Bitbucket also use Git as their primary method for version control and codebase management.
An important aspect that many people seem to forget is that even an individual can benefit from using online hosting providers. For example, if you work on the same codebase from multiple computers, Git and GitHub are perfect tools for keeping the code up to date on all machines. Even if someone works on a single computer, GitHub can serve as a free-of-charge and roomy backup storage for their projects. Online hosting providers can also function as a portfolio for anyone who wants to showcase their programming skills to the world and to potential employers. Some people even hijack the capabilities of, e.g., GitHub to compile and present their CVs, although I cannot comment on the elegance of this approach. A better alternative is another widely popular GitHub feature: GitHub Pages. It allows users to create and host their own (static) websites for free; this website is hosted there as well.