About the Book

Introduction

Data science is a multi-disciplinary field that uses scientific and computational tools to extract valuable knowledge from data sets that are typically large. Once the data is processed and cleaned, it is analyzed and presented in a form that is appropriate to support decision-making processes. As collecting data has become much easier and cheaper than in the past, data science and machine learning tools have become widely used in companies of all sizes. Indeed, data-driven businesses were worth hundreds of billions of dollars in 2025, and this value is expected to continue growing.

This monograph concentrates on mining networks, a subfield within data science. Virtually every human-technology interaction, or sensor network, generates observations that are in some relation with each other. As a result, many data science problems can be viewed as a study of some properties of complex networks in which nodes represent the entities that are being studied and edges represent relations between these entities. In these networks (for example, the Instagram and Facebook on-line social networks, respectively the 2nd and the 3rd most downloaded mobile apps of 2024), nodes not only contain some useful information (such as the user's profile, photos, tags) but are also internally connected to other nodes (relations based on follower requests, similar users' behaviour, age, geographic location). Such networks are often large-scale, decentralized, and evolve dynamically over time.

Mining complex networks in order to understand the principles governing the organization and the behaviour of such networks is crucial for a broad range of fields of study, including information and social sciences, economics, biology, and neuroscience. Here are a few selected typical applications of mining networks:

  • community detection (which users on some social media platform are close friends),
  • link prediction (who is likely to connect to whom on such platforms),
  • predicting node attributes (what advertisement should be shown to a given user of a particular platform to match their interests),
  • detecting influential nodes (which users on a particular platform would be the best ambassadors of a specific product).

After reading this book, one should be able to answer such questions, and much more, using state-of-the-art methods and computational techniques.
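
As a toy illustration of two of the tasks listed above, the following minimal sketch scores candidate links by their number of common neighbours (a simple link prediction heuristic) and ranks nodes by degree centrality (a simple proxy for influence). The graph, the node names, and the helper functions are hypothetical and are not taken from the book or its notebooks, which rely mainly on igraph; this sketch uses only the Python standard library.

```python
from itertools import combinations

# Hypothetical toy friendship network (illustrative only, not from the book):
# two triangles, {a, b, c} and {d, e, f}, joined by the edge (c, d).
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def common_neighbour_scores(adj):
    """Score every non-adjacent pair by its number of common neighbours."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


adj = build_adjacency(edges)
centrality = degree_centrality(adj)
scores = common_neighbour_scores(adj)

# Nodes sorted from most to least central (ties broken alphabetically).
ranking = sorted(centrality, key=lambda v: (-centrality[v], v))
# Candidate links sorted from most to least likely under this heuristic.
candidates = sorted(scores, key=lambda pair: (-scores[pair], pair))
```

In this toy graph the two bridge nodes, c and d, have the highest degree centrality, and the pairs with a common neighbour (such as a and d) are ranked ahead of pairs with none (such as a and e). Real networks call for the more refined measures and algorithms discussed in the book.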

Second Edition

The first edition of this book was published in early 2021, and the field has seen significant advancements since then. While all chapters were reviewed, the more substantial changes are as follows. New material and examples on random geometric graphs were added in Sections 2.8 and 4.6. Chapter 6 on node embeddings was augmented in several places including a discussion on classical vs. structural embeddings, more details on graph neural networks (GNNs) as well as other directions. Several new tools and techniques were introduced in Chapter 7 on mining hypergraphs with new material on centrality, hypergraph-specific properties such as degree vs. edge size correlation, simpliciality and coreness. Some discussion on embedding hypergraphs was also added. New material on post-processing for overlapping communities was added in Chapter 8; in particular, new experiments using the ABCD-o^2 benchmark. Chapter 9 was mostly re-written with a focus on a framework for embedding graphs co-developed by the authors. Finally, a short Chapter 12 on fairness in network mining models was added, which represents an active and important area of research.

Target Audience

The book was written based on the lecture notes for a graduate course entitled Graph Mining (DS 8014) which was offered to students enrolled in the Data Science and Analytics Master's program at Toronto Metropolitan University (Toronto, Canada). This textbook is intended for an upper-year undergraduate course or a graduate course. Students in programs such as data science, mathematics, computer science, business, engineering, physics, statistics, and social science will benefit from courses that are based on this textbook. Having said that, this book can be successfully used by all enthusiasts of data science at various levels of sophistication who would like to expand their knowledge or consider changing their career path. The Core Material (Part I) can be successfully used for a 12-week long course (for example, in the Canadian system) but we additionally provide the Additional Material (Part II) that can be added for a 15-week long course (for example, in the US or European systems).

Need for Another Book

This textbook is not the first (and certainly not the last) book related to network science. There are a number of excellent books that conceptually overlap with our book. Let us then present a few reasons why we decided to write this book.

Most books present a mixture of various topics in modelling and mining networks. Modelling complex networks is an important research direction, and a few random graph models are included in our book, but they are mainly used as tools to benchmark and guide algorithms or to create synthetic networks for testing the behaviour of the tools in various scenarios. We focus on aspects related to mining complex networks, and carefully select the most important tools to create a nice and coherent blend that is appropriate for a one-term course.

The three authors actively collaborate, publishing research papers on various topics related to mining networks, including community detection algorithms, mining hypergraphs, unsupervised evaluation of graph embeddings, synthetic random graph models, anomaly detection algorithms, and link prediction algorithms. Our respective individual skills and experiences nicely complement each other, providing three different perspectives: pure mathematics (Pawel), mining large networks (François), and applying machine learning tools in business (Bogumil). This cumulative experience enables us to carefully select problems and tools that are suitable for a one-term course on mining networks. The content of this textbook represents the most important and useful aspects of the daily life of a data scientist, and with its use, data scientists can make a meaningful impact in business.

Most existing related books concentrate on theory. In our book, on the other hand, the theoretical foundations are combined with practical experiments where students are expected to code and analyze graph datasets by themselves. This book is accompanied by Jupyter notebooks (in Python and Julia) which not only contain all of the experiments presented in the book but also include additional material. We will continue updating them, making sure they work with currently available environments. In particular, we mainly use the igraph library for Python, which distinguishes us from other books that also use Python for their experiments, while other libraries are introduced as required by the various experiments. The igraph network analysis tool was chosen due to its superior performance in dealing with large graphs, and the richness of its library of graph analytics. For example, many centrality measures and graph clustering algorithms are available directly within igraph. Moreover, the library is written in C and can be used as such, and there are packages for R and Python, two of the most popular languages for data science. We also have a YouTube channel with some videos that walk the reader through our notebooks. Finally, we also made slides publicly available for the instructors to use, which should help them to adapt the book to their needs and their audience.

A distinguishing feature of mining networks, as opposed to traditional data mining, is that very often one needs to implement custom algorithms to perform an analysis for a given problem at hand. In traditional data mining, there are standard tools such as deep-learning networks, XGBoost, etc., to which we typically just pass appropriately prepared data. In mining networks, despite the fact that there exist standard tools and techniques, they usually require slight modifications to fit the studied problem. Because of this, apart from applying standard algorithms that are pre-implemented in libraries such as igraph, one often needs to complement them with carefully tailored code that is computationally intensive. The reader will be able to notice this characteristic in virtually every chapter of this book. In such cases, one needs tools that allow one to implement such custom code efficiently while ensuring the code's speed (as complex networks are usually large). Traditionally, in such situations data scientists faced the so-called two-language problem. In order to write the code efficiently, Python was used, as it is a nice language for prototyping. However, these implementations were usually not scalable. Therefore, the next step was to re-write the prototype in some low-level language such as C++.

In order to solve the two-language problem, in this book we provide implementations of the examples not only using the Python language but also using the Julia language. Julia, like Python, is a high-level language (actually, in many cases the code is quite similar) but at the same time it is compiled (as opposed to Python, which is interpreted), which allows the execution speed of the programs to be comparable to languages such as C++. These features of the Julia language have resulted in its popularity increasing recently, not only for mining complex networks but for all kinds of data science tasks that require performance and scalability.

Accompanying Material

Jupyter notebooks can be found here: 

https://github.com/ftheberge/GraphMiningNotebooks

YouTube channel can be found here:

https://www.youtube.com/@MiningComplexNetworks

The book is available as a PDF file here:

https://math.torontomu.ca/~pralat/mining-complex-networks-2ed.pdf

Courses

The book was used for the following courses:

Please let us know if you have adopted the book for one of your courses.