Erdős–Rényi Model - a random graph model

In probability theory, the Erdős–Rényi model is a model for generating random graphs.

There are two closely related variants of the Erdős–Rényi model. In the $G(n,M)$ model, a graph is chosen uniformly at random from the collection of graphs with $n$ nodes and $M$ edges. In the $G(n,p)$ model, a graph on $n$ vertices is constructed by including each possible edge independently with probability $p$. We sometimes view these graphs as elements of $\lbrace 0,1\rbrace^{\binom{[n]}{2}}$, where the edges are selected from the set

$$\binom{[n]}{2} = \lbrace \lbrace a,b\rbrace \mid 1\leq a < b \leq n\rbrace.$$

These models are closely related: by the law of large numbers, $G(n,p)$ should behave similarly to $G(n,M)$ with $M=\binom{n}{2}p$.
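As a concrete illustration, here is a minimal Python sketch of both samplers (standard library only; the function names `sample_gnp` and `sample_gnm` are ours, not a standard API):

```python
import random
from itertools import combinations

def sample_gnp(n: int, p: float) -> set:
    """G(n, p): include each of the C(n, 2) possible edges
    independently with probability p."""
    return {e for e in combinations(range(n), 2) if random.random() < p}

def sample_gnm(n: int, m: int) -> set:
    """G(n, M): a uniformly random M-subset of all C(n, 2) possible edges."""
    return set(random.sample(list(combinations(range(n), 2)), m))

# Law-of-large-numbers heuristic: G(n, p) has about C(n, 2) * p edges,
# so it should resemble G(n, M) with M = C(n, 2) * p.
n, p = 100, 0.05
print(len(sample_gnp(n, p)), "edges; expected about", n * (n - 1) // 2 * p)
```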

Monotone Properties

The behaviour of these random graphs is often studied in the regime where the number of nodes tends to infinity, usually in the context of monotone properties.

A property $\mathcal{G} \subseteq \lbrace 0,1\rbrace^{\binom{[n]}{2}}$ is called monotone if for any two graphs $G, H$ such that $G \in \mathcal{G}$ and $G \subseteq H$, we also have $H \in \mathcal{G}$.
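For example, containing a triangle is monotone, since adding edges cannot destroy an existing triangle. A small randomized sanity check in Python (the edge-set representation and the `has_triangle` helper are our own illustration):

```python
import random
from itertools import combinations

def has_triangle(edges: set) -> bool:
    """A monotone property: does the graph contain a triangle?"""
    verts = sorted({v for e in edges for v in e})
    return any({(a, b), (b, c), (a, c)} <= edges
               for a, b, c in combinations(verts, 3))

# Monotonicity: if G has the property and G ⊆ H, then H has it too.
all_edges = list(combinations(range(8), 2))
for _ in range(1000):
    g = {e for e in all_edges if random.random() < 0.3}
    h = g | {e for e in all_edges if random.random() < 0.3}  # G ⊆ H
    assert has_triangle(h) or not has_triangle(g)
```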

Thresholds for Monotone Properties

Consider the monotone property of connectivity in the Erdős–Rényi model. It is known that if $p<\frac{(1-\varepsilon)\ln{n}}{n}$, then $G\sim G(n,p)$ is almost surely disconnected, and if $p>\frac{(1+\varepsilon)\ln{n}}{n}$, then $G\sim G(n,p)$ is almost surely connected. In this case, $\Theta(\frac{\ln{n}}{n})$ is called a threshold for the monotone property of connectivity. Formally, the threshold of a property $\mathcal{G}$ is defined to be the probability $p_{c}(\mathcal{G})$ for which

$$\Pr_{G\sim G(n,p_{c})}(G \in \mathcal{G})=\frac{1}{2}.$$
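This sharp transition is easy to observe empirically. Below is a rough Monte Carlo sketch (standard library only; the BFS helper `is_connected` and all parameter choices are ours):

```python
import math
import random
from collections import deque
from itertools import combinations

def is_connected(n: int, edges: set) -> bool:
    """BFS from vertex 0; connected iff all n vertices are reached."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {0}, deque([0])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

n, trials = 200, 200
for c in (0.5, 1.0, 2.0):  # p = c * ln(n) / n, straddling the threshold c = 1
    p = c * math.log(n) / n
    hits = sum(
        is_connected(n, {e for e in combinations(range(n), 2)
                         if random.random() < p})
        for _ in range(trials)
    )
    print(f"c = {c}: Pr(connected) ≈ {hits / trials:.2f}")
```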

Threshold for Triangles in a random graph

Lower Bound via Expectation

Consider the monotone property: $G$ contains a triangle, that is, an unordered triple $\lbrace a, b, c \rbrace$ such that $\lbrace a, b \rbrace$, $\lbrace b, c \rbrace$, and $\lbrace c, a \rbrace$ are edges in $G$. We wish to compute $\Pr_{G\sim G(n,p)}(G \text{ contains a triangle})$. Let $N_\triangle(G)$ be the number of triangles in $G$. Then,

$$\Pr_{G\sim G(n,p)}(G \text{ contains a triangle}) = \Pr_{G\sim G(n,p)}(N_\triangle(G) \geq 1).$$

We can rewrite the latter term using the expectation instead:

$$\begin{aligned}\Pr_{G\sim G(n,p)}(N_\triangle(G) \geq 1) &= \mathbb{E}[1_{N_\triangle(G) \geq 1}] \\ &\leq \mathbb{E}[N_\triangle(G) 1_{N_\triangle(G) \geq 1}] \\ &= \mathbb{E}[N_\triangle(G)],\end{aligned}$$

where the inequality holds because $N_\triangle(G) \geq 1$ on the event $\lbrace N_\triangle(G) \geq 1 \rbrace$. This argument is called Markov's Inequality. Using linearity of expectation, it is easy to compute $\mathbb{E}[N_\triangle(G)]$. Write $N_\triangle(G)=\sum_{a<b<c}{X_{abc}(G)}$, where $X_{abc}$ is $1$ if the triangle $abc$ is contained in $G$ and $0$ otherwise. Then

$$\begin{aligned}\mathbb{E}[N_\triangle(G)] &= \mathbb{E}\left[\sum_{a<b<c}{X_{abc}(G)}\right] = \sum_{a<b<c}{\mathbb{E}[X_{abc}(G)]} = \binom{n}{3}p^3.\end{aligned}$$

From this it is clear that for $p = n^{-1}$,

$$\Pr_{G\sim G(n,p)}(G \text{ contains a triangle}) \leq \mathbb{E}[N_\triangle(G)] = \binom{n}{3}n^{-3} \longrightarrow \frac{1}{6}.$$

Since the probability that $G$ contains a triangle increases with $p$, and at $p = n^{-1}$ this probability is asymptotically at most $\frac{1}{6} < \frac{1}{2}$, we know that $p_{c}(\triangle) = \Omega(n^{-1})$.
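As a quick numerical check of this first-moment bound (a sketch with our own helper names; the sample sizes are arbitrary):

```python
import random
from itertools import combinations
from math import comb

def triangle_count(n: int, p: float) -> int:
    """Sample G(n, p) and count its triangles."""
    edges = {e for e in combinations(range(n), 2) if random.random() < p}
    return sum({(a, b), (b, c), (a, c)} <= edges
               for a, b, c in combinations(range(n), 3))

n, trials = 50, 500
p = 1 / n
counts = [triangle_count(n, p) for _ in range(trials)]
print("empirical Pr(triangle):", sum(c >= 1 for c in counts) / trials)
print("Markov bound E[N]     :", comb(n, 3) * p ** 3)  # -> 1/6 as n grows
```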

Upper Bound via Second Moment Method

To compute an upper bound on $p_{c}$ we need to compute a lower bound on $\Pr_{G\sim G(n,p)}(N_\triangle(G) \geq 1)$.

The second moment method allows us to compute lower bounds on $\Pr(X > 0)$ using $\mathbb{E}[X]$ (the first moment) and $\mathbb{E}[X^{2}]$ (the second moment). Intuitively, if $X$ has a high mean and a low variance, then it is positive with high probability. By the Cauchy–Schwarz inequality,

$$\mathbb{E}[X] = \mathbb{E}[X 1_{X>0}] \leq \sqrt{\mathbb{E}[X^{2}]}\sqrt{\Pr(X>0)}.$$

Thus

$$\Pr(X>0) \geq \frac{\mathbb{E}[X]^2}{\mathbb{E}[X^2]}.$$

For $X=N_\triangle(G)$,

$$\mathbb{E}[N_\triangle(G)^{2}] = \mathbb{E}\left[\left(\sum_{abc}{X_{abc}}\right)^2\right] = \sum_{abc}\sum_{def}{\mathbb{E}[X_{abc}X_{def}]}.$$

  • If the triangles $abc$ and $def$ don't share any edges, then $X_{abc}$ and $X_{def}$ are independent random variables, so $\mathbb{E}[X_{abc} X_{def}] = \mathbb{E}[X_{abc}]\mathbb{E}[X_{def}] = p^6$. There are $\binom{n}{6}\binom{6}{3} \sim \frac{1}{36} n^{6}$ such terms (pairs sharing a single vertex also fall into this case, but there are only $O(n^5)$ of them).
  • If $abc$ and $def$ share exactly one edge, then the two triangles span five distinct edges, so $\mathbb{E}[X_{abc} X_{def}] = p^5$. There are $12\binom{n}{4} \sim \frac{1}{2}n^{4}$ such terms (choose the $4$ vertices, then an ordered pair of distinct triangles on them).
  • If $abc$ and $def$ share two edges (in other words, are identical), then $\mathbb{E}[X_{abc}X_{def}] = p^{3}$. There are $\binom{n}{3} \sim \frac{1}{6} n^3$ such terms.

Thus,

$$\Pr(N_\triangle(G)\geq 1) \geq \frac{\frac{1}{36}(np)^6}{\frac{1}{36}(np)^6 + \frac{1}{2}n^4p^5 + \frac{1}{6}(np)^3}.$$

For a sufficiently large constant $C$ and $p=Cn^{-1}$, the right-hand side is at least $\frac{1}{2}$ for all large $n$, hence $p_c = O(n^{-1})$.
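The following sketch compares this second-moment lower bound with an empirical estimate at $p = Cn^{-1}$ (the choice $C = 3$, which satisfies $C^3 \geq 6$, and all sample sizes are ours):

```python
import random
from itertools import combinations

def has_triangle_gnp(n: int, p: float) -> bool:
    """Sample G(n, p) and report whether it contains a triangle."""
    edges = {e for e in combinations(range(n), 2) if random.random() < p}
    return any({(a, b), (b, c), (a, c)} <= edges
               for a, b, c in combinations(range(n), 3))

n, C, trials = 60, 3.0, 300
p = C / n
empirical = sum(has_triangle_gnp(n, p) for _ in range(trials)) / trials

# Second-moment bound with the leading-order term counts from above.
num = (n * p) ** 6 / 36
den = num + n ** 4 * p ** 5 / 2 + (n * p) ** 3 / 6
print(f"lower bound: {num / den:.3f}  empirical: {empirical:.3f}")
```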

$n^{-1}$ is the threshold for triangles

In this example, the lower bound on the threshold obtained from the expectation bound is tight (up to a constant).

We could try to use this method for other monotone properties, such as connectivity. For connectivity the expectation bound gives $p = \Theta(n^{-1})$, which is not tight, as the real threshold is $\Theta(\frac{\ln n}{n})$. Kahn and Kalai conjectured that the threshold obtained from the expectation bound is always tight up to a logarithmic factor.

Kahn-Kalai Conjecture

Every monotone property $\mathcal{G}$ is defined by some set of minimal elements $\mathcal{S}$; we say that $\mathcal{G} = \langle \mathcal{S} \rangle$ if

$$\mathcal{G} = \lbrace F \mid \exists S\in \mathcal{S}, \ S \subseteq F\rbrace.$$

For example, in the context of connectivity the minimal elements are the spanning trees on $n$ vertices, and in the context of triangles the minimal elements are all the different triangles.

If we define $N_\mathcal{S}(G)$ to be the number of elements of $\mathcal{S}$ contained in $G$, then by linearity of expectation

$$\mathbb{E}[N_\mathcal{S}(G)] = \sum_{S\in \mathcal{S}}{p^{|S|}} = \mathrm{cost}_p(\mathcal{S}).$$

By Markov's Inequality,

$$\Pr(G\in \mathcal{G})=\Pr(N_\mathcal{S}(G)\geq 1)\leq \mathbb{E}[N_\mathcal{S}(G)],$$

so if $\mathbb{E}[N_\mathcal{S}(G)] = \frac{1}{2}$, then $\Pr(G \in \mathcal{G}) \leq \frac{1}{2}$. Note that this remains true even if $\mathcal{S}$ contains elements that are smaller than the minimal elements of $\mathcal{G}$, i.e. $\mathcal{G} \subseteq \langle \mathcal{S} \rangle$ (we say in this case that $\mathcal{S}$ covers $\mathcal{G}$); the equality above then becomes $\Pr(G\in \mathcal{G})\leq \Pr(N_\mathcal{S}(G)\geq 1)$. This motivates the following definition: let

$$q(\mathcal{G}) = \max\left\lbrace p \mid \exists \mathcal{S}, \ \mathcal{G} \subseteq \langle \mathcal{S} \rangle, \ \mathrm{cost}_p(\mathcal{S}) = \frac{1}{2}\right\rbrace.$$

This is the tightest lower bound on the threshold given by the expectation computation, i.e. $q(\mathcal{G})\leq p_{c}(\mathcal{G})$. The **Kahn-Kalai Conjecture** (proved by Park-Pham in 2022) states that $q(\mathcal{G})$ also yields an upper bound:

$$p_{c}(\mathcal{G}) < K q(\mathcal{G})\log (\ell(\mathcal{G}) + 1),$$

where $K>0$ is some universal constant and $\ell(\mathcal{G})$ is the size of the largest minimal element of $\mathcal{G}$.
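To make the definition concrete for the triangle property: taking $\mathcal{S}$ to be the set of all $\binom{n}{3}$ triangles (each with $3$ edges) gives $\mathrm{cost}_p(\mathcal{S}) = \binom{n}{3}p^3$, and solving $\mathrm{cost}_p(\mathcal{S}) = \frac{1}{2}$ recovers a lower bound of order $n^{-1}$. A short sketch (the function names are ours):

```python
from math import comb

def cost_triangles(n: int, p: float) -> float:
    """cost_p(S) for the cover S = all C(n, 3) triangles, each of size 3."""
    return comb(n, 3) * p ** 3

def q_lower_bound_triangles(n: int) -> float:
    """Solve cost_p(S) = 1/2 for p. Since q(G) is a max over all covers,
    this particular cover only gives a lower bound on q."""
    return (0.5 / comb(n, 3)) ** (1 / 3)

for n in (10, 100, 1000):
    p = q_lower_bound_triangles(n)
    assert abs(cost_triangles(n, p) - 0.5) < 1e-9
    print(f"n = {n:4d}: p ≈ {p:.6f} ≈ {p * n:.3f}/n")  # p*n -> 6^(1/3)/2^(1/3)... ~1.44
```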

Park-Pham Theorem

To prove the upper bound we will show that

$$\frac{p_{c}(\mathcal{G})}{64\log(\ell(\mathcal{G}) + 1)} < q(\mathcal{G}).$$

Denote $q' = \frac{p_{c}(\mathcal{G})}{64\log(\ell(\mathcal{G}) + 1)}$; we show that there exists $\mathcal{S}$ which covers $\mathcal{G}$ with $\mathbb{E}[N_{\mathcal{S}}(G)]=\mathrm{cost}_{q'}(\mathcal{S}) < \frac{1}{2}$ (since $\mathrm{cost}_p$ is increasing in $p$, this implies $q' < q(\mathcal{G})$).

$\mathcal{S}$ is constructed using the probabilistic method: we consider a random family of sets $\mathcal{S}$ and show that:

  1. $\mathcal{S}$ covers $\mathcal{G}$: $\Pr_{\mathcal{S}}(\mathcal{G} \subseteq \langle \mathcal{S} \rangle ) \geq \frac{1}{4}$.

  2. $\mathcal{S}$ has small cost: $\Pr_{\mathcal{S}}\left(\mathrm{cost}_{q'}(\mathcal{S}) < \frac{1}{2}\right) > \frac{3}{4}$.

Combining 1 and 2 by a union bound,

$$\Pr_{\mathcal{S}}\left(\mathcal{G} \subseteq \langle \mathcal{S} \rangle \text{ and } \mathrm{cost}_{q'}(\mathcal{S}) < \frac{1}{2}\right) \geq \Pr_{\mathcal{S}}(\mathcal{G} \subseteq \langle \mathcal{S} \rangle) - \Pr_{\mathcal{S}}\left(\mathrm{cost}_{q'}(\mathcal{S}) \geq \frac{1}{2}\right) > \frac{1}{4} - \frac{1}{4} = 0,$$

and via the probabilistic method, there is some choice of $\mathcal{S}$ that covers $\mathcal{G}$ and achieves a small cost $\mathrm{cost}_{q'}(\mathcal{S}) < \frac{1}{2}$.

An algorithm for constructing small covers