2019-09-07

POJ 2186 Popular Cows scc的三种姿势

缘起

【1】、【2】、【3】分别给出了求scc的三种姿势，并且在【4】中指出了三种算法之间的优劣. 本文的目的是希望能尽快将这三种算法的板子敲定下来. POJ 2186 Popular Cows

分析

每头牛都想成为牛群中的红牛. 给定N头牛和M个有序对（A,B）,其中（A，B）表示A认为B是红牛. 该关系具备
传递性. 即A认为B红,B认为C红,则A亦认为C红. 不过给定的有序对中可能包含(A,B)、(B,C)但是不包含(A,C)
求被其他所有牛认为是红牛的牛的总数.

【输入】
N和M    1 <= N <= 10,000   1 <= M <= 50,000
M行（A，B）

【输出】
答案

【样例输入】
3 3
1 2
2 1
2 3

【样例输出】
1

【限制】
2s

【hint】
Cow 3 is the only cow of high popularity.

此题是scc的经典题目了.

如果将牛视作点，(A,B)视作A->B的弧，则题目中给的数据其实形成了一个有向图. 而我们只需将该有向图进行scc（strongly connected component 强连通分支）分解. 将每个scc视作一个点的话（即图论技巧中所谓的缩点），则原图形成DAG（肯定无环，不然形成环的点就不会在不同的scc里面了）. 则统计DAG中出度为0的点包含原图中的点的个数即可. 如果DAG中有>=2个出度为0的点的话，则不存在被所有牛视作红牛的牛.

所以本题的算法就是 scc分解+缩点+统计出度为0的点包含原图的点的个数就是答案.

因为 scc分解有 Kosaraju、Tarjan、Gabow三种算法. 所以本题采用这三种姿势艹它

三种算法的复杂度都是 O(n+e)

Kosaraju

关于kosaraju算法，虽然【1】中已经说过了. 但是现在复习还是觉得讲的比较粗糙. 现在将其按照意识流的手法再讲一遍.

【1】中说到我们在第一次dfs的时候, 得到了 finished数组finished[0,..,n]. 然后按照 finished[n,..,1]的顺序运行逆图上的dfs,即

for(int i = n; i; i--)
{
	dfs_reverse(finished[i]);
}

得到的scc.而且得到的scc还是缩点之后的DAG的拓扑序. 即这个过程我们收集到第一个scc（中包含的顶点，还可以标记哪个点属于这个scc）、收集到第二个scc（中包含的顶点）… 则按照

第一个scc、第二个scc… 这样的顺序就是原图按照scc缩点之后形成的DAG图的拓扑序.

为什么能这样呢? 因为你反着dfs（下简称rdfs，第一次顺着dfs下简称dfs）回去的时候, 能rdfs到的点，都是之前dfs也能到的（这是因为dfs的时候，是所有子节点都完成搜索退出之后才将当前节点入的finished数组）.既能dfs的到，又能rdfs的到，则一定和finished[i]强连通. 所以形成了scc. 注意，finished[i+1,…,n]不需要去管，因为我是从n一路rdfs过来到i的. 如果rdfs(finished[i])会运行的话，说明finished[i+1,…,n]与finished[i]不在一个scc中（不然程序中就会将其isvisited掉，rdfs(finished[i])就不会运行了）.而且能和finished[i]在一个scc中的，一定是finished[i]能dfs到的. 根据finished数组的入队规则，一定在[1,..,i-1]中. 至于为什么是拓扑序, 这是因为finished数组的入队规则，rdfs发现的第一个scc一定是第一个scc. 纵观整个算法过程，不得不说，kosaraju算法委实精妙.

//#include "stdafx.h"

#include <stdio.h>
#include <string.h>
#include <vector>
using namespace std;
//#define LOCAL

const int maxn = 10005, maxm = 50005;
int n,m, head[maxn], rhead[maxn], cnt, rcnt, finished[maxn],top,sccnum, belong[maxn],outdeg[maxn]; // top是已经进入finished中的元素个数,belong[i]表示顶点i属于哪个scc
bool v[maxn];
vector<int> scc[maxn]; // 则scc[1]、scc[2]、scc[3]... 这个顺序就是有向图缩点之后形成的DAG的拓扑序

struct Arc
{
	int from, to, nxt;
} g[maxm], rg[maxm]; // (g,head, cnt)是原图, (rg, rhead, rcnt)是逆图

void addArc(int a, int b) // 弧 a->b
{
	g[cnt].from = a, g[cnt].to = b, g[cnt].nxt = head[a];
	head[a] = cnt++; // 建图
	rg[rcnt].from = b, rg[rcnt].to = a, rg[rcnt].nxt = rhead[b];
	rhead[b] = rcnt++; // 反向建图
}

void dfs(int i)
{
	v[i] = true;
	for (int h = head[i],to; ~h; h = g[h].nxt)
	{
		to = g[h].to;
		if (!v[to])
		{
			dfs(to);
		}
	}
	finished[top++] = i; // 只有等所有的孩子都dfs过了, i自己才进finished
}

void rdfs(int i) // 在逆图rg上深搜,期间进行scc收集
{
	v[i] = true;
	scc[sccnum].push_back(i); // 收集scc
	belong[i] = sccnum; // i属于sccnum这个scc
	for (int h = rhead[i],to; ~h; h = rg[h].nxt)
	{
		to = rg[h].to;
		if (!v[to])
		{
			rdfs(to);
		}
	}
}

void kosaraju() // 正反两边dfs的kosaraju算法
{
	for (int i = 1; i<=n; i++) // 正向dfs一遍
	{
		if (!v[i])
		{
			dfs(i);
		}
	}
	memset(v,0,sizeof(v)); // 重置
	for (int i = top-1; ~i; i--) // 反向dfs一遍
	{
		if (!v[finished[i]])
		{
			sccnum++;
			rdfs(finished[i]);
		}
	}
}

int main() {

#ifdef LOCAL
	freopen("d:\\data.in", "r", stdin);
#endif
	memset(head, -1, sizeof(head));
	memset(rhead, -1, sizeof(rhead));
	scanf("%d%d", &n, &m);
	while(m--)
	{
		int a,b;
		scanf("%d%d", &a, &b);
		addArc(a,b);
	}
	kosaraju();
	for (int i = 0,from, to; i<cnt; i++) // 考察原弧
	{
		from = g[i].from, to = g[i].to; // from->to的弧
		if (belong[from]!=belong[to])
		{
			outdeg[belong[from]]++;
		}
	}
	int sum = 0, ans; // ans表示哪一个scc是出度为0的, 这个scc中的点就是最受欢迎的奶牛
	for (int i = 1;i<=sccnum;i++)
	{
		if (!outdeg[i])
		{
			ans = i;
			sum++;
		}
	}
	if (sum>1) // >=2个出度为0的缩点,则不存在
	{
		puts("0");
	}
	else
	{
		printf("%d\n", scc[ans].size());
	}
	return 0;
}

ac情况

Status	Accepted
Time	313ms
Memory	1528kB
Length	2046
Lang	C++
Submitted	2019-09-07 21:13:06
Shared
RemoteRunId	20842059

Tarjan

tarjan算法比kosaraju算法写起来简短的多. 而且快的多（实测快 30%）. 但是kosaraju算法相比tarjan和gabow有一个优势就是它最后得到的scc已经是DAG拓扑有序的了.

tarjan在【2】中已经论述过了. 这里再意识流的论述一下

其实tarjan算法求scc缩点（妈的，tarjan这天才想出了太多算法了）的思想就在上图中. tarjan算法也只是在dfs的过程中做了一点事情. 即引入了两个数组，一个是 timestamp数组（上图用t简写），表示一个点被搜到的时间戳. 一个是low数组，表示一个点最早可以回到那个时间戳是多少.

注意，4、3、2、5、6、7 这6个点的low是一样的，都是3，因为都可以回到4（时间戳为3）

你可以这么想——timestamp是出身，没办法改变（dfs搜到了是几就是几），但是low是可以通过攀炎附势提升的. 例如上图中尽管3、2、5、6、7 这5个点出身卑微——t比较靠后. 但是3攀炎附势2、2攀炎附势5、5攀炎附势6、6攀炎附势7、7攀炎附势4最终到达了t=3这个阶层.完美的实现了逆袭！

而显然low<=timestamp的. 因为low初始值是timestamp，而且能通过攀炎附势往上爬，但是timestamp是无法改变的.

如果low=timestamp（就像上图中的1、8、4），则就表明出现了scc. 例如4，出现了4、3、2、5、6、7 构成的scc. 就可以收割了. 那么怎么记录这个scc呢? 通过栈.

//#include "stdafx.h"

#include <stdio.h>
#include <string.h>
#include <algorithm>
#include <vector>
#include <stack>
using namespace std;
//#define LOCAL

const int maxn = 10005, maxm = 50005;
int n,m, head[maxn], cnt, sccnum, belong[maxn],outdeg[maxn], timestamp[maxn], low[maxn],t; // belong[i]表示顶点i属于哪个scc, timestamp数组其实取代了dfs中的isvisited数组,而且具备计时功能,t是流逝的时间
vector<int> scc[maxn];
stack<int> s; // 用于临时保存scc的栈
bool ins[maxn]; // 是否在栈中

struct Arc
{
	int from, to, nxt;
} g[maxm]; // (g,head, cnt)是图

void addArc(int a, int b) // 弧 a->b
{
	g[cnt].from = a, g[cnt].to = b, g[cnt].nxt = head[a];
	head[a] = cnt++; // 建图
}

void dfs(int i)
{
	timestamp[i] = low[i] = ++t; // 类似于常规dfs做的操作isvisited[i]=true
	s.push(i), ins[i] = true; // 入栈
	for (int h = head[i],to; ~h; h = g[h].nxt)
	{
		to = g[h].to;
		if (!timestamp[to])
		{
			dfs(to);
			low[i] = min(low[i], low[to]); // 图中3攀炎附势2、2攀炎附势5、5攀炎附势6、6攀炎附势7、7攀炎附势4，最终抵达t=3这个阶层,实现逆袭!
		}
		else if (ins[to]) // 如果to已经搜过了,而且在栈中,这里to比如是图中的4,这里的i比如是图中的7
		{
			low[i] = min(low[i], low[to]);
		}
	}
	if (timestamp[i]==low[i]) // 最终依旧没有实现逆袭,比如图中的4,则遇到了scc,需要收割了
	{
		sccnum++;
		int j = i;
		do 
		{
			j = s.top(),s.pop(),ins[j] = false;
			scc[sccnum].push_back(j), belong[j] = sccnum;
		} while (j!=i); // 最后i（例如图中的4）也入了此scc
	}
}

void tarjan()
{
	for (int i =1; i<=n;i++)
	{
		if (!timestamp[i]) // 如果i还没被搜到
		{
			dfs(i); // 搜之!
		}
	}
}

int main() {

#ifdef LOCAL
	freopen("d:\\data.in", "r", stdin);
#endif
	memset(head, -1, sizeof(head));
	scanf("%d%d", &n, &m);
	while(m--)
	{
		int a,b;
		scanf("%d%d", &a, &b);
		addArc(a,b);
	} // 建图
	tarjan();
	for (int i = 0,from,to; i<cnt; i++) // 遍历所有的弧
	{
		from = g[i].from, to = g[i].to;
		if (belong[from]!=belong[to])
		{
			outdeg[belong[from]]++;
		}
	}
	int sum = 0, ans;
	for (int i = 1; i<=sccnum; i++)
	{
		if (!outdeg[i])
		{
			ans = i;
			sum++;
		}
	}
	if (sum>1)
	{
		puts("0");
	}
	else
	{
		printf("%d\n", scc[ans].size());
	}
	return 0;
}

ac情况（呵呵~ 被打脸~ 比kosaraju还慢！）

Status	Accepted
Time	391ms
Memory	1332kB
Length	1950
Lang	C++
Submitted	2019-09-07 21:56:10
Shared
RemoteRunId	20842186

Gabow

好吧~ gabow才是三种姿势中最快的（但是感觉没有tarjan自然）! 【3】中已经给了论述. 已经说得很透彻了. 这里复述一下

三种scc算法的思想都是

有向图可以剖分成深搜树（深搜树不交地组成深度优先生成森林），而深搜树剖分成scc

Gabow算法和Tarjan算法的思想是一样的。每个scc都有一个“根”(例如tarjan图中的4号顶点)，根是scc中最先被
检查的顶点。在一个scc中，沿着根不断DFS，最终将会回到根，路上经过的顶点则属于这个scc。

下面祭出gabow的代码

//#include "stdafx.h"

#include <stdio.h>
#include <string.h>
#include <algorithm>
#include <vector>
#include <stack>
using namespace std;
//#define LOCAL

const int maxn = 10005, maxm = 50005;
int n,m, head[maxn], cnt, sccnum, belong[maxn],outdeg[maxn], timestamp[maxn],t; 
vector<int> scc[maxn];
stack<int> s,low; // 注意, gabow与tarjan唯一的区别就在于如何刻画一个最早可以回到的时间戳,tarjan用的是low数组, 而gabow用的是low栈
bool ins[maxn];

struct Arc
{
	int from, to, nxt;
} g[maxm];

void addArc(int a, int b) 
{
	g[cnt].from = a, g[cnt].to = b, g[cnt].nxt = head[a];
	head[a] = cnt++;
}

void dfs(int i)
{
	timestamp[i] = ++t,low.push(i);
	s.push(i), ins[i] = true;
	for (int h = head[i],to; ~h; h =g[h].nxt)
	{
		to = g[h].to;
		if (!timestamp[to])
		{
			dfs(to);
		}
		else if (ins[to])
		{
			while (timestamp[low.top()]>timestamp[to])
			{
				low.pop();
			}
		}
	}
	if (i==low.top()) // 这里比tarjan复杂一些
	{
		sccnum++;
		low.pop();
		while(s.top()!=i)
		{
			scc[sccnum].push_back(s.top()), belong[s.top()] = sccnum;
			ins[s.top()] = false;
			s.pop();
		}
		scc[sccnum].push_back(i),belong[i] = sccnum;
		s.pop(),ins[i] = false;
	}
}

void gabow()
{
	for (int i =1; i<=n;i++)
	{
		if (!timestamp[i]) 
		{
			dfs(i);
		}
	}
}

int main() {

#ifdef LOCAL
	freopen("d:\\data.in", "r", stdin);
#endif
	memset(head, -1, sizeof(head));
	scanf("%d%d", &n, &m);
	while(m--)
	{
		int a,b;
		scanf("%d%d", &a, &b);
		addArc(a,b);
	} 
	gabow();
	for (int i = 0,from,to; i<cnt; i++)
	{
		from = g[i].from, to = g[i].to;
		if (belong[from]!=belong[to])
		{
			outdeg[belong[from]]++;
		}
	}
	int sum = 0, ans;
	for (int i = 1; i<=sccnum; i++)
	{
		if (!outdeg[i])
		{
			ans = i;
			sum++;
		}
	}
	if (sum>1)
	{
		puts("0");
	}
	else
	{
		printf("%d\n", scc[ans].size());
	}
	return 0;
}

ac情况（唉~ 依旧打脸~ 快不了多少~）

Status	Accepted
Time	297ms
Memory	1480kB
Length	1749
Lang	C++
Submitted	2019-09-07 22:27:03
Shared
RemoteRunId	20842244

综上所述，用的最多的其实是kosaraju和tarjan.

gabow用的较少. 因为最后收割scc的代码比较多. 容易出错.

参考

【1】https://yfsyfs.github.io/2019/05/22/%E6%9C%89%E5%90%91%E5%9B%BE%E7%9A%84%E5%BC%BA%E8%BF%9E%E9%80%9A%E5%88%86%E6%94%AF%E4%B9%8B%E6%9C%B4%E7%B4%A0%E6%B1%82%E6%B3%95-%E5%88%A9%E7%94%A8%E6%B7%B1%E6%90%9C/

【2】https://yfsyfs.github.io/2019/05/22/%E6%9C%89%E5%90%91%E5%9B%BE%E7%9A%84%E5%BC%BA%E8%BF%9E%E9%80%9A%E5%88%86%E6%94%AF%E4%B9%8BTarjan%E7%AE%97%E6%B3%95/

【3】https://yfsyfs.github.io/2019/05/22/%E6%9C%89%E5%90%91%E5%9B%BE%E7%9A%84%E5%BC%BA%E8%BF%9E%E9%80%9A%E5%88%86%E6%94%AF%E4%B9%8BGabow%E7%AE%97%E6%B3%95/

【4】https://yfsyfs.github.io/2019/05/22/%E6%9C%89%E5%90%91%E5%9B%BE%E7%9A%84%E5%BC%BA%E8%BF%9E%E9%80%9A%E5%88%86%E6%94%AF%E4%B9%8BKosaraju%E7%AE%97%E6%B3%95/