How is epoll implemented?


Source: http://www.quora.com/Network-Programming/How-is-epoll-implemented

I'll start from "How is select implemented?", because most of the infrastructure involved is the same.

The main difference between epoll and select: with select, the list of file descriptors to monitor exists only for the duration of a single select call, and the calling thread sits on the fds' wait queues only for that call. With epoll, you create a single fd that aggregates events from the multiple file descriptors you want to monitor (attached to their open file descriptions), so the monitored list is persistent, or long-lived: threads remain on the fds' wait queues across multiple system calls. Furthermore, an epfd can be shared among multiple threads, so what sits on an fd's wait queue is no longer a single thread but a structure holding all threads currently waiting on the epfd. (In the Linux implementation, the socket fd's wait queue entry holds a function pointer and a void * data pointer to pass to that function; this feels rather awkward, but if you don't dwell on the implementation, you can simply regard it as an association between the epfd and the socket fd.)
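
A minimal sketch (my own illustration, not from the original answer) of that persistence difference: the interest list is registered once with epoll_ctl and survives across epoll_wait calls, whereas select() re-submits its fd set on every call. A pipe stands in for a socket here.

/* Register once, wait many times: the kernel keeps the interest list. */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	int pipefd[2];
	if (pipe(pipefd) == -1) { perror("pipe"); return 1; }

	int epfd = epoll_create1(0);             /* the long-lived epoll fd */
	if (epfd == -1) { perror("epoll_create1"); return 1; }

	struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
	/* One-time registration; no need to repeat it before each wait. */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1) {
		perror("epoll_ctl"); return 1;
	}

	write(pipefd[1], "x", 1);                /* make the read end ready */

	struct epoll_event events[8];
	/* Note: no fd list is passed here, only the epoll fd itself. */
	int n = epoll_wait(epfd, events, 8, -1);
	for (int i = 0; i < n; i++)
		printf("fd %d is ready\n", events[i].data.fd);

	close(epfd); close(pipefd[0]); close(pipefd[1]);
	return 0;
}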

So, to explain the mechanics in a bit more detail:

1. An epoll fd has a private struct eventpoll that records which fds are registered on the epfd. The eventpoll also has a wait queue recording all waiting threads, plus a ready list of fds that are currently readable or writable.

2. When you add an fd to the epfd with epoll_ctl, epoll adds the eventpoll (a pointer, or something like it) to that fd's wait queue. It also checks whether the fd is already ready (readable or writable) and, if so, adds it to the ready list.

3. When a thread calls epoll_wait, the kernel first checks the ready list and returns immediately if any fd is ready. If the ready list is empty, the thread adds itself to the epfd's wait queue and goes to sleep.

4. When an event occurs on a socket monitored by the epfd, the fd is added to the ready list and all threads waiting on the epfd are woken up (a simplified user-space model of steps 3 and 4 is sketched below).
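
None of that kernel machinery is user-visible, but the interplay of steps 3 and 4 can be modeled in user space. The sketch below is only an analogy (names such as ready_list and ep_wait are invented for this example): a mutex-protected array plays the role of eventpoll's ready list, and a condition variable plays the role of its wait queue.

/* User-space model of steps 3 and 4 (an analogy, not the kernel code). */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_READY 64

static int ready_list[MAX_READY];            /* stand-in for the ready list */
static int ready_count = 0;
static pthread_mutex_t ep_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ep_waitq = PTHREAD_COND_INITIALIZER;  /* "wait queue" */

/* Step 4: an event source marks an fd ready and wakes all waiters. */
static void ep_event_occurred(int fd)
{
	pthread_mutex_lock(&ep_lock);
	if (ready_count < MAX_READY)
		ready_list[ready_count++] = fd;
	pthread_cond_broadcast(&ep_waitq);   /* wake everyone on the queue */
	pthread_mutex_unlock(&ep_lock);
}

/* Step 3: check the ready list first; sleep only if it is empty. */
static int ep_wait(void)
{
	pthread_mutex_lock(&ep_lock);
	while (ready_count == 0)
		pthread_cond_wait(&ep_waitq, &ep_lock);  /* go to sleep */
	int fd = ready_list[--ready_count];          /* consume one ready fd */
	pthread_mutex_unlock(&ep_lock);
	return fd;
}

static void *waiter(void *arg)
{
	(void)arg;
	printf("fd %d became ready\n", ep_wait());
	return NULL;
}

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, waiter, NULL);
	sleep(1);                 /* let the waiter block first */
	ep_event_occurred(42);    /* simulate an event on fd 42 */
	pthread_join(t, NULL);
	return 0;
}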

Obviously, struct eventpoll needs plenty of careful locking around its various lists and wait queues, but those are implementation details.

The important thing to remember is that none of the steps described above loops over the list of monitored fds. By being entirely event-driven and keeping a persistent fd set and a ready list, epoll avoids operations that cost O(n) time, where n is the number of monitored file descriptors.

Summary: networking can finally move forward again. What I want to know next is how the kernel protocol stack reads data, writes to the socket buffer, notifies the fd, and schedules the process.

Original answer:

I'd start off by reading my answer to "How is select implemented?" (Network Programming: How is select implemented?), because most of the infrastructure involved is the same.

The main difference between epoll and select is that in select(), the list of file descriptors to wait on only exists for the duration of a single select() call, and the calling task only stays on the sockets' wait queues for the duration of a single call. In epoll, on the other hand, you create a single file descriptor that aggregates events from multiple other file descriptors you want to wait on, and so the list of monitored fd's is long-lasting, and tasks stay on socket wait queues across multiple system calls. Furthermore, since an epoll fd can be shared across multiple tasks, it is no longer a single task on the wait queue, but a structure that itself contains another wait queue, containing all processes currently waiting on the epoll fd. (In terms of implementation, this is abstracted over by the sockets' wait queues holding a function pointer and a void * data pointer to pass to that function).

So, to explain the mechanics a little more:

    1. An epoll file descriptor has a private struct eventpoll that keeps track of which fd's are attached to this fd. struct eventpoll also has a wait queue that keeps track of all processes that are currently epoll_waiting on this fd. struct eventpoll also has a list of all file descriptors that are currently available for reading or writing.
    2. When you add a file descriptor to an epoll fd using epoll_ctl, epoll adds the struct eventpoll to that fd's wait queue. It also checks if the fd is currently ready for processing and adds it to the ready list, if so.
    3. When you wait on an epoll fd using epoll_wait, the kernel first checks the ready list, and returns immediately if any file descriptors are already ready. If not, it adds itself to the single wait queue inside struct eventpoll, and goes to sleep.
    4. When an event occurs on a socket that is being epoll()ed, it calls the epoll callback, which adds the file descriptor to the ready list, and also wakes up any waiters that are currently waiting on that struct eventpoll.

Obviously, a lot of careful locking is needed on struct eventpoll and the various lists and wait queues, but that's an implementation detail.

The important thing to note is that at no point above did I describe a step that loops over all file descriptors of interest. By being entirely event-based and by using a long-lasting set of fd's and a ready list, epoll can avoid ever taking O(n) time for an operation, where n is the number of file descriptors being monitored.

An excerpt of the eventpoll struct from the Linux kernel source (fs/eventpoll.c):

struct eventpoll {
	/* Protect the access to this structure */
	spinlock_t lock;

	/*
	 * This mutex is used to ensure that files are not removed
	 * while epoll is using them. This is held during the event
	 * collection loop, the file cleanup path, the epoll file exit
	 * code and the ctl operations.
	 */
	struct mutex mtx;

	/* Wait queue used by sys_epoll_wait() */
	wait_queue_head_t wq;

	/* Wait queue used by file->poll() */
	wait_queue_head_t poll_wait;

	/* List of ready file descriptors */
	struct list_head rdllist;

	/* RB tree root used to store monitored fd structs */
	struct rb_root rbr;

	/*
	 * This is a single linked list that chains all the "struct epitem" that
	 * happened while transferring ready events to userspace w/out
	 * holding ->lock.
	 */
	struct epitem *ovflist;

	/* The user that created the eventpoll descriptor */
	struct user_struct *user;
};
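
The rbr field above is the red-black tree that indexes one struct epitem per monitored fd; every epoll_ctl operation begins with a lookup in that tree. A small user-space sketch of the three ctl operations (illustrative only, with a pipe as the stand-in target fd):

/* ADD inserts an epitem into the rb-tree, MOD updates it, DEL removes it. */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	int p[2];
	if (pipe(p) == -1) { perror("pipe"); return 1; }

	int epfd = epoll_create1(0);
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };

	epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);  /* insert the epitem */

	ev.events = EPOLLIN | EPOLLOUT;
	epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev);  /* look it up and update */

	epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL); /* unlink and free it */

	close(epfd); close(p[0]); close(p[1]);
	return 0;
}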

Reposted from: https://my.oschina.net/astute/blog/92683

