Student ID: SA12**6112
The previous post analyzed the main work the kernel does when a process traps from user mode into kernel mode. This post examines what the kernel does during a process switch.
In kernel mode, a process switch consists of two main steps:
1: switch the page global directory
2: switch the kernel stack and the hardware context
In what follows, prev points to the descriptor of the process being replaced, and next points to the descriptor of the process being activated.
The following analyzes the second step of the process switch.
The second step is implemented mainly by the switch_to macro.
In the 3.3 kernel on x86, it is defined at line 48 of arch/x86/include/asm/system.h:
#define switch_to(prev, next, last)                                     \
do {                                                                    \
        /*                                                              \
         * Context-switching clobbers all registers, so we clobber      \
         * them explicitly, via unused output variables.                \
         * (EAX and EBP is not listed because EBP is saved/restored     \
         * explicitly for wchan access and EAX is the return value of   \
         * __switch_to())                                               \
         */                                                             \
        unsigned long ebx, ecx, edx, esi, edi;                          \
                                                                        \
        asm volatile("pushfl\n\t"               /* save flags */        \
                     "pushl %%ebp\n\t"          /* save EBP */          \
                     "movl %%esp,%[prev_sp]\n\t" /* save ESP */         \
                     "movl %[next_sp],%%esp\n\t" /* restore ESP */      \
                     "movl $1f,%[prev_ip]\n\t"  /* save EIP */          \
                     "pushl %[next_ip]\n\t"     /* restore EIP */       \
                     __switch_canary                                    \
                     "jmp __switch_to\n"        /* regparm call */      \
                     "1:\t"                                             \
                     "popl %%ebp\n\t"           /* restore EBP */       \
                     "popfl\n"                  /* restore flags */     \
                                                                        \
                     /* output parameters */                            \
                     : [prev_sp] "=m" (prev->thread.sp),                \
                       [prev_ip] "=m" (prev->thread.ip),                \
                       "=a" (last),                                     \
                                                                        \
                       /* clobbered output registers: */                \
                       "=b" (ebx), "=c" (ecx), "=d" (edx),              \
                       "=S" (esi), "=D" (edi)                           \
                                                                        \
                       __switch_canary_oparam                           \
                                                                        \
                       /* input parameters: */                          \
                     : [next_sp] "m" (next->thread.sp),                 \
                       [next_ip] "m" (next->thread.ip),                 \
                                                                        \
                       /* regparm parameters for __switch_to(): */      \
                       [prev] "a" (prev),                               \
                       [next] "d" (next)                                \
                                                                        \
                       __switch_canary_iparam                           \
                                                                        \
                     : /* reloaded segment registers */                 \
                       "memory");                                       \
} while (0)
I: From the code above, switching the kernel stack involves the following steps:
1: Save the eflags and ebp registers on prev's kernel stack.
2: Save esp into prev->thread.sp, and eip (the address of label 1, where prev will later resume) into prev->thread.ip.
3: Load next->thread.sp into esp, and restore next->thread.ip into eip by pushing it onto the new stack, so that the ret at the end of __switch_to jumps to it.
At this point the kernel stack switch is complete.
II: After the kernel stack is switched, the TSS segment must be updated accordingly.
This is because, on Linux, all processes running on a given CPU share a single TSS; when the running process changes, the TSS must change with it.
Linux uses the TSS in two main ways:
(1) whenever a process traps from user mode into kernel mode, the CPU fetches the kernel stack pointer from the TSS;
(2) user-mode I/O port access is checked against the I/O permission bitmap in the TSS.
Therefore a process switch must also update sp0 and the I/O permission bitmap in the TSS. This is done mainly in the __switch_to function:
In the 3.3 kernel on x86, it is defined at line 296 of arch/x86/kernel/process_32.c:
__notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
        struct thread_struct *prev = &prev_p->thread,
                             *next = &next_p->thread;
        int cpu = smp_processor_id();
        struct tss_struct *tss = &per_cpu(init_tss, cpu);
        fpu_switch_t fpu;

        /* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

        fpu = switch_fpu_prepare(prev_p, next_p, cpu);

        /*
         * Reload esp0.
         */
        load_sp0(tss, next);

        /*
         * Save away %gs. No need to save %fs, as it was saved on the
         * stack on entry. No need to save %es and %ds, as those are
         * always kernel segments while inside the kernel. Doing this
         * before setting the new TLS descriptors avoids the situation
         * where we temporarily have non-reloadable segments in %fs
         * and %gs. This could be an issue if the NMI handler ever
         * used %fs or %gs (it does not today), or if the kernel is
         * running inside of a hypervisor layer.
         */
        lazy_save_gs(prev->gs);

        /*
         * Load the per-thread Thread-Local Storage descriptor.
         */
        load_TLS(next, cpu);

        /*
         * Restore IOPL if needed. In normal use, the flags restore
         * in the switch assembly will handle this. But if the kernel
         * is running virtualized at a non-zero CPL, the popf will
         * not restore flags, so it must be done in a separate step.
         */
        if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
                set_iopl_mask(next->iopl);

        /*
         * Now maybe handle debug registers and/or IO bitmaps
         */
        if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
                     task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
                __switch_to_xtra(prev_p, next_p, tss);

        /*
         * Leave lazy mode, flushing any hypercalls made here.
         * This must be done before restoring TLS segments so
         * the GDT and LDT are properly updated, and must be
         * done before math_state_restore, so the TS bit is up
         * to date.
         */
        arch_end_context_switch(next_p);

        /*
         * Restore %gs if needed (which is common)
         */
        if (prev->gs | next->gs)
                lazy_load_gs(next->gs);

        switch_fpu_finish(next_p, fpu);

        percpu_write(current_task, next_p);

        return prev_p;
}
From the code above, the TSS update consists mainly of:
1: load_sp0(tss, next); — fetch sp0 from the next process's thread field and use it to update sp0 in the TSS.
2: __switch_to_xtra(prev_p, next_p, tss); — update the I/O permission bitmap when necessary.