Taking the first step towards adding multitasking to my kernel

I've decided I want to add multitasking to my kernel after doing some preliminary work on the VFS and RAMFS. How hard is it going to be?

Hi there! I've been working on my own operating system since last year. It currently has paging, a virtual file system, a ramfs, and it can show stuff by writing pixels into a framebuffer. I've always been curious about how Linux, Windows and other operating systems do multitasking. More importantly, how do they switch context?

Trying to learn from OSDev Wiki

On the OSDev Wiki, I found a pretty interesting tutorial on multitasking. It can be found here. I tried to read through it and understand what's going on. But, alas! I couldn't understand even 50% of it. I was pretty confused about the implementation of switch_to_task. But tbh, that was not the author's fault. The most interesting thing was the Thread Control Block (TCB). If we look at the code:

;C declaration:
;   void switch_to_task(thread_control_block *next_thread);
;
;WARNING: Caller is expected to disable IRQs before calling, and enable IRQs again after function returns

switch_to_task:

    ;Save previous task's state

    ;Notes:
    ;  For cdecl; EAX, ECX, and EDX are already saved by the caller and don't need to be saved again
    ;  EIP is already saved on the stack by the caller's "CALL" instruction
    ;  The task isn't able to change CR3 so it doesn't need to be saved
    ;  Segment registers are constants (while running kernel code) so they don't need to be saved

    push ebx
    push esi
    push edi
    push ebp

    mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
    mov [edi+TCB.ESP],esp         ;Save ESP for previous task's kernel stack in the thread's TCB

    ;Load next task's state

    mov esi,[esp+(4+1)*4]         ;esi = address of the next task's "thread control block" (parameter passed on stack)
    mov [current_task_TCB],esi    ;Current task's TCB is the next task TCB

    mov esp,[esi+TCB.ESP]         ;Load ESP for next task's kernel stack from the thread's TCB
    mov eax,[esi+TCB.CR3]         ;eax = address of page directory for next task
    mov ebx,[esi+TCB.ESP0]        ;ebx = address for the top of the next task's kernel stack
    mov [TSS.ESP0],ebx            ;Adjust the ESP0 field in the TSS (used by CPU for CPL=3 -> CPL=0 privilege level changes)
    mov ecx,cr3                   ;ecx = previous task's virtual address space

    cmp eax,ecx                   ;Does the virtual address space need to be changed?
    je .doneVAS                   ; no, virtual address space is the same, so don't reload it and cause TLB flushes
    mov cr3,eax                   ; yes, load the next task's virtual address space
.doneVAS:

    pop ebp
    pop edi
    pop esi
    pop ebx

    ret                           ;Load next task's EIP from its kernel stack

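It seems to push some registers, then load the address of the previous task's TCB into edi and save the current esp into that TCB. Then it loads the address of the next task's TCB into esi and pulls the saved stack pointer out of it into esp. After that come some mov's I didn't really understand at first: they update the ESP0 field in the TSS and reload cr3 only when the next task lives in a different address space. There are some things here, like per-task kernel stacks and separate virtual address spaces, that aren't really present in my kernel right now. So, I decided to change my plan.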
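To untangle those mov's, it can help to model the routine in plain Rust. The sketch below is a toy simulation, not kernel code: the Tcb and Cpu structs, the cr3_reloads counter and all the addresses are made up for illustration, and the register pushes and pops are left out. It only mirrors the order of the bookkeeping steps in switch_to_task.

```rust
/// Hypothetical mirror of the wiki's TCB: only the fields switch_to_task touches.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tcb {
    esp: u32,  // saved kernel stack pointer (TCB.ESP)
    cr3: u32,  // address of the page directory (TCB.CR3)
    esp0: u32, // top of the kernel stack, copied into TSS.ESP0 (TCB.ESP0)
}

/// Toy stand-in for the CPU state the routine reads and writes.
#[derive(Debug)]
struct Cpu {
    esp: u32,
    cr3: u32,
    tss_esp0: u32,
    cr3_reloads: u32, // how often cr3 was written (each write flushes the TLB)
}

/// Same order of operations as the assembly, minus the register pushes and pops.
fn switch_to_task(cpu: &mut Cpu, tasks: &mut [Tcb], current: &mut usize, next: usize) {
    tasks[*current].esp = cpu.esp;   // mov [edi+TCB.ESP], esp
    *current = next;                 // mov [current_task_TCB], esi
    cpu.esp = tasks[next].esp;       // mov esp, [esi+TCB.ESP]
    cpu.tss_esp0 = tasks[next].esp0; // mov [TSS.ESP0], ebx
    if cpu.cr3 != tasks[next].cr3 {  // cmp eax, ecx / je .doneVAS
        cpu.cr3 = tasks[next].cr3;   // mov cr3, eax
        cpu.cr3_reloads += 1;
    }
}

fn main() {
    let mut tasks = [
        Tcb { esp: 0x9000, cr3: 0x1000, esp0: 0xA000 },
        Tcb { esp: 0x7000, cr3: 0x1000, esp0: 0x8000 }, // shares an address space with task 0
        Tcb { esp: 0x5000, cr3: 0x2000, esp0: 0x6000 },
    ];
    let mut cpu = Cpu { esp: 0x9F00, cr3: 0x1000, tss_esp0: 0xA000, cr3_reloads: 0 };
    let mut current = 0;

    switch_to_task(&mut cpu, &mut tasks, &mut current, 1);
    println!("after 0 -> 1: {:x?}", cpu);
    switch_to_task(&mut cpu, &mut tasks, &mut current, 2);
    println!("after 1 -> 2: {:x?}", cpu);
    println!("task 0 saved esp = {:#x}", tasks[0].esp);
}
```

In this model the first switch leaves cr3 alone because both tasks share a page directory, and only the second switch reloads it. That is the point of the cmp/je pair in the assembly.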

Plan B

I started to go through random repositories of fantastic kernels written by fantastic people to try to grasp how they did multitasking. I looked through the source code of Lemon OS, Aero OS and many more. One thing they had in common is that the scheduler would do some stuff and then pass the context information to an assembly function. That function does the actual work of moving and updating all the necessary registers and doing the jump.
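The split described above can be sketched roughly like this. It is not taken from Lemon OS or Aero OS: the Task and Scheduler types, the round-robin policy and the closure standing in for the assembly routine are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical per-task record; a real kernel would store much more.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Task {
    id: u32,
    saved_rsp: u64, // where the task's saved registers live
}

/// Round-robin run queue: the "do some stuff" half, in plain Rust.
struct Scheduler {
    current: Task,
    ready: VecDeque<Task>,
}

impl Scheduler {
    /// Picks the next task and hands (previous, next) to the low-level routine.
    /// `low_level_switch` stands in for the assembly function.
    fn schedule(&mut self, mut low_level_switch: impl FnMut(&Task, &Task)) {
        if let Some(next) = self.ready.pop_front() {
            let prev = std::mem::replace(&mut self.current, next);
            self.ready.push_back(prev); // previous task goes to the back of the queue
            low_level_switch(&prev, &self.current);
        }
        // If nothing else is runnable, the current task simply keeps running.
    }
}

fn main() {
    let mut sched = Scheduler {
        current: Task { id: 0, saved_rsp: 0x9000 },
        ready: VecDeque::from(vec![
            Task { id: 1, saved_rsp: 0x7000 },
            Task { id: 2, saved_rsp: 0x5000 },
        ]),
    };
    for _ in 0..4 {
        sched.schedule(|prev, next| {
            println!("switch {} -> {} (rsp {:#x})", prev.id, next.id, next.saved_rsp)
        });
    }
}
```

Keeping the policy in Rust and the register shuffling behind one small function means the unsafe part stays tiny, which seems to be why those kernels are structured this way.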

But what about rip? This was a pretty stupid thing done by me tbh. I was trying to find a way to update rip directly, like mov rip, rax. I didn't realize that this is impossible: rip can't be the destination of a mov. You can read its value with lea, but you can only change it with control-flow instructions such as jmp, call and ret. So, I wasted a couple of weeks thinking about it. Yeah. But then, today I found out I can do it just by pushing a value onto the stack and running ret. Damn. This easy?
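Why the push-then-ret trick works is easier to see in a toy model. The Machine struct below is purely hypothetical and only tracks the two pieces of state involved, rip and the stack. It is a simulation, not real machine code, and the addresses are made up.

```rust
/// Toy model of the two pieces of state that `push` and `ret` touch.
struct Machine {
    rip: u64,
    stack: Vec<u64>, // top of the stack is the last element
}

impl Machine {
    /// `push value`: place a 64-bit value on top of the stack.
    fn push(&mut self, value: u64) {
        self.stack.push(value);
    }

    /// `ret`: pop the top of the stack into rip. The CPU does not check
    /// that the value was put there by a matching `call`.
    fn ret(&mut self) {
        self.rip = self.stack.pop().expect("ret with an empty stack");
    }
}

fn main() {
    // 0x2222 plays the role of the caller's return address already on the stack.
    let mut m = Machine { rip: 0x1000, stack: vec![0x2222] };
    let target = 0xdead_b000; // hypothetical address of the code we want to run
    m.push(target);
    m.ret();
    println!("rip = {:#x}, stack depth = {}", m.rip, m.stack.len());
}
```

After the ret, rip holds the pushed value and the stack is back to where it started. An indirect jump such as jmp rax reaches the same place in one instruction.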

So I started implementing a basic function that takes a struct Context and loads its values into the registers.

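// repr(C) keeps the fields in declaration order, which the pop sequence
// relies on; repr(packed) on its own does not guarantee the order.
#[derive(Default, Debug, Copy, Clone)]
#[repr(C)]
pub struct Context {
    pub rbx: u64,
    pub rbp: u64,
    pub r12: u64,
    pub r13: u64,
    pub r14: u64,
    pub r15: u64,
}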

Following is the code of the function:

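#[naked]
pub unsafe extern "C" fn switch_context(
    new_context: &Context, // arrives in rdi
    new_cr3: usize,        // arrives in rsi
    rip: usize,            // arrives in rdx
) {
    // Assumes interrupts are disabled: while rsp points at the Context, an
    // interrupt would push its frame over the struct and the memory below it.
    asm!(
        "mov rax, rsp", // back up the current stack pointer
        "mov rsp, rdi", // point rsp at the Context
        "pop rbx",      // pops follow the field order of Context
        "pop rbp",
        "pop r12",
        "pop r13",
        "pop r14",
        "pop r15",
        "mov rsp, rax", // restore the original stack
        "mov cr3, rsi", // switch address space
        "push rdx",     // put the target address on the stack...
        "ret",          // ...and let ret pop it into rip
        options(noreturn)
    )
}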

This is a naked function (it doesn't have the prologue and epilogue that normal functions do) that takes the context, cr3 (for setting the address space of the switched-to task) and the value for rip. The code first backs up the stack pointer in rax, then sets rsp to the address of the Context. Then we pop the values from it in the same order we declared the registers in Context. The ordering is very important here: if the pops don't match the in-memory field order, the registers get the wrong values. Then it restores the original stack pointer and updates cr3. After that, it pushes rdx. Why rdx? Well, the "C" calling convention on x86_64 (System V) passes function arguments in the following order: rdi, rsi, rdx, rcx, etc. That means new_context is stored in rdi, new_cr3 in rsi and rip in rdx. Finally, we run the ret instruction, which pops the value we just pushed into rip.
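The point about ordering can be checked outside the kernel. The sketch below is an illustration under one assumption: the struct is marked repr(C), which is what pins the fields to declaration order. The as_pop_order helper is hypothetical; it reads the struct as six consecutive 8-byte words, which is how the pop sequence sees it.

```rust
/// Same fields as the Context in the post. repr(C) (an assumption of this
/// sketch) guarantees they are laid out in declaration order.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
struct Context {
    rbx: u64,
    rbp: u64,
    r12: u64,
    r13: u64,
    r14: u64,
    r15: u64,
}

/// Views the struct the way the `pop` sequence does: six 8-byte words,
/// lowest address first.
fn as_pop_order(ctx: &Context) -> [u64; 6] {
    // SAFETY: Context is repr(C) with six u64 fields and no padding, so it
    // has the same size and layout as [u64; 6].
    unsafe { std::mem::transmute::<Context, [u64; 6]>(*ctx) }
}

fn main() {
    let ctx = Context {
        rbp: 0xcafe, // order in the literal does not matter, declaration order does
        rbx: 0xbabe,
        r12: 0xdeadbeef,
        r13: 0xfeedbed,
        r14: 0xfacefeed,
        r15: 0xfacebace,
    };
    let names = ["rbx", "rbp", "r12", "r13", "r14", "r15"];
    for (name, word) in names.iter().zip(as_pop_order(&ctx).iter()) {
        println!("pop {:<3} <- {:#x}", name, word);
    }
}
```

The first word in memory is rbx (0xbabe) even though rbp comes first in the literal, so the first pop in the assembly has to be pop rbx.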

Testing the thing

Let's make a dummy function and call it through this. First, the dummy function:

fn test_func() {
    let rbx: u64;
    let rbp: u64;
    let r12: u64;
    let r13: u64;
    let r14: u64;
    let r15: u64;

    // Explicit output registers keep the compiler from picking one of the
    // registers we are trying to read. This still relies on the compiler
    // not having touched rbx, rbp and r12-r15 before the asm block runs.
    unsafe {
        asm!(
            "mov rax, rbx",
            "mov rcx, rbp",
            "mov rdx, r12",
            "mov rsi, r13",
            "mov rdi, r14",
            "mov r8, r15",
            out("rax") rbx,
            out("rcx") rbp,
            out("rdx") r12,
            out("rsi") r13,
            out("rdi") r14,
            out("r8") r15,
        );
    }

    info!("Hello from test func!");
    info!("Registers: ");
    info!("rbx = 0x{:x}, rbp = 0x{:x}, r12 = 0x{:x}, r13 = 0x{:x}, r14 = 0x{:x}, r15 = 0x{:x}",
        rbx, rbp, r12, r13, r14, r15);
}

The function basically reads all the registers updated by the context switch function and displays the values. Let's call switch_context with some values:

unsafe {
    switch_context(
        &Context {
            rbp: 0xcafe,
            rbx: 0xbabe,
            r12: 0xdeadbeef,
            r13: 0xfeedbed,
            r14: 0xfacefeed,
            r15: 0xfacebace,
        },
        Cr3::read().0.start_address().as_u64() as usize,
        test_func as *const fn() as usize,
    );
}

We are basically putting some recognizable values into the registers, passing the cr3 value we are already using (since we can't set up new page tables yet), and then passing the address of test_func. The switch_context function doesn't crash and jumps to test_func successfully. Here's the output:

Conclusion

I've learned some stuff about switching context and tried to implement it. Tbh, the current version isn't a real context switch yet. It doesn't save the context of the previous task anywhere, so right now it's just a fancy unsafe function caller. I plan to make it work properly in a real multitasking scenario. For that I'm going to have to write a scheduler and make it schedule tasks. It can be driven by timer interrupts delivered through the PIC. But that's for next time. My college classes are going to begin soon, and I'm going to have to study to get into a good university. Till then, stay safe and keep grinding.
