Hi there! I've been working on my own operating system since last year. It currently has paging, virtual file system, ramfs, and it can show stuff by writing pixels into a framebuffer. I've always been curious with how Linux, Windows and other operating systems do multitasking. More importantly, how do they switch context?
Trying to learn from OSDev Wiki
In the OSDev Wiki, I found a pretty interesting tutorial on multitasking. It can be found
here. I tried to read through it and understand what's
going on. But, alas! I couldn't understand even 50% of it. I was pretty confused about the implementation of switch_to_task.
But tbh, it was not the OP's fault. The most interesting thing was Thread Control Block. If we look at the code:
1 ;C declaration:
 2 ;   void switch_to_task(thread_control_block *next_thread);
 3 ;
 4 ;WARNING: Caller is expected to disable IRQs before calling, and enable IRQs again after function returns
 5  
 6 switch_to_task:
 7  
 8     ;Save previous task's state
 9  
 10     ;Notes:
 11     ;  For cdecl; EAX, ECX, and EDX are already saved by the caller and don't need to be saved again
 12     ;  EIP is already saved on the stack by the caller's "CALL" instruction
 13     ;  The task isn't able to change CR3 so it doesn't need to be saved
 14     ;  Segment registers are constants (while running kernel code) so they don't need to be saved
 15  
 16     push ebx
 17     push esi
 18     push edi
 19     push ebp
 20  
 21     mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
 22     mov [edi+TCB.ESP],esp         ;Save ESP for previous task's kernel stack in the thread's TCB
 23  
 24     ;Load next task's state
 25  
 26     mov esi,[esp+(4+1)*4]         ;esi = address of the next task's "thread control block" (parameter passed on stack)
 27     mov [current_task_TCB],esi    ;Current task's TCB is the next task TCB
 28  
 29     mov esp,[esi+TCB.ESP]         ;Load ESP for next task's kernel stack from the thread's TCB
 30     mov eax,[esi+TCB.CR3]         ;eax = address of page directory for next task
 31     mov ebx,[esi+TCB.ESP0]        ;ebx = address for the top of the next task's kernel stack
 32     mov [TSS.ESP0],ebx            ;Adjust the ESP0 field in the TSS (used by CPU for for CPL=3 -> CPL=0 privilege level changes)
 33     mov ecx,cr3                   ;ecx = previous task's virtual address space
 34  
 35     cmp eax,ecx                   ;Does the virtual address space need to being changed?
 36     je .doneVAS                   ; no, virtual address space is the same, so don't reload it and cause TLB flushes
 37     mov cr3,eax                   ; yes, load the next task's virtual address space
 38 .doneVAS:
 39  
 40     pop ebp
 41     pop edi
 42     pop esi
 43     pop ebx
 44  
 45     ret                           ;Load next task's EIP from its kernel stack
 
It seems to store some registers, and then loads the TCB of previous task into edi. And then loads
the TCB of next task to esp. And then does some mov's I don't really understand. There are some stuff like
kernel stack and virtual address space that aren't really present in the kernel right now. So, I decided to change
my plan.
Plan B
I started to go through random repositories of fantastic kernels written by fantastic people to try to grasp how they did multitasking. I looked through source code of Lemon OS, Aero OS and many more. One thing they had in common is that the scheduler would do some stuff and then pass the context information to an assembly function. That thing does the actual work of moving and updating all the necessary registers and doing the jump.
But, what about rip? This was a pretty stupid thing done by me tbh. I was trying to find how to update rip directly.
Like, mov rip, rax. I didn't realize that it was impossible. It was possible to read value by using lea.
So, I wasted couple weeks thinking about it. Yeah. But then, today I just found I can do it just by pushing a value
onto stack and running ret. Damn. This easy?
So now I started to implement a basic function that'll take a struct Context and use the values from there
to update the registers.
1 #[derive(Default, Debug, Copy, Clone)]
 2 #[repr(packed)]
 3 pub struct Context {
 4     pub rbx: u64,
 5     pub rbp: u64,
 6     pub r12: u64,
 7     pub r13: u64,
 8     pub r14: u64,
 9     pub r15: u64,
 10 }
 
Following is the code of the function:
1 #[naked]
 2 pub unsafe extern "C" fn switch_context(
 3     new_context: &Context,
 4     new_cr3: usize,
 5     rip: usize,
 6 ) {
 7     asm!(
 8         "\
 9         mov rax, rsp
 10         mov rsp, rdi
 11         pop rbx
 12         pop rbp
 13         pop r12
 14         pop r13
 15         pop r14
 16         pop r15
 17         mov rsp, rax
 18         mov cr3, rsi
 19         push rdx
 20         ret
 21     ",
 22         options(noreturn)
 23     )
 24 }
 
This is a naked function(doesn't have the prologue and epilogue like normal functions do) that takes the context,
cr3(for setting the address space of switched task) and the value for rip. The code basically backs up the stack first,
then sets the pointer to Context as stack pointer. Then we pop the values from it with the same order we declared
the register values in Context. The ordering is very important here. Otherwise, register will get wrong value. Then
it restores original stack, updates cr3. After that, it basically pushes rdx. Why rdx? Well, the "C" calling convention
requires that function arguments are put in following order: rdi, rsi, rdx, rcx, etc. That means, new_context
is stored in rdi, new_cr3 is stored rsi and rip is stored in rdx. After that, we run the ret instruction.
Testing the thing
Let's make a dummy function and call it through this. First, the dummy function:
1 fn test_func() {
 2     let rbx: u64;
 3     let rbp: u64;
 4     let r12: u64;
 5     let r13: u64;
 6     let r14: u64;
 7     let r15: u64;
 8 
 9     unsafe {
 10         asm!("\
 11             mov {}, rbx\n\
 12             mov {}, rbp\n\
 13             mov {}, r12\n\
 14             mov {}, r13\n\
 15             mov {}, r14\n\
 16             mov {}, r15\n\
 17         ", out(reg) rbx,
 18             out(reg) rbp,
 19             out(reg) r12,
 20             out(reg) r13,
 21             out(reg) r14,
 22             out(reg) r15);
 23     }
 24 
 25     info!("Hello from test func!");
 26     info!("Registers: ");
 27     info!("rbx = 0x{:x}, rbp = 0x{:x}, r12 = 0x{:x}, r13 = 0x{:x}, r14 = 0x{:x}, r15 = 0x{:x}",
 28         rbx, rbp, r12, r13, r14, r15);
 29 }
 
The function basically reads all the registers updated by the context switch function and displays the values.
Let's call switch_context with some values:
1 unsafe {
 2     switch_context(
 3         &Context {
 4             rbp: 0xcafe,
 5             rbx: 0xbabe,
 6             r12: 0xdeadbeef,
 7             r13: 0xfeedbed,
 8             r14: 0xfacefeed,
 9             r15: 0xfacebace,
 10         },
 11         Cr3::read().0.start_address().as_u64() as usize,
 12         test_func as *const fn() as usize,
 13     );
 14 }
 
We are basically setting some interesting values to the registers, setting cr3 to what we are using currently as we can't
set up new page tables now, and then passing the pointer to test_func. The switch_context function doesn't crash and switches
task successfully. Here's the output:

Conclusion
I've learned some stuff about switching context and tried to implement them. Tbh, current one isn't really efficient.
It doesn't really store the context of previous task somewhere, and it's currently a fancy unsafe function caller. 
I've plans to make it so that it actually works properly in real multitasking scenario. I'm going to have to write a 
scheduler and make it schedule tasks. It can be done through PIC. But that's for next time. Currently, my college
class is going to begin soon, and I'm going to have to study to get into a good university. Till then, stay safe and keep grinding.