Nobel Sharanyan J

Sunday, July 19, 2015

Memory management - PART I

It is been a long time, the topic I choose now is memory management is a vast topic. This blog is my understanding based on "Understanding Linux Kernel"

Before starting into this topic, we will discuss about some basic, how bus are connected with CPU.

two type of bus are connected with CPU front side bus and backside bus.
front side bus - bus that connect CPU and most other components in computer like memory, bridge.
Back side bus - separate connection between processor and level 2 cache.

The above figure explain you how the peripheral been connected to processor.

Now we will see how an executable code get executed. If CPU need to execute a program it need to bring the code from hard disk to ram. Let say if we have N process each for different use cases and size. Then it need to bring it to memory and executed. CPU can execute only one at a time then there should be queue to maintain it. There are lot of background process like scheduler will be running to maintain fair chances to all. But how address of the code get mapped?

Different binding mechanism:

Before starting this you need to understand C compilation process. Which will help you to understand clearly. Here is some crisp message - compiler generate relocatable object files and loader generate absolute address. So when a C program is compiled it hold just a relative address. So that it can get mapped to memory. How a compiled C code get mapped to memory does it mapped randomly or how it get mapped? who help it to get mapped? how a executable code fetched from storage to memory? - I won't share this I will ask the reader to share their answer in comments.

1. Compile time binding

Compiler generate code which bind with physical address. which can't be loaded other than mapped location.

2. Load time binding

Compiler generate a relative address and loader bind that address to absolute address. So the code can be loaded to any location. but once it loaded to that location it can't be remapped to different location.

3. Execution time binding

Here it is same as above but it can be loaded and reloaded to any location in memory during run time. It required a special hardware to do so. because the CPU generate relative address it has to get mapped to corresponding to program.

Now we understand the different mechanism. Hey! If we look above CPU will be using relative address i.e, logical address and there will be hardware mechanism that convert the logical address to physical address.


Different CPU modes

Protected Modes - (PE - 0) - Instruction and architectural features are available. It is the recommended mode for all new application and Operating system.

Real address Modes - (PE - 1, VM - 0) - provide programming environment of 8086. When powered on CPU runs in real mode. In real mode the cpu address 16 bits memory.

System Management Mode -  provide system wide handling functions like power management, system hardware control or proprietary design code.

Virtual Mode - (PE - 1, VM - 1) - In this mode processor execute 8086 software in a protected, multitasking environment.

there two different type of addressing, byte addressing and segment addressing. Segment + byte addressing will provide better way to access or address byte in memory.


System Registers
EFLAGS - control task, mode switching, interrupt handling, instruction tracing and access rights.

Control registers - indicates support for specific processor capabilities within Operatin system.

Cr0 - control flags, that control operating state of processor.

Cr1 - Reserved.

Cr2 - Page Fault linear address.

Cr3 - Physical address of the base of page directory. If you look into this the physical address hold 20 bits and remaining 12 bits are set as zero. if 12 bits are zero then physical address is aligned to 4K(4096) base address.

Cr4 - contains group of flags that enable and allow access to only privilege mode.

Debug registers - allow setting of break points for use in debugging programs and system software.

Segment Selector register - there are six segment selector registers CS, DS, ES, FS and GS each 16 bit size.

Memory Management regsters - GDTR(Global Descriptor Table Register), LDTR(Local Descriptor Table Register), IDTR(Interrupt Descriptor Table Register), TSS(Task State Segment). Both GDTR and IDTR comes under system table register, hold 48bit in which 32bit Linear base address and 16bit Table limits. 
Where us Task register and LDTR hold 64bit in which 16bit segment selector, 32 bit base address, 16 bit segment limits.
When task switch occurs, TSS is automatically loaded with segment selector and descriptor for TSS.

Model specific registers - group of registers primarily to operating system or executive producers.

Why we need all the above thing to understand memory management?

Let start with system boot

we know that CPU system cannot boot by it own. So how actually a system start when we click the power button. When power ON CPU enter into real mode. i.e it is compatible to lower version 8086. Therefore it can address only 16 bit address. CS register hold 0xFFFF and EIP points to 0x0. If there is no default code to execute the system crashes.

Since it can address only 16bit the BIOS code is kept low like 8KB in size and stored in ROM(Flash). So that every time the system start BIOS code get executed.

BIOS puts interrupt vector table at beginning which is a size of 1KB = 256 x each 4 bytes. When you compiled the code you will notice some segment like code, data, stack, heap. whereas code and data are created during compile time and remaining get created while running. Now for the things to get loaded there will be linker scripts which will provide information about that. BIOS data memory get loaded into memory next to it. After that interrupt service routine get loaded.i.e from 0xE05B to 0xFFFE.
First BIOS will load bootsec into the memory. where us bootsec load the remaining sector into memory.

Where is actually bootsec is present? bootsec it present in first sector of floppy disk or hard disk. Which can be modified in the boot screen. BIOS load the bootsector at 0x7C00. In real mode the maximum memory it can address is 1MB(0x00000 ~ 0xFFFFF).

After bootsec loading the OS it need to do the following functionalities,
1. enabling 32 bit addressing control
2. enabling the protected mode.
3. establishing interrupt service mechanism.
4. conducting issues about protected mode.
5. building memory paging mechanism.
6. preparing for main function.

disable interrupt EFLAG - copy core program to 0x0000 because BIOS no longer needed and its interrupt vector too.
OS can't run without interruption - system call, peripheral interaction, scheduling.

After loading the OS/Kernel, initial value of GDTR and IDTR need to be set. GDT is the only array that store the segment register value in the system. It is an entrance to all program. It hold
1. total list of all process,
2. storing every task LDT address and TSS for segment addressing,
3. site protection and
4. recovery.

Program need entrance of GDT when using segment descriptor through segment register.

IDT - entrance for all interrupt service routine.
Is there any difference between IDT and IVT?
IVT Interrupt Vector Table, it contain only address for particular service routine used only in real mode.
IDT Interrupt Descriptor Table, it contain complex structure to handle interrupt while CPU running in protected mode.

Enabling 32 bit addressing
In real mode the CPU can address upto 0xFFFFF(1MB) when it tried to address more than that the CPU roll back to beginning of the memory.

modify/register the corresponding interrupt in interrupt vector table.

switch to protected mode by modifying the register CR0

bootsect -> setup.S -> head.S -> main

before loading main, some prerequiste need to be done which is done by head.S
1. manages the layout of kernel program in memory.
2. creating kernel paging system in the memory space.
create page table directory
page table
buffer
GDT
IDT

3. The size of head.S is approximately 25KB, arch/i386/kernel/head.S

Segment selector(16 bits) is associated with each segment descriptor, software use this to point the location of segment descriptor. whether it is located in GDT or LDT, requester privilege level. remaining 13 bit is used to address whether segment descriptor is located in GDT or LDT.

processor addressable memory space - linear address
linear address space to small protected address space - segment.
to address a byte in a segment we required a different mechanism called Logical address -> segment + offset

Why we actually this many type of conversion?
Logical address hold segment selector + 32 bit offset, that map program segment assigned by compiler. It give you a protection to your that it won't affect or change the other code. Unless you understand the virtual memory concept you won't understand the need of address translation of linear to physical.

Linear address is common to both user mode and kernel mode, both linear addressing limits is 2 power 32 -1. Where as the field in corresponding descriptor hold the key difference.

another key thing is kernel is just a one execuatble code, so when ever switching happens from user to kernel mode. _KERNEL_CS_ holds the same value.

Paging - linear to physical addressing. key things for paging are page table directory, page table.
In CR0, there is a seperate field used tell whether paging is enabled or not. If it is not enabled it directly point linear address as physical address.

CR3 hold the entry of page directory.

Linear address -> Directory(31 - 22)| Table(21 -12)| Offset(11 - 0)

Address type:
user virtual address: address seen by user program not a direct physical address.

Physical address: address used by processor and memory

Bus address: address between peripheral buses and memory. IOMMU that remaps address between physical address and IO bus address.

Kernel Logical address: normal address space of the kernel. Occupy same portion of physical memory. The address are stored in void * or unsinged long.
example: memory allocated using kmalloc.

Kernel virtual address: simialr to logical address. There is no one to one mapping to physical address. example: memory allocated using vmalloc().

logical address -> __pa() -> Physical address.
logical address <- __va() <- Physical address.

Memory is splited in chunk of memory called page. each page size is 4096(PAGE_SIZE) = xxxxx000. where as "xxxxx" is a page frame number.

to be continue in Part II....

Sunday, May 31, 2015

System calls

Before starting System call, we will start with OS. Operating System(OS) is a software that manage system resource and provide reliability to users. The Key service provided by OS are,
1. Controlling execution of process
2. Scheduling process in a fairly manner
3. Allocating Main Memory for process
4. Allocating Secondary memory for efficient storage and retrieve data and
5. Allowing process control over peripherals devices.

Application process can't access hardware without the help of Kernel. The below figure shows the privilege levels and communication path.

System call is a primary mechanism for application to interact and request kernel to provide service. So if user want to access system hardware it request system call. When System call get called, it initiate software interrupt/trap. Which makes CPU to switch over from user mode to kernel mode. Interrupt number of software interrupt is 0x80. Below is an example how system call is implemented in assembly language.

/*size_t write(int fd, const void *buf, size_t count)*/

mov edx,count ; message length

mov ecx,buf  ; message to write

mov ebx,fd  ; file descriptor

mov eax,4  ; system call number (sys_write)

int 0x80  ; call kernel

C wrapper function is used, in order to avoid repeated implementation of above code for different system calls. There are lot of pros and cons for using wrapper functions. To add a new system calls you need to build Linux kernel source and then follow the below steps,

1. add entry in "arch/x86/syscalls/syscall_64.tbl". If you are using x86_64 bit system.

# <number> <abi> <name> <entry point>

323    common    test_syscall    sys_test_syscall

2. add "sys_test_syscall()" declaration in "include/linux/syscalls.h". asmlinkage means the function should expect its arguments on the stack rather than in registers.

asmlinkage long sys_test_syscall(void);

3. add definition of "sys_test_syscall()" in a new or existing file.
In our case we added this function in test_syscall.c in directory test_syscall in <kernel_source>

#include <linux/kernel.h>

#include <linux/linkage.h>



asmlinkage long sys_test_syscall(void) {

    pr_err("Successfully created syscall!!\n");

}

4. Make sure the file is build by default. by Adding suitable changes in Makefile

<KERNEL_SOURCE>/test_syscall/Makefile



obj-y    := test_syscall.o



<KERNEL_SOURCE>/Makefile

....

core-y    := usr/ test_syscall/

5. Compile and load the kernel, after compiled kernel get loaded.
Create a c file to call the syscall.

#include <stdio.h>



int main() {

    syscall(323);

    return 0;

}

6. test_syscall kernel print get printed which can be seen in kernel log.

[  2185.076091] Successfully created syscall!!

Here we successfully created the system call. If you need further clarification please comment my blog.

Tuesday, May 26, 2015

Booth’s algorithm: Multiplication of two unsigned numbers and signed numbers

Here, I am going to share how multiplication is done inside processor. Due to evolution of human mind, There may be better and different way to do.

Each and every method have some pros and cons, If we choose first it is quite complicated to implement the same in processor as everything in processor is Logic High and Logic Low. For the second one, it take O(multiplier) times and you need to take care in choosing lower value as multiplier. Third method shows multiplication done using Logic High(1’s) and Logic Low(0’s). If you look into third one, every times 1’s and 0’s are multiplied with the multiplicand and shifted left (2 power n) times. Where n is the position of bit used by multiplier.

Below figure, explains how Multiplication is done for two unsigned numbers.

Lets understand the concept first, for example, take 6(0b0110) as multiplicand and 2(0b0010)as multiplier and Initial value of Accumulator and Carry bit are zero.

Step 1:

Check the Last bit of multiplier(i.e. 2) if it is 0 proceed else jump to step 2. Right shift Accumulator, Multiplier and carry in such a way that last bit(LSB) of Accumulator jump to first position(MSB) of Multiplier, Carry bit jump to first position(MSB) of Accumulator and Last bit of Multiplier is left alone. We can also do the same thing using left shift which use additional memory compare to this method.

Count = Count + 1

Accumulator = 0b0000

Multiplier = 0b0001

Step 2:

Now check again the last bit of multiplier if it 1. Add multiplicand with Accumulator and store the result in Accumulator.

Accumulator = 0b0110

After that do the Right shift Accumulator, Multiplier and carry in such a way that last bit(LSB) of Accumulator jump to first position(MSB) of Multiplier, Carry bit jump to first position(MSB) of Accumulator and Last bit of Multiplier is left alone.

Count = Count + 1

Accumulator = 0b0011

Multiplier = 0b0000

Step 3:

Do the Above steps until count reach the size of multiplier.

Count == 4

Accumulator = 0b0000

Multiplier = 0b1100

And the Result will be,

Result = Accumulator(4bit MSB) | Multiplier (4bit LSB).

Result = (0b0000 << 4) | 0b1100 = 0b00001100 = 12.

The above method will not be applicable to solve multiplication of negative number. What we can do is convert both multiplier and multiplicand to positive numbers, perform the multiplication then take 2’s complement of the result.

The above figure is the advance method used for multiplication of two signed or unsigned numbers.

Here, is how the above method multiply two signed/unsigned numbers.

Here is the C source code, for the above algorithm

#define BYTE                 8

#define SET_MSB_BIT(x)       ((x) << (BYTE -1)) 

#define TWOS_COMPLEMENT(x)   ((~x) + 1) 

short multiply(unsigned char multiplicand, unsigned char multiplier) { 

    unsigned short result = 0;

    /*Accumulator, intial value is assigned as zero*/

    unsigned char ACC = 0; 

    unsigned char M = multiplicand; 

    unsigned char Q = multiplier; 

    /*Number of bits in a byte*/

    unsigned char count = BYTE; 

    /*Q-1, intial value assigned as zero*/

    bool Q_prev = 0; 

    /*Q 0*/

    bool Q_0 = 0;

    /*Accumulator MSB bit*/

    bool ACC_msb = 0; 



    do { 

            Q_0 = Q & 1; 

            if (Q_0 ^ Q_prev) {

                    /*Check xor truth table*/ 

                    ACC = ACC + (Q_0 ? TWOS_COMPLEMENT(M): M); 

            }

            /*Right shift Q by one*/ 

            Q = Q >> 1;

            /*store Q_0 value as Q_prev*/

            Q_prev = Q_0;

            /*Store MSB bit of ACC*/

            ACC_msb = ACC & SET_MSB_BIT(1);

            /*Set MSB bit of Q, if ACC LSB bit is set*/

            Q |= SET_MSB_BIT(ACC & 1);

            /*right shift Accumulator*/

            ACC = ACC >> 1;

            ACC |= SET_MSB_BIT(ACC_msb);

            count--;

    } while(count > 0);

    result = Q | (ACC << BYTE);

    return result;

}

/*example code for division for unsigned number*/

#define SIZE_HWORD (16) /*bits*/ 

 

struct division { 

    unsigned short quotient; 

    unsigned short reminder; 

}; 

 

struct division divide(unsigned short divisor, unsigned short dividend) { 

    struct division result; 

    unsigned short ACC = 0; 

    unsigned short M = divisor; 

    unsigned short Q = dividend; 

    short count = SIZE_HWORD; 

    bool Q_0 = 0; 

    bool Q_msb = 0; 

 

    do { 

        /*Assign Q_0 ,i.e. LSB bit as 1*/ 

        Q_0 = 1; 

        /*store MSB bit of Q as Q_msb*/ 

        Q_msb = (Q & (1 << (SIZE_HWORD -1))); 

 

        /*left shift Acc by 1*/ 

        ACC = ACC << 1; 

        /*ADD Q_msb to the LSB of ACC*/ 

        ACC |= Q_msb; 

        /*left shift Q by 1*/ 

        Q = Q << 1; 

 

        /*subtract ACC with M*/ 

        ACC -= M; 

 

        /*check ACC is negative, 

        if so revert the value back. ADD zero to Q_0*/ 

        if ((short)ACC < 0) { 

            ACC += M; 

            Q_0 = 0; 

        } 

        Q |= Q_0; 

        /*decrement count*/ 

        count--; 

    } while (count > 0); 

    result.reminder = ACC; 

    result.quotient = Q; 

    return result; 

}

Thanks for viewing my blog, please post your comment.

Function call – stack implementation

Hi,

I like to share in depth of how stack grows and how it look like.

Lets take the below example,

#include<stdio.h>

int foo(int a, int b, int c) {

        int sum = 0;

        sum = a + b + c;

        return sum;

}

int main() {

        foo(10,20,30);

        return 0;

}

When you compile and run this code, crt0 will load your file into memory and check for the entry to get start. By default entry is main so the function get start from main() function. Inside main() function we have called function foo(), so the caller is main() and callee is foo(). After executing the statements in foo(), it need to return back to main(), i.e the one who calls(caller). The scope of the variable defined inside the function is limited to lifetime of the function. In order to avoid violation, the two dedicated CPU registers ebp(Base Pointer), esp(Stack Pointer) control its. So how does the two register control scope of variable and function calls. Here is pictorial representation of Caller calling callee and how the stack get updated.

From the above figure, I have mentioned Higher address and Lower address too show that the stack grows but in downward. So when Caller calls Callee it PUSH the argument from right to left into stack and it left a WORD size for return. If Callee return value greater than size of WORD then return hold the address of location where it getting stored. Base pointer is otherwise knows as frame pointer for the function, and stack pointer hold the maximum limit of the function. Whenever you call PUSH the stack pointer get updated first and value is loaded next. When you return back from callee the base pointer and stack pointer get updated to the caller. But the callee’s content are still in memory which may be override when caller call another function or swapping happens. During unwinding phase it doesn’t update the stack with zero.

Here is some questions to think.

1. When you defined an array inside a function how it get assigned into memory, whether index zero pushed first or the index last?

2. When you implement a recursive function did you face a stack over flow? If so how will you find the stack size(RLIMIT)? – getrlimit()

3. Create a function foo() which return datatype int, define a variable foo_i of datatype int inside foo with value 100 to it and return foo_i. Similarly define another function bar() same as foo() and variable bar_i similar to foo_i but undefined. Increment the bar_i and return bar_i. Consolidate the both foo() and bar() returns to a variable and print it.

Oops!! I haven’t explained the code written on the top. will leave that as an assignment to you.

Steps to Apply Patches to Kernel

Here I am going to share the steps to apply patches to kernel. interesting is it.

Step 1: Clonning
Clone the latest Linux kernel from kernel.org, I prefer source from Linus
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Step 2: Branching
After cloning the linux kernel, It is better to create a new branch
# git branch
Switch to new branch
# git checkout

Step 3: Apply Changes
Make some changes and Check the difference
# git diff

Step 4: Commit
Commit the Changes
# git commit -a

A good commit message look like below things
———————————————————————
Header line: explain commit in one line

Body of commit message in few lines of text, explaining things
in more detail, like background about the issue being fixed, etc.
The Body of the message should not exceed 70 lines.

Reported-by:
Signed-off-by: Your Name <youremail@yourhost.com>

example:
Subject:[PATCH] Makefile: Change text to contain text

As a part of example, text field in Makefile is changed to text.
This is just an example, I think you can explain better than me.

Signed-off-by: Nobel Sharanyan J <nobelsharanyan86@gmail.com>
———————————————————————

Step 5: Generate Patch
Create the Patch
# git format-patch master..
or you can use something like
# git format-patch –stdout origin/master

Step 6: Cross check the patch before sending
Check whether patch is error free and ok to submit
# ./scripts/checkpatch.pl .patch

Step 7: whom to mail
# ./scripts/get_maintainer.pl .patch
the above command will list down the maintainers email id.

Step 8: How to mail,
To send mail you can use either mutt or git
if your are using git sendmail follow this,
# git config –global sendemail.smtpuser
# git config –global sendemail.smtpserver smtp.googlemail.com
# git config –global sendemail.smtpencryption tls
# git config –global sendemail.smtpserverport 587
# git config –global sendemail.smtppass

to cross check whether config is written correctly check “~/.gitconfig”

Step 9: Test mail.
Try a test mail not to maintainer, no need to distrub them
# git send-email –to <name>@gmail.com –cc <name>@gmail.com .patch

check the mail. hurray!! ready to send patch to linux.

view git log using below command,
git log
git log –oneline
There are lot of better comments, better google it or refer some books.

Now you follow the same process to apply patches to kernel, instead of to add maintainer in to list

Counting number of bit sets in an integer (Hamming Weight)

There is always a multiple ways to get a solutions for given problem and I am not going to argue this is the best.

int count_number_of_bit_set(unsigned value) {

    unsigned int shift[] = {1, 2, 4, 8, 16};

    unsigned int magic_number[] = {

                                   0x55555555,

                                   0x33333333,

                                   0x0F0F0F0F,

                                   0x00FF00FF,

                                   0x0000FFFF};

    unsigned char i = 0;

    for (; i <= 4; i++) {

         value = (value & magic_number[i]) +

                 ((value >> shift[i]) & magic_number[i]);

    }

    return value;

}

Let understand the logic inside the code,

The above picture is drawn to explain the concept for a byte. which can be extended for multiple bytes by applying the above concept.

If you like my blog, Please comment your views and feedback.