Subroutines

Every programming language has its own term for a subroutine: procedure, function, subprogram, method, routine, etc.

It is useful to break a large program into smaller, separate logical units. This makes the program more modular and allows the code for a task to be reused.

A subroutine is a sequence of instructions. To use it, you jump to it, execute its instructions and jump back to the place from which it was called.

When it returns, the subroutine must have left intact the environment (registers) that the calling code relies on. In order to jump back to the instruction that follows the one that called the subroutine, the return address must be saved at the time of the call.

A safe way to write a subroutine is to follow the Procedure Call Standard for the ARM Architecture (AAPCS).

Calling a subroutine

To define a subroutine, begin by using

.type YourFunction, %function

which marks the symbol as a function, so that the lowest bit of its address is set to indicate Thumb code (rather than adding 1 by hand).

Subroutine calls are performed with the BL (branch and link) instruction, which

  • saves the return address, i.e. the address of the instruction following the BL instruction, in the Link Register (LR or register r14). On Cortex-M, BL is a 32-bit instruction, so this is the address of the BL instruction plus 4.

  • performs a jump. In other words, it loads the program counter (PC) with the memory address of the first instruction of the subroutine.

When the subroutine has finished, it returns to the caller by jumping to the address stored in LR, using the following instruction:

mov pc, lr      // load the program counter (PC or r15) with the return address stored in the link register (LR or r14)

When calling a subroutine, the BL instruction also sets the least significant bit (bit 0) of LR, which indicates Thumb state.

Let’s improve our GPIO blinking code by adding two subroutines:

clockGPIO:
ldr  r1, =RCC_BASE       //  4002 1000
ldr  r2, [r1,#AHB2ENR]   //  4002 104c
orr  r2, #RCC_AHB2ENR_GPIOAEN
orr  r2, #RCC_AHB2ENR_GPIOBEN
str  r2, [r1,#AHB2ENR]
mov  pc,lr

config_GPIO:

ldr  r1, =GPIOA_BASE            //  4800 0000
ldr  r2, [r1,#GPIO_MODER]       // after reset, r2 is 0xabffffff
bic  r2, #GPIO_MODER_MODE6_1    // clear bit 1 of MODE6 (field at bit 2*6=12) -> "01": set PA6 to OUTPUT mode
bic  r2, #GPIO_MODER_MODE7_1    // clear bit 1 of MODE7 (field at bit 2*7=14) -> "01": set PA7 to OUTPUT mode
str  r2, [r1,#GPIO_MODER]

ldr  r1, =GPIOB_BASE            //  4800 0400
ldr  r2, [r1,#GPIO_MODER]       //  value of PB MODE register
bic  r2, #GPIO_MODER_MODE6_1    // clear bit 1 of MODE6 (field at bit 2*6=12) -> "01": set PB6 to OUTPUT mode
str  r2, [r1,#GPIO_MODER]

// set PUSH-PULL/open-drain mode
ldr  r1, =GPIOA_BASE            //  4800 0000
ldr  r2, [r1,#GPIO_OTYPER]      //  value of PA OTYPE register
bic  r2, #GPIO_OTYPER_OT6       // 0: output push-pull (reset state)
bic  r2, #GPIO_OTYPER_OT7       // 0: output push-pull (reset state)
str  r2, [r1,#GPIO_OTYPER]      //  write back the PA OTYPE register

// set pin speed
ldr  r1, =GPIOA_BASE            //  4800 0000
ldr  r0, [r1,#GPIO_OSPEEDR]     //  value of PA OSPEED register
ldr  r2, = GPIO_OSPEEDR_OSPEED6_0   //  bit 0 of OSPEED6
ldr  r3, = GPIO_OSPEEDR_OSPEED6_1   //  bit 1 of OSPEED6
orr  r0, r0, r2
orr  r0, r0, r3                 //  both bits set: OSPEED6 = "11"
str  r0, [r1,#GPIO_OSPEEDR]

// set "no pull"
//  00: No pull-up, pull-down   01: Pull-up   10: Pull-down  11: Reserved
ldr  r1, =GPIOA_BASE            //  4800 0000
ldr  r2, [r1,#GPIO_PUPDR]       //  value of PUPD register
bic  r2, #GPIO_PUPDR_PUPD6_0    // no pull (reset state)
str  r2, [r1,#GPIO_PUPDR]

bx   lr

The main program becomes:

.text

.type Reset_Handler, %function  // what follows is instructions, not data, so the least significant bit of the address is set
Reset_Handler:

ldr  r0, =0x10101010
ldr  r1, =0x10203040
bl clockGPIO           // enable the GPIO A and B clocks via the AHB2ENR register
bl config_GPIO         // configure GPIO registers

// ---------------------------------------------------------------
ldr r3, =1000000
ldr r5, =200000

ldr r1, =GPIOB_BASE        //  4800 0400
ldr r6, =GPIO_BSRR_BR_6
ldr r7, =GPIO_BSRR_BS_6

BlinkLoop:
str     r6, [r1,#GPIO_BSRR]

mov r4, r3
delay1: subs r4, #1
bne delay1

str     r7, [r1,#GPIO_BSRR]

mov r4, r5
delay2: subs r4, #1
bne delay2

b BlinkLoop

Just before the instruction bl clockGPIO, we have:

lr             0xffffffff          -1
pc             0x8000106           0x8000106 <Reset_Handler+6>

A single step executes the BL and enters the clockGPIO subroutine (five more steps will take us to its last instruction):

clockGPIO () at gpio.s:54
54     ldr  r1, =RCC_BASE       //  4002 1000

PC and LR are now
lr             0x800010b           134217995
pc             0x800012e           0x800012e <clockGPIO>

LR holds the return address 0x0800 0106 + 4 = 0x0800 010a, with bit 0 set to 1 (Thumb state), which gives 0x0800 010b.

Subroutine last instruction before return

(gdb) s
halted: PC: 0x0800013c
59     mov  pc,lr

lr             0x800010b           134217995
pc             0x800013c           0x800013c <clockGPIO+14>

After returning to the main program

(gdb) s
halted: PC: 0x0800010a
Reset_Handler () at gpio.s:24
24     bl config_GPIO          // configure GPIO registers

lr             0x800010b           134217995
pc             0x800010a           0x800010a <Reset_Handler+10>

PC now holds the value stored in LR with bit 0 cleared (0x0800010b becomes 0x0800010a), so the next instruction to execute is the call to the second subroutine.
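
The return-address arithmetic seen in this trace can be checked on a PC with a few lines of C. This is only a host-side sketch, not code for the target: the helper names thumb_link_value and branch_target are hypothetical, and the sketch assumes the 4-byte BL encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Value that BL writes to LR: the address of the next instruction,
   with bit 0 set to mark Thumb state.
   Assumption: BL uses its 4-byte encoding. (Hypothetical helper.) */
uint32_t thumb_link_value(uint32_t bl_address)
{
    assert((bl_address & 1u) == 0u);   /* instruction addresses are halfword-aligned */
    return (bl_address + 4u) | 1u;
}

/* Address at which execution resumes when LR is used as the return
   address: bit 0 is a state flag, not part of the address, so it is
   cleared. (Hypothetical helper.) */
uint32_t branch_target(uint32_t lr)
{
    return lr & ~1u;
}
```

With the values from the trace, thumb_link_value(0x08000106) gives 0x0800010b and branch_target(0x0800010b) gives 0x0800010a, the address of the bl config_GPIO instruction.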

Subroutine calling another subroutine

Let’s insert a call to config_GPIO inside clockGPIO:

Wrong code!

clockGPIO:
ldr  r1, =RCC_BASE       //  4002 1000
ldr  r2, [r1,#AHB2ENR]   //  4002 104c
orr  r2, #RCC_AHB2ENR_GPIOAEN
orr  r2, #RCC_AHB2ENR_GPIOBEN
str  r2, [r1,#AHB2ENR]
bl config_GPIO           // configure GPIO registers
mov  pc,lr

Just before the call to config_GPIO, we have:

lr             0x800010b           134217995
pc             0x800013c           0x800013c <clockGPIO+14>

Let’s single step:

(gdb) s
halted: PC: 0x0800013e
config_GPIO () at gpio.s:66
66     ldr  r1, =GPIOA_BASE        //  4800 0000

Now the LR value has changed:

(gdb) i r lr pc
lr             0x800013d           134218045
pc             0x800013e           0x800013e <config_GPIO>

LR now contains the return address to clockGPIO.

After several steps, the config_GPIO subroutine is fully executed, ready to return to clockGPIO:

(gdb)
halted: PC: 0x0800018c
100    bx      lr

(gdb) i r lr pc
lr             0x800013d           134218045
pc             0x800018c           0x800018c <config_GPIO+78>

Next step:

(gdb) s
halted: PC: 0x0800013c
clockGPIO () at gpio.s:60
60     mov  pc,lr

(gdb) i r lr pc
lr             0x800013d           134218045
pc             0x800013c           0x800013c <clockGPIO+18>

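LR now holds the address of the instruction that is about to execute (with the LSB set): the nested BL has overwritten the return address to the main code. As a consequence, mov pc,lr jumps to itself forever. The program is stuck at the end of the first subroutine clockGPIO and cannot return to the main code.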

The solution is to push LR onto the stack before calling the second subroutine and to pop it back into LR after the second subroutine has returned to the first one.

clockGPIO:
push {lr}
ldr  r1, =RCC_BASE       //  4002 1000
ldr  r2, [r1,#AHB2ENR]   //  4002 104c
orr  r2, #RCC_AHB2ENR_GPIOAEN
orr  r2, #RCC_AHB2ENR_GPIOBEN
str  r2, [r1,#AHB2ENR]
bl config_GPIO         // configure GPIO registers
pop {lr}
mov  pc,lr
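
The effect of the nested call on LR can also be replayed on a PC with a small C model. This is a rough sketch under stated assumptions: only LR and a tiny stack are modelled, the helper names (bl, push, pop, return_target_unsaved, return_target_saved) are hypothetical, and the two return addresses are taken from the gdb traces above.

```c
#include <assert.h>
#include <stdint.h>

#define RET_TO_MAIN   0x0800010au   /* instruction after "bl clockGPIO" in the main code */
#define RET_TO_CLOCK  0x0800013cu   /* instruction after "bl config_GPIO" in clockGPIO */

static uint32_t lr;            /* models the link register */
static uint32_t stack[4];      /* models a small stack (growth direction is irrelevant here) */
static unsigned depth;         /* number of words currently on the stack */

static void push(uint32_t v) { assert(depth < 4u); stack[depth++] = v; }
static uint32_t pop(void)    { assert(depth > 0u); return stack[--depth]; }

/* Models BL: the return address (bit 0 set) overwrites whatever LR held. */
static void bl(uint32_t return_address) { lr = return_address | 1u; }

/* clockGPIO without push/pop: returns where its final "mov pc, lr" lands. */
uint32_t return_target_unsaved(void)
{
    bl(RET_TO_MAIN);     /* main:      bl clockGPIO */
    bl(RET_TO_CLOCK);    /* clockGPIO: bl config_GPIO, LR is overwritten */
                         /* config_GPIO: bx lr, back in clockGPIO */
    return lr & ~1u;     /* clockGPIO: mov pc, lr */
}

/* clockGPIO with push {lr} / pop {lr} around the nested call. */
uint32_t return_target_saved(void)
{
    bl(RET_TO_MAIN);     /* main:      bl clockGPIO */
    push(lr);            /* clockGPIO: push {lr} */
    bl(RET_TO_CLOCK);    /* clockGPIO: bl config_GPIO */
    lr = pop();          /* clockGPIO: pop {lr} */
    return lr & ~1u;     /* clockGPIO: mov pc, lr */
}
```

In this model, return_target_unsaved() yields 0x0800013c, the address of the mov pc,lr instruction itself (the endless loop), whereas return_target_saved() yields 0x0800010a, back in the main code.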

Separate ASM source files

Let’s put the two subroutines into two separate source files called sub1.s and sub2.s.

In order to make them visible to other files, we add .global clockGPIO and .global config_GPIO before the subroutine definitions. This way, the main file can call the two functions.

.global Reset_Handler is also necessary.

Without it, arm-none-eabi-readelf does not list Reset_Handler as a function (FUNC):

245: 08000194     0 NOTYPE  LOCAL  DEFAULT    1 $t
246: 080001a4     0 NOTYPE  LOCAL  DEFAULT    1 $d
247: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 _bss_end__
248: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start__
249: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_end__
250: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
251: 08000195     0 FUNC    GLOBAL DEFAULT    1 clockGPIO
252: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
253: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __end__
254: 08000141     0 FUNC    GLOBAL DEFAULT    1 config_GPIO
255: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
256: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 _end
257: 00080000     0 NOTYPE  GLOBAL DEFAULT    3 _stack
258: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __data_start

With the .global directive, the problem is solved:

  244: 08000194     0 NOTYPE  LOCAL  DEFAULT    1 $t
  245: 080001a4     0 NOTYPE  LOCAL  DEFAULT    1 $d
  246: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 _bss_end__
  247: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start__
  248: 08000101     0 FUNC    GLOBAL DEFAULT    1 Reset_Handler
  249: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_end__
  250: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
  251: 08000195     0 FUNC    GLOBAL DEFAULT    1 clockGPIO
  252: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
  253: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __end__
  254: 08000141     0 FUNC    GLOBAL DEFAULT    1 config_GPIO
  255: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
  256: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 _end
  257: 00080000     0 NOTYPE  GLOBAL DEFAULT    3 _stack
  258: 20000000     0 NOTYPE  GLOBAL DEFAULT    1 __data_start

Note

For each subroutine file, issue these directives at the beginning of the file:

.syntax unified
.cpu cortex-m4
.arch armv7e-m
.thumb
.text

Note

Makefile

gpio.elf: gpio.o clockGPIO.o config_GPIO.o
        arm-none-eabi-ld -Ttext 0x8000000 -Tdata=0x20000000 -o gpio.elf gpio.o config_GPIO.o clockGPIO.o

gpio.o: gpio.s
        arm-none-eabi-as -mthumb -g gpio.s -o gpio.o

clockGPIO.o: sub1.s
        arm-none-eabi-as -mthumb -g sub1.s -o clockGPIO.o

config_GPIO.o: sub2.s
        arm-none-eabi-as -mthumb -g sub2.s -o config_GPIO.o

Add a separate C source file

We create the following lab.c file:

#include <stdint.h>       // for uint32_t
#include "stm32l476xx.h"

// Busy-wait for `time` loop iterations.
void delay(uint32_t time)
{
   // volatile keeps the empty loop from being removed if optimization is enabled
   volatile uint32_t count = 0;
   for(count = 0; count < time; count++)
   {
   }
}

// -----------------------------------------------
// Blink PA6 six times, then PA7 eight times, and return a+b.
extern int foo (int a, int b)
{
   // enable the clock to GPIO port A & C - enable SYSCFG clock
   // RCC->AHB2ENR = RCC_AHB2ENR_GPIOAEN | RCC_AHB2ENR_GPIOCEN;
   int i;
   for(i=0; i<6; i++)
   {
      /* Turn on LED6 */
      GPIOA->BSRR = GPIO_BSRR_BS_6;
      delay(50000);
      /* Turn off LED6 */
      GPIOA->BSRR = GPIO_BSRR_BR_6;
      delay(100000);
   }

   for(i=0; i<8; i++)
   {
      /* Turn on LED7 */
      GPIOA->BSRR = GPIO_BSRR_BS_7;
      delay(50000);
      /* Turn off LED7 */
      GPIOA->BSRR = GPIO_BSRR_BR_7;
      delay(100000);
   }

   return a+b;
}

Note

Makefile with asm & c code

gpio.elf: gpio.o clockGPIO.o config_GPIO.o lab.o
        arm-none-eabi-ld -Ttext 0x8000000 -Tdata=0x20000000 -o gpio.elf gpio.o config_GPIO.o clockGPIO.o lab.o

gpio.o: gpio.s
        arm-none-eabi-as -mthumb -g gpio.s -o gpio.o

clockGPIO.o: sub1.s
        arm-none-eabi-as -mthumb -g sub1.s -o clockGPIO.o

config_GPIO.o: sub2.s
        arm-none-eabi-as -mthumb -g sub2.s -o config_GPIO.o

lab.o: lab.c
        arm-none-eabi-gcc -mcpu=cortex-m4 -Wall -Wextra -Werror -g3  -O0 -fstack-usage --specs=nosys.specs -save-temps -fverbose-asm -Iinclude -c lab.c -o lab.o

Warning

h files

Make sure the following header files are available (the Makefile above looks for them in the include directory, via -Iinclude):
  • cmsis_compiler.h

  • cmsis_gcc.h

  • cmsis_version.h

  • core_cm4.h

  • mpu_armv7.h

  • stm32l476xx.h

  • system_stm32l4xx.h

Calling Convention

The AAPCS (Procedure Call Standard for the ARM Architecture) defines a calling convention shared by ARM assembly, C, C++, …

Functions may only modify the registers r0-r3 and r12. If more registers are needed, they have to be saved and restored using the stack.

The APSR may be modified too. The LR is used as shown for the return address.

When returning (via “bx lr”) the stack should be exactly in the same state as during the jump to the function (via “bl”).

The registers r0-r3 may be used to pass additional information to a function, called parameters, and the function may overwrite them. The register r0 may be used to pass a result value back to the caller, which is called the return value.

This means that when you call a function, you must assume registers r0-r3 and r12 may be overwritten but the others keep their values. In other words, the registers r0-r3 and r12 are (if at all) saved outside the function (“caller-save”), and the registers r4-r11 are (if at all) saved inside the function (“callee-save”).
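
Seen from C, these rules are applied by the compiler. The following sketch uses hypothetical function names (sum4, caller); the register assignments in the comments describe what an AAPCS-conforming ARM compiler is expected to do, while the code itself runs on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Under the AAPCS, a is passed in r0, b in r1, c in r2 and d in r3,
   and the result is returned in r0. (Hypothetical example function.) */
int32_t sum4(int32_t a, int32_t b, int32_t c, int32_t d)
{
    return a + b + c + d;
}

/* 'keep' must survive the call to sum4. Since r0-r3 and r12 may be
   overwritten by the callee, a compiler typically holds such a value in
   a callee-saved register (r4-r11) or on the stack across the call. */
int32_t caller(int32_t keep)
{
    int32_t s = sum4(1, 2, 3, 4);
    return keep + s;
}
```

An assembly caller would follow the same pattern: load the arguments into r0-r3, execute bl sum4, and read the result from r0.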

A function that does not call any other functions is called a “leaf-function” (as it is a leaf in the call tree). If such a function is simple, it may not need to touch the stack at all, as the return address is just kept in a register (LR) and the function may only overwrite the registers r0-r3 and r12, which the caller can make sure contain no important data. This makes small functions efficient, as register accesses are faster than memory accesses, such as those to the stack.

If all your functions follow the calling convention, you can call any function from anywhere and be sure about what it overwrites, even if it calls many other functions on its own. Restructuring the LED blinker could look like this:

Stack

A stack is a last-in-first-out (LIFO) data structure.

A stack has two fundamental operations: push and pop. The push operation adds an item to the top of the stack. The pop operation removes the item that was added last.

A stack also refers to a contiguous region in the data memory that software programs or processors use to hold a stack data structure. The stack pointer (SP) holds the memory address of the top of the stack. A program can use the stack to preserve and restore the runtime environment when it calls a subroutine.
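
The push and pop operations can be sketched in C as follows. This is a minimal host-side model, assuming a "full descending" stack as used on Cortex-M (SP points to the last pushed word and moves towards lower addresses on a push); the names stack_mem, stack_push, stack_pop and stack_depth are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

static uint32_t stack_mem[STACK_WORDS];           /* contiguous stack region */
static uint32_t *sp = stack_mem + STACK_WORDS;    /* empty stack: SP just past the highest word */

/* push: decrement SP, then store the value at the new top of the stack */
void stack_push(uint32_t value)
{
    assert(sp > stack_mem);                       /* stack overflow check */
    *--sp = value;
}

/* pop: read the value at the top of the stack, then increment SP */
uint32_t stack_pop(void)
{
    assert(sp < stack_mem + STACK_WORDS);         /* stack underflow check */
    return *sp++;
}

/* number of words currently on the stack */
unsigned stack_depth(void)
{
    return (unsigned)((stack_mem + STACK_WORDS) - sp);
}
```

Pushing 0x11 and then 0x22 and popping twice returns 0x22 first and 0x11 second, which is the last-in-first-out order described above.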