Recommended Tool Options for Optimization

1. Code Size Optimization

1.1 Generic

1.2 H8 Specific

1.3 SH Specific

1.4 M16CM32C Specific

1.5 RX Specific

1.6 RL78 Specific

1.7 V850 Specific

2. Speed Optimization

2.1 Generic

2.2 H8 Specific

2.3 SH Specific

2.4 M16CM32C Specific

2.5 RX Specific

2.6 RL78 Specific

2.7 V850 Specific

3. Workaround for some known problems with Optimization

3.1 Generic

3.2 H8 Specific

3.3 SH Specific

3.4 M16CM32C Specific

3.5 RX Specific

3.6 RL78 Specific

3.7 V850 Specific

4. Optimization that user can do in application code itself

4.1 Care to be taken in startup assembly file

4.2 General Tips

5. Summary

 

 

Revision History

 

Version No.   Date                Overview of Changes

1.0           31 March 2006       Draft Created

2.0           03 October 2006     Added section 1.4.1

3.0           08 June 2007        Modified for Optimized libraries section

4.0           30 May 2008         Modified for Library Generator addition (Section 1.1.1)

5.0           30 September 2008   Added the option '-mbitops' (Section 1.3.1)

6.0           30 September 2009   Added information about the new compiler option called "optimize".

7.0           13 July 2012        Targets RX, RL78 and V850 are added.

 

Overview:

This document guides beginners in choosing suitable optimization options for code size or execution speed. It also describes some known problems with optimization and their workarounds.

The schemes described below work best with the ELF toolchain.

 

Some of the options explained below may affect debugging.

 

 

1. Code Size Optimization


1.1 Generic

1.1.1 Using Library Generator

The library generator tool, 'libgen', builds the standard/optimized libraries with customized compiler and assembler options.

'Library Generator' on Command Line:

For information on using the 'libgen' tool and its supported options, please refer to the following link:
http://www.kpitgnutools.com/manuals/binutils.html#SEC15 

'Library Generator' in HEW:


1.1.2 Using Optimized libraries (“liboptc.a” and “liboptm.a”)

Optimized libraries are provided with all KPIT GNU ELF toolchains. Users can link against these libraries to get optimized code. Using them can reduce code size by up to 40% compared with the standard libraries.

Use on command line:

On the command line, the optimized libraries are used in the same way as the standard libraries. The user needs to specify the optimized libraries to be searched and included. These libraries are located in the same path as the standard libraries.

e.g. # h8300-elf-gcc t.c -mh -loptm -loptc

Use in HEW:

Users can choose to use the optimized libraries when creating a new project for RX/RL78/H8/SH/V850/M16CM32C targets. The option can also be selected after the project has been created: a check box for enabling the optimized libraries is provided on the “archive” property page of the linker property sheet.

Sources of optimized libraries are not available for download.

1.1.3 Using Command Line Options and attributes

The best overall level of code size optimization is obtained with the optimization option “-Os”.

Code size can be reduced further with the following options.

a.    Compiler Option “-fomit-frame-pointer”

This option saves the 2 or 4 bytes needed to save and restore the frame pointer register in the prologue and epilogue of a function.

b.    Compiler Options “-fdata-sections” and “-ffunction-sections”

These options force the compiler to create a separate section for each function and variable. Used together with the linker option “--gc-sections”, explained later, they give better code size results.

c.    Option “-fsort-data”

When the "-fsort-data" option is passed to the compiler while building application code, separate data sections are generated for global initialized variables and constant data based on their alignment. Data aligned to "n" (1/2/4) bytes goes into either the .data.align"n" or the .rodata.align"n" section. Arrays, structures and unions are treated in the same way. This option is enabled by "-Os". Because data with the same alignment is grouped together, no padding bytes are needed between objects within a section.

This reduces the RAM as well as the ROM required to store global initialized variables.

This option is ignored if the option "-fdata-sections" is passed to the compiler.

d.    Compiler Option “-fno-function-cse”

When this option is used, the compiler does not put function addresses in registers; each instruction that calls a constant function contains the function's address explicitly.

e.    Compiler Option “-funit-at-a-time”

This option makes the compiler parse the whole compilation unit before it starts to produce code, which allows additional optimizations. It also removes unreferenced static variables and functions.

However, this may result in undefined references when an ‘asm’ statement refers directly to variables or functions that are otherwise unused. In that case, either the variable/function must be listed as an operand of the ‘asm’ statement or, in the case of top-level ‘asm’ statements, the attribute ‘used’ must be applied to the declaration.
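The following is a minimal sketch of applying the ‘used’ attribute, not a KPIT-supplied example. The names tick_count, tick_handler and run_ticks are hypothetical. In a real project the handler would be referenced only from assembly; the ordinary C caller is an assumption added so that the sketch can be built and exercised with a host GCC.

```c
/* Hypothetical counter. In a real project it would be touched only from
   assembly code, so the compiler would see no C-level reference to it. */
static volatile int tick_count __attribute__((used)) = 0;

/* The 'used' attribute asks the compiler to emit this static function
   even if it finds no caller in the compilation unit. */
static void __attribute__((used)) tick_handler(void)
{
    tick_count++;
}

/* Ordinary C caller (an assumption, present only for host testing). */
int run_ticks(int n)
{
    int i;

    for (i = 0; i < n; i++)
        tick_handler();
    return tick_count;
}
```

Without the attribute, a static object that is referenced only from a top-level ‘asm’ statement may be discarded, which is the undefined-reference situation described above.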

f.     Compiler Option “-fpack-struct[=n]” or attributes “aligned” and “packed”

When specified without a value, this option packs all structure members together without holes. The mentioned attributes give similar control over individual types and variables. For more details, refer to the section “Specifying Attributes of Variables” in

http://www.kpitgnutools.com/manuals/gcc.html
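As an illustration of the “packed” attribute, the sketch below compares a structure with default layout against a packed one. The type and function names are hypothetical, and the size of the default layout depends on the target ABI, so only the packed size is fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after 'flag' so that
   'value' is aligned as the target ABI requires. */
struct sample_default {
    uint8_t  flag;
    uint32_t value;
};

/* Packed layout: members are stored back to back, without holes. */
struct sample_packed {
    uint8_t  flag;
    uint32_t value;
} __attribute__((packed));

/* Helper functions (hypothetical) that report the two sizes. */
size_t default_size(void) { return sizeof(struct sample_default); }
size_t packed_size(void)  { return sizeof(struct sample_packed); }
```

Packing saves data space, but on targets that cannot perform unaligned accesses directly, access to a packed member may need extra instructions, so the attribute is best applied selectively.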

 

g.    Compiler Option “-falign-jumps”

This option aligns branch targets to a power-of-two boundary, for branch targets that can only be reached by jumping, skipping up to n bytes like “-falign-functions”. In this case, no dummy operations need to be executed.

 

h.    Linker Option “--gc-sections”

This option removes unused and unreferenced functions and variables, thereby reducing overall code size. When link-time garbage collection is in use (--gc-sections), sections that must not be eliminated need to be marked, for example the “.vects” section containing the vector table. This is done by surrounding the input section's wildcard entry with KEEP() in the linker script, as in KEEP(*(.vects)).

i.     Linker Option “--strip-all”

This option omits all symbol information from the output file.

j.     Linker Option “--no-keep-memory”

“ld” normally optimizes for speed over memory usage by caching the symbol tables of input files in memory. This option tells ld to optimize for memory usage instead, by rereading the symbol tables as necessary. This may be required if ld runs out of memory while linking a large executable.

 

k.    Compiler attribute 'optimize'

The 'optimize' attribute allows the programmer to change the optimization level and particular optimization options for an individual function.

Ex:- int foo(int i) __attribute__((optimize("-O3")));
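A slightly fuller sketch of the same idea is given below, assuming a GCC version that supports the 'optimize' attribute. The functions sum_squares and clamp_percent are hypothetical: one is treated as speed-critical and the other as size-critical, regardless of the optimization level given on the command line.

```c
/* Hypothetical speed-critical routine: request -O3 for this function only. */
int sum_squares(const int *v, int n) __attribute__((optimize("O3")));

int sum_squares(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}

/* Hypothetical rarely used routine: request -Os for this function only. */
int clamp_percent(int x) __attribute__((optimize("Os")));

int clamp_percent(int x)
{
    if (x < 0)
        return 0;
    if (x > 100)
        return 100;
    return x;
}
```

The attribute changes only how each function is compiled; the results the functions return are the same at any optimization level.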

 

l.     Compiler and linker option ‘-flto’

This option runs the standard link-time optimizer. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.

 

To use the link-time optimizer, `-flto' needs to be specified at compile time and during the final link. For example:

gcc -c -O2 -flto foo.c

gcc -c -O2 -flto bar.c

gcc -o myprog -flto -O2 foo.o bar.o

 

“DON'Ts”:

To achieve further code size optimization, make sure that the following option is not used.

a.    Linker Option “--emit-relocs”

This option leaves relocation sections and contents in fully linked executables, resulting in a larger executable.

Therefore, this option should not be used when optimizing for code size. It is disabled by default.

 

1.2 H8 Specific

1.2.1  Using Command Line Options and attributes

The following target specific options can be used for H8 targets to achieve code size optimization:

a. Option “-mtinydata” (v0601 and onward)

When the "-mtinydata" target specific command line option is passed to the compiler while building the application code, the compiler creates the ".tinyrodata", ".tinydata" and ".tinybss" sections. All constant variables are placed in the ".tinyrodata" section, which is relocated into the lowermost 32 KB memory region of H8. Initialized global variables are placed in the ".tinydata" section and all uninitialized variables are placed in the ".tinybss" section, which is relocated into the uppermost 32 KB memory region of H8. Hence, all these variables can be accessed using 16-bit addressing, which reduces the size of the generated code.

The ".tinybss" implementation does not work as expected. It is a known problem.

b. Option “-mrelax”

This option shortens some address references at link time, when possible; it uses the linker option “-relax”. When relaxation is enabled, the linker can perform global optimization on the following instructions:

    “jmp” and “jsr”

    “mov.b” instructions that use the 16-bit absolute address form but refer to the top page of memory

    “mov.b/w/l” instructions that use the register indirect with 32-bit displacement address form but refer to the top page of memory

    bit manipulation instructions (band, bclr, biand, bild, bior, bist, bixor, bld, bnot, bor, bset, bst, btst, bxor) that use the 32-bit or 16-bit absolute address form but refer to the top page of memory

    “ldc.w” and “stc.w” instructions that use the 32-bit absolute address form but refer to the top page of memory

    “ldc.w” and “stc.w” instructions that use the register indirect with 32-bit displacement address form but refer to the top page of memory

Please refer to the following link for more details:

http://www.kpitgnutools.com/manuals/ld.html

c. Attribute "function_vector"

Any function can use this attribute along with a vector location address. The GNUH8 compiler uses the indirect memory addressing instruction "jsr @aa:8" for calls to such a function. The programmer has to write the address of the function (function pointer) at the vector address location. Whenever the function is called from another function, the program jumps to the vector address location, picks up the function pointer, and then jumps to that function. This reduces the speed of execution but reduces the code size by 2 bytes for every function call. If a function is called 100 times in the application, the user saves 200 bytes in total. Programmers can use the function_vector attribute for frequently called functions, together with unused IVT locations, to reduce code size.

    Example :

                void foo (void) __attribute__ ((function_vector(vector_address)));

                void foo (void)
                {}

                void bar (void)
                {
                  foo();
                }

 

1.3 SH Specific

1.3.1  Using Command Line Options and attributes

The following target specific options can be used for SH targets to achieve code size optimization:

a. Option “-mrelax”

This option shortens some address references at link time, when possible; it uses the linker option “-relax”. Please refer to the following link for more details:

http://www.kpitgnutools.com/manuals/ld.html

b. Option “-ffast-math”

This option, when used for SH4A targets, enables the generation of SH4A-specific instructions, resulting in code size reduction and speed improvement.

c. Option “-mbitops”

Bit instructions for SH2A targets are generated only when the command line option "-mbitops" is enabled.

1.4 M16CM32C Specific

1.4.1  Using Command Line Options and attributes

The following target specific options can be used for M16CM32C targets to achieve code size optimization:

a. Attribute "function_vector"

On M16C targets, the function_vector attribute declares a special page subroutine call function. Use of this attribute reduces the code size by 2 bytes for each call generated to the subroutine. The argument to the attribute is the vector number entry from the special page vector table, which contains the 16 low-order bits of the subroutine's entry address. Each vector table entry has a special page number (18 to 255) that is used in the JSRS instruction. The jump address of a routine is generated by adding 0F0000H (for M16C targets) or FF0000H (for M32C targets) to the 2-byte address set in the vector table. Therefore, you need to ensure that all the special page vector routines are mapped within the address range 0F0000H to 0FFFFFH (for M16C) or FF0000H to FFFFFFH (for M32C).

In the following example, 2 bytes are saved for each call to the function "foo".

 

void foo (void) __attribute__((function_vector(0x18)));
void foo (void)
{
}

void bar (void)
{
    foo();
}

If functions are defined in one file and are called in another file, then be sure to write this declaration in both files.

This attribute is ignored for R8C target.

 

1.5 RX Specific

1.5.1 Using Command Line Options and attributes

The following target specific options can be used for RX targets to achieve code size optimization:

a. -msmall-data-limit=N

Specifies the maximum size in bytes of global and static variables that can be placed in the small data area. Using the small data area can lead to smaller and faster code, but the size of the area is limited and it is up to the programmer to ensure that the area does not overflow. Also, when the small data area is used, one of the RX's registers (usually r13) is reserved for pointing to this area, so it is no longer available for use by the compiler. This could result in slower and/or larger code if variables that once could have been held in the reserved register are now pushed onto the stack.

b. -mrelax

Enables linker relaxation. Linker relaxation is a process whereby the linker attempts to reduce the size of a program by finding shorter versions of various instructions. Disabled by default.

 

1.6 RL78 Specific

1.6.1 Using Command Line Options and attributes

a. -mmul=none, -mmul=g13, -mmul=rl78

Specifies the type of hardware multiplication support to be used. The default is none, which uses software multiplication functions. The g13 option is for the hardware multiply/divide peripheral available only on RL78/G13 targets. The rl78 option is for the standard hardware multiplication defined in the RL78 software manual.

Selecting hardware multiplication reduces the number of instructions used for multiplication and so reduces code size.

 

1.7 V850 Specific

1.7.1 Using Command Line Options and attributes

The following target specific options can be used for V850 targets to achieve code size optimization:

a. -mprolog-function

Use external functions to save and restore registers at the prologue and epilogue of a function. The external functions are slower, but use less code space if more than one function saves the same number of registers. The -mprolog-function option is on by default if you optimize; -mno-prolog-function turns it off.

b. -mspace

Try to make the code as small as possible. At present, this just turns on the -mep and -mprolog-function options.

 

2. Speed Optimization


2.1 Generic

2.1.1 Using Command Line Options and attributes

The highest level of speed optimization is obtained with “-O3”. At “-O3”, GCC enables all the options from “-O2”, along with more advanced methods, such as inlining functions, renaming registers, and other scheduling improvements.

Speed can be improved further with the following options. This scheme works best for the Release version of the source code and only for the ELF toolchain.

a. Compiler Option “-funit-at-a-time”

This option makes the compiler parse the whole compilation unit before it starts to produce code, which allows some extra optimizations to take place. It also removes unreferenced static variables and functions.

However, this may result in undefined references when an ‘asm’ statement refers directly to variables or functions that are otherwise unused. In that case, either the variable/function must be listed as an operand of the ‘asm’ statement or, in the case of top-level ‘asm’ statements, the attribute ‘used’ must be applied to the declaration.

b. Compiler Option “-funroll-loops”

This option unrolls loops whose number of iterations can be determined at compile time or upon entry to the loop. This option implies both “-fstrength-reduce” and “-frerun-cse-after-loop”. It makes code larger, and may or may not make it run faster.

c. Compiler Option “-fno-gcse”

This option disables global common subexpression elimination, which may give better runtime performance.

d. Compiler Option “-finline-*”

Most of these options are enabled by default when the optimization switch “-O3” is used.

 

2.2 H8 Specific

2.2.1  Using Command Line Options and attributes

a. Option “-mtinydata” (v0601 and onward)

When the "-mtinydata" target specific command line option is passed to the compiler while building the application code, the compiler creates the ".tinyrodata", ".tinydata" and ".tinybss" sections. All constant variables are placed in the ".tinyrodata" section, which is relocated into the lowermost 32 KB memory region of H8. Initialized global variables are placed in the ".tinydata" section and all uninitialized variables are placed in the ".tinybss" section, which is relocated into the uppermost 32 KB memory region of H8. Hence, all these variables can be accessed using 16-bit addressing. This also reduces the size of the generated code.

The ".tinybss" implementation does not work as expected. It is a known problem.


 

2.3 SH Specific

2.3.1  Using Command Line Options and attributes

a. Option “-ffast-math”

This option, when used for SH4A targets, enables the generation of SH4A-specific instructions, resulting in code size reduction and speed improvement.

 

2.4 M16CM32C Specific

2.4.1 Using Command Line Options and attributes

No target specific options or attributes are available for M16C architecture targets to achieve speed optimization.

 

2.5 RX Specific

2.5.1 Using Command Line Options and attributes

There are no RX specific speed optimization related options.

 

2.6 RL78 Specific

2.6.1 Using Command Line Options and attributes

There are no RL78 specific speed optimization related options.

 

2.7 V850 Specific

2.7.1 Using Command Line Options and attributes

a. -mep

Optimize basic blocks that use the same index pointer 4 or more times by copying the pointer into the ep register and using the shorter sld and sst instructions. The -mep option is on by default if you optimize; -mno-ep turns it off.

3. Workaround for some known problems with Optimization


3.1 Generic

3.1.1  Null pointer deletion check

Example Snippet:

---------------------------------------------------------------------

                unsigned long foo (void)
                {
                    volatile unsigned long i = 4;

                    while (i)
                    {
                        i -= sizeof(unsigned long);
                        if (*(unsigned long *)i == 12)
                            return i;
                    }
                }

----------------------------------------------------------------------

Optimization options:         “O2” and above.      

Targets observed on:          ALL

Observation:            

The “while” loop in the above code is executed only once. The compiler does not check whether "i" is changed inside the while loop, and all other essential code is optimized away.

Workaround:

Pass the option “-fno-delete-null-pointer-checks” along with “-O2” to solve the above-mentioned problem.

3.1.2 “const” array optimization

Example Snippet:

---------------------------------------------------------------------------------------------------------

                #define STEPPER_1_DTC 1
                #define STEPPER_2_DTC 2
                #define STEPPER_3_DTC 3
                #define NO_TBL 0

                const unsigned short dtcVectors[] __attribute__ (( section(".dtc_vec") )) =
                {
                    STEPPER_1_DTC,
                    STEPPER_2_DTC,
                    STEPPER_3_DTC,
                    NO_TBL,
                    NO_TBL,
                };

                int main(void)
                {
                    return 0;
                }

---------------------------------------------------------------------------------------------------------

Optimization options:         “O1” and above.

Targets observed on:          ALL

Observation:            

If the above file is compiled with the C++ compiler and with optimization enabled, the compiler optimizes away the const array. Because the array is not directly referenced (as may be the case for a vector table), the compiler removes it as an unused variable.

Workaround:

The C compiler behaves differently and does not optimize away the const array. Hence, in the above-mentioned situation, the solution is to move the const array into a C file instead of a C++ file.

3.2 H8 Specific

3.2.1  Unnecessary loop unrolling

Example Snippet:

-----------------------------------------------------

                void theLoop( void )
                {
                    unsigned long cnt = 0L;

                    do
                        cnt++;
                    while (cnt < 1000000L);
                }

-----------------------------------------------------

Optimization options:         “O2” and above.

Targets observed on:          H8300H, H8300HN, H8S, H8SN

Observation:            

In the generated output, the loop counter is incremented by 50, 25, or some other arbitrary number instead of by one.

Workaround 1:

Pass either “-fno-strength-reduce” or “-fno-loop-optimize” along with “-O2” to solve the above-mentioned problem.

Workaround 2:

Include the following statement inside the empty while/for loop.

         asm("nop");
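Applied to the loop from the example snippet, Workaround 2 might look like the sketch below. The function name the_loop_with_nop and its return value are assumptions added so that the result can be checked; the original theLoop returns void. The sketch also assumes that the target assembler accepts a "nop" mnemonic, and uses the __asm__ spelling so that it also compiles in strict ISO C modes.

```c
/* Delay loop with a 'nop' in the body. Returns the final count so that
   the number of iterations can be verified (an assumption). */
unsigned long the_loop_with_nop(void)
{
    unsigned long cnt = 0UL;

    do {
        cnt++;
        /* A reachable volatile asm statement is not deleted by GCC, so
           the loop body is no longer empty from the optimizer's view. */
        __asm__ volatile ("nop");
    } while (cnt < 1000000UL);

    return cnt;
}
```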

3.3 SH Specific   

Currently there are no such cases available for SH targets.

3.4 M16CM32C Specific

Currently there are no such cases available for M16C architecture series targets.

3.5 RX Specific

Currently there are no such cases available for RX targets.

3.6 RL78 Specific

Currently there are no such cases available for RL78 targets.

3.7 V850 Specific

Currently there are no such cases available for V850 targets.

 

4. Optimization that user can do in application code itself


After applying the optimization capabilities offered by the compiler, you can take your application a step further by helping the compiler with well-written code.

Here are some tips on how to achieve this:

4.1 Care to be taken in startup assembly file

Remove the call to the library routine “_exit” from the startup assembly file, or make it conditional so that it is called only in debug mode. In release mode, use a branch instruction to implement an endless loop. The macros DEBUG and Release are defined in the HEW debug and release modes respectively, so the call to “_exit” can be restricted to debug mode. This saves around 1 KB for SH targets.

4.2 General Tips

a.    If a function exits by returning the value of another function called with the same parameters that were passed to it, put the parameters in the same order in both function prototypes. The compiler can then branch directly to the other function.

b.    Do not declare virtual functions inline. When declaring functions, use “const” for constant variables.

c.    Use full prototypes for all functions in “C” code.

d.    Use the built-in functions instead of coding your own. In a C program, functions are mapped to built-in functions if you include “math.h” and “string.h”.

e.    Whenever possible, avoid breaking the application source into too many small functions.

f.     Use virtual functions and virtual inheritance only when necessary, as these features are costly in object space and function invocation performance.

g.    When declaring a structure, put the members in descending order of size, i.e. declare the largest member first.

h.    It is also a good idea to place structure members near each other if they are mostly used together.

i.     Reduce the use of global variables, static variables and volatile whenever possible.

j.     Whenever possible, use constants instead of variables.

k.    Avoid taking the address of a variable. Taking the address of a variable inhibits optimizations that would otherwise be done on calculations involving that variable.

l.     Whenever possible, avoid forcing the compiler to convert numbers between two data types.

m.    Avoid the use of goto statements.
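Tip g can be illustrated with the sketch below, which declares the same four members in two different orders. The structure and function names are hypothetical, and the exact size of the mixed-order structure depends on the alignment rules of the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in arbitrary order: the compiler may need padding after 'a'
   and after 'c' to align the wider members that follow them. */
struct mixed_order {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint16_t d;
};

/* Same members, largest first: each member already falls on a suitable
   boundary, so no padding is needed between them. */
struct descending_order {
    uint32_t b;
    uint16_t d;
    uint8_t  a;
    uint8_t  c;
};

/* Helper functions (hypothetical) that report the two sizes. */
size_t mixed_size(void)      { return sizeof(struct mixed_order); }
size_t descending_size(void) { return sizeof(struct descending_order); }
```

On targets that align 32-bit members to 4 bytes, the mixed-order structure is typically larger than the descending-order one, even though both hold the same data.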

 

5. Summary


To summarize, the following sets of command line options can be used to achieve:

a. maximum code size optimization

--------------------------------------------------------------------------------------------------------

        -Os  -fno-function-cse  -funit-at-a-time  -falign-jumps

        -fdata-sections  -ffunction-sections  -Wl,--gc-sections

        AND    < target specific options >

--------------------------------------------------------------------------------------------------------

b. maximum code speed optimization

--------------------------------------------------------------------------------------------------------

        -O3  -fno-function-cse  -funit-at-a-time  -funroll-loops

        -fno-gcse

        AND    < target specific options >

--------------------------------------------------------------------------------------------------------

c. Code Size as well as Code Speed Optimization

As explained above, “-Os” can be used for code size optimization and “-O3” for code speed optimization.

However, a balance between the two can be achieved with the optimization option “-O2”.

This option keeps code size small while maintaining good code speed.

Therefore, “-O2” is recommended as a compromise between “-Os” and “-O3”.