LibBfd: The Binary File Descriptor library
This page aims to describe the internal implementation of the AVR32 ELF BFD backend. This is very much a work in progress, but some of the most important stuff is documented, so it may still be useful.
Overview
The AVR32 ELF backend consists mainly of two files:
- cpu-avr32.c: Very basic architecture description.
- elf32-avr32.c: All the ugly low-level hooks necessary to link AVR32 ELF objects. Also contains some code related to parsing of object files and core files.
There are a few other files that had to be modified in order to make libbfd aware of the AVR32 architecture. Most of it is pure boilerplate, with one notable exception:
- reloc.c: This file contains definitions of the AVR32 relocation types (see RelocationTypes).
Since ELF is the only supported object file format on AVR32, this documentation will focus on the ELF linking code, in particular how the low-level ELF hooks are implemented on AVR32. The AVR32 ELF backend interfaces with the generic ELF code in libbfd, which in turn interfaces with the generic libbfd code.
Interface to the BFD ELF core
The actual interface between the generic ELF code and the AVR32 ELF backend is located at the very end of elf32-avr32.c as a set of macro definitions. After all the necessary macros have been defined, the file elf32-target.h is included, hooking everything up to the internal ELF backend structure.
elf32-target.h is generated from elfxx-target.h by substituting "32" for all occurrences of "NN".
A short walkthrough of all the AVR32-specific backend definitions follows. Note that elf32-target.h provides reasonable defaults for all macros that haven't been defined by the backend.
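The override-with-defaults mechanism can be pictured with a small self-contained mock. In this sketch, the first group of macros stands in for the definitions at the end of elf32-avr32.c and the guarded group stands in for elf32-target.h. The enum, struct backend_data and the default values are simplified assumptions, not copies of the real BFD definitions.

```c
/* Hypothetical mock of the backend / target-header split; not real BFD code. */

/* --- "Backend" part, standing in for the end of elf32-avr32.c --- */
#define ELF_ARCH         bfd_arch_avr32
#define ELF_MACHINE_CODE 0x18ad   /* EM_AVR32 */
#define ELF_MAXPAGESIZE  0x1000   /* 4 KiB */

/* --- "Target header" part, standing in for elf32-target.h --- */
/* Each macro is guarded, so a definition made by the backend above wins. */
#ifndef ELF_MAXPAGESIZE
#define ELF_MAXPAGESIZE 1                  /* illustrative default */
#endif
#ifndef elf_backend_can_gc_sections
#define elf_backend_can_gc_sections 0      /* illustrative default */
#endif

/* Simplified stand-ins for the real BFD types. */
enum bfd_architecture { bfd_arch_unknown, bfd_arch_avr32 };

struct backend_data {
  enum bfd_architecture arch;
  int elf_machine_code;
  unsigned long maxpagesize;
  int can_gc_sections;
};

/* The header finally collects all macros into one backend structure. */
static const struct backend_data elf32_bed = {
  ELF_ARCH,
  ELF_MACHINE_CODE,
  ELF_MAXPAGESIZE,
  elf_backend_can_gc_sections
};
```

Because the mock backend never defines elf_backend_can_gc_sections, the guarded default of 0 is picked up, while ELF_MAXPAGESIZE keeps the backend's 4096. The real AVR32 backend does define elf_backend_can_gc_sections (to 1), so there the backend's value would win.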
- ELF_ARCH - Defined as bfd_arch_avr32. This has to correspond with the definitions used in archures.c and cpu-avr32.c.
- ELF_MACHINE_CODE - Magic number used to identify ELF files for AVR32. This is EM_AVR32, which is defined by include/elf/common.h to be 0x18ad.
- ELF_MAXPAGESIZE - Maximum page size supported by the target. Defined as 4096 bytes, which is not strictly true since the AVR32 supports page sizes up to 1 MiB. At the moment, however, only 4K pages are supported for userspace programs.
- TARGET_BIG_SYM - Big endian target vector. This has to correspond with the definitions in config.bfd and targets.c.
- TARGET_BIG_NAME - Name of the big endian target. This can be used in the linker script to select this particular target, although this isn't all that useful on AVR32 since only one target is defined.
- elf_backend_grok_prstatus - Hook for parsing process status information in core dump files.
- elf_backend_grok_psinfo - Hook for parsing the elf_prpsinfo structure in core dump files.
- elf_backend_may_use_rel_p - 0 because AVR32 does not use REL relocations (without addend.)
- elf_backend_may_use_rela_p - 1 because AVR32 does use RELA relocations (with addend.)
- elf_backend_default_use_rela_p - 1 because RELA is the default relocation type (duh.)
- elf_backend_rela_normal - Not quite sure what this is.
- elf_info_to_howto_rel - NULL because REL relocations aren't being used. I think this hook is only used on targets that use both REL and RELA.
- elf_info_to_howto - Look up a "relocation howto" structure based on an ELF relocation type. A "relocation howto" is a common internal representation of a relocation which all architectures are forced to use whether they like it or not.
- bfd_elf32_bfd_copy_private_bfd_data - Copy backend-specific data from one object module to another. This is necessary to preserve architecture-specific fields and flags when object files are copied.
- bfd_elf32_bfd_merge_private_bfd_data - Merge backend-specific data from one object module into another. This is used when several object files are combined into one output (i.e. linking), so this hook needs to consider both the flags in the input and the flags that have already been set in the output. When this has been called for all inputs, the resulting set of flags in the output should be compatible with all inputs. For AVR32, this specifically means that EF_AVR32_PIC should only be set if all input files had this flag set. The same applies to EF_AVR32_LINKRELAX.
- bfd_elf32_bfd_set_private_flags - Set the given target-specific flags in the given object file.
- bfd_elf32_bfd_print_private_bfd_data - Display the target-specific flags that are set in a given object file.
- bfd_elf32_new_section_hook - This hook is called whenever a new section is created. The AVR32 ELF backend uses it to allocate a private per-section data structure, struct avr32_section_data, which is used during relaxing.
- elf_backend_gc_mark_hook - Return the section referenced by a given relocation.
- elf_backend_gc_sweep_hook - Called when a section is being removed. Go through all relocations in this section and update the internal state to reflect the fact that these relocations are no longer there. This includes decrementing the reference count of all GOT entries referenced by these relocations.
- elf_backend_relocate_section - Relocate a section. This involves fixing up the raw binary data in the section to reference the symbol the relocation refers to directly. In dynamically linked executables and shared libraries, not all relocations can be resolved completely; this function will insert dynamic relocations (a small subset of the available relocations) for all such unresolved references so that they can be fixed up at runtime by the dynamic linker.
- elf_backend_copy_indirect_symbol - Called when an indirect symbol is replaced by a direct symbol. Make sure that the direct symbol reflects the state associated with the indirect symbol.
- elf_backend_create_dynamic_sections - Create any dynamic (i.e. linker-created) sections that may be needed during the link. This includes the .got (Global Offset Table), .stub (function stubs for lazy binding) and .rela.got (dynamic relocations for the .got section) sections as well as the _GLOBAL_OFFSET_TABLE_ symbol.
- bfd_elf32_bfd_link_hash_table_create - Create a link hash table. This allocates space for any AVR32-specific information in addition to the generic ELF link hash table.
- elf_backend_adjust_dynamic_symbol - Called for all symbols that can't be resolved directly, i.e. they're either not defined in the output binary at all or they are only defined weakly (and can thus be overridden by a shared library at runtime.)
- elf_backend_size_dynamic_sections - Determine the size of all linker-generated sections and allocate memory for them.
- elf_backend_finish_dynamic_symbol - Finalize a dynamic symbol. This involves initializing a GOT entry for it and emitting dynamic relocations as necessary.
- elf_backend_finish_dynamic_sections - Finalize all dynamic sections. This involves initializing the .dynamic section with various pieces of information for the dynamic linker.
- bfd_elf32_bfd_relax_section - Go through all relocations in a section and try to replace them with smaller ones. This is called multiple times for each section, until no sections can be optimized further.
- elf_backend_check_relocs - Initial pass over all sections to see if any special treatment is required. This can be dynamic section creation, dynamic relocations, GOT entries, etc. An initial reference count of all GOT entries is also made, and the highest refcount is recorded for use when allocating memory for the GOT sorting algorithm later on.
- elf_backend_can_refcount - Used to do strange voodoo magic in elf.c. Defined to 1 because we can indeed refcount the GOT entries.
- elf_backend_can_gc_sections - Defined to 1 because --gc-sections is supported.
- elf_backend_plt_readonly - Defined to 1 since the PLT (which is not really used, but would have contained code if it were) should be read-only.
- elf_backend_plt_not_loaded - Defined to 1.
- elf_backend_want_plt_sym - 0. We don't want a PLT symbol (sheesh.)
- elf_backend_plt_alignment - 2. The alignment is given as a power of two, so this means 4-byte alignment.
- elf_backend_want_dynbss - 0. We don't need a dynamic .bss section.
- elf_backend_want_got_plt - 0. We don't want a .got.plt section to go with the .plt, which we don't want either.
- elf_backend_want_got_sym - 1. Yes, we do want the _GLOBAL_OFFSET_TABLE_ symbol to be defined (although it seems like we have to define it ourselves anyway.)
- elf_backend_got_header_size - The number of bytes at the start of the GOT before the actual GOT entries begin. On AVR32, this is 8 (i.e. two "entries".)
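The flag-merging rule described for bfd_elf32_bfd_merge_private_bfd_data boils down to a bitwise AND over all inputs for the PIC and link-relax bits. This is a simplified model, not the backend's actual function: merge_avr32_flags and struct flag_state are hypothetical, and the numeric values of EF_AVR32_PIC and EF_AVR32_LINKRELAX are assumptions that should be checked against include/elf/avr32.h.

```c
#include <stdbool.h>

/* Assumed bit values; check include/elf/avr32.h for the real ones. */
#define EF_AVR32_LINKRELAX 0x01UL
#define EF_AVR32_PIC       0x02UL

/* Flags that survive in the output only if every input has them. */
#define AVR32_AND_FLAGS (EF_AVR32_PIC | EF_AVR32_LINKRELAX)

struct flag_state {
  bool initialized;     /* has any input been merged yet? */
  unsigned long flags;  /* flags of the output being built */
};

/* Hypothetical helper modelling the flag logic of the merge hook. */
static void merge_avr32_flags (struct flag_state *out, unsigned long in_flags)
{
  if (!out->initialized)
    {
      /* The first input simply provides the initial output flags. */
      out->flags = in_flags;
      out->initialized = true;
      return;
    }

  /* Keep PIC/LINKRELAX only if this input has them too; leave other bits. */
  out->flags &= ~AVR32_AND_FLAGS | (in_flags & AVR32_AND_FLAGS);
}
```

Other flag bits are simply carried over from the first input in this sketch; a real merge hook also has to decide what to do with them.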
Data Structures
struct got_entry
This structure represents one entry in the Global Offset Table and consists of the following members:
- next - Pointer to the next entry in the list of entries with the same refcount.
- pprev - Pointer to the previous entry's next pointer in the list of entries with the same refcount.
- refcount - Number of references to this GOT entry. This is initially calculated by avr32_check_relocs and subsequently updated by the relaxation code and the GC code as relocations are converted or removed completely.
- offset - This entry's current offset into the Global Offset Table. The offset will change during relaxation as entries are re-sorted according to their refcount, removed or reinstated.
Note that traditionally, each GOT entry is represented by a union of refcount and offset. Since the AVR32 linker does aggressive optimization of the GOT, it needs to keep more state around and therefore uses the third member of the union: the glist pointer. The BFD core tends to mess up this pointer from time to time by assigning either 0 or -1 to it, but this has been worked around by careful initialization of the init_{got,plt}_{refcount,offset} members in the link hash table.
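The next and pprev members form a list in which each entry points back at whatever pointer refers to it, so an entry can unlink itself in constant time without knowing which list head it hangs off. That is convenient when entries move between lists as their refcount changes. The member names below come from the description above; the member types, the bfd_signed_vma stand-in and the got_list_insert/got_list_remove helpers are assumptions for illustration.

```c
#include <stddef.h>

typedef long bfd_signed_vma;  /* simplified stand-in for the BFD typedef */

struct got_entry {
  struct got_entry *next;    /* next entry with the same refcount */
  struct got_entry **pprev;  /* address of the pointer that points to us */
  int refcount;
  bfd_signed_vma offset;
};

/* Hypothetical helper: push ENTRY onto the front of the list at *HEAD. */
static void got_list_insert (struct got_entry **head, struct got_entry *entry)
{
  entry->next = *head;
  if (entry->next != NULL)
    entry->next->pprev = &entry->next;
  entry->pprev = head;
  *head = entry;
}

/* Hypothetical helper: unlink ENTRY without knowing its list head. */
static void got_list_remove (struct got_entry *entry)
{
  *entry->pprev = entry->next;
  if (entry->next != NULL)
    entry->next->pprev = entry->pprev;
  entry->next = NULL;
  entry->pprev = NULL;
}
```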
All got_entry objects are kept in a pigeonhole array where they are moved around as their reference count changes. When the .got section is relaxed, new offsets are assigned based on their position in the array: the entries with the most references are assigned the lowest offsets. This is because large offsets may cause large instructions, or even multiple instructions, to be emitted when referencing them, so this strategy keeps as many instructions as possible small.
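Offset assignment from the pigeonhole array can be sketched as a walk from the highest refcount down. Assumptions in this sketch: the array is indexed by refcount (holes[rc] lists the entries with exactly rc references), entries with a refcount of 0 get no slot, each entry is one 4-byte word, and the first 8 bytes are reserved as described for elf_backend_got_header_size. assign_got_offsets is a hypothetical name.

```c
#include <stddef.h>

#define GOT_HEADER_SIZE 8  /* elf_backend_got_header_size on AVR32 */
#define GOT_ENTRY_SIZE  4  /* assumed: one 32-bit word per entry */

struct got_entry {
  struct got_entry *next;  /* next entry with the same refcount */
  int refcount;
  long offset;
};

/* Give the most-referenced entries the lowest offsets.  holes[rc] lists
   the entries with refcount rc, for 0 <= rc < nr_holes.  Returns the
   resulting size of the GOT in bytes. */
static long assign_got_offsets (struct got_entry **holes, int nr_holes)
{
  long offset = GOT_HEADER_SIZE;

  /* Stop before index 0: unreferenced entries do not get a slot. */
  for (int rc = nr_holes - 1; rc > 0; rc--)
    for (struct got_entry *e = holes[rc]; e != NULL; e = e->next)
      {
        e->offset = offset;
        offset += GOT_ENTRY_SIZE;
      }

  return offset;
}
```

Whenever an entry's refcount changes during relaxation, it would be moved to a different pigeonhole and the offsets recomputed on the next pass.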
struct elf_avr32_link_hash_entry
This structure represents a global symbol in the link hash table. Global symbols, as opposed to local ones, are referenced by name and must therefore be kept in a hash table for easy lookup as the linker comes across references to them. Global symbols also require extra care: they may be defined in a shared library, they may be overridden, and they may require dynamic relocations to be emitted for references to them.
This structure consists of the following members:
- root - The generic ELF link hash entry that we are extending.
- possibly_dynamic_relocs - Number of non-GOT references to this symbol, as determined by the avr32_check_relocs hook. How many of these references will actually require dynamic relocations to be emitted is not known until later since it depends on whether or not the symbol can be fully resolved.
- no_fn_stub - If non-zero, something other than a plain function call was found referring to this symbol. This means that lazy binding cannot be used.
- readonly_reloc_sec - If non-NULL, a reference to this symbol which may require a dynamic relocation was found in a read-only section, and this member points to the section where the first such reference was found. If it later turns out that this symbol is indeed dynamic, the link fails, and this member, together with the next one, is used to print an informative error message.
- readonly_reloc_offset - Offset into readonly_reloc_sec where the first possibly-illegal (see above) reference to the symbol was found. Used to make the error message more informative.
- sym_frag - The fragment (if any) where this symbol is located. This is used to keep track of any movements of this symbol during relaxing. When the final size of all relocations has been determined, the symbol value is updated according to how much the associated fragment moved.
struct elf_avr32_link_hash_table
This structure contains the link hash table of global symbols as well as pretty much anything that is considered global to the link as a whole. It consists of the following members:
- root - The generic ELF link hash table that we are extending.
- sgot - Shortcut pointer to the section containing the GOT.
- srelgot - Shortcut pointer to the section containing dynamic relocations for the GOT section.
- sstub - Shortcut pointer to the section containing function stubs for lazy binding.
- got_hole - Pointer to the pigeonhole array used to sort the GOT. The storage for this array is allocated by avr32_elf_size_dynamic_sections.
- nr_got_holes - The number of entries in the got_hole array.
- local_dynamic_relocs - Number of possibly-dynamic relocations to local symbols. When linking normal executables, or shared libraries with -Bsymbolic, references to local symbols are never dynamic, so this member is only used when linking a shared library without -Bsymbolic.
- relocations_analyzed - Before any relaxing can be done, the linker must make a pass over all relocations to analyze them and divide the sections into fragments. If this member is set, this pass has been done.
- symbols_adjusted - If set, all symbols have had their value adjusted according to frag movement. This is done all in one go after the size of all relocations is known but before the section contents are actually moved.
- repeat_pass - If set, at least one relocation or the GOT changed size, so another iteration needs to be done before moving on to the next stage.
- relax_iteration - How many complete relax iterations over all sections have been done so far.
- relax_pass - Current relax pass (or rather, relax stage.) The relax code first does one or more iterations of the "size" pass and then does one "move" pass to update symbols, relocations and section contents.
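The interplay of repeat_pass, relax_iteration and relax_pass can be sketched as a loop: keep running "size" iterations until nothing changes size, then do a single "move" pass. In reality the generic linker drives relaxation by calling bfd_elf32_bfd_relax_section for each section; the driver function relax_all, size_iteration and the pending_shrinks counter below are hypothetical stand-ins for that machinery.

```c
#include <stdbool.h>

enum relax_pass { RELAX_PASS_SIZE, RELAX_PASS_MOVE };

struct relax_sketch {
  enum relax_pass relax_pass;
  int relax_iteration;
  bool repeat_pass;
  int pending_shrinks;  /* hypothetical: how often something will still shrink */
};

/* Hypothetical stand-in for one "size" iteration over all sections. */
static void size_iteration (struct relax_sketch *t)
{
  if (t->pending_shrinks > 0)
    {
      t->pending_shrinks--;
      t->repeat_pass = true;  /* a relocation or the GOT changed size */
    }
}

static void relax_all (struct relax_sketch *t)
{
  /* Stage 1: iterate the size pass until nothing changes any more. */
  t->relax_pass = RELAX_PASS_SIZE;
  do
    {
      t->repeat_pass = false;
      size_iteration (t);
      t->relax_iteration++;
    }
  while (t->repeat_pass);

  /* Stage 2: sizes are final, so one move pass updates symbols,
     relocations and section contents. */
  t->relax_pass = RELAX_PASS_MOVE;
}
```

Note that the last size iteration always changes nothing; that iteration is what proves the sizes are stable.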
struct relax_state
During relaxing, each relaxable relocation moves between different relax states. This structure describes one such state: the opcode and relocation type it corresponds to, the range of values that can be represented by the immediate, how large the result is and which state changes are allowed.
Each relax state is classified according to its reference type, which can be one of the following:
- REF_ABSOLUTE - Absolute reference. The symbol value is encoded directly into the immediate. The mov instruction is an example of an instruction taking an absolute reference.
- REF_PCREL - PC-relative reference. The distance from the reference to the symbol is encoded into the immediate. The sub and rcall instructions are examples of instructions taking relative references.
- REF_CPOOL - Constant pool indirect reference. The symbol value is entered into the constant pool and loaded into the target register using an lddpc or ld.w instruction, or called using the mcall instruction. Such references are not position-independent, so they usually cannot be used in shared libraries.
- REF_GOT - GOT indirect reference. The symbol value is entered into the Global Offset Table and loaded into the target register using an ld.w instruction, or called using the mcall instruction. This must be used for references to run-time external symbols. When linking shared libraries without -Bsymbolic, all references to global symbols must use REF_GOT.
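Choosing a relax state can be sketched as picking the smallest state whose immediate range can hold the value. The reference types are taken from the list above; struct relax_state_sketch, pick_state and the ranges and sizes in the table are hypothetical and not taken from the AVR32 instruction encodings. Alignment constraints and the allowed state transitions, which the real structure also records, are left out.

```c
#include <stddef.h>

enum reference_type { REF_ABSOLUTE, REF_PCREL, REF_CPOOL, REF_GOT };

struct relax_state_sketch {
  const char *name;
  enum reference_type reftype;
  long range_min, range_max;  /* values the immediate can hold */
  int size;                   /* instruction size in bytes */
};

/* Hypothetical PC-relative states, smallest first; ranges and sizes are
   illustrative only. */
static const struct relax_state_sketch pcrel_states[] = {
  { "short", REF_PCREL, -1024L,    1022L,    2 },
  { "long",  REF_PCREL, -2097152L, 2097150L, 4 },
};

/* Return the smallest state whose immediate can encode VALUE, or NULL if
   none can (a linker would then need a different reference type). */
static const struct relax_state_sketch *
pick_state (const struct relax_state_sketch *states, size_t n, long value)
{
  for (size_t i = 0; i < n; i++)
    if (value >= states[i].range_min && value <= states[i].range_max)
      return &states[i];
  return NULL;
}
```

For REF_PCREL the value itself depends on the sizes of everything between the reference and the symbol, so a choice made in one iteration can be invalidated when another relocation shrinks. This is why the "size" stage repeats until nothing changes.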