Progress beyond the state of the art

The following is a summary of the state of the art in analysing the timing behaviour of embedded real-time systems and in reconciling performance with predictability. We also describe the baselines from which the project starts and define measures of progress.

Architecture

Baseline 1

The problem of WCET determination for single tasks and quite complex processors has been essentially solved. Commercial tools are available, all of them coming from Europe. Experience has shown that predictability critically hinges on some architectural features such as the cache replacement strategy. Recent analytical results show that LRU caches have the best predictability properties of all set associative cache architectures. However, they are costly to implement and therefore rarely used with high degrees of associativity. Caches whose implementation is less costly than that of LRU and whose predictability is higher than that of PLRU will be developed in the project.
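The predictability claim for LRU can be illustrated with a small simulation. The sketch below assumes a single cache set of associativity 4, modelled as a tuple ordered from most to least recently used; the names `lru_access` and `final_states` and the block labels are hypothetical and serve illustration only. It indicates that, once as many pairwise distinct blocks as there are ways have been accessed, the set contents no longer depend on the unknown initial state.

```python
from itertools import permutations

def lru_access(cache_set, block, ways):
    """Return the new LRU set (most recently used first) after accessing `block`."""
    rest = [b for b in cache_set if b != block]
    # On a miss the list grows to ways + 1 entries; truncation evicts the LRU block.
    return ([block] + rest)[:ways]

def final_states(initial_states, accesses, ways):
    """Apply the same access sequence to every candidate initial state."""
    results = set()
    for state in initial_states:
        current = list(state)
        for block in accesses:
            current = lru_access(current, block, ways)
        results.add(tuple(current))
    return results

WAYS = 4
# Hypothetical unknown initial contents: any ordering of any 4 of these blocks.
unknown = list(permutations(["p", "q", "r", "s", "a"], WAYS))

# Fewer than WAYS distinct accesses: several final states remain possible.
partial = final_states(unknown, ["a", "b"], WAYS)
# WAYS pairwise distinct accesses: exactly one final state remains.
full = final_states(unknown, ["a", "b", "c", "d"], WAYS)
```

Here `full` contains a single state, whereas `partial` still contains many. A static cache analysis benefits from exactly this property, because uncertainty about the cache contents is eliminated after a short, bounded access sequence.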

Criteria for success

New cache-replacement strategy with implementation costs lower than those of LRU and predictability better than that of PLRU
Special techniques to provide for a fast development of cache analyses for newly designed caches
Reduced time for the realization of cache analyses

Baseline 2

Higher degrees of predictability can be achieved by taking decisions statically rather than dynamically. Compiler-directed memory management using scratchpad memory, originally developed to decrease energy consumption, also increases time predictability. EPIC and VLIW architectures, for instance, use parallelism instead of speculation.

Criteria for success

Specification of models for different classes of memories within the reference execution platform
Exploitation of these models by compilers with respect to different classes of optimisation goals, e.g. predictability or energy consumption

Baseline 3

Multi-core and MPSoC targets offer a good performance-energy ratio. However, they need simple cores and predictable interconnects. Simple cores should exhibit neither timing anomalies nor domino effects. Timing analysis for multi-core platforms and MPSoCs has not been developed so far. The complexity of such a development greatly depends on the sharing layer in the memory hierarchy and on the application programming model used. Sharing of caches combined with loose synchronisation will lead to extremely high analysis efforts, whereas strong synchronisation will forfeit the performance benefits. A realistic architecture will be selected by an industrial end user, and a prototype timing-analysis tool will be realised for this architecture.

Criterion for success

Prototype timing-analysis tool realised.

Baseline 4

System interconnects present a significant challenge to predictability, in that they are shared among multiple communication actors (cores, IOs, accelerators, etc.). Time-triggered communication protocols have been proposed to enhance interconnect robustness and predictability. Real-time and contention-free networking concepts are also being exploited to provide strong predictability properties for scalable interconnects. Two key challenges have yet to be fully addressed:

reducing the performance, area and power overhead of predictable communication support
ensuring predictability across the interface between processor cores and the communication fabric

Operating system

To increase the efficiency of the system, resource allocation should not be driven by worst-case assumptions, and reclaiming mechanisms should be adopted in the kernel whenever resources are not fully utilised. On the other hand, sporadic peak-load situations (which become more frequent under a more optimistic design) should not put the system in a critical condition. Suitable overload and overrun management mechanisms must therefore be adopted in the operating system to prevent unpredictable performance degradation.

Methods to determine and reduce context-switch costs will be used for a fair comparison between preemptive and non-preemptive scheduling.

Criterion for success

Strong reduction of context-switch costs for sets of not heavily dependent tasks of moderate size.

Software

Model-based design (MBD) and code synthesis are frequently used in embedded systems development. The code synthesised from formal specifications is often very cleanly structured, which supports timing analysis and allows high precision to be achieved. Information contained in the specified model can be exploited to increase precision and to reduce the effort required for user annotation.

Criterion for success

High precision and reduced user-annotation effort.

Task level

Model-based approaches to the design of complex embedded systems have become popular. The objective is to determine formal models for all relevant elements such as application, communication and synchronisation, scheduling, hardware architecture, and the associated mapping. These models need to be consistent, allowing for an increased predictability of the overall system behaviour. Conservative design aims at avoiding non-functional dependencies between tasks. On the other hand, it is well known that the model-based approach imposes restrictions on the system specification that are not well accepted by designers.

The use of models for predictability in software synthesis and optimisation is uncommon. Compiler optimisations that exploit the memory hierarchy with a focus on WCET are frequently unaware of worst-case properties and of how these change during code optimisation. In the few published cases of WCET-aware compiler optimisations that also account for changes of the worst-case path, the authors report a significantly increased complexity of the proposed optimisations.

Criteria for success

WCET-aware compiler infrastructure realised
Reduced WCETs of compiler-generated code due to developed WCET-aware memory allocation strategies
A good trade-off between WCET and other optimisation objectives such as energy dissipation

Models of computation

A prominent example of a model of computation that increases the predictability is the time-triggered paradigm. The time-triggered paradigm can be seen as an extreme solution that combines well-known techniques such as the use of preemption points and cooperative multitasking in order to increase the predictability of interrupts and task switching. A recent example of a methodology based on a restricted model is Giotto. It allows for the consideration of harmonic periods, involves mode changes during the execution, and strictly separates computation and control. The approach is closely related to the synchronous paradigm and corresponding languages, for example Esterel.

Results are available concerning the predictability of restricted data flow models such as synchronous data flow (SDF) graphs. They are characterised by a deterministic partial ordering of tasks, which favours time predictability. There are recent approaches to extend the pure control-dominated and pure dataflow-oriented models described above towards a unified model that still allows for worst-case analysis. Examples in this direction are the recurring task model, the SPI model and the FunState model.
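The static nature of SDF graphs can be made concrete through their balance equations: for every edge, the number of tokens produced per iteration must equal the number consumed. The following sketch assumes a connected graph given as a list of hypothetical `(src, dst, produced, consumed)` tuples; the function name `repetition_vector` is illustrative, and `math.lcm` requires Python 3.9 or later. It computes how often each actor fires in one iteration of a periodic schedule, or reports that the rates are inconsistent.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """Solve the balance equations q[src] * produced == q[dst] * consumed.

    Returns the smallest positive integer solution, or None if the rates
    are inconsistent. Assumes a connected graph (an assumption of this sketch).
    """
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, dst, prod, cons = edge
            if src in rate and dst in rate:
                if rate[src] * prod != rate[dst] * cons:
                    return None  # inconsistent rates: no periodic schedule
            elif src in rate:
                rate[dst] = rate[src] * prod / cons
            elif dst in rate:
                rate[src] = rate[dst] * cons / prod
            else:
                continue  # neither endpoint known yet; retry in a later pass
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational firing rates to the smallest integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(r * scale) for actor, r in rate.items()}

# Hypothetical three-actor chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
chain = [("A", "B", 2, 3), ("B", "C", 1, 2)]
firings = repetition_vector(["A", "B", "C"], chain)
```

For this chain, A fires three times, B twice and C once per iteration. Because these counts are known before execution, a schedule and its buffer requirements can be fixed statically.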

Reducing interference

In order to predict communication delays, several protocols have been proposed that schedule the messages statically, e.g. TDMA (Time Division Multiple Access) and the time-triggered protocol with the corresponding TTA (Time-Triggered Architecture). For task scheduling, statically determined cyclic schedules are usually used. In the case of a distributed system, this mode of operation assumes that all participating nodes are perfectly synchronised. However, it is well known that this restriction to purely static operation comes with the disadvantages of higher cost, higher power consumption, reduced flexibility and lower utilisation factors. Therefore, advanced analysis techniques have been investigated to provide tight bounds on the timing behaviour even in the case of event-triggered or mixed models.
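The effect of static message scheduling on delay bounds can be sketched for a simple TDMA bus. The example below rests on assumptions that are not taken from the text: each node owns exactly one slot per cycle, a message fits into its sender's slot, and transmission may only start at the beginning of that slot. The names `slot_offsets` and `tdma_worst_case_delay` and the slot lengths are hypothetical.

```python
def slot_offsets(slot_lengths):
    """Return the start offset of each node's slot within one TDMA cycle."""
    offsets, start = {}, 0
    for node, length in slot_lengths.items():
        offsets[node] = start
        start += length
    return offsets

def tdma_worst_case_delay(slot_lengths, node, message_time):
    """Upper bound on the delay of one message sent by `node`.

    Assumes one slot per node per cycle and transmission starting only
    at the beginning of the slot (assumptions of this sketch).
    """
    if message_time > slot_lengths[node]:
        raise ValueError("message does not fit into the node's slot")
    cycle = sum(slot_lengths.values())
    # Worst case: the message becomes ready just after the slot has started,
    # waits up to one full cycle for the next slot start, then is transmitted.
    return cycle + message_time

# Hypothetical slot lengths in arbitrary time units.
slots = {"n1": 2, "n2": 3, "n3": 1}
offsets = slot_offsets(slots)
bound = tdma_worst_case_delay(slots, "n2", 2)
```

Under these assumptions the bound depends only on the slot table and not on the traffic of the other nodes. This independence is what makes static schedules predictable, at the price of the reduced utilisation noted above.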

Influencing the memory allocation allows a reduction of the interference of preemptible tasks on the caches and improved predictability of context-switching costs. This approach will be combined with Scuola Superiore Sant'Anna's concept of pre-emption groups to further improve predictability. Depending on the characteristics of the task sets, e.g. task sizes and dependencies, this approach is expected to yield low and precisely predictable context-switching costs on some task sets.

Approaches to predictable virtual-memory-based systems mostly consist in dedicating disjoint sets of resources to critical tasks, often requiring hardware support. Dedicating resources risks performance loss, and additional hardware support is very difficult to justify to the architecture community. Appropriate combinations of scratchpad memory and predictable virtual caches will reduce the variability of memory access times and thus make systems using virtual memory analysable.

Criteria for success

Design of virtual memory with good predictability
Reduced WCETs of compiler-generated code due to virtual memory-aware optimisations

Advanced analysis

Recent approaches to the timing analysis of distributed embedded systems handle messages and communication resources in the same way as tasks and computation resources. They all start from a restricted event model, e.g. periodic, sporadic or periodic with jitter. There have been analysis results where the interference between event-triggered and time-triggered computation and/or communication paradigms can be bounded. Recently, a unifying approach to performance analysis was proposed that is based on real-time calculus and a generalised event model that allows the modelling of hierarchical and heterogeneous scheduling and arbitration. The approach can be used to model both computation and communication resources, which can be considered a step towards a composable worst-case analysis of distributed embedded systems.
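As an illustration of how a restricted event model maps onto the generalised one, a periodic-with-jitter stream can be expressed as a pair of arrival curves in the usual real-time calculus formulation. The symbols below (period P, jitter J, lower service curve \beta^{l}, delay D_max and backlog B_max) are introduced here for illustration and are not defined elsewhere in this document.

```latex
% Arrival curves of a periodic-with-jitter event stream (period P, jitter J)
\alpha^{u}(\Delta) = \left\lceil \frac{\Delta + J}{P} \right\rceil, \qquad
\alpha^{l}(\Delta) = \max\!\left(0, \left\lfloor \frac{\Delta - J}{P} \right\rfloor\right), \qquad \Delta > 0

% Delay and backlog bounds on a resource with lower service curve \beta^{l},
% with \alpha^{u} expressed in units of resource demand
D_{\max} \le \sup_{\Delta \ge 0} \, \inf\left\{ \tau \ge 0 : \alpha^{u}(\Delta) \le \beta^{l}(\Delta + \tau) \right\}

B_{\max} \le \sup_{\Delta \ge 0} \, \left( \alpha^{u}(\Delta) - \beta^{l}(\Delta) \right)
```

The same two bounds apply whether the resource is a processor or a communication link, which is why tasks and messages can be analysed uniformly.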

Criterion for success

Prototype analysis tool available that allows the timing and memory analysis of the distribution of tasks and their interaction in distributed embedded systems, multicore platforms and MPSoCs.

Partitioned caches

Partitioned caches are used to improve timing predictability. However, as opposed to simple scratchpad memories, they require non-standard hardware mechanisms. Also, no tight integration of work on partitioned caches with worst-case execution time considerations is known.