## Publications: Department of Electrical Engineering (Institutionen för systemteknik)

Computer Vision

Computer Engineering

Electronic Circuits and Systems

Vehicular Systems

Information Coding

Communication Systems

Automatic Control

## Latest doctoral theses

Numerical algorithms for efficiently solving optimal control problems are important for commonly used advanced control strategies, such as model predictive control (MPC), but can also be useful for advanced estimation techniques, such as moving horizon estimation (MHE). In MPC, the control input is computed by solving a constrained finite-time optimal control (CFTOC) problem on-line, and in MHE the estimated states are obtained by solving an optimization problem that often can be formulated as a CFTOC problem. Common types of optimization methods for solving CFTOC problems are interior-point (IP) methods, sequential quadratic programming (SQP) methods and active-set (AS) methods. In these types of methods, the main computational effort is often the computation of the second-order search directions. This boils down to solving a sequence of systems of equations that correspond to unconstrained finite-time optimal control (UFTOC) problems. Hence, high-performing second-order methods for CFTOC problems rely on efficient numerical algorithms for solving UFTOC problems. Developing such algorithms is one of the main focuses in this thesis. When the solution to a CFTOC problem is computed using an AS type method, the aforementioned system of equations is only changed by a low-rank modification between two AS iterations. In this thesis, it is shown how to exploit these structured modifications while still exploiting structure in the UFTOC problem using the Riccati recursion. Furthermore, direct (non-iterative) parallel algorithms for computing the search directions in IP, SQP and AS methods are proposed in the thesis. These algorithms exploit, and retain, the sparse structure of the UFTOC problem such that no dense system of equations needs to be solved serially as in many other algorithms. The proposed algorithms can be applied recursively to obtain logarithmic computational complexity growth in the prediction horizon length. 
For the case with linear MPC problems, an alternative approach to solving the CFTOC problem on-line is to use multiparametric quadratic programming (mp-QP), where the corresponding CFTOC problem can be solved explicitly off-line. This is referred to as explicit MPC. One of the main limitations with mp-QP is the amount of memory that is required to store the parametric solution. In this thesis, an algorithm for decreasing the required amount of memory is proposed. The aim is to make mp-QP and explicit MPC more useful in practical applications, such as embedded systems with limited memory resources. The proposed algorithm exploits the structure from the QP problem in the parametric solution in order to reduce the memory footprint of general mp-QP solutions, and in particular, of explicit MPC solutions. The algorithm can be used directly in mp-QP solvers, or as a post-processing step to an existing solution.

```
@phdthesis{diva2:1089521,
author = {Nielsen, Isak},
title = {{Structure-Exploiting Numerical Algorithms for Optimal Control}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1848}},
year = {2017},
address = {Sweden},
}
```
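The Riccati recursion mentioned in the abstract above can be illustrated on the smallest possible case. The following is a minimal sketch for a scalar UFTOC problem, assuming the cost `sum(q*x_t^2 + r*u_t^2) + q_final*x_N^2` and the dynamics `x_{t+1} = a*x_t + b*u_t`. All function and parameter names are hypothetical, and the sketch does not attempt the low-rank modifications or parallel algorithms proposed in the thesis.

```python
def riccati_recursion(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar UFTOC problem.

    Returns the cost-to-go coefficients P[0..N] and the feedback
    gains K[0..N-1], so that the optimal input is u_t = -K[t] * x_t.
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final
    for t in range(horizon - 1, -1, -1):
        K[t] = (a * b * P[t + 1]) / (r + b * b * P[t + 1])
        P[t] = q + a * a * P[t + 1] - a * b * P[t + 1] * K[t]
    return P, K


def simulate(a, b, K, x0):
    """Roll the system forward under the feedback u_t = -K[t] * x_t."""
    xs, us = [x0], []
    for gain in K:
        u = -gain * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us


def total_cost(xs, us, q, r, q_final):
    """Evaluate the quadratic cost along a state and input trajectory."""
    stage = sum(q * x * x + r * u * u for x, u in zip(xs[:-1], us))
    return stage + q_final * xs[-1] ** 2


if __name__ == "__main__":
    P, K = riccati_recursion(a=1.2, b=0.5, q=1.0, r=0.1, q_final=1.0, horizon=20)
    xs, us = simulate(1.2, 0.5, K, x0=3.0)
    # The closed-loop cost should agree with P[0] * x0^2.
    print(total_cost(xs, us, 1.0, 0.1, 1.0), P[0] * 3.0 ** 2)
```

The work grows linearly with the horizon length because each step only touches the quantities of the neighbouring time index. This is the kind of structure that, according to the abstract, the thesis retains and then parallelizes.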

Bayesian state estimation is a flexible framework to address relevant problems at the heart of existing and upcoming technologies. Application examples are obstacle tracking for driverless cars and indoor navigation using smartphone sensor data. Unfortunately, the mathematical solutions of the underlying theory cannot be translated to computer code in general. Therefore, this thesis discusses algorithms and approximations that are related to the Kalman filter (KF).

Four scientific articles and an introduction with the relevant background on Bayesian state estimation theory and algorithms are included. Two articles discuss nonlinear Kalman filters, which employ the KF measurement update in nonlinear models. The numerous variants are presented in a common framework and the employed moment approximations are analyzed. Furthermore, their application to target tracking problems is discussed. A third article analyzes the ensemble Kalman filter (EnKF), a Monte Carlo implementation of the KF that has been developed for high-dimensional geoscientific filtering problems. The EnKF is presented in a simple KF framework, including its challenges, important extensions, and relations to other filters. Whereas the aforementioned articles contribute to the understanding of existing algorithms, a fourth article devises novel filters and smoothers to address heavy-tailed noise. The development is based on Student’s *t* distribution and provides simple recursions in the spirit of the KF. The introduction and articles are accompanied by extensive simulation experiments.

```
@phdthesis{diva2:1077486,
author = {Roth, Michael},
title = {{Advanced Kalman Filtering Approaches to Bayesian State Estimation}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1832}},
year = {2017},
address = {Sweden},
}
```
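For readers unfamiliar with the KF measurement update that the abstract above refers to, here is a minimal scalar sketch. It assumes the linear model `x_next = a*x + process noise` and `y = c*x + measurement noise`. The function names are hypothetical, and the nonlinear, ensemble and Student's t variants studied in the thesis are not covered.

```python
def kf_predict(mean, var, a, process_var):
    """Time update: propagate the state estimate through x_next = a * x + w."""
    return a * mean, a * a * var + process_var


def kf_update(mean, var, y, c, meas_var):
    """Measurement update for y = c * x + e with noise variance meas_var."""
    innovation = y - c * mean            # measurement minus its prediction
    innovation_var = c * c * var + meas_var
    gain = var * c / innovation_var      # Kalman gain
    new_mean = mean + gain * innovation
    new_var = (1.0 - gain * c) * var
    return new_mean, new_var


if __name__ == "__main__":
    mean, var = 0.0, 1.0
    for y in [2.0, 1.8, 2.1]:
        mean, var = kf_update(mean, var, y, c=1.0, meas_var=1.0)
        mean, var = kf_predict(mean, var, a=1.0, process_var=0.1)
        print(mean, var)
```

In a nonlinear Kalman filter the same update form is kept, but the predicted measurement and the covariances are replaced by moment approximations, which is the part the abstract says the thesis analyzes.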

System identification is used in engineering sciences to build mathematical models from data. A common issue in system identification problems is that the true inputs to the system are not fully known. In this thesis, existing approaches to unknown input problems are classified and some of their properties are analyzed.

A new indirect framework is proposed to treat system identification problems with unknown inputs. The effects of the unknown inputs are assumed to be measured through possibly unknown dynamics. Furthermore, the measurements may also be dependent on other known or measured inputs and can in these cases be called indirect input measurements. Typically, these indirect input measurements can arise when a subsystem of a larger system is of interest and only a limited set of sensors is available. Two examples are when it is desired to estimate parts of a mechanical system or parts of a dynamic network without full knowledge of the signals in the system. The input measurements can be used to eliminate the unknown inputs from a mathematical model of the system through algebraic manipulations. The resulting indirect model structure only depends on known and measured signals and can be used to estimate the desired dynamics or properties. The effects of using the input measurements are analyzed in terms of identifiability, consistency and variance properties. It is shown that cancelation of shared dynamics can occur and that the resulting estimation problem is similar to errors-in-variables and closed-loop estimation problems because of the noisy inputs used in the model. In fact, the indirect framework unifies a number of already existing system identification problems that are contained as special cases.

For completeness, an instrumental variable method is proposed as one possibility for estimating the indirect model. It is shown that multiple datasets can be used to overcome certain identifiability issues and two approaches, the multi-stage and the joint identification approach, are suggested to utilize multiple datasets for estimation of models. Furthermore, the benefits of using the indirect model in filtering and for control synthesis are briefly discussed.

To show the applicability, the framework is applied to the roll dynamics of a ship for tracking of the loading conditions. The roll dynamics is very sensitive to changes in these conditions and a worst-case scenario is that the ship will capsize. It is assumed that only motion measurements from an inertial measurement unit (IMU) together with measurements of the rudder angle are available. The true inputs are thus not available, but the measurements from the IMU can be used to form an indirect model from a well-established ship model. It is shown that only a subset of the unknown parameters can be estimated simultaneously. Data was collected in experiments with a scale ship model in a basin and the joint identification approach was selected for this application due to the properties of the model. The approach was applied to the collected data and gave promising results.

```
@phdthesis{diva2:1068051,
author = {Linder, Jonas},
title = {{Indirect System Identification for Unknown Input Problems: With Applications to Ships}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1829}},
year = {2017},
address = {Sweden},
}
```

The market for wireless portable devices has grown significantly over recent years. Wireless devices with ever-increasing functionality require high-rate data transmission and reduced costs. High data rates are achieved through communication standards such as LTE and WLAN, which generate signals with a high peak-to-average power ratio (PAPR), hence requiring a power amplifier (PA) that can handle a signal with a large dynamic range. To keep costs low, modern CMOS processes allow the integration of the digital, analog and radio functions onto a single chip. However, the design of PAs with large dynamic range and high efficiency is challenging due to the low voltage headroom.

To prolong battery life, the PAs have to be power-efficient, as they consume a sizable percentage of the total power. For LTE and WLAN, traditional transmitters operate the PA at back-off power, below its peak efficiency, whereas pulse-width modulation (PWM) transmitters use the PA at its peak power, resulting in a higher efficiency. PWM transmitters can use both linear PAs and switch-mode PAs (SMPAs), where the latter are more power-efficient and easy to implement in nanometer CMOS. PWM transmitters have a higher efficiency but suffer from image and aliasing distortion, resulting in a lower dynamic range and lower amplitude and phase resolution.

This thesis studies several new transmitter architectures that improve the dynamic range and the amplitude and phase resolution of PWM transmitters with relaxed filtering requirements. The architectures are suited for fully integrated CMOS solutions, in particular for portable applications.

The first transmitter (MAF-PWMT) eliminates aliasing and image distortions while allowing the use of SMPAs by combining RF-PWM and band-limited PWM. The transmitter can be implemented using all-digital techniques and exhibits improved linearity and spectral performance. The approach is validated using a Class-D PA based transmitter, where an improvement of 10.2 dB in the dynamic range compared to a PWM transmitter is achieved for a 1.4 MHz LTE signal.

The second transmitter (AC-PWMT) compensates for aliasing distortion by combining PWM and outphasing. It can be used with switch-mode PAs (SMPAs) or linear PAs at peak power. The proposed transmitter shows better linearity, improved spectral performance and increased dynamic range, as it does not suffer from AM-AM distortion of the PAs or aliasing distortion due to digital PWM. The idea is validated using push-pull PAs, and the proposed transmitter shows an improvement of 9 dB in the dynamic range compared to a PWM transmitter using digital pulse-width modulation for a 1.4 MHz LTE signal.

The third transmitter (MD-PWMT) is an all-digital implementation of the second transmitter. The PWM is implemented using a Field Programmable Gate Array (FPGA) core, and outphasing is implemented as pulse-position modulation using FPGA transceivers, which drive two class-D PAs. The digital implementation offers the flexibility to adapt the transmitter for multi-standard and multi-band signals. From the measurement results, an improvement of 5 dB in the dynamic range is observed as compared to an all-digital PWM transmitter for a 1.4 MHz LTE signal.

The fourth transmitter (EP-PWMT) improves the phase linearity of an all-digital PWM transmitter using PWM and asymmetric outphasing. The transmitter uses PWM to encode the amplitude, and outphasing for enhanced phase control, thus doubling the phase resolution. The measurement setup uses Class-D PAs to amplify a 1.4 MHz LTE up-link signal. An improvement of 2.8 dB in the adjacent channel leakage ratio is observed, and the EVM is reduced by 3.3 % as compared to an all-digital PWM transmitter.

The fifth transmitter (CRF-ML-PWMT) combines multilevel and RF-PWM, whereas the sixth transmitter (CRF-MP-PWMT) combines multiphase PWM and RF-PWM. Both transmitters have a smaller chip area than conventional multiphase and multilevel PWM transmitters, as a combiner is not required. The proposed transmitters also show better dynamic range and improved amplitude resolution compared to conventional RF-PWM transmitters.

The solutions presented in this thesis aim to enhance the performance and simplify the digital implementation of PWM-based RF transmitters.

```
@phdthesis{diva2:1066473,
author = {Haque, Muhammad Fahim Ul},
title = {{Pulse-Width Modulated RF Transmitters}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1822}},
year = {2017},
address = {Sweden},
}
```
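The peak-to-average power ratio (PAPR) that motivates the abstract above is simple to compute from signal samples. The sketch below assumes the usual definition, peak instantaneous power divided by mean power, expressed in dB. The function name and the test signals are hypothetical and are not taken from the thesis.

```python
import cmath
import math


def papr_db(samples):
    """Peak-to-average power ratio in dB of real or complex samples."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


# Two illustrative signals, each covering exactly one period.
n = 1000
real_sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
complex_tone = [cmath.exp(2j * math.pi * k / n) for k in range(n)]

if __name__ == "__main__":
    print(papr_db(real_sine))     # about 3.01 dB
    print(papr_db(complex_tone))  # about 0 dB (constant envelope)
```

A constant-envelope signal has a PAPR of 0 dB and lets the PA run at peak power all the time. The multi-carrier signals of LTE and WLAN have a much higher PAPR, which is why a conventional transmitter spends most of its time at back-off power.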

Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems.

Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system.

This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.

```
@phdthesis{diva2:1049161,
author = {Wallenberg, Marcus},
title = {{Embodied Visual Object Recognition}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1811}},
year = {2017},
address = {Sweden},
}
```

In recent years, inertial sensors have undergone major developments. The quality of their measurements has improved while their cost has decreased, leading to an increase in availability. They can be found in stand-alone sensor units, so-called inertial measurement units, but are nowadays also present in for instance any modern smartphone, in Wii controllers and in virtual reality headsets.

The term inertial sensor refers to the combination of accelerometers and gyroscopes. These measure the external specific force and the angular velocity, respectively. Integration of their measurements provides information about the sensor's position and orientation. However, the position and orientation estimates obtained by simple integration suffer from drift and are therefore only accurate on a short time scale. In order to improve these estimates, we combine the inertial sensors with additional sensors and models. To combine these different sources of information, also called sensor fusion, we make use of probabilistic models to take the uncertainty of the different sources of information into account. The first contribution of this thesis is a tutorial paper that describes the signal processing foundations underlying position and orientation estimation using inertial sensors.

In a second contribution, we use data from multiple inertial sensors placed on the human body to estimate the body's pose. A biomechanical model encodes the knowledge about how the different body segments are connected to each other. We also show how the structure inherent to this problem can be exploited, which makes it possible to process long data sets and to solve the problem in a distributed manner.

Inertial sensors can also be combined with time of arrival measurements from an ultrawideband (UWB) system. We focus both on calibration of the UWB setup and on sensor fusion of the inertial and UWB measurements. The UWB measurements are modeled by a tailored heavy-tailed asymmetric distribution. This distribution naturally handles the possibility of measurement delays due to multipath and non-line-of-sight conditions while not allowing for the possibility of measurements arriving early, i.e. traveling faster than the speed of light.

Finally, inertial sensors can be combined with magnetometers. We derive an algorithm that can calibrate a magnetometer for the presence of metallic objects attached to the sensor. Furthermore, the presence of metallic objects in the environment can be exploited by using them as a source of position information. We present a method to build maps of the indoor magnetic field and experimentally show that if a map of the magnetic field is available, accurate position estimates can be obtained by combining inertial and magnetometer measurements.

```
@phdthesis{diva2:1054718,
author = {Kok, Manon},
title = {{Probabilistic modeling for sensor fusion with inertial measurements}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1814}},
year = {2016},
address = {Sweden},
}
```
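The drift that the abstract above attributes to simple integration can be made concrete with a one-dimensional sketch. It assumes a stationary sensor whose accelerometer reports a small constant bias, which is then integrated twice to obtain position. The function name, the bias value and the sample rate are hypothetical choices for illustration only.

```python
def dead_reckon_position(accelerations, dt):
    """Integrate acceleration samples twice and return the position history.

    Acceleration is treated as constant over each sample interval.
    """
    velocity, position = 0.0, 0.0
    positions = []
    for acc in accelerations:
        position += velocity * dt + 0.5 * acc * dt * dt
        velocity += acc * dt
        positions.append(position)
    return positions


if __name__ == "__main__":
    bias = 0.01   # m/s^2, assumed constant accelerometer bias
    dt = 0.01     # s, assumed 100 Hz sample rate
    track = dead_reckon_position([bias] * 6000, dt)
    # Position error after 10 s and after 60 s for a sensor that never moved.
    print(track[999], track[-1])
```

The position error grows with the square of time (0.5 * bias * t^2), so an estimate that is acceptable over a few seconds becomes useless over a minute. This is the reason the thesis combines inertial sensors with additional sensors and models.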

Wireless sensor networks (WSNs) are employed in many applications, such as for monitoring bio-potential signals and environmental information. These applications require high-resolution (> 12-bit) analog-to-digital converters (ADCs) at low sampling rates (several kS/s). Such sensor nodes are usually powered by batteries or energy-harvesting sources; hence, low power consumption is a primary concern for such ADCs. Normally, tens or hundreds of autonomously powered sensor nodes are utilized to capture and transmit data to the central processor. Hence it is profitable to fabricate the relevant electronics, such as the ADCs, in a low-cost standard complementary metal-oxide-semiconductor (CMOS) process. The two-stage pipelined successive approximation register (SAR) ADC has been shown to be an energy-efficient architecture for high resolution. This thesis further studies and explores the design limitations of the pipelined SAR ADC for high-resolution and low-speed applications.

The first work is a 15-bit, 1 kS/s two-stage pipelined SAR ADC that has been implemented in a 0.35-*μ*m CMOS process. The use of aggressive gain reduction in the residue amplifier combined with a suitable capacitive array digital-to-analog converter (DAC) topology in the second stage simplifies the design of the operational transconductance amplifier (OTA) while eliminating excessive capacitive load and the consequent power consumption. A comprehensive power consumption analysis of the entire ADC is performed to determine the number of bits in each stage of the pipeline. The choice of a segmented capacitive array DAC and an attenuation-capacitor-based DAC for the first and second stages, respectively, enables a significant reduction in power consumption and area. Fabricated in a low-cost 0.35-*μ*m CMOS process, the prototype ADC achieves a peak signal-to-noise-and-distortion ratio (SNDR) of 78.9 dB, corresponding to an effective number of bits (ENOB) of 12.8 bits, at a sampling frequency of 1 kS/s and provides a Schreier figure-of-merit (FoM) of 157.6 dB. Without any form of calibration, the ADC maintains an ENOB > 12.1 bits up to the Nyquist bandwidth of 500 Hz while consuming 6.7 *μ*W. The core area of the ADC is 0.679 mm².

The second work is a 14-bit, tunable-bandwidth two-stage pipelined SAR ADC which is suitable for low-power, cost-effective sensor readout circuits. To overcome the high open-loop DC gain requirement of the OTA in the gain-stage, a 3-stage capacitive charge pump (CCP) is utilized to realize the gain-stage instead of a switched-capacitor (SC) amplifier. Unity-gain OTAs are used as analog buffers to prevent charge sharing between the CCP stages. Detailed design considerations are given in this work. The prototype ADC, designed and fabricated in a low-cost 0.35-*μ*m CMOS process, achieves a peak SNDR of 75.6 dB at a sampling rate of 20 kS/s and 76.1 dB at 200 kS/s while consuming 7.68 *μ*W and 96 *μ*W, respectively. The corresponding Schreier FoMs are 166.7 dB and 166.3 dB. Since the bandwidth of the CCP is tunable, the ADC maintains an SNDR > 75 dB up to 260 kHz. The core area occupied by the ADC is 0.589 mm².

As low-power sensors might be active only for a very short time, triggered by an external pulse to acquire the data, the third work is a 14-bit asynchronous two-stage pipelined SAR ADC which has been designed and simulated in a 0.18-*μ*m CMOS process. A self-synchronous loop based on an edge detector is utilized to generate an internal clock with variable phase. A tunable delay element makes it possible to allocate the available time between the switched-capacitor DACs and the gain-stage. Three separate asynchronous clock generators are implemented to create the control signals for the two sub-ADCs and the gain-stage between them. To reduce the power consumption of the gain-stage, simple source followers are used as the analog buffers in the 3-stage CCP gain-stage. Post-layout simulation results show that the ADC achieves an SNDR of 83.5 dB while consuming 2.39 *μ*W at a sampling rate of 10 kS/s. The corresponding Schreier FoM is 176.7 dB.

```
@phdthesis{diva2:1056744,
author = {Chen, Kairang},
title = {{Energy-Efficient Data Converters for Low-Power Sensors}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1816}},
year = {2016},
address = {Sweden},
}
```
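The ENOB and Schreier FoM figures quoted in the abstract above can be cross-checked with the standard definitions, ENOB = (SNDR - 1.76) / 6.02 and FoM = SNDR + 10 log10(bandwidth / power). The sketch below assumes that the bandwidth equals half the sampling rate in every case. The abstract states this explicitly only for the 1 kS/s design, and the function names are hypothetical.

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def schreier_fom(sndr_db, bandwidth_hz, power_w):
    """Schreier figure-of-merit in dB."""
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)


# (SNDR in dB, sampling rate in S/s, power in W), taken from the abstract.
designs = [
    (78.9, 1e3, 6.7e-6),
    (75.6, 20e3, 7.68e-6),
    (76.1, 200e3, 96e-6),
    (83.5, 10e3, 2.39e-6),
]

if __name__ == "__main__":
    for sndr, fs, power in designs:
        bandwidth = fs / 2.0  # assumed Nyquist bandwidth
        print(round(enob(sndr), 1), round(schreier_fom(sndr, bandwidth, power), 1))
```

Under that assumption the computed values round to the FoMs reported in the abstract (157.6, 166.7, 166.3 and 176.7 dB) and to the 12.8-bit ENOB of the first design.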

Today's complex electronic systems with billions of transistors on a single die are enabled by the aggressive scaling down of the device feature size at an exponential rate, as predicted by Moore's law. Digital circuits benefit from technology scaling to become faster, more energy efficient as well as more area efficient as the feature size is scaled down. Moreover, digital design also benefits from mature CAD tools that simplify the design and cross-technology porting of complex systems, leveraging a cell-based design methodology. On the other hand, the design of analog circuits is getting increasingly difficult as the feature size scales down into the deep nanometer regime, for reasons such as shrinking voltage headroom, decreasing intrinsic gain of the devices, and increasing noise coupling between circuit nodes due to shorter distances. Furthermore, analog circuits are still largely designed with a full custom design flow that makes their design and porting tedious, slow, and expensive. In this context, it is attractive to consider realizing analog/mixed-signal circuits using standard digital components. This leads to scaling-friendly mixed-signal blocks that can be designed and ported using the existing CAD framework available for digital design. The concept is already being applied to mixed-signal components like frequency synthesizers, where all-digital architectures are synthesized using standard cells as basic components. This can be extended to other mixed-signal blocks like digital-to-analog and analog-to-digital converters as well, where the latter is of particular interest in this thesis.

A voltage-controlled oscillator (VCO)-based analog-to-digital converter (ADC) is an attractive architecture to achieve all-digital analog-to-digital conversion due to favorable properties like shaping of the quantization error, inherent anti-alias filtering etc. Here a VCO operates as a signal integrator as well as a quantizer. A converter employing a ring oscillator as the VCO lends itself to an all-digital implementation.

In this dissertation, we explore the design of VCO-based ADCs synthesized using digital standard cells with the long-term goal of achieving high performance data converters built from low accuracy switch components. In a first step, an ADC is designed using vendor supplied standard cells and fabricated in a 65 nm CMOS process. The converter delivers an 8-bit ENOB over a 25 MHz bandwidth while consuming 3.3 mW of power resulting in an energy efficiency of 235 fJ/step (Walden FoM). Then we utilize standard digital CAD tools to synthesize converter designs that are fully described using a hardware description language. A polynomial-based digital post-processing scheme is proposed to correct for the VCO nonlinearity. In addition, pulse modulation schemes like delta modulation and asynchronous sigma-delta modulation are used as a signal pre-coding scheme, in an attempt to reduce the impact of VCO nonlinearity on converter performance. In order to investigate the scaling benefits of all-digital data conversion, a VCO-based converter is designed in a 28 nm CMOS process. The design delivers a 13.4-bit ENOB over a 5 MHz bandwidth achieving an energy efficiency of 4.3 fJ/step according to post-synthesis schematic simulation, indicating that such converters have the potential of achieving good performance in deeply scaled processes by exploiting scaling benefits. Furthermore, large conversion errors caused by non-ideal sampling of the oscillator phase are studied. An encoding scheme employing ones counters is proposed to code the sampled ring oscillator output into a number, which is resilient to a class of sampling induced errors modeled by temporal reordering of the transitions in the ring. The proposed encoding reduces the largest error caused by random reordering of up to six subsequent bits in the sampled signal from 31 to 2 LSBs. 
Finally, the impact of process, voltage, and temperature (PVT) variations on the performance while operating the converter from a subthreshold supply is investigated. PVT-adaptive solutions are suggested as a means to achieve energy-efficient operation over a wide range of PVT conditions.

```
@phdthesis{diva2:1049563,
author = {Unnikrishnan, Vishnu},
title = {{Design of VCO-based ADCs}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1812}},
year = {2016},
address = {Sweden},
}
```

In the last ten years, limited clock frequency scaling and increasing power density have shifted IC design focus towards parallelism, heterogeneity and energy efficiency. Improving energy efficiency is by no means simple and it calls for a reevaluation of old design choices in processor architecture, and perhaps more importantly, development of new programming methodologies that exploit the features of modern architectures.

This thesis discusses the design of energy-efficient digital signal processors with application-specific instruction sets, so-called ASIP-DSPs, and their programming tools. Target applications for such processors include, but are not limited to, communications, multimedia, image processing, intelligent vision and radar. These applications are often implemented by a limited set of kernel algorithms, whose performance and efficiency are critical to the application's success. At the same time, the extreme non-recurring engineering cost of system-on-chip designs means that product life-time must be kept as long as possible. Neither general-purpose processors nor non-programmable ASICs can meet both the flexibility and efficiency requirements, and ASIPs may instead be the best trade-off between all the conflicting goals.

The traditional focus of superscalar and VLIW processor design has been to improve the throughput of fine-grained instructions, which results in high flexibility, but also high energy consumption. SIMD architectures, on the other hand, are often restricted by inefficient data access. The result is architectures which spend more energy and/or time on supporting operations than on actual computing.

This thesis defines the performance limit of an architecture with an N-way parallel datapath as consuming 2N elements of compute data per clock cycle. To approach this performance, this work proposes coarse-grained higher-order functional (HOF) instructions, which encode the most frequently executed compute-, data access- and control sequences into single many-cycle instructions, to reduce the overheads of instruction delivery, while at the same time maintaining orthogonality. The work further investigates opportunities for operation fusion to improve computing performance, and proposes a flexible memory subsystem for conflict-free parallel memory access with permutation and lookup-table-based addressing, to ensure that high computing throughput can be sustained even in the presence of irregular data access patterns. These concepts are extensively studied by implementing a large kernel algorithm library with typical DSP kernels, to prove their effectiveness and adequacy. Compared to contemporary VLIW DSP solutions, our solution can practically eliminate instruction fetching energy in many scenarios, significantly reduce control path switching, simplify the implementation of kernels and reduce code size, sometimes by as much as 30 times.

The techniques proposed in this thesis have been implemented in the DSP platform ePUMA (embedded Parallel DSP processor with Unique Memory Access), a configurable control-compute heterogeneous platform with distributed memory, optimized for low-power predictable DSP computing. Hardware evaluation has been done with FPGA prototypes. In addition, several VLSI layouts have been created for energy and area estimations. This includes smaller designs, as well as a large design with 73 cores, capable of 1280 integer GOPS or 256 GFLOPS at 500 MHz and which measures 45 mm² in 28 nm FD-SOI technology.

In addition to the hardware design, this thesis also discusses the parallel programming flow for distributed memory architectures and ePUMA application implementation. A DSP kernel programming language and its compiler are presented. This effectively demonstrates how kernels written in a high-level language can be translated into HOF instructions for very high processing efficiency.

```
@phdthesis{diva2:954326,
author = {Karlsson, Andr\'{e}as},
title = {{Design of Energy-Efficient High-Performance ASIP-DSP Platforms}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1772}},
year = {2016},
address = {Sweden},
}
```

Heavy duty powertrains are complex systems with components from various domains, different response times during transient operations and different efficient operating ranges. To ensure efficient transient operation of a powertrain, e.g. with low fuel consumption or short transient duration, it is important to come up with proper control strategies. In this dissertation, optimal control theory is used to calculate and analyze efficient heavy duty powertrain controls during transient operations in different applications. This is enabled by first developing control ready models, usable for multi-phase optimal control problem formulations, and then using numerical optimal control methods to calculate the optimal transients.

Optimal control analysis of a wheel loader operating in a repetitive loading cycle is the first studied application. Increasing fuel efficiency or reducing the operation time in such repetitive loading cycles sums up to large savings over longer periods of time. Load lifting and vehicle traction consume almost all of the power produced by a diesel engine during wheel loader operation. Physical models are developed for these subsystems where the dynamics are described by differential equations. The model parameters are tuned and fuel consumption estimation is validated against measured values from real wheel loader operation. The sensitivity of the wheel loader trajectory with respect to constraints, such as the angle at which the wheel loader reaches the unloading position, is also analyzed. A time and fuel optimal trajectory map is calculated for various unloading positions. Moreover, the importance of simultaneous optimization of wheel loader trajectory and the component transients is shown via a side-by-side comparison between measured fuel consumption and trajectories versus optimal control results.

In another application, optimal control is used to calculate efficient gear shift controls for a heavy duty Automatic Transmission system. A modeling and optimal control framework is developed for a nine speed automatic transmission. By solving optimal control problems using the developed model, time- and jerk-efficient transients for the simultaneous disengagement of off-going and engagement of in-coming shift actuators are obtained, and the results are analyzed.

Optimal controls of a diesel-electric powertrain during a gear shift in an Automated Manual Transmission system are calculated and analyzed in another application of optimal control. The powertrain model is extended by including driveline backlash angle as an extra state in the system. This is enabled by implementation of smoothing techniques in order to describe backlash dynamics as a single continuous function during all gear shift phases.

Optimal controls are also calculated for a diesel-electric powertrain corresponding to a hybrid bus during a tip-in maneuver. It is shown that for optimal control analysis of complex powertrain systems, minimizing only one property such as time pushes the system transients into extreme operating conditions far from what is achievable in real applications. Multi-objective optimal control problem formulations are suggested in order to obtain a compromise between various objectives when analyzing such complex powertrain systems.

```
@phdthesis{diva2:928666,
author = {Nezhadali, Vaheed},
title = {{Modeling and Optimal Control of Heavy-Duty Powertrains}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1771}},
year = {2016},
address = {Sweden},
}
```

Last updated: 2015-05-25