Doctoral theses of the School of Science are available in the open access repository maintained by Aalto, Aaltodoc.
Public defence in Computer Science, M.Sc. (Tech.) Tuomas Kynkäänniemi
Public defence from the Aalto University School of Science, Department of Computer Science.

Title of the thesis: Advances in Evaluation Metrics and Sampling Techniques for Generative Image Models
Thesis defender: Tuomas Kynkäänniemi
Opponent: Professor Jun-Yan Zhu, Carnegie Mellon University, USA
Custos: Professor Jaakko Lehtinen, Aalto University School of Science
Generative modeling is a subfield of machine learning focused on developing models that learn the underlying structure of a training data distribution, enabling them to generate novel samples that are indistinguishable from the training data. Generative modeling covers multiple data modalities such as images, text, audio, and 3D shapes.
This thesis examines the evaluation and sampling techniques of data-driven image generators, a rapidly evolving research topic. With the growing number of models and applications, designing evaluation metrics is increasingly important for identifying improvements from specific modifications to model architectures or training setups. These metrics play a key role in advancing the field.
First, we provide an in-depth analysis of the widely used Fréchet Inception Distance (FID). By examining its sensitivity to ImageNet classes, we highlight the reasons behind discrepancies between FID-based model rankings and human judgments, and discuss the implications for generative model evaluation. We then propose an improved precision and recall metric that separately quantifies the fidelity and diversity of generated samples through explicit, non-parametric representations of data distributions. Used alongside existing metrics, it offers a comprehensive assessment of generated distributions.
In the context of diffusion models, this thesis investigates classifier-free guidance, a key factor in their success. We analyze the impact of guidance on the generated distribution when it is applied in various parts of the sampling process. We observe that guidance is beneficial only within a specific range of noise levels: it is harmful at high noise levels and unnecessary at low ones. Based on this insight, we propose a guidance interval that restricts guidance to that range. At the time of publication, our method achieved a state-of-the-art FID on ImageNet-512, as well as qualitative improvements across different network architectures, including the large-scale text-to-image model Stable Diffusion XL.
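The guidance interval idea can be illustrated with a minimal sketch. This is not the thesis implementation: the function name, the parameter names (`weight`, `sigma_lo`, `sigma_hi`), and the choice to fall back to the purely conditional prediction outside the interval are assumptions made here for illustration. The combination rule is the standard classifier-free guidance extrapolation between unconditional and conditional predictions.

```python
import numpy as np

def guided_prediction(cond, uncond, sigma, weight, sigma_lo, sigma_hi):
    """Combine conditional and unconditional denoiser outputs (hypothetical helper).

    Guidance is applied only when sigma_lo < sigma <= sigma_hi. Outside that
    range the weight falls back to 1.0, which reduces the result to the
    conditional output alone (an assumption of this sketch).
    """
    w = weight if sigma_lo < sigma <= sigma_hi else 1.0
    # Classifier-free guidance: extrapolate from the unconditional output
    # toward (and past) the conditional output by factor w.
    return uncond + w * (cond - uncond)

# Toy stand-ins for the two network outputs at one sampling step.
cond = np.array([1.0, 2.0])
uncond = np.array([0.0, 0.0])

inside = guided_prediction(cond, uncond, sigma=1.0, weight=2.0, sigma_lo=0.5, sigma_hi=5.0)
outside = guided_prediction(cond, uncond, sigma=20.0, weight=2.0, sigma_lo=0.5, sigma_hi=5.0)
```

With these toy values, `inside` is the extrapolated prediction and `outside` equals `cond`, since the noise level 20.0 lies above the assumed upper limit. In a real sampler the interval limits would be tuned per model.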
Keywords: generative models, evaluation metrics, generative adversarial networks, diffusion models, classifier-free guidance
Thesis available for public display 10 days prior to the defence at .
