Loading Initial Data with Spring Boot

Learning Objectives

Understand why applications need initial data and when to seed it
Implement CommandLineRunner to execute startup logic in Spring Boot
Design and structure seed data classes for maintainability
Work with JPA repositories to persist initial data during application startup
Handle data initialization strategies for different environments

Introduction

Every non-trivial application reaches a point where you need some data already in the database when it starts. Maybe it's user roles, category hierarchies, configuration settings, or reference data that your application logic depends on. You can't just ship an empty database and hope for the best—certain entities need to exist before the first real user interaction happens.

Spring Boot gives you several approaches to handle this, and one of the most flexible and maintainable is CommandLineRunner. This interface lets you execute arbitrary code after the Spring application context has been refreshed and before SpringApplication.run() returns. In a web application the embedded server is typically already listening at that point, so treat a runner as startup logic rather than as a guarantee that no request can arrive before seeding finishes. You get full access to your repositories, services, and the rest of Spring's dependency injection. The alternative approaches, such as data.sql scripts or Hibernate's import.sql, work for simple cases, but they lack the expressiveness and type safety of Java code. When your seed data involves relationships, conditional logic, or needs to check what already exists, CommandLineRunner is usually the better fit.

The pattern we're building separates concerns cleanly: your domain models stay focused on representing entities, repositories handle persistence, and a dedicated seed data class manages initialization. This separation matters when your application grows and you need to modify seed logic without touching production code.

Setting Up the Project Structure

Start by generating a Spring Boot project with the necessary dependencies. The curl command below creates a Maven project with Spring Web, Spring Data JPA, and the H2 database. H2 works well for development and testing: in the configuration used here it runs as an in-memory database that resets every time you restart, which makes it ideal for experimenting with seed data strategies.

curl https://start.spring.io/starter.zip \
  -d type=maven-project \
  -d language=java \
  -d bootVersion=3.2.1 \
  -d baseDir=academy.javapro.spring \
  -d groupId=academy.javapro \
  -d artifactId=spring \
  -d name=academy.javapro.spring \
  -d packageName=academy.javapro.spring \
  -d packaging=jar \
  -d javaVersion=17 \
  -d dependencies=web,data-jpa,h2 \
  -o academy.javapro.spring.zip

unzip academy.javapro.spring.zip
cd academy.javapro.spring

Your project structure should look like this:

academy.javapro.spring/
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── academy/
│   │   │       └── javapro/
│   │   │           └── spring/
│   │   │               ├── AcademyJavaproSpringApplication.java
│   │   │               ├── model/
│   │   │               │   └── Category.java
│   │   │               ├── repository/
│   │   │               │   └── CategoryRepository.java
│   │   │               └── seed/
│   │   │                   └── SeedData.java
│   │   └── resources/
│   │       ├── application.properties
│   │       └── application-dev.properties
│   └── test/
├── pom.xml
└── README.md

Configuring the Application Properties

Spring Boot's profile mechanism lets you maintain different configurations for different environments. Create two property files: one for general settings and one specifically for development.

First, set up src/main/resources/application.properties:

spring.application.name=academy.javapro.spring
spring.profiles.active=dev

# JPA Configuration
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
spring.jpa.hibernate.ddl-auto=create-drop

# Logging
logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.orm.jdbc.bind=TRACE

The spring.profiles.active=dev property activates your development profile by default. The ddl-auto=create-drop setting tells Hibernate to create the schema on startup and drop it on shutdown—perfect for development where you want a clean slate every time. In production, you'd use validate or manage migrations with Flyway or Liquibase.

Now create src/main/resources/application-dev.properties:

# H2 Database Configuration
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=

# H2 Console
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
spring.h2.console.settings.web-allow-others=false

# Development-specific settings
server.port=8080

The H2 console gives you a web interface to inspect your database while developing. Access it at http://localhost:8080/h2-console after starting the application. The JDBC URL jdbc:h2:mem:testdb creates an in-memory database named testdb. When your application shuts down, all data disappears. That's exactly what you want during development—your seed data recreates everything cleanly on the next startup.

Creating the Domain Model

Create the model package and add your Category entity. This represents a common real-world scenario where categories serve as reference data that must exist before users start creating content.

package academy.javapro.spring.model;

import jakarta.persistence.*;

@Entity
@Table(name = "categories")
public class Category {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(nullable = false, unique = true, length = 100)
    private String name;
    
    @Column(length = 500)
    private String description;
    
    @Column(nullable = false)
    private boolean active;
    
    public Category() {
    }
    
    public Category(String name, String description) {
        this.name = name;
        this.description = description;
        this.active = true;
    }
    
    public Long getId() {
        return id;
    }
    
    public void setId(Long id) {
        this.id = id;
    }
    
    public String getName() {
        return name;
    }
    
    public void setName(String name) {
        this.name = name;
    }
    
    public String getDescription() {
        return description;
    }
    
    public void setDescription(String description) {
        this.description = description;
    }
    
    public boolean isActive() {
        return active;
    }
    
    public void setActive(boolean active) {
        this.active = active;
    }
    
    @Override
    public String toString() {
        return "Category{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", description='" + description + '\'' +
                ", active=" + active +
                '}';
    }
}

The @Entity annotation marks this as a JPA entity, and @Table specifies the database table name. Using @GeneratedValue with the IDENTITY strategy lets the database handle ID generation. The unique = true constraint on name prevents duplicate categories, which is exactly what you want for reference data. The two-argument constructor sets active to true, which shows how seed data often carries business defaults and not just simple string values. Note that the no-argument constructor, which JPA requires, leaves active at its Java default of false.

Building the Repository Layer

Create the repository package and add the repository interface:

package academy.javapro.spring.repository;

import academy.javapro.spring.model.Category;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

import java.util.Optional;

@Repository
public interface CategoryRepository extends JpaRepository<Category, Long> {
    Optional<Category> findByName(String name);
    boolean existsByName(String name);
}

Spring Data JPA generates the implementation automatically based on method naming conventions. The custom findByName method lets you retrieve categories by their unique name, and existsByName provides a clean way to check if a category already exists without loading the entire entity. These methods become crucial in your seed logic when you need idempotent initialization—running the seeder multiple times shouldn't create duplicate data.

Implementing the Seed Data Class

Create the seed package and add your SeedData class. This is where the actual initialization happens. The class implements CommandLineRunner, which Spring Boot executes after the application context fully loads.

package academy.javapro.spring.seed;

import academy.javapro.spring.model.Category;
import academy.javapro.spring.repository.CategoryRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class SeedData implements CommandLineRunner {
    
    private static final Logger logger = LoggerFactory.getLogger(SeedData.class);
    
    private final CategoryRepository categoryRepository;
    
    public SeedData(CategoryRepository categoryRepository) {
        this.categoryRepository = categoryRepository;
    }
    
    @Override
    public void run(String... args) throws Exception {
        logger.info("Starting database seeding...");
        
        seedCategories();
        
        logger.info("Database seeding completed successfully");
    }
    
    private void seedCategories() {
        if (categoryRepository.count() > 0) {
            logger.info("Categories already exist, skipping seed data");
            return;
        }
        
        Category technology = new Category(
            "Technology",
            "Articles about software development, programming languages, and tech trends"
        );
        
        Category science = new Category(
            "Science",
            "Research, discoveries, and scientific breakthroughs"
        );
        
        Category business = new Category(
            "Business",
            "Entrepreneurship, market analysis, and business strategies"
        );
        
        Category health = new Category(
            "Health",
            "Wellness, medical research, and fitness guidance"
        );
        
        Category education = new Category(
            "Education",
            "Learning resources, teaching methods, and academic content"
        );
        
        categoryRepository.save(technology);
        categoryRepository.save(science);
        categoryRepository.save(business);
        categoryRepository.save(health);
        categoryRepository.save(education);
        
        logger.info("Seeded {} categories", categoryRepository.count());
    }
}

The @Component annotation registers this class as a Spring bean, and because it implements CommandLineRunner, Spring Boot automatically executes the run method during startup. Constructor injection brings in the repository, following Spring's recommended dependency injection pattern.

The seedCategories method checks if categories already exist before inserting new ones. This idempotency matters in scenarios where you might restart your application multiple times during development or when deploying to environments where the database persists between restarts. The logger provides visibility into what's happening during startup—essential for debugging seed issues in production-like environments.

Notice how we're creating Category objects with meaningful data. These aren't just placeholder values. They represent actual reference data your application needs. Each category has a descriptive name and a clear purpose statement. That's the kind of seed data that makes your application immediately useful rather than requiring manual setup steps.
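
The count() guard in seedCategories is all-or-nothing: once any category exists, entries added to the seed list later are never inserted. A per-name check, in the spirit of existsByName, avoids that. The following is a minimal plain-Java sketch of the idea, not the tutorial's implementation. CategoryStore, InMemoryCategoryStore, and PerNameSeeder are hypothetical names, and the in-memory map stands in for CategoryRepository so the logic can run without a database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the two CategoryRepository calls the seeder needs.
interface CategoryStore {
    boolean existsByName(String name);

    void save(String name, String description);
}

// In-memory implementation used only for illustration.
class InMemoryCategoryStore implements CategoryStore {
    private final Map<String, String> rows = new LinkedHashMap<>();

    @Override
    public boolean existsByName(String name) {
        return rows.containsKey(name);
    }

    @Override
    public void save(String name, String description) {
        rows.put(name, description);
    }

    int size() {
        return rows.size();
    }
}

class PerNameSeeder {
    // Inserts only the entries whose name is not stored yet and
    // returns how many rows were added on this run.
    static int seedMissing(CategoryStore store, Map<String, String> defaults) {
        int inserted = 0;
        for (Map.Entry<String, String> entry : defaults.entrySet()) {
            if (!store.existsByName(entry.getKey())) {
                store.save(entry.getKey(), entry.getValue());
                inserted++;
            }
        }
        return inserted;
    }
}
```

Calling seedMissing twice with the same map inserts nothing the second time, and adding a new entry later inserts only that entry. With the real repository, the same loop would call categoryRepository.existsByName(...) and categoryRepository.save(...).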

A Quick Guide on Loading Initial Data with Spring Boot

The pattern we've built separates concerns effectively. Your main application class remains untouched:

package academy.javapro.spring;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class AcademyJavaproSpringApplication {
    
    public static void main(String[] args) {
        SpringApplication.run(AcademyJavaproSpringApplication.class, args);
    }
}

Spring Boot's component scanning automatically discovers your @Component annotated SeedData class and executes it at the right moment in the application lifecycle. You don't wire anything manually or configure execution order. The framework handles the orchestration.

When you run the application, you'll see log output showing the seeding process:

Starting database seeding...
Seeded 5 categories
Database seeding completed successfully

Access the H2 console at http://localhost:8080/h2-console, use the JDBC URL jdbc:h2:mem:testdb, and query your categories:

SELECT * FROM categories;

You'll see your five categories with generated IDs, the names and descriptions you specified, and the active flag set to true. That's your seed data in action.

Handling Different Seeding Strategies

The approach we've shown works well for simple scenarios, but real applications often need more sophisticated strategies. Sometimes you want different seed data in development versus staging versus production. Sometimes you need to seed data only once and never again, even across application restarts in environments with persistent databases.

One common pattern is profile-specific seeders:

package academy.javapro.spring.seed;

import academy.javapro.spring.model.Category;
import academy.javapro.spring.repository.CategoryRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

@Component
@Profile("dev")
public class DevSeedData implements CommandLineRunner {
    
    private static final Logger logger = LoggerFactory.getLogger(DevSeedData.class);
    
    private final CategoryRepository categoryRepository;
    
    public DevSeedData(CategoryRepository categoryRepository) {
        this.categoryRepository = categoryRepository;
    }
    
    @Override
    public void run(String... args) throws Exception {
        logger.info("Running development-specific seed data...");
        
        if (!categoryRepository.existsByName("Testing")) {
            Category testCategory = new Category(
                "Testing",
                "Test category for development purposes only"
            );
            testCategory.setActive(false);
            categoryRepository.save(testCategory);
            logger.info("Added test category for development");
        }
    }
}

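The @Profile("dev") annotation ensures this seeder only runs when the dev profile is active. You can create separate seeders for different environments, each handling the specific data needs of that environment. Production might seed only essential reference data, while development includes test users, sample content, and demonstration data. If you keep this class alongside SeedData, remember that SeedData skips its inserts whenever count() is greater than zero. Without an explicit @Order the run order of the two runners is not guaranteed, and a "Testing" row inserted first would cause the base categories to be skipped.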

Another pattern involves using a separate table to track what has been seeded:

package academy.javapro.spring.model;

import jakarta.persistence.*;
import java.time.LocalDateTime;

@Entity
@Table(name = "seed_history")
public class SeedHistory {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(nullable = false, unique = true)
    private String seedName;
    
    @Column(nullable = false)
    private LocalDateTime executedAt;
    
    public SeedHistory() {
    }
    
    public SeedHistory(String seedName) {
        this.seedName = seedName;
        this.executedAt = LocalDateTime.now();
    }
    
    // Getters and setters omitted for brevity
}

Your seeder can check this table to determine what has already run, allowing you to add new seed operations over time without re-running old ones. This matters in staging and production environments where you can't just drop and recreate the database.
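
The run-once check described above can be sketched in plain Java. This is an illustration under stated assumptions rather than a finished component: SeedTracker and runOnce are hypothetical names, and the in-memory set stands in for lookups and inserts against the seed_history table, which in the real application would go through a repository for SeedHistory that this tutorial does not define.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: the set plays the role of the seed_history table.
class SeedTracker {
    private final Set<String> executedSeeds = new LinkedHashSet<>();

    // Runs the seed operation only if no entry with this name is recorded.
    // Returns true when the operation ran, false when it was skipped.
    boolean runOnce(String seedName, Runnable seedOperation) {
        if (executedSeeds.contains(seedName)) {
            return false;
        }
        seedOperation.run();
        // Record the name only after the operation finished without throwing,
        // so a failed seed is attempted again on the next call.
        executedSeeds.add(seedName);
        return true;
    }
}
```

In a database-backed version, the seed operation and the insert into seed_history would ideally share one transaction so the two cannot drift apart.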

Working with Related Entities

Real applications rarely seed just one entity type. You typically have relationships: categories contain posts, users have roles, products belong to categories. Here's how you'd handle seeding related entities:

package academy.javapro.spring.seed;

import academy.javapro.spring.model.Category;
import academy.javapro.spring.repository.CategoryRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

@Component
@Order(1)
public class CategorySeedData implements CommandLineRunner {
    
    private static final Logger logger = LoggerFactory.getLogger(CategorySeedData.class);
    private final CategoryRepository categoryRepository;
    
    public CategorySeedData(CategoryRepository categoryRepository) {
        this.categoryRepository = categoryRepository;
    }
    
    @Override
    public void run(String... args) throws Exception {
        logger.info("Seeding categories...");
        
        if (categoryRepository.count() == 0) {
            Category tech = new Category("Technology", "Tech articles");
            Category science = new Category("Science", "Science content");
            
            categoryRepository.save(tech);
            categoryRepository.save(science);
            
            logger.info("Categories seeded successfully");
        }
    }
}

The @Order(1) annotation controls execution sequence when you have multiple CommandLineRunner implementations. Lower numbers run first. If you had a PostSeedData class that needs categories to exist first, you'd annotate it with @Order(2). Spring Boot executes them in order, ensuring dependencies are available when needed.

Handling Seed Data Failures

Seed data failures can be tricky to debug. If an exception is caught and swallowed, your application starts successfully, but the database isn't in the expected state. Always include proper error handling and logging:

private void seedCategories() {
    try {
        if (categoryRepository.count() > 0) {
            logger.info("Categories already exist, skipping seed");
            return;
        }
        
        Category technology = new Category("Technology", "Tech content");
        categoryRepository.save(technology);
        
        Category science = new Category("Science", "Science content");
        categoryRepository.save(science);
        
        logger.info("Successfully seeded {} categories", categoryRepository.count());
        
    } catch (Exception e) {
        logger.error("Failed to seed categories", e);
        throw new RuntimeException("Category seeding failed", e);
    }
}

If seeding is critical to your application's operation, throw a runtime exception to prevent the application from starting with incomplete data. If it's optional, log the error and continue. The choice depends on your application's requirements. Can users work without the seed data, or does everything break?
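
One way to make that choice explicit is to pass it as a flag. The sketch below is plain Java with hypothetical names (SeedStepRunner, runStep, and the critical parameter), and it writes to System.err in place of the SLF4J logger so that it runs without dependencies.

```java
// Hypothetical helper that applies the "critical vs optional" policy.
class SeedStepRunner {
    // Returns true if the step succeeded and false if an optional step failed.
    // A failing critical step is rethrown so that the caller can abort startup.
    static boolean runStep(String stepName, boolean critical, Runnable step) {
        try {
            step.run();
            return true;
        } catch (RuntimeException e) {
            if (critical) {
                throw new IllegalStateException("Critical seed step failed: " + stepName, e);
            }
            // Optional step: report the problem and let the caller continue.
            System.err.println("Optional seed step failed: " + stepName + " (" + e.getMessage() + ")");
            return false;
        }
    }
}
```

Inside a CommandLineRunner, an exception that escapes run() stops the application from starting, which is the behavior the critical branch relies on.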

Watch for constraint violations. If your seed data violates unique constraints, foreign key constraints, or validation rules, you'll get exceptions during save operations. The error messages usually pinpoint the exact issue, but having good logging helps track down the root cause when the error isn't immediately obvious.

Performance Considerations

When seeding large amounts of data, individual save operations become inefficient. Called outside an existing transaction, each save() runs in its own transaction with its own round-trip to the database. For bulk inserts, use saveAll():

private void seedCategoriesEfficiently() {
    if (categoryRepository.count() > 0) {
        return;
    }
    
    // Needs: import java.util.List;
    List<Category> categories = List.of(
        new Category("Technology", "Tech articles"),
        new Category("Science", "Science content"),
        new Category("Business", "Business news"),
        new Category("Health", "Health and wellness"),
        new Category("Education", "Learning resources")
    );
    
    categoryRepository.saveAll(categories);
    logger.info("Bulk saved {} categories", categories.size());
}

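The saveAll() method persists the whole list inside a single transaction instead of opening one per entity. It does not by itself batch the SQL: Hibernate only groups inserts into JDBC batches when hibernate.jdbc.batch_size is set (for example through spring.jpa.properties.hibernate.jdbc.batch_size), and it disables insert batching for entities that use GenerationType.IDENTITY, as Category does. For truly massive seed datasets (thousands or millions of records) consider using JDBC batch inserts or raw SQL scripts. Spring Data JPA is convenient, but it's not optimized for bulk data loading.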
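
A minimal sketch of the JDBC batch route, using only java.sql from the JDK. The class name CategoryBatchInserter, the CategoryRow record, and the batchSize parameter are hypothetical, and the SQL assumes the categories table has name, description, and active columns, as the Category entity mapping suggests. Obtaining the Connection (for example from the application's DataSource) is left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class CategoryBatchInserter {
    // Hypothetical value type for one row to insert.
    record CategoryRow(String name, String description) {}

    // Splits a list into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    // Queues each chunk with addBatch() and sends it with one executeBatch() call.
    // Returns the number of rows that were submitted.
    static int insertAll(Connection connection, List<CategoryRow> rows, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO categories (name, description, active) VALUES (?, ?, ?)";
        int submitted = 0;
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (List<CategoryRow> batch : chunk(rows, batchSize)) {
                for (CategoryRow row : batch) {
                    statement.setString(1, row.name());
                    statement.setString(2, row.description());
                    statement.setBoolean(3, true);
                    statement.addBatch();
                }
                statement.executeBatch();
                submitted += batch.size();
            }
        }
        return submitted;
    }
}
```

How many network round-trips a batch actually costs depends on the JDBC driver, so it is worth measuring against your own database before settling on a batch size.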

Another approach for large datasets is lazy seeding: only seed what's absolutely necessary at startup, then provide admin endpoints or command-line tools to load additional data on demand. This keeps startup time reasonable while still providing mechanisms to populate the database when needed.

Summary

The CommandLineRunner pattern gives you a clean, maintainable way to load initial data with Spring Boot. You write normal Java code that leverages your existing repositories and domain models. No SQL scripts to maintain, no framework-specific DSLs to learn. Just straightforward object creation and persistence.

The key principles we've covered—idempotent seeding, profile-specific configurations, proper error handling, and performance awareness—carry over to any seeding strategy you implement. Whether you're populating reference data, creating test fixtures, or initializing application state, these patterns provide a solid foundation.

The separation of concerns matters. Your seed logic lives in dedicated classes, not scattered through controllers or services. You can test seed operations independently, modify them without touching business logic, and organize them clearly as your application grows. When new developers join your team, they find seed data in an obvious place doing exactly what it claims to do.

Spring Boot's automatic discovery and execution of CommandLineRunner beans removes boilerplate. You don't configure execution order manually unless you need to—and when you do, the @Order annotation provides explicit control. Profile support lets you maintain different seeding strategies for different environments without conditional logic cluttering your code.

The pattern scales from simple reference data—a handful of categories or user roles—to complex initialization scenarios involving multiple related entities, conditional logic, and environment-specific requirements. Start simple with basic category seeding, then extend the approach as your application's needs evolve. That's the beauty of writing seed logic in Java: you have the full power of the language and Spring framework at your disposal.

Loading Initial Data with Spring Boot. Last updated January 11, 2026.
