Spring Batch in Action. Arnaud Cogoluegnes, Thierry Templier, Gary Gregory, Olivier Bazoud. Manning, Shelter Island.

Spring Batch in Action is an in-depth guide to writing batch applications using Spring Batch, written for developers who have basic knowledge of Java. Purchase of the book includes free access to a private forum run by Manning.
About the book: Spring Batch in Action is a thorough, in-depth guide to writing efficient batch applications. Starting with the basics, it discusses the best practices of batch jobs. Purchase of the book includes a free PDF, ePub, and Kindle eBook from Manning.
Because you want to write the products from the compressed import file to a database, you must specify a transaction manager to handle transactions associated with inserting products in the database. This listing also specifies additional parameters to define restart behavior.
The bean must implement the Spring PlatformTransactionManager interface. The start-limit and allow-start-if-complete attributes specify that Spring Batch can restart the tasklet three times in the context of a retry, even if the tasklet has completed; we return to this behavior in section 3. In the case of a custom tasklet, you can reference the Spring bean implementing the Tasklet interface with the ref attribute.
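Pulling the restart attributes together, a configuration along these lines is plausible (the batch XML namespace is assumed as the default, and bean ids such as productReader and productWriter are illustrative, not taken from the book's listing):

```xml
<!-- start-limit and allow-start-if-complete control restart behavior -->
<job id="importProducts">
  <step id="readWriteProducts">
    <tasklet transaction-manager="transactionManager"
             start-limit="3"
             allow-start-if-complete="true">
      <chunk reader="productReader" writer="productWriter" commit-interval="100"/>
    </tasklet>
  </step>
</job>
```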
Spring Batch delegates processing to this class when executing the step. Spring Batch also supports using chunks in tasklets: the chunk child element of the tasklet element configures chunk processing.
On the Java side, the ChunkOrientedTasklet class implements chunk processing. Configuring a tasklet can be simple, but to implement chunk processing, the configuration gets more complex because more objects are involved.
You typically use chunk-oriented tasklets for read-write processing. In chunk processing, Spring Batch reads data chunks from a source and transforms, validates, and then writes data chunks to a destination. In the online store case study, this corresponds to importing products into the database. To configure chunk objects, you use an additional level of configuration using the chunk element with the attributes listed in table 3. The bean must implement the Spring Batch ItemReader interface.
The bean must implement the Spring Batch ItemProcessor interface. The bean must implement the Spring Batch ItemWriter interface. When the number of items read reaches the commit interval number, the entire corresponding chunk is written out through the item writer and the transaction is committed.
If processing reaches the skip limit, the next exception thrown during item processing (read, process, or write) causes the step to fail. The first four attributes in table 3. (reader, processor, writer, and commit-interval) define which entities are involved in processing chunks and the number of items to process before committing. The reader, processor, and writer attributes correspond to Spring bean identifiers defined in the configuration.
For more information on these topics, see chapter 5 for configuring item readers; chapter 6 for configuring item writers; and chapter 7 for configuring item processors.
The commit-interval attribute defines how many items Spring Batch processes before committing a database transaction. Other attributes deal with configuring the skip limit, retry limit, and completion policy aspects of a chunk.
The following listing shows how to use these attributes. The retry-limit attribute sets the maximum number of retries. The cache-capacity attribute sets the cache capacity for retries, meaning the maximum number of items that can fail without being skipped or recovered. If the number is exceeded, an exception is thrown. The chunk-completion-policy attribute configures the completion policy to define a chunk-processing timeout.
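Combining these attributes, a hedged configuration sketch might look like this (bean ids are illustrative; TimeoutTerminationPolicy is a real Spring Batch class that completes a chunk after a given timeout):

```xml
<tasklet>
  <chunk reader="productItemReader" writer="productItemWriter"
         commit-interval="100"
         skip-limit="10"
         retry-limit="3"
         cache-capacity="20"
         chunk-completion-policy="timeoutPolicy"/>
</tasklet>

<!-- hypothetical completion policy bean referenced above -->
<bean id="timeoutPolicy"
      class="org.springframework.batch.repeat.policy.TimeoutTerminationPolicy"/>
```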
We look at this topic in more detail in chapter 8, where we aim for batch robustness and define error handlers. The last attributes correspond to more advanced configurations regarding transactions, which we describe in section 3. Most of the attributes described in table 3. can also be set using child elements instead of attributes; in that case, the beans are anonymous and defined specifically for the chunk. You can configure other objects with child elements in chunks as well.
By default, objects referenced using the reader, processor, and writer attributes are automatically registered. The chunk element can configure which exceptions trigger skips and retries using, respectively, the skippable-exception-classes and retryable-exception-classes elements.
You can use the same mechanism for the retryable-exception-classes element as for the skippable-exception-classes element to configure retries. For the last item in table 3., we provide a short description of the feature and show how to configure it.
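As a sketch of these two elements side by side (the exception classes chosen here are real Spring types, but picking them for this job is an assumption):

```xml
<chunk reader="productItemReader" writer="productItemWriter"
       commit-interval="100" skip-limit="10" retry-limit="3">
  <!-- a malformed line is skipped rather than failing the step -->
  <skippable-exception-classes>
    <include class="org.springframework.batch.item.file.FlatFileParseException"/>
  </skippable-exception-classes>
  <!-- a transient database deadlock triggers a retry of the operation -->
  <retryable-exception-classes>
    <include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
  </retryable-exception-classes>
</chunk>
```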
Chapter 8 provides more details on this topic. Streams provide the ability to save state between executions for step restarts. The step needs to know which components are streams; a component declares itself a stream by implementing the ItemStream interface.
Spring Batch automatically registers readers, processors, and writers if they implement the ItemStream interface. When writers are hidden behind a composite writer, however, you must explicitly register them as streams for the step to avoid problems on restart when errors occur. The following listing describes how to configure this aspect using the streams child element of the chunk element.
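A sketch of that configuration might look like the following (bean ids for the composite writer and its delegates are assumptions):

```xml
<tasklet>
  <chunk reader="productItemReader" writer="compositeWriter" commit-interval="100">
    <!-- the delegates of the composite writer are not auto-registered,
         so they are declared as streams explicitly -->
    <streams>
      <stream ref="fileItemWriter1"/>
      <stream ref="fileItemWriter2"/>
    </streams>
  </chunk>
</tasklet>
```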
In the listing, the streams element defines one or more stream elements referencing the writers used by the composite writer; in this example, there are two stream elements. In this section, we described how to configure a batch job with Spring Batch and detailed the configuration of each related core object.
We saw that transactions guarantee batch robustness and are involved at several levels in the configuration. Because this is an important issue, we gather all configuration aspects related to transactions in the next section. You configure transactions at different levels because transactions involve several types of objects. In the online store use case, you validate a set of products during processing.
Spring provides built-in transaction managers for common persistence technologies and frameworks. Once you configure the transaction manager, other configuration elements can refer to it from different levels in the batch configuration, such as from the tasklet level, using the transaction-manager attribute. Now that you know which Spring transaction manager to use, you can define how transactions are handled during processing.
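For a JDBC-based job, a plausible sketch of this wiring is the following (the DataSourceTransactionManager choice and the bean ids are assumptions):

```xml
<!-- transaction manager bound to the batch data source -->
<bean id="transactionManager"
      class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
  <property name="dataSource" ref="dataSource"/>
</bean>

<!-- the tasklet refers to it through the transaction-manager attribute -->
<tasklet transaction-manager="transactionManager">
  <chunk reader="productItemReader" writer="productItemWriter" commit-interval="100"/>
</tasklet>
```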
As described in chapter 1, section 1., Spring bases this support on the PlatformTransactionManager interface, which provides a contract to handle transaction demarcation. Spring builds on this interface to implement standard transactional behavior and allows configuring transactions with Spring beans using AOP or annotations.
Spring Batch uses chunk processing to handle items. The commit-interval attribute configures this setting at the chunk level and ensures that Spring Batch executes a commit after processing the configured number of items. Transactions have several attributes defining transactional behavior: propagation, isolation, and timeout. These attributes specify how transactions behave and can affect performance. Because Spring Batch is based on Spring transaction support, configuring these attributes is generic and applies to all persistence technologies supported by the Spring framework.
Spring Batch provides the transaction-attributes element in the tasklet element for this purpose. Transactional attributes are configured using the transaction-attributes element and its attributes.
The isolation attribute specifies the isolation level used for the database and what is visible from outside transactions. The propagation attribute specifies the transactional behavior to use. The Spring class TransactionDefinition declares all valid values for these two attributes. Finally, the timeout attribute defines the timeout in seconds for the transaction.
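Putting the three attributes together, a configuration sketch might look like this (the specific isolation level, propagation, and timeout values are illustrative choices, not requirements):

```xml
<tasklet transaction-manager="transactionManager">
  <chunk reader="productItemReader" writer="productItemWriter" commit-interval="100"/>
  <!-- isolation and propagation values come from TransactionDefinition;
       timeout is expressed in seconds -->
  <transaction-attributes isolation="READ_COMMITTED"
                          propagation="REQUIRED"
                          timeout="30"/>
</tasklet>
```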
If the timeout attribute is absent, the default timeout of the underlying system is used.

Rollback and commit conventions in Spring and Java Enterprise Edition

Java defines two types of exceptions: checked and unchecked. You commonly see checked exceptions used as business exceptions (recoverable) and unchecked exceptions as lower-level exceptions (unrecoverable by the business logic). By default, in Java EE and Spring, commit and rollback are automatically triggered by exceptions.
If Spring catches a checked exception, a commit is executed. If Spring catches an unchecked exception, a rollback is executed. You can override this behavior in a tasklet using the no-rollback-exception-classes element: with that configuration in place, Spring issues a commit even if the unchecked Spring Batch exception ValidationException is thrown during batch processing.
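A sketch of that override for the ValidationException case mentioned above (bean ids are illustrative):

```xml
<tasklet>
  <chunk reader="productItemReader" writer="productItemWriter" commit-interval="100"/>
  <!-- ValidationException no longer triggers a rollback -->
  <no-rollback-exception-classes>
    <include class="org.springframework.batch.item.validator.ValidationException"/>
  </no-rollback-exception-classes>
</tasklet>
```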
Spring Batch also provides parameters for special cases. The first case is readers built on a transactional resource, such as a JMS queue. For this type of resource, you need to specify the reader-transactional-queue attribute on the corresponding chunk. Spring Batch eases the configuration of core entities like job, step, tasklet, and chunk; it also lets you configure transaction behavior and define your own error handling. The next section covers configuring the Spring Batch job repository to store batch execution data.
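Returning to the transactional-reader case above, a minimal sketch of the attribute looks like this (the JMS reader bean id is an assumption):

```xml
<tasklet>
  <!-- tells Spring Batch the reader consumes from a transactional queue,
       so reads are rolled back together with the chunk transaction -->
  <chunk reader="jmsProductReader" writer="productItemWriter"
         commit-interval="50"
         reader-transactional-queue="true"/>
</tasklet>
```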
The job repository is part of the more general topic of batch process monitoring, and chapter 12 is dedicated to this topic. The JobRepository interface provides all the methods required to interact with the repository. Spring Batch provides only one implementation of this interface, and two kinds of DAOs at this level: in-memory and persistent. With the in-memory DAOs, batch data is lost between job executions.
You should prefer the persistent DAOs when you want robust batch processing with checks on startup. Because the persistent DAOs use a database, you need additional information in the job configuration. Database access configuration includes the data source and transactional behavior. The persistent repository uses a Spring bean for configuration and requires a transaction manager.
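A sketch of the persistent repository declaration using the batch namespace (the table prefix shown is the Spring Batch default; the data source and transaction manager bean ids are assumptions):

```xml
<batch:job-repository id="jobRepository"
                      data-source="dataSource"
                      transaction-manager="transactionManager"
                      table-prefix="BATCH_"/>
```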
This file should contain the configuration of the Spring Batch infrastructure, the jobs, the scheduler (if any), and application services. A best practice is to split up this configuration into multiple files.
This avoids having a large and monolithic configuration file and encourages reuse of configuration files. Should you redefine all your jobs for integration testing? No, so define the jobs in a dedicated file and import this file from a master Spring file. The following snippet shows how the default applicationContext.xml file can import the dedicated files. If you follow the configuration of the previous snippet, the structure of the web application on disk should be as follows. If you use the Spring scheduler to start your jobs, the scheduling configuration file declares the scheduler as well.
You can deploy the web application in your favorite web container, and the embedded Java scheduler will trigger jobs according to the configuration, as figure 4. illustrates. In many cases, this configuration is fine.

Launching a job with an HTTP request

Imagine that you deployed your Spring Batch environment in a web application, but a system scheduler is in charge of triggering your Spring Batch jobs.
A system scheduler like cron is easy to configure, and that might be what your administration team prefers to use. But how can cron get access to Spring Batch, which is now in a web application?
This solution is convenient when the triggering system is external to Spring Batch, like cron. To implement this architecture, you need a web controller that analyzes the HTTP parameters and triggers the corresponding job with its parameters. We use Spring MVC to do that, but we could have used any other web framework. The following listing shows the job launcher controller; it relies on java.util.Enumeration, javax.servlet.http.HttpServletRequest, and the Spring types JobParametersBuilder, JobRegistry, JobLauncher, HttpStatus, Controller, RequestMapping, RequestMethod, and RequestParam. The controller receives the job name as part of the request; as you probably guessed, this parameter is the name of the job you want to launch.
At D, you use the job launcher to launch the job. The launching request URL path contains the name of the job to launch, and additional HTTP parameters become job parameters; this is exactly what the launching controller handles. You need to declare the job registry in the Spring application context, typically where you declare the Spring Batch infrastructure.
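A sketch of that registry declaration (both classes are real Spring Batch infrastructure types; wiring them this way is the usual pattern, but your context may differ):

```xml
<bean id="jobRegistry"
      class="org.springframework.batch.core.configuration.support.MapJobRegistry"/>

<!-- registers every configured job with the registry as the context starts -->
<bean class="org.springframework.batch.core.configuration.support.JobRegistryBeanPostProcessor">
  <property name="jobRegistry" ref="jobRegistry"/>
</bean>
```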
The web.xml file declares the Spring ContextLoaderListener and a Spring MVC DispatcherServlet named sbia. By default, the servlet's configuration file is [servlet-name]-servlet.xml; in this case, you create an sbia-servlet.xml file. You must declare the web controller in this file. The servlet's application context sees beans from the root application context configured with the ContextLoaderListener.
The Spring application context of the Spring MVC servlet can see the beans from the root application context because they share a parent-child relationship, as figure 4. shows.
You can now launch your Spring Batch jobs with plain HTTP requests. The root application context defines the job registry and the job launcher. You should use this launching mechanism when an external system triggers your jobs; otherwise, you can just deploy your Spring Batch environment in a web application and use an embedded Java-based scheduler to trigger your jobs.
Remember, you can use Spring Batch wherever you can use the Spring Framework, and web applications are no exception. We covered a lot of information on triggering and launching Spring Batch jobs. By now, you should know which solution to adopt for your batch system.
Next, we learn how to stop all of these jobs. Stopping a job is unfortunate because it means that something went wrong. If everything is okay, a job execution ends by itself without any external intervention.
When it comes to stopping job executions, we distinguish two points of view: the operator's and the developer's. When something goes wrong, the operator receives an alert and stops a job execution, by using a JMX console, for example. The developer writes Spring Batch jobs and knows that under certain circumstances a job should be stopped. What are these circumstances? Any business decision that should prevent the job from going any further. Spring Batch provides techniques to stop a job for both the operator and the developer.
Spring Batch provides the JobOperator interface to perform such an operation; we focus here on the way to use JobOperator to stop job executions. NOTE The steps are simple: you ask the job operator for the identifiers of the running executions of a job, and you then ask it to send a stop message to an execution using an execution ID.
Another way to call job operator methods is to provide a user interface in your application that lets an administrator stop any job execution. You can create this user interface yourself, or you can use Spring Batch Admin, the web administration application introduced in chapter 2.
The following listing shows the Spring configuration required to declare the job operator. An operator can learn about the Spring Batch runtime and stop or restart jobs. You need to declare the job registry and the job explorer only for specific tasks, and configuring the job operator is one. As a bonus, the following configuration exposes the job operator to JMX.
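A sketch of that declaration, including the JMX export (SimpleJobOperator and MBeanExporter are real Spring classes; the referenced bean ids and the JMX object name are assumptions):

```xml
<bean id="jobOperator"
      class="org.springframework.batch.core.launch.support.SimpleJobOperator">
  <property name="jobLauncher" ref="jobLauncher"/>
  <property name="jobRepository" ref="jobRepository"/>
  <property name="jobRegistry" ref="jobRegistry"/>
  <property name="jobExplorer" ref="jobExplorer"/>
</bean>

<!-- exposes the operator over JMX so an administrator can call stop()
     from a JMX console -->
<bean class="org.springframework.jmx.export.MBeanExporter">
  <property name="beans">
    <map>
      <entry key="spring:service=batch,bean=jobOperator" value-ref="jobOperator"/>
    </map>
  </property>
</bean>
```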
This saves you a round trip to the monitoring chapter. You can now explain to your administration team how to stop a job execution. The next subsection explains what happens when you request to stop a job execution. The stop method returns a Boolean value that tells you whether the stop message was sent successfully. A stop message? When does job execution stop after you request it?
There are two possibilities. First, the business code can detect the thread interruption that the stop request triggers: if the code detects the interruption, it can choose to end processing by throwing an exception or returning immediately, which means the execution stops almost immediately. Second, Spring Batch can detect the stop request itself: as soon as the business code finishes and Spring Batch gets control again, the framework stops the job execution.
This means that the execution stops only after the code finishes. If the code is in the middle of a long processing sequence, the execution can take a long time to stop. In a chunk-oriented step, Spring Batch drives all the processing, so the execution should stop quickly unless some custom reader, processor, or writer takes a long time to execute.
But if you write a custom tasklet whose processing is long, you should consider checking for thread interruption. You can check the time in various places in the job and decide to stop the execution after 8 a.m.
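The cooperative interruption check described above can be sketched in plain Java, without any Spring Batch types (the class and method names here are illustrative):

```java
// Plain-Java sketch of a long-running loop that cooperates with a stop
// request by checking the thread's interrupted flag, which is what a
// long custom tasklet should also do.
public class InterruptibleWork {

    // Processes up to maxItems "items"; returns how many were actually
    // processed before an interruption was detected.
    public static int process(int maxItems) {
        int processed = 0;
        for (int i = 0; i < maxItems; i++) {
            if (Thread.currentThread().isInterrupted()) {
                // A stop was requested: leave the loop instead of ignoring it.
                break;
            }
            processed++; // stand-in for real item processing
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> process(1_000_000));
        worker.start();
        worker.interrupt(); // simulate the stop message
        worker.join();
    }
}
```

Note that isInterrupted() does not clear the interrupted flag, so the check can be repeated safely at several points in the loop.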
The first way to stop execution is to throw an exception. This works all the time, unless you configured the job to skip some exceptions in a chunk-oriented step! The second and preferred way to stop execution is to set a stop flag in the step execution object. To set this stop flag, call the method StepExecution. As soon as Spring Batch gets control of the processing, it stops the job execution. The next topic to cover is how to get access to the StepExecution object from a job.
The following listing shows a tasklet that processes items, checks a stop condition, and sets the stop flag accordingly. The stop condition could be any business decision, such as the time restriction mentioned previously. The tasklet implementation uses the Spring Batch types StepContribution, ChunkContext, and Tasklet. Spring Batch drives the flow and lets you plug in your business logic or reuse off-the-shelf components to read, process, or write items.
You access the StepExecution to stop the execution using listeners. Not dealing with stopping a job in item readers, processors, and writers is a good thing. These components should focus on their processing to enforce separation of concerns. NOTE Chapter 3 covers the configuration of listeners, but we give you enough background here to use them for stopping jobs.
The idea of a listener is to react to the lifecycle events of a step. You register a listener on a step by using annotations or implementing interfaces, and Spring Batch calls corresponding methods throughout the lifecycle of that step. What lifecycle events can you listen for?
A lot of them: before and after a step, a chunk, or each read, processed, or written item. The following listing shows a listener that keeps a reference to the StepExecution and checks a stopping condition after each read item. This listener uses annotations, relying on the StepExecution and AfterRead types. The following listing shows how to register the listener on the chunk-oriented step. This is useful if a job runs at night but must stop at 6 a.m.
This concludes the coverage of stopping a Spring Batch job. You configure a job operator bean that you can expose to JMX and call the appropriate sequence of methods to stop a specific job execution. As soon as Spring Batch gets control of the processing, it does its best to stop the execution gracefully. Finally, remember that you can choose to stop the execution from within your business code. You can also choose to embed Spring Batch in a web application combined with a Java scheduler.
Spring provides lightweight support for scheduling. We provided you with the following guidelines: If you want your batch environment ready all the time, embed your Spring Batch environment in a web application. Once your batch environment is in a web application, also embed a Java scheduler to start your jobs. Stopping a job execution uses a stop message. You should take this message into account in your code, but you can also count on Spring Batch to stop gracefully when it retakes control of the flow.
The next three chapters cover the three corresponding phases of chunk processing: reading, processing, and writing. As described in chapter 2, Spring Batch provides types for batch processes based on the concepts of job and step. A step can use a tasklet to implement chunk processing. Chunk-oriented processing allows jobs to efficiently implement the most common batch processing tasks: reading, processing, and writing data. We focus here on the first step of this process, reading.
We describe general concepts and types implemented by Spring Batch. These built-in types are the foundation used to support the most common use cases. Spring Batch can use different data sources as input to batch processes. Because Spring Batch is open source, implementing and extending core types for reading is easily achievable. Another thing to keep in mind is that reading data is part of the general processing performed by the chunk tasklet. Spring Batch guarantees robustness when executing such processing.
The stored data is particularly useful to handle errors and restart batch processes. We concentrate here on the data-reading capabilities of Spring Batch and leave chapter 8 to cover in detail these other aspects.
We use our case study to describe concrete use cases taken from the real world. We explain how to import product data from different kinds of input, with different formats, and how to create data objects. These concepts are the foundation for the Spring Batch reading feature.
This feature and its related types operate within the chunk tasklet, as illustrated in figure 5. This chapter focuses on the first part of chunk processing: reading. At this level, the first key type is the ItemReader interface, which provides a contract for reading data. This interface supports generics and contains a read method that returns the next element read. The ItemStream interface is also important because it allows interaction with the execution context of the batch process to store and restore state.
In this chapter, we concentrate on the ItemReader, and we discuss state management in chapter 8. On the ItemStream interface, the open and close methods open and close the stream, and the update method allows updating the state of the batch process. Throughout this chapter, we use our case study as the background story and describe how to import product data into the online store using different data sources.
A file contains a set of data to integrate into an information system. Each type of file has its own syntax and data structure, and each structure in the file identifies a different data element. To configure a file type in Spring Batch, we must define its format, as figure 5. illustrates. Flat files are pure data files and contain little or no metadata information.
Some flat file formats, such as comma-separated value (CSV), may contain one header line as the first line that names columns. In general, though, the file provider defines the file format. This information can consist of field lengths or correspond to a separator splitting data fields.
Configuring Spring Batch to handle flat files corresponds to defining the file format to map file records to data objects. The item reader for flat files is responsible for identifying records in the file and then creating data objects from these records, as shown in figure 5. The item reader first identifies records, and then creates data objects.
Several other types work in conjunction with the FlatFileItemReader to identify data fields from file lines and to create data objects, as pictured in figure 5. Three interfaces work closely with the FlatFileItemReader class. The RecordSeparatorPolicy interface identifies data records in a file.
The LineMapper interface is responsible for extracting data from lines. The LineCallbackHandler interface handles data lines in special cases. The DefaultLineMapper class is the default and most commonly used implementation of the LineMapper interface. Two additional interfaces related to the DefaultLineMapper class come into play. The DefaultLineMapper class holds a LineTokenizer responsible for splitting data lines into tokens and a FieldSetMapper to create data objects from tokens.
This section on flat files introduced all concepts and types related to the item reader for flat files to import data files as objects. In the next section, we describe the general configuration of a FlatFileItemReader bean in Spring Batch as well as implementations for record-separator policies and line mappers. We then explain how to handle delimited, fixed-length, and JSON file formats and describe advanced concepts to support records split over several lines and heterogeneous records.
The default factory DefaultBufferedReaderFactory provides a suitable instance for text files. Specifying another factory is useful for binary files. When a line begins with one of these prefixes, Spring Batch ignores that line.
This feature is particularly useful to handle file headers. If the skippedLinesCallback property is present, the item reader provides each line to the callback. The provided class can detect single or multiline records.
You can use standard Spring facilities to locate the resource. The skippedLinesCallback property is used jointly with the linesToSkip property. The strict property's default value is true. You use this type of configuration for the online store use case with flat files.
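As a preview of that configuration, a minimal FlatFileItemReader declaration might look like this sketch (the file path and the referenced bean ids are illustrative, not taken from the book's listing):

```xml
<bean id="productItemReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
  <property name="resource" value="file:./input/products.txt"/>
  <!-- skip the header line describing the record fields -->
  <property name="linesToSkip" value="1"/>
  <property name="recordSeparatorPolicy" ref="productRecordSeparatorPolicy"/>
  <property name="lineMapper" ref="productLineMapper"/>
</bean>
```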
Listing 5. configures a FlatFileItemReader for the use case. The linesToSkip property is set because, in the context of the use case, the first line corresponds to the file header describing the record fields. The recordSeparatorPolicy property determines how to delimit product records in the file.
Finally, the code specifies how to create a product object from a data record using the lineMapper property. To lighten the listing, we elided the beans corresponding to the two last entities, but we detail them next. The first type, the RecordSeparatorPolicy interface, delimits data records with methods such as isEndOfRecord. Implementations can support continuation markers and unbalanced quotes at line ends.
Spring Batch provides several implementations of this interface, described in table 5. DefaultRecordSeparatorPolicy Supports unbalanced quotes at line end and a continuation string. SuffixRecordSeparatorPolicy Expects a specific string at line end to mark the end of a record.
By default, this string is a semicolon. Configuring record separation policy classes can be simple because their default constructors cover the most common cases. Spring Batch also supports the JSON format, which provides a way to structure text data using braces and brackets. The LineMapper interface contains one method, called mapLine, and the JsonLineMapper implementation is based on the jackson-mapper-asl library.
The PassThroughLineMapper class provides the original record string instead of a mapped object. For line mappers that handle several record types, a line tokenizer and a field-set mapper must be configured for each line type.
First, we take a quick look at the PassThroughLineMapper class, which performs no parsing or data extraction; its configuration is a single bean declaration with no properties. The DefaultLineMapper class is the most commonly used implementation because it handles files with implicit structures using separators or fixed-length fields.
In our use case, we accept several data formats for incoming data. We describe next how to configure Spring Batch to handle data structures based on separators and fixed-length fields. The DefaultLineMapper implements line processing in two phases: it first splits a line into fields with a LineTokenizer, and it then maps the resulting field set to a data object with a FieldSetMapper. You can wire these collaborators with bean references using the ref attribute, but you could also use inner beans.
The lineTokenizer property is set to an instance of LineTokenizer. The LineTokenizer implementation depends on the record format and provides the contract to create a field set from lines in a data file. A FieldSet contains all extracted record fields and offers several methods to manage them.
The DelimitedLineTokenizer class splits a data line into fields using a delimiter; its defaults correspond to comma-separated values. The FixedLengthTokenizer class uses field lengths to split a data line into fields. The first example contains product records that use the comma character to separate fields. For the DelimitedLineTokenizer, the names property defines the names of the fields as an ordered list separated by commas, and the delimiter property defines the field delimiter. Note that a fixed-length data structure is potentially larger because every record must pad each field to its fixed length.
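To make the delimited case concrete, a tokenizer for the product records might be declared as follows (the bean id is an assumption; the field names come from the use case):

```xml
<bean id="productLineTokenizer"
      class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
  <property name="names" value="id,name,description,price"/>
  <property name="delimiter" value=","/>
</bean>
```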
Table 5. lists the product record fields and their lengths in characters:

Field name    Length in characters
id            9
name          26
description   15
price         6

The names and columns properties respectively define the names of the fields and the column ranges used to identify them. The columns property configures column ranges for each field; this property accepts a list of Range instances, but you can also configure it using a comma-separated string of ranges.
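What the FixedLengthTokenizer does with those ranges can be sketched in plain Java, without Spring Batch types: each field is cut out of the record by absolute character positions. The ranges below (1-9, 10-35, 36-50, 51-56) follow the lengths in the table; the class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of fixed-length tokenizing: cut each field out of the
// record by 1-based, inclusive column ranges (like Spring Batch's "1-9").
public class FixedLengthParser {

    static String cut(String record, int start, int end) {
        // trim removes the padding that fixed-length formats require
        return record.substring(start - 1, Math.min(end, record.length())).trim();
    }

    public static Map<String, String> parse(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", cut(record, 1, 9));           // 9 characters
        fields.put("name", cut(record, 10, 35));       // 26 characters
        fields.put("description", cut(record, 36, 50)); // 15 characters
        fields.put("price", cut(record, 51, 56));      // 6 characters
        return fields;
    }
}
```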
The names property sets field names, as with the DelimitedLineTokenizer class. The PropertyEditor interface describes how to convert strings to beans, and vice versa.
For nonstring type properties, Spring automatically tries to convert to the appropriate type using the registered property editors. This support is extensible: Spring Batch uses this mechanism to register its own property editors to make some types easier to configure. The FieldSetMapper interface defines this process and uses generics to type implementations to application-specific types.
The PassThroughFieldSetMapper is useful if you need to work directly with the field set. Before using a FieldSetMapper, you must implement the bean to receive the data. In the case study, as you import product data, the bean is a plain old Java object (POJO) that contains the id, name, description, and price properties, as shown in the following listing.
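The elided listing can be reconstructed as a plain bean along these lines (the property names come from the text; using BigDecimal for price is an assumption, not the book's choice):

```java
import java.math.BigDecimal;

// Hedged reconstruction of the Product POJO: a plain bean with the
// id, name, description, and price properties named in the text.
public class Product {
    private String id;
    private String name;
    private String description;
    private BigDecimal price; // BigDecimal chosen here; the book may use another type

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
    public BigDecimal getPrice() { return price; }
    public void setPrice(BigDecimal price) { this.price = price; }
}
```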
The following listing shows the FieldSetMapper implementation that creates Product instances.

About the authors

A SpringSource certified trainer, Arnaud specializes in developing complex business applications, integrating Java-based products, and training on Java and Spring.
Gary Gregory is a Java developer and software integration specialist. He has more than 20 years of experience in object-oriented languages including Smalltalk, Java, and the whole soup of XML and database technologies.
He is also a Spring technologies expert.
With over 12 years of experience, he develops complex business applications and high-traffic web sites based on Java and web technologies.