Amazon’s new AWS Data Pipeline product “will help you move, sort, filter, reformat, analyze, and report on data in order to make use of it in a scalable fashion.” You can now automate the movement and processing of any amount of data using data-driven workflows and built-in dependency checking.
A Pipeline is composed of a set of data sources, preconditions, destinations, processing steps, and an operational schedule, all defined in a Pipeline Definition.
The definition specifies where the data comes from, what to do with it, and where to store it. You can create a Pipeline Definition in the AWS Management Console or externally, in text form.
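As a rough illustration of what such a text-form definition might look like, here is a minimal sketch in JSON. The specific field names and values (the schedule period, the S3 path, and the object IDs) are illustrative assumptions, not taken from AWS documentation:

```json
{
  "objects": [
    {
      "id": "MyDailySchedule",
      "type": "Schedule",
      "period": "1 day",
      "startDateTime": "2012-12-17T00:00:00"
    },
    {
      "id": "MyInputData",
      "type": "S3DataNode",
      "schedule": { "ref": "MyDailySchedule" },
      "filePath": "s3://example-bucket/input/logs.csv"
    },
    {
      "id": "MyProcessingStep",
      "type": "ShellCommandActivity",
      "schedule": { "ref": "MyDailySchedule" },
      "input": { "ref": "MyInputData" },
      "command": "my-report-script.sh"
    }
  ]
}
```

The idea is that each object declares what it depends on (here the activity references its input data node and a schedule), and the service runs each step only once its dependencies are satisfied.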