AWS Glue is an orchestration platform for ETL jobs. It is used in DevOps workflows for data warehouses, machine learning and loading data into accounting or inventory management systems.

Glue is based upon open source software -- namely, Apache Spark. It interacts with other open source products AWS operates, as well as proprietary ones -- notably Amazon S3 object storage and the Amazon DynamoDB database.

Glue is not a database; it is a store of schemas -- also called metadata. It holds data tables that describe other data tables. Glue provides triggers, schedulers and manual ways to use those schemas to fetch data from one platform and push it to another.
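To make the idea of "tables that describe other tables" concrete, here is a minimal sketch in plain Python. It is not Glue's API: the Catalog, TableDefinition and Column classes, as well as the bucket path, are hypothetical names invented for illustration. The point is only that a catalog entry records where data lives and what shape it has, while the rows themselves stay in the source system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Column:
    name: str
    type: str  # e.g. "string", "bigint", "double"


@dataclass
class TableDefinition:
    """Metadata about one table: its location, format and columns."""
    name: str
    location: str      # where the rows live; the catalog never copies them
    data_format: str   # e.g. "json", "parquet", "csv"
    columns: List[Column] = field(default_factory=list)


class Catalog:
    """Toy stand-in for a metadata catalog: stores definitions, not rows."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableDefinition] = {}

    def register(self, table: TableDefinition) -> None:
        self._tables[table.name] = table

    def describe(self, name: str) -> List[Tuple[str, str]]:
        # Return (column name, column type) pairs for the named table.
        return [(c.name, c.type) for c in self._tables[name].columns]


catalog = Catalog()
catalog.register(
    TableDefinition(
        name="orders",
        location="s3://example-bucket/orders/",  # assumed, illustrative path
        data_format="json",
        columns=[Column("order_id", "bigint"), Column("total", "double")],
    )
)

schema = catalog.describe("orders")
print(schema)
```

A job that reads the "orders" table would first look up this definition to learn the location and column types, then fetch the actual rows from that location.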

Glue performs transformations through its web-based configuration and through Python and Scala APIs. To illustrate how your IT team can use Glue for its extract, transform and load (ETL) jobs, let's go over some of the basic workflow components. Then, we'll walk through how to use the service to organize semistructured data for analysis and to support and train a machine learning model.

Glue workflow

With Glue, the workflow generally follows these steps:

1. Load external data into Amazon S3, DynamoDB or any row-and-column database that supports Java Database Connectivity (JDBC), which includes most SQL databases. Glue supports JSON, XML, Apache Parquet, CSV and Avro file formats. It uses Apache Spark to create data tables, in Apache Hive format atop a Hadoop file system, in the VMs that run Glue.
2. Load more data from another source.
3. Run a transformation -- such as a join, drop, aggregation or mapping -- on the combined data sets from steps 1 and 2.
4. Load the data into a data warehouse such as Snowflake or Amazon Redshift, or use an API or bulk loader to move it into SAP.

Because this is a workflow, Glue can run the jobs itself, forgoing the need for a DevOps tool such as SaltStack. For example, admins could write code in Spark or Python to do this and then use Salt to orchestrate the jobs, but Salt does not have a database, so the result would be disjointed operations. With Glue, it all runs under the same code and context.
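The four workflow steps above can be sketched in a few lines of plain Python. This is only an analogy under stated assumptions, not a Glue job: in-memory CSV and JSON strings stand in for files in S3, a Python list stands in for the warehouse table, and every name and value (orders, customers, internal_note, warehouse_table) is made up for illustration. In Glue, the same logic would be expressed through its Python or Scala APIs and run on Spark.

```python
import csv
import io
import json

# Step 1: load external data (an in-memory CSV stands in for a file in S3).
ORDERS_CSV = (
    "order_id,customer_id,amount,internal_note\n"
    "1,c1,20.0,x\n"
    "2,c2,5.5,y\n"
    "3,c1,4.5,z\n"
)
orders = list(csv.DictReader(io.StringIO(ORDERS_CSV)))

# Step 2: load more data from another source (JSON in this sketch).
CUSTOMERS_JSON = (
    '[{"customer_id": "c1", "name": "Ada"},'
    ' {"customer_id": "c2", "name": "Lin"}]'
)
customers = {c["customer_id"]: c for c in json.loads(CUSTOMERS_JSON)}


# Step 3: transform the combined data sets.
def transform(order_rows, customers_by_id):
    totals = {}
    for row in order_rows:
        # Drop: remove a column the target does not need.
        row = {k: v for k, v in row.items() if k != "internal_note"}
        # Join: look up the customer record for this order.
        customer = customers_by_id[row["customer_id"]]
        # Map: CSV values arrive as strings, so convert the amount.
        amount = float(row["amount"])
        # Aggregate: sum the amounts per customer name.
        totals[customer["name"]] = totals.get(customer["name"], 0.0) + amount
    return [
        {"customer": name, "total_spent": total}
        for name, total in sorted(totals.items())
    ]


# Step 4: load the result into the target (a list stands in for a warehouse).
warehouse_table = []
warehouse_table.extend(transform(orders, customers))
print(warehouse_table)
```

Keeping the extract, transform and load stages in one script mirrors the point made above: the whole flow runs under the same code and context rather than being split across separate tools.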