What is Sorting?

Sorting and data transformation are essential operations in enterprise batch processing. IBM's DFSORT utility is the backbone for sorting, merging, filtering, and formatting large datasets efficiently on z/OS.

It plays a critical role in automating business workflows—such as payroll, billing, and report generation—by organizing and manipulating data before further processing. With built-in support, high performance, and rich features, DFSORT is a reliable tool for handling data-heavy jobs in mainframe environments.

This handout introduces DFSORT's core concepts, syntax, and practical usage. In short, sorting means arranging records in a specific order: alphabetically, numerically, or by date.

IBM Docs - DFSORT Introduction

Why Sorting Matters in Enterprise Systems

  • Makes large datasets easier to manage
  • Speeds up searches and report generation
  • Reduces data errors in billing, payroll, and reporting
  • Essential for tasks like file merging, deduplication, and analytics

💡 Many batch steps, such as file matching, merging, and control-break reports, depend on sorted input; without it they run slowly or not at all.

DFSORT is IBM's sort utility for z/OS. It is used for:

  • Sorting, merging, and reformatting datasets
  • Filtering and summarizing data efficiently
  • Processing large volumes of data in batch JCL

DFSORT JCL example
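
A minimal DFSORT job step might look like the sketch below. The job name, accounting field, dataset names, space values, and the 10-byte key in columns 1-10 are all assumptions made up for illustration; the DD names are the ones described under Required DD Statements.

```jcl
//SORTJOB  JOB (ACCT),'DFSORT DEMO',CLASS=A,MSGCLASS=X
//* Hypothetical job: names, datasets, and field positions are examples
//STEP010  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.DATA,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Sort ascending on a 10-byte character key in columns 1-10
  SORT FIELDS=(1,10,CH,A)
/*
```

SORTOUT carries no DCB parameters here; DFSORT normally takes the record format and length from SORTIN. Control statements in SYSIN start in column 2 or later, while comment lines start with an asterisk in column 1.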

There are more than 20 control statements. Here are a few commonly used ones.

IBM Docs - DFSORT Control Statement

Required DD Statements

  • SYSOUT – Log and messages
  • SYSIN – DFSORT control statements
  • SORTIN – Input dataset
  • SORTOUT – Output dataset

Execution Flow

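INCLUDE / OMIT  →  INREC  →  SORT / MERGE / COPY  →  SUM  →  OUTREC  →  OUTFIL

DFSORT applies the statements in this order no matter how they are arranged in SYSIN.

Tips
  • SORT FIELDS=COPY performs no sort; records are copied in their original order, with any filtering or reformatting applied.
  • OPTION COPY is equivalent to SORT FIELDS=COPY; use either one when no sort is needed.
  • SUM FIELDS=NONE removes records with duplicate sort keys, keeping one record per key.
  • Use INCLUDE or OMIT to filter early, and INREC to shorten records, so that less data reaches the sort.
  • OUTFIL SPLIT distributes records in rotation across multiple output datasets.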
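
As a rough illustration of the fixed processing order, the sketch below lists its statements out of order on purpose. The record layout is an assumption made for this example: fixed-length records with a customer ID in columns 1-10, a region code in columns 11-13, and a zoned-decimal amount in columns 21-28.

```jcl
//SYSIN    DD *
* Statement order in SYSIN does not matter. DFSORT still runs
* INCLUDE first, then SORT, then SUM, then OUTREC.
  OUTREC BUILD=(1,10,C' TOTAL=',21,8)
  SUM FIELDS=(21,8,ZD)
  SORT FIELDS=(1,10,CH,A)
  INCLUDE COND=(11,3,CH,EQ,C'NYC')
/*
```

Only region 'NYC' records are sorted, the amounts are totaled per customer ID, and the 25-byte output record is built last. Listing the statements in processing order is still the more readable habit.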

SORT

SORT – Specifies the key fields and order (ascending or descending) used to sort the input records.

IBM Docs - SORT Statement

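//SYSIN DD *
* Sort on one key: A = ascending, D = descending
  SORT FIELDS=(start,length,format,A/D)
* Or copy without sorting: skip 5 records, stop after 100
  SORT FIELDS=COPY,SKIPREC=5,STOPAFT=100
/*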
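
With concrete values, a two-key sort could look like the following sketch. The field positions are hypothetical: a region code in columns 11-13 and a zoned-decimal amount in columns 21-28.

```jcl
//SYSIN    DD *
* Hypothetical layout: region in 11-13, amount in 21-28
* Primary key: region ascending. Secondary key: amount descending.
  SORT FIELDS=(11,3,CH,A,21,8,ZD,D)
/*
```

Each key is a group of four values (start, length, format, order), and keys are listed from most to least significant. CH means character data and ZD means zoned decimal.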

OPTION

OPTION – Sets general processing options, such as copying without sorting, skipping leading records, or limiting the number of records processed.

IBM Docs - Option Statement

//SYSIN DD *
  OPTION COPY,SKIPREC=10,STOPAFT=100
/*

INCLUDE

INCLUDE – Filters records to include only those that match specific conditions.

IBM Docs - Include Statement

//SYSIN DD *
* OP is a comparison operator: EQ, NE, GT, GE, LT, or LE
  INCLUDE COND=(start,length,format,OP,value)
/*
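
A filled-in condition might look like this sketch, which combines two tests with AND. The layout (region in columns 11-13, zoned-decimal amount in columns 21-28) and the values are assumptions for illustration.

```jcl
//SYSIN    DD *
* Keep records where region is 'NYC' and amount exceeds 1000
  INCLUDE COND=(11,3,CH,EQ,C'NYC',AND,21,8,ZD,GT,1000)
  SORT FIELDS=COPY
/*
```

Character constants are written as C'...', while numeric fields such as ZD are compared against plain decimal numbers.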

OMIT

OMIT – Filters out records that match certain conditions, effectively the opposite of INCLUDE.

IBM Docs - Omit Statement

//SYSIN DD *
  OMIT COND=(start,length,format,OP,value)
/*
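
The sketch below drops two kinds of records with a single OMIT, joined by OR. The positions and values are hypothetical: an asterisk in column 1 is assumed to mark a comment record, and 'TST' is assumed to be a test region code in columns 11-13.

```jcl
//SYSIN    DD *
* Drop comment records and records for the test region
  OMIT COND=(1,1,CH,EQ,C'*',OR,11,3,CH,EQ,C'TST')
  OPTION COPY
/*
```

A step uses either an INCLUDE statement or an OMIT statement, not both, so pick whichever makes the condition shorter to write.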

OUTFIL

OUTFIL – Sends different output records to multiple files, or controls formatting and splitting of output.

IBM Docs - OUTFIL Statement

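//SYSIN DD *
* FILES=nn writes to DD name SORTOFnn; SPLIT alternates records
* between the listed outputs
  OUTFIL FILES=(01,02),SPLIT
* From OUTFIL input records 5-20, write those that pass INCLUDE
  OUTFIL FILES=01,INCLUDE=(...),STARTREC=5,ENDREC=20
/*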
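
One common use is routing records to different datasets by content. The sketch below uses the OUTFIL FNAMES parameter, which names the output DD directly instead of relying on a SORTOFnn suffix. The DD names, dataset names, space values, and region field are assumptions for illustration.

```jcl
//EAST     DD DSN=MY.EAST.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//OTHER    DD DSN=MY.OTHER.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
  OPTION COPY
* Records for region 'NYC' go to the EAST DD
  OUTFIL FNAMES=EAST,INCLUDE=(11,3,CH,EQ,C'NYC')
* SAVE catches every record that no other OUTFIL selected
  OUTFIL FNAMES=OTHER,SAVE
/*
```

Because every output is written from OUTFIL, this step needs no SORTOUT DD.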

SUM

SUM – Removes duplicates or totals up numeric fields for records with the same key values.

IBM Docs - SUM Statement

SUM works together with SORT or MERGE, not COPY. Records count as duplicates when their key fields are equal.

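//SYSIN DD *
* Total a numeric field across records with equal sort keys
  SUM FIELDS=(start,length,format)
* Keep one record per sort key (removes duplicates)
  SUM FIELDS=NONE
* XSUM (a Syncsort option) would write the removed duplicates to
* another file; not supported here
*  SUM FIELDS=NONE,XSUM
/*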
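
A deduplication step could be sketched as follows, assuming a hypothetical customer ID key in columns 1-10. OPTION EQUALS is added so that records with equal keys keep their input order, which makes the surviving record the first one read for each key.

```jcl
//SYSIN    DD *
* One output record per customer ID (columns 1-10)
  SORT FIELDS=(1,10,CH,A)
* Keep equal-keyed records in input order so the first one survives
  OPTION EQUALS
  SUM FIELDS=NONE
/*
```

Without EQUALS, which of the duplicate records is kept should be treated as unpredictable.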

INREC vs OUTREC

INREC/OUTREC – Reformats records before (INREC) or after (OUTREC) sorting, merging, or copying.

IBM Docs - INREC Statement

IBM Docs - OUTREC Statement

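//SYSIN DD *
* Alternative forms; use one OUTREC statement per step
* Copy columns 1-10, then put '**' at column 11
  OUTREC FIELDS=(1:1,10,11:C'**')
* Replace every 'OLD' with 'NEW'
  OUTREC FINDREP=(IN=C'OLD',OUT=C'NEW')
* Overwrite columns 5-8 and leave the rest unchanged
  OUTREC OVERLAY=(5:C'ABCD')
* Reformat only the records that meet a condition
  OUTREC IFTHEN=(WHEN=(conditions),OVERLAY=(...))
/*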

📝 Same syntax applies to INREC and OUTREC.
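
The practical difference is timing, which the sketch below tries to show. It assumes fixed-length records with a hypothetical customer ID in columns 1-10 and a zoned-decimal amount in columns 21-28. BUILD is used here as the equivalent of FIELDS.

```jcl
//SYSIN    DD *
* INREC runs before the sort: keep only ID (1-10) and amount (21-28),
* so each record shrinks to 18 bytes before it is sorted
  INREC BUILD=(1,10,21,8)
* SORT sees the reformatted record: amount is now in columns 11-18
  SORT FIELDS=(11,8,ZD,D)
* OUTREC runs after the sort: insert a blank between the two fields
  OUTREC BUILD=(1,10,X,11,8)
/*
```

Once INREC changes the layout, SORT, SUM, and OUTREC all refer to the new column positions, which is an easy detail to miss. For variable-length records the 4-byte record descriptor word would also need to be carried along, which this sketch does not cover.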