sort(1)                                                          sort(1)

NAME
     sort - sort, merge, or sequence check text files

SYNOPSIS
     /usr/bin/sort [ -cmu ] [ -o output ] [ -T directory ]
          [ -y [ kmem ]] [ -z recsz ] [ -dfiMnr ] [ -b ] [ -t char ]
          [ -k keydef ] [ +pos1 [ -pos2 ]] [ file...]

     /usr/xpg4/bin/sort [ -cmu ] [ -o output ] [ -T directory ]
          [ -y [ kmem ]] [ -z recsz ] [ -dfiMnr ] [ -b ] [ -t char ]
          [ -k keydef ] [ +pos1 [ -pos2 ]] [ file...]

AVAILABILITY
     /usr/bin/sort
               SUNWesu

     /usr/xpg4/bin/sort
               SUNWxcu4

DESCRIPTION
     The sort command sorts lines of all the named files together and
     writes the result on the standard output.

     Comparisons are based on one or more sort keys extracted from each
     line of input. By default, there is one sort key, the entire input
     line. Lines are ordered according to the collating sequence of the
     current locale.

OPTIONS
     The following options alter the default behavior:

  /usr/bin/sort
     -c        Check that the single input file is ordered as specified
               by the arguments and the collating sequence of the
               current locale. The exit code is set and no output is
               produced unless the file is out of sort.

  /usr/xpg4/bin/sort
     -c        Same as /usr/bin/sort except no output is produced under
               any circumstances.

     -m        Merge only. The input files are assumed to be already
               sorted.

     -u        Unique: suppress all but one in each set of lines having
               equal keys. If used with the -c option, check that there
               are no lines with duplicate keys in addition to checking
               that the input file is sorted.

     -o output Specify the name of an output file to be used instead of
               the standard output. This file can be the same as one of
               the input files.
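As a rough illustration of the -c, -m, -u and -o options described above, the following sketch sorts two scratch files in place, checks one of them, and merges the results. The file names and contents are hypothetical, and LC_ALL=C is set only to pin the collating sequence so the order is predictable.

```shell
# Work in a scratch directory; the file names below are hypothetical.
tmpdir=$(mktemp -d)
printf 'pear\napple\napple\n' > "$tmpdir/a.txt"
printf 'fig\nbanana\n' > "$tmpdir/b.txt"

# -u keeps one line from each set of equal lines;
# -o names the output file, which here is also the input file.
LC_ALL=C sort -u -o "$tmpdir/a.txt" "$tmpdir/a.txt"
LC_ALL=C sort -o "$tmpdir/b.txt" "$tmpdir/b.txt"

# -c only checks the ordering; the exit status carries the answer.
if LC_ALL=C sort -c "$tmpdir/a.txt"; then
    checked=sorted
else
    checked=unsorted
fi

# -m merges inputs that are assumed to be sorted already.
merged=$(LC_ALL=C sort -m "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$checked" "$merged"
rm -r "$tmpdir"
```

Sorting each file in place first matters because -m does not sort its inputs.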

     -T directory
               The directory argument is the name of a directory in
               which to place temporary files.

     -y kmem   The amount of main memory initially used by sort. If
               this option is omitted, sort begins using a system
               default memory size, and continues to use more space as
               needed. If kmem is present, sort will start using that
               number of Kbytes of memory, unless the administrative
               minimum or maximum is exceeded, in which case the
               corresponding extremum will be used. Thus, -y 0 is
               guaranteed to start with minimum memory. -y with no kmem
               argument starts with maximum memory.

     -z recsz  (obsolete). This option was used to prevent abnormal
               termination when lines longer than the system-dependent
               default buffer size are encountered. Because sort
               automatically allocates buffers large enough to hold the
               longest line, this option has no effect.

  Ordering Options
     The following options override the default ordering rules. When
     ordering options appear independent of any key field
     specifications, the requested field ordering rules are applied
     globally to all sort keys. When attached to a specific key (see
     Sort Key Options), the specified ordering options override all
     global ordering options for that key. In the obsolescent forms, if
     one or more of these options follows a +pos1 option, it will affect
     only the key field specified by that preceding option.

     -d        ``Dictionary'' order: only letters, digits, and blanks
               (spaces and tabs) are significant in comparisons.

     -f        Fold lower-case letters into upper case.

     -i        Ignore non-printable characters.

     -M        Compare as months. The first three non-blank characters
               of the field are folded to upper case and compared. For
               example, in English the sorting order is "JAN" < "FEB"
               < ... < "DEC". Invalid fields compare low to "JAN". The
               -M option implies the -b option (see below).

     -n        Restrict the sort key to an initial numeric string,
               consisting of optional blank characters, optional minus
               sign, and zero or more digits with an optional radix
               character and thousands separators (as defined in the
               current locale), which will be sorted by arithmetic
               value. An empty digit string is treated as zero. Leading
               zeros and signs on zeros do not affect ordering.

     -r        Reverse the sense of comparisons.

  Field Separator Options
     The treatment of field separators can be altered using the
     following options:

     -b        Ignore leading blank characters when determining the
               starting and ending positions of a restricted sort key.
               If the -b option is specified before the first sort key
               option, it is applied to all sort key options.
               Otherwise, the -b option can be attached independently
               to each -k field_start, field_end, or +pos1 or -pos2
               option-argument (see below).

     -t char   Use char as the field separator character. char is not
               considered to be part of a field (although it can be
               included in a sort key). Each occurrence of char is
               significant (for example, <char><char> delimits an empty
               field). If -t is not specified, blank characters are
               used as default field separators; each maximal non-empty
               sequence of blank characters that follows a non-blank
               character is a field separator.

  Sort Key Options
     Sort keys can be specified using the options:

     -k keydef The keydef argument is a restricted sort key field
               definition. The format of this definition is:

                    -k field_start [ type ] [ ,field_end [ type ] ]

               where:

               field_start and field_end
                    define a key field restricted to a portion of the
                    line.

               type is a modifier from the list of characters bdfiMnr.
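The keydef format above can be sketched with a small colon-separated data set. The records and variable names are hypothetical, and LC_ALL=C is set only to make the text comparison predictable. The same key, field 3, is compared first as text, then with the n modifier, then with n and r together.

```shell
# Hypothetical records in the form name:department:score
records='ann:ops:9
bob:dev:10
cy:dev:7'

# Field 3 compared as text: "10" collates before "7" and "9".
as_text=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 3,3)

# Field 3 with the n modifier: compared by arithmetic value.
as_number=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 3,3n)

# n and r together: arithmetic comparison, reversed.
descending=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 3,3nr)

printf '%s\n' "$as_text" "--" "$as_number" "--" "$descending"
```

Writing the key as 3,3 rather than 3 keeps the key confined to the third field, which matters once a record has more fields after it.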

               The b modifier behaves like the -b option, but applies
               only to the field_start or field_end to which it is
               attached and characters within a field are counted from
               the first non-blank character in the field. (This
               applies separately to first_character and
               last_character.) The other modifiers behave like the
               corresponding options, but apply only to the key field
               to which they are attached. They have this effect if
               specified with field_start, field_end or both. If any
               modifier is attached to a field_start or to a field_end,
               no option applies to either.

               When there are multiple key fields, later keys are
               compared only after all earlier keys compare equal.
               Except when the -u option is specified, lines that
               otherwise compare equal are ordered as if none of the
               options -d, -f, -i, -n or -k were present (but with -r
               still in effect, if it was specified) and with all bytes
               in the lines significant to the comparison.

               The notation:

                    -k field_start[type][,field_end[type]]

               defines a key field that begins at field_start and ends
               at field_end inclusive, unless field_start falls beyond
               the end of the line or after field_end, in which case
               the key field is empty. A missing field_end means the
               last character of the line.

               A field comprises a maximal sequence of non-separating
               characters and, in the absence of option -t, any
               preceding field separator.

               The field_start portion of the keydef option-argument
               has the form:

                    field_number[.first_character]

               Fields and characters within fields are numbered
               starting with 1. field_number and first_character,
               interpreted as positive decimal integers, specify the
               first character to be used as part of a sort key.
               If .first_character is omitted, it refers to the first
               character of the field.

               The field_end portion of the keydef option-argument has
               the form:

                    field_number[.last_character]

               The field_number is as described above for field_start.
               last_character, interpreted as a non-negative decimal
               integer, specifies the last character to be used as part
               of the sort key. If last_character evaluates to zero or
               .last_character is omitted, it refers to the last
               character of the field specified by field_number.

               If the -b option or b type modifier is in effect,
               characters within a field are counted from the first
               non-blank character in the field. (This applies
               separately to first_character and last_character.)

     [+pos1[-pos2]]
               (obsolete). Provide functionality equivalent to the -k
               keydef option.

               pos1 and pos2 each have the form m.n optionally followed
               by one or more of the flags bdfiMnr. A starting position
               specified by +m.n is interpreted to mean the n+1st
               character in the m+1st field. A missing .n means .0,
               indicating the first character of the m+1st field. If
               the b flag is in effect n is counted from the first
               non-blank in the m+1st field; +m.0b refers to the first
               non-blank character in the m+1st field.

               A last position specified by -m.n is interpreted to mean
               the nth character (including separators) after the last
               character of the mth field. A missing .n means .0,
               indicating the last character of the mth field. If the b
               flag is in effect n is counted from the last leading
               blank in the m+1st field; -m.1b refers to the first
               non-blank in the m+1st field.
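To make the field_number.first_character and field_number.last_character notation of the -k option concrete, here is a minimal sketch using an explicit -t separator, so that no separator character is counted as part of a field. The records and variable names are hypothetical, and LC_ALL=C is set only to pin the collating sequence.

```shell
# Hypothetical records; field 2 holds a three-letter code.
codes='x:abc:1
y:aab:2
z:aca:3'

# Key is characters 2 through 3 of field 2: "bc", "ab", "ca".
by_pair=$(printf '%s\n' "$codes" | LC_ALL=C sort -t : -k 2.2,2.3)

# Key is only character 3 of field 2: "c", "b", "a".
by_third=$(printf '%s\n' "$codes" | LC_ALL=C sort -t : -k 2.3,2.3)

printf '%s\n' "$by_pair" "--" "$by_third"
```

Without -t, a field also includes its preceding blanks, so the same character numbers would select different characters unless the b modifier is attached.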

               The fully specified +pos1 -pos2 form with type modifiers
               T and U:

                    +w.xT -y.zU

               is equivalent to:

                    undefined             (z==0 & U contains b & -t is present)
                    -k w+1.x+1T,y.0U      (z==0 otherwise)
                    -k w+1.x+1T,y+1.zU    (z > 0)

               Implementations support at least nine occurrences of the
               sort keys (the -k option and obsolescent +pos1 and
               -pos2) which are significant in command line order. If
               no sort key is specified, a default sort key of the
               entire line is used.

OPERANDS
     The following operand is supported:

     file      A path name of a file to be sorted, merged or checked.
               If no file operands are specified, or if a file operand
               is -, the standard input will be used.

EXAMPLES
     In the following examples, non-obsolescent and obsolescent ways of
     specifying sort keys are given as an aid to understanding the
     relationship between the two forms.

     Either of the following commands sorts the contents of infile with
     the second field as the sort key:

          example% sort -k 2,2 infile
          example% sort +1 -2 infile

     Either of the following commands sorts, in reverse order, the
     contents of infile1 and infile2, placing the output in outfile and
     using the second character of the second field as the sort key
     (assuming that the first character of the second field is the
     field separator):

          example% sort -r -o outfile -k 2.2,2.2 infile1 infile2
          example% sort -r -o outfile +1.1 -1.2 infile1 infile2

     Either of the following commands sorts the contents of infile1 and
     infile2 using the second non-blank character of the second field
     as the sort key:

          example% sort -k 2.2b,2.2b infile1 infile2
          example% sort +1.1b -1.2b infile1 infile2

     Either of the following commands prints the
     passwd(4) file (user database) sorted by the numeric user ID (the
     third colon-separated field):

          example% sort -t : -k 3,3n /etc/passwd
          example% sort -t : +2 -3n /etc/passwd

     Either of the following commands prints the lines of the already
     sorted file infile, suppressing all but one occurrence of lines
     having the same third field:

          example% sort -um -k 3.1,3.0 infile
          example% sort -um +2.0 -3.0 infile

ENVIRONMENT
     See environ(5) for descriptions of the following environment
     variables that affect the execution of sort: LC_COLLATE,
     LC_MESSAGES, and NLSPATH.

     LC_CTYPE  Determine the locale for the behavior of character
               classification for the -b, -d, -f, -i and -n options.

     LC_NUMERIC
               Determine the locale for the definition of the radix
               character and thousands separator for the -n option.

EXIT STATUS
     The following exit values are returned:

     0         All input files were output successfully, or -c was
               specified and the input file was correctly sorted.

     1         Under the -c option, the file was not ordered as
               specified, or if the -c and -u options were both
               specified, two input lines were found with equal keys.

     >1        An error occurred.

FILES
     /var/tmp/stm???
               temporary files

SEE ALSO
     comm(1), join(1), uniq(1), passwd(4), environ(5)

DIAGNOSTICS
     Comments and exits with non-zero status for various trouble
     conditions (for example, when input lines are too long), and for
     disorders discovered under the -c option.

NOTES
     When the last line of an input file is missing a new-line
     character, sort appends one, prints a warning message, and
     continues.

     sort does not guarantee preservation of relative line ordering on
     equal keys.

                              18 Sep 1995