算法技巧-各种排序算法（三）：JDK中排序算法经典使用

站长

2024年04月12日 23:03 · 阅读数 97

JDK中主要由Arrays提供了对所有类型的排序。其中主要分为Primitive(8种基本类型)和Object两大类

基本类型：采用调优的快速排序
对象类型：采用改进的归并排序

本文源码基于JDK8

int数组类型排序

源码解读

sortint类型排序由两个重载方法，最终都指向DualPivotQuicksort.sort

public static void sort(int[] a) {
    DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
}
public static void sort(int[] a, int fromIndex, int toIndex) {
    rangeCheck(a.length, fromIndex, toIndex);
    DualPivotQuicksort.sort(a, fromIndex, toIndex - 1, null, 0, 0);
}

都是调用DualPivotQuicksort.sort

/**
 * Sorts the specified range of the array using the given
 * workspace array slice if possible for merging
 *
 * @param a the array to be sorted
 * @param left the index of the first element, inclusive, to be sorted
 * @param right the index of the last element, inclusive, to be sorted
 * @param work a workspace array (slice)
 * @param workBase origin of usable space in work array
 * @param workLen usable size of work array
 */
static void sort(int[] a, int left, int right,
                 int[] work, int workBase, int workLen)

该方法具体实现较长，有兴趣可阅读源码，下文贴出关键实现

首先如果是小长度数据，数据长度低于286，进入插入排序或者快排

// Use Quicksort on small arrays
if (right - left < QUICKSORT_THRESHOLD) {
    sort(a, left, right, true);
    return;
}

sort主要用于小长度数组排序，定义如下

/**
 * Sorts the specified range of the array by Dual-Pivot Quicksort.
 *
 * @param a the array to be sorted
 * @param left the index of the first element, inclusive, to be sorted
 * @param right the index of the last element, inclusive, to be sorted
 * @param leftmost indicates if this part is the leftmost in the range
 */
private static void sort(int[] a, int left, int right, boolean leftmost)

如果数据长度低于47，逻辑如下,关注点是如果leftmost 参数为 true，则使用传统的插入排序算法，否则使用优化后的插入排序算法。

// Use insertion sort on tiny arrays
if (length < INSERTION_SORT_THRESHOLD) {
    if (leftmost) {
        //插入排序
        ...
    } else {
        //改进插入排序
        //如果数组长度小于 `INSERTION_SORT_THRESHOLD`，且 `leftmost` 为 `false`，则使用优化后的插入排序算法。在这种情况下，会跳过最长的升序序列。大体思路如下
        //首先通过一个循环找到第一个非递增的元素，将其索引保存在变量 `left` 中。
        //然后，使用一种更优化的插入排序算法，称为"pair insertion sort"，来进行排序。
        //该算法采用了一种更快的方法，不需要在每次迭代时检查左边界。
        //循环中的变量 `k` 表示当前待插入元素的索引，`a1` 和 `a2` 分别表示当前待插入元素和下一个待插入元素。在每次迭代中，首先比较 `a1` 和 `a2` 的大小，如果 `a1` 小于 `a2`，则交换它们的值。然后，将 `a1` 插入到已排序部分，并将 `a2` 插入到合适的位置。最后，将数组中剩余的元素与最后一个元素进行比较，并将最后一个元素放置到合适的位置上。
        ...
    }
    return;
}

如果数组长度小于286且大于47，则使用改进的快速排序，整体思路如下（源码可自行去jdk查看）

首先，定义了两个指针 less 和 great，分别指向中心部分的第一个元素和右侧部分的前一个元素。

接下来，通过比较五个已排序的元素（a[e1]、a[e2]、a[e3]、a[e4]、a[e5]），判断它们是否互不相等。如果它们不相等，则执行以下操作：

选取第二个和第四个已排序元素作为主元（pivot1 和 pivot2）。
将待排序数组的第一个和最后一个元素移动到主元原来的位置上。
通过比较，将小于 pivot1 的元素移动到左侧部分，大于 pivot2 的元素移动到右侧部分，中间部分的元素保持不变。
划分完成后，将主元放回其最终位置，并排除主元进行递归排序。

如果中心部分的大小超过数组大小的 4/7，表示中心部分太大，将执行以下操作：

跳过与主元相等的元素，找到左侧部分的末尾位置 less 和右侧部分的起始位置 great。
通过比较，将等于 pivot1 的元素移动到左侧部分，等于 pivot2 的元素移动到右侧部分，中间部分的元素保持不变。
划分完成后，对中心部分进行递归排序。

最后，递归地对左侧部分和右侧部分进行排序，直到完成整个数组的排序。

这段代码的核心思想是通过选择主元并进行划分，将数组分成三个部分（小于 pivot1、介于 pivot1 和 pivot2、大于 pivot2），然后对左侧部分和右侧部分进行递归排序，以实现快速排序算法。

最后如果数组长度大于286，则使用改进的归并排序或者快速排序

/*
 * Index run[i] is the start of i-th run
 * (ascending or descending sequence).
 */
int[] run = new int[MAX_RUN_COUNT + 1];
int count = 0; run[0] = left;

// Check if the array is nearly sorted
for (int k = left; k < right; run[count] = k) {
    if (a[k] < a[k + 1]) { // ascending
        while (++k <= right && a[k - 1] <= a[k]);
    } else if (a[k] > a[k + 1]) { // descending
        while (++k <= right && a[k - 1] >= a[k]);
        for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
            int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
    } else { // equal
        for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {
            if (--m == 0) {
                sort(a, left, right, true);
                return;
            }
        }
    }

    /*
     * The array is not highly structured,
     * use Quicksort instead of merge sort.
     */
    if (++count == MAX_RUN_COUNT) {
        sort(a, left, right, true);
        return;
    }
}

// Check special cases
// Implementation note: variable "right" is increased by 1.
if (run[count] == right++) { // The last run contains one element
    run[++count] = right;
} else if (count == 1) { // The array is already sorted
    return;
}

// Determine alternation base for merge
byte odd = 0;
for (int n = 1; (n <<= 1) < count; odd ^= 1);

// Use or create temporary array b for merging
int[] b;                 // temp array; alternates with a
int ao, bo;              // array offsets from 'left'
int blen = right - left; // space needed for b
if (work == null || workLen < blen || workBase + blen > work.length) {
    work = new int[blen];
    workBase = 0;
}
if (odd == 0) {
    System.arraycopy(a, left, work, workBase, blen);
    b = a;
    bo = 0;
    a = work;
    ao = workBase - left;
} else {
    b = work;
    ao = 0;
    bo = workBase - left;
}

// Merging
for (int last; count > 1; count = last) {
    for (int k = (last = 0) + 2; k <= count; k += 2) {
        int hi = run[k], mi = run[k - 1];
        for (int i = run[k - 2], p = i, q = mi; i < hi; ++i) {
            if (q >= hi || p < mi && a[p + ao] <= a[q + ao]) {
                b[i + bo] = a[p++ + ao];
            } else {
                b[i + bo] = a[q++ + ao];
            }
        }
        run[++last] = hi;
    }
    if ((count & 1) != 0) {
        for (int i = right, lo = run[count - 1]; --i >= lo;
            b[i + bo] = a[i + ao]
        );
        run[++last] = right;
    }
    int[] t = a; a = b; b = t;
    int o = ao; ao = bo; bo = o;
}

代码定义了一个名为run的整型数组，用于存储每个升序或降序序列的起始索引。count变量用于记录已经找到的序列数量，初始值为0，而run[0]被设置为最左边元素的索引。

接下来，代码通过检查相邻元素的大小关系来判断数组是否接近有序。通过逐个比较相邻元素的值，如果元素值递增，则继续向右移动指针k，直到找到一个递减的元素，表示一个升序序列的结束。类似地，如果元素值递减，则继续向右移动指针k，直到找到一个递增的元素，表示一个降序序列的结束。如果两个相邻元素的值相等，则继续向右移动指针k，直到找到一个不等于之前元素的值，或者找到的相等序列的长度达到预设的最大长度（MAX_RUN_LENGTH）。如果找到的相等序列达到最大长度，则会使用快速排序算法对整个数组进行排序，并返回。

如果数组不是高度结构化的，即找到的序列数量达到了预设的最大数量（MAX_RUN_COUNT），则同样会使用快速排序算法对整个数组进行排序，并返回。

代码接下来处理一些特殊情况。首先，检查最后一个序列是否只包含一个元素，如果是，则将其作为一个新的序列添加到run数组中。接着，检查count变量的值，如果为1，则表示数组已经完全有序，直接返回。

然后，代码确定了归并排序合并过程的交替基准。通过对count进行位操作，判断是否为奇数，从而决定使用哪个数组（a或b）作为合并的目标数组。

根据之前确定的交替基准，代码创建了一个临时数组b，用于存储合并的结果。根据奇偶性，将a数组或b数组作为源数组，将偏移量ao和bo设置为0或workBase - left（workBase是用于存储临时数组的工作区域的起始索引）。

最后，代码执行归并排序的合并过程。通过循环遍历run数组，每次处理两个相邻的序列进行合并。对于每一对序列，根据各自的起始和结束索引，将元素从源数组（a或b）复制到目标数组（b或a）中。合并完成后，更新run数组中的索引，记录合并后的序列的结束索引。如果count为奇数，表示还有一个最后的序列没有进行合并，需要单独处理。最后，交换源数组和目标数组的引用，以及偏移量ao和bo的值，为下一轮合并做准备。

总的来说，这段代码实现了归并排序算法，通过将数组分割成多个有序的子序列，并逐步合并这些子序列，最终得到一个完全有序的数组。

算法分析

数组长度	所使用的排序算法
length < 47	改进插入排序
47 <= length < 286	快速排序
length >= 286 且数组基本有序	归并排序
length >= 286 且数组基本无序	快速排序

object数组排序

源码解读

public static void sort(Object[] a) {
    if (LegacyMergeSort.userRequested)
        legacyMergeSort(a);
    else
        ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
}

首先，代码检查LegacyMergeSort.userRequested变量的值。LegacyMergeSort是一个类，其中的userRequested是一个布尔型静态变量，用于表示用户是否请求使用传统的归并排序算法。

如果userRequested为true，则调用legacyMergeSort方法对数组进行排序。这意味着用户明确要求使用传统的归并排序算法来排序数组。

如果userRequested为false，则调用ComparableTimSort.sort方法对数组进行排序。ComparableTimSort是另一个类，其中的sort方法使用改进的归并排序算法（TimSort算法）来排序数组。这种算法结合了归并排序和插入排序的思想，能够高效地处理各种类型的数据，并且在大多数情况下表现良好。

legacyMergeSort使用的是传统的归并算法，代码较为简单

我们在看下ComparableTimSort.sort

/**
 * Sorts the given range, using the given workspace array slice
 * for temp storage when possible. This method is designed to be
 * invoked from public methods (in class Arrays) after performing
 * any necessary array bounds checks and expanding parameters into
 * the required forms.
 *
 * @param a the array to be sorted
 * @param lo the index of the first element, inclusive, to be sorted
 * @param hi the index of the last element, exclusive, to be sorted
 * @param work a workspace array (slice)
 * @param workBase origin of usable space in work array
 * @param workLen usable size of work array
 * @since 1.8
 */
static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen)

具体实现如下

assert a != null && lo >= 0 && lo <= hi && hi <= a.length;

int nRemaining  = hi - lo;
if (nRemaining < 2)
    return;  // Arrays of size 0 and 1 are always sorted

// If array is small, do a "mini-TimSort" with no merges
if (nRemaining < MIN_MERGE) {
    int initRunLen = countRunAndMakeAscending(a, lo, hi);
    binarySort(a, lo, hi, lo + initRunLen);
    return;
}

/**
 * March over the array once, left to right, finding natural runs,
 * extending short natural runs to minRun elements, and merging runs
 * to maintain stack invariant.
 */
ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);
int minRun = minRunLength(nRemaining);
do {
    // Identify next run
    int runLen = countRunAndMakeAscending(a, lo, hi);

    // If run is short, extend to min(minRun, nRemaining)
    if (runLen < minRun) {
        int force = nRemaining <= minRun ? nRemaining : minRun;
        binarySort(a, lo, lo + force, lo + runLen);
        runLen = force;
    }

    // Push run onto pending-run stack, and maybe merge
    ts.pushRun(lo, runLen);
    ts.mergeCollapse();

    // Advance to find next run
    lo += runLen;
    nRemaining -= runLen;
} while (nRemaining != 0);

// Merge all remaining runs to complete sort
assert lo == hi;
ts.mergeForceCollapse();
assert ts.stackSize == 1;

实现思路如下：首先，代码会检查输入参数的合法性，确保数组a不为null，lo和hi是有效的索引范围。

接下来，代码计算数组中待排序元素的数量nRemaining，如果nRemaining小于2（即数组大小为0或1），则直接返回，因为这种情况下数组已经是有序的。

如果nRemaining小于预设的最小合并长度MIN_MERGE（32），则进行一种“mini-TimSort”，即执行简化的排序过程。在这种情况下，代码会通过countRunAndMakeAscending方法确定数组中的自然有序序列（run），并使用binarySort方法对每个序列进行排序，然后返回。

对于大于等于MIN_MERGE的数组，代码创建了一个ComparableTimSort对象ts，该对象用于执行改进的归并排序算法（TimSort算法）。

接下来，代码进入一个循环，通过遍历数组，从左到右寻找自然有序序列（run），并进行合并操作以维护栈不变性。

在循环中，代码首先调用countRunAndMakeAscending方法确定下一个自然有序序列的长度runLen。

如果runLen小于minRun（最小运行长度），则会将该序列扩展到minRun的长度，以确保序列足够长。在这种情况下，代码会使用binarySort方法对序列进行排序。

然后，代码将当前的序列推送到待合并的序列栈中，并调用mergeCollapse方法来合并栈中的序列，以维护栈不变性。

接着，代码更新循环索引和剩余元素数量，继续寻找下一个序列。

循环继续，直到没有剩余元素需要排序。

最后，代码调用mergeForceCollapse方法将栈中剩余的序列进行强制合并，以完成整个排序过程。

在方法的末尾，代码进行了一些断言检查，确保排序过程正确结束。

算法分析

通过LegacyMergeSort.userRequested值判断，如果为true，则使用传统的归并算法；如果为false，则使用改进的归并排序算法

注意排序的对象需要实现Comparable接口，否则代码执行会报错

参考 1.JDK中的排序：Arrays.sort的源码实现 2.Java Arrays.sort源代码解析

转载自:https://juejin.cn/post/7278982293707472907