当前位置：首页 > news >正文

【中等】出现次数的TOPK问题－Java：原问题

news 2026/4/24 4:46:17

分享一个大牛的人工智能教程。零基础！通俗易懂！风趣幽默！希望你也加入到人工智能的队伍中来！请轻击人工智能教程大家好！欢迎来到我的网站！人工智能被认为是一种拯救世界、终结世界的技术。毋庸置疑，人工智能时代就要来临了，科… 继续阅读前言https://www.captainai.net/troubleshooter

package live.every.day.ProgrammingDesign.CodingInterviewGuide.Other; import java.util.HashMap; import java.util.Map; /** * 出现次数的TOPK问题 * * 【题目】 * 给定String类型的数组strArr，再给定整数k，请严格按照排名顺序打印出现次数前k名的字符串。 * * 【举例】 * strArr=["1","2","3","4"]，k=2 * No.1: 1, times: 1 * No.2: 2, times: 1 * 这种情况下，所有的字符串都出现一样多，随便打印任何两个字符串都可以。 * * strArr=["1","1","2","3"]，k=2 * 输出： * No.1: 1, times: 2 * No.2: 2, times: 1 * 或者输出： * No.1: 1, times: 2 * No.2: 3, times: 1 * * 【要求】 * 如果strArr长度为N，时间复杂度请达到O(Nlogk)。 * * 【进阶题目】 * 设计并实现TopKRecord结构，可以不断地向其中加入字符串，并且可以根据字符串出现的情况随时打印加入次数最多前k个字符串， * 具体为： * 1.k在TopKRecord实例生成时指定，并且不再变化(k是构造函数的参数)。 * 2.含有add(String str)方法，即向TopKRecord中加入字符串。 * 3.含有printTopK()方法，即打印加入次数最多的前k个字符串，打印有哪些字符串和对应的次数即可，不要求严格按排名顺序打印。 * * 【举例】 * TopKRecord record = new TopKRecord(2); // 打印Top2的结构 * record.add("A"); * record.printTopK(); * 此时打印： * TOP: * Str: A Times: 1 * * record.add("B"): * record.add("B"); * record.printTopK(); * 此时打印： * TOP: * Str: A Times: 1 * Str: B Times: 2 * 或者打印 * TOP: * Str: B Times: 2 * Str: A Times: 1 * * record.add("C"); * record.add("C"); * record.printTopK(); * 此时打印： * TOP: * Str: B Times: 2 * Str: C Times: 2 * 或者打印 * TOP: * Str: C Times: 2 * Str: B Times: 2 * * 【要求】 * 1.在任何时刻，add方法的时问复杂度不超过O(logk)。 * 2.在任何时刻，printTopK方法的时间复杂度不超过O(k)。 * * 【难度】 * 原问题：中等 * 进阶问题：困难 * * 【解答】 * 原问题。首先遍历strArr并统计字符串的词频，例如，strArr=["a","b","b"，"a"，"c"]，遍历后可以生成每种字符串及其相关 * 词频的哈希表如下。 * 用哈希表的每条信息可以生成Node类的实例，Node类如下。 * 哈希表中有多少信息，就建立多少Node类的实例，并且依次放入堆中，具体过程为： * 1.建立一个大小为k的小根堆，这个堆放入的是Node类的实例。 * 2.遍历哈希表的每条记录，假设一条记录为(s,t)，s表示一种字符串，s的词频为t，则生成Node类的实例，记为(str,times)。 * 1）如果小根堆没有满，就直按将(str,times)加入堆，然后进行建堆调整(heapInsert调整)，堆中Node类实例之间都以词频 * (times)来进行比较，词频越小，位置越往上。 * 2）如果小根堆已满，说明此时小根堆已经选出k个最高词频的字符串，那么整个小根堆的堆顶自然代表已经选出的k个最高词频的字符 * 串中，词频最低的那个。堆顶的元素记为(headStr,minTimes)。如果minTimes<times，说明字符串str有资格进入当前k个最高 * 词频字符串的范围。而headStr应该被移出这个范围，所以把当前的堆顶(headStr,minTimes)替换成(str,times)，然后从堆顶 * 的位置进行堆的调整(heapify)。如果minTimes>=times，说明字符串str没有资格进入当前k个最高词频字符串的范围，因为str * 的词频还不如目前选出的k个最高词频字符串中词频最少的那个，所以什么也不做。 * 3.遍历完strArr之后，小根堆里就是所有字符串中k个最高词频的字符串，但要求严格按排名打印，所以还需要根据词频从大到小完成 * k个元素间的排序。 * 遍历strArr建立哈希表的过程为O(N)。哈希表中记录的条数最多为N条，每一条记录进堆时，堆的调整时间复杂度为O(logk)，所以 * 根据记录更新小根堆的过程为O(Nlogk)。k条记录排序的时间复杂度为O(klogk)。所以总的时间复杂度为 * O(N)+O(Nlogk)+O(klogk)，即O(Nlogk)。 * 具体过程请参看如下代码中的printTopKAndRank方法。 * * @author Created by LiveEveryDay */ public class AppearTimesTopKProblem1 { public static class Node { public String str; public int times; public Node(String s, int t) { str = s; times = t; } } public static void printTopKAndRank(String[] arr, int topK) { if (arr == null || topK < 1) { return; } HashMap<String, Integer> map = new HashMap<>(); // 生成哈希表（字符串词频） for (int i = 0; i != arr.length; i++) { String cur = arr[i]; if (!map.containsKey(cur)) { map.put(cur, 1); } else { map.put(cur, map.get(cur) + 1); } } Node[] heap = new Node[topK]; int index = 0; // 遍历哈希表，决定每条信息是否进堆 for (Map.Entry<String, Integer> entry : map.entrySet()) { String str = entry.getKey(); int times = entry.getValue(); Node node = new Node(str, times); if (index != topK) { heap[index] = node; heapInsert(heap, index++); } else { if (heap[0].times < node.times) { heap[0] = node; heapify(heap, 0, topK); } } } // 把小根堆的所有元素按词频从大到小排序 for (int i = index - 1; i != 0; i--) { swap(heap, 0, i); heapify(heap, 0, i); } // 严格按照排名打印k条记录 for (int i = 0; i != heap.length; i++) { if (heap[i] == null) { break; } else { System.out.print("No." + (i + 1) + ": "); System.out.print(heap[i].str + ", times: "); System.out.println(heap[i].times); } } } public static void heapInsert(Node[] heap, int index) { while (index != 0) { int parent = (index - 1) >> 1; if (heap[index].times < heap[parent].times) { swap(heap, parent, index); index = parent; } else { break; } } } public static void heapify(Node[] heap, int index, int heapSize) { int left = index * 2 + 1; int right = index * 2 + 2; int smallest = index; while (left < heapSize) { if (heap[left].times < heap[index].times) { smallest = left; } if (right < heapSize && heap[right].times < heap[smallest].times) { smallest = right; } if (smallest != index) { swap(heap, smallest, index); } else { break; } index = smallest; left = index * 2 + 1; right = index * 2 + 2; } } public static void swap(Node[] heap, int index1, int index2) { Node tmp = heap[index1]; heap[index1] = heap[index2]; heap[index2] = tmp; } public static void main(String[] args) { String[] arr = {"C", "D", "D", "E", "E"}; int topK = 2; printTopKAndRank(arr, topK); } } // ------ Output ------ /* No.1: D, times: 2 No.2: E, times: 2 */

查看全文

http://www.jsqmd.com/news/690821/