当前位置: 首页 > news >正文

CS149 assignment2

目录
  • assignment2
    • 无依赖的任务系统
      • 每次创建线程
      • 自旋线程池
      • 休眠线程池
    • 有依赖的任务系统

环境

OS:Windows11 wsl2 6.6.87.2-microsoft-standard-WSL2 Ubuntu 24.04.3 LTS

CPU: Intel Core i7 13620H 8 cores, 10 logic processors, AVX2

GPU:NVIDIA GeForce RTX 4060 Laptop

assignment2

这一部分要求你补全一个类似于ISPC的任务系统,其余部分已经完成,你需要完成负载分配部分

你需要完成无依赖和有依赖的任务系统,并且要用每次创建线程,使用线程池且线程自旋,使用线程池且线程休眠三种方式实现

无依赖的任务系统

每次创建线程

这是trivial的,每次创建线程执行任务即可,为了平均负载,每当有空闲进程时就取出下一个任务来给该线程执行

注意当前执行到了哪一个进程这个变量会被多个线程使用,需要声明为原子变量

//tasksys.h
class TaskSystemParallelSpawn: public ITaskSystem {public:TaskSystemParallelSpawn(int num_threads);~TaskSystemParallelSpawn() override;const char* name();void run(IRunnable* runnable, int num_total_tasks);TaskID runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps);void sync();private:int Num_Threads;std::atomic <int> task_ptr;std::thread *thread_ptr;
};//tasksys.cpp
const char* TaskSystemParallelSpawn::name() {return "Parallel + Always Spawn";
}TaskSystemParallelSpawn::TaskSystemParallelSpawn(int num_threads) : ITaskSystem(num_threads), Num_Threads(num_threads), thread_ptr(new std::thread[num_threads]),task_ptr(0) {//// TODO: CS149 student implementations may decide to perform setup// operations (such as thread pool construction) here.// Implementations are free to add new class member variables// (requiring changes to tasksys.h).//
}TaskSystemParallelSpawn::~TaskSystemParallelSpawn() {delete[] thread_ptr;
}void TaskSystemParallelSpawn::run(IRunnable* runnable, int num_total_tasks){//// TODO: CS149 students will modify the implementation of this// method in Part A.  The implementation provided below runs all// tasks sequentially on the calling thread.//task_ptr = 0;auto work = [&]() {while(1) {int task_id = task_ptr.fetch_add(1);if (task_id >= num_total_tasks) break;runnable->runTask(task_id, num_total_tasks);}};for (int i = 0; i < Num_Threads; i++) {thread_ptr[i] = std::thread(work);}for (int i = 0; i < Num_Threads; i++) {thread_ptr[i].join();}
}TaskID TaskSystemParallelSpawn::runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps) {// You do not need to implement this method.return 0;
}void TaskSystemParallelSpawn::sync() {// You do not need to implement this method.return;
}

自旋线程池

每次都创建线程会产生额外的开销,为了减少开销,可以在任务系统被创建的时候就创建线程,在需要的时候执行任务,否则自旋,当任务系统调用析构函数的时候,结束所有线程

//tasksys.h
class TaskSystemParallelThreadPoolSpinning: public ITaskSystem {public:TaskSystemParallelThreadPoolSpinning(int num_threads);~TaskSystemParallelThreadPoolSpinning();const char* name();void run(IRunnable* runnable, int num_total_tasks);TaskID runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps);void sync();private:int Num_Threads;std::thread *thread_ptr;IRunnable *task;int task_num;std::atomic <bool> end;std::atomic<int> task_ptr, task_done;std::mutex mtx;
};//tasksys.cpp
const char* TaskSystemParallelThreadPoolSpinning::name() {return "Parallel + Thread Pool + Spin";
}TaskSystemParallelThreadPoolSpinning::TaskSystemParallelThreadPoolSpinning(int num_threads): ITaskSystem(num_threads),Num_Threads(num_threads),thread_ptr(new std::thread[num_threads]),task(nullptr), task_num(0), task_ptr(0),end(false) {//// TODO: CS149 student implementations may decide to perform setup// operations (such as thread pool construction) here.// Implementations are free to add new class member variables// (requiring changes to tasksys.h).//auto work = [&]() {while (1) {if (end) break;int task_id = -1; {std::unique_lock <std::mutex> lock(mtx);if (task_ptr < task_num) {task_id = task_ptr;task_ptr++;}}if (task_id != -1) {task->runTask(task_id, task_num);task_done.fetch_add(1);}}};for (int i = 0; i < num_threads; i++) {thread_ptr[i] = std::thread(work);}
}TaskSystemParallelThreadPoolSpinning::~TaskSystemParallelThreadPoolSpinning() {end = true;for (int i = 0; i < Num_Threads; i++) {thread_ptr[i].join();}delete[] thread_ptr;
}void TaskSystemParallelThreadPoolSpinning::run(IRunnable* runnable, int num_total_tasks) {//// TODO: CS149 students will modify the implementation of this// method in Part A.  The implementation provided below runs all// tasks sequentially on the calling thread.//mtx.lock();task = runnable;task_num = num_total_tasks;task_ptr = 0;task_done = 0;mtx.unlock();while (task_done < task_num) {}
}TaskID TaskSystemParallelThreadPoolSpinning::runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps) {// You do not need to implement this method.return 0;
}void TaskSystemParallelThreadPoolSpinning::sync() {// You do not need to implement this method.return;
}

休眠线程池

当一个线程在没有任务自旋的时候,仍然会占用CPU,考虑在没有剩余任务的时候让线程自旋

C++提供了条件变量condition_variable来进行控制

对于一个条件变量,可以通过其成员函数wait(std::unique_lock, pred) 使得拥有指定的互斥锁的线程休眠,知道接到通知并且predtrue(如果不设置pred,可能出现丢失唤醒和虚假唤醒);通过成员函数notify_all/one 来通知该条件变量的所有/随机一个线程

//tasksys.h
class TaskSystemParallelThreadPoolSleeping: public ITaskSystem {public:TaskSystemParallelThreadPoolSleeping(int num_threads);~TaskSystemParallelThreadPoolSleeping();const char* name();void run(IRunnable* runnable, int num_total_tasks);TaskID runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps);void sync();private:int Num_Threads;std::thread *thread_ptr;IRunnable *task;int task_num;std::atomic <bool> end;std::atomic<int> task_ptr, task_done;std::mutex mtx;std::condition_variable cond;
};//tasksys.cpp
const char* TaskSystemParallelThreadPoolSleeping::name() {return "Parallel + Thread Pool + Sleep";
}TaskSystemParallelThreadPoolSleeping::TaskSystemParallelThreadPoolSleeping(int num_threads): ITaskSystem(num_threads),Num_Threads(num_threads),thread_ptr(new std::thread[num_threads]),task(nullptr),end(false), task_num(0),task_done(0),task_ptr(0) {//// TODO: CS149 student implementations may decide to perform setup// operations (such as thread pool construction) here.// Implementations are free to add new class member variables// (requiring changes to tasksys.h).//auto work = [&]() {while (1) {int task_id = -1; {std::unique_lock <std::mutex> lock(mtx);cond.wait(lock, [&]() {return (task_ptr < task_num) || end;});if (end) break;task_id = task_ptr;task_ptr++;             }if (task_id != -1) {task->runTask(task_id, task_num);task_done.fetch_add(1);}}};for (int i = 0; i < Num_Threads; i++) {thread_ptr[i] = std::thread(work);}
}TaskSystemParallelThreadPoolSleeping::~TaskSystemParallelThreadPoolSleeping() {//// TODO: CS149 student implementations may decide to perform cleanup// operations (such as thread pool shutdown construction) here.// Implementations are free to add new class member variables// (requiring changes to tasksys.h).//end = 1;cond.notify_all();for (int i = 0; i < Num_Threads; i++) {thread_ptr[i].join();}
}void TaskSystemParallelThreadPoolSleeping::run(IRunnable* runnable, int num_total_tasks) {//// TODO: CS149 students will modify the implementation of this// method in Parts A and B.  The implementation provided below runs all// tasks sequentially on the calling thread.//mtx.lock();task = runnable;task_num = num_total_tasks;task_done = 0;task_ptr = 0;mtx.unlock();cond.notify_all();while (task_done < task_num) {}}TaskID TaskSystemParallelThreadPoolSleeping::runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps) {//// TODO: CS149 students will implement this method in Part B.//return 0;
}void TaskSystemParallelThreadPoolSleeping::sync() {//// TODO: CS149 students will modify the implementation of this method in Part B.//return;
}

测试结果如下

katyusha@Katyusha-PC:~/lesson/CS149/asst2/part_a$ python3 ../tests/run_test_harness.py
runtasks_ref
Linux x86_64
================================================================================
Running task system grading harness... (11 total tests)- Detected CPU with 16 execution contexts- Task system configured to use at most 16 threads
================================================================================
================================================================================
Executing test: super_super_light...
Reference binary: ./runtasks_ref_linux
Results for: super_super_lightSTUDENT   REFERENCE   PERF?
[Serial]                                3.047     3.205       0.95  (OK)
[Parallel + Always Spawn]               187.19    185.967     1.01  (OK)
[Parallel + Thread Pool + Spin]         17.331    24.842      0.70  (OK)
[Parallel + Thread Pool + Sleep]        49.395    50.039      0.99  (OK)
================================================================================
Executing test: super_light...
Reference binary: ./runtasks_ref_linux
Results for: super_lightSTUDENT   REFERENCE   PERF?
[Serial]                                40.591    39.736      1.02  (OK)
[Parallel + Always Spawn]               192.036   192.494     1.00  (OK)
[Parallel + Thread Pool + Spin]         16.801    28.911      0.58  (OK)
[Parallel + Thread Pool + Sleep]        50.339    49.974      1.01  (OK)
================================================================================
Executing test: ping_pong_equal...
Reference binary: ./runtasks_ref_linux
Results for: ping_pong_equalSTUDENT   REFERENCE   PERF?
[Serial]                                647.781   650.243     1.00  (OK)
[Parallel + Always Spawn]               226.69    232.931     0.97  (OK)
[Parallel + Thread Pool + Spin]         127.354   151.776     0.84  (OK)
[Parallel + Thread Pool + Sleep]        141.546   151.333     0.94  (OK)
================================================================================
Executing test: ping_pong_unequal...
Reference binary: ./runtasks_ref_linux
Results for: ping_pong_unequalSTUDENT   REFERENCE   PERF?
[Serial]                                1190.3    1201.024    0.99  (OK)
[Parallel + Always Spawn]               296.642   301.453     0.98  (OK)
[Parallel + Thread Pool + Spin]         191.766   201.452     0.95  (OK)
[Parallel + Thread Pool + Sleep]        205.871   201.991     1.02  (OK)
================================================================================
Executing test: recursive_fibonacci...
Reference binary: ./runtasks_ref_linux
Results for: recursive_fibonacciSTUDENT   REFERENCE   PERF?
[Serial]                                563.32    1044.844    0.54  (OK)
[Parallel + Always Spawn]               96.153    147.336     0.65  (OK)
[Parallel + Thread Pool + Spin]         95.011    161.128     0.59  (OK)
[Parallel + Thread Pool + Sleep]        95.246    136.351     0.70  (OK)
================================================================================
Executing test: math_operations_in_tight_for_loop...
Reference binary: ./runtasks_ref_linux
Results for: math_operations_in_tight_for_loopSTUDENT   REFERENCE   PERF?
[Serial]                                321.704   364.809     0.88  (OK)
[Parallel + Always Spawn]               902.166   959.155     0.94  (OK)
[Parallel + Thread Pool + Spin]         109.294   148.513     0.74  (OK)
[Parallel + Thread Pool + Sleep]        233.708   260.56      0.90  (OK)
================================================================================
Executing test: math_operations_in_tight_for_loop_fewer_tasks...
Reference binary: ./runtasks_ref_linux
Results for: math_operations_in_tight_for_loop_fewer_tasksSTUDENT   REFERENCE   PERF?
[Serial]                                355.836   368.551     0.97  (OK)
[Parallel + Always Spawn]               956.475   954.91      1.00  (OK)
[Parallel + Thread Pool + Spin]         152.932   179.352     0.85  (OK)
[Parallel + Thread Pool + Sleep]        258.914   261.421     0.99  (OK)
================================================================================
Executing test: math_operations_in_tight_for_loop_fan_in...
Reference binary: ./runtasks_ref_linux
Results for: math_operations_in_tight_for_loop_fan_inSTUDENT   REFERENCE   PERF?
[Serial]                                184.691   187.97      0.98  (OK)
[Parallel + Always Spawn]               129.168   125.984     1.03  (OK)
[Parallel + Thread Pool + Spin]         39.776    45.638      0.87  (OK)
[Parallel + Thread Pool + Sleep]        51.235    53.255      0.96  (OK)
================================================================================
Executing test: math_operations_in_tight_for_loop_reduction_tree...
Reference binary: ./runtasks_ref_linux
Results for: math_operations_in_tight_for_loop_reduction_treeSTUDENT   REFERENCE   PERF?
[Serial]                                185.884   190.249     0.98  (OK)
[Parallel + Always Spawn]               57.856    58.765      0.98  (OK)
[Parallel + Thread Pool + Spin]         38.416    38.12       1.01  (OK)
[Parallel + Thread Pool + Sleep]        41.4      38.552      1.07  (OK)
================================================================================
Executing test: spin_between_run_calls...
Reference binary: ./runtasks_ref_linux
Results for: spin_between_run_callsSTUDENT   REFERENCE   PERF?
[Serial]                                219.686   373.554     0.59  (OK)
[Parallel + Always Spawn]               112.957   191.493     0.59  (OK)
[Parallel + Thread Pool + Spin]         192.487   238.134     0.81  (OK)
[Parallel + Thread Pool + Sleep]        122.142   191.913     0.64  (OK)
================================================================================
Executing test: mandelbrot_chunked...
Reference binary: ./runtasks_ref_linux
Results for: mandelbrot_chunkedSTUDENT   REFERENCE   PERF?
[Serial]                                273.413   274.303     1.00  (OK)
[Parallel + Always Spawn]               30.291    28.085      1.08  (OK)
[Parallel + Thread Pool + Spin]         25.06     26.769      0.94  (OK)
[Parallel + Thread Pool + Sleep]        26.543    25.636      1.04  (OK)
================================================================================
Overall performance results
[Serial]                                : All passed Perf
[Parallel + Always Spawn]               : All passed Perf
[Parallel + Thread Pool + Spin]         : All passed Perf
[Parallel + Thread Pool + Sleep]        : All passed Perf

有依赖的任务系统

写成屎山了,调红温了,不想改了先放在这里

//tasksys.h
class TaskSystemParallelThreadPoolSleeping: public ITaskSystem {public:TaskSystemParallelThreadPoolSleeping(int num_threads);~TaskSystemParallelThreadPoolSleeping();const char* name();void run(IRunnable* runnable, int num_total_tasks);TaskID runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps);void sync();private:std::thread *thread_ptr;IRunnable *task;int Num_threads;std::atomic<TaskID> task_cnt;std::deque<TaskID> task_que;struct Task {std::vector <int> suf;IRunnable *task;std::atomic <int> task_ptr, task_done, pre_done;int task_num, pre_cnt;Task() = default;Task(std::vector<int> S, IRunnable *T, int Task_num, int Pre_cnt) : suf(S),task(T),task_num(Task_num),pre_cnt(Pre_cnt) {}Task(Task&& other) noexcept: pre_cnt(other.pre_cnt), pre_done(other.pre_done.load()),suf(std::move(other.suf)),task(other.task),task_num(other.task_num),task_ptr(other.task_ptr.load()),task_done(other.task_done.load()) {}};std::vector <Task> task_vec;std::mutex mtx;std::condition_variable cond;std::atomic<bool> end;
};//tasksys.cpp/** ================================================================* Parallel Thread Pool Spinning Task System Implementation* ================================================================*/const char* TaskSystemParallelThreadPoolSpinning::name() {return "Parallel + Thread Pool + Spin";
}TaskSystemParallelThreadPoolSpinning::TaskSystemParallelThreadPoolSpinning(int num_threads): ITaskSystem(num_threads) {// NOTE: CS149 students are not expected to implement TaskSystemParallelThreadPoolSpinning in Part B.
}TaskSystemParallelThreadPoolSpinning::~TaskSystemParallelThreadPoolSpinning() {}void TaskSystemParallelThreadPoolSpinning::run(IRunnable* runnable, int num_total_tasks) {// NOTE: CS149 students are not expected to implement TaskSystemParallelThreadPoolSpinning in Part B.for (int i = 0; i < num_total_tasks; i++) {runnable->runTask(i, num_total_tasks);}
}TaskID TaskSystemParallelThreadPoolSpinning::runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps) {// NOTE: CS149 students are not expected to implement TaskSystemParallelThreadPoolSpinning in Part B.for (int i = 0; i < num_total_tasks; i++) {runnable->runTask(i, num_total_tasks);}return 0;
}void TaskSystemParallelThreadPoolSpinning::sync() {// NOTE: CS149 students are not expected to implement TaskSystemParallelThreadPoolSpinning in Part B.return;
}/** ================================================================* Parallel Thread Pool Sleeping Task System Implementation* ================================================================*/const char* TaskSystemParallelThreadPoolSleeping::name() {return "Parallel + Thread Pool + Sleep";
}TaskSystemParallelThreadPoolSleeping::TaskSystemParallelThreadPoolSleeping(int num_threads): ITaskSystem(num_threads),Num_threads(num_threads),thread_ptr(new std::thread[num_threads]),end(false),task_cnt(0) {//// TODO: CS149 student implementations may decide to perform setup// operations (such as thread pool construction) here.// Implementations are free to add new class member variables// (requiring changes to tasksys.h).//auto work = [&]() {while (1) {bool tsk, ed;int task_id = -1, tbd_id = -1;{std::unique_lock <std::mutex> lock(mtx);cond.wait(lock, [&]() {return (!task_que.empty() || end);});   tsk = !task_que.empty();if (tsk) {task_id = task_que.front();if (task_vec[task_id].task_ptr >= task_vec[task_id].task_num) {task_que.pop_front();continue;}tbd_id = task_vec[task_id].task_ptr++;}ed = end;}if (ed) break;if (tsk) {std::unique_lock <std::mutex> lock(mtx);task_vec[task_id].task->runTask(tbd_id, task_vec[task_id].task_num);task_vec[task_id].task_done++;bool flag = 0;if (task_vec[task_id].task_done == task_vec[task_id].task_num) {for (const auto &id : task_vec[task_id].suf) {if (++task_vec[id].pre_done == task_vec[id].pre_cnt) {task_que.push_back(id);flag = 1;}}if (flag) cond.notify_all();}}}};for (int i = 0; i < Num_threads; i++) {thread_ptr[i] = std::thread(work);}
}TaskSystemParallelThreadPoolSleeping::~TaskSystemParallelThreadPoolSleeping() {//// TODO: CS149 student implementations may decide to perform cleanup// operations (such as thread pool shutdown construction) here.// Implementations are free to add new class member variables// (requiring changes to tasksys.h).//end = 1;cond.notify_all();for (int i = 0; i < Num_threads; i++) {thread_ptr[i].join();}delete[] thread_ptr;
}void TaskSystemParallelThreadPoolSleeping::run(IRunnable* runnable, int num_total_tasks) {//// TODO: CS149 students will modify the implementation of this// method in Parts A and B.  The implementation provided below runs all// tasks sequentially on the calling thread.//runAsyncWithDeps(runnable, num_total_tasks, std::vector<TaskID>{}); sync();
}TaskID TaskSystemParallelThreadPoolSleeping::runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,const std::vector<TaskID>& deps) {//// TODO: CS149 students will implement this method in Part B.//int pre = 0;TaskID id;std::unique_lock <std::mutex> lock(mtx);id = task_cnt++;for (const auto &idx : deps) {if (task_vec[idx].task_done < task_vec[idx].task_num) {task_vec[idx].suf.push_back(id);pre++;}}task_vec.emplace_back(std::vector<int>{}, runnable, num_total_tasks, pre);if (!pre)  {task_que.push_back(id);cond.notify_one();}return id;
}void TaskSystemParallelThreadPoolSleeping::sync() {//// TODO: CS149 students will modify the implementation of this method in Part B.//std::unique_lock <std::mutex> lock(mtx);cond.wait(lock, [&]() {return task_que.empty();});return;
}
http://www.jsqmd.com/news/425717/

相关文章:

  • Shell环境下Gitlab Runner报错全解析:从prepare environment到exit status 1的深度排查
  • SerialPlot:让串口数据可视化不再复杂的实时监控工具
  • ESP32 OLED图形显示实战:U8G2位图、汉字与动画
  • 2026年无锡离婚律师厂家最新推荐:无锡锡山区律师/无锡交通事故律师/无锡十佳律师咨询事务所/无锡取保候审律师/选择指南 - 优质品牌商家
  • VideoAgentTrek-ScreenFilter实操案例:检测结果对接Prometheus实现GPU利用率告警
  • LingBot-Depth深度补全功能实测:RGB+稀疏深度生成完整3D
  • TranslucentTB透明任务栏启动故障全解决方案:从诊断到长效维护
  • 2026年初音乐留学机构解析:如何精准匹配海外名校教授? - 2026年企业推荐榜
  • Lumafly:革新性空洞骑士模组管理工具
  • UDOP-large基础教程:UDOP-large模型结构与文档多模态原理
  • 3个突破点:猫抓Cat-Catch资源获取工具的技术革新与场景落地
  • nvm与Node.js环境配置全攻略:从安装到镜像优化
  • 2026年检漏仪厂家推荐:移动式氦质谱检漏仪、模块式氦质谱检漏仪、氦检仪、真空箱检漏系统、双分子泵氦质谱检漏仪选择指南 - 优质品牌商家
  • Verilog实战:从零搭建74HC283超前进位加法器(附完整仿真代码)
  • 猫抓:解决网页资源提取难题的高效智能工具
  • Qwen2-VL-2B-Instruct提示词工程实战:如何让模型更懂你的图片
  • Windows窗口置顶工具:让重要窗口始终保持可见的实用解决方案
  • Hunyuan-MT-7B快速入门:10分钟学会调用翻译API
  • 如何从视频中高效提取PPT内容?开源工具extract-video-ppt全攻略
  • 突破JetBrains IDE试用期限制:ide-eval-resetter全功能使用指南
  • Fish Speech 1.5GPU算力优化:4-6GB显存占用下高并发TTS推理调优
  • 73%毕业生论文AI率过高?AIGC检测背后的真相你该知道
  • TranslucentTB:突破Windows任务栏视觉边界的轻量化美学引擎
  • 基于springboot框架的公司企业员工出差报销管理系统_04446nsn
  • 突破3大瓶颈:本地OCR技术让视频硬字幕提取效率提升80%的实战指南
  • D3D12 CopyEngine实战:如何用独立复制队列优化游戏资源加载(附性能对比)
  • ViGEmBus虚拟手柄驱动技术解析:从核心原理到实战应用
  • 如何用GetQzonehistory实现QQ空间历史记录永久保存?超简单的4步指南
  • 解锁3大核心能力:猫抓Cat-Catch媒体资源获取全场景指南
  • LoRa自组网协议设计与STM32实现:NodeBus工程实践