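Yun Liang (梁云)
Title: Research Professor
Institute: Center for Energy-Efficient Computing and Applications
Research areas: compiler techniques, energy-efficient computer architecture, GPU/FPGA, high-level synthesis, and embedded systems
Office phone: 86-10-62760779
Email: ericlyun@pku.edu.cn
School homepage: https://eecs.pku.edu.cn/info/1341/6098.htm
We have indexed 102 papers by "梁云" (Yun Liang):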
- Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs., 2020-04-09
- An analytical approach for fast and accurate design space exploration of instruction caches., 2020-03-27
- Cache-aware optimization of BAN applications., 2020-03-27
- Frequency Improvement of Systolic Array-Based CNNs on FPGAs., 2020-03-27
- Instruction Cache Locking Using Temporal Reuse Profile., 2020-03-27
- Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications., 2020-03-27
- WCET-centric partial instruction cache locking., 2020-03-27
- Design Space exploration of FPGA-based accelerators with multi-level parallelism., 2020-03-27
- Lin-analyzer: a high-level performance analysis tool for FPGA-based accelerators., 2020-03-27
- Improved procedure placement for set associative caches., 2020-03-27
- Instruction cache locking using temporal reuse profile., 2020-03-27
- Timing analysis of concurrent programs running on shared cache multi-cores., 2020-03-27
- Integrated instruction cache analysis and locking in multitasking real-time systems., 2020-03-27
- Design space exploration of multiple loops on FPGAs using high level synthesis., 2020-03-27
- Shared cache aware task mapping for WCRT minimization., 2020-03-27
- Cache-aware optimization of BAN applications., 2020-03-27
- Static analysis for fast and accurate design space exploration of caches., 2020-03-27
- Efficient custom instructions generation for system-level design., 2020-03-27
- Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores., 2020-03-27
- FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System., 2020-03-16
- Fune: An FPGA Tuning Framework for CNN Acceleration., 2020-03-13
- FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow., 2020-03-11
- Student Cluster Competition 2017, Team Peking University: Reproducing vectorization of the Tersoff multi-body potential on the Intel Broadwell architecture., 2020-02-22
- ParConnect reproducibility report., 2020-02-22
- Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices., 2020-02-19
- A Survey on 5G Network Slicing Enabling the Smart Grid., 2020-02-04
- Improving high level synthesis optimization opportunity through polyhedral transformations., 2020-01-22
- LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification., 2019-12-16
- Enabling coordinated register allocation and thread-level parallelism optimization for GPUs., 2019-12-16
- CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs., 2019-12-16
- Rapid design space exploration of two-level unified caches., 2019-12-16
- Coordinated static and dynamic cache bypassing for GPUs., 2019-12-16
- An Efficient Compiler Framework for Cache Bypassing on GPUs., 2019-12-16
- An efficient compiler framework for cache bypassing on GPUs., 2019-12-16
- Optimizing Cache Bypassing and Warp Scheduling for GPUs., 2019-12-16
- REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs., 2019-11-20
- REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs., 2019-09-25
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs., 2019-09-25
- Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs., 2019-09-16
- CAMAS: Static and Dynamic Hybrid Cache Management for CPU-FPGA Platforms., 2019-08-30
- E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System., 2019-08-29
- Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management., 2019-08-29
- SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps., 2019-08-14
- CuLDA: Solving Large-scale LDA Problems on GPUs., 2019-06-24
- An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs., 2019-06-18
- High-Level Synthesis: Productivity, Performance, and Software Constraints., 2019-06-02
- Hi-fi playback: tolerating position errors in shift operations of racetrack memory., 2019-05-18
- Fork path: improving efficiency of ORAM by removing redundant memory accesses., 2019-05-18
- Performance-centric register file design for GPUs using racetrack memory., 2019-05-18
- Efficient Recurrent Neural Networks using Structured Matrices in FPGAs., 2019-04-04
- Poly: Efficient Heterogeneous System and Application Management for Interactive Applications., 2019-04-02
- Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management., 2019-03-05
- Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs., 2019-03-05
- Optimizing the MapReduce framework on Intel Xeon Phi coprocessor., 2019-02-20
- CuLDA_CGS: solving large-scale LDA problems on GPUs., 2019-02-08
- A coordinated tiling and batching framework for efficient GEMM on GPUs., 2019-02-08
- Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs., 2019-02-07
- Enabling high performance deep learning networks on embedded systems., 2019-01-19
- TGPA: tile-grained pipeline architecture for low latency CNN inference., 2019-01-07
- FlexCL: A Model of Performance and Power for OpenCL Workloads on FPGAs., 2018-12-01
- A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM., 2018-11-22
- SpWA: an efficient sparse winograd convolutional neural networks accelerator on FPGAs., 2018-11-21
- cuMBIR: An Efficient Framework for Low-dose X-ray CT Image Reconstruction on GPUs., 2018-11-21
- Efficient Kernel Management on GPUs., 2018-11-06
- FlexCL: An Analytical Performance Model for OpenCL Workloads on Flexible FPGAs., 2018-11-06
- Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs., 2018-11-06
- Throughput-oriented kernel porting onto FPGAs., 2018-11-06
- A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model., 2018-11-06
- Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs., 2018-11-06
- Cache modeling in probabilistic execution time analysis., 2018-11-06
- CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs., 2018-11-06
- Programming FPGAs Using OpenCL from Performance Model to Application Study., 2018-11-06
- Run-Time Technique for Simultaneous Aging and Power Optimization in GPGPUs., 2018-11-06
- COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications., 2018-09-01
- A hybrid approach to cache management in heterogeneous CPU-FPGA platforms., 2018-09-01
- CuMF_SGD: Fast and Scalable Matrix Factorization., 2018-08-13
- Efficient Recurrent Neural Networks using Structured Matrices in FPGAs., 2018-08-13
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs., 2018-08-13
- CuLDA_CGS: Solving Large-scale LDA Problems on GPUs., 2018-08-13
- Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor., 2018-08-13
- Quantitative performance and power analysis of LTE using high level synthesis., 2018-07-17
- Exploring cache bypassing and partitioning for multi-tasking on GPUs., 2018-04-09
- Scale-Free Sparse Matrix-Vector Multiplication on Many-Core Architectures., 2017-12-14
- FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow., 2017-09-19
- MrPhi: An Optimized MapReduce Framework on Intel Xeon Phi Coprocessors., 2017-09-19
- Real-time implementation and performance optimization of 3D sound localization on GPUs., 2017-09-19
- High level synthesis of stereo matching: Productivity, performance, and software constraints., 2017-09-19
- Multilevel Granularity Parallelism Synthesis on FPGAs., 2017-09-19
- Efficient kernel management on GPUs., 2017-09-19
- Efficient GPU Spatial-Temporal Multitasking., 2017-09-19
- Register and thread structure optimization for GPUs., 2017-09-19
- Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems., 2017-09-19
- A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only)., 2017-09-19
- Integrated CUDA-to-FPGA Synthesis with Network-on-Chip., 2017-09-19
- WCET-Centric dynamic instruction cache locking., 2017-09-19
- GPU Accelerated Counterexample Generation in LTL Model Checking., 2017-09-19
- A study of high-level synthesis: Promises and challenges., 2017-09-19
- An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization., 2017-09-19
- Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs., 2017-09-19
- An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization., 2017-09-19
- Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi., 2017-09-19
- High-level synthesis of multiple dependent CUDA kernels on FPGA., 2017-09-19