梁云

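Title: Researcher

Institute: Center for Energy-Efficient Computing and Applications

Research areas: compiler technology, energy-efficient computer architecture, GPU/FPGA, high-level synthesis, and embedded systems

Office phone: 86-10-62760779

Email: ericlyun@pku.edu.cn

School homepage: https://eecs.pku.edu.cn/info/1341/6098.htm

We have indexed 102 papers by "梁云":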

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs., 2020-04-09
An analytical approach for fast and accurate design space exploration of instruction caches., 2020-03-27
Cache-aware optimization of BAN applications., 2020-03-27
Frequency Improvement of Systolic Array-Based CNNs on FPGAs., 2020-03-27
Instruction Cache Locking Using Temporal Reuse Profile., 2020-03-27
Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications., 2020-03-27
WCET-centric partial instruction cache locking., 2020-03-27
Design Space exploration of FPGA-based accelerators with multi-level parallelism., 2020-03-27
Lin-analyzer: a high-level performance analysis tool for FPGA-based accelerators., 2020-03-27
Improved procedure placement for set associative caches., 2020-03-27
Instruction cache locking using temporal reuse profile., 2020-03-27
Timing analysis of concurrent programs running on shared cache multi-cores., 2020-03-27
Integrated instruction cache analysis and locking in multitasking real-time systems., 2020-03-27
Design space exploration of multiple loops on FPGAs using high level synthesis., 2020-03-27
Shared cache aware task mapping for WCRT minimization., 2020-03-27
Cache-aware optimization of BAN applications., 2020-03-27
Static analysis for fast and accurate design space exploration of caches., 2020-03-27
Efficient custom instructions generation for system-level design., 2020-03-27
Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores., 2020-03-27
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System., 2020-03-16
Fune: An FPGA Tuning Framework for CNN Acceleration., 2020-03-13
FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow., 2020-03-11
Student Cluster Competition 2017, Team Peking University: Reproducing vectorization of the Tersoff multi-body potential on the Intel Broadwell architecture., 2020-02-22
ParConnect reproducibility report., 2020-02-22
Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices., 2020-02-19
A Survey on 5G Network Slicing Enabling the Smart Grid., 2020-02-04
Improving high level synthesis optimization opportunity through polyhedral transformations., 2020-01-22
LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification., 2019-12-16
Enabling coordinated register allocation and thread-level parallelism optimization for GPUs., 2019-12-16
CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs., 2019-12-16
Rapid design space exploration of two-level unified caches., 2019-12-16
Coordinated static and dynamic cache bypassing for GPUs., 2019-12-16
An Efficient Compiler Framework for Cache Bypassing on GPUs., 2019-12-16
An efficient compiler framework for cache bypassing on GPUs., 2019-12-16
Optimizing Cache Bypassing and Warp Scheduling for GPUs., 2019-12-16
REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs., 2019-11-20
REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs., 2019-09-25
C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs., 2019-09-25
Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs., 2019-09-16
CAMAS: Static and Dynamic Hybrid Cache Management for CPU-FPGA Platforms., 2019-08-30
E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System., 2019-08-29
Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management., 2019-08-29
SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps., 2019-08-14
CuLDA: Solving Large-scale LDA Problems on GPUs., 2019-06-24
An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs., 2019-06-18
High-Level Synthesis: Productivity, Performance, and Software Constraints., 2019-06-02
Hi-fi playback: tolerating position errors in shift operations of racetrack memory., 2019-05-18
Fork path: improving efficiency of ORAM by removing redundant memory accesses., 2019-05-18
Performance-centric register file design for GPUs using racetrack memory., 2019-05-18
Efficient Recurrent Neural Networks using Structured Matrices in FPGAs., 2019-04-04
Poly: Efficient Heterogeneous System and Application Management for Interactive Applications., 2019-04-02
Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Managment., 2019-03-05
Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs., 2019-03-05
Optimizing the MapReduce framework on Intel Xeon Phi coprocessor., 2019-02-20
CuLDA_CGS: solving large-scale LDA problems on GPUs., 2019-02-08
A coordinated tiling and batching framework for efficient GEMM on GPUs., 2019-02-08
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs., 2019-02-07
Enabling high performance deep learning networks on embedded systems., 2019-01-19
TGPA: tile-grained pipeline architecture for low latency CNN inference., 2019-01-07
FlexCL: A Model of Performance and Power for OpenCL Workloads on FPGAs., 2018-12-01
A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM., 2018-11-22
SpWA: an efficient sparse winograd convolutional neural networks accelerator on FPGAs., 2018-11-21
cuMBIR: An Efficient Framework for Low-dose X-ray CT Image Reconstruction on GPUs., 2018-11-21
Efficient Kernel Management on GPUs., 2018-11-06
FlexCL: An Analytical Performance Model for OpenCL Workloads on Flexible FPGAs., 2018-11-06
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs., 2018-11-06
Throughput-oriented kernel porting onto FPGAs., 2018-11-06
A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model., 2018-11-06
Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs., 2018-11-06
Cache modeling in probabilistic execution time analysis., 2018-11-06
CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs., 2018-11-06
Programming FPGAs Using OpenCL from Performance Model to Application Study., 2018-11-06
Run-Time Technique for Simultaneous Aging and Power Optimization in GPGPUs., 2018-11-06
COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications., 2018-09-01
A hybrid approach to cache management in heterogeneous CPU-FPGA platforms., 2018-09-01
CuMF_SGD: Fast and Scalable Matrix Factorization., 2018-08-13
Efficient Recurrent Neural Networks using Structured Matrices in FPGAs., 2018-08-13
C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs., 2018-08-13
CuLDA_CGS: Solving Large-scale LDA Problems on GPUs., 2018-08-13
Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor., 2018-08-13
Quantitative performance and power analysis of LTE using high level synthesis., 2018-07-17
Exploring cache bypassing and partitioning for multi-tasking on GPUs., 2018-04-09
Scale-Free Sparse Matrix-Vector Multiplication on Many-Core Architectures., 2017-12-14
FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow., 2017-09-19
MrPhi: An Optimized MapReduce Framework on Intel Xeon Phi Coprocessors., 2017-09-19
Real-time implementation and performance optimization of 3D sound localization on GPUs., 2017-09-19
High level synthesis of stereo matching: Productivity, performance, and software constraints., 2017-09-19
Multilevel Granularity Parallelism Synthesis on FPGAs., 2017-09-19
Efficient kernel management on GPUs., 2017-09-19
Efficient GPU Spatial-Temporal Multitasking., 2017-09-19
Register and thread structure optimization for GPUs., 2017-09-19
Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems., 2017-09-19
A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only)., 2017-09-19
Integrated CUDA-to-FPGA Synthesis with Network-on-Chip., 2017-09-19
WCET-Centric dynamic instruction cache locking., 2017-09-19
GPU Accelerated Counterexample Generation in LTL Model Checking., 2017-09-19
A study of high-level synthesis: Promises and challenges., 2017-09-19
An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization., 2017-09-19
Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs., 2017-09-19
An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization., 2017-09-19
Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi., 2017-09-19
High-level synthesis of multiple dependent CUDA kernels on FPGA., 2017-09-19

This page was last updated: 2020/05/21
All content on this page is provided under the terms of the CC BY-SA 4.0 and SATA licenses; additional terms may apply.