CNCC Technical Forum: Programming Languages and Compilers for AI Chips - QbitAI

2023-04-17 09:14:54 Category: Technology News

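As Moore's Law gradually slows, domain-specific architecture (DSA) chips have become the mainstream direction of processor development. To meet the enormous compute demands of deep learning applications, hardware vendors have released AI chips built on domain-specific architectures, such as Cambricon, Huawei's Ascend series, and Alibaba's Hanguang series. Research on automatic compilation techniques for AI chips is therefore of great significance for advancing China's AI chip development.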

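This forum will discuss the following questions:

1. How to design domain-specific programming languages for AI chips

2. How to design efficient compilers for AI chips

3. What the main pain points of programming languages and compilers on today's AI chips are

4. How to strengthen the design of domestic programming languages, compilers, and other core system software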

Forum Chairs

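Jidong Zhai (翟季冬)

Tenured Associate Professor and Ph.D. advisor in the Department of Computer Science and Technology, Tsinghua University. Secretary-General of the ACM China High Performance Computing Expert Committee and a Young Scientist of the Beijing Academy of Artificial Intelligence (BAAI). His main research areas are high-performance computing and compiler optimization. His work has been published at leading international conferences and journals in these fields, including SC, PPoPP, ICS, MICRO, ASPLOS, ATC, CGO, NSDI, IEEE TPDS, and IEEE TC. His SC'14 paper was selected as a Best Paper Finalist, the first time scholars from mainland China were shortlisted for that award. He served as Program Committee Chair of NPC 2018, as a Program Committee member of SC 2018/2019/2020 and PPoPP 2019/2020/2021, as an editorial board member of IEEE TPDS, and as a young editorial board member of FCS and JCST. He coaches the Tsinghua University student supercomputing team, which has won nine world championships. In 2015 and 2018 the team swept the overall championships of the three major international student supercomputing competitions (SC, ISC, and ASC), achieving a "grand slam". He has received the First Prize of the Ministry of Education Science and Technology Progress Award, the CCF Outstanding Doctoral Dissertation Award, and the NSFC Excellent Young Scientists Fund.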

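Wenguang Chen (陈文光)

Professor and Ph.D. advisor in the Department of Computer Science and Technology, Tsinghua University. CCF Distinguished Member and Distinguished Speaker, CCF Deputy Secretary-General, and honorary member of CCF YOCSEF. His main research areas are operating systems, programming languages, and parallel computing. He has served many times on the program committees of leading international conferences in high-performance and parallel computing, including OSDI, PPoPP, CGO, SC, ICS, PLDI, ASPLOS, and APSys. He also serves as Chair of the ACM China Council and Chair of ChinaSys, the ACM China operating systems chapter. He has received the National Science and Technology Progress Award (Second Prize), the State Education Commission Science and Technology Progress Award (Second Prize), and the Beijing Science and Technology Progress Award (Second Prize), once each. He is a recipient of the National Science Fund for Distinguished Young Scholars.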

Speaker Profiles

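Zhenjiang Hu (胡振江)

Chair Professor at Peking University, Vice Dean of the School of Electronics Engineering and Computer Science, and Head of the Department of Computer Science and Technology. He received his Ph.D. in information engineering from the University of Tokyo, Japan, in 1996. He has served as Professor in the Graduate School of Information Science and Technology at the University of Tokyo, Professor and department head at the National Institute of Informatics (NII) in Japan, and Changjiang Chair Professor at Peking University. Professor Hu has long worked on programming languages and software science and engineering, and has made a series of original contributions in programming language design, structured functional programming, automatic program synthesis and optimization, parallel programming, the design and implementation of bidirectional transformation languages, and software evolution and maintenance. He has received Japan's national best doctoral dissertation award and the Fundamental Research Achievement Award of the Japan Society for Software Science and Technology. He is a Fellow of the Japan Federation of Engineering Societies, a Member of Academia Europaea, an IEEE Fellow, and an ACM Distinguished Scientist.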

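Talk title: From Chip Customization to Language Customization: Systematic Customization of Programming Languages and Its Supporting Environment

Abstract: As Moore's Law gradually breaks down and demand grows for efficient special-purpose computation such as deep learning, we are moving into an era that favors specialized, custom computing devices. This calls for software that can be customized to different special-purpose hardware. This talk introduces the basic concepts and applications of systematic programming-language customization, discusses how its supporting environment can be implemented, and outlines open problems for future work.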

Carnegie Mellon University, Assistant Professor

Tianqi Chen (陈天奇)

Tianqi Chen is currently an Assistant Professor in the Machine Learning Department and Computer Science Department of Carnegie Mellon University. He received his Ph.D. from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, working with Carlos Guestrin on the intersection of machine learning and systems. He has created three major learning systems that are widely adopted: XGBoost, TVM, and MXNet (co-creator). He is a recipient of the Google Ph.D. Fellowship in Machine Learning.

Talk title: TVM: An Automated Deep Learning Compiler

Abstract: Data, models, and computing are the three pillars that enable machine learning to solve real-world problems at scale. Making progress on these three domains requires not only disruptive algorithmic advances but also systems innovations that can continue to squeeze more efficiency out of modern hardware. Learning systems are at the center of every intelligent application nowadays. However, the ever-growing demand for applications and hardware specialization creates a huge engineering burden for these systems, most of which rely on heuristics or manual optimization. In this talk, I will present a new approach that uses machine learning to automate system optimizations. I will describe our approach in the context of deep learning deployment problems. I will first discuss how to design invariant representations that can lead to transferable statistical cost models, and apply these representations to optimize tensor programs used in deep learning applications. I will then describe the system improvements we made to enable diverse hardware backends. TVM, our end-to-end system, delivers performance across hardware backends that is competitive with state-of-the-art, hand-tuned deep learning frameworks.

Carnegie Mellon University, Assistant Professor

Zhihao Jia (贾志豪)

Zhihao Jia is an incoming Assistant Professor of Computer Science at CMU (starting Fall 2021). He obtained his Ph.D. at Stanford, working with Alex Aiken and Matei Zaharia. His research interests lie in the intersection of computer systems and machine learning, with a focus on building efficient, scalable, and high-performance systems for ML computations.

Talk title: Automated Discovery of Machine Learning Optimizations

Abstract: As an increasingly important workload, machine learning (ML) applications require different performance optimization techniques from traditional runtimes and compilers. In particular, to accelerate ML applications, it is generally necessary to perform ML computations on heterogeneous hardware and parallelize computations using multiple data dimensions, neither of which is even expressible in traditional compilers and runtimes. I will describe my work on automated discovery of performance optimizations to accelerate ML computations. TASO, the Tensor Algebra SuperOptimizer, optimizes the computation graphs of deep neural networks (DNNs) by automatically generating potential graph optimizations and formally verifying their correctness. TASO outperforms rule-based graph optimizers in existing ML systems (e.g., TensorFlow, TensorRT, and TVM) by up to 3x by automatically discovering novel graph optimizations, while also requiring significantly less human effort. FlexFlow is a system for accelerating distributed DNN training. FlexFlow identifies parallelization dimensions not considered in existing ML systems (e.g., TensorFlow and PyTorch) and automatically discovers fast parallelization strategies for a specific parallel machine. Companies and national labs are using FlexFlow to train production ML models that do not scale well in current ML systems, achieving over 10x performance improvement. I will also outline future research directions for further automating ML systems, such as codesigning ML models, software systems, and hardware backends for end-to-end ML deployment.
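The abstract describes TASO as generating candidate graph optimizations and then formally verifying them. As a rough illustration of the candidate-screening idea only, the sketch below checks whether two small computation graphs agree on random integer inputs. This is not TASO's implementation or API: every function name here is hypothetical, the use of random-input screening is an assumption of this sketch, and agreement on random inputs only shortlists a candidate; it does not replace formal verification.

```python
import random

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    # Element-wise sum of two equally shaped matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(n, rng):
    # Small integers keep arithmetic exact, so == is a valid comparison.
    return [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]

def source_graph(a, b, c):
    return add(matmul(a, b), matmul(a, c))   # two matmuls: A@B + A@C

def target_graph(a, b, c):
    return matmul(a, add(b, c))              # one matmul:  A@(B + C)

def wrong_graph(a, b, c):
    return matmul(add(a, b), c)              # (A + B)@C, not equivalent

def agrees_on_random_inputs(g1, g2, n=3, trials=20, seed=0):
    # Hypothetical screening step: reject a candidate substitution as soon
    # as the two graphs disagree on any random input.
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rand_matrix(n, rng) for _ in range(3))
        if g1(a, b, c) != g2(a, b, c):
            return False
    return True

print(agrees_on_random_inputs(source_graph, target_graph))
print(agrees_on_random_inputs(source_graph, wrong_graph))
```

The first candidate replaces two matrix multiplications with one, which is the kind of rewrite a graph optimizer looks for; the second is rejected because the two graphs compute different values.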

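Huimin Cui (崔慧敏)

Ph.D., Researcher and Ph.D. advisor at the Institute of Computing Technology, Chinese Academy of Sciences. Her research focuses on compilation and programming for heterogeneous chips. In recent years, centering on emerging computing paradigms such as big data and AI, she has studied compiler optimization and programming-environment optimization for these applications on heterogeneous architectures. As principal investigator, she has led several national-level projects, including grants from the National Natural Science Foundation of China and the National Key R&D Program, and has published more than 20 papers at international conferences and journals such as PLDI, MICRO, PPoPP, and TPDS.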

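Talk title: Programming Language and Compiler Design for High-Performance Intelligent Processors

Abstract: High-performance intelligent processors, represented by the Cambricon platform, provide a general-purpose deep learning platform intended to deliver strong computing power for current and future intelligent applications. Because future applications are diverse and unpredictable, a foundational high-level programming language is an indispensable part of building and growing the ecosystem. To meet this need, we designed the Bang language, a general-purpose, C-based high-level programming language for these applications and this platform, which lets users develop custom operators flexibly. In addition, deep compiler optimizations are used to fully exploit the chip's processing capability.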

Alibaba, Senior Director

Wei Lin (林伟)

Wei Lin is currently Senior Director of the Platform of Artificial Intelligence (PAI) and Chief Architect of the big-data computation platform at Alibaba. He has 15+ years of experience specializing in backend/infrastructure, distributed system development, storage, and large-scale computation systems, including batch, streaming, and machine learning.

Talk title: AI Compiler at Alibaba

Abstract: With emerging AI workloads and the diversity of computing hardware they run on, the AI compiler plays a vital role in bridging the gap between the expressive flexibility of models and the performance of the underlying system. In this talk, we will share our experiences of applying an AI compiler in Alibaba's production environment, including:

1. Large-scale deployment of our AI compiler into PAI (Platform of Artificial Intelligence) production clusters, running stably for more than 6 months and saving tens of thousands of GPU hours. We will talk about our aggressive fusion and codesign strategy, in which a cost-based approach is used to find the optimal fusion plan to boost hardware efficiency. In addition, we will share many lessons learned in ensuring that our compiler can be enabled by default in a large-scale production cluster.

2. An automatic code generation framework named Ansor. This work has been accepted by OSDI 2020 and deployed into our production environment. Compared with existing search strategies, Ansor explores many more optimization combinations and can thus find high-performance programs that are outside the search space of existing state-of-the-art approaches. Our evaluation shows that Ansor improves the execution performance of deep neural networks on the Intel CPU, ARM CPU, and NVIDIA GPU by up to 3.8x, 2.6x, and 1.7x, respectively.

3. Our thoughts about the future direction of AI compilers from an industry perspective.
