2022 Year-End Review

A look back at 2022: checking how the plans I made at the start of the year turned out and summing up the gains and losses. Everything for building a better self.

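Progress on annual goals

  • Exercise and fitness (kept it up for the first half of the year; half done)

    • Basketball 17 times, badminton 13 times
  • Publish a paper (half done)

  • Finish graduate coursework smoothly (done)

  • Study the "Software Analysis" course (done)

  • Keep learning Rust (done)

  • Driver's license (done)

  • Buy a motorcycle (not done)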


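I did not fully complete many of my plans this year. On one hand, I still had a semester of courses in the first half of 2022 holding me back, so I could not put all my energy into research. On the other hand, the pressure to publish that came with entering my second year of graduate school made me too tense, to the point that I made mistakes in my research direction and took quite a few detours, which led to rewriting and resubmitting the paper several times. All things considered, I got a lot of small things done in bits and pieces this year and unlocked some minor branches on my tech tree, but I have not yet produced any significant results.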


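As a graduate student

Research topics

Topic A: the paper was submitted twice and rejected both times; I have reworked the approach and it is being resubmitted.

Topic B: I had already settled on a direction and completed initial pilot experiments; I am currently considering whether it is novel enough.


Literature reading

Skimmed 69 papers, 40 of them read closely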
(Nero)Neural reverse engineering of stripped binaries using augmented control flow graphs
Debin: Predicting Debug Information in Stripped Binaries
Intriguing Properties of Adversarial ML Attacks in the Problem Space
开源 C/C++静态软件缺陷检测工具实证研究 (An Empirical Study of Open-Source C/C++ Static Software Defect Detection Tools)
UQBT: adaptable binary translation at low cost
IdBench: Evaluating Semantic Representations of Identifier Names in Source Code
SAFE: Self-Attentive Function Embeddings for Binary Similarity
In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks
DIRECT: A Transformer-based Model for Decompiled Identifier Renaming
XFL: eXtreme Function Labeling
Pop Quiz! Can a Large Language Model Help With Reverse Engineering?
(DIRTY)Augmenting Decompiler Output with Learned Variable Names and Types
How could Neural Networks understand Programs?
Hierarchical Attention Graph Embedding Networks for Binary Code Similarity against Compilation Diversity
A Hierarchical Graph-Based Neural Network for Malware Classification
Inference of static semantics for incomplete C programs
The Convergence of Source Code and Binary Vulnerability Discovery – A Case Study
Learning to make compiler optimizations more effective
SoK: All You Ever Wanted to Know About x86/x64 Binary Disassembly But Were Afraid to Ask
DIRE: A Neural Approach to Decompiled Identifier Naming
Distilling the knowledge in a neural network
OpenAI Codex paper walkthrough (bilibili video)
DeepMind AlphaCode paper walkthrough (bilibili video)
Multi-modal Program Inference: a Marriage of Pre-trained Language Models and Component-based Synthesis
Validity Threats in Empirical Software Engineering Research - An Initial Survey
Overview of Threats to the Validity of Research Findings
Binary Diffing as a Network Alignment Problem via Belief Propagation
Binary code is not easy
Shellcode_IA32: A Dataset for Automatic Shellcode Generation
GraphCL: Contrastive Self-Supervised Learning of Graph Representations
Graph Contrastive Learning with Augmentations
Graph Contrastive Learning Automated
Graph Self-Supervised Learning: A Survey
An Empirical Study of Graph Contrastive Learning
On Layer Normalization in the Transformer Architecture
(GGNN) Gated Graph Sequence Neural Networks
SoK: Demystifying Binary Lifters Through the Lens of Downstream Applications
SafeDrop: Detecting Memory Deallocation Bugs of Rust Programs via Static Data-Flow Analysis
Where's Crypto?: Automated Identification and Classification of Proprietary Cryptographic Primitives in Binary Code
Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection
jTrans: Jump-Aware Transformer for Binary Code Similarity Detection
The Most Common Habits from more than 200 English Papers written by Graduate Chinese Engineering Students
A hybrid code representation learning approach for predicting method names
A Review on Source Code Documentation
Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting
LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition
GypSum: Learning Hybrid Representations for Code Summarization
Impact of Evaluation Methodologies on Code Summarization
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
GENDA: A Graph Embedded Network Based Detection Approach on encryption algorithm of binary program
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer
DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
A Survey of Available Information Recovery of Binary Programs Based on Machine Learning
Extracting Conditional Formulas for Cross-Platform Bug Search
StateFormer: fine-grained type recovery from binaries using generative state modeling
Deepbindiff: Learning program-wide code representations for binary diffing
OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary
SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings
TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity
NLP-Summarization survey
UniXcoder: Unified Cross-Modal Pre-training for Code Representation
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
A Transformer-based Approach for Source Code Summarization
What does Transformer learn about source code?
Multi-task Learning based Pre-trained Language Model for Code Completion
A unified multi-task learning model for AST-level and token-level code completion
EditSum: a retrieve-and-edit framework for source code summarization
Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks
Dos and don’ts of machine learning in computer security

Presented about 17 of them at group meetings.


Keywords: Binary Analysis, Reverse Engineering, Deep Learning, Neural Network, NLP, Binary Code Similarity, Code Summarization, Malware Detection.


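Projects

As I mentioned in my 2021 year-end review, the binary analysis project was essentially finished. In July this year I went with a senior labmate to the client's site in Shanghai to debug equipment and prepare for the final defense. But due to factors outside our control, the project-closing defense could not be organized and had to be postponed again, probably to summer 2023 at the earliest. By then my senior labmate will have graduated and left, so it will likely be just me, which is a bit stressful.


In addition, this same senior labmate and I took on a malware analysis project for a company. It runs for one year and wraps up in September 2023.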


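Other

I did not attend any in-person academic exchanges or tech talks this year, mainly because of the pandemic.

In addition, the lab assigned me duties such as activities coordinator, so I also have to put some energy into that work day to day.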


Income and expenses

I kept up bookkeeping for the whole year. Spending came to nearly 50k, with a surplus of over 20k.


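Purchased: UGREEN personal cloud 4600 + 18 TB mechanical hard drive

For storing the code and models I produce, with timely data backups.

For storing personal video and audio files. I like collecting high-bitrate HD movies (a hoarding habit, hahaha).

For backing up personal media. I bought an action camera last year and it generates a huge amount of data; my old portable drive was already full, so I needed extra high-capacity storage. I also planned to buy a camera this year, and the high-resolution photos and videos it records eat up a lot of storage too.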


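Purchased: Fujifilm X-T4 with 16-80mm lens, second-hand kit

If recording images is like writing, then a phone is like a diary, valued for its convenience, and a camera is like poetry, valued for its language. I do not need especially ornate words, but I do want the flavor of poetry. So I chose a Fujifilm camera: most photos are usable straight out of the camera with no post-processing, and I get to enjoy photography at its purest.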


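Shortcomings

I did not communicate enough with my advisors about research, so problems in my research approach were not corrected in time. That is why my paper was revised and resubmitted several times without being accepted. If I had discussed things earlier and more often with a few advisors or senior labmates, I might have avoided these detours. It hurts to think about.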

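Unexpected events

In July, just two weeks after I started devoting myself fully to research, I had a mild recurrence of pneumothorax, which also cut short my exercise plan.

Fortunately it was not serious and I recovered after some rest. Since then I have become more laid-back. For one thing, I have no more courses; for another, it keeps me from getting too tense under research pressure. The second half of 2022 was a bit happier than the first, and I played games with my good friends more often.


On 12.23 COVID finally caught up with me: two days of fever, two days of sore throat, three days of coughing. I rested in the dorm for a week and gradually got back to research.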

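Entertainment

Books

  • The Republic
  • My Life (Golda Meir)

Movies

Rated out of five stars
Spider-Man ★★★
Happy Death Day ★★
Happy Death Day 2U ★★
Doctor Strange 2 ★★★
Operation Mincemeat ★★
The Captain ★★★
Moon Man ★★★
Only Fools Rush In ★★
Nice View ★★★
District 9 ★★★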

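Most memorable anime of the year

  • Cyberpunk: Edgerunners ★★★★

  • Uncle from Another World ★★★★

  • Spy x Family ★★★★

  • Summer Time Rendering ★★★★★

  • Kaguya-sama: Love Is War, Season 3 ★★★★★

  • My Dress-Up Darling ★★★

  • Ajin ★★★★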


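Games of the year

  • Valheim ★★★★ (cured my gaming burnout)

  • A Plague Tale: Innocence ★★★★

  • Hell Let Loose ★★★★

  • Grounded ★★★ (would be five stars if the multiplayer were not so flaky)

  • Raft ★★★★

  • Pummel Party ★★★★

  • Potion Craft: Alchemist Simulator ★★★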


Favorite emoji of the year

😁

👍

🌹

Common usage: 😁👍, 👍😁, and 🌹🌹🌹


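Looking ahead to the new year

I hope to get a paper accepted in the first half of 2023, no matter what.

Settle on a concrete roadmap for the next research topic as soon as possible.

If my health allows this year, gradually resume exercising.

In the first half of 2023, definitely travel somewhere to unwind; 2022 was pretty tiring.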