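Image Analysis and Its Application (图像分析应用)  081203M06003H

Term: Spring (second) semester, 2020-2021 academic year | Course type: Specialized seminar | Instructor: Wang Weiqiang (王伟强)
Class time: Thursdays, periods 1 and 2
Location: Teaching Building 1 (教一楼), Room 108
Weeks: 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15
Course code: 081203M06003H  Class hours: 20  Credits: 1.00
English name: Seminar on Image Analysis and Its Application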

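Objectives and Requirements

This specialized seminar is offered to graduate students in computer science and automation. After completing the Image Processing and Analysis course, students study selected intermediate- and high-level topics in greater depth through seminar-style discussion, become familiar with current mainstream techniques and results, and broaden their perspective before starting their thesis research. The course also develops students' capacity for research-oriented learning, hands-on skills, and practical research ability, laying a foundation for future research and applications.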

Prerequisites

Image Processing and Analysis

Textbook

Course Content

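Topic 1: Traditional image segmentation methods (grayscale and color images, texture segmentation)
Topic 2: Deep-learning-based methods: semantic segmentation
Topic 3: General object recognition and localization
Topic 4: Detection and recognition of important objects I (faces and pedestrians)
Topic 5: Detection and recognition of important objects II (text and vehicles)
Topic 6: Local features and traditional image matching methods I
Topic 7: Metric learning and its applications to vision problems
Topic 8: Image retrieval techniques
Topic 9: RNNs and semantic image annotation
Topic 10: RNNs and handwritten text recognition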

References

[1] Badrinarayanan, V., A. Kendall and R. Cipolla (2017). "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12): 2481-2495.
[2] Chen, L.-C., G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille (2014). "Semantic image segmentation with deep convolutional nets and fully connected CRFs." arXiv preprint.
[3] Chen, L.-C., G. Papandreou, F. Schroff and H. Adam (2017). "Rethinking atrous convolution for semantic image segmentation." arXiv preprint.
[4] Chen, L.-C., Y. Zhu, G. Papandreou, F. Schroff and H. Adam (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV).
[5] Chen, L.-C., G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille (2018). "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4): 834-848.
[6] Lin, G. S., A. Milan, C. H. Shen and I. Reid (2017). RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI.
[7] Long, J., E. Shelhamer and T. Darrell (2015). Fully Convolutional Networks for Semantic Segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA.
[8] Zhao, H., et al. (2017). Pyramid scene parsing network. Proceedings of the IEEE conference on computer vision and pattern recognition.
[9] Trigueros D S, Meng L, Hartnett M. Face Recognition: From Traditional to Deep Learning Methods[J]. 2018.
[10] Mao J, Xiao T, Jiang Y, et al. What Can Help Pedestrian Detection?[C]// Computer Vision & Pattern Recognition. IEEE, 2017.
[11] Li J, Liang X, Shen S M, et al. Scale-aware Fast R-CNN for Pedestrian Detection[J]. IEEE Transactions on Multimedia, 2017:1-1.
[12] Li H, Lin Z, Shen X, et al. A convolutional neural network cascade for face detection[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2015.
[13] Zhang K, Zhang Z, Li Z, et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks[J]. IEEE Signal Processing Letters, 2016, 23(10):1499-1503.
[14] Qin H, Yan J, Li X, et al. Joint Training of Cascaded CNN for Face Detection[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016.
[15] Viola P, Jones M J. Robust Real-Time Face Detection[J]. International Journal of Computer Vision, 2004, 57(2):137-154.
[16] Girshick R, Donahue J, Darrell T, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation[C]// IEEE Conference on Computer Vision & Pattern Recognition. 2014.
[17] Girshick R. Fast R-CNN[J]. Computer Science, 2015.
[18] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. 2015.
[19] Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-Time Object Detection[J]. 2015.
[20] Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector[C]// European Conference on Computer Vision. 2016.
[21] Zhao Q, Sheng T, Wang Y, et al. M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network[J]. 2018.
[22] He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 37(9):1904-1916.
[23] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[J]. 2015.
[24] Krizhevsky A, Sutskever I, Hinton G. ImageNet Classification with Deep Convolutional Neural Networks[C]// NIPS. Curran Associates Inc. 2012.
[25] Calonder M, Lepetit V, Strecha C, et al. BRIEF: Binary Robust Independent Elementary Features[C]// Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV. 2010.
[26] Agrawal M, Konolige K, Blas M R. CenSurE: Center Surround Extremas for Real time Feature Detection and Matching[J]. 2008.
[27] David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", January 5, 2004
[28] Bay, H.; Ess, A.; Tuytelaars, T.; van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
[29] Zachary C. Lipton. A Critical Review of Recurrent Neural Networks for Sequence Learning. Computer Science, 2015.
[30] Michael I Jordan. Serial order: A parallel distributed processing approach. Advances in psychology, 121:471–495, 1997.
[31] Ronald J Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1(2):270–280, 1989.
[32] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.
[33] Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research (JMLR), 3:1137–1155, 2003.
[34] Mikolov Tomas. Recurrent neural network based language model. PhD thesis, Brno University of Technology. 2010.
[35] Xie Z, Sun Z, Jin L, et al. Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition[J]. 2018.
[36] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: a Method for Automatic Evaluation of Machine Translation. IBM Research Report, 2001.
[37] Mao J, Xu W. Explain Images with Multimodal Recurrent Neural Networks[J]. Computer Science, 2014.
[38] Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015:3128-3137.
[39] Vinyals O, Toshev A, Bengio S, et al. Show and Tell: A Neural Image Caption Generator[J]. 2014.
[40] Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]// Computer Vision & Pattern Recognition. IEEE, 2015.
[41] Xu K, Ba J, Kiros R, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention[J]. Computer Science, 2015:2048-2057.
[42] Jin J, Fu K, Cui R, et al. Aligning where to see and what to tell: image caption with region-based attention and scene factorization[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 39(12):2321-2334.