Inceptionv3
Inception v3[1][2] is a convolutional neural network (CNN) used for image analysis and object detection; it got its start as a module for GoogLeNet. It is the third edition of Google's Inception convolutional neural network, which was originally introduced during the ImageNet Large-Scale Visual Recognition Challenge. Inception v3 was designed to allow deeper networks while keeping the number of parameters from growing too large: it has "under 25 million parameters", compared with 60 million for AlexNet.[1]
Just as ImageNet can be thought of as a database of classified visual objects, Inception helps classify objects[3] in computer vision. The Inception v3 architecture has been reused in many different applications, often with weights "pre-trained" on ImageNet. One such use is in the life sciences, where it aids leukemia research.[4]
It was historically important as an early CNN that separates the stem (data ingest), body (data processing), and head (prediction), an architectural design that persists in most modern CNNs.[5]
Version history
Inception v1
In 2014, a team at Google developed GoogLeNet, which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The name came from the LeNet of 1998, since both LeNet and GoogLeNet are CNNs. They also called it "Inception" after the "we need to go deeper" internet meme, a phrase from the 2010 film Inception.[1] As more versions were released later, the original Inception architecture was renamed "Inception v1".
The models and the code were released under the Apache 2.0 license on GitHub.[6]
The Inception v1 architecture is a deep CNN composed of 22 layers, most of which are "Inception modules". The original paper stated that Inception modules are a "logical culmination" of Network in Network[7] and the theoretical work of Arora et al.[8]
Because Inception v1 is deep, it suffered from the vanishing gradient problem. The team mitigated this with two "auxiliary classifiers": linear-softmax classifiers inserted about one third and two thirds of the way through the network. The training loss is a weighted sum of the losses of all three classifiers:
$$L = 0.3\,L_{\text{aux},1} + 0.3\,L_{\text{aux},2} + L_{\text{real}}$$
The auxiliary classifiers were removed after training was complete. The vanishing gradient problem was later addressed more directly by the ResNet architecture.
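The weighted sum of the three classifier losses can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the team's implementation: the function names and the toy logits are made up for the example, and the auxiliary weight of 0.3 is the value reported in the original GoogLeNet paper.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def inception_v1_training_loss(main_logits, aux1_logits, aux2_logits,
                               target_index, aux_weight=0.3):
    """Main loss plus the two auxiliary losses, each scaled by aux_weight."""
    main = softmax_cross_entropy(main_logits, target_index)
    aux1 = softmax_cross_entropy(aux1_logits, target_index)
    aux2 = softmax_cross_entropy(aux2_logits, target_index)
    return main + aux_weight * (aux1 + aux2)

# Toy 3-class example with made-up logits (not taken from any real model).
loss = inception_v1_training_loss(
    main_logits=[2.0, 0.5, -1.0],
    aux1_logits=[1.0, 1.0, 0.0],
    aux2_logits=[0.0, 0.0, 0.0],
    target_index=0,
)
print(loss)
```

At inference time only the main classifier is evaluated, which corresponds to dropping the two auxiliary terms.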
The architecture consists of three parts stacked on top of one another:[5]
- The stem (data ingestion): The first few convolutional layers perform data preprocessing to downscale images to a smaller size.
- The body (data processing): The many Inception modules that follow perform the bulk of data processing.
- The head (prediction): The final fully-connected layer and softmax produce a probability distribution for image classification.
This structure is used in most modern CNN architectures.
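The stem, body, and head split can be illustrated with a toy pipeline. Every stage below is a deliberately simplified stand-in (average pooling instead of convolutions, a trivial "module" instead of an Inception module), and all names and weights are hypothetical. Only the three-part composition mirrors the description above.

```python
import math

def stem(image):
    """Toy stem: downscale a 2D image with 2x2 average pooling."""
    h, w = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [[(image[i][j] + image[i][j + 1]
              + image[i + 1][j] + image[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def body(features, num_modules=3):
    """Toy body: a stack of identical modules (shift, then ReLU)."""
    for _ in range(num_modules):
        features = [[max(0.0, v - 0.1) for v in row] for row in features]
    return features

def head(features, weights):
    """Toy head: global average pool, one linear layer, then softmax."""
    flat = [v for row in features for v in row]
    pooled = sum(flat) / len(flat)
    logits = [w * pooled + b for w, b in weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, weights):
    """Compose the three parts: stem -> body -> head."""
    return head(body(stem(image)), weights)

# Made-up 8x8 input and made-up (weight, bias) pairs for 3 classes.
image = [[float((i + j) % 4) for j in range(8)] for i in range(8)]
probs = classify(image, weights=[(1.0, 0.0), (-1.0, 0.5), (0.5, -0.2)])
print(probs)
```

Keeping the three parts as separate functions reflects why the split is convenient in practice: the head can be swapped for a new task while the stem and body are reused.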
Inception v2
Inception v2 was released in 2015.[9] It improves on Inception v1 by using factorized convolutions. For example, a single 5×5 convolution can be factored into two stacked 3×3 convolutions.
Both designs have a receptive field of size 5×5. The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version (two kernels of 9 parameters each). The 5×5 convolution is therefore strictly more expressive than the factorized version, but this extra expressive power is not necessarily needed: empirically, the research team found that factorized convolutions help.
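The parameter counts and the shared receptive field can be checked with a short sketch. It assumes a single input and output channel, stride 1, no padding, and no bias terms or nonlinearity between the two 3×3 layers; the helper names are illustrative.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a square kernel."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def count_weights(kernel_sizes):
    """Total weights of single-channel square kernels, ignoring biases."""
    return sum(k * k for k in kernel_sizes)

def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field

print(count_weights([5]))                 # 25
print(count_weights([3, 3]))              # 18
print(stacked_receptive_field([3, 3]))    # 5

# Two stacked 3x3 kernels reduce a 5x5 patch to one value, so each
# output depends on a full 5x5 window, just like one 5x5 kernel.
image = [[float(i * 5 + j) for j in range(5)] for i in range(5)]
ones3 = [[1.0] * 3 for _ in range(3)]
out = conv2d_valid(conv2d_valid(image, ones3), ones3)
print(out)                                # [[972.0]]
```

With several channels the absolute counts grow, but the 18-to-25 ratio per input-output channel pair stays the same.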
Inception v3
Inception v3 was also released in 2015.[9] It improves on Inception v2 by using:
- RMSProp optimizer
- Factorized 7×7 convolutions
- BatchNorm in the auxiliary classifiers
- Label smoothing[10]
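Of the items above, label smoothing is the simplest to show in code. In its usual formulation, the one-hot training target is mixed with a uniform distribution over the classes. The sketch below assumes that formulation; the function name is illustrative, and the default of ε = 0.1 is the value reported for the ImageNet experiments in the Inception v3 paper.

```python
def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution over all classes."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) * (1.0 if k == target_index else 0.0) + uniform
            for k in range(num_classes)]

# Toy 4-class example: the true class gets about 0.925 and every
# other class about 0.025, instead of 1 and 0.
targets = smooth_labels(target_index=2, num_classes=4)
print(targets)
```

Training against these softened targets discourages the network from driving the logit of the true class arbitrarily far above the others.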
Inception v4
In 2017,[11] the team released Inception v4, Inception ResNet v1, and Inception ResNet v2.
Inception v4 is an incremental update with even more factorized convolutions, along with other added architectural complexity that was empirically found to improve benchmark results.
Inception ResNet v1 and v2 are both modifications of Inception v4, where residual connections are added to each Inception module.
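A residual connection adds a module's input back to its output, so the module only has to learn a correction to the identity map. The sketch below shows that idea on a plain list of numbers; `toy_module` is a made-up stand-in for an Inception module, not the layer used in the paper.

```python
def residual_block(x, module):
    """Return x + module(x); the module must preserve the length of x."""
    fx = module(x)
    if len(fx) != len(x):
        raise ValueError("module must preserve the input shape")
    return [a + b for a, b in zip(x, fx)]

def toy_module(x):
    """Hypothetical stand-in for an Inception module: scale, shift, ReLU."""
    return [max(0.0, 0.5 * v - 1.0) for v in x]

y = residual_block([1.0, 4.0, -2.0], toy_module)
print(y)  # [1.0, 5.0, -2.0]
```

Where the module outputs zero, the block passes its input through unchanged, which is what lets gradients flow through very deep stacks.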
References
- Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (June 2015). "Going deeper with convolutions". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 1–9. arXiv:1409.4842. doi:10.1109/CVPR.2015.7298594. ISBN 978-1-4673-6964-0.
- Tang (May 2018). Intelligent Mobile Projects with TensorFlow. Packt Publishing. Chapter 2. ISBN 9781788834544.
- Karim and Zaccone (March 2018). Deep Learning with TensorFlow. Packt Publishing. Chapter 4. ISBN 9781788831109.
- Milton-Barker, Adam. "Inception V3 Deep Convolutional Architecture For Classifying Acute Myeloid/Lymphoblastic Leukemia". intel.com. Intel. Retrieved 2 February 2019.
- Zhang, Aston; Lipton, Zachary; Li, Mu; Smola, Alexander J. (2024). "8.4. Multi-Branch Networks (GoogLeNet)". Dive into Deep Learning. Cambridge University Press. ISBN 978-1-009-38943-3.
- google/inception, Google, 2024-08-19, retrieved 2024-08-19.
- Lin, Min; Chen, Qiang; Yan, Shuicheng (2014-03-04). Network In Network. doi:10.48550/arXiv.1312.4400. Retrieved 2024-08-19.
- Arora, Sanjeev; Bhaskara, Aditya; Ge, Rong; Ma, Tengyu (2014-01-27). "Provable Bounds for Learning Some Deep Representations". Proceedings of the 31st International Conference on Machine Learning. PMLR: 584–592.
- Ioffe, Sergey; Szegedy, Christian (2015-03-02). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. doi:10.48550/arXiv.1502.03167. Retrieved 2024-08-19.
- Müller, Rafael; Kornblith, Simon; Hinton, Geoffrey E. (2019). "When does label smoothing help?". Advances in Neural Information Processing Systems. 32. Curran Associates, Inc.
- Szegedy, Christian; Ioffe, Sergey; Vanhoucke, Vincent; Alemi, Alexander (2017-02-12). "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning". Proceedings of the AAAI Conference on Artificial Intelligence. 31 (1). arXiv:1602.07261. doi:10.1609/aaai.v31i1.11231. ISSN 2374-3468.