Open Source AI Project



The Progressive Scale Expansion Network (PSENet) is a text detection algorithm designed for images that contain irregularly shaped text, such as curved or slanted text that does not follow a straight line. It was developed jointly by researchers at Nanjing University and the technology company Face++, and it was presented at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2019.

At the heart of PSENet is a kernel-based framework. For each text instance, the network predicts several segmentation results at different scales, called kernels. The smallest kernel covers only the core, or skeleton, of the text instance, and each larger kernel covers more of it, up to the full shape. The final instances are recovered by the progressive scale expansion algorithm, which works much like a breadth-first search: starting from the connected components of the smallest kernel, it repeatedly assigns neighboring pixels that belong to the next larger kernel to the instance that reaches them first. Repeating this step through every kernel grows each core outward to the full extent of its text instance, so that even complex text shapes are captured accurately.
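
The expansion step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the kernels are already available as binary masks (nested lists of 0 and 1) ordered from smallest to largest, it uses 4-connectivity, and the function names label_components and progressive_scale_expansion are hypothetical names chosen for illustration.

```python
from collections import deque

# 4-connected neighborhood (an assumption made for this sketch)
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def label_components(mask):
    """Label the 4-connected components of a binary mask.

    Returns (labels, count), where labels has the same shape as mask
    and holds 0 for background or an instance id starting at 1.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernels):
    """Grow instances from the smallest kernel out to the largest one.

    kernels: list of binary masks ordered from smallest to largest.
    """
    labels, count = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the breadth-first search with every pixel labeled so far.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                # A pixel joins the first instance that reaches it.
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels, count


# Toy example: two text cores whose full regions touch in the largest kernel.
small = [[0, 1, 0, 0, 0, 1, 0]]
large = [[1, 1, 1, 1, 1, 1, 1]]
labels, count = progressive_scale_expansion([small, large])
print(count, labels)
```

In this toy input, a plain connected-component pass over the largest mask would merge everything into one region, whereas seeding the search from the smallest kernel keeps the two instances apart.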

A key strength of PSENet is that it segments text at the pixel level rather than only locating it, so it can follow text that is distorted, wrapped, or laid out unconventionally. It also detects text across a wide range of shapes and sizes, which makes it applicable to many text detection tasks.

PSENet has been evaluated on several benchmark datasets that are standard for text detection: SynthText, Total-Text, CTW1500, ICDAR 2015, and ICDAR 2017 MLT. Its consistent results across these varied datasets indicate that PSENet adapts well to a broad range of text detection challenges.
