This repository is about automatically recognizing verification codes (captchas). A method called the "Drip algorithm" cuts each picture into separate parts, which makes the subsequent recognition step easier.
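To illustrate the cutting idea, here is a minimal Python sketch. It assumes that "Drip algorithm" refers to the drop-fall segmentation technique, in which a simulated water drop falls from the top of a binarized image, slides around ink pixels where it can, and cuts through ink only when it is blocked. The move priority used here (down, down-left, down-right, right, left) is one simple variant chosen for illustration, and the names `drop_fall_path` and `split_along_path` are hypothetical. This is not the MATLAB code in ./cut_algorithm.

```python
def drop_fall_path(img, start_col):
    """Trace a drop from the top row to the bottom row.

    img is a list of rows; 1 = ink (character pixel), 0 = background.
    Returns the list of (row, col) positions the drop visits.
    """
    height, width = len(img), len(img[0])

    def free(r, c):
        # A cell is free if it is inside the image and is background.
        return 0 <= c < width and r < height and img[r][c] == 0

    row, col = 0, start_col
    path = [(row, col)]
    last_dx = 0  # last horizontal move; stops left/right oscillation
    while row < height - 1:
        if free(row + 1, col):                      # straight down
            row, last_dx = row + 1, 0
        elif free(row + 1, col - 1):                # down-left
            row, col, last_dx = row + 1, col - 1, 0
        elif free(row + 1, col + 1):                # down-right
            row, col, last_dx = row + 1, col + 1, 0
        elif last_dx != -1 and free(row, col + 1):  # slide right
            col, last_dx = col + 1, 1
        elif last_dx != 1 and free(row, col - 1):   # slide left
            col, last_dx = col - 1, -1
        else:                                       # blocked: cut through ink
            row, last_dx = row + 1, 0
        path.append((row, col))
    return path


def split_along_path(img, path):
    """Split img into a left and a right image along the drop path."""
    cut = {}
    for r, c in path:
        cut[r] = c  # keep the last column visited in each row
    left = [[v if c < cut[r] else 0 for c, v in enumerate(line)]
            for r, line in enumerate(img)]
    right = [[v if c >= cut[r] else 0 for c, v in enumerate(line)]
             for r, line in enumerate(img)]
    return left, right


# Two blobs; the left one has a bump that a straight vertical cut would clip.
img = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
path = drop_fall_path(img, start_col=2)
left, right = split_along_path(img, path)
```

In this example the drop steps one column to the right to pass the bump, so the bump pixel stays with the left part. Choosing good starting columns is a separate problem that the sketch does not address.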
./cut_algorithm: The cutting algorithm, written in MATLAB.
./cut_algorithm/leach.m: Function that filters out noise dots.
./cut_algorithm/smooth.m: Function that smooths the picture.
./cut_algorithm/data2_type8.m: Main script to run; it focuses on the data in ./data/4nngn1.
./data: Several types of captcha data. Each folder is named according to the following rules:
Meaning of the folder name
First marker character: 4 5 6
- 4: there are four characters in the picture
- 5: there are five characters in the picture
- 6: there are six characters in the picture
Second marker character: s n
- s: single color (the color of the characters to be recognized, regardless of the background color)
- n: not single color
Third marker character: x n
- x: characters are rotated and overlapping, in a scattered arrangement
- n: characters are not rotated or overlapping, and are neatly arranged
Fourth marker character: g n
- g: there is interference (a long horizontal line in the foreground, excluding short lines and noise points in the background, etc.)
- n: no interference
Fifth marker character: q n
- q: characters are distorted
- n: no distortion
Sixth and subsequent marker characters: A a 123
- A: uppercase letters are included
- a: lowercase letters are included
- 123: numbers are included
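The naming rules for the ./data folders can be decoded mechanically. The following Python sketch is one possible reading of those rules; the function name `parse_folder_name` and the dictionary keys are hypothetical and do not come from the repository. The rules list `123` as the marker for numbers, while the folder 4nngn1 named in this README ends in `1`, so the sketch also returns the raw suffix instead of guessing what it means.

```python
def parse_folder_name(name):
    """Decode a ./data folder name such as '5sxgqAa123' into a dict."""
    if len(name) < 5:
        raise ValueError("expected at least five marker characters: %r" % name)

    # Allowed values for the first five marker characters, in order.
    allowed = ["456", "sn", "xn", "gn", "qn"]
    for position, (char, choices) in enumerate(zip(name, allowed), start=1):
        if char not in choices:
            raise ValueError(
                "marker %d of %r must be one of %s" % (position, name, choices)
            )

    suffix = name[5:]  # sixth and subsequent marker characters
    return {
        "num_chars": int(name[0]),
        "single_color": name[1] == "s",
        "rotated_overlapping": name[2] == "x",
        "interference_line": name[3] == "g",
        "distorted": name[4] == "q",
        "uppercase": "A" in suffix,
        "lowercase": "a" in suffix,
        "numbers": "123" in suffix,   # literal marker from the rules
        "raw_suffix": suffix,         # kept as-is, not interpreted further
    }


# '5sxgqAa123' is a made-up name that uses every marker once.
example = parse_folder_name("5sxgqAa123")
# '4nngn1' is the folder used by data2_type8.m.
real = parse_folder_name("4nngn1")
```

For 4nngn1 this gives four characters, not single color, no rotation or overlap, interference present, and no distortion, with the suffix `1` left uninterpreted.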
./data_cut: Results of the cutting algorithm; each character is saved in the folder it belongs to.
./source: TensorFlow code that recognizes the cut sub-images.
./captcha_data: Training and test data for the CNN.
Email: shp395210@outlook.com



