Intelligent Systems Laboratory

Document Image Understanding

We are developing techniques for the highly accurate data entry required to create the ground-truth datasets used to build character recognition classifiers. On a CD-ROM we created containing 1147 document images, with some 2.6 million characters, we estimate there are about 75 character errors.
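As a rough check on what that estimate implies, the character error rate is simply the number of errors divided by the number of characters. The short Python sketch below (the function name character_error_rate is hypothetical) works through the arithmetic using the approximate figures quoted above:

```python
def character_error_rate(errors: int, characters: int) -> float:
    """Fraction of characters in error: errors / total characters."""
    if characters <= 0:
        raise ValueError("characters must be positive")
    return errors / characters


# Figures quoted above; both are approximate.
rate = character_error_rate(75, 2_600_000)
print(f"error rate  ~ {rate:.2e}")               # error rate  ~ 2.88e-05
print(f"accuracy    ~ {100 * (1 - rate):.4f}%")  # accuracy    ~ 99.9971%
print(f"chars/error ~ {1 / rate:,.0f}")          # chars/error ~ 34,667
```

That works out to roughly one error per 35,000 characters, or about 99.997% character accuracy; since both inputs are estimates, so are these results.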

We are also developing techniques for the automatic deskewing of document page images, the automatic delineation of zones within document images, and the automatic classification of these zones.
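Deskewing can be done in several ways. One common approach, shown here only as an illustrative sketch and not necessarily the method used in this work, is projection-profile analysis: project the ink pixels onto the vertical axis at a range of candidate angles and keep the angle that yields the sharpest row histogram. The function names, the sum-of-squares sharpness score, and the synthetic test page are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) of a foreground (ink) pixel


def profile_score(points: Iterable[Point], angle_deg: float) -> float:
    """Sum of squared row-bin counts after projecting at angle_deg.

    Points on a text line y = x*tan(s) + c fall into the same bin
    (up to rounding) when angle_deg == s, so a sharper, peakier
    profile gives a larger score.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    bins = Counter(round(y * c - x * s) for x, y in points)
    return float(sum(n * n for n in bins.values()))


def estimate_skew(points: Sequence[Point], angles_deg: Sequence[float]) -> float:
    """Return the candidate angle whose projection profile is sharpest."""
    return max(angles_deg, key=lambda a: profile_score(points, a))


# Hypothetical synthetic page: five straight "text lines" skewed by 2 degrees.
skew = math.tan(math.radians(2.0))
pts = [(x, x * skew + 20 * k) for k in range(5) for x in range(200)]
candidates = [-5 + 0.5 * i for i in range(21)]  # -5.0 .. +5.0 degrees
print(estimate_skew(pts, candidates))
```

Rotating the page by the negative of the estimated angle (in the same coordinate frame as the points) then deskews it. A finer angle grid, or a coarse-to-fine search, trades computation for precision.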

Finally, we are working on modeling the degradation of document images, estimating the parameters of the degradation, and validating a degradation model.
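To make parameter estimation concrete, consider a deliberately simple toy model, assumed here for illustration only and not the laboratory's degradation model: every pixel of an ideal binary image is flipped independently with probability p. Given an ideal image and its degraded version, the maximum-likelihood estimate of p is just the fraction of pixels that differ. All names in this sketch are hypothetical.

```python
import random
from typing import List

Bitmap = List[List[int]]  # 0 = background, 1 = ink


def degrade(ideal: Bitmap, p: float, rng: random.Random) -> Bitmap:
    """Toy noise model: flip each pixel independently with probability p."""
    return [[1 - v if rng.random() < p else v for v in row] for row in ideal]


def estimate_flip_probability(ideal: Bitmap, observed: Bitmap) -> float:
    """Maximum-likelihood estimate of p: the fraction of pixels that differ."""
    total = sum(len(row) for row in ideal)
    flips = sum(a != b for r1, r2 in zip(ideal, observed) for a, b in zip(r1, r2))
    return flips / total


# Hypothetical 100 x 100 "ideal" image containing one vertical ink stroke.
rng = random.Random(0)
ideal = [[1 if 20 <= x < 40 else 0 for x in range(100)] for _ in range(100)]
observed = degrade(ideal, 0.05, rng)
p_hat = estimate_flip_probability(ideal, observed)
print(f"estimated p ~ {p_hat:.3f}")
```

More realistic models typically let the flip probability vary with a pixel's distance from character edges, which makes estimation and validation considerably harder than in this single-parameter case.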
