Intelligent Systems Laboratory
Document Image Understanding
We are developing techniques for the very accurate
data entry required for the creation of ground
truth datasets used in building character
recognition classifiers. For a CD-ROM we created
containing 1147 document images, with some
2.6 million characters, we estimate that there are
about 75 character errors.
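To put those figures in perspective, the implied error rate can be
worked out directly. The sketch below is only illustrative arithmetic:
the function name error_rate is hypothetical, and both inputs are the
approximate figures quoted above ("some 2.6 million", "about 75"), so
the result is an order-of-magnitude estimate, not a measured value.

```python
def error_rate(errors, total_chars):
    """Fraction of characters that are wrong (hypothetical helper)."""
    return errors / total_chars

# Approximate figures from the text above.
ERRORS = 75
TOTAL_CHARS = 2_600_000

rate = error_rate(ERRORS, TOTAL_CHARS)
chars_per_error = TOTAL_CHARS / ERRORS

print(f"error rate:          {rate:.2e}")
print(f"accuracy:            {100 * (1 - rate):.4f}%")
print(f"characters per error: {chars_per_error:.0f}")
```

Under these figures the rate is roughly 0.003 percent, or about one
error in every 35,000 characters.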
We are also developing techniques for the
automatic deskewing of document image pages, the
automatic delineation of zones in
document images, and the automatic classification
of these zones.
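As an illustration of what automatic deskewing involves, here is a
minimal sketch of one common textbook approach, the projection-profile
search. It is not necessarily the technique used in this laboratory;
the text above does not say which method is used. All names
(profile_score, estimate_skew, synthetic_page) and the default
parameters are assumptions made for the example. The idea: for each
candidate angle, project the ink pixels onto the axis perpendicular to
the assumed text direction; the histogram is sharpest when the
candidate matches the true skew.

```python
import math
from collections import Counter

def profile_score(pixels, angle_deg):
    """Sharpness of the projection profile at a candidate angle.

    pixels is a list of (x, y) ink-pixel coordinates. Pixels lying on a
    line y = y0 + x*tan(angle) all project to the same value, so the
    sum of squared bin counts peaks near the true skew.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    bins = Counter(math.floor(y * c - x * s) for x, y in pixels)
    return sum(n * n for n in bins.values())

def estimate_skew(pixels, max_angle_deg=5.0, step_deg=0.25):
    """Brute-force search over candidate angles (assumed range and step)."""
    steps = int(round(2 * max_angle_deg / step_deg))
    candidates = [-max_angle_deg + i * step_deg for i in range(steps + 1)]
    return max(candidates, key=lambda ang: profile_score(pixels, ang))

def synthetic_page(skew_deg, n_lines=5, width=400, spacing=20):
    """Toy test page: thin parallel 'text lines' tilted by skew_deg."""
    t = math.tan(math.radians(skew_deg))
    return [(x, round(k * spacing + x * t))
            for k in range(n_lines) for x in range(width)]
```

For example, estimate_skew(synthetic_page(2.0), step_deg=1.0) recovers
the 2 degree tilt of the toy page. Real scanned pages would need
binarization first, and usually a finer step or a coarse-to-fine
search.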
Finally, we are working on modeling the degradation
of document images, estimating the parameters of the
degradation, and validating a degradation model.