We work on problems related to reading and writing
About us
Welcome to the Reading & Writing Team of the Computational Neuroscience Lab @ IIT Madras!
We work on problems related to OCR and handwriting recognition for Indian languages.
We also work on other problems, such as automated form processing and handwriting generation, and develop tools based on techniques inspired by
artificial neural networks to help dyslexic children. Our aim is not to stop at academic exercises but to extend our work into effective
solutions to real-world problems.
Our objectives are:
1. Develop robust, generalized models for OCR and handwriting recognition for all Indian languages, and use them to solve
existing digitization problems that currently require painstaking, error-prone manual effort
2. Develop tools to improve the learning process for children
1. The Shape of Handwritten Character
V.S. Chakravarthy & Bhaskar Kompella, “The Shape of Handwritten Character,” Pattern Recognition Letters, Vol. 24, No. 12, August 2003.
Highlights
- Developed a universal framework based on catastrophe theory, defining eleven elementary shape features from which any handwritten
character in any script can be composed
- More complex structures decompose into the eleven proposed shape features
Illustrations of elementary shape points
- The theory explains several distortion patterns of handwritten characters, and the notion of co-dimension provides a measure of
the complexity and stability of a script
- Developed an online handwriting recognition model for English
Graph representations of handwritten uppercase English letters
2. Online Character Recognition of Telugu script based on Support Vector Machines
Rajkumar J., Mariraja K., Kanakapriya K., Nishanthini S., Chakravarthy V.S., “A system for Online Character Recognition of
Telugu script based on Support Vector Machines,” International Conference on Frontiers in Handwriting Recognition
(ICFHR 2012), Bari, Italy, September 18-20, 2012.
Highlights
- Online Telugu handwriting recognition using two schemes, based on a ternary search tree and support vector machines respectively
- Exploiting the three-tier vertical organization of a typical Telugu character, strokes are first classified into four subclasses
based primarily on their vertical position
Flow chart of the stages in the two schemes
- The two schemes achieved overall stroke recognition rates of 89.59% and 96.69%, respectively
- Character-level recognition rates of the two schemes were 90.55% and 96.42%, respectively
Ternary search tree representation of a Telugu character
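The ternary search tree schema can be illustrated with a minimal sketch. It assumes each stroke has already been classified into a discrete label (represented here by a single hypothetical character such as `m`, `t`, or `b`), so that a character becomes a short string of stroke labels, and the tree maps that string to a character label. The class name, labels, and character names are illustrative and not taken from the paper.

```python
from typing import Optional


class _Node:
    """One ternary search tree node: a split symbol plus lo/eq/hi children."""

    __slots__ = ("ch", "lo", "eq", "hi", "value")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None  # keys whose current symbol is smaller
        self.eq: Optional["_Node"] = None  # keys sharing this symbol (next position)
        self.hi: Optional["_Node"] = None  # keys whose current symbol is larger
        self.value: Optional[str] = None   # character label if a key ends here


class TernarySearchTree:
    """Maps strings of (hypothetical) stroke labels to character labels."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, key: str, value: str) -> None:
        if not key:
            raise ValueError("key must be non-empty")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node: Optional[_Node], key: str, i: int, value: str) -> _Node:
        ch = key[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value  # the whole stroke sequence has been consumed
        return node

    def get(self, key: str) -> Optional[str]:
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.value
        return None  # no character is stored for this stroke sequence
```

For example, after `tst.insert("mt", "ki")`, looking up the stroke sequence `"mt"` returns `"ki"`, while an unseen sequence returns `None`. A ternary search tree keeps lookups proportional to the key length while storing only the branches that actually occur.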
3. Online Handwriting Recognition for Tamil
K.H. Aparna, Vidhya Subramanian, M. Kasirajan, G. Vijay Prakash, V.S. Chakravarthy, Sriganesh Madhvanath,
“Online Handwriting Recognition for Tamil,” Ninth International Workshop on Frontiers in Handwriting Recognition
(IWFHR-9 2004), Kokubunji, Tokyo, Japan, October 2004.
Highlights
- Online Tamil handwriting recognition model based on identifying the sequence of strokes that make up a character
- Uses a structure-based representation in which each stroke is encoded as a string of shape features
Flow chart of the different stages
- Each stroke is identified by comparing its string representation with a database of strokes using a flexible string matching procedure
- Character termination is determined using a finite state automaton
Transition table of the finite state machine
- The model achieved a character recognition accuracy of 91.5%
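Both steps above can be sketched under stated assumptions: each stroke is already encoded as a string of shape-feature symbols (the single letters used here are hypothetical), "flexible" matching is approximated by Levenshtein edit distance to one prototype string per stroke class, and the finite state automaton is a hand-written transition table in which a character is taken to end when the next stroke cannot extend it. The paper's actual matching costs and automaton are not reproduced; all function, state, and label names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical FSA table: (current state, stroke label) -> next state.
Transitions = Dict[Tuple[str, str], str]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two shape-feature strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def match_stroke(features: str, prototypes: Dict[str, str]) -> Tuple[str, int]:
    """Return (stroke label, distance) of the prototype closest to `features`."""
    label, proto = min(prototypes.items(),
                       key=lambda item: edit_distance(features, item[1]))
    return label, edit_distance(features, proto)


def segment_characters(strokes: List[str], transitions: Transitions,
                       accepting: Dict[str, str]) -> List[str]:
    """Group recognized strokes into characters with a finite state automaton.

    A character terminates when the next stroke has no transition from the
    current state; that state must then be accepting.
    """
    chars: List[str] = []
    state = "start"
    for stroke in strokes:
        if (state, stroke) not in transitions and state != "start":
            if state not in accepting:
                raise ValueError(f"stroke {stroke!r} cannot follow state {state!r}")
            chars.append(accepting[state])  # current character is complete
            state = "start"
        if (state, stroke) not in transitions:
            raise ValueError(f"stroke {stroke!r} cannot start a character")
        state = transitions[(state, stroke)]
    if state != "start":
        if state not in accepting:
            raise ValueError(f"input ended inside an incomplete character ({state!r})")
        chars.append(accepting[state])
    return chars
```

With transitions `start -s1-> A -s2-> AB` and `start -s3-> C`, the stroke sequence `s1 s2 s3 s1` is split into the characters for `AB`, `C`, and `A`. Treating "no valid continuation" as termination is a greedy, longest-match choice; a real system may need lookahead where one character's strokes are a prefix of another's.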
4. LEKHAK [MAL]: A System for online recognition of handwritten Malayalam characters
Gowri Shankar, V. Anoop and V.S. Chakravarthy, “LEKHAK [MAL]: A System for online recognition of handwritten Malayalam characters,”
National Conference on Communications, IIT Madras, January 2003.
Highlights
- Online Malayalam handwritten character recognition based on the sequence in which the strokes of a character are written
- Shape-based features are extracted from each stroke to uniquely identify its type
Schematic representation of the character recognition model
- The character is then identified by applying a string matching algorithm to the sequence of stroke string representations
Representation of the Malayalam character 'R' (as in Rishi) as a string of shape features
A sample handwritten Malayalam text and its recognized output in LEKHAK
- The model recognizes characters with 93% accuracy
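LEKHAK's own shape features are not reproduced here. As a loosely related, swapped-in illustration of turning a pen trajectory into a symbol string that a string matcher can consume, the sketch below uses an 8-direction Freeman chain code with repeated directions collapsed. The function names and the `min_step` jitter threshold are hypothetical, and coordinates are assumed to have y increasing upward (with screen coordinates, where y grows downward, the north and south codes swap).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def direction_code(dx: float, dy: float) -> int:
    """Quantize a pen movement to one of 8 directions: 0 = east, counter-clockwise in 45-degree steps."""
    return round(math.atan2(dy, dx) / (math.pi / 4)) % 8


def stroke_to_string(points: List[Point], min_step: float = 1.0) -> str:
    """Encode a stroke as a string of direction codes, collapsing repeated directions."""
    codes: List[str] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < min_step:
            continue  # skip tiny movements (pen jitter)
        code = str(direction_code(dx, dy))
        if not codes or codes[-1] != code:
            codes.append(code)
    return "".join(codes)
```

For example, an "L"-shaped stroke drawn downward and then to the right, `[(0, 10), (0, 5), (0, 0), (5, 0), (10, 0)]`, encodes as `"60"` (south, then east). Collapsing repeats makes the string depend on the stroke's shape rather than on how many samples the tablet recorded.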
5. Online Handwritten Character Recognition of Devanagari and Telugu Characters using Support Vector Machines
H. Swethalakshmi, A. Jayaraman, V. S. Chakravarthy, C. Chandra Sekhar,
“Online Handwritten Character Recognition of Devanagari and Telugu Characters using Support Vector Machines,”
10th International Workshop on Frontiers in Handwriting Recognition, La Baule, France, October 23-26, 2006.
Highlights
- Recognition system for online handwritten characters in Indian scripts
- A handwritten character is represented as a sequence of strokes whose features are extracted and classified
Identifying rules for writing the character /au/ in Devanagari script
- Support vector machines are used to construct the stroke recognition engine
- The model can be extended to other Indian languages as well
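A minimal sketch of an SVM-based stroke classifier, assuming each stroke has already been turned into a fixed-length feature vector (for example, by resampling the pen trajectory to a fixed number of points). To stay dependency-free it trains linear SVMs with the Pegasos stochastic sub-gradient method and combines them one-vs-rest; the work above may have used different kernels, features, and multi-class schemes, and in practice a library such as scikit-learn's `SVC` would normally be used. Function names and hyperparameters (`lam`, `epochs`, `seed`) are illustrative.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. A constant 1.0 feature is appended so the last
    weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violates_margin = label * _dot(w, x) < 1
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if violates_margin:
                # hinge-loss sub-gradient step toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(X: Sequence[Sequence[float]], labels: Sequence[str],
                      **kwargs) -> Dict[str, Vector]:
    """One binary SVM per stroke class: that class (+1) against all others (-1)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models: Dict[str, Vector], x: Sequence[float]) -> str:
    """Pick the class whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: _dot(models[c], xb))
```

One-vs-rest keeps each binary problem simple; with many stroke classes, a kernel SVM or one-vs-one voting may separate similar strokes better.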
6. A complete OCR system development for Tamil Magazine Documents
Aparna Kokku, V. Srinivasa Chakravarthy, “A complete OCR system development for Tamil Magazine Documents,”
in OCR for Indic Scripts, Venu Govindaraju and Srirangaraj Setlur (Eds.), Springer, 2009.
Highlights
- A complete OCR pipeline for Tamil magazines and documents, with stages for de-skewing, preprocessing, segmentation,
character recognition, and reconstruction
- Used neural networks for text segmentation and character recognition
Schematic representation of the steps involved
Text area segmentation from Tamil magazine page
Recognition of text
- The system reached a recognition accuracy of 97%
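The source does not say how de-skewing was done. A common, swapped-in approach is projection-profile skew estimation: rotate the foreground pixel coordinates by each candidate angle and keep the angle whose horizontal projection profile is sharpest (here, the largest sum of squared row counts). This sketch works on a list of (x, y) foreground-pixel coordinates with y increasing upward, so a positive angle means text lines rise to the right; the function names, search range, and step are illustrative.

```python
import math
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) of a foreground pixel, y increasing upward


def profile_score(pixels: List[Pixel], angle_deg: float) -> float:
    """Sharpness of the horizontal projection profile after undoing a skew of angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    rows: Counter = Counter()
    for x, y in pixels:
        # Row index after rotating the point by -angle_deg.
        rows[round(-x * sin_a + y * cos_a)] += 1
    # Sum of squared row counts is largest when ink is concentrated in few rows.
    return float(sum(n * n for n in rows.values()))


def estimate_skew(pixels: List[Pixel], max_angle: float = 10.0, step: float = 0.5) -> float:
    """Return the candidate angle in [-max_angle, max_angle] with the sharpest profile."""
    n = int(round(max_angle / step))
    candidates = [k * step for k in range(-n, n + 1)]
    return max(candidates, key=lambda ang: profile_score(pixels, ang))
```

Once the skew is estimated, the page is rotated by the negative of that angle before segmentation. A brute-force search over angles is simple and robust; a coarse-to-fine search reduces the cost on full-resolution pages.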
7. An oscillatory neuromotor model of handwriting generation
G. Gangadhar, D. Joseph, V.S. Chakravarthy, “An oscillatory neuromotor model of handwriting generation,”
International Journal of Document Analysis and Recognition, Vol. 10, No. 2, November 2007.
Highlights
- Created a handwritten stroke generation model in which stroke velocities are expressed as a Fourier-style decomposition
of oscillatory neural activity
- The neural network consists of an input (stroke-selection) layer, an oscillatory layer, and an output layer where
stroke velocities are estimated
Network architecture
- A special timing network was proposed to set the network’s initial state, which is crucial for accurate stroke generation
Schematic depicting the dynamics of a single neural oscillator
Post-preparatory delay and its influence on error and stroke generation
Four- and five-letter words produced by the handwriting network
- Suggested the neurobiological significance of the process and architecture used, and their resemblance to the human motor system
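The Fourier-style idea can be sketched directly: each velocity component is a weighted sum of sinusoids at harmonics of a base frequency (standing in for the oscillatory layer's activity), and the pen trajectory is obtained by integrating the velocities. This sketch replaces the neural oscillators, the stroke-selection layer, and the timing network with hand-chosen coefficients; `velocity`, `generate_stroke`, and their parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coeffs = Sequence[Tuple[float, float]]  # (a_k, b_k) for harmonics k = 1, 2, ...


def velocity(coeffs: Coeffs, omega: float, t: float) -> float:
    """Fourier-style velocity: sum over k of a_k*cos(k*omega*t) + b_k*sin(k*omega*t)."""
    return sum(a * math.cos(k * omega * t) + b * math.sin(k * omega * t)
               for k, (a, b) in enumerate(coeffs, start=1))


def generate_stroke(cx: Coeffs, cy: Coeffs, omega: float, duration: float,
                    dt: float = 1e-3,
                    start: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[float, float]]:
    """Integrate the x and y velocities with forward Euler steps to get pen positions."""
    x, y = start
    path = [(x, y)]
    for n in range(int(round(duration / dt))):
        t = n * dt
        x += velocity(cx, omega, t) * dt
        y += velocity(cy, omega, t) * dt
        path.append((x, y))
    return path
```

For example, `generate_stroke([(0.0, -1.0)], [(1.0, 0.0)], omega=2 * math.pi, duration=1.0)` uses the velocities $v_x = -\sin(\omega t)$ and $v_y = \cos(\omega t)$, tracing a circle of radius $1/\omega$ that returns to its starting point after one period. Adding higher harmonics bends the stroke into more complex shapes.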
8. An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text
Pranav Guruprasad, Sujith Kumar S, Vigneswaran C, V. Srinivasa Chakravarthy, “An end-to-end, interactive Deep Learning based
Annotation system for cursive and print English handwritten text,” ICDSMLA-2020.
Highlights
- User-friendly annotation system for English handwritten text that requires minimal manual intervention
- Lets users correct detected text boxes, serialize the boxes, and edit the detected text
Different stages of the pipeline
Adjusting bounding boxes
- The recognition model uses a convolutional network, a multi-dimensional LSTM, and Connectionist Temporal Classification (CTC)
- The character error rate, computed from edit distance, was 9.3
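Two pieces of that description can be made concrete. Best-path (greedy) CTC decoding turns per-frame symbol probabilities into text by taking the most likely symbol per frame, merging consecutive repeats, and removing blanks; the character error rate is the edit distance between the recognized and reference text divided by the reference length. The alphabet, the choice of index 0 as the blank, and the probabilities are illustrative assumptions, and the system's actual decoder may differ (for example, beam search).

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    Takes the most probable symbol in each frame, merges consecutive repeats,
    then drops blanks. `alphabet[0]` is assumed to be the blank symbol.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    merged = [idx for idx, _ in groupby(best)]
    return "".join(alphabet[idx] for idx in merged if idx != 0)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by reference length (multiply by 100 for a percentage)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(hypothesis, reference) / len(reference)
```

Because repeats are merged before blanks are removed, a blank frame between two identical symbols (`a - a`) is what lets CTC output a doubled letter such as `"aa"`.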
9. Tamil OCR
Highlights
- Trained a word-level recognition model based on deep learning
Model output for a sample page from the Thirukkural Urai book
- The model achieves a character error rate of 0.1%
10. Telugu OCR
Highlights
- Word-level recognition model based on deep learning
Model output for a sample input
- The model achieves a character error rate of 0.5%
11. Text paragraph detection and word segmentation
Highlights
- Text paragraph detection and word segmentation from Telugu newspaper/documents
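The source gives no method details for this item. As an illustrative, swapped-in baseline (not necessarily what the team uses), word segmentation within a detected text line can be done with a vertical projection profile: columns without ink are gaps, and only gaps at least `min_gap` pixels wide are treated as word boundaries. The binarized line format (a list of rows of 0/1 pixels), the function name, and the threshold are assumptions.

```python
from typing import List, Tuple


def segment_words(line: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (start_col, end_col) word spans, end exclusive, in a binarized text line.

    Columns with no ink are gaps; gaps narrower than min_gap are treated as
    spacing inside a word (e.g., between characters) rather than between words.
    """
    width = len(line[0]) if line else 0
    ink = [any(row[c] for row in line) for c in range(width)]
    # Collect maximal runs of columns that contain ink.
    runs: List[Tuple[int, int]] = []
    c = 0
    while c < width:
        if ink[c]:
            start = c
            while c < width and ink[c]:
                c += 1
            runs.append((start, c))
        else:
            c += 1
    # Merge runs separated by gaps narrower than min_gap.
    words: List[Tuple[int, int]] = []
    for start, end in runs:
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)
        else:
            words.append((start, end))
    return words
```

A fixed `min_gap` works only when font size is roughly constant; for newspaper pages with mixed sizes, the threshold is often derived from the distribution of gap widths in each line.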
If you would like to contribute to this site, see the "good first issue" list in our GitHub repository and create a
pull request, or add your feature request to our issue tracker. This page is created and maintained by the R&W team.