
Reading and Writing team - CNS Lab

We work on problems related to reading and writing







About us



Welcome to the Reading & Writing Team of the Computational Neuroscience Lab @ IIT Madras!

We work on OCR and handwriting recognition for Indian languages. We also work on related problems such as automated form processing and handwriting generation, and we develop tools based on techniques inspired by artificial neural networks to help dyslexic children. Our aim is not to leave this work as an academic exercise but to extend it into effective solutions for real-world problems.

Our objectives are:

1. Develop robust, generalized models for OCR and handwriting recognition across all Indian languages, and use them to solve existing digitization problems that currently demand painstaking and error-prone manual work

2. Develop tools to improve the learning process for children






Research



Our Works




1. The shape of handwritten characters



V.S. Chakravarthy & Bhaskar Kompella, “The Shape of Handwritten Characters,” Pattern Recognition Letters, Vol. 24, No. 12, August 2003.

Highlights

- Developed a universal framework based on catastrophe theory, defining eleven shape features that can be derived as the components of any handwritten character in any script

- More complex structures break down into the proposed eleven shape features


Illustrations of elementary shape points


- The theory explains several distortion patterns of handwritten characters, and the notion of co-dimension measures the complexity and stability of a script

- Developed an online handwriting recognition model for English

Graph representations of handwritten uppercase English letters





2. Online Character Recognition of Telugu script based on Support Vector Machines



Rajkumar J., Mariraja K., Kanakapriya K., Nishanthini S., Chakravarthy V.S., “A system for Online Character Recognition of Telugu script based on Support Vector Machines,” International Conference on Frontiers in Handwriting Recognition (ICFHR 2012), Bari, Italy, September 18-20, 2012.

Highlights

- Online Telugu handwriting recognition model built with two schemes, using a Ternary Search Tree and Support Vector Machines respectively

- In the three-tier vertical organization of a typical Telugu character, the strokes are classified into 4 subclasses based primarily on their vertical position


Flow chart of the stages in the two schemes


- The two schemes yielded overall stroke recognition rates of 89.59% and 96.69% respectively

- Character-level recognition rates of the two schemes were 90.55% and 96.42% respectively


Ternary Search Tree representation of a Telugu character
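
As a rough illustration of how a ternary search tree can map a stroke sequence to a character, here is a minimal sketch. It assumes the tree is keyed on sequences of stroke labels; the page does not describe the actual key format, and the labels (`s3`, `s7`, ...) and character names below are hypothetical placeholders, not values from the paper.

```python
class Node:
    """One node of a ternary search tree, holding a single stroke label."""

    def __init__(self, label):
        self.label = label
        self.lo = None      # subtree for labels that sort before this one
        self.eq = None      # subtree for the next position in the sequence
        self.hi = None      # subtree for labels that sort after this one
        self.value = None   # character stored if a sequence ends here


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key, value):
        """Store `value` under a non-empty sequence of stroke labels."""
        if not key:
            raise ValueError("key must be a non-empty sequence")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node, key, i, value):
        label = key[i]
        if node is None:
            node = Node(label)
        if label < node.label:
            node.lo = self._insert(node.lo, key, i, value)
        elif label > node.label:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value
        return node

    def lookup(self, key):
        """Return the stored character, or None if the sequence is unknown."""
        node, i = self.root, 0
        while node is not None and key:
            label = key[i]
            if label < node.label:
                node = node.lo
            elif label > node.label:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.value
        return None


# Hypothetical stroke labels and character names, for illustration only.
tst = TernarySearchTree()
tst.insert(("s3", "s7"), "char_A")
tst.insert(("s3", "s9"), "char_B")
tst.insert(("s3",), "char_C")
print(tst.lookup(("s3", "s9")))
```

Sequences that share a prefix share nodes, so a character written with one stroke and another written with that stroke plus a modifier can both be stored without duplicating the common part.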





3. Online Handwriting Recognition for Tamil



K.H. Aparna, Vidhya Subramanian, M. Kasirajan, G. Vijay Prakash, V.S. Chakravarthy, Sriganesh Madhvanath, “Online Handwriting Recognition for Tamil,” Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR-9 2004), Kokubunji, Tokyo, Japan, October 2004.

Highlights

- Online Tamil handwriting recognition model based on identifying the sequence of strokes in a character

- A structure-based representation is used, in which each stroke is represented as a string of shape features


Flow chart diagram of different stages


- Each stroke is identified by comparing its string representation with a database of strokes using a flexible string matching procedure; character termination is determined using a finite state automaton


Transition table of Finite State Machine




- The model achieved a character recognition accuracy of 91.5%
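
The flexible string matching step can be pictured with a small sketch. The page does not say which matching procedure the paper uses, so this example substitutes the similarity ratio from Python's standard `difflib` module as a stand-in. The one-letter shape codes and the stroke database are hypothetical.

```python
import difflib

# Hypothetical database: shape-feature string -> stroke identity.
# Each letter stands for one assumed shape feature (e.g. L = line, C = curve).
STROKE_DB = {
    "LCCB": "stroke_1",
    "CPLL": "stroke_2",
}


def match_stroke(features, database=STROKE_DB, cutoff=0.6):
    """Return the closest stroke in the database, or None if nothing is
    at least `cutoff` similar to the observed feature string."""
    candidates = difflib.get_close_matches(
        features, list(database), n=1, cutoff=cutoff
    )
    return database[candidates[0]] if candidates else None


# A noisy observation that is missing one curve feature still matches.
print(match_stroke("LCB"))
```

The cutoff controls how tolerant the matcher is: a lower value accepts more distorted strokes at the cost of more false matches.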




4. LEKHAK [MAL]: A System for online recognition of handwritten Malayalam characters



Gowri Shankar, V. Anoop and V.S. Chakravarthy, “LEKHAK [MAL]: A System for online recognition of handwritten Malayalam characters,” National Conference on Communications, IIT Madras, January 2003.

Highlights

- Online Malayalam handwritten character recognition based on the sequence of strokes in the character

- Shape-based stroke features are extracted to identify the stroke type uniquely


Schematic representation of the character recognition model


- Given the sequence of string representations of the strokes, the character is identified using a string matching algorithm


Representation of Malayalam character 'R' (as in Rishi), as a string of shape features



A sample handwritten Malayalam text and its recognized output in LEKHAK


- The model recognizes characters with 93% accuracy




5. Online Handwritten Character Recognition of Devanagari and Telugu Characters using Support Vector Machines



H. Swethalakshmi, A. Jayaraman, V. S. Chakravarthy, C. Chandra Sekhar, “Online Handwritten Character Recognition of Devanagari and Telugu Characters using Support Vector Machines,” 10th International Workshop on Frontiers in Handwriting Recognition, La Baule, France, October 23-26, 2006.

Highlights

- Recognition system for online handwritten characters in Indian writing systems

- A handwritten character is represented as a sequence of strokes whose features are extracted and classified


Identifying rules for writing the character /au/ in Devanagari script


- Support vector machines have been used for constructing the stroke recognition engine



- The model can be extended to other Indian languages as well




6. A complete OCR system development for Tamil Magazine Documents



Aparna Kokku, V. Srinivasa Chakravarthy, “A complete OCR system development for Tamil Magazine Documents,” in OCR for Indic Scripts, Venu Govindaraju and Srirangaraj Setlur (Eds.), Springer, 2009.

Highlights

- A complete OCR pipeline for Tamil magazines and documents, with steps including de-skewing, preprocessing, segmentation, character recognition, and reconstruction

- Used neural networks for text segmentation and character recognition


Schematic representation of the steps involved




Text area segmentation from Tamil magazine page



Recognition of text


- The system reached a recognition accuracy of 97%




7. An oscillatory neuromotor model of handwriting generation



G. Gangadhar, D. Joseph, V.S. Chakravarthy, “An oscillatory neuromotor model of handwriting generation,” International Journal of Document Analysis and Recognition, Vol. 10, No. 2, November 2007.

Highlights

- Created a handwritten stroke generation model in which stroke velocities are expressed as a Fourier-style decomposition of oscillatory neural activities

- The neural network consisted of an input or stroke-selection layer, an oscillatory layer, and an output layer where stroke velocities are estimated


Network architecture


- A special timing network was proposed to set the network’s initial state, which is crucial for accurate stroke generation


Schematic depicting the dynamics of a single neural oscillator


Post-preparatory delay and its influence on error and stroke generation



Four and five-letter words produced by the handwriting network


- Discussed the neurobiological significance of the process and architecture used, and their resemblance to the human motor system
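
The idea of expressing stroke velocities as a Fourier-style sum of oscillations can be sketched in a few lines. This is only an illustration under simplifying assumptions: the oscillators are plain cosines at harmonic frequencies, the amplitudes and phases are hand-picked rather than learned, the trajectory is integrated with a simple Euler step, and the stroke-selection layer and timing network are left out. All function and parameter names are hypothetical.

```python
import math


def stroke_velocity(t, components, base_freq=1.0):
    """Velocity at time t as a weighted sum of cosine oscillators.

    `components` is a list of (amplitude, phase) pairs; the k-th pair
    drives an oscillator at k times the base frequency.
    """
    return sum(
        amp * math.cos(2 * math.pi * k * base_freq * t + phase)
        for k, (amp, phase) in enumerate(components, start=1)
    )


def generate_stroke(x_components, y_components, duration=1.0, steps=200):
    """Integrate the x and y velocities (Euler method) into pen positions."""
    dt = duration / steps
    x = y = 0.0
    points = [(x, y)]
    for n in range(steps):
        t = n * dt
        x += stroke_velocity(t, x_components) * dt
        y += stroke_velocity(t, y_components) * dt
        points.append((x, y))
    return points


# Hand-picked example: vx = cos(2*pi*t), vy = sin(2*pi*t) traces a closed loop.
points = generate_stroke(
    x_components=[(1.0, 0.0)],
    y_components=[(1.0, -math.pi / 2)],
)
print(len(points), points[100])
```

Adding higher harmonics with different amplitudes and phases bends the loop into more complex stroke shapes, which is the sense in which a stroke is "decomposed" into oscillatory components.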




8. An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text



Pranav Guruprasad, Sujith Kumar S, Vigneswaran C, V. Srinivasa Chakravarthy, “An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text,” ICDSMLA-2020.

Highlights

- User-friendly annotation system for English handwritten text that requires minimal manual intervention

- Provides the flexibility to correct detected text boxes, reorder (serialize) the boxes, and change the detected text


Different stages of the pipeline



Adjusting bounding boxes




- The recognition model uses a convolutional network, a multi-dimensional LSTM, and Connectionist Temporal Classification

- The character error rate, computed from edit distance, was 9.3
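
Character error rate is conventionally the edit distance between the recognized text and the reference text, divided by the length of the reference. The sketch below shows that calculation; it is a generic illustration with made-up strings, not the evaluation script used for the reported figure.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one missing character in an 11-character word.
cer = character_error_rate("handwriting", "handwritng")
print(f"{cer:.3f}")
```

Because the distance is normalized by the reference length, the rate can exceed 1.0 when the recognizer outputs far more characters than the reference contains.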




9. Tamil OCR



Highlights

- Trained a word-level recognition model based on deep learning


Model output for a given sample page from Thirukkural Urai book


- The model achieves a character error rate of 0.1%




10. Telugu OCR



Highlights

- Word-level recognition model based on deep learning


Model output for a given sample input


- The model achieves a character error rate of 0.5%




11. Text paragraph detection and word segmentation



Highlights

- Text paragraph detection and word segmentation for Telugu newspapers and documents


Text paragraph detection in a Telugu daily



Word-level segmentation





Current Works





1. Telugu handwriting recognition


- Annotate and augment Telugu handwriting data

- Train a recognition model on top of the Telugu OCR model




2. Tamil handwriting recognition


- Annotate and augment Tamil handwriting data


- Train a recognition model on top of the Tamil OCR model




3. Handwriting generation model


- Create a handwriting generation model using oscillators and a flip-flop network






4. Form processing



- Use an attentional search mechanism to locate the key fields in a form without context






Future Works



1. Understanding dyslexia through handwriting



- Model to recognize dyslexia based on a child's handwriting











If you would like to contribute to this site, see the "first good issues" list in our GitHub repository and create a pull request, or add your feature request in our issue tracker.


This page is created and maintained by the R&W team