UrduHack NLP library

Urdu is widely known as the national language of Pakistan, but unfortunately it is becoming more obsolete with each passing day in the modern world. Our vision is to put Urdu on its feet so that it works as well as English with modern technology. To that end, we have developed UrduHack, the very first Python library for Urdu. It is open source and completely free to use.

We have developed two core modules, Normalization and Tokenization, which are necessary for building ML models for complex problems such as sentiment analysis or document classification. The library is currently under development, and we are working daily to make it a full-fledged NLP library for Urdu. You can get an idea of the tasks we are working on by visiting the links at the end of this post.
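To give a rough idea of what normalization and tokenization mean for Urdu text, here is a minimal sketch that uses only the Python standard library. It is not UrduHack's implementation or API: the function names `normalize_urdu` and `split_sentences`, the two-entry character mapping, and the choice of sentence-ending marks are all assumptions made for illustration.

```python
import re
import unicodedata

# Illustrative mapping (an assumption, not UrduHack's table): Arabic code
# points that are often typed in place of their Urdu counterparts.
ARABIC_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_urdu(text: str) -> str:
    """Hypothetical normalizer: unify look-alike letters, drop diacritics."""
    # Compose sequences first so that, e.g., alef + madda becomes one letter.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(ARABIC_TO_URDU)
    # Remove the remaining combining marks (zabar, zer, pesh, ...).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Hypothetical sentence splitter: break after '۔', '؟' or '!'."""
    parts = re.split(r"(?<=[\u06D4\u061F!])\s+", text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    sample = "میں اردو سیکھ رہا ہوں۔ کیا آپ بھی؟"
    for sentence in split_sentences(normalize_urdu(sample)):
        print(sentence)
```

Running normalization before tokenization means the splitter only has to deal with one spelling of each character. Real Urdu word segmentation is considerably harder than this, since spaces do not reliably mark word boundaries.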

Machine learning models supported by UrduHack:

  • Text Classification
  • Sentiment Analysis
  • Sentence Classification
  • Document Classification
  • Named Entity Recognition
  • Image to Text
  • Speech to Text

UrduHack is completely open source, and we want to keep it that way as a non-profit library. The biggest problem we have faced so far is the lack of a solid Urdu dataset. We have written many algorithms to build a good dataset, but we are still far from producing a 100% correct one.

UrduHack as a package:

Available on GitHub: urduhack

Available on PyPI: urduhack

UrduHack documentation: urduhack docs