New Education Policy, India — A Statistical Word Cloud Analysis
-- Originally published as a blog post in 2020, republished in this new home --
The New Education Policy of India (NEP) has been in the headlines for about a week now, and the buzz shows no sign of abating anytime soon.
There are quite a few arguments about the NEP, both for and against it.
While Prime Minister Mr. Modi has strongly batted for the NEP as revolutionary and much needed for the 21st century, there are of course critics who see it the other way, casting serious doubt on the NEP as a harbinger of the “saffronisation” of education and the commercialization of private education.
I am not an expert either way, but I am curious to know more about this much awaited and much needed revamp of our education system. So I took an easier, statistical route: inferring and deciphering the document with software code.
Let's take a closer look at the distribution of words in the 66-page, 45,000+ word New Education Policy document, which is thankfully publicly available to all: https://www.mhrd.gov.in/sites/upload_files/mhrd/files/NEP_Final_English_0.pdf
This is a purely statistical analysis, with no data omitted or added.
The Process (Open To All):
The process is very simple. I followed these steps:
Convert the NEP PDF document to a text document.
Remove common words, also known as “sight words”, from the text.
Write a script (PHP) to count the number of occurrences of each word.
Put the word counts in Excel and draw some graphs!
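Steps 2 and 3 can be sketched in a few lines of Python. This is an illustration only, not the PHP script (nep-word-count.php) used for this analysis: the tiny `STOP_WORDS` set and the lowercase, letters-only tokenizer are assumptions made for the example, and a real run would need a much longer sight-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word ("sight word") list; a real
# analysis would use a much longer list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for",
              "on", "with", "as", "be", "by"}


def word_counts(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic words, drop stop words, count the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


if __name__ == "__main__":
    sample = "Education is the key. The school and the teachers shape education."
    print(word_counts(sample).most_common(3))
```

Calling `word_counts` on the full text file and then `most_common(50)` would give a top-50 list; the exact numbers depend on the stop-word list and tokenizer chosen.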
The NEP document in text format can be found here: https://github.com/siga-bharathi/data-analysis-scripts/blob/master/NEP.txt
The script that I ran for the statistical analysis of the words is openly available at: https://github.com/siga-bharathi/data-analysis-scripts/blob/master/nep-word-count.php
The raw output of the script on the NEP document is also publicly available at https://github.com/siga-bharathi/data-analysis-scripts/blob/master/NEP-Word-Count-Result.txt
Statistical Deep Dive:
Disclaimer: The statistical analysis is done purely by the numbers, without any context of how each word is applied. This is important to clarify, as some words can mean different things in different contexts. However, in a policy document such as the NEP, the repeated use of any particular set of words does have significance and helps all of us reflect on the thought process of the policy makers.
High Level Stats:
The New Education Policy (NEP) document is an elaborate document covering quite a few topics. I am very happy that the policy document has been made public and is easily accessible to all. Kudos to the govt and the HRD ministry for this.
The final version of the NEP document is 66 pages long and contains 45,000+ words. Of these, 4,000+ are unique words, with many words obviously occurring multiple times throughout the doc.
The most used word is, as expected, “Education”, which occurs 690 times in the NEP. Many words, like “enlightenment” and “accomplishment”, are used just once.
On average, a particular word occurs about 8 times.
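Each of the high-level figures above (total words, unique words, the most used word, words used only once, and the average occurrences per word) can be read off a single word-count table. A minimal Python sketch, assuming such a table is available as a `collections.Counter`; the `summarize` helper and the toy counts are hypothetical and not taken from the NEP output.

```python
from collections import Counter


def summarize(counts: Counter) -> dict:
    """Basic statistics over a non-empty word -> occurrence-count table."""
    total = sum(counts.values())             # total word occurrences
    unique = len(counts)                     # number of distinct words
    top_word, top_count = counts.most_common(1)[0]
    used_once = sorted(w for w, c in counts.items() if c == 1)
    return {
        "total": total,
        "unique": unique,
        "average": total / unique,           # mean occurrences per distinct word
        "top": (top_word, top_count),
        "used_once": used_once,
    }


if __name__ == "__main__":
    # Toy counts for illustration only; these are NOT figures from the NEP document.
    toy = Counter({"education": 5, "school": 3, "enlightenment": 1, "accomplishment": 1})
    print(summarize(toy))
```

The average here is simply total occurrences divided by distinct words, so it shifts noticeably depending on whether sight words are removed before counting.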
Top Ten Words:
As expected, the New Education Policy (NEP) focuses a lot on Education, School, Teachers and Students. There is also quite a bit of emphasis on “learning”. As Prime Minister Modi said, this NEP stresses “How To Think”, and there is a fair bit of focus on learning & thinking.
India is a country of many languages. The entire notion of India was born from the union of states, which were in turn organized by the languages spoken. Language is part and parcel of the very fabric of India. There is no walking away from it. Given these demographics and this cultural significance, the word “Language” is clearly among the Top 5 Words used in the NEP document.
Top 50 Words in NEP:
The top 50 words are all over the place, even after filtering the “sight” words out of the word cloud, so it is hard to arrive at any particular inference. Some important takeaways are the stress on Quality, Culture, Universities, Development, and Vocational, among others.
Importance to National & Cultural Values:
A particular takeaway from the Top Ten Words used in the NEP document is the focus on the word “National”. I think it is fair to infer a higher focus on the “nation”, with a “nationalistic” approach.
For example, if you compare the words related to Nation and Culture with those related to Science & Technology, it is interesting to note that the nation and culture words have been used almost the same number of times as the Science & Technology words.
As noted earlier, the NEP has a particular focus on the nation, nation building, and cultural values. Tradition, culture, and the Sanskrit language sit at about the same level of importance, at least going by the word count distribution.
Importance of Science & Technology:
There is also a good emphasis on Science & Technology throughout the document.
My Quick Observation:
The NEP has been in the making for more than 2 decades. It started well before the current BJP govt, and it must have taken quite a lot of effort to bring this policy to closure and set things in motion. Certainly big kudos for all the effort.
This is the 21st century. For a highly debated policy like the NEP, which is expected to shape the future of India in more ways than one, I would have liked to see some futuristic visions and policies to that effect.
The policy seems to be more “nationalistic”, more in tune with the current political mood, and focused on solidifying the “cultural” ethos of India and its history / culture.
My inference is based on juxtaposing and comparing the word clouds of some key words, chosen based on my own judgement, which is certainly questionable and up for debate! :-)
Nevertheless, take a look at the graph below.
I would have expected keywords like Science & Technology to lead the word count, at least within this group.
As a matter of fact, I would have expected Science, Technology, Digital, Internet, Think, and Innovation to be among the top 50 words used in the NEP document.
I took the top 6 words that represent science & technology and compared them with the top 6 words that represent Culture & Nation. If you sum up the counts of the science & technology group (Science, Technology, Research, Knowledge, Philosophy, & Innovation) and compare that with the sum for the Nationalist group (National, Culture, Sanskrit, Values, Tradition, & History), they are almost even. While I am glad that the science & tech word group ekes out a win by a thin margin, I would have been even happier if it were a thumping victory.
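The group comparison above boils down to summing the counts of two hand-picked word lists. A minimal Python sketch, assuming the counts are already in a `collections.Counter` keyed by lowercase words; the helper names and the toy counts are hypothetical, and the real totals would come from the published NEP-Word-Count-Result.txt file.

```python
from collections import Counter
from typing import Iterable

# Word groups as listed above, lowercased to match a lowercase tokenizer (an assumption).
SCIENCE_TECH = ["science", "technology", "research", "knowledge", "philosophy", "innovation"]
NATION_CULTURE = ["national", "culture", "sanskrit", "values", "tradition", "history"]


def group_total(counts: Counter, group: Iterable[str]) -> int:
    """Sum the occurrences of every word in the group; a Counter returns 0 for missing words."""
    return sum(counts[w] for w in group)


def compare_groups(counts: Counter) -> tuple:
    """Return (science & tech total, nation & culture total)."""
    return group_total(counts, SCIENCE_TECH), group_total(counts, NATION_CULTURE)


if __name__ == "__main__":
    # Placeholder counts, NOT the real NEP numbers.
    toy = Counter({"science": 4, "research": 6, "national": 7, "culture": 2})
    sci, nat = compare_groups(toy)
    print(f"science & tech: {sci}, nation & culture: {nat}, margin: {sci - nat}")
```

Because this matches whole words exactly, `national` does not also pick up `nation` or `nationalism`; applying stemming or lemmatization first would change the totals.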
This was fun weekend research for me. I hope we can all agree that word cloud analysis, although not thorough research or a deep dive, is not something to be taken lightly either. I truly believe that a word cloud gives a peek into the thought process of the authors of the document, and hence of the policy makers.
Disclaimer: This was just a hobby side gig that I took up over the weekend. I am sure more inferences can be drawn from the word cloud. Everyone is free to refer to the openly available PHP script, the text of the NEP document, and the output of the script.