The world is full of names. Think about your last few social media posts, emails, or texts. You mentioned locations such as cities, states, and countries. You also mentioned people by their first and last names. There were business names as well as locations with acronyms such as STC and EKWR.
Finding these names, or entities, in text is a task called Named Entity Recognition (NER). NLP engineers have built tools that identify all these different entities automatically, so that you can act on them later. This blog post discusses what NER is and how it can help with your data analytics initiatives.
What is NER – Named Entity Recognition?
Named Entity Recognition (NER) is a type of information extraction that locates specific entities in unstructured text and classifies them into predefined categories. An entity is something like a person, a place, a thing, an organization, or a time.
Software called a tagger identifies each entity in a text sample and labels it with a category. Other software tools can then use these labels to make sense of the text. The labels come from standard NER tag sets: for example, the CoNLL-2003 set uses four types (PER, LOC, ORG, MISC), while the OntoNotes 5 set used by spaCy's English models has 18, including PERSON, ORG, GPE, DATE, and MONEY.
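Many taggers work at the token level using the BIO scheme: each token is tagged B-LABEL (beginning of an entity), I-LABEL (inside the same entity), or O (outside any entity). spaCy exposes the same idea through each token's `ent_iob_` and `ent_type_` attributes. The sketch below, in plain Python so it runs without a model, turns BIO tags into entity spans; the sentence, the tags, and the `bio_to_spans` helper are hypothetical illustrations, not the output or code of a real tagger.

```python
# Hypothetical tokens and BIO tags, written by hand for illustration;
# a real tagger would predict these tags.
tokens = ["John", "Doe", "visited", "New", "York", "City", "on", "Monday"]
tags = ["B-PERSON", "I-PERSON", "O", "B-GPE", "I-GPE", "I-GPE", "O", "B-DATE"]


def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    spans = []
    current_tokens = []
    current_label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A new entity begins (a stray I- tag is treated as a beginning):
            # close any open entity first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-"):
            # Continue the open entity, which has the same label.
            current_tokens.append(token)
        else:
            # "O" means outside any entity: close the open entity, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


print(bio_to_spans(tokens, tags))
# [('John Doe', 'PERSON'), ('New York City', 'GPE'), ('Monday', 'DATE')]
```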
Why is NER – Named Entity Recognition Important for Data Analytics?
NER is used in many data analytics applications. It turns unstructured text into structured data: instead of raw sentences, you get a list of who, what, where, and when. That helps you decide how to interpret a text sample and how to relate it to other data points, such as related texts or records that mention the same entity.
NER can surface critical information in your text samples, such as people, places, and business names, as well as acronyms, events, and other details relevant to your analysis.
Types of Named Entities
Using Named Entity Recognition (NER), we can detect different kinds of entities in text. A few types of named entities are most useful for text analytics: people, organizations, events, times, and places. People found in a text sample might include sports figures, company executives, and local politicians. Organizations include corporations, universities, and nonprofits.
Events include sports matches, political events, and concerts. Times include days of the week, dates, and years. Places include countries, states, counties, and cities.
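To connect these five categories with the labels a tagger actually emits, here is a small sketch that groups labeled entities by category. The mapping to OntoNotes-style labels (the scheme used by spaCy's English models) is our own simplification, the `group_by_category` helper is hypothetical, and the sample entities are hand-written rather than model output. Labels outside the mapping, such as MONEY, are simply skipped.

```python
# Our own simplified mapping from the article's categories to OntoNotes-style labels.
CATEGORY_LABELS = {
    "people": {"PERSON"},
    "organizations": {"ORG"},
    "events": {"EVENT"},
    "times": {"DATE", "TIME"},
    "places": {"GPE", "LOC", "FAC"},
}


def group_by_category(entities):
    """Group (text, label) pairs by category; unmapped labels are skipped."""
    grouped = {category: [] for category in CATEGORY_LABELS}
    for text, label in entities:
        for category, labels in CATEGORY_LABELS.items():
            if label in labels:
                grouped[category].append(text)
                break
    return grouped


# Hand-written sample entities, not real model output.
sample_entities = [
    ("Jane Smith", "PERSON"),
    ("Stanford University", "ORG"),
    ("the Olympic Games", "EVENT"),
    ("July 2016", "DATE"),
    ("London", "GPE"),
    ("$2 million", "MONEY"),
]
print(group_by_category(sample_entities))
```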
How to Find Named Entities in the Text?
There are computer algorithms that can identify the entities in your text sample and label them using an NER tag set. They look for features in the text such as capitalized words, or specific character sequences like “USA” or “NYC”.
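The capitalization and acronym features described above can be approximated with a regular expression. This is a deliberately naive, rule-based sketch (the pattern and the `find_candidates` helper are hypothetical): it catches runs of capitalized words and all-caps acronyms, but it would also flag any capitalized sentence-initial word, which is one reason practical systems learn from annotated data instead of relying on rules alone.

```python
import re

# Hypothetical pattern: an all-caps acronym of 2+ letters ("USA", "NYC"),
# or a run of one or more capitalized words ("Acme Corp").
CANDIDATE_PATTERN = re.compile(r"\b(?:[A-Z]{2,}|[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b")


def find_candidates(text):
    """Return every substring that matches the capitalization features."""
    return [match.group(0) for match in CANDIDATE_PATTERN.finditer(text)]


print(find_candidates("Maria flew from NYC to the USA with Acme Corp last May."))
# ['Maria', 'NYC', 'USA', 'Acme Corp', 'May']
```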
The algorithms also look for patterns, such as a word followed by a comma and then another word, as in “Doe, John” or “United States, the”. A text can contain many entities, so these checks are applied across the whole sample to find them all. But how do we detect the named entities in a given sentence in practice? Several named entity recognition tools can do this; we will use the popular library spaCy.
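The comma pattern mentioned above can be sketched the same way. The hypothetical rule below rewrites “Surname, Given” matches such as “Doe, John” as “John Doe”. It is only an illustration of a hand-written pattern, not how spaCy works.

```python
import re

# Hypothetical rule: two capitalized words separated by ", " are read as
# "Surname, Given" and rewritten in "Given Surname" order.
INVERTED_NAME = re.compile(r"\b([A-Z][a-z]+), ([A-Z][a-z]+)\b")


def normalize_inverted_names(text):
    """Return each 'Surname, Given' match as 'Given Surname'."""
    return [f"{given} {surname}" for surname, given in INVERTED_NAME.findall(text)]


print(normalize_inverted_names("Speakers: Doe, John and Roe, Jane."))
# ['John Doe', 'Jane Roe']
```

The same rule also fires on an ordinary list: `normalize_inverted_names("We visited Paris, London and Rome.")` returns `['London Paris']`, which shows how brittle hand-written patterns can be.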
spaCy is a library widely used for most common Natural Language Processing tasks. Its pretrained English pipelines include a named entity recognizer, a statistical model trained on annotated text, so running NER takes only a few lines of code. Below is a short example that finds the entities in a given text, which can then be used for further analysis.
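```python
import spacy

# Requires the model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
# Running the pipeline on the text produces a Doc whose entities are in doc.ents
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    # Entity text, start and end character offsets, and label
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```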
First, we import the spaCy library and load the small English model. Then we create a Doc object from the given text; this object holds all the information the pipeline found, including tokens and entities. Finally, we loop over doc.ents and print each entity's text, its start and end character offsets, and its label, one entity per line.
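Conceptually, spaCy's `nlp` object tokenizes the text and then passes the resulting Doc through each pipeline component in order (you can list them with `nlp.pipe_names`); NER is one of those components. The toy version below only mimics that structure in plain Python. The `ToyPipeline` class, the dict standing in for a Doc, and the lookup table standing in for the statistical NER model are hypothetical simplifications, not spaCy's implementation.

```python
def tokenize(text):
    """Split on whitespace and wrap the result in a dict standing in for a Doc."""
    return {"text": text, "tokens": text.split(), "ents": []}


def toy_ner(doc):
    """Hypothetical NER step: a fixed lookup table instead of a trained model."""
    known_entities = {"Apple": "ORG", "U.K.": "GPE"}
    doc["ents"] = [(tok, known_entities[tok]) for tok in doc["tokens"] if tok in known_entities]
    return doc


class ToyPipeline:
    """Tokenize first, then run each component on the doc, in order."""

    def __init__(self, components):
        self.components = components

    def __call__(self, text):
        doc = tokenize(text)
        for component in self.components:
            doc = component(doc)
        return doc


nlp = ToyPipeline([toy_ner])
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
print(doc["ents"])  # [('Apple', 'ORG'), ('U.K.', 'GPE')]
```

A lookup table cannot recognize entities it has never seen, which is why real pipelines use a trained model for this step.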
The output shows each recognized entity with its label and its start and end character offsets. Entities such as an organization (ORG), a geopolitical entity like a country (GPE), and a monetary amount (MONEY) have been detected in the given text. We can use this information to make quick decisions, which is what makes NER so useful for pulling the crucial information out of a text. Note that NER only finds and labels entities; how the words relate to and depend on each other is the job of a separate component, spaCy's dependency parser.
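For this sentence, the en_core_web_sm model typically reports Apple (ORG), U.K. (GPE), and $1 billion (MONEY), though exact results can vary between model versions. The sketch below copies those results by hand, as an assumption, so it runs without spaCy. It first checks how the offsets work (start_char is inclusive and end_char exclusive, like a Python slice), then acts on the entities by redacting selected labels with a hypothetical `redact` helper.

```python
text = "Apple is looking at buying U.K. startup for $1 billion"
# (text, start_char, end_char, label), copied by hand from typical
# en_core_web_sm output for this sentence (an assumption; may vary by version)
entities = [("Apple", 0, 5, "ORG"), ("U.K.", 27, 31, "GPE"), ("$1 billion", 44, 54, "MONEY")]

# start_char is inclusive and end_char exclusive, so the offsets work as a slice
for ent_text, start, end, label in entities:
    assert text[start:end] == ent_text


def redact(text, entities, labels_to_hide):
    """Hypothetical helper: replace spans whose label is in labels_to_hide with [LABEL]."""
    # Replace from the last span to the first, so each replacement cannot shift
    # the offsets of spans that are still waiting to be processed.
    for _ent_text, start, end, label in sorted(entities, key=lambda e: e[1], reverse=True):
        if label in labels_to_hide:
            text = text[:start] + f"[{label}]" + text[end:]
    return text


print(redact(text, entities, {"ORG", "MONEY"}))
# [ORG] is looking at buying U.K. startup for [MONEY]
```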
Conclusion
Named Entity Recognition (NER) is the task of extracting useful information from unstructured text. NER tools identify the people, organizations, events, places, and times mentioned in a text so that you can take action on them. NER is an important component of data analytics, helping you understand what is actually in your text samples.