Named Entities: 

Dr James Cummings

@jamescummings

http://slides.com/jamescummings/namedentities

CC+By. As always, thanks to many in the TEI Community.

People, Places, and Organisations

Names, People, and Places

Names and other references to entities of one sort or another appear in most texts. Exactly how they appear can differ significantly, not only from text to text but also between references within the same text.

"My dear Mr. Bennet," said his lady to him one day, "have you heard that Netherfield Park is let at last?"


Mr. Bennet replied that he had not.

"But it is," returned she; "for Mrs. Long has just been here, and she told me all about it."
 

Mr. Bennet made no answer.

References Are Not The Entities To Which They Refer

One entity (person, place, organisation) might be known by many names or might be referred to by some other description entirely.

"Why, my dear, you must know, Mrs. Long says that Netherfield is taken by a young man of large fortune from the north of England; that he came down on Monday in a chaise and four to see the place, and was so much delighted with it, that he agreed with Mr. Morris immediately; that he is to take possession before Michaelmas, and some of his servants are to be in the house by the end of next week."

"What is his name?"

"Bingley."

Names

The TEI provides several ways of marking up names and nominal expressions:

  • <rs> ("referring string") -- any phrase which refers to a person or place, e.g. ‘the girl you mentioned’, ‘my husband’
  • <name> -- any lexical item recognized as a proper name, e.g. ‘Siegfried Sassoon’, ‘Calais’, ‘John Doe’
  • <persName>, <placeName>, <orgName>: ‘syntactic sugar’ for <name type="person">, <name type="place">, and <name type="org"> respectively
  • A rich set of proposals for the components of such nominal expressions,
    • e.g. <surname>, <forename>, <geogName>, <geogFeat> 

Names Example
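
A minimal sketch of how the passage quoted earlier might be tagged with these elements. This is one possible encoding for illustration, not necessarily the example shown on the original slide.

```xml
<!-- Illustrative sketch: one possible encoding of the opening exchange -->
<p>"My dear <persName>Mr. <surname>Bennet</surname></persName>," said
  <rs type="person">his lady</rs> to him one day, "have you heard that
  <placeName>Netherfield Park</placeName> is let at last?"</p>

<!-- The generic <name> with @type is equivalent to the more specific <persName> -->
<p>"<name type="person">Bingley</name>."</p>
```

Note that ‘his lady’ is not a proper name, so it is marked with <rs> rather than <persName>.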

Reference Theory

  • A reference is a fundamental semiotic concept
  • We can talk about the real world using natural languages because we know that some types of word are closely associated with real, specific, objects
  • Proper names and technical terms are canonical examples of this kind of word
  • ‘William Shakespeare’ refers to a single real-world entity; ‘Lyon’ and ‘River Thames’ refer to others: a specific place and a specific river respectively
  • When we translate between natural languages, often the proper names don't change, or are conventionally equivalent

Entities

  • <person> corresponding with <persName>
  • <place> corresponding with <placeName>
  • <org> corresponding with <orgName>
  • and in addition <relation>, <event>

All other types of names are:
<name type="otherTypeOfName">

Why?

  • To facilitate a more detailed and explicit encoding of source documents (historical materials, for example) which are primarily of interest because they concern objects in the real world
  • To support the encoding of "data-centric" documents, such as:
    • authority files,
    • biographical dictionaries,
    • geographical dictionaries and gazetteers,
    • extracted indices
  • To represent and model in a uniform way data which is only implicit in readings of many different documents

Not to mention...

  • <roleName> (e.g. ‘Emperor’),
  • <genName> (e.g. ‘the Elder’),
  • <addName> (e.g. ‘Hammer of the Scots’),
  • <nameLink> a link between components (e.g. ‘van der’)
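
A short sketch of these components inside <persName>. The particular names and the @type values are chosen only for illustration.

```xml
<!-- Illustrative fragments; the @type values are assumptions, not prescribed by the TEI -->
<persName>
  <forename>Edward</forename> <genName>I</genName>,
  <roleName type="office">King of England</roleName>,
  <addName type="epithet">Hammer of the Scots</addName>
</persName>

<persName>
  <forename>Pliny</forename> <genName>the Elder</genName>
</persName>

<persName>
  <forename>Frederick</forename> <nameLink>van der</nameLink> <surname>Tronck</surname>
</persName>
```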

Representing the Association

  • @ref provides an explicit means of locating a full definition for the entity being named by means of one or more URIs.
  • @key provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
  • In reality, @key is now unnecessary, since @ref is defined as anyURI
  • @ref can point from the name instance to the @xml:id of metadata about the entity:
    • prefixing it with a '#' if the metadata is in the same file
    • using a full URL, e.g. http://www.example.com/foo.xml#abc123
    • or using a private URI syntax, e.g. 'myproject:abc123', described in the header

Pointing Mechanisms

  • The @ref attribute can take any sort of pointer: a local fragment identifier, a full URL, or a private URI scheme
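
The three kinds of pointer might look like this in practice. This is a sketch: the identifier abc123 and the myproject prefix follow the examples given above, and the <prefixDef> patterns are assumptions about how a project could choose to expand its private URIs.

```xml
<!-- In the header (inside <encodingDesc>): declares the private "myproject" prefix -->
<listPrefixDef>
  <prefixDef ident="myproject" matchPattern="([a-z0-9]+)"
             replacementPattern="http://www.example.com/foo.xml#$1">
    <p>Private URIs using the myproject prefix point to entries in foo.xml.</p>
  </prefixDef>
</listPrefixDef>

<!-- In the text: three ways of pointing at metadata about the same entity -->
<persName ref="#abc123">Bingley</persName>
<persName ref="http://www.example.com/foo.xml#abc123">Bingley</persName>
<persName ref="myproject:abc123">Bingley</persName>
```

The first form only works if the element bearing xml:id="abc123" is in the same file as the name.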

Other Naming Attributes

Naming elements can take other attributes such as:

 

  • @role may be used to specify further information about the entity referenced by this name, for example the occupation of a person, or the status of a place.
  • @nymRef provides a means of locating the canonical form <nym> of the names associated with the object named by the element bearing it.

References can be to localised information or remote information. The @role attribute can disambiguate identical names.

References and Roles
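
As a sketch of how @ref and @role can tell apart two people with the same name, consider the fragment below. The identifiers, the URL, and the role values are hypothetical.

```xml
<!-- Illustrative only: two different people who share a name -->
<p><persName ref="#smith1" role="blacksmith">John Smith</persName> sold the horse to
  <persName ref="http://www.example.com/people.xml#smith2" role="farmer">John Smith</persName>.</p>
```

The first reference is to local information in the same file; the second is to remote information.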

Where To Store Named Entity Metadata?

Information about:

  • a person is stored within a <person> element;
  • a place in a <place> element;
  • an organization in an <org> element
  • Information about a group of people regarded as a single entity (for example ‘the audience’ of a performance) may be encoded using the <personGrp> element
  • <person> and <personGrp> may appear only within a <listPerson> element, <place> within a <listPlace>, <org> within a <listOrg>
    • e.g. a <listPerson> may appear within the <particDesc> (participation description) element in the <profileDesc> element of a TEI header.
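
A minimal sketch of such a header, with the name instances in the text pointing back to it. The xml:id values are assumptions, and placing the <listPlace> in <settingDesc> is just one of several possible locations.

```xml
<!-- Illustrative sketch; other required parts of the header are omitted -->
<profileDesc>
  <particDesc>
    <listPerson>
      <person xml:id="mrBennet">
        <persName>Mr. Bennet</persName>
      </person>
      <personGrp xml:id="bingleyServants">
        <p>The servants sent ahead to Netherfield.</p>
      </personGrp>
    </listPerson>
  </particDesc>
  <settingDesc>
    <listPlace>
      <place xml:id="netherfield">
        <placeName>Netherfield Park</placeName>
      </place>
    </listPlace>
  </settingDesc>
</profileDesc>

<!-- In the text -->
<p><persName ref="#mrBennet">Mr. Bennet</persName> replied that he had not.</p>
```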

Traits, States, and Events

Inside entities there are generally three classes of information:

  • <state>: more general-purpose, but usually a time-related property (e.g. occupation for a person, population for a place)
  • <trait>: if you want to distinguish between time-bound and static properties, use this for those that (usually) don't change over time (e.g. eye colour for a person, location for a place)
  • <event>: an independent event in the real world which may lead to a change in state or trait (e.g. birth for a person, a war for a place)

Additionally, all these elements are members of the 'att.datable' class so can have time/dating attributes.
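
A sketch of the three generic elements inside a <person>, using the dating attributes from att.datable. The person, the @type values, and the dates are all invented for illustration.

```xml
<!-- Hypothetical person; every value here is made up for the example -->
<person xml:id="p001">
  <persName>Jane Example</persName>
  <trait type="eyeColour">
    <label>Eye colour</label>
    <desc>Brown</desc>
  </trait>
  <state type="occupation" from="1850" to="1862">
    <label>Occupation</label>
    <desc>Schoolteacher</desc>
  </state>
  <event type="marriage" when="1862-06-14">
    <desc>Married in the parish church.</desc>
  </event>
</person>
```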

Traits

Some typical traits of a person

  • <faith>: faith, belief system, religion etc. of a person
  • <langKnowledge>: linguistic knowledge of a person
  • <nationality>: nationality (socio-political status)
  • <sex>: sex, sexual identity, or gender
  • <socecStatus>: socio-economic status

Some typical traits of a place:

  • <climate>: describes the climate
  • <location>: describes where a place is (see later)
  • <population>: describes its population
  • <terrain>: describes its terrain

Personal States

Some typical states for a person

  • <occupation> an informal description of a person's trade, profession or occupation
  • <residence> a person's present or past places of residence
  • <affiliation> an informal description of a person's present or past affiliation with some organization
  • <education> a description of the educational experience of a person
  • <floruit> contains information about a person's period of activity

Events

For persons, only two specific event elements are defined:

<birth> and <death>. Anything else must be defined using the generic <event> element and its @type attribute.
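
A sketch combining the specific trait and state elements with <birth>, <death>, and a generic <event>. The person and all dates are hypothetical.

```xml
<!-- Hypothetical person record; names, places, and dates are invented -->
<person xml:id="p002">
  <persName><forename>Jane</forename> <surname>Example</surname></persName>
  <birth when="1830-03-02"><placeName>Oxford</placeName></birth>
  <death when="1901-11-20"/>
  <nationality>British</nationality>
  <occupation from="1850" to="1862">Schoolteacher</occupation>
  <residence notAfter="1862">Oxford</residence>
  <!-- No dedicated element exists for this, so a typed generic event is used -->
  <event type="emigration" when="1863">
    <desc>Emigrated to Canada.</desc>
  </event>
</person>
```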

Places are defined by their locations

The <location> element can contain:

  • a more or less well-structured description using the hierarchy of place name components mentioned earlier (a politico-geographical location)
  • a set of geographical co-ordinates
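
Both kinds of <location> can be given for the same place. In this sketch the xml:id is an assumption and the co-ordinates are approximate; by default <geo> holds a latitude and longitude separated by a space.

```xml
<!-- Illustrative sketch: one place described both ways -->
<place xml:id="oxford">
  <placeName>Oxford</placeName>
  <!-- politico-geographical location -->
  <location>
    <country>United Kingdom</country>
    <region>Oxfordshire</region>
  </location>
  <!-- geographical co-ordinates (approximate) -->
  <location>
    <geo>51.752 -1.258</geo>
  </location>
</place>
```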

Markup can be political

  • <placeName> (names can be made up of other names)
  • <geogName> a name associated with some geographical feature such as a mountain or river
  • <geogFeat> a term for some particular kind of geographical feature e.g. ‘Mount’, ‘Lake’
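
A sketch of these elements in use. Which components are chosen for a place name (country, region, and so on) is an interpretive decision by the encoder, so the tagging below is only one possibility.

```xml
<!-- A place name made up of other names -->
<placeName><settlement>Rochester</settlement>, <region>New York</region></placeName>

<!-- Geographical names containing a feature term -->
<geogName><geogFeat>Mount</geogFeat> <name>Sinai</name></geogName>
<geogName><geogFeat>Lake</geogFeat> <name>Geneva</name></geogName>
```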

Places can self nest
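
A minimal sketch of nested <place> elements. The @type values and identifiers are assumptions made for the example.

```xml
<!-- Illustrative sketch: each place contains the places within it -->
<listPlace>
  <place xml:id="uk" type="country">
    <placeName>United Kingdom</placeName>
    <place xml:id="oxfordshire" type="county">
      <placeName>Oxfordshire</placeName>
      <place xml:id="oxfordCity" type="city">
        <placeName>Oxford</placeName>
      </place>
    </place>
  </place>
</listPlace>
```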

Organisational names
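
A sketch of an <orgName> in running text pointing to an <org> record. The identifier and the @type value are assumptions.

```xml
<!-- In the text -->
<p>The Guidelines are maintained by the <orgName ref="#teic">TEI Consortium</orgName>.</p>

<!-- Metadata about the organisation -->
<listOrg>
  <org xml:id="teic">
    <orgName>Text Encoding Initiative Consortium</orgName>
    <orgName type="short">TEI-C</orgName>
    <desc>The membership organisation that maintains the TEI Guidelines.</desc>
  </org>
</listOrg>
```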

Relationships

  • The <relation> (relationship) element describes any kind of relationship or linkage amongst other entities

  • We distinguish ‘mutual’ relationships (e.g. sibling) from non-mutual or directed relationships (e.g. parent-of).

  • The following attributes are available:

    • @name supplies a name for the kind of relationship of which this is an instance

    • @active identifies the 'active' participants in a non-mutual relationship, or all the participants in a mutual one

    • @mutual supplies a list of participants amongst all of whom the relationship holds equally

    • @passive identifies the ‘passive’ participants in a non-mutual relationship
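
A sketch of one mutual and one directed relationship. The identifiers are hypothetical and would point to <person> elements elsewhere; the @name values are free choices by the encoder.

```xml
<!-- Illustrative sketch: relations stored alongside the people they connect -->
<listPerson>
  <person xml:id="mrBennet"><persName>Mr. Bennet</persName></person>
  <person xml:id="mrsBennet"><persName>Mrs. Bennet</persName></person>
  <person xml:id="elizabeth"><persName>Elizabeth Bennet</persName></person>
  <listRelation>
    <!-- mutual: holds equally between all participants -->
    <relation name="spouse" mutual="#mrBennet #mrsBennet"/>
    <!-- directed: the active participants are parents of the passive one -->
    <relation name="parent" active="#mrBennet #mrsBennet" passive="#elizabeth"/>
  </listRelation>
</listPerson>
```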

Nyms

  • The elements <listNym> and <nym> are used to document the canonical form of a name or name-component.

  • <nym> can contain the dictionary model.entryPart elements (e.g. <form>, <orth>, <etym>) and may also contain a number of other <nym> elements. In addition to global attributes and att.typed, it has the attribute @parts to point to constituent <nym> elements

  • <listNym> a list of canonical names as <nym> elements

  • @nymRef is available on name elements to point to the <nym>
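
A minimal sketch of a canonical name with two nested variants, and a name in the text pointing to one of them. The identifiers and the surname are invented for the example.

```xml
<!-- Illustrative sketch: a canonical form with nested variant forms -->
<listNym>
  <nym xml:id="nymJames">
    <form>James</form>
    <nym xml:id="nymJim"><form>Jim</form></nym>
    <nym xml:id="nymJamie"><form>Jamie</form></nym>
  </nym>
</listNym>

<!-- In the text: the forename points to its canonical description -->
<persName><forename nymRef="#nymJim">Jim</forename> <surname>Example</surname></persName>
```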

Elements For Names

Named Entities: People, Places, and Organisations

By James Cummings

A TEI Workshop presentation on Named Entities. CC+By