Handling of SPIRES INSPIRE Id

Quick start



General

Scope

With INSPIRE, the new SPIRES database, just months away, SPIRES is asking the STAR Collaboration to help identify each of its authors correctly. In the past, HEPNAMES tried to make each person uniquely identifiable by recording their full name in the database, with some success. However, because of inconsistent name formats on papers and the considerable movement of scientists between institutions around the world, it is often difficult to link authors with all of their papers (including collaboration papers).

INSPIRE

With the advent of INSPIRE, each scientist is being assigned a unique INSPIRE number. The idea is that collaborations such as STAR would then submit an author list, generated as an XML file, along with their submission to arXiv. This will allow scientists to be linked to their papers, no matter how many times they change jobs or use different forms of their names on papers.


Executive summary (proposed rules and functionality)

With the inception of INSPIRE, the new SPIRES database, each STAR institution needs to be assigned an alphanumeric Organization ID (ex: bnl.gov) and each author a unique INSPIRE ID (ex: INSPIRE-00300690) for later use in the SPIRES database. STAR paper submissions to arXiv will then be accompanied by a full author list (authors.xml) in the format requested by SPIRES: an XML file listing all authors and their institutions.

Rules and functionalities

To maintain the consistency of our STAR records and provide the documents needed by SPIRES and arXiv, the following will need to be in effect:

  • New authors will need to acquire and provide their INSPIRE Author ID to be recorded in our database - an author without an INSPIRE ID will not appear as a STAR author
  • New institutions will need to provide their Organization ID to be recorded in the STAR Phone book - without that ID, none of the authors from the institution will be listed as STAR authors on a publication targeted at SPIRES / arXiv (the author list generation tool will not output authors belonging to an institution without an organization ID) [see note below]
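The two rules above amount to a filter applied when the author list is generated. The following is a minimal Python sketch under stated assumptions: `Member` is a hypothetical in-memory record joining a members row with its institution's Organization, `isAuthor = 'Y'` is taken as the eligibility flag, and a leave-date check (as suggested in the STAR specifics further down) skips people who have left. The same check also yields the "eligible but blocked" listing proposed later.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

@dataclass
class Member:
    """Hypothetical in-memory view of one phone book member row."""
    last_name: str
    initials: str
    is_author: bool              # assumed: members.isAuthor == 'Y'
    leave_date: date             # members.LeaveDate
    inspire_id: Optional[int]    # members.InspireID, None if not yet known
    organization: Optional[str]  # institutions.Organization of the member's institution

def listable(m: Member, on: date) -> bool:
    """Eligible and still active on the given date (left long ago = skip)."""
    return m.is_author and m.leave_date >= on

def author_list(members: List[Member], on: date) -> List[Member]:
    """Apply both rules: no INSPIRE ID or no Organization ID means not listed."""
    return [m for m in members
            if listable(m, on) and m.inspire_id is not None and m.organization]

def blocked_authors(members: List[Member], on: date) -> List[Tuple[Member, str]]:
    """Eligible, active members excluded only because an ID is missing."""
    report = []
    for m in members:
        if not listable(m, on):
            continue  # not eligible or already left: nothing to report
        reasons = []
        if m.inspire_id is None:
            reasons.append("missing members.InspireID")
        if not m.organization:
            reasons.append("missing institutions.Organization")
        if reasons:
            report.append((m, "; ".join(reasons)))
    return report

if __name__ == "__main__":
    today = date(2010, 4, 19)
    members = [  # made-up example rows
        Member("Doe", "J.", True, date(2999, 1, 1), 300690, "bnl.gov"),
        Member("Roe", "R.", True, date(2999, 1, 1), None, "bnl.gov"),
        Member("Poe", "E.", True, date(2999, 1, 1), 123, None),
        Member("Old", "O.", True, date(2005, 1, 1), None, None),
    ]
    print([m.last_name for m in author_list(members, today)])
    for m, why in blocked_authors(members, today):
        print(m.last_name, "->", why)
```

Keeping the exclusion and the report on one shared predicate means the color-coded listings and the generated author list cannot disagree about who is eligible.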

Instructions for new authors

If an author does not have an INSPIRE number, simply email hepnames@slac.stanford.edu and HEPNAMES will assign one, usually within a day or two. If the author is new to HEPNAMES, use the addition form at http://www.slac.stanford.edu/spires/hepnames/additions.shtml. As soon as the record is added, an INSPIRE number will be assigned, again usually within a day or two.


Schema and caveats

Analysis

The current SPIRES INSPIRE schema is as follows

with a few clarifications on types:

  • collaboration is a single element of type xs:string, fixed to the value STAR; a priori, there is no need to handle any logic there
  • however, only the elements name, authorID and organization (xs:string) need to be filled in
    • organization is an xs:simpleType with an xs:restriction base="xs:string" constraint and an enumeration
      • This implies importing the list of institutions (we need clarification on whether we have any flexibility here)
      • How do we synchronize this string ID if it changes in SPIRES? [a numerical ID would have been preferable]
    • authorID values are always prefixed by "INSPIRE-" followed by a zero-padded 8-digit number. Ex: INSPIRE-00300690
      • Likely, only a numerical import is needed. Would the numbers always have 8 digits? YES (2010/04/16-11:41)
    • name values have the form "LastName, Initials"
      • How do Initials match and map to SPIRES? It does not matter as long as the INSPIRE numbers are imported first (2010/04/16-11:41)
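Because every authorID is "INSPIRE-" followed by exactly 8 zero-padded digits (confirmed 2010/04/16), only the number needs to be stored, and the string can be rebuilt on output. A minimal Python sketch of that round trip; the helper names are hypothetical and not part of any SPIRES tool.

```python
import re

# Per the notes above: "INSPIRE-" followed by exactly 8 zero-padded digits.
INSPIRE_ID = re.compile(r"INSPIRE-([0-9]{8})")

def inspire_to_int(author_id: str) -> int:
    """Keep only the numerical part, e.g. for a members.InspireID column."""
    match = INSPIRE_ID.fullmatch(author_id.strip())
    if match is None:
        raise ValueError(f"not an INSPIRE author ID: {author_id!r}")
    return int(match.group(1))

def int_to_inspire(number: int) -> str:
    """Rebuild the authorID string from the stored number."""
    if not 0 <= number <= 99_999_999:
        raise ValueError(f"does not fit in 8 digits: {number}")
    return f"INSPIRE-{number:08d}"

def author_name(last_name: str, initials: str) -> str:
    """Build the 'LastName, Initials' form used by the name element."""
    return f"{last_name}, {initials}"

if __name__ == "__main__":
    n = inspire_to_int("INSPIRE-00300690")
    print(n)                  # 300690
    print(int_to_inspire(n))  # INSPIRE-00300690
```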

Our STAR records contain several tables; the relevant ones are:

members

  `Id` smallint(5) unsigned NOT NULL auto_increment,
  `LastName` varchar(40) NOT NULL default '',
  `FirstName` varchar(20) NOT NULL default '',
  `Initials` varchar(8) default NULL,
  `LatexLastName` varchar(40) default NULL,
  `InstitutionId` tinyint(3) unsigned NOT NULL default '0',
  `hasInstAddress` char(1) NOT NULL default 'Y',
  `Address1` varchar(100) default NULL,
  `Address2` varchar(100) default NULL,
  `Address3` varchar(100) default NULL,
  `City` varchar(60) default NULL,
  `State` char(3) default NULL,
  `Country` varchar(30) default NULL,
  `PostCode` varchar(10) default NULL,
  `EmailAddress` varchar(60) default NULL,
  `Url` varchar(100) default NULL,
  `Phone` varchar(25) default NULL,
  `Fax` varchar(25) default NULL,
  `BnlPhone` varchar(4) default NULL,
  `BnlOffice` varchar(10) default NULL,
  `isAuthor` char(1) NOT NULL default 'N' COMMENT 'The default is NO',
  `isShifter` char(1) NOT NULL default 'Y' COMMENT 'The default is YES.',
  `isJunior` char(1) NOT NULL default 'N' COMMENT 'The default is NO (is this used?)',
  `JoinDate` date NOT NULL default '1990-01-01',
  `LeaveDate` date NOT NULL default '2999-01-01',
  `LastUpdated` datetime NOT NULL default '0000-00-00 00:00:00',

institutions

  `Id` tinyint(3) unsigned NOT NULL auto_increment,
  `InstitutionName` varchar(100) NOT NULL default 'name',
  `Address1` varchar(100) NOT NULL default 'address1',
  `Address2` varchar(100) default NULL,
  `City` varchar(60) default NULL,
  `State` char(3) default NULL,
  `Country` varchar(30) NOT NULL default 'country',
  `PostCode` varchar(10) default NULL,
  `BnlOffice` varchar(10) default NULL,
  `InstitutionUrl` varchar(120) default NULL,
  `GroupUrl` varchar(120) default NULL,
  `CouncilRepId` smallint(5) unsigned default NULL,
  `JoinDate` date default NULL,
  `LeaveDate` date default NULL,
  `GroupName` varchar(80) default NULL,
  `LatexAffiliation` varchar(100) default NULL,
  `NameToSortBy` varchar(40) NOT NULL default '',

Use cases

A few problematic use cases are described below:

  1. An author may belong to multiple institutions (joint appointments)
    • This implies that the organization element needs to be specified for those authors, since otherwise their affiliation would be ambiguous. Such cases are, however, rare and may not warrant making the element mandatory.
    • For all other authors, there would "usually" be no issue unless the author changes institutions; in that case, the author MUST inform SPIRES of the change (or edit their HEPNAMES details) so that the default remains correct.
    • Suggestion: A possible solution [taking into consideration the multiple-organization use case] is to make the organization element optional; if specified, it would override the organization recorded by default in SPIRES for that publication. If this is done, a collaboration like STAR would not need to specify the organization element (SPIRES would default to what its database records), and the burden of updates would be on the author alone (they may use HEPNAMES to update their information, and STAR would have nothing to do with it).
       
  2. An institution may change names or domain - this may cause an ownership of records issue (hence information synchronization issue)
    • For example, had our IUCF colleagues (who recently changed to CEEM) chosen iucf.indiana.edu as their organization ID, what would happen upon changing to, say, ceem.indiana.edu? Would they have to inform SPIRES and every collaboration they work in (possibly more than one, not only STAR) so that records get updated? This seems error prone.
    • Suggestion: here again, if SPIRES were the primary owner of the organization tag, the issue would not arise. Similarly, if the organization ID were numerical, this would not be an issue.
       
  3. ...


To make the organization element optional, the schema would need the following definition:

which corresponds to the following XSD schema fragment:

    <xs:element name="authorOrganization">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="author"/>
                <xs:element ref="organization" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

The functionality would be as described above (a missing organization element would default to whatever SPIRES has in its database).
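A minimal sketch of how an author tool might emit entries under this optional-organization proposal, using Python's standard xml.etree.ElementTree. Assumptions not taken from SPIRES: the author element wraps name and authorID, the root element is called authors (hypothetical), and the sample names are made up. The organization child is written only when known, following the sequence order of the XSD fragment (author first, then the optional organization).

```python
import xml.etree.ElementTree as ET
from typing import Optional

def author_organization(name: str, inspire_number: int,
                        organization: Optional[str] = None) -> ET.Element:
    """Build one authorOrganization entry.

    Assumption: author wraps name and authorID. The organization child is
    written only when known; when omitted, SPIRES would (per the proposal)
    fall back to the organization recorded in its own database.
    """
    entry = ET.Element("authorOrganization")
    author = ET.SubElement(entry, "author")
    ET.SubElement(author, "name").text = name  # "LastName, Initials"
    ET.SubElement(author, "authorID").text = f"INSPIRE-{inspire_number:08d}"
    if organization:
        ET.SubElement(entry, "organization").text = organization
    return entry

def author_list_xml(entries) -> str:
    """Wrap entries in a root element; 'authors' is a hypothetical name."""
    root = ET.Element("authors")
    ET.SubElement(root, "collaboration").text = "STAR"  # fixed value
    for name, number, organization in entries:
        root.append(author_organization(name, number, organization))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(author_list_xml([
        ("Doe, J.", 300690, "bnl.gov"),  # made-up entries
        ("Roe, R.", 123, None),          # no organization: SPIRES default applies
    ]))
```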


STAR Implementation details (phone book)

  • add `InspireID` int(11) NOT NULL after LatexLastName in table members, storing only the numerical value
  • add `Organization` varchar(40) NOT NULL after InstitutionName in table institutions, storing the short-hand organization ID provided by SPIRES
  • We will need to initialize the 'Organization' field for all institutions
  • We will need to import all InspireID values
    • This may be automated once the 'Organization' field is initialized, as we could ...
    • update the records where LastName AND Initials match the SPIRES entries, where members.InstitutionId = institutions.Id, AND where institutions.Organization matches SPIRES.
    • An author tool could then generate the XML author list based on an agreed-upon schema
       
  • Whenever a new institution joins STAR, its 'Organization' ID will need to be collected (otherwise the logic will fail)
  • Whenever members are added, their InspireID will need to be requested. If they do not have one, we will need to ask them to obtain one.

  • When generating the author list with the current schema above (where organization seems to be mandatory), the following would need to be in place:
    • A missing institutions.Organization field would mean that none of the authors from this institution appear in the author list
    • A missing members.InspireID would exclude that author from the author list
    This is because, in the above schema, both the organization and the authorID elements are mandatory.
    We are discussing with SPIRES (a) whether the organization tag needs to be mandatory and (b) whether it could be a numerical value instead of a string.
    2010/04/19 - proposed the schema and functionality described here.
     
  • STAR specifics - To make this scheme workable, I propose that:
    • Our institution listing and our shift accounting records (the latter already implements some of this) would need to be properly color-coded to identify members who are eligible to be authors but blocked from being one due to a missing SPIRES ID [minor change and enhancement]
    • We will need to provide an additional tool listing, at a glance, all members eligible to be authors but missing a SPIRES/INSPIRE ID.
    • Note: the implementation would need to check leave dates and expiration; there is no point listing authors who left long ago.
  • Implications include:
    • We should NOT delete a member from our records [NB: this is already a rule, but a few violations were detected, such as the one in 2010/03 where Alexander Schmah's authorID changed from 982 to 1003 after records were dropped]
    • We should check for duplicated members.InspireID values [NB: a priori, making the field unique through the database seems like a great idea, but since new members may not have an INSPIRE ID yet, this path would not work]
    • Likewise, institutions.Organization MUST be checked for duplicates.
  • ...


Follow-ups

  • 2010/07/30 - After some discussion, a new schema was delivered by the SPIRES team. It can be found at this link.