Friday, March 19, 2010

Number of BHL names found on only 1 page

As of March 1, 2010, BHL had identified more than 70 million potential name strings across its 28 million digitized pages using uBio's TaxonFinder. Of those, 58 million were confirmed as names with a NameBankID. That confirmed set contained 1,491,000 unique names, and 329,000 of those unique names (about 22%) were found on only a single page in BHL.
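For anyone who wants to reproduce tallies like the ones above, here is a rough sketch of how they might be computed. It assumes that dbo.PageName (the table used in the query further down) holds one row per name string found on a page, and that NameBankID is NULL for strings that were not confirmed. Neither assumption is spelled out in this post, so treat the column aliases and the approach as illustrative only.

```sql
-- Sketch only: assumes one row per name string per page in dbo.PageName,
-- with NameBankID left NULL when the string was not confirmed.
SELECT
    COUNT(*)          AS TotalNameStrings,      -- all potential name strings
    COUNT(NameBankID) AS ConfirmedNameStrings   -- COUNT(column) skips NULLs
FROM dbo.PageName;

-- Unique confirmed names, and how many of them occur in exactly one row
SELECT
    COUNT(*) AS UniqueNames,
    SUM(CASE WHEN Occurrences = 1 THEN 1 ELSE 0 END) AS SinglePageNames
FROM (
    SELECT NameConfirmed, NameBankID, COUNT(*) AS Occurrences
    FROM dbo.PageName
    WHERE NameBankID IS NOT NULL
    GROUP BY NameConfirmed, NameBankID
) AS u;
```

Dividing SinglePageNames by UniqueNames gives the share of names seen only once.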

Data:
Single-Page Names.zip (5.5MB) contains the results of the following query, executed on March 1, 2010:


-- Initial list of single-page names
SELECT NameConfirmed, NameBankID
INTO #tmpName
FROM dbo.PageName
WHERE NameBankID IS NOT NULL
GROUP BY NameConfirmed, NameBankID
HAVING COUNT(*) = 1

-- Add the page ID and EOL ID to the results
SELECT n.PageID, t.NameConfirmed, t.NameBankID, e.EOLID
INTO #tmpFinal
FROM #tmpName t INNER JOIN dbo.PageName n
ON t.NameConfirmed = n.NameConfirmed
AND t.NameBankID = n.NameBankID
LEFT JOIN dbo.NamebankEOL e
ON t.NameBankID = e.NameBankID

-- Produce the final result set
SELECT PageID, LEFT(NameConfirmed, 50) AS NameConfirmed, NameBankID, EOLID
FROM #tmpFinal ORDER BY NameConfirmed

-- Clean up
DROP TABLE #tmpName
DROP TABLE #tmpFinal
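
A possible variant, not the query that produced the zip file: if dbo.PageName can contain more than one row for the same name on the same page (an assumption; the schema isn't shown here), COUNT(*) = 1 would leave those names out even though they occur on only one page. Counting distinct pages instead would keep them, and the whole thing can be written as one statement without temp tables. The sketch also assumes PageID is never NULL.

```sql
-- Variant sketch: single statement, counting distinct pages per name.
-- Assumes PageID is never NULL in dbo.PageName (not confirmed by the post).
SELECT MIN(n.PageID)              AS PageID,        -- the one page the name occurs on
       LEFT(n.NameConfirmed, 50)  AS NameConfirmed,
       n.NameBankID,
       e.EOLID
FROM dbo.PageName n
LEFT JOIN dbo.NamebankEOL e
    ON n.NameBankID = e.NameBankID
WHERE n.NameBankID IS NOT NULL
GROUP BY n.NameConfirmed, n.NameBankID, e.EOLID
HAVING COUNT(DISTINCT n.PageID) = 1
ORDER BY n.NameConfirmed;
```

If every name appears at most once per page in dbo.PageName, this should return the same rows as the original query.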

1 comment:

bob said...

The next question is: for those singleton species how many belong to genera that have at least one better-known/described species? My gut tells me that's a big percentage.