knitr::opts_chunk$set(echo = TRUE)
library(jsonlite)
Chava Gourarie emails the NICAR-L list with a problem with an R script:
I have a simple R question that I can’t figure out the solution to.
I’m using a loop and rbind in R to make a dataframe that merges the paginated results of a query in the OpenCorporates API.
Problem: I keep getting a duplicate row.names error saying I can’t merge the dataframes because the row names are 1:30 for each df, but it won’t let me reset the row names.
Let’s take a look. Here’s Chava’s code:
#using search with pagination
url0 = "https://api.opencorporates.com/v0.4/companies/search?q="
query = "tompkins ave"
query = gsub(" ","+",query)
url = paste(url0,query,sep="")
raw = fromJSON(url)
res = NULL
for (i in 1:raw$results$total_pages){
url = paste(url,"&page=", i, sep = "")
raw = fromJSON(url)
a = raw$results$companies$company
res=rbind(res,a)
}
That gives us the following error:
non-unique values when setting 'row.names': ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’
Show Traceback
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed
What’s causing this?
The first part of the code executes normally:
url0 = "https://api.opencorporates.com/v0.4/companies/search?q="
query = "tompkins ave"
query = gsub(" ","+",query)
url = paste(url0,query,sep="")
raw = fromJSON(url)
That gives us, primarily, raw
, a list object from which we want to extract a data frame, raw$results$companies$company
.
class(raw$results$companies$company)
## [1] "data.frame"
If you Google the error above, “rbind duplicate ‘row.names’ are not allowed”, you discover a StackOverflow page with the following answer:
I had the same error recently. What turned out to be the problem in my case was one of the attributes of the data frame was a list.
Could that be our issue here? Let’s take a look at raw
— which is pretty long.
str(raw, nchar.max = 15)
## List of 2
## $ api_version: chr "0.4"
## $ results :List of 5
## ..$ companies :'data.frame': 30 obs. of 1 variable:
## .. ..$ company:'data.frame': 30 obs. of 20 variables:
## .. .. ..$ name : chr [1:30] "160 TOMPKINS A"| __truncated__ "181 TOMPKINS A"| __truncated__ "204 E. TOMPKIN"| __truncated__ "245 TOMPKINS A"| __truncated__ ...
## .. .. ..$ company_number : chr [1:30] "682745" "2972418" "2115236" "3951053" ...
## .. .. ..$ jurisdiction_code : chr [1:30] "us_ny" "us_ny" "us_oh" "us_ny" ...
## .. .. ..$ incorporation_date : chr [1:30] "1981-03-03" "2003-11-03" "2012-06-14" "2010-05-18" ...
## .. .. ..$ dissolution_date : chr [1:30] "1991-06-26" NA NA NA ...
## .. .. ..$ company_type : chr [1:30] "DOMESTIC BUSIN"| __truncated__ "DOMESTIC LIMIT"| __truncated__ "DOMESTIC LIMIT"| __truncated__ "DOMESTIC LIMIT"| __truncated__ ...
## .. .. ..$ registry_url : chr [1:30] "https://appext"| __truncated__ "https://appext"| __truncated__ "http://www5.so"| __truncated__ "https://appext"| __truncated__ ...
## .. .. ..$ branch_status : logi [1:30] NA NA NA NA NA NA ...
## .. .. ..$ inactive : logi [1:30] TRUE FALSE FALSE FALSE TRUE FALSE ...
## .. .. ..$ current_status : chr [1:30] "Inactive Dis"| __truncated__ "Active" "Active" "Active" ...
## .. .. ..$ created_at : chr [1:30] "2013-08-31T21:"| __truncated__ "2013-08-13T23:"| __truncated__ "2012-06-21T22:"| __truncated__ "2013-08-14T00:"| __truncated__ ...
## .. .. ..$ updated_at : chr [1:30] "2016-07-07T12:"| __truncated__ "2017-04-03T09:"| __truncated__ "2016-10-11T17:"| __truncated__ "2017-04-08T23:"| __truncated__ ...
## .. .. ..$ retrieved_at : chr [1:30] "2016-07-06T00:"| __truncated__ "2017-03-02T00:"| __truncated__ "2016-09-30T08:"| __truncated__ "2017-03-02T00:"| __truncated__ ...
## .. .. ..$ opencorporates_url : chr [1:30] "https://openco"| __truncated__ "https://openco"| __truncated__ "https://openco"| __truncated__ "https://openco"| __truncated__ ...
## .. .. ..$ previous_names :List of 30
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "160 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "1981-03-03"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "181 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2003-11-03"
## .. .. .. ..$ :'data.frame': 0 obs. of 0 variables
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "245 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2010-05-18"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "268 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "1979-12-06"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "286 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2013-12-06"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "290 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2012-07-26"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "309 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2002-05-06"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "310 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2013-05-03"
## .. .. .. ..$ :'data.frame': 0 obs. of 0 variables
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "320 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2015-07-01"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "369 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2013-11-22"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "380 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "1986-07-15"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "393 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "1995-11-20"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "414-416 TOMPKI"| __truncated__
## .. .. .. .. ..$ start_date : chr "2006-05-23"
## .. .. .. ..$ :'data.frame': 2 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr [1:2] "418 TOMPKINS A"| __truncated__ "418 TOMKINS AV"| __truncated__
## .. .. .. .. ..$ start_date : chr [1:2] "2005-05-03" "2005-04-01"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "419 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2013-02-19"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "45 TOMPKINS AV"| __truncated__
## .. .. .. .. ..$ start_date : chr "2008-07-02"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "452 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "1980-09-10"
## .. .. .. ..$ :'data.frame': 0 obs. of 0 variables
## .. .. .. ..$ :'data.frame': 0 obs. of 0 variables
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "490 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2015-09-15"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "50 TOMPKINS AV"| __truncated__
## .. .. .. .. ..$ start_date : chr "1981-10-05"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "529 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "1990-08-30"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "54 TOMPKINS AV"| __truncated__
## .. .. .. .. ..$ start_date : chr "1978-12-22"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "54-56 TOMPKINS"| __truncated__
## .. .. .. .. ..$ start_date : chr "2010-07-28"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "559 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2010-09-20"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "725 TOMPKINS A"| __truncated__
## .. .. .. .. ..$ start_date : chr "2004-11-22"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "94 TOMPKINS AV"| __truncated__
## .. .. .. .. ..$ start_date : chr "1997-03-26"
## .. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
## .. .. .. .. ..$ company_name: chr "J & J TOMPKINS"| __truncated__
## .. .. .. .. ..$ start_date : chr "1971-09-08"
## .. .. ..$ source :'data.frame': 30 obs. of 3 variables:
## .. .. .. ..$ publisher : chr [1:30] "New York Depar"| __truncated__ "New York Depar"| __truncated__ "Ohio Secretary"| __truncated__ "New York Depar"| __truncated__ ...
## .. .. .. ..$ url : chr [1:30] "https://appext"| __truncated__ "https://appext"| __truncated__ "http://www5.so"| __truncated__ "https://appext"| __truncated__ ...
## .. .. .. ..$ retrieved_at: chr [1:30] "2016-07-06T00:"| __truncated__ "2017-03-02T00:"| __truncated__ "2016-09-30T08:"| __truncated__ "2017-03-02T00:"| __truncated__ ...
## .. .. ..$ registered_address :'data.frame': 30 obs. of 5 variables:
## .. .. .. ..$ street_address: chr [1:30] "160 TOMPKINS A"| __truncated__ "200 RECTOR PLA"| __truncated__ NA "5314 16TH AVE,"| __truncated__ ...
## .. .. .. ..$ locality : chr [1:30] NA NA NA NA ...
## .. .. .. ..$ region : chr [1:30] NA NA NA NA ...
## .. .. .. ..$ postal_code : chr [1:30] NA NA NA NA ...
## .. .. .. ..$ country : chr [1:30] "United States" "United States" NA "United States" ...
## .. .. ..$ registered_address_in_full: chr [1:30] "160 TOMPKINS A"| __truncated__ "200 RECTOR PLA"| __truncated__ NA "5314 16TH AVE,"| __truncated__ ...
## .. .. ..$ restricted_for_marketing : logi [1:30] NA NA NA NA NA NA ...
## .. .. ..$ native_company_number : logi [1:30] NA NA NA NA NA NA ...
## ..$ page : int 1
## ..$ per_page : int 30
## ..$ total_pages: int 2
## ..$ total_count: int 38
Notice that one of the elements of raw$results$companies$company
is $previous_names
— a list object that contains data frames as its elements.
Let’s try dropping that data altogether:
res = NULL
for (i in 1:raw$results$total_pages){
url = paste(url,"&page=", i, sep = "")
raw = fromJSON(url)
a = raw$results$companies$company[,1:14]
res=rbind(res,a)
}
That executes just fine!
nrow(res)
## [1] 38
ncol(res)
## [1] 14
library(knitr)
kable(res[1:5,])
name | company_number | jurisdiction_code | incorporation_date | dissolution_date | company_type | registry_url | branch_status | inactive | current_status | created_at | updated_at | retrieved_at | opencorporates_url |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
160 TOMPKINS AVE. CORP. | 682745 | us_ny | 1981-03-03 | 1991-06-26 | DOMESTIC BUSINESS CORPORATION | https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=682745&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 | NA | TRUE | Inactive Dissolution By Proclamation / Annulment Of Authority | 2013-08-31T21:31:03+00:00 | 2016-07-07T12:33:14+00:00 | 2016-07-06T00:27:46+00:00 | https://opencorporates.com/companies/us_ny/682745 |
181 TOMPKINS AVE LIMITED PARTNERSHIP | 2972418 | us_ny | 2003-11-03 | NA | DOMESTIC LIMITED PARTNERSHIP | https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=2972418&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 | NA | FALSE | Active | 2013-08-13T23:36:35+00:00 | 2017-04-03T09:27:34+00:00 | 2017-03-02T00:18:54+00:00 | https://opencorporates.com/companies/us_ny/2972418 |
204 E. TOMPKINS AVE, LLC | 2115236 | us_oh | 2012-06-14 | NA | DOMESTIC LIMITED LIABILITY COMPANY | http://www5.sos.state.oh.us/ords/f?p=100:7:0::NO:7:P7_CHARTER_NUM:2115236 | NA | FALSE | Active | 2012-06-21T22:29:30+00:00 | 2016-10-11T17:41:00+00:00 | 2016-09-30T08:04:45+00:00 | https://opencorporates.com/companies/us_oh/2115236 |
245 TOMPKINS AVE LLC | 3951053 | us_ny | 2010-05-18 | NA | DOMESTIC LIMITED LIABILITY COMPANY | https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=3951053&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 | NA | FALSE | Active | 2013-08-14T00:04:37+00:00 | 2017-04-08T23:58:41+00:00 | 2017-03-02T00:18:54+00:00 | https://opencorporates.com/companies/us_ny/3951053 |
268 TOMPKINS AVE. REALTY CORP. | 546566 | us_ny | 1979-12-06 | 2000-12-27 | DOMESTIC BUSINESS CORPORATION | https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=546566&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 | NA | TRUE | Inactive Dissolution By Proclamation / Annulment Of Authority | 2013-08-31T19:27:46+00:00 | 2016-07-07T02:45:13+00:00 | 2016-07-05T22:38:38+00:00 | https://opencorporates.com/companies/us_ny/546566 |
But what if the data in that column were important? There are ways in R to solve that (most simply, if not particularly helpfully, you could coerce the problematic columns into character). I’m not finding anything simple, so rather than spending half an hour experimenting (after my first few attempts managed to crash R twice — using dplyr::bind_rows()
— and my whole computer once, somehow), I’ll just send you what I’ve got and hope you’ll find it helpful!