knitr::opts_chunk$set(echo = TRUE)
library(jsonlite)

Chava Gourarie emails the NICAR-L list with a problem with an R script:

I have a simple R question that I can’t figure out the solution to.

I’m using a loop and rbind in R to make a dataframe that merges the paginated results of a query in the OpenCorporates API.

Problem: I keep getting a duplicate row.names error saying I can’t merge the dataframes because the row names are 1:30 for each df, but it won’t let me reset the row names.

Let’s take a look. Here’s Chava’s code:

#using search with pagination
url0 = "https://api.opencorporates.com/v0.4/companies/search?q="
query = "tompkins ave"
query = gsub(" ","+",query)
url = paste(url0,query,sep="")
raw = fromJSON(url)

res = NULL
for (i in 1:raw$results$total_pages){
  url = paste(url,"&page=", i, sep = "")
  raw = fromJSON(url)
  a = raw$results$companies$company
  res=rbind(res,a) 
}

That gives us the following error:

non-unique values when setting 'row.names': ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’
 Show Traceback
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed

What’s causing this?

The first part of the code executes normally:

url0 = "https://api.opencorporates.com/v0.4/companies/search?q="
query = "tompkins ave"
query = gsub(" ","+",query)
url = paste(url0,query,sep="")
raw = fromJSON(url)

That gives us, primarily, raw, a list object from which we want to extract a data frame, raw$results$companies$company.

class(raw$results$companies$company)
## [1] "data.frame"

If you Google the error above, “rbind duplicate ‘row.names’ are not allowed”, you discover a StackOverflow page with the following answer:

I had the same error recently. What turned out to be the problem in my case was one of the attributes of the data frame was a list.

Could that be our issue here? Let’s take a look at raw — which is pretty long.

str(raw, nchar.max = 15)
## List of 2
##  $ api_version: chr "0.4"
##  $ results    :List of 5
##   ..$ companies  :'data.frame':  30 obs. of  1 variable:
##   .. ..$ company:'data.frame':   30 obs. of  20 variables:
##   .. .. ..$ name                      : chr [1:30] "160 TOMPKINS A"| __truncated__ "181 TOMPKINS A"| __truncated__ "204 E. TOMPKIN"| __truncated__ "245 TOMPKINS A"| __truncated__ ...
##   .. .. ..$ company_number            : chr [1:30] "682745" "2972418" "2115236" "3951053" ...
##   .. .. ..$ jurisdiction_code         : chr [1:30] "us_ny" "us_ny" "us_oh" "us_ny" ...
##   .. .. ..$ incorporation_date        : chr [1:30] "1981-03-03" "2003-11-03" "2012-06-14" "2010-05-18" ...
##   .. .. ..$ dissolution_date          : chr [1:30] "1991-06-26" NA NA NA ...
##   .. .. ..$ company_type              : chr [1:30] "DOMESTIC BUSIN"| __truncated__ "DOMESTIC LIMIT"| __truncated__ "DOMESTIC LIMIT"| __truncated__ "DOMESTIC LIMIT"| __truncated__ ...
##   .. .. ..$ registry_url              : chr [1:30] "https://appext"| __truncated__ "https://appext"| __truncated__ "http://www5.so"| __truncated__ "https://appext"| __truncated__ ...
##   .. .. ..$ branch_status             : logi [1:30] NA NA NA NA NA NA ...
##   .. .. ..$ inactive                  : logi [1:30] TRUE FALSE FALSE FALSE TRUE FALSE ...
##   .. .. ..$ current_status            : chr [1:30] "Inactive   Dis"| __truncated__ "Active" "Active" "Active" ...
##   .. .. ..$ created_at                : chr [1:30] "2013-08-31T21:"| __truncated__ "2013-08-13T23:"| __truncated__ "2012-06-21T22:"| __truncated__ "2013-08-14T00:"| __truncated__ ...
##   .. .. ..$ updated_at                : chr [1:30] "2016-07-07T12:"| __truncated__ "2017-04-03T09:"| __truncated__ "2016-10-11T17:"| __truncated__ "2017-04-08T23:"| __truncated__ ...
##   .. .. ..$ retrieved_at              : chr [1:30] "2016-07-06T00:"| __truncated__ "2017-03-02T00:"| __truncated__ "2016-09-30T08:"| __truncated__ "2017-03-02T00:"| __truncated__ ...
##   .. .. ..$ opencorporates_url        : chr [1:30] "https://openco"| __truncated__ "https://openco"| __truncated__ "https://openco"| __truncated__ "https://openco"| __truncated__ ...
##   .. .. ..$ previous_names            :List of 30
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "160 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1981-03-03"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "181 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2003-11-03"
##   .. .. .. ..$ :'data.frame':    0 obs. of  0 variables
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "245 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2010-05-18"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "268 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1979-12-06"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "286 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2013-12-06"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "290 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2012-07-26"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "309 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2002-05-06"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "310 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2013-05-03"
##   .. .. .. ..$ :'data.frame':    0 obs. of  0 variables
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "320 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2015-07-01"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "369 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2013-11-22"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "380 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1986-07-15"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "393 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1995-11-20"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "414-416 TOMPKI"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2006-05-23"
##   .. .. .. ..$ :'data.frame':    2 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr [1:2] "418 TOMPKINS A"| __truncated__ "418 TOMKINS AV"| __truncated__
##   .. .. .. .. ..$ start_date  : chr [1:2] "2005-05-03" "2005-04-01"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "419 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2013-02-19"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "45 TOMPKINS AV"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2008-07-02"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "452 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1980-09-10"
##   .. .. .. ..$ :'data.frame':    0 obs. of  0 variables
##   .. .. .. ..$ :'data.frame':    0 obs. of  0 variables
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "490 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2015-09-15"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "50 TOMPKINS AV"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1981-10-05"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "529 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1990-08-30"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "54 TOMPKINS AV"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1978-12-22"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "54-56 TOMPKINS"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2010-07-28"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "559 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2010-09-20"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "725 TOMPKINS A"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "2004-11-22"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "94 TOMPKINS AV"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1997-03-26"
##   .. .. .. ..$ :'data.frame':    1 obs. of  2 variables:
##   .. .. .. .. ..$ company_name: chr "J & J TOMPKINS"| __truncated__
##   .. .. .. .. ..$ start_date  : chr "1971-09-08"
##   .. .. ..$ source                    :'data.frame': 30 obs. of  3 variables:
##   .. .. .. ..$ publisher   : chr [1:30] "New York Depar"| __truncated__ "New York Depar"| __truncated__ "Ohio Secretary"| __truncated__ "New York Depar"| __truncated__ ...
##   .. .. .. ..$ url         : chr [1:30] "https://appext"| __truncated__ "https://appext"| __truncated__ "http://www5.so"| __truncated__ "https://appext"| __truncated__ ...
##   .. .. .. ..$ retrieved_at: chr [1:30] "2016-07-06T00:"| __truncated__ "2017-03-02T00:"| __truncated__ "2016-09-30T08:"| __truncated__ "2017-03-02T00:"| __truncated__ ...
##   .. .. ..$ registered_address        :'data.frame': 30 obs. of  5 variables:
##   .. .. .. ..$ street_address: chr [1:30] "160 TOMPKINS A"| __truncated__ "200 RECTOR PLA"| __truncated__ NA "5314 16TH AVE,"| __truncated__ ...
##   .. .. .. ..$ locality      : chr [1:30] NA NA NA NA ...
##   .. .. .. ..$ region        : chr [1:30] NA NA NA NA ...
##   .. .. .. ..$ postal_code   : chr [1:30] NA NA NA NA ...
##   .. .. .. ..$ country       : chr [1:30] "United States" "United States" NA "United States" ...
##   .. .. ..$ registered_address_in_full: chr [1:30] "160 TOMPKINS A"| __truncated__ "200 RECTOR PLA"| __truncated__ NA "5314 16TH AVE,"| __truncated__ ...
##   .. .. ..$ restricted_for_marketing  : logi [1:30] NA NA NA NA NA NA ...
##   .. .. ..$ native_company_number     : logi [1:30] NA NA NA NA NA NA ...
##   ..$ page       : int 1
##   ..$ per_page   : int 30
##   ..$ total_pages: int 2
##   ..$ total_count: int 38

Notice that one of the elements of raw$results$companies$company is $previous_names — a list object that contains data frames as its elements.

Let’s try dropping that data altogether:

res = NULL
for (i in 1:raw$results$total_pages){
  url = paste(url,"&page=", i, sep = "")
  raw = fromJSON(url)
  a = raw$results$companies$company[,1:14]
  res=rbind(res,a) 
}

That executes just fine!

nrow(res)
## [1] 38
ncol(res)
## [1] 14
library(knitr)
kable(res[1:5,])
name company_number jurisdiction_code incorporation_date dissolution_date company_type registry_url branch_status inactive current_status created_at updated_at retrieved_at opencorporates_url
160 TOMPKINS AVE. CORP. 682745 us_ny 1981-03-03 1991-06-26 DOMESTIC BUSINESS CORPORATION https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=682745&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 NA TRUE Inactive Dissolution By Proclamation / Annulment Of Authority 2013-08-31T21:31:03+00:00 2016-07-07T12:33:14+00:00 2016-07-06T00:27:46+00:00 https://opencorporates.com/companies/us_ny/682745
181 TOMPKINS AVE LIMITED PARTNERSHIP 2972418 us_ny 2003-11-03 NA DOMESTIC LIMITED PARTNERSHIP https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=2972418&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 NA FALSE Active 2013-08-13T23:36:35+00:00 2017-04-03T09:27:34+00:00 2017-03-02T00:18:54+00:00 https://opencorporates.com/companies/us_ny/2972418
204 E. TOMPKINS AVE, LLC 2115236 us_oh 2012-06-14 NA DOMESTIC LIMITED LIABILITY COMPANY http://www5.sos.state.oh.us/ords/f?p=100:7:0::NO:7:P7_CHARTER_NUM:2115236 NA FALSE Active 2012-06-21T22:29:30+00:00 2016-10-11T17:41:00+00:00 2016-09-30T08:04:45+00:00 https://opencorporates.com/companies/us_oh/2115236
245 TOMPKINS AVE LLC 3951053 us_ny 2010-05-18 NA DOMESTIC LIMITED LIABILITY COMPANY https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=3951053&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 NA FALSE Active 2013-08-14T00:04:37+00:00 2017-04-08T23:58:41+00:00 2017-03-02T00:18:54+00:00 https://opencorporates.com/companies/us_ny/3951053
268 TOMPKINS AVE. REALTY CORP. 546566 us_ny 1979-12-06 2000-12-27 DOMESTIC BUSINESS CORPORATION https://appext20.dos.ny.gov/corp_public/CORPSEARCH.ENTITY_INFORMATION?p_nameid=0&p_corpid=546566&p_entity_name=%25&p_name_type=%25&p_search_type=BEGINS&p_srch_results_page=0 NA TRUE Inactive Dissolution By Proclamation / Annulment Of Authority 2013-08-31T19:27:46+00:00 2016-07-07T02:45:13+00:00 2016-07-05T22:38:38+00:00 https://opencorporates.com/companies/us_ny/546566

But what if the data in that column were important? There are ways in R to solve that (most simply, if not particularly helpfully, you could coerce the problematic columns into character). I’m not finding anything simple, so rather than spending half an hour experimenting (after my first few attempts managed to crash R twice — using dplyr::bind_rows() — and my whole computer once, somehow), I’ll just send you what I’ve got and hope you’ll find it helpful!