Johns Hopkins University provides data
files on confirmed cases and deaths during the COVID-19 pandemic.
Using these files, one can analyze the COVID-19 mortality rates of
different countries.
First, download the CSV files and load the data. There are four
files: confirmed cases and deaths in the USA, and confirmed cases and
deaths in other countries.
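As a minimal sketch of the loading step, the snippet below reads a two-row stand-in for the global confirmed-cases file. The numbers are made up and the CSV text is an assumption about the layout, not the real data. It shows why the later code refers to Province.State and Country.Region and parses dates with the format "X%m.%d.%y": read.csv() rewrites the headers into syntactic R names.

```r
# Two-row stand-in for the global confirmed-cases file (made-up numbers).
csv_text <- "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Japan,36.2,138.25,2,2
Ontario,Canada,51.25,-85.32,0,1"

# For a downloaded file, pass its local path instead of text = ...
global_conf_raw <- read.csv(text = csv_text)

# read.csv() turns "/" into "." and prefixes names that start with a digit
# with "X", so "1/22/20" becomes "X1.22.20".
print(colnames(global_conf_raw))

# The date columns can then be parsed straight from the column names.
date_cols <- colnames(global_conf_raw)[5:6]
dates <- as.Date(date_cols, format = "X%m.%d.%y")
print(dates)
```

The same call, pointed at each of the four downloaded files, would produce the four raw data frames used below.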
For the USA, read the number of confirmed cases, use the select
function to drop the columns that are not dates, then sum the confirmed
cases across counties. Follow the same method to read the number of
deaths.
For other countries, read the number of confirmed cases and use the
filter function to select the rows of interest. The method for reading
the number of deaths is the same. Three countries were chosen for this
analysis: China, Japan, and Canada.
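The per-country step can be sketched with a small helper. The function name country_total and the toy data frame are hypothetical, introduced only for illustration; the column names assume the files were read with read.csv(). Some countries, such as China and Canada, are split into several province rows, which is why the rows are summed.

```r
library(dplyr)

# Toy stand-in for global_conf_raw after read.csv() (made-up numbers).
global_conf_raw <- data.frame(
  Province.State = c("Hubei", "Beijing", "", "Ontario"),
  Country.Region = c("China", "China", "Japan", "Canada"),
  Lat  = c(30.98, 40.18, 36.20, 51.25),
  Long = c(112.27, 116.41, 138.25, -85.32),
  X1.22.20 = c(10, 1, 2, 0),
  X1.23.20 = c(12, 3, 2, 1)
)

# Hypothetical helper: one cumulative series per country,
# summed over that country's province rows.
country_total <- function(raw, country) {
  raw %>%
    filter(Country.Region == country) %>%
    select(-c(Province.State, Country.Region, Lat, Long)) %>%
    colSums() %>%
    as.vector()
}

china_conf <- country_total(global_conf_raw, "China")
japan_conf <- country_total(global_conf_raw, "Japan")
print(china_conf)
print(japan_conf)
```

A helper like this avoids repeating the same pipeline once per country and once per file.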
    library(dplyr)
    library(ggplot2)

    # Assumes us_conf_raw, us_death_raw, global_conf_raw and global_death_raw
    # have already been loaded from the four CSV files with read.csv().

    # Read date information from the column names
    dates <- us_conf_raw %>%
      select(-c(UID, iso2, iso3, code3, FIPS, Admin2, Country_Region,
                Province_State, Lat, Long_, Combined_Key)) %>%
      colnames() %>%
      as.Date(format = "X%m.%d.%y")

    # US confirmed data: sum over counties
    us_conf <- us_conf_raw %>%
      select(-c(UID, iso2, iso3, code3, FIPS, Admin2, Country_Region,
                Province_State, Lat, Long_, Combined_Key)) %>%
      colSums() %>%
      as.vector()

    # US death data (this file has an extra Population column)
    us_death <- us_death_raw %>%
      select(-c(UID, iso2, iso3, code3, FIPS, Admin2, Country_Region,
                Province_State, Lat, Long_, Combined_Key, Population)) %>%
      colSums() %>%
      as.vector()

    # Global confirmed data
    global_conf <- global_conf_raw %>%
      select(-c(Province.State, Lat, Long)) %>%
      filter(Country.Region %in% c("China", "Japan", "Canada"))

    # Global death data
    global_death <- global_death_raw %>%
      select(-c(Province.State, Lat, Long)) %>%
      filter(Country.Region %in% c("China", "Japan", "Canada"))

    # China confirmed data
    china_conf <- global_conf %>%
      filter(Country.Region == "China") %>%
      select(-Country.Region) %>%
      colSums() %>%
      as.vector()

    # China death data
    china_death <- global_death %>%
      filter(Country.Region == "China") %>%
      select(-Country.Region) %>%
      colSums() %>%
      as.vector()

    # Japan confirmed data
    japan_conf <- global_conf %>%
      filter(Country.Region == "Japan") %>%
      select(-Country.Region) %>%
      colSums() %>%
      as.vector()

    # Japan death data
    japan_death <- global_death %>%
      filter(Country.Region == "Japan") %>%
      select(-Country.Region) %>%
      colSums() %>%
      as.vector()

    # Canada confirmed data (same pattern; needed by the plots below)
    canada_conf <- global_conf %>%
      filter(Country.Region == "Canada") %>%
      select(-Country.Region) %>%
      colSums() %>%
      as.vector()

    # Canada death data
    canada_death <- global_death %>%
      filter(Country.Region == "Canada") %>%
      select(-Country.Region) %>%
      colSums() %>%
      as.vector()

    # Plot
    # No data frame is passed to ggplot(); each layer reads the vectors above.
    # Colors: black = US, red = China, blue = Japan, green = Canada.
    # They are set outside aes(), so no legend is drawn.

    # Confirmed cases
    graph_conf <- ggplot() +
      geom_line(aes(x = dates, y = us_conf), color = "black") +
      geom_line(aes(x = dates, y = china_conf), color = "red") +
      geom_line(aes(x = dates, y = japan_conf), color = "blue") +
      geom_line(aes(x = dates, y = canada_conf), color = "green")

    # Death cases
    graph_death <- ggplot() +
      geom_line(aes(x = dates, y = us_death), color = "black") +
      geom_line(aes(x = dates, y = china_death), color = "red") +
      geom_line(aes(x = dates, y = japan_death), color = "blue") +
      geom_line(aes(x = dates, y = canada_death), color = "green")

    # Death rate: cumulative deaths divided by cumulative confirmed cases
    graph_death_rate <- ggplot() +
      geom_line(aes(x = dates, y = us_death / us_conf), color = "black") +
      geom_line(aes(x = dates, y = china_death / china_conf), color = "red") +
      geom_line(aes(x = dates, y = japan_death / japan_conf), color = "blue") +
      geom_line(aes(x = dates, y = canada_death / canada_conf), color = "green") +
      scale_y_continuous(labels = scales::percent) +
      labs(y = "Death rate", x = "Date", title = "Death rate of COVID-19")

    print(graph_death_rate)
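The plots above set each line's color outside aes(), so ggplot2 draws no legend. One common way to get a legend is to put the series into one long data frame and map the country to the colour aesthetic. The sketch below uses made-up rates for two countries; the data frame rates and its column names are assumptions for illustration only.

```r
library(ggplot2)

# Toy series standing in for the computed death rates (made-up numbers).
dates <- as.Date("2020-03-01") + 0:2
rates <- data.frame(
  date    = rep(dates, times = 2),
  country = rep(c("US", "Japan"), each = 3),
  rate    = c(0.010, 0.012, 0.015, 0.020, 0.021, 0.022)
)

# Mapping colour inside aes() makes ggplot2 build the legend automatically;
# scale_colour_manual() keeps the original color choices.
graph <- ggplot(rates, aes(x = date, y = rate, colour = country)) +
  geom_line() +
  scale_colour_manual(values = c(US = "black", Japan = "blue")) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Death rate", x = "Date", colour = "Country",
       title = "Death rate of COVID-19")

print(graph)
```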
I feel that R has at least two major advantages over Python:
First, many of R's data structures and functions are built in, such
as data.frame and vector. Python has similar structures, but they
require importing additional packages.
Second, R does not require complex environment variable configuration;
one just downloads and installs the software. By comparison,
configuring a Python environment can be daunting for beginners; I've
lost count of how many python.exe files I have on my computer. R is
also very user-friendly when it comes to package management. In Python,
to install a package (unless using the magic commands in an ipynb
notebook), one has to leave the programming interface and enter
"pip install ..." in the console, while in R one can simply call
install.packages("...") directly in the programming interface, with no
need for a console command along the lines of "rir install ...". This
is a significant convenience.
However, R also has several drawbacks, some of which I find quite
intolerable:
R's function naming can be confusing. For example, converting a
matrix to a data frame (tibble) uses the as_tibble function, with an
underscore, but the reverse uses the as.matrix function, with a dot.
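    library(tibble)

    # Example data (made up) so the snippet runs on its own
    df <- tibble(x = 1:3, y = c(2.5, 3.5, 4.5))

    # Data frame to matrix: base R name with a dot
    m <- as.matrix(df)
    print(m)

    # Matrix back to tibble: tidyverse name with an underscore
    df <- as_tibble(m)
    print(df)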
The syntax of some R functions can also be peculiar. For example, the
"where" function only makes sense inside a column-selection context
such as the "select" function, and so on.
    # Keep only the character columns of a data frame df
    df %>% select(where(is.character))
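To make the point concrete, here is a small runnable sketch with a made-up data frame. where() wraps a predicate function, and select() applies it to each column and keeps the columns for which it returns TRUE.

```r
library(dplyr)

# Made-up data frame with two character columns and one numeric column.
df <- data.frame(
  name  = c("a", "b"),
  score = c(1.5, 2.5),
  city  = c("x", "y"),
  stringsAsFactors = FALSE
)

# Keep only the character columns.
char_cols <- df %>% select(where(is.character))
print(colnames(char_cols))

# Same idea with a different predicate: numeric columns only.
num_cols <- df %>% select(where(is.numeric))
print(colnames(num_cols))
```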
I also find the way data is passed into ggplot difficult to
understand, perhaps because I am not yet familiar with R's logic.
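For reference, a minimal sketch of the two usual ways data reaches ggplot2, using a made-up data frame named cases. The data frame is either given once to ggplot() and shared by all layers, or given to an individual layer through its data argument. In both cases aes() refers to columns by their unquoted names.

```r
library(ggplot2)

# Made-up data for illustration.
cases <- data.frame(
  date      = as.Date("2020-03-01") + 0:3,
  confirmed = c(10, 20, 40, 80)
)

# Option 1: pass the data frame to ggplot(); every layer inherits it.
p1 <- ggplot(cases, aes(x = date, y = confirmed)) +
  geom_line()

# Option 2: start with an empty ggplot() and give the layer its own data.
p2 <- ggplot() +
  geom_line(data = cases, aes(x = date, y = confirmed))

print(p1)
```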
What I find most intolerable and incomprehensible is that the dot “.”
is allowed in variable names. Moreover, many of R's built-in function
and variable names contain dots. For me, this significantly slows down
reading code.
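A short sketch of why the dot can be hard to read, using only base R. All names here (my.total, print.temperature, t1) are made up for illustration. In most names the dot is just another character, but in a function named generic.class it also marks an S3 method, and nothing in the syntax tells the two apart.

```r
# Here the dot is an ordinary character in a variable name.
my.total <- sum(c(1, 2, 3))

# Here the dot separates a generic (print) from a class (temperature),
# which makes this function an S3 method.
print.temperature <- function(x, ...) {
  cat(format(unclass(x)), "degrees\n")
  invisible(x)
}

t1 <- structure(21.5, class = "temperature")

# print() dispatches to print.temperature() because of the class attribute.
print(t1)
```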
In summary, R strikes me as a chaotic yet powerful language.