Title: | Searching for Optimal MDS Procedure for Metric and Interval-Valued Data |
---|---|
Description: | Selecting the optimal multidimensional scaling (MDS) procedure for metric data via metric MDS (ratio, interval, mspline) and nonmetric MDS (ordinal). Selecting the optimal multidimensional scaling (MDS) procedure for interval-valued data via metric MDS (ratio, interval, mspline). Selecting the optimal multidimensional scaling procedure for interval-valued data by varying all combinations of normalization and optimization methods. Selecting the optimal MDS procedure for statistical data referring to the evaluation of tourist attractiveness of Lower Silesian counties. (Borg, I., Groenen, P.J.F., Mair, P. (2013) <doi:10.1007/978-3-642-31848-1>, Walesiak, M. (2016) <doi:10.15611/ekt.2016.2.01>, Walesiak, M. (2017) <doi:10.15611/ekt.2017.3.01>). |
Authors: | Marek Walesiak [aut] |
Maintainer: | Andrzej Dudek <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7-6 |
Built: | 2025-02-24 04:06:28 UTC |
Source: | https://github.com/cran/mdsOpt |
The empirical study uses the statistical data presented in the article (Gryszel, Walesiak, 2014) and referring to the attractiveness level of 31 objects (29 Lower Silesian counties, pattern and antipattern object). The evaluation of tourist attractiveness of Lower Silesian counties was performed using 16 metric variables (measured on a ratio scale):
x1 – beds in hotels per 1 km2 of a county area,
x2 – number of nights spent daily by resident tourists per 1000 inhabitants of a county,
x3 – number of nights spent daily by foreign tourists per 1000 inhabitants of a county,
x4 – gas pollution emission in tons per 1 km2 of a county area,
x5 – number of criminal offences and crimes against life and health per 1000 inhabitants of a county,
x6 – number of property crimes per 1000 inhabitants of a county,
x7 – number of historical buildings per 100 km2 of a county area,
x8 – x9 – x10 – number of events as well as cultural and tourist ventures in a county,
x11 – number of natural monuments calculated per 1 km2 of a county area,
x12 – number of tourist economy entities per 1000 inhabitants of a county (natural and legal persons),
x13 – expenditure of municipalities and counties on tourism, culture and national heritage protection as well as physical culture per 1 inhabitant of a county in PLN,
x14 – viewers in cinemas per 1000 inhabitants of a county,
x15 – museum visitors per 1000 inhabitants of a county,
x16 – number of construction permits (hotels and accommodation buildings, commercial and service buildings, transport and communication buildings, civil and water engineering constructions) issued in a county in the years 2011-2012 per 1 km2 of a county area.
The statistical data were collected in 2012 and come from the Local Data Bank of the Central Statistical Office of Poland; only the data for variable x7 were obtained from the regional conservation officer.
data.frame: 31 objects (29 counties, pattern and antipattern object), 16 variables. The coordinates of a pattern object cover the most preferred preference variable (stimulants, destimulants, nominants) values. The coordinates of an anti-pattern object cover the least preferred preference variable values.
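The way a pattern and an anti-pattern object can be derived from the preference direction of each variable may be easier to see on a small example. The sketch below is an illustration only: the toy values, the variable names and the `is_stimulant` vector are hypothetical, and nominants are left out for brevity. It is not the code used to build `data_lower_silesian`.

```r
# Toy data: 3 objects, 2 variables (hypothetical values, not from the package)
x <- data.frame(beds      = c(1.2, 0.4, 2.5),  # stimulant: higher is better
                pollution = c(3.0, 1.0, 2.0))  # destimulant: lower is better
is_stimulant <- c(beds = TRUE, pollution = FALSE)

# Pattern: most preferred value of each variable
pattern <- mapply(function(col, stim) if (stim) max(col) else min(col),
                  x, is_stimulant)
# Anti-pattern: least preferred value of each variable
antipattern <- mapply(function(col, stim) if (stim) min(col) else max(col),
                      x, is_stimulant)

# Append both reference objects as the last two rows
x_ext <- rbind(x, pattern = pattern, antipattern = antipattern)
print(x_ext)
```

With this layout the pattern and anti-pattern objects occupy the last two rows, which matches how the examples in this manual address them through `nrow(z)-1` and `nrow(z)`.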
Gryszel, P., Walesiak, M., (2014), Zastosowanie uogólnionej miary odległości GDM w ocenie atrakcyjności turystycznej powiatów Dolnego Śląska [The Application of the General Distance Measure (GDM) in the Evaluation of Lower Silesian Districts’ Attractiveness], Folia Turistica, 31, 127-147. Available at: http://www.folia-turistica.pl/attachments/article/402/FT_31_2014.pdf.
library(mdsOpt)
metnor<-c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
metscale<-c("ratio","interval")
metdist<-c("euclidean","GDM1")
data(data_lower_silesian)
res<-optSmacofSym_mMDS(data_lower_silesian,normalizations=metnor,
  distances=metdist,mdsmodels=metscale)
print(findOptimalSmacofSym(res))
Draws a series of isoquants (an isoquant is a contour line drawn through the set of points at which the same quantity of output is produced while changing the quantities of two or more inputs).
drawIsoquants(x,y=NULL,number=6,steps=NULL)
x |
two dimensional point (center) |
y |
optional - second point, used for calculations of step size if |
number |
number of isoquants |
steps |
distance between successive isoquants starting from x, if length of this argument is lower than |
This is a plotting function, thus does not return any value
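To make the roles of `number` and `steps` more concrete, here is a minimal base-R sketch. It assumes that the isoquants are concentric circles around the center `x`, and that a `steps` vector shorter than `number` is recycled; both are assumptions, since the argument description above is incomplete. The helpers `isoquant_radii` and `draw_circles` are hypothetical and are not part of mdsOpt.

```r
# Hypothetical helper: cumulative radii of the circles
isoquant_radii <- function(number = 6, steps = 1) {
  # Assumption: 'steps' is recycled when it is shorter than 'number'
  cumsum(rep_len(steps, number))
}

# Hypothetical helper: add circles around 'center' to the current plot
draw_circles <- function(center, radii, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)
  for (r in radii) {
    lines(center[1] + r * cos(theta), center[2] + r * sin(theta), lty = 2)
  }
  invisible(NULL)
}

pdf(NULL)  # null graphics device, so the sketch also runs non-interactively
plot(c(-4, 4), c(-4, 4), type = "n", asp = 1,
     xlab = "Dimension 1", ylab = "Dimension 2")
radii <- isoquant_radii(number = 6, steps = 0.5)
draw_circles(c(0, 0), radii)
dev.off()
```

Using `asp = 1` matters here: without equal axis scaling the circles would be drawn as ellipses and distances from the center could not be compared visually.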
Marek Walesiak [email protected], Andrzej Dudek [email protected]
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Poland
Walesiak, M., (2016), Visualization of Linear Ordering Results for Metric Data with the Application of Multidimensional Scaling, Ekonometria, 2(52), 9-21. Available at: doi:10.15611/ekt.2016.2.01.
Walesiak, M. (2017), The application of multidimensional scaling to measure and assess changes in the level of social cohesion of the Lower Silesia region in the period 2005-2015, Ekonometria, 3(57), 9-25. Available at: doi:10.15611/ekt.2017.3.01.
Walesiak, M., Dudek, A. (2017), Selecting the Optimal Multidimensional Scaling Procedure for Metric Data with R Environment, STATISTICS IN TRANSITION new series, September, Vol. 18, No. 3, pp. 521-540.
#Example 1
library(mdsOpt)
library(smacof)
library(clusterSim)
data(data_lower_silesian)
z<-data.Normalization(data_lower_silesian, type="n1")
d<-dist.GDM(z, method="GDM1")
res <- smacofSym(delta=d,ndim=2,type="interval")
print("Objects configuration", quote=FALSE)
plot(res, plot.type="confplot")
r1<-res$conf[nrow(z),1]
r2<-res$conf[nrow(z),2]
r3<-res$conf[nrow(z)-1,1]
r4<-res$conf[nrow(z)-1,2]
arrows(r1,r2,r3,r4,length=0.1,col="black")
res_up<-as.matrix(dist(res$conf,method="euclidean"))
drawIsoquants(res$conf[nrow(z)-1,],steps=max(res_up)/6)
# or
# drawIsoquants(res$conf[nrow(z)-1,],steps=c(0.3,0.2),number=8)

#Example 2
library(mdsOpt)
library(smacof)
library(clusterSim)
data(data_lower_silesian)
z<-data.Normalization(data_lower_silesian, type="n1")
d<-dist.GDM(z, method="GDM1")
res<-smacofSym(delta=d,ndim=2,type="interval")
res1<-res$conf
#write.table(res1,"conf_2d.csv",dec=",",sep=";",col.names=NA,row.names=TRUE)
alfa<- 1.05*pi
a<- cos(alfa)
b<- -sin(alfa)
c<- sin(alfa)
d<- cos(alfa)
D<-array(c(a,b,c,d), c(2,2))
#res1<-read.csv2("conf_2d.csv", header=TRUE, row.names=1)
res1<-as.matrix(res1)
res2<-res1
plot(res2, xlab="Dimension 1",ylab="Dimension 2",main="",asp=1)
points(res2[1:31,],pch=1,font=2)
text(res2[c(1:31),],pos=3,cex=0.7,row.names(z[c(1:31),]))
r1<-res2[nrow(z),1]
r2<-res2[nrow(z),2]
r3<-res2[nrow(z)-1,1]
r4<-res2[nrow(z)-1,2]
arrows(r1,r2,r3,r4,length=0.1,col="black")
res_up<-as.matrix(dist(res2,method="euclidean"))
drawIsoquants(res2[nrow(z)-1,],steps=max(res_up)/6)
Selecting the optimal multidimensional scaling procedure - metric MDS (by varying all combinations of normalization methods, distance measures, and metric MDS models) and nonmetric MDS (by varying all combinations of normalization methods and distance measures)
findOptimalSmacofSym(table, critical_stress=(max(as.numeric(gsub(",",".",table[,"STRESS 1"],fixed=TRUE)))+ min(as.numeric(gsub(",",".",table[,"STRESS 1"],fixed=TRUE))))/2, critical_HHI=NA)
table |
result from
|
critical_stress |
threshold value of Kruskal's Stress-1 fit measure. Default - mid-range of Kruskal's Stress-1 fit measures calculated for all MDS procedures |
critical_HHI |
threshold value of Hirschman-Herfindahl HHI index. Only one of the parameters critical_stress or critical_HHI can be set; the function finds the optimal value among the procedures for which the selected measure is lower than or equal to the threshold value |
Nr |
number of row in |
Normalization_method |
normalization method used for optimal multidimensional scaling procedure |
MDS_model |
MDS model used for optimal multidimensional scaling procedure |
Spline_degree |
Additional spline.degree value for optimal procedure, if mspline model is used for simulation. For other models there is no value for this field |
Distance_measure |
distance measure used for optimal multidimensional scaling procedure |
STRESS_1 |
value of Kruskal Stress-1 fit measure for optimal multidimensional scaling procedure |
HHI_spp |
Hirschman-Herfindahl HHI index, calculated based on stress per point, for optimal multidimensional scaling procedure |
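The selection rule with `critical_stress` can be sketched in base R on a toy results table. The sketch assumes that, among procedures whose Stress-1 does not exceed the threshold, the one with the lowest HHI is taken as optimal; this reading follows the argument descriptions above but is not copied from the package source. The table values are hypothetical.

```r
# Toy results table (hypothetical values), decimal comma as with outDec=","
tab <- data.frame(
  "STRESS 1" = c("0,051", "0,082", "0,120", "0,160"),
  "HHI spp"  = c("620,10", "410,55", "380,20", "350,00"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Convert decimal-comma strings to numbers
to_num <- function(v) as.numeric(gsub(",", ".", v, fixed = TRUE))
stress <- to_num(tab[, "STRESS 1"])
hhi    <- to_num(tab[, "HHI spp"])

# Default threshold: mid-range of Stress-1 over all procedures
critical_stress <- (min(stress) + max(stress)) / 2

# Assumed rule: among admissible rows, take the one with the lowest HHI
admissible <- which(stress <= critical_stress)
best <- admissible[which.min(hhi[admissible])]
print(tab[best, ])
```

Note that the row with the globally lowest HHI is not selected here, because its Stress-1 value lies above the threshold.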
Marek Walesiak [email protected], Andrzej Dudek [email protected]
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Poland
Borg, I., Groenen, P.J.F. (2005), Modern Multidimensional Scaling. Theory and Applications, 2nd Edition, Springer Science+Business Media, New York. ISBN: 978-0387-25150-9. Available at: https://link.springer.com/book/10.1007/0-387-28981-X.
Borg, I., Groenen, P.J.F., Mair, P. (2013), Applied Multidimensional Scaling, Springer, Heidelberg, New York, Dordrecht, London. Available at: doi:10.1007/978-3-642-31848-1.
De Leeuw, J., Mair, P. (2015), Shepard Diagram, Wiley StatsRef: Statistics Reference Online, John Wiley & Sons Ltd.
Dudek, A., Walesiak, M. (2020), The Choice of Variable Normalization Method in Cluster Analysis, pp. 325-340, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
Herfindahl, O.C. (1950), Concentration in the Steel Industry, Doctoral thesis, Columbia University.
Hirschman, A.O. (1964), The Paternity of an Index, The American Economic Review, Vol. 54, 761-762.
Walesiak, M. (2014), Przegląd formuł normalizacji wartości zmiennych oraz ich własności w statystycznej analizie wielowymiarowej [Data Normalization in Multivariate Data Analysis. An Overview and Properties], Przegląd Statystyczny, tom 61, z. 4, 363-372. Available at: doi:10.5604/01.3001.0016.1740.
Walesiak, M. (2016a), Wybór grup metod normalizacji wartości zmiennych w skalowaniu wielowymiarowym [The Choice of Groups of Variable Normalization Methods in Multidimensional Scaling], Przegląd Statystyczny, tom 63, z. 1, 7-18. Available at: doi:10.5604/01.3001.0014.1145.
Walesiak, M. (2016b), Visualization of Linear Ordering Results for Metric Data with the Application of Multidimensional Scaling, Ekonometria, 2(52), 9-21. Available at: doi:10.15611/ekt.2016.2.01.
Walesiak, M., Dudek, A. (2017), Selecting the Optimal Multidimensional Scaling Procedure for Metric Data with R Environment, STATISTICS IN TRANSITION new series, September, Vol. 18, No. 3, pp. 521-540.
Walesiak, M., Dudek, A. (2020), Searching for an Optimal MDS Procedure for Metric and Interval-Valued Data using mdsOpt R package, pp. 307-324, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
data.Normalization, dist.GDM, dist, smacofSym
library(mdsOpt)
metnor<-c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
metscale<-c("ratio","interval")
metdist<-c("euclidean","manhattan","maximum","seuclidean","GDM1")
data(data_lower_silesian)
res<-optSmacofSym_mMDS(data_lower_silesian,normalizations=metnor,
  distances=metdist,mdsmodels=metscale,outDec=".")
print(findOptimalSmacofSym(res))
Selecting the optimal multidimensional scaling procedure by varying all combinations of normalization methods, distance measures, and metric MDS models
optSmacofSym_mMDS(x,normalizations=NULL,distances=NULL, mdsmodels=NULL,weights=NULL,spline.degrees=c(2), outputCsv="",outputCsv2="",outDec=",", stressDigits=6,HHIDigits=2,...)
x |
matrix or dataset |
normalizations |
optional, vector of normalization methods that should be used in procedure |
distances |
optional, vector of distance measures (Manhattan, Euclidean, Chebyshev, squared Euclidean, GDM1) that should be used in procedure |
mdsmodels |
optional, vector of multidimensional models (ratio, interval, mspline) that should be used in procedure |
spline.degrees |
optional, vector (e.g. 2:4) of spline.degree parameter values that should be used in procedure for mspline model |
weights |
optional, variable weights used in distance calculation. Each weight takes value from interval [0; 1] and sum of weights equals one |
outputCsv |
optional, name of csv file with results |
outputCsv2 |
optional, name of csv (comma as decimal point sign) file with results |
outDec |
decimal sign used in returned table |
stressDigits |
Number of decimal digits for displaying Stress 1 value |
HHIDigits |
Number of decimal digits for displaying HHI spp value |
... |
arguments passed to smacofSym, like ndim, itmax, eps and others |
Parameter normalizations may be a subset of the following values: "n1","n2","n3","n3a","n4","n5","n5a","n6","n6a","n7","n8","n9","n9a","n10","n11","n12","n12a","n13" (e.g. normalizations=c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")). If normalizations is set to "n0", no normalization is applied.
Parameter distances may be a subset of the following values: "euclidean","manhattan","maximum","seuclidean","GDM1" (e.g. distances=c("euclidean","manhattan")).
Parameter mdsmodels may be a subset of the following values (metric MDS): "ratio","interval","mspline" (e.g. c("ratio","interval")).
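The number of procedures that "all combinations" implies grows quickly, which can be checked with a short base-R enumeration. The sketch assumes that the mspline model is run once per requested spline degree while ratio and interval models carry no degree; the column names of the grid are hypothetical and this is not the package's internal code.

```r
metnor  <- c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
metdist <- c("euclidean","manhattan","maximum","seuclidean","GDM1")
degrees <- 2:3

# ratio and interval models: no spline degree
plain <- expand.grid(normalization = metnor, distance = metdist,
                     model = c("ratio", "interval"),
                     spline_degree = NA_integer_,
                     stringsAsFactors = FALSE)

# mspline model: assumed to be fitted once per spline degree
spline <- expand.grid(normalization = metnor, distance = metdist,
                      model = "mspline", spline_degree = degrees,
                      stringsAsFactors = FALSE)

grid <- rbind(plain, spline)
nrow(grid)          # total number of MDS procedures to fit
table(grid$model)   # procedures per MDS model
```

Each row corresponds to one smacofSym fit, so trimming the vectors of normalizations or distances is the most direct way to shorten the run time.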
Data frame ordered by increasing value of Stress-1 fit measure with columns:
Normalization method |
normalization method used for p-th multidimensional scaling procedure |
MDS model |
MDS model used for p-th multidimensional scaling procedure |
Spline degree |
Additional spline.degree value if mspline model is used for simulation, for other models there is no value in this cell |
Distance measure |
distance measure used for p-th multidimensional scaling procedure |
STRESS 1 |
value of Kruskal Stress-1 fit measure for p-th multidimensional scaling procedure |
HHI spp |
Hirschman-Herfindahl HHI index calculated based on stress per point for p-th multidimensional scaling procedure |
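The HHI column can be read as a concentration measure of the fitting error across objects. The sketch below assumes the standard Herfindahl-Hirschman definition, that is the sum of squared percentage shares, applied to stress per point expressed in percent (shares summing to 100). The helper `hhi_spp` is hypothetical and the exact computation inside mdsOpt may differ.

```r
# Hypothetical helper (not an mdsOpt function)
hhi_spp <- function(spp) {
  # 'spp' is assumed to hold percentage shares summing to 100
  stopifnot(abs(sum(spp) - 100) < 1e-8)
  sum(spp^2)
}

even   <- hhi_spp(rep(25, 4))          # error spread evenly over 4 objects
peaked <- hhi_spp(c(70, 10, 10, 10))   # error concentrated in one object
c(even = even, peaked = peaked)
```

Under this definition the index ranges from 10000/n (all n objects contribute equally) to 10000 (one object carries the whole error), so lower values indicate a more even distribution of error.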
Marek Walesiak [email protected], Andrzej Dudek [email protected]
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Poland
Borg, I., Groenen, P.J.F. (2005), Modern Multidimensional Scaling. Theory and Applications, 2nd Edition, Springer Science+Business Media, New York. ISBN: 978-0387-25150-9. Available at: https://link.springer.com/book/10.1007/0-387-28981-X.
Borg, I., Groenen, P.J.F., Mair, P. (2013), Applied Multidimensional Scaling, Springer, Heidelberg, New York, Dordrecht, London. Available at: doi:10.1007/978-3-642-31848-1.
De Leeuw, J., Mair, P. (2015), Shepard Diagram, Wiley StatsRef: Statistics Reference Online, John Wiley & Sons Ltd.
Dudek, A., Walesiak, M. (2020), The Choice of Variable Normalization Method in Cluster Analysis, pp. 325-340, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
Herfindahl, O.C. (1950), Concentration in the Steel Industry, Doctoral thesis, Columbia University.
Hirschman, A.O. (1964), The Paternity of an Index, The American Economic Review, Vol. 54, 761-762.
Walesiak, M. (2014), Przegląd formuł normalizacji wartości zmiennych oraz ich własności w statystycznej analizie wielowymiarowej [Data Normalization in Multivariate Data Analysis. An Overview and Properties], Przegląd Statystyczny, tom 61, z. 4, 363-372. Available at: doi:10.5604/01.3001.0016.1740.
Walesiak, M. (2016a), Wybór grup metod normalizacji wartości zmiennych w skalowaniu wielowymiarowym [The Choice of Groups of Variable Normalization Methods in Multidimensional Scaling], Przegląd Statystyczny, tom 63, z. 1, 7-18. Available at: doi:10.5604/01.3001.0014.1145.
Walesiak, M. (2016b), Visualization of Linear Ordering Results for Metric Data with the Application of Multidimensional Scaling, Ekonometria, 2(52), 9-21. Available at: doi:10.15611/ekt.2016.2.01.
Walesiak, M., Dudek, A. (2017), Selecting the Optimal Multidimensional Scaling Procedure for Metric Data with R Environment, STATISTICS IN TRANSITION new series, September, Vol. 18, No. 3, pp. 521-540.
Walesiak, M., Dudek, A. (2020), Searching for an Optimal MDS Procedure for Metric and Interval-Valued Data using mdsOpt R package, pp. 307-324, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
data.Normalization, dist.GDM, dist, smacofSym
library(mdsOpt)
metnor<-c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
metscale<-c("ratio","interval","mspline")
metdist<-c("euclidean","manhattan","seuclidean","maximum","GDM1")
data(data_lower_silesian)
res<-optSmacofSym_mMDS(data_lower_silesian,normalizations=metnor,distances=metdist,
  mdsmodels=metscale,spline.degrees=c(2:3),outDec=".")
stress<-as.numeric(gsub(",",".",res[,"STRESS 1"],fixed=TRUE))
hhi<-as.numeric(gsub(",",".",res[,"HHI spp"],fixed=TRUE))
cs<-(min(stress)+max(stress))/2 # critical stress
t<-findOptimalSmacofSym(res,critical_stress=cs)
print(t)
plot(stress[-t$Nr],hhi[-t$Nr], xlab="Stress-1", ylab="HHI",type="n",font.lab=3)
text(stress[-t$Nr],hhi[-t$Nr],labels=(1:nrow(res))[-t$Nr])
abline(v=cs,col="red")
points(stress[t$Nr],hhi[t$Nr], cex=5,col="red")
text(stress[t$Nr],hhi[t$Nr],labels=(1:nrow(res))[t$Nr],col="red")
Selecting the optimal multidimensional scaling procedure by varying all combinations of normalization methods and distance measures
optSmacofSym_nMDS(x,normalizations=NULL,distances=NULL, mdsmodels=c("ordinal"),weights=NULL, outputCsv="",outputCsv2="",outDec=",", stressDigits=6,HHIDigits=2,...)
x |
matrix or dataset |
normalizations |
optional, vector of normalization methods that should be used in procedure |
distances |
optional, vector of distance measures (Manhattan, Euclidean, Chebyshev, squared Euclidean, GDM1) that should be used in procedure |
mdsmodels |
"ordinal" (nonmetric MDS) |
weights |
optional, variable weights used in distance calculation. Each weight takes value from interval [0; 1] and sum of weights equals one |
outputCsv |
optional, name of csv file with results |
outputCsv2 |
optional, name of csv (comma as decimal point sign) file with results |
outDec |
decimal sign used in returned table |
stressDigits |
Number of decimal digits for displaying Stress 1 value |
HHIDigits |
Number of decimal digits for displaying HHI spp value |
... |
arguments passed to smacofSym |
Parameter normalizations may be a subset of the following values: "n1","n2","n3","n3a","n4","n5","n5a","n6","n6a","n7","n8","n9","n9a","n10","n11","n12","n12a","n13" (e.g. normalizations=c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")). If normalizations is set to "n0", no normalization is applied.
Parameter distances may be a subset of the following values: "euclidean","manhattan","maximum","seuclidean","GDM1" (e.g. distances=c("euclidean","manhattan")).
Parameter mdsmodels: "ordinal" MDS model (nonmetric MDS).
Data frame ordered by increasing value of Stress-1 fit measure with columns:
Normalization method |
normalization method used for p-th multidimensional scaling procedure |
MDS model |
"ordinal" MDS model (nonmetric MDS) for p-th multidimensional scaling procedure |
Distance measure |
distance measure used for p-th multidimensional scaling procedure |
STRESS 1 |
value of Kruskal Stress-1 fit measure for p-th multidimensional scaling procedure |
HHI spp |
Hirschman-Herfindahl HHI index calculated based on stress per point for p-th multidimensional scaling procedure |
Marek Walesiak [email protected], Andrzej Dudek [email protected]
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Poland
Borg, I., Groenen, P.J.F. (2005), Modern Multidimensional Scaling. Theory and Applications, 2nd Edition, Springer Science+Business Media, New York. ISBN: 978-0387-25150-9. Available at: https://link.springer.com/book/10.1007/0-387-28981-X.
Borg, I., Groenen, P.J.F., Mair, P. (2013), Applied Multidimensional Scaling, Springer, Heidelberg, New York, Dordrecht, London. Available at: doi:10.1007/978-3-642-31848-1.
De Leeuw, J., Mair, P. (2015), Shepard Diagram, Wiley StatsRef: Statistics Reference Online, John Wiley & Sons Ltd.
Dudek, A., Walesiak, M. (2020), The Choice of Variable Normalization Method in Cluster Analysis, pp. 325-340, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
Herfindahl, O.C. (1950), Concentration in the Steel Industry, Doctoral thesis, Columbia University.
Hirschman, A.O. (1964), The Paternity of an Index, The American Economic Review, Vol. 54, 761-762.
Walesiak, M. (2014), Przegląd formuł normalizacji wartości zmiennych oraz ich własności w statystycznej analizie wielowymiarowej [Data Normalization in Multivariate Data Analysis. An Overview and Properties], Przegląd Statystyczny, tom 61, z. 4, 363-372. Available at: doi:10.5604/01.3001.0016.1740.
Walesiak, M. (2016a), Wybór grup metod normalizacji wartości zmiennych w skalowaniu wielowymiarowym [The Choice of Groups of Variable Normalization Methods in Multidimensional Scaling], Przegląd Statystyczny, tom 63, z. 1, 7-18. Available at: doi:10.5604/01.3001.0014.1145.
Walesiak, M. (2016b), Visualization of Linear Ordering Results for Metric Data with the Application of Multidimensional Scaling, Ekonometria, 2(52), 9-21. Available at: doi:10.15611/ekt.2016.2.01.
Walesiak, M., Dudek, A. (2017), Selecting the Optimal Multidimensional Scaling Procedure for Metric Data with R Environment, STATISTICS IN TRANSITION new series, September, Vol. 18, No. 3, pp. 521-540.
Walesiak, M., Dudek, A. (2020), Searching for an Optimal MDS Procedure for Metric and Interval-Valued Data using mdsOpt R package, pp. 307-324, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
data.Normalization, dist.GDM, dist, smacofSym
library(mdsOpt)
metnor<-c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
metscale<-"ordinal"
metdist<-c("euclidean","manhattan","maximum","seuclidean","GDM1")
data(data_lower_silesian)
res<-optSmacofSym_nMDS(data_lower_silesian,normalizations=metnor,
  distances=metdist,mdsmodels=metscale)
stress<-as.numeric(gsub(",",".",res[,"STRESS 1"],fixed=TRUE))
hhi<-as.numeric(gsub(",",".",res[,"HHI spp"],fixed=TRUE))
cs<-(min(stress)+max(stress))/2 # critical stress
t<-findOptimalSmacofSym(res,critical_stress=cs)
print(t)
plot(stress[-t$Nr],hhi[-t$Nr], xlab="Stress-1", ylab="HHI",type="n",font.lab=3)
text(stress[-t$Nr],hhi[-t$Nr],labels=(1:nrow(res))[-t$Nr])
abline(v=cs,col="red")
points(stress[t$Nr],hhi[t$Nr], cex=5,col="red")
text(stress[t$Nr],hhi[t$Nr],labels=(1:nrow(res))[t$Nr],col="red")
Selecting the optimal multidimensional scaling procedure by varying all combinations of normalization methods, distance measures for interval-valued data, and metric MDS models.
optSmacofSymInterval(x,dataType="simple",normalizations=NULL, distances=NULL,mdsmodels=NULL,spline.degrees=c(2),outputCsv="", outputCsv2="",y=NULL,outDec=",", stressDigits=6,HHIDigits=2,...)
x |
interval-valued data table or matrix or dataset |
dataType |
Type of symbolic data table passed to function: 'sda' - full symbolicDA format object; 'simple' - three dimensional array with lower and upper bound of intervals in third dimension; 'separate_tables' - lower bound of intervals in x, upper bound in y; 'rows' - lower and upper bound of intervals in neighbouring rows; 'columns' - lower and upper bound of intervals in neighbouring columns |
normalizations |
optional, vector of normalization methods that should be used in procedure |
distances |
optional, vector of distance measures (Hausdorff, Ichino-Yaguchi) that should be used in procedure |
mdsmodels |
optional, vector of multidimensional models (ratio, interval, mspline) that should be used in procedure |
spline.degrees |
optional, vector (e.g. 2:4) of spline.degree parameter values that should be used in procedure for mspline model |
outputCsv |
optional, name of csv file with results |
outputCsv2 |
optional, name of csv (comma as decimal point sign) file with results |
y |
matrix or dataset with upper bounds of intervals if argument |
outDec |
decimal sign used in returned table |
stressDigits |
Number of decimal digits for displaying Stress 1 value |
HHIDigits |
Number of decimal digits for displaying HHI spp value |
... |
arguments passed to smacofSym, like ndim, itmax, eps and others |
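For the 'simple' data type, a three-dimensional array can be assembled from two matrices of bounds. The sketch below assumes that the first slice of the third dimension holds the lower bounds and the second slice the upper bounds; that order is an assumption, and the toy values and dimension names are hypothetical.

```r
# Toy bounds (hypothetical): 3 objects, 2 variables
lower <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
                dimnames = list(c("A", "B", "C"), c("v1", "v2")))
upper <- lower + matrix(c(0.5, 1, 1.5, 2, 4, 6), nrow = 3)

# objects x variables x 2; assumed order: lower bound first, upper bound second
x_simple <- array(c(lower, upper),
                  dim = c(nrow(lower), ncol(lower), 2),
                  dimnames = list(rownames(lower), colnames(lower),
                                  c("lower", "upper")))

# Sanity check: every interval must be well formed
stopifnot(all(x_simple[, , "lower"] <= x_simple[, , "upper"]))
dim(x_simple)
```

Checking that every lower bound does not exceed its upper bound before fitting is cheap and catches swapped columns early.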
Parameter normalizations may be a subset of the following values: "n1","n2","n3","n3a","n4","n5","n5a","n6","n6a","n7","n8","n9","n9a","n10","n11","n12","n12a","n13" (e.g. normalizations=c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")). If normalizations is set to "n0", no normalization is applied.
Parameter distances may be a subset of the following values: "H_q1","H_q2","U_2_q1","U_2_q2" (in the following order: Hausdorff distance with q=1, Euclidean Hausdorff distance with q=2, Ichino-Yaguchi distance with q=1, Euclidean Ichino-Yaguchi distance with q=2) (e.g. distances=c("H_q1","U_2_q1")).
Parameter mdsmodels may be a subset of the following values (metric MDS): "ratio","interval","mspline" (e.g. c("ratio","interval")).
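The difference between the q=1 and q=2 variants of the Hausdorff distance can be illustrated with a small base-R sketch. It assumes the common textbook form, in which the per-variable distance between two intervals is the larger of the absolute differences of their lower and upper bounds, and the per-variable values are then combined with a Minkowski power q. The helper `hausdorff_interval` is hypothetical; the implementation behind "H_q1" and "H_q2" may differ in details such as scaling.

```r
# Hypothetical helper (not the dist.Symbolic implementation)
hausdorff_interval <- function(a, b, q = 1) {
  # a, b: variables x 2 matrices (column 1 = lower bound, column 2 = upper bound)
  per_var <- pmax(abs(a[, 1] - b[, 1]), abs(a[, 2] - b[, 2]))
  sum(per_var^q)^(1 / q)
}

# Two toy objects described by two interval-valued variables
a <- cbind(lower = c(1, 10), upper = c(2, 14))
b <- cbind(lower = c(4, 10), upper = c(6, 11))

d_q1 <- hausdorff_interval(a, b, q = 1)  # city-block style aggregation
d_q2 <- hausdorff_interval(a, b, q = 2)  # Euclidean style aggregation
c(q1 = d_q1, q2 = d_q2)
```

In this toy case the per-variable distances are 4 and 3, so the two variants aggregate the same components differently, which is why they can lead to different Stress-1 and HHI values for otherwise identical procedures.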
Data frame ordered by increasing value of Stress-1 fit measure with columns:
Normalization method |
normalization method used for p-th multidimensional scaling procedure |
MDS model |
MDS model used for p-th multidimensional scaling procedure |
Spline degree |
Additional spline.degree value if mspline model is used for simulation, for other models there is no value in this cell |
Distance measure |
distance measures for interval-valued data used for p-th multidimensional scaling procedure |
STRESS 1 |
value of Kruskal Stress-1 fit measure for p-th multidimensional scaling procedure |
HHI spp |
Hirschman-Herfindahl HHI index calculated based on stress per point for p-th multidimensional scaling procedure |
Marek Walesiak [email protected], Andrzej Dudek [email protected]
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Poland
Borg, I., Groenen, P.J.F. (2005), Modern Multidimensional Scaling. Theory and Applications, 2nd Edition, Springer Science+Business Media, New York. ISBN: 978-0387-25150-9. Available at: https://link.springer.com/book/10.1007/0-387-28981-X.
Borg, I., Groenen, P.J.F., Mair, P. (2013), Applied Multidimensional Scaling, Springer, Heidelberg, New York, Dordrecht, London. Available at: doi:10.1007/978-3-642-31848-1.
De Leeuw, J., Mair, P. (2015), Shepard Diagram, Wiley StatsRef: Statistics Reference Online, John Wiley & Sons Ltd.
Dudek, A., Walesiak, M. (2020), The Choice of Variable Normalization Method in Cluster Analysis, pp. 325-340, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
Herfindahl, O.C. (1950), Concentration in the Steel Industry, Doctoral thesis, Columbia University.
Hirschman, A.O. (1964), The Paternity of an Index, The American Economic Review, Vol. 54, 761-762.
Walesiak, M. (2014), Przegląd formuł normalizacji wartości zmiennych oraz ich własności w statystycznej analizie wielowymiarowej [Data Normalization in Multivariate Data Analysis. An Overview and Properties], Przegląd Statystyczny, tom 61, z. 4, 363-372. Available at: doi:10.5604/01.3001.0016.1740.
Walesiak, M., Dudek, A. (2017), Selecting the Optimal Multidimensional Scaling Procedure for Metric Data with R Environment, STATISTICS IN TRANSITION new series, September, Vol. 18, No. 3, pp. 521-540.
Walesiak, M., Dudek, A. (2020), Searching for an Optimal MDS Procedure for Metric and Interval-Valued Data using mdsOpt R package, pp. 307-324, [In:] K. S. Soliman (Ed.), Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, Proceedings of the 35th International Business Information Management Association Conference (IBIMA), 1-2 April 2020, Seville, Spain. ISBN: 978-0-9998551-4-1.
data.Normalization, interval_normalization, dist.Symbolic, smacofSym
library(mdsOpt)
library(clusterSim)
data(data_symbolic_interval_polish_voivodships)
metnor<-c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
metscale<-c("ratio","interval","mspline")
metdist<-c("H_q1","H_q2","U_2_q1","U_2_q2")
res<-optSmacofSymInterval(data_symbolic_interval_polish_voivodships,dataType="simple",
normalizations=metnor,distances=metdist,mdsmodels=metscale,spline.degrees=c(2,3),outDec=".")
stress<-as.numeric(gsub(",",".",res[,"STRESS 1"],fixed=TRUE))
hhi<-as.numeric(gsub(",",".",res[,"HHI spp"],fixed=TRUE))
t<-findOptimalSmacofSym(res)
cs<-(min(stress)+max(stress))/2 # critical stress
plot(stress[-t$Nr],hhi[-t$Nr], xlab="Stress-1", ylab="HHI",type="n",font.lab=3)
text(stress[-t$Nr],hhi[-t$Nr],labels=(1:nrow(res))[-t$Nr])
abline(v=cs,col="red")
points(stress[t$Nr],hhi[t$Nr], cex=5,col="red")
text(stress[t$Nr],hhi[t$Nr],labels=(1:nrow(res))[t$Nr],col="red")
print(t)
This function opens a graphics device to record the images produced in the code expr, then uses FFmpeg to convert these images to a video.
rotation2dAnimation(conf2d, ani.interval=0.2, ani.nmax=361, ani.width=500, ani.height=500, ani.video.name="mds_rotate.mp4", angle.start=-pi, angle.stop=pi, angle.step=pi/180)
conf2d | two-dimensional data set or matrix |
ani.video.name | the file name of the output video (e.g. ‘animation.mp4’ or ‘animation.avi’) |
ani.interval | interval between animation frames (in seconds) |
ani.nmax | maximal number of frames |
ani.width | width of the movie |
ani.height | height of the movie |
angle.start | starting angle for the animation (in radians) |
angle.stop | end angle for the animation (in radians) |
angle.step | step of the animation in radians |
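The angle arguments can be read as a sequence of rotations applied to the configuration, one per frame. The following is a minimal base R sketch under that assumption; rotate2d is a hypothetical helper, not a function exported by mdsOpt, and the package may build its frames differently.

```r
# Hypothetical helper (not part of mdsOpt): rotate an n x 2 configuration
# counter-clockwise by `angle` radians around the origin.
rotate2d <- function(conf2d, angle) {
  R <- matrix(c(cos(angle), sin(angle), -sin(angle), cos(angle)), nrow = 2)
  as.matrix(conf2d) %*% t(R)
}

# Toy configuration of three points (illustrative values only)
conf <- matrix(c(1, 0,
                 0, 1,
                 -1, 0.5), ncol = 2, byrow = TRUE)

# Angles implied by the default angle.start, angle.stop and angle.step
angles <- seq(-pi, pi, by = pi / 180)
frames <- lapply(angles, function(a) rotate2d(conf, a))
length(frames)  # one rotated configuration per frame

# A rotation leaves the inter-point distances unchanged
all.equal(as.vector(dist(conf)), as.vector(dist(frames[[90]])))
```

Because distances are preserved, every frame shows an equally valid MDS solution; only the orientation of the picture changes.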
This function uses system to call FFmpeg to convert the images to a single video. The command line used in this function is:
ffmpeg -y -r <1/interval> -i <img.name>%d.<ani.type> other.opts video.name
where interval comes from ani.options('interval'), and ani.type is from ani.options('ani.type'). For more details on the numerous options of FFmpeg, please see the reference.
Some Linux systems may use the alternate software 'avconv' instead of 'ffmpeg'. The package will attempt to determine which command is present and set ani.options('ffmpeg') to an appropriate default value. This can be overridden by passing in the ffmpeg argument.
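Before running the animation it may help to check which encoder is actually available. The sketch below uses only base R's Sys.which(); it is not the detection logic of the animation package, just one way to perform a similar check by hand.

```r
# Look for an encoder on the PATH; Sys.which() returns "" for commands
# it cannot find.
candidates <- Sys.which(c("ffmpeg", "avconv"))
found <- candidates[nzchar(candidates)]

if (length(found) > 0) {
  message("Encoder found: ", found[[1]])
  # If the animation package is installed, the path could then be passed on:
  # animation::ani.options(ffmpeg = found[[1]])
} else {
  message("Neither ffmpeg nor avconv is on the PATH; skip the video step.")
}
```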
An integer indicating failure (-1) or success (0) of the conversion (refer to system).
Marek Walesiak, Andrzej Dudek
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Poland
Walesiak, M. (2016), Visualization of Linear Ordering Results for Metric Data with the Application of Multidimensional Scaling, Ekonometria, 2(52), 9-21. Available at: doi:10.15611/ekt.2016.2.01.
Walesiak, M. (2017), The application of multidimensional scaling to measure and assess changes in the level of social cohesion of the Lower Silesia region in the period 2005-2015, Ekonometria, 3(57), 9-25. Available at: doi:10.15611/ekt.2017.3.01.
Walesiak, M., Dudek, A. (2017), Selecting the Optimal Multidimensional Scaling Procedure for Metric Data with R Environment, STATISTICS IN TRANSITION new series, September, Vol. 18, No. 3, pp. 521-540.
https://yihui.org/animation/example/savevideo/
http://ffmpeg.org/documentation.html
Other utilities: im.convert, saveGIF, saveHTML, saveLatex, saveSWF
library(mdsOpt)
library(smacof)
library(animation)
library(spdep)
library(clusterSim)
data(data_lower_silesian)
z<-data.Normalization(data_lower_silesian, type="n1")
d<-dist.GDM(z, method="GDM1")
res<-smacofSym(delta=d,ndim=2,type="interval")
konf<-as.matrix(res$conf)
#Uncomment only if ffmpeg is properly installed for animation package
#see: https://yihui.org/animation/example/savevideo/
#oopts = if (.Platform$OS.type == "windows") {
# ani.options(ffmpeg = "D:/Installer/ffmpeg/bin/ffmpeg.exe")
#}
#rotation2dAnimation(conf2d=konf,angle.start=-0,angle.stop=2*pi)